The PSI Handbook of Virtual Environments for Training and Education
Praeger Security International Advisory Board

Board Cochairs
Loch K. Johnson, Regents Professor of Public and International Affairs, School of Public and International Affairs, University of Georgia (U.S.A.)
Paul Wilkinson, Professor of International Relations and Chairman of the Advisory Board, Centre for the Study of Terrorism and Political Violence, University of St. Andrews (U.K.)

Members
Anthony H. Cordesman, Arleigh A. Burke Chair in Strategy, Center for Strategic and International Studies (U.S.A.)
Thérèse Delpech, Director of Strategic Affairs, Atomic Energy Commission, and Senior Research Fellow, CERI (Fondation Nationale des Sciences Politiques), Paris (France)
Sir Michael Howard, former Chichele Professor of the History of War and Regius Professor of Modern History, Oxford University, and Robert A. Lovett Professor of Military and Naval History, Yale University (U.K.)
Lieutenant General Claudia J. Kennedy, USA (Ret.), former Deputy Chief of Staff for Intelligence, Department of the Army (U.S.A.)
Paul M. Kennedy, J. Richardson Dilworth Professor of History and Director, International Security Studies, Yale University (U.S.A.)
Robert J. O’Neill, former Chichele Professor of the History of War, All Souls College, Oxford University (Australia)
Shibley Telhami, Anwar Sadat Chair for Peace and Development, Department of Government and Politics, University of Maryland (U.S.A.)
Fareed Zakaria, Editor, Newsweek International (U.S.A.)
The PSI Handbook of Virtual Environments for Training and Education DEVELOPMENTS FOR THE MILITARY AND BEYOND Volume 1 Learning, Requirements, and Metrics Edited by Dylan Schmorrow, Joseph Cohn, and Denise Nicholson
Technology, Psychology, and Health
PRAEGER SECURITY INTERNATIONAL
Westport, Connecticut • London
Library of Congress Cataloging-in-Publication Data The PSI handbook of virtual environments for training and education : developments for the military and beyond. p. cm. – (Technology, psychology, and health, ISSN 1942–7573 ; v. 1-3) Includes bibliographical references and index. ISBN 978–0–313–35165–5 (set : alk. paper) – ISBN 978–0–313–35167–9 (v. 1 : alk. paper) – ISBN 978–0–313–35169–3 (v. 2 : alk. paper) – ISBN 978–0–313–35171–6 (v. 3 : alk. paper) 1. Military education–United States. 2. Human-computer interaction. 3. Computer-assisted instruction. 4. Virtual reality. I. Schmorrow, Dylan, 1967- II. Cohn, Joseph, 1969- III. Nicholson, Denise, 1967- IV. Praeger Security International. V. Title: Handbook of virtual environments for training and education. VI. Title: Praeger Security International handbook of virtual environments for training and education. U408.3.P75 2009 355.0078’5–dc22 2008027367 British Library Cataloguing in Publication Data is available. Copyright © 2009 by Dylan Schmorrow, Joseph Cohn, and Denise Nicholson All rights reserved. No portion of this book may be reproduced, by any process or technique, without the express written consent of the publisher. Library of Congress Catalog Card Number: 2008027367 ISBN-13: 978–0–313–35165–5 (set) 978–0–313–35167–9 (vol. 1) 978–0–313–35169–3 (vol. 2) 978–0–313–35171–6 (vol. 3) ISSN: 1942–7573 First published in 2009 Praeger Security International, 88 Post Road West, Westport, CT 06881 An imprint of Greenwood Publishing Group, Inc. www.praeger.com Printed in the United States of America
The paper used in this book complies with the Permanent Paper Standard issued by the National Information Standards Organization (Z39.48–1984). 10 9 8 7 6 5 4 3 2 1
To our families, and to the men and women who have dedicated their lives to educate, train, and defend to keep them safe
CONTENTS

Series Foreword  xi
Preface by G. Vincent Amico  xiii
Acknowledgments  xvii

SECTION 1: LEARNING
Section Perspective, Gwendolyn Campbell  1

Part I: Biological Band  7
Chapter 1: The Neurophysiology of Learning and Memory: Implications for Training, Catherine Poulsen, Phan Luu, and Don Tucker  7

Part II: Cognitive/Rational Band  31
Chapter 2: The Role of Individual Differences in Virtual Environment Based Training, Clint Bowers, Jennifer Vogel-Walcutt, and Jan Cannon-Bowers  31
Chapter 3: Cognitive Transformation Theory: Contrasting Cognitive and Behavioral Learning, Gary Klein and Holly C. Baxter  50

Part III: Social Band  66
Chapter 4: Creating Expertise with Technology Based Training, Karol Ross, Jennifer Phillips, and Joseph Cohn  66
Chapter 5: Cybernetics: Redefining Individualized Training, Elizabeth Biddle, Dennis McBride, and Linda Malone  81

Part IV: Spanning the Bands  97
Chapter 6: A Theoretical Framework for Developing Systematic Instructional Guidance for Virtual Environment Training, Wendi Van Buskirk, Jessica Cornejo, Randolph Astwood, Steven Russell, David Dorsey, and Joseph Dalton  97

SECTION 2: REQUIREMENTS ANALYSIS
Section Perspective, Kay Stanney  115

Part V: Methods  131
Chapter 7: Applied Methods for Requirements Engineering, Tom Mayfield and Deborah Boehm-Davis  131
Chapter 8: Creating Tactical Expertise: Guidance for Scenario Developers and Instructors, Jennifer Phillips, Karol Ross, and Joseph Cohn  148

Part VI: Requirements Analysis  165
Chapter 9: Training Systems Requirements Analysis, Laura Milham, Meredith Bell Carroll, Kay Stanney, and William Becker  165
Chapter 10: Building Virtual Environment Training Systems for Success, Joseph Cohn  193
Chapter 11: Learning to Become a Creative Systems Analyst, Lemai Nguyen and Jacob Cybulski  208

SECTION 3: PERFORMANCE ASSESSMENT
Section Perspective, Eduardo Salas and Michael A. Rosen  227

Part VII: Purpose of Measurement  236
Chapter 12: Measurement and Assessment for Training in Virtual Environments, Jared Freeman, Webb Stacy, and Orlando Olivares  236
Chapter 13: Training Advanced Skills in Simulation Based Training, Jennifer Fowlkes, Kelly Neville, Razia Nayeem, and Susan Eitelman Dean  251
Chapter 14: Examining Measures of Team Cognition in Virtual Teams, C. Shawn Burke, Heather Lum, Shannon Scielzo, Kimberly Smith-Jentsch, and Eduardo Salas  266
Chapter 15: Virtual Environment Performance Assessment: Organizational Level Considerations, Robert D. Pritchard, Deborah DiazGranados, Sallie J. Weaver, Wendy L. Bedwell, and Melissa M. Harrell  284

Part VIII: Methods in Performance Assessment  300
Chapter 16: Assessment Models and Tools for Virtual Environment Training, William L. Bewley, Gregory K. W. K. Chung, Girlie C. Delacruz, and Eva L. Baker  300
Chapter 17: Automated Performance Assessment of Teams in Virtual Environments, Peter Foltz, Noelle LaVoie, Rob Oberbreckling, and Mark Rosenstein  314
Chapter 18: A Primer on Verbal Protocol Analysis, Susan Trickett and J. Gregory Trafton  332

Part IX: Capturing Expertise in Complex Environments  347
Chapter 19: Development of Simulated Team Environments for Measuring Team Cognition and Performance, Jamie Gorman, Nancy Cooke, and Jasmine Duran  347
Chapter 20: Affective Measurement of Performance, James Driskell and Eduardo Salas  362
Chapter 21: Providing Timely Assistance: Temporal Measurement Guidelines for the Study of Virtual Teams, Susan Mohammed and Yang Zhang  376

Acronyms  391
Index  395
About the Editors and Contributors  415
SERIES FOREWORD
LAUNCHING THE TECHNOLOGY, PSYCHOLOGY, AND HEALTH DEVELOPMENT SERIES

The escalating complexity and operational tempo of the twenty-first century require that people in all walks of life acquire ever-increasing knowledge, skills, and abilities. Training and education strategies are dynamically changing toward delivery of more effective instruction and practice, wherever and whenever needed. In the last decade, the Department of Defense has made significant investments to advance the science and technology of virtual environments to meet this need. Throughout this time we have been privileged to collaborate with some of the brightest minds in science and technology. The intention of this three-volume handbook is to provide comprehensive coverage of the emerging theories, technologies, and integrated demonstrations of the state of the art in virtual environments for training and education. As Dr. G. Vincent Amico states in the Preface, an important lesson to draw from the history of modeling and simulation is the importance of process. The human systems engineering process requires highly multidisciplinary teams to integrate diverse disciplines from psychology, education, engineering, and computer science (see Nicholson and Lackey, Volume 3, Section 1, Chapter 1). This process drives the organization of the handbook. While other texts on virtual environments (VEs) focus heavily on technology, we have dedicated the first volume to a thorough investigation of learning theories, requirements definition, and performance measurement. The second volume provides the latest information on a range of virtual environment component technologies and a distinctive section on training support technologies. In the third volume, an extensive collection of integrated systems is discussed as virtual environment use cases, along with a section on training effectiveness evaluation methods and results. Volume 3, Section 3 highlights future applications of this evolving technology that span from cognitive rehabilitation to the next generation of museum exhibitions. Finally, a glimpse into the potential future of VEs is provided in an original short story entitled “Into the Uncanny Valley” from Judith Singer and Hollywood director Alex Singer.
Through our research we have experienced rapid technological and scientific advancements, coinciding with a dramatic convergence of research achievements representing contributions from numerous fields, including neuroscience, cognitive psychology and engineering, biomedical engineering, computer science, and systems engineering. Historically, psychology and technology development were independent research areas practiced by scientists and engineers primarily trained in one of these disciplines. In recent years, however, individuals in these disciplines, such as the close to 200 authors of this handbook, have found themselves increasingly working within a unified framework that completely blurs the lines of these discrete research areas, creating an almost “metadisciplinary” (as opposed to multidisciplinary) form of science and technology. The strength of the confluence of these two disciplines lies in the complementary research and development approaches being employed and the interdependence that is required to achieve useful technological applications. Consequently, with this handbook we begin a new Praeger Security International book series entitled Technology, Psychology, and Health, intended to capture the remarkable advances that will be achieved through the continued seamless integration of these disciplines, where unified and simultaneously executed approaches of psychology, engineering, and practice will result in more effective science and technology applications. Therefore, the esteemed contributors to the Technology, Psychology, and Health Development Series strive to capture such advancements and effectively convey both the practical and theoretical elements of the technological innovations they describe. The Technology, Psychology, and Health Development Series will continue to address the general themes of requisite foundational knowledge, emergent scientific discoveries, and practical lessons learned, as well as cross-discipline standards, methodologies, metrics, techniques, practices, and visionary perspectives and developments. The series plans to showcase substantial advances in research and development methods and their resulting technologies and applications. Cross-disciplinary teams will provide detailed reports of their experiences applying technologies in diverse areas—from basic academic research to industrial and military fielded operational and training systems to everyday computing and entertainment devices. A thorough and comprehensive consolidation and dissemination of psychology and technology development efforts is no longer a noble academic goal—it is a twenty-first century necessity dictated by the desire to ensure that our global economy and society realize their full scientific and technological potentials. Accordingly, this ongoing book series is intended to be an essential resource for a large international audience of professionals in industry, government, and academia. We encourage future authors to contact us for more information or to submit a prospectus idea.

Dylan Schmorrow and Denise Nicholson
Technology, Psychology, and Health Development Series Editors
[email protected]
PREFACE
G. Vincent Amico

It is indeed an honor and pleasure to write the preface to this valuable collection of articles on simulation for education and training. The fields of modeling and simulation are playing an increasingly important role in society. You will note that the collection is titled virtual environments for training and education. I believe it is important to recognize the distinction between those two terms. Education is oriented to providing fundamental scientific and technical skills; these skills lay the groundwork for training. Simulations for training are designed to help operators of systems effectively learn how to operate those systems under a variety of conditions, both normal and emergency. Cognitive, psychomotor, and affective behaviors must all be addressed. Hence, psychologists play a dominant role within multidisciplinary teams of engineers and computer scientists for determining the effective use of simulation for training. Of course, the U.S. Department of Defense’s Human Systems Research Agencies, that is, Office of the Secretary of Defense, Office of Naval Research, Air Force Research Lab, Army Research Laboratory, and Army Research Institute, also play a primary role—their budgets support many of the research activities in this important field. Volume 1, Section 1 in this set addresses many of the foundational learning issues associated with the use of simulation for education and training. These chapters will certainly interest psychologists, but are also written so that technologists and other practitioners can glean some insight into the important science surrounding learning. Throughout the set, training technologies are explored in more detail. In particular, Volume 2, Sections 1 and 2 include several diverse chapters demonstrating how learning theory can be effectively applied to simulation for training. The use of simulation for training goes back to the beginning of time. As early as 2500 B.C., ancient Egyptians used figurines to simulate warring factions. The precursors of modern robotic simulations can be traced back to ancient China, from which we have documented reports (circa 200 B.C.) of artisans constructing mechanical automata, elaborate mechanical simulations of people or animals.
These ancient “robots” included life-size mechanical humanoids, reportedly capable of movement and speech (Kurzweil, 1990; Needham, 1986). In those early days, these mechanical devices were used to train soldiers in various phases of combat, and military tacticians used war games to develop strategies. Simulation technology as we know it today became viable only in the early twentieth century. Probably the most significant event was Ed Link’s development of the Link Trainer (aka the “Blue Box”) for pilot training. He applied for its patent in 1929. Yet simulation did not play a major role in training until the start of World War II (in 1941), when Navy captain Luis de Florez established the Special Devices Desk at the Bureau of Aeronautics. His organization expanded significantly in the next few years as the value of simulation for training became recognized. Captain de Florez is also credited with the development of the first flight simulation that was driven by an analog computer. Developed in 1943, his simulator, called the operational flight trainer, modeled the PBM-3 aircraft. In the period after World War II, simulators and simulation science grew exponentially based upon the very successful programs initiated during the war. There are two fundamental components of any modern simulation system. One is a sound mathematical understanding of the object to be simulated. The other is the real time implementation of those models in computational systems. In the late 1940s the primary computational systems were analog. Digital computers were very expensive, very slow, and could not solve equations in real time. It was not until the late 1950s and early 1960s that digital computation became viable. For instance, the first Navy simulator to use a commercial digital computer was the Attack Center Trainer at the FBM Facility (New London, Connecticut) in 1959. Thus, it has been only for the past 50 years that simulation has made major advancements. Even today, it is typical that user requirements for capability exceed the ability of available technology. There are many areas where this is particularly true, including rapid creation of visual simulation from actual terrain environment databases and human behavior representations spanning cognition to social networks. The dramatic increases in digital computer speed and capacity have significantly closed the gap. But there are still requirements that cannot be met; these gaps define the next generation of science and technology research questions. In the past decade or so, a number of major simulation initiatives have developed, including distributed interactive simulation, advanced medical simulation, and augmented cognition supported simulation. Distributed simulation enables many different units to participate in a joint exercise, regardless of where the units are located. The requirements for individual simulations to engage in such exercises are mandated by Department of Defense standards, that is, high level architecture and distributed interactive simulation. An excellent example of the capabilities that have resulted is the unprecedented number of virtual environment simulations that have transitioned from the Office of Naval Research’s Virtual Technologies and Environments (VIRTE) Program to actual military
training applications discussed throughout this handbook. The second area of major growth is the field of medical simulation. The development of the human patient simulator clearly heralded this next phase of medical simulation based training, and the field of medical simulation will certainly expand during the next decade. Finally, the other exciting development in recent years is the exploration of augmented cognition, which may eventually enable system users to completely forgo standard computer interfaces and work seamlessly with their equipment through the utilization of neurophysiological sensing. Now let us address some of the issues that occur during the development process of a simulator. The need for simulation usually begins when a customer experiences problems training operators in the use of certain equipment or procedures; this is particularly true in the military. The need must then be formalized into a requirements document, and naturally, the search for associated funding and development of a budget ensues. The requirements document must then be converted into a specification or a work statement. That then leads to an acquisition process, resulting in a contract. The contractor must then convert that specification into a hardware and software design. This process takes time and is subject to numerous changes in interpretation and direction. The proof of the pudding comes when the final product is evaluated to determine if the simulation meets the customer’s needs. One of the most critical aspects of any modeling and simulation project is to determine its effectiveness and whether it meets the original objectives. This may appear to be a rather straightforward task, but it is actually very complex. First, it is extremely important that checks are conducted at various stages of the development process. During the conceptual stages of a project, formal reviews are normally conducted to ensure that the requirements are properly stated; those same reviews are also conducted at the completion of the work statement or specification. During the actual development process, periodic reviews should be conducted at key stages. When the project is completed, tests should be conducted to determine if the simulation meets the design objectives and stated requirements. The final phase of testing is validation. The purpose of validation is to determine if the simulation meets the customer’s needs. Why is this process of testing so important? The entire development process is lengthy, and during that process there is a very high probability that changes will be introduced. The only way to manage the overall process is by performing careful inspections at each major phase of the project. As the organization and content of this handbook make evident, this process has been the fundamental framework for conducting most of today’s leading research and development initiatives. From section to section, the reader is guided through the requirements, development, and evaluation cycle. The reader is then challenged to imagine the state of the possible in the final, Future Directions, section. In summary, one can see that the future of simulation to support education and training is beyond our comprehension. That does not mean, however, that care need not be taken in the development process. The key issues that must be addressed were
cited earlier. There is one fact that one must keep in mind: no simulation is perfect. But with care, and by keeping the simulation objectives in line with the capabilities of modeling and implementation, success can be achieved. This is demonstrated by the number of simulations that are being used today in innovative settings to improve training for a wide range of applications.

REFERENCES

Kurzweil, R. (1990). The age of intelligent machines. Cambridge, MA: MIT Press.
Needham, J. (1986). Science and civilization in China: Volume 2. Cambridge, United Kingdom: Cambridge University Press.
ACKNOWLEDGMENTS
These volumes are the product of many contributors working together. Leading the coordination activities were a few key individuals whose efforts made this project a reality:

Associate Editor: Julie Drexler
Technical Writer: Kathleen Bartlett
Editing Assistants: Kimberly Sprouse and Sherry Ogreten

We would also like to thank our Editorial Board and Review Board members, as follows:

Editorial Board

John Anderson, Carnegie Mellon University; Kathleen Bartlett, Florida Institute of Technology; Clint Bowers, University of Central Florida, Institute for Simulation and Training; Gwendolyn Campbell, Naval Air Warfare Center, Training Systems Division; Janis Cannon-Bowers, University of Central Florida, Institute for Simulation and Training; Rudolph Darken, Naval Postgraduate School, The MOVES Institute; Julie Drexler, University of Central Florida, Institute for Simulation and Training; Neal Finkelstein, U.S. Army Research Development & Engineering Command; Bowen Loftin, Texas A&M University at Galveston; Eric Muth, Clemson University, Department of Psychology; Sherry Ogreten, University of Central Florida, Institute for Simulation and Training; Eduardo Salas, University of Central Florida, Institute for Simulation and Training and Department of Psychology; Kimberly Sprouse, University of Central Florida, Institute for Simulation and Training; Kay Stanney, Design Interactive, Inc.;
Mary Whitton, University of North Carolina at Chapel Hill, Department of Computer Science

Review Board (by affiliation)

Advanced Brain Monitoring, Inc.: Chris Berka; Alion Science and Tech.: Jeffery Moss; Arizona State University: Nancy Cooke; AuSIM, Inc.: William Chapin; Carlow International, Inc.: Tomas Malone; CHI Systems, Inc.: Wayne Zachary; Clemson University: Pat Raymark, Patrick Rosopa, Fred Switzer, Mary Anne Taylor; Creative Labs, Inc.: Edward Stein; Deakin University: Lemai Nguyen; Defense Acquisition University: Alicia Sanchez; Design Interactive, Inc.: David Jones; Embry-Riddle Aeronautical University: Elizabeth Blickensderfer, Jason Kring; Human Performance Architects: Richard Arnold; Iowa State University: Chris Harding; Lockheed Martin: Raegan Hoeft; Max Planck Institute: Betty Mohler; Michigan State University: J. Kevin Ford; NASA Langley Research Center: Danette Allen; Naval Air Warfare Center, Training Systems Division: Maureen Bergondy-Wilhelm, Curtis Conkey, Joan Johnston, Phillip Mangos, Carol Paris, James Pharmer, Ronald Wolff; Naval Postgraduate School: Barry Peterson, Perry McDowell, William Becker, Curtis Blais, Anthony Ciavarelli, Amela Sadagic, Mathias Kolsch; Occidental College: Brian Kim; Office of Naval Research: Harold Hawkins, Roy Stripling; Old Dominion University: James Bliss; Pearson Knowledge Tech.: Peter Foltz; PhaseSpace, Inc.: Tracy McSherry; Potomac Institute for Policy Studies: Paul Chatelier; Renee Stout, Inc.: Renee Stout; SA Technologies, Inc.: Haydee Cuevas, Jennifer Riley; Sensics, Inc.: Yuval Boger; Texas A&M University: Claudia McDonald; The Boeing Company: Elizabeth Biddle; The University of Iowa: Kenneth Brown; U.S. Air Force Academy: David Wells; U.S. Air Force Research Laboratory: Dee Andrews; U.S. Army Program Executive Office for Simulation, Training, & Instrumentation: Roger Smith; U.S. Army Research Development & Engineering Command: Neal Finkelstein, Timothy Roberts, Robert Sottilare; U.S. Army Research Institute: Steve Goldberg; U.S. Army Research Laboratory: Laurel Allender, Michael Barnes, Troy Kelley; U.S. Army TRADOC Analysis Center–Monterey: Michael Martin; U.S. MARCORSYSCOM Program Manager for Training Systems: Sherrie Jones, William W. Yates; University of Alabama in Huntsville: Mikel Petty; University of Central Florida: Glenda Gunter, Robert Kenny, Rudy McDaniel, Tim Kotnour, Barbara Fritzsche, Florian Jentsch, Kimberly Smith-Jentsch, Aldrin Sweeney, Karol Ross, Daniel Barber, Shawn Burke, Cali Fidopiastis, Brian Goldiez, Glenn Martin, Lee Sciarini, Peter Smith, Jennifer Vogel-Walcutt, Steve Fiore, Charles Hughes; University of Illinois: Tomas Coffin; University of North Carolina: Sharif Razzaque, Andrei State, Jason Coposky, Ray Idaszak; Virginia Tech.: Joseph Gabbard; Xavier University: Morrie Mullins
SECTION 1
LEARNING

SECTION PERSPECTIVE
Gwendolyn Campbell

Thirty years ago, as a discussant for the International Conference on Levels of Processing organized by Laird Cermak and Fergus Craik, Jenkins (1979) noted that perhaps one of the most surprising and promising developments in the field of memory research was that “no one around this table thinks that memory is simple anymore” (pp. 429–430). Apparently the speakers had backed off from making the kinds of sweeping assertions about the nature of memory that had been common in the past and were instead making reasonable claims that allowed for complex interactions between a host of variables on the processes associated with human learning and memory. Jenkins went further in his discussion of the conference presentations and proposed an organizing framework to support memory researchers in both the design of their work and in the understanding of how their work fit into the larger body of memory research. This “Theorist’s Tetrahedron” was formed with four vertices, each representing a category of variables that could be manipulated and studied within the context of memory research. The value of this type of organizational framework is that it reminds researchers, who are easily caught up in the excitement of the variables that they are manipulating, of the possibility that there are also impacts in their research from variables that they are ignoring. Jenkins used one vertex of this tetrahedron, labeled “subjects,” to represent variables associated with the population being studied—age, gender, ability, knowledge, motivation, and so forth. The vertex labeled “materials” represented variables associated with the stimuli that were presented to those subjects, including the nature of those stimuli (images, numbers, words, nonsense syllables, and so forth) and the way in which those stimuli were organized or sequenced. The “orienting tasks” vertex represented the nature of the mental and/or physical activities that the subjects were asked to conduct with the stimuli. A classic contrast from research around that time, for example, was to ask some subjects to process materials in a “meaningful” way (for example, generate a sentence using the word) and other subjects to process the material in a “nonmeaningful” way (for example, indicate whether or not the word is being presented in all capital letters).
Finally, the fourth vertex of the tetrahedron was labeled “criterial tasks” and contained variables associated with the nature of the post-test—was it immediate or delayed, recognition or free recall, and so forth. Well, as is often said, the more things change, the more they stay the same. Looking over the chapters in this section and indeed, the sections in this volume and the volumes in this series, it is clear that, with some minor modifications, the Theorist’s Tetrahedron is still a useful way to acknowledge and organize learning research. Most of the modifications would be in the nature of simply updating the terminology. For example, consider the vertex representing those mental and physical activities that are required of the student. Jenkins referred to these as “orienting tasks,” but we might be more likely today to use the term “instructional activities.” Similarly, it is likely that we would replace the “subject” label with the term “participant” or “student” and the term “criterial tasks” with “performance assessment.” One vertex, “materials,” might require a bit more of an overhaul. At the time, Jenkins was dealing with research that had a common goal for the students—to remember something—and the variable that they often manipulated was the nature of that something (for example, images versus words). If we expect this model to handle learning research, then we need to acknowledge the fact that “remembering” is not the only goal of instruction. Oftentimes we want our students to be able to apply some process, make a decision, recognize a pattern, solve a problem, and so forth. Thus, it seems that a more appropriate category for this vertex might be “learning objectives.” Finally, given that the point of the model is to help researchers not lose sight of the variables that they are holding constant, I would propose the addition of a fifth vertex to represent the context in which the learning is taking place. Obviously, the focus of this series is on learning taking place within virtual environment (VE) based training systems, but there are other possible contexts, such as within a classroom or on the job. The addition of this fifth vertex, as illustrated in Figure SP1.1, creates a geometric solid that is officially known as a pentahedron, but is more commonly referred to as a pyramid. With this modified model in hand, we can now examine the sections in this series of volumes and the chapters in this section to see how they are related to each other, and their coverage of the problem space. As befitting a section titled “Learning,” a construct that takes place inside the student, the majority of the chapters in this section focus primarily on those aspects that are brought to the learning context by the student, or the “student characteristics” vertex. Our final two chapters begin to shift focus to the instructional activities and learning objectives vertices. Section 2 of this volume focuses on requirements analysis, which is, roughly speaking, the process for determining the learning objectives. Section 3 focuses on a third vertex, “performance assessment.” The second volume of this series focuses on the components and technologies associated with our “learning context,” virtual environments. Finally, the last volume of this series embeds this pyramid in a broader context that includes the work environment and makes projections for this pyramid into the future. Thus, at a high level this series
Figure SP1.1. Theorist’s Pyramid
demonstrates both alignment with and coverage of the classes of variables that have long been recognized as important in the study of learning and memory. Next consider this section. As mentioned earlier, the first four chapters in this section focus on those characteristics that the students bring with them to the learning context. Bowers, Vogel-Walcutt, and Cannon-Bowers (Chapter 2) provide one perspective on this topic by addressing those characteristics that vary from student to student, often referred to as individual differences. These authors cover many different research areas within the broad topic of individual differences and provide a nice organizing framework that distinguishes those characteristics that are relatively stable from those that are more malleable. In addition, they present open research questions regarding the relevance of these individual differences for VE training environments and provide some preliminary design guidance based on the existing literature. Chapters 1 (Poulsen, Luu, and Tucker), 3 (Klein and Baxter), and 4 (Ross, Phillips, and Cohn), on the other hand, focus on an area that is generally thought of as common across students, by describing the nature of learning processes. At first glance, these chapters might seem quite disparate, as the topics range from the neurophysiology of animal learning to a conceptual model of expertise in complex, ill-structured domains such as firefighting or medical practice. In fact, what ties these chapters together is that each looks at learning as it occurs on a different level, or time scale, of human activity. A useful way to see the relationship between these chapters is to place them within Newell’s (1990) time scales of human activity. Newell proposed a series of four levels or bands, each containing units of time that cover approximately three orders of magnitude. Newell’s biological band covers events that take place in the time frame of one-tenth of a millisecond to 10 milliseconds. As is obvious by both the time scale and the band label, these are primarily activities occurring
at the neural level. Newell’s cognitive band includes simple deliberate acts and unit tasks, activities that fall between 100 milliseconds and 10 seconds of time. More complex tasks, which often take between minutes and hours to complete, fall into Newell’s rational band, and those human activities that stretch across days, weeks, months, and years fall into his social band. Within the context of this framework, we can see that Poulsen, Luu, and Tucker (Chapter 1) are studying the learning process at the level of Newell’s biological band by focusing on the neurophysiology of learning. Klein and Baxter (Chapter 3), on the other hand, present a characterization of the learning process that focuses on activities that are more likely to take place within Newell’s rational band. More specifically, they present an alternative to the conceptualization that learning is a simple process of accumulating new information and argue that learning often requires replacing oversimplified and incorrect mental models. They focus on the challenges inherent in inducing change to a mental model and describe the roles that virtual environments might play in facilitating this process. Finally, Ross, Phillips, and Cohn (Chapter 4) describe the stages of becoming an expert in a field, an activity that takes years and falls squarely in Newell’s social band. As with the other two chapters, these authors call out the instructional implications of this model of learning and make recommendations about the roles that virtual environments might play in facilitating the process. One might wonder if it is really necessary to have chapters that address the learning process at these different levels. It is certainly an established heuristic that there is a “right” level of analysis for any given question. If a teacher wants to know how best to introduce the topic of fractions to young children, is it really helpful to discuss brain activation? And this levels-of-analysis issue is not restricted to psychology, but is a continuing debate among many communities. For example, sociologists have long been struggling with the question of whether or not you can understand societal phenomena by studying individual behavior (for example, Jepperson, 2007), and 10 years ago there was a growing trend for biology departments to reorganize according to the level of study, splitting molecular biologists from those who study ecosystems (Roush, 1997). On the other hand, there are also many arguments in favor of taking a multilevel approach to understanding human behavior. At a minimum, it can easily be argued that principles identified at one level should not violate principles identified at other levels, and so a multilevel program of study may very well yield constraints and boundary conditions that flow between levels. It has also been argued at a relatively high level that the only way to truly understand a multilevel phenomenon such as human behavior is by analyzing it at multiple levels (for example, Hitt, Beamish, Jackson, & Mathieu, 2007). This argument has been made explicitly within the context of virtual environments, with regards to the construct of “presence” (IJsselsteijn, 2002). Anderson (2002) went beyond these general claims and assessed the empirical evidence available on the question of whether or not there is value in studying learning as it occurs over shorter time intervals when your interest is ultimately in understanding (and affecting) learning as it occurs across longer time intervals. 
He concluded that there is reasonably
strong evidence that learning at the social band can be understood by decomposing it into cognition that occurs at lower bands (all the way down to the biological band), but more work needs to be done to demonstrate that attending to processes and events at the lowest level of Newell’s hierarchy can yield improvements in outcomes at the social band. Turning more specifically to the chapters in this section, it is interesting to note that, despite the huge disparity in time scales, they do, in fact, have a common thread. That thread has to do with the existence of two learning systems. Poulsen, Luu, and Tucker (Chapter 1) refer to those systems as the fast learning system and the slow learning system and provide evidence that these systems engage different areas of the brain. The fast learning system is typically in charge early on during the acquisition of knowledge in a new area; this system requires a lot of cognitive resources and explicit processing and results in relatively quick and discrete changes to memory. The slow learning system, on the other hand, typically takes over as a person’s knowledge and skill in an area become rich, elaborated, and more and more compiled or automated. When this system is in charge, learning is gradual, may be unconscious, and repeated exposures are required to make even the smallest changes to a person’s cognitive structures and processes. An understanding of these two learning systems explicates a tension that is evident in the other two chapters, and more generally in the study of learning at any level of Newell’s hierarchy. That is the tension between desiring to promote the fast, effortless, automatic performance that we associate with expertise and simultaneously desiring to promote the control and flexibility that allow people to deal with novel situations in new and inventive ways. This tension is, in fact, explicitly called out in the chapter by Ross, Phillips, and Cohn (Chapter 4), and this chapter presents some of the same instructional advice as that of Poulsen, Luu, and Tucker (Chapter 1) in regards to maintaining the proper balance between the two learning systems. This thread is also evident in the discussion that Klein and Baxter (Chapter 3) present regarding the challenges in getting a student to unlearn or discard an established mental model. This challenge can be understood as a recasting of the challenge that Poulsen and colleagues describe of trying to get a person who has compiled knowledge, and thus is operating under the slow learning system, to shift back to using the fast learning system. Thus, while this is not by any means a definitive answer to the question of whether or not it is necessary to study learning at all levels of human activity, it does show one instance in which the principles at these levels are not isolated and independent, but rather inform, explicate, and illustrate each other. The last two chapters can be seen as forming a bridge between this section and the following sections, as they begin to take on perspectives from other vertices. Biddle, McBride, and Malone (Chapter 5) begin by addressing the issue of how understanding biology (in this case, human maturation and neural mylenization) can contribute to our understanding of, and ability to facilitate, human learning. They continue by discussing the need to study interactions between
two vertices, “student states” and “instructional activities,” in order to optimize learning outcomes. Finally, in the last chapter in this section (Chapter 6), Van Buskirk, Cornejo, Astwood, Russell, Dorsey, and Dalton explicitly address a leg rather than a vertex of the pyramid by considering the relationship between learning objectives and instructional activities. The basic premises of this chapter are that all instructional activities are not equally effective for tackling all learning objectives and that empirical research can provide guidance as to how to select and implement optimal instructional activities for a given learning objective. Like Jenkins and the other researchers at that international conference in the late 1970s, these authors do not think that the answers will be found in simple, sweeping assertions and instead present a framework of their own to help organize, integrate, and guide current and future instructional research. It will be interesting to see if, 30 years from now, their matrix (possibly with some minor modifications) provides a useful organizing framework for a new generation of researchers using a new generation of technology.

REFERENCES

Anderson, J. R. (2002). Spanning seven orders of magnitude: A challenge for cognitive modeling. Cognitive Science, 26, 85–112.
Hitt, M. A., Beamish, P. W., Jackson, S. E., & Mathieu, J. E. (2007). Building theoretical and empirical bridges across levels: Multilevel research in management. Academy of Management Journal, 50(6), 1385–1399.
IJsselsteijn, W. (2002, October). Elements of a multi-level theory of presence: Phenomenology, mental processing and neural correlates. Proceedings of PRESENCE 2002 (pp. 245–249). Porto, Portugal.
Jenkins, J. (1979). Four points to remember: A tetrahedral model of memory experiments. In L. S. Cermak & F. I. M. Craik (Eds.), Levels of processing in human memory (pp. 429–446). Hillsdale, NJ: Lawrence Erlbaum.
Jepperson, R. L. (2007, August). Multilevel analysis versus doctrinal individualism: The use of the “Protestant Ethic Thesis” as intellectual ideology. Paper presented at the annual meeting of the American Sociological Association, New York. Retrieved April 11, 2008, from http://www.allacademic.com/meta/p177199.index.html
Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.
Roush, W. (1997). News & Comment. Science, 275(5306), 1556.
Part I: Biological Band
Chapter 1
THE NEUROPHYSIOLOGY OF LEARNING AND MEMORY: IMPLICATIONS FOR TRAINING
Catherine Poulsen, Phan Luu, and Don Tucker

Learning is often considered to be a unitary phenomenon. However, neurophysiological evidence suggests that multiple systems regulate learning, thereby creating unique forms of memory. This chapter presents a model of learning and memory that integrates two complementary learning circuits into a coherent framework that can be used to understand the development of expertise. In this model, learning and memory arise out of primitive systems of motivation and action control. These systems are responsible for memory consolidation within corticolimbic networks, and they are thus critical to higher cognitive function. Yet they include unique motivational influences intrinsic to the control of learning within each system. Understanding these motivational biases may be important in designing effective training methods. For example, the optimistic mood in response to successful achievement may be particularly important to gaining a broad representation of situational awareness. In contrast, anxiety under stress may facilitate focused attention on obvious threats, but impair situational awareness for unexpected threats. Theoretical progress in understanding the neurophysiology of learning and memory may lead to new insights into why certain methods of education and training are effective, and it may suggest new strategies for effectively motivating the learning process in ways that translate to real world contexts.

INTRODUCTION

With a third of their careers spent in training, military personnel need training to be well motivated, efficient, and cost-effective. The standard approach to training is to conduct a task analysis that identifies required knowledge, motor skills, and cognitive skills and then to develop programs that target these skills. This is a logical approach, but it is not driven by an understanding of the brain’s learning systems or by assessment of these systems during learning. We propose that design of effective training should start with the theoretical principles of learning based on neuroanatomical and neurofunctional evidence.
Task analysis, although empirically derived, should be informed by these principles. Identification of skill sets, training, and learning protocols are interdependent and must be guided by empirical measurement of brain system dynamics within a coherent theoretical framework. The result of a neural systems analysis could be principled adjustments during training, with these adjustments optimized for each individual learner. The model presented in this chapter emphasizes that animals as well as humans operate through goal-directed learning, an expectancy based process in which the discrepancy between the anticipated and the actual outcome of an action drives new learning and thereby consolidates context-adaptive performance. Understanding such goal-directed learning may be essential to developing training methods that make experts out of novices. The chapter presents a brief overview of the neural systems underlying expectancy based learning, provides examples of the neurophysiological signatures of expertise and learning in cognitive neuroscience experiments, and concludes by considering the implications of this approach for enhanced training and performance.
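To make the expectancy based account concrete, consider a minimal computational sketch in the spirit of the Rescorla-Wagner model cited later in this chapter. This sketch is illustrative only, not the authors’ model; the learning rate and the toy outcome schedule are assumptions chosen for demonstration. Its single principle is the one described above: the discrepancy between the anticipated and the actual outcome of an action drives all learning.

```python
# Illustrative sketch of expectancy based (prediction-error-driven) learning,
# in the spirit of Rescorla & Wagner (1972). Parameter values are assumptions.

def update_expectancy(v, outcome, rate=0.2):
    """Move the expectancy v toward the observed outcome.

    The change is proportional to the prediction error (outcome - v):
    large discrepancies drive large adjustments; confirmations drive none.
    """
    prediction_error = outcome - v
    return v + rate * prediction_error

v = 0.0                        # initial expectancy for the outcome of an action
schedule = [1, 1, 1, 1, 0, 0]  # toy outcomes: action rewarded, then it stops
for trial, outcome in enumerate(schedule, start=1):
    v = update_expectancy(v, outcome)
    print(f"trial {trial}: outcome={outcome}, expectancy={v:.3f}")
```

Note that when the toy environment changes midway, the same error-driven rule that built the expectancy also revises it, which is the sense in which expectancy violation, rather than mere repetition, drives this form of learning.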
THE GOALS OF EDUCATION AND TRAINING: LEARNING VERSUS PERFORMANCE

A primary goal of education is to guide learning in its broadest sense, facilitating the acquisition of new knowledge, abilities, attitudes, and perspectives. Training typically has a more restricted objective, focusing on the directed acquisition and practice of a specific skill, from novice to expert level of performance. In either case, a major challenge is how best to facilitate learning. Specifically, how can the rate of learning be enhanced and the level of attainment be maximized? The development of effective interventions requires both sound theory and a sensitive, reliable measure of progress and attainment. This is particularly challenging for a process, such as learning, that is inherently hidden to the observer. In contrast to overt performance, learning is an internal process, sometimes referred to as a latent, unobservable state. Educators and trainers traditionally use performance measures as indicators of this latent state. But performance indicators alone may be absent, incomplete, or ambiguous. For example, an identical error may reflect a learner’s incomplete knowledge of the task or simply a slip in performance (Campbell & Luu, 2007). Furthermore, considerable learning may take place before any overt change in performance occurs. Trainers need more reliable assessment of the learning process. Measurement of brain activity can provide a useful window on the process of learning, even in the absence of behavioral changes. Recent research indicates that novices differ from experts not just in task performance, but also in the very nature of the brain systems that mediate their learning and performance. Rather than relying solely on performance indicators, neuroadaptive training could directly monitor brain activity and thereby fine-tune information delivery and feedback to more directly support the neural systems engaged at different stages of learning.
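Because learning is a latent state inferred from fallible performance indicators, a simple Bayesian sketch can illustrate the inference problem just described. The approach below follows the general style of Bayesian knowledge tracing; it is our illustration rather than a method proposed in this chapter, and the slip, guess, and prior probabilities are assumed values. Note how the slip parameter captures the ambiguity noted above: an identical error may reflect incomplete knowledge or a mere performance slip.

```python
# Minimal sketch: inferring a latent "learned" state from observed performance,
# in the style of Bayesian knowledge tracing. All parameters are assumptions.

P_SLIP = 0.10   # P(error | skill learned)        -- a performance slip
P_GUESS = 0.25  # P(correct | skill not learned)  -- a lucky guess

def update_belief(p_learned, correct):
    """Bayesian update of the belief that the skill is learned,
    given one correct/incorrect performance observation."""
    if correct:
        likelihood_learned = 1.0 - P_SLIP
        likelihood_unlearned = P_GUESS
    else:
        likelihood_learned = P_SLIP
        likelihood_unlearned = 1.0 - P_GUESS
    numerator = p_learned * likelihood_learned
    denominator = numerator + (1.0 - p_learned) * likelihood_unlearned
    return numerator / denominator

belief = 0.3  # assumed prior belief that the trainee has the skill
for obs in [True, True, False, True, True]:  # toy performance record
    belief = update_belief(belief, obs)
    print(f"observation correct={obs}: P(learned)={belief:.3f}")
```

Run on the toy record above, a single error lowers but does not reset the belief, which mirrors the point in the text: individual performance observations are ambiguous evidence about the hidden learning state.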
NEUROPHYSIOLOGY OF LEARNING AND MEMORY

Advances in neuroscience have provided new insights into the specific mechanisms of learning and memory. These new insights apply not only to the process of memory consolidation, but also to the motivational influences that often determine success and failure. All multicell organisms learn, yet scientists often neglect to recognize that the fundamental reason an animal learns is to anticipate effectively both internal homeostatic challenges and environmental constraints. The consequence of such neglect is that motivation is often viewed as separate from the learning process itself: motivational levels or affective reactions are considered merely as means of getting the animal to “learn” a task, and the motivational and affective mechanisms are treated as external to the core process of learning. Yet self-regulatory mechanisms may be inherent to the memory control processes that achieve learning. It is now well known that an important aspect of animal learning involves forming and maintaining implicit or explicit cognitive expectancies for the hedonic regularities in the world and adapting these expectancies as they are adjusted or disconfirmed by the ongoing flow of events (for example, Rescorla & Wagner, 1972; Balleine & Ostlund, 2007). It is the internal regulation of information content (that is, the error between expectancies and outcomes) that significantly drives this form of learning. The memory systems supporting these expectancy and outcome representations are thus central to the learning process.

Memories as Learning Outcomes

As individuals learn through observation and interaction with their environments, traces of these experiences are retained, distributed across multiple neural networks. Conceptual knowledge is extracted from stimulus regularities (Potter, 1999), and stimulus-action-outcome contingencies are encoded and retrieved alongside contextual and hedonic features of the experience (Pribram, 1991). These processing traces, or memories, are thus formed in multiple neural systems as the outcomes of learning and experience.

Learning and Dual-Action Control Systems

The traditional view of learning as a process often divides it into two distinct stages: an early stage and a late stage. The early stage relies on executive control and short-term memory buffers. Task execution is carried out by slow and effortful control processes. These control processes are limited by the capacity of cognitive resources, require active attention, and can be directed consciously in new task situations (Schneider & Shiffrin, 1977; Shiffrin & Schneider, 1977). In contrast, the late stage is not dependent on executive control or temporary memory buffers, but rather on long-term memory stores. Routine task components, such as identifying task-relevant information in the environment, become automatized and reduce demands on limited cognitive resources. Learning reflects the progression through these stages.
Neurophysiological models of animal learning provide additional insights into the cognitive conceptualization of the learning process. We have argued that it is best to conceptualize learning as “action regulation” (Luu & Pederson, 2004; Luu & Tucker, 2003). Action regulation emphasizes the need to adjust behavior according to both internal states and external demands, which require different learning and memory systems; these systems reflect cybernetic constraints on action control. Learning and memory naturally arise from these action regulation processes. Our research on human memory and learning over the past decade has been informed by neuroanatomical findings on self-regulated learning in animals. These findings, and the resultant model, fit nicely with the separation of early and late stages of learning in the cognitive literature. In the animal neurophysiology research, two complementary cortico-limbic-thalamic circuits have been distinguished, each providing a unique strategic control on the learning process (Gabriel, Burhans, Talk, & Scalf, 2002). The ventral limbic circuit is made up of the anterior cingulate cortex (ACC) and the dorsomedial nucleus of the thalamus, with input from the amygdala (see Figure 1.1). This ACC based circuit is triggered by exogenous feedback and leads to rapid changes in learning in response to new information that is discrepant with expectations. This circuit is involved in the early stages of learning, whenever new tasks must be learned, or when routine actions and a priori knowledge are no longer appropriate for current demands (Gabriel, 1990; Gabriel, Sparenborg, & Stolar, 1986;
Figure 1.1. Slow (Dorsal) and Fast (Ventral) Cortico-Limbic-Thalamic Learning Circuits
Poremba & Gabriel, 2001). At the psychological level, information in this circuit is deliberately held in short-term memory, existing for only a few seconds. The unique properties of this fast learning system, for example, its contribution to overcoming habitual responses, led Gabriel and colleagues (Gabriel et al., 2002) to suggest this circuit is integral to what has been called the executive control of cognition (Posner & DiGirolamo, 1998). Although numerous modern neuroimaging studies have repeatedly shown the involvement of the ACC early in learning (for example, Chein & Schneider, 2005; Toni, Ramnani, Josephs, Ashburner, & Passingham, 2001), it has traditionally been noted that lesions to the ACC do not change scores on intelligence tests or tests of executive control (Devinsky & Luciano, 1993), implying that this region may not be important to learning in humans. The most consistent observation after cingulate lesions in humans is alteration of affect. Patients are often described as being either apathetic or laissez-faire in their attitudes (Cohen, Kaplan, Moser, Jenkins, & Wilkinson, 1999); that is, they are less concerned about daily life events. For example, Rylander (1947) noted that patients report not being concerned when they make mistakes. In more recent studies, ACC lesions have been shown to affect both performance on traditional tests of executive control (such as the Stroop task; Janer & Pardo, 1991; Cohen et al., 1999) and error monitoring (Swick & Turken, 2002). The second circuit is centered on the posterior cingulate cortex (PCC) and the anterior ventral nucleus of the thalamus, with input from the hippocampus (Tucker & Luu, 2006). This dorsal limbic circuit is involved in the later stages of learning (Keng & Gabriel, 1998), when consolidation of information into long-term memory becomes important (Gabriel, 1990). It functions in an automated manner, shaping the context model with small adjustments, requiring little or no effort. In the later stages of learning, a contextual model is fully formed, and discrepancies with expectations result in minor changes that are largely consistent with the internal model and can be made with minimal attentional demands. The PCC based system then applies a feed-forward bias to action regulation, in which action is controlled endogenously and learning is slowly and incrementally updated (Tucker & Luu, 2006). The late stage learning process has been described as context updating (Donchin & Coles, 1988). This neurophysiological model indicates that goal-directed learning is an active process achieved by circuits with qualitative strategic biases. One bias, emerging from feedback control from the viscerosensory regulation of the ventral limbic pathway, leads to rapid, focused changes of associations under conditions of context violation or threat. A second bias, emerging from feed-forward control inherent to the visceromotor function in the dorsal limbic pathway, leads to endogenous, hedonic expectancies for action and a gradual updating of a valued context model (Tucker & Luu, 2006). Noninvasive neurophysiological measures of brain activity, including dense-array electroencephalography (EEG), near-infrared spectroscopy, and functional magnetic resonance imaging (fMRI), now allow unprecedented access to investigation of these learning mechanisms during learning and performance in humans. We focus here on recent EEG research examining the operation of these two action regulation circuits.
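The complementary dynamics of these two circuits can be caricatured in a toy two-rate learner: a fast process, loosely analogous to the ventral ACC based circuit, makes large corrections when expectancies are clearly violated, while a slow process, loosely analogous to the dorsal PCC based circuit, incrementally updates the context model. This sketch is ours, not a model from the chapter, and the rates and surprise threshold are arbitrary assumptions.

```python
# Toy sketch of complementary fast and slow learning processes, loosely
# analogous to the ventral (fast, error-driven) and dorsal (slow, incremental)
# circuits described in the text. All parameter values are assumptions.

FAST_RATE = 0.5           # large corrections when expectations are violated
SLOW_RATE = 0.05          # small, continual updates to the context model
SURPRISE_THRESHOLD = 0.5  # error size that recruits the fast process

def step(expectancy, outcome):
    error = outcome - expectancy
    if abs(error) > SURPRISE_THRESHOLD:
        # Fast system: rapid, focused change in response to a context violation.
        return expectancy + FAST_RATE * error
    # Slow system: gradual, largely automatic context updating.
    return expectancy + SLOW_RATE * error

expectancy = 0.0
outcomes = [1.0] * 5 + [0.0] * 5  # toy environment that changes midway
for t, outcome in enumerate(outcomes, start=1):
    expectancy = step(expectancy, outcome)
    print(f"t={t}: outcome={outcome}, expectancy={expectancy:.3f}")
```

In this caricature, early trials and the midstream reversal recruit the fast rule, and once the expectancy settles near the outcome, only the slow rule remains active, echoing the handoff from the ventral to the dorsal circuit described above.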
Neural Signatures of Learning and Skill Development

Characteristic changes in brain activity during early learning, in contrast to changes that occur late in learning, have been identified with both fMRI and EEG. Chein and Schneider's (2005) fMRI based analysis of the neural components of performance suggested that the effortful control required early in learning engages brain regions characterized as regulating attention (PCC and parietal), comparison (ACC), and task control (dorsolateral prefrontal cortex). As performers became more skilled and automatic, activity in frontal regions declined, presumably due to a reduced need for executive control.

Similar effects were obtained in a dense-array EEG study of task switching (Poulsen, Luu, Davey, & Tucker, 2005; Poulsen et al., 2003).¹ Subjects performed one of two alternative tasks, either a letter (vowel/consonant) or digit (even/odd) judgment, in response to bivalent (for example, G5) or univalent (for example, &3, G#) stimuli. A cue indicated which task to perform, and trials were sequenced to require task repetition or task switching. Consistent with effortful, controlled processing, performance was slower on the challenging switch trials than on repeat trials and evidenced greater engagement of the ACC and lateral anterior prefrontal regions (Figure 1.2).

¹ All results reported were statistically significant. Figures represent grand averages of experimental groups and/or conditions.

A reduction in these control processes with task experience was indicated by behavioral and brain measures. Reaction time decreased linearly early in learning (half 1) to asymptotic levels that were maintained later in learning (half 2). Reaction-time variability, a behavioral performance index of automaticity (Segalowitz, Poulsen, & Segalowitz, 1999; Segalowitz & Segalowitz, 1993), also decreased linearly during half 1, but fluctuated in half 2. This suggests that control processing was reengaged intermittently in half 2 in order to maintain high levels of performance, particularly for the most difficult, bivalent switch trials. This interpretation was further supported by EEG evidence of greater prefrontal cortex and ACC involvement in half 2 (Figure 1.2).

Figure 1.2. (Left) Regional dipole locations for the right and left prefrontal cortex (PFC) and anterior cingulate (ACC) sources. (Middle) Switch-repeat difference waveforms (nAm) of regional source activity (x, y, z vectors) for the right and left PFC and ACC sources. Differences reveal greater activity in preparation for, and during execution of, a switch trial than a repeat trial. Periods of significant effects are indicated with superposed bars (solid bar: trial-type main effect; open bar: trial-type x laterality interaction). (Right) Anterior medial frontal effect in half 2 at 680 ms: topographic map of the difference amplitude for switch-repeat scalp data on bivalent trials (top), followed by the forward topographic projection of the ACC and PFC sources illustrating their contribution to modeled source activity.

Amplitude of the P300, an EEG component associated with context updating that was source localized to the PCC and related parietal cortex, was larger in half 2 than in half 1. This suggests greater involvement of the PCC based circuit late in learning, with an emphasis on memory consolidation and incremental context updating. The results of this study thus illustrate not only how these two learning circuits characteristically come into play early and late in learning, but also how the relative balance of the two circuits can be dynamically adjusted to meet the challenges of variable task demands.

Learning within this system is regulated by a simple cognitive phenomenon: violations of expectancy. EEG recorded during expectancy violations reveals brain activity consistent with the ACC learning circuit. Specifically, when expectancies are violated, such as when an error is committed, a negative deflection in the ongoing EEG is observed over medial frontal scalp regions (Falkenstein, Hohnsbein, Hoormann, & Blanke, 1991; Gehring, Goss, Coles, Meyer, & Donchin, 1993; Luu, Flaisch, & Tucker, 2000). This negative deflection is referred to as the error-related negativity and has been source localized to the ACC (see Figure 1.3) (Luu, Tucker, Derryberry, Reed, & Poulsen, 2003; Miltner et al., 2003). Brain responses similar to the error-related negativity are observed in other situations, when errors are not committed but expectancies are violated in other ways. For example, as subjects learn a repeated sequence of stimuli, when a position within that sequence is changed, a medial frontal negativity (MFN) is observed in response to this violation (for a review, see Luu & Pederson, 2004).

Figure 1.3. (Left) Medial frontal scalp distribution of the error-related negativity. (Right) Cortical source generators of the error-related negativity in anterior cingulate and mid-cingulate regions (plus residual occipital activity for this visual decision task).
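The component measures that recur throughout this chapter reduce to a few simple operations on epoched EEG: averaging across trials (and, in practice, subjects) to form grand-average waveforms, subtracting condition averages to form difference waves, and taking the mean amplitude within a component's time window. The sketch below, in Python with simulated data, illustrates these operations together with the reaction-time coefficient of variation used as a behavioral index of automaticity. The sampling rate, trial counts, and window bounds are illustrative assumptions, not the recording parameters of the studies cited.

import numpy as np

rng = np.random.default_rng(0)
fs = 250                                   # sampling rate in Hz (assumed)
switch = rng.normal(size=(40, fs))         # 40 switch-trial epochs of 1 s each
repeat = rng.normal(size=(40, fs))         # 40 repeat-trial epochs of 1 s each

# Condition-average waveforms and the switch-repeat difference wave
# (the kind of contrast plotted in Figure 1.2).
diff_wave = switch.mean(axis=0) - repeat.mean(axis=0)

# Mean amplitude within a component window, for example a 300-500 ms P300 window.
window = slice(int(0.300 * fs), int(0.500 * fs))
p300_amplitude = diff_wave[window].mean()

# Reaction-time coefficient of variation, the automaticity index of
# Segalowitz and Segalowitz (1993): variability scaled by mean reaction time.
rts = rng.normal(650.0, 80.0, size=200)    # simulated reaction times in ms
rt_cv = rts.std() / rts.mean()
print(p300_amplitude, rt_cv)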
TRACKING THE DEVELOPMENT OF EXPERTISE

The above studies suggest that brain activity typically shifts from the anterior, ACC based, early learning system, with its reliance on control processes, to the posterior, PCC based, late learning system, with increased automaticity and memory consolidation. We sought to examine, more specifically, the differences between novices and experts and the transition from novice to expert performance.
Tanaka and Curran (2001) studied bird and dog experts and found that signatures over the visual cortex (a component referred to as the N170, which occurs ~170 ms [milliseconds] after stimulus onset) were largest for images that contained domain-specific expertise content. That is, bird experts exhibited larger activity for images that contained birds as opposed to dogs, and vice versa for the dog experts. Moreover, when subjects were trained to identify certain types of targets, the amplitude of the N170 increased (Scott, Tanaka, Sheinberg, & Curran, 2006). The time course, location, and pattern of this effect suggest that it indexes what has been described in cognitive research as conceptual short-term memory (Potter, 1999). Conceptual short-term memory is a construct that describes the rapid process (100–300 ms post stimulus) by which conceptual information is extracted from visual and auditory stimuli and selected for further processing (200–500 ms post stimulus).

Conceptual short-term memory appropriately belongs to the late learning system. It is through learning and experience that concepts are abstracted and consolidated by the hippocampus into long-term memory traces stored in higher order sensory (such as visual) cortices (Squire, 1998). These traces in turn permit rapid and automated extraction of goal-relevant information from sensory input by these higher order sensory cortices. The implication is that, for experts, experience reaches out to perception (via reentrant processes) such that what an expert sees is inherently different, at least neurophysiologically if not also phenomenologically, from what is perceived by the novice.

We attempted to understand these findings in the context of satellite image analysts and their performance in detecting targets for which they have extensive experience (Crane, Luu, Tucker, & Poolman, 2007). Experts and novices viewed satellite images, with or without targets of military interest, delivered in rapid
serial visual presentation format at a rate of 10 Hz (hertz). We observed that for experts, but not novices, the N170 (see Figure 1.4) was enhanced for target images compared to nontarget images. Furthermore, compared to novices, the experts' N170 responses to targets were substantially enhanced. These findings are consistent with the notion that learning influences conceptual extraction by conceptual short-term memory. That is, the enhanced N170 observed for the experts likely reflects their domain-specific training, which allows for the separation of target from nontarget brain responses.

Figure 1.4. Grand-average waveform plots for a channel located over the midline of the occipital region for expert (left) and novice (right) subjects. The first gray rectangle identifies the N170, the second gray rectangle identifies the N250 (missing in novices), and the third gray rectangle identifies the continuation of the N250 to a peak at 300 ms. The oscillatory appearance of the waveform reflects overlapping visual P1-N1 responses to each image presented in rapid serial visual presentation at a rate of 10 Hz.

We also observed another difference between the target and nontarget waveforms of experts at approximately 250 ms after stimulus onset, an effect that was absent for novices. In Figure 1.4, the second rectangle in the experts' waveform marks the time at which a clear N250 can be seen for target images (see Figure 1.5). This cortical signature is absent in both the waveform (Figure 1.4) and the topographic map (Figure 1.5) of novice subjects. When we estimated the cortical source of this scalp-recorded potential, sources were identified along the extent of the posterior temporal lobe, including the fusiform gyrus (see Figure 1.5, right), a region that has been implicated in expert visual processing (Gauthier, Skudlarski, Gore, & Anderson, 2000; but see also McKone, Kanwisher, & Duchaine, 2007). Previous research has shown that when subjects are trained to
discriminate objects at subordinate levels (for example, robins, starlings, and so forth), recognition of these subordinate-level images evoked larger N250 amplitudes than recognition of basic-level images (for example, birds; Scott et al., 2006). This suggests that the N250 reflects cortical activity related to conceptual extraction that is specific to a particular domain of expertise.

Figure 1.5. Topographic maps at 250 ms after target stimulus onset (left and middle figures), and the cortical source of the left-lateralized posterior negativity (N250) in experts (right).

At approximately 300 ms after stimulus onset, both novices and experts show a negative peak that differentiates target images from nontarget images (see Figure 1.4, gray rectangle around the 300 ms peak). This component may reflect additional processing of the target images, but it is potentiated by the N170 and the N250. The emerging view from these data is that conceptual extraction of information is experience dependent and occurs very early in visual processing. In experts, it is enhanced and automated (seen in the N170 and the N250), contributing to more accurate and rapid selection of target images.

Beyond 300 ms after stimulus onset, we observed additional differences in regional brain activity between experts and novices. Most remarkable was a centromedial negativity (N350) observed in experts only (see Figure 1.6). This effect resembles the mediofrontal negativities (MFNs) observed in other studies (for example, Tucker et al., 2003), but for the expert intelligence analysts in this experiment there was engagement of the precuneus (posterior midline) and orbital frontal regions, as well as the ACC. Although MFNs are believed to reflect action monitoring functions (such as error monitoring and/or conflict monitoring), it is plausible that the activity here reflects the selection stage of conceptual short-term memory. Further research with expert performers may clarify the shifts in neural processing that can explain the efficiencies of cognition that are gained with domain expertise.

These studies illustrate how noninvasive measurement of neural signatures can distinguish expert from novice performers. As the following studies demonstrate, such signatures can also be used to track changes in the learning state.
Figure 1.6. Topographic maps at 350 ms after target stimulus onset (left and middle figures), and the cortical source of the mediofrontal negativity (N350) in experts (right).
Brain Changes Associated with Learning and Practice

In the series of experiments described below, dense-array EEG was used to track neural activity during verbal and spatial associative learning. Subjects were required to discover by trial and error which arbitrary key press was associated with a two-digit code (verbal task) or with a dot location (spatial task; Luu, Tucker, & Stripling, 2007). Based on models of early and late learning systems, greater activity in the ventrolateral aspects of the inferior frontal lobes and the ACC was predicted early in learning, and greater activity in the hippocampus, the PCC, and the parietal lobes was predicted later in learning.

Figure 1.7 shows the topography, at about 400 ms after stimulus onset, of the scalp potentials (inferior frontal focus) for the digit and spatial learning tasks, as well as the associated cortical generators of those scalp potentials. We refer to this waveform feature as the lateral inferior anterior negativity (LIAN). As predicted, this activity was lateralized according to the nature of the task (left for digits and right for spatial locations) and, for the spatial task, decreased as subjects learned the task (for the digit task, the activity remained steady throughout learning). We also found an ACC source that increased with learning (contrary to predictions). The ACC source was seen at the scalp as a mediofrontal negativity (MFN; see Figure 1.8). We interpret involvement of the inferior frontal source as reflecting memory encoding processes, and the increasing involvement of the ACC source as reflecting action monitoring relative to task demands (that is, with increased task knowledge there is a corresponding increase in response conflict). Note that although the time course of the MFN is similar to the negativity observed for expert image analysts (see Figure 1.6), a somewhat different pattern of cortical sources contributes to the MFN in the learning study.

Although the pattern of results across several studies emphasizes the importance of midline corticolimbic networks to increasing expertise, it remains to be seen whether skill gained in a few training sessions will be associated with the neural mechanisms that explain expertise gained over many years of training.
Figure 1.7. (Left) Distribution of scalp potential at 400 ms post stimulus associated with the digit and spatial memory task. In the code-learning task, early effortful processing was associated with hemisphere-specific inferior frontotemporal negativities (the lateral inferior anterior negativity; LIAN), with activity greater on the left side in learning the digit code, and on the right side in learning the spatial code. Dots on the topographic maps indicate the channel locations of the waveform plots below. The time window to quantify the LIAN (overlapping with the centroparietal P3 window) is indicated by the narrow gray box. (Right) Cortical sources of the LIAN.
Figure 1.8. (Top and Middle) Scalp topography of contextual learning, as indexed by an increase in medial frontal negativity (MFN) post learning. Dots on the topographic maps indicate the channel location of the waveform plot below. (Bottom) Cortical sources of the MFN component for digit (left) and spatial (right) targets post learning.
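For readers who want the learning paradigm just described in concrete form, the following Python sketch simulates the logic of the trial-and-error association task: a hidden code-to-key mapping, feedback on each trial, and an accuracy curve that moves from the prelearned to the learned state. The number of codes, the specific keys, and the trial counts are illustrative assumptions, not the parameters of Luu, Tucker, and Stripling (2007).

import random

# Minimal sketch of the trial-and-error association paradigm (verbal version).
CODES = ["17", "42", "58", "93"]          # two-digit codes (verbal task)
KEYS = ["f", "g", "h", "j"]               # arbitrary response keys
mapping = dict(zip(CODES, random.sample(KEYS, len(KEYS))))  # hidden mapping

learned = {}                               # the simulated learner's memory
history = []
for trial in range(60):
    code = random.choice(CODES)
    # Respond from memory if the mapping has been discovered, else guess.
    response = learned.get(code, random.choice(KEYS))
    correct = response == mapping[code]
    if correct:
        learned[code] = response           # feedback consolidates the mapping
    history.append(correct)

# Accuracy early versus late in the session tracks the prelearned-to-learned
# transition to which the EEG markers (LIAN, MFN, P300) are time-locked.
print("early accuracy:", sum(history[:20]) / 20)
print("late accuracy:", sum(history[-20:]) / 20)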
From the learning model we also predicted that there would be hippocampal, PCC, and parietal lobe activity late in learning. As in an earlier study (Poulsen et al., 2005), the increase in the P300 closely tracked the increase in demonstrated task learning (see Figure 1.9). Source analysis indicated activity in the hippocampus, the PCC, and the parietal cortex, consistent with memory-updating processes and memory consolidation.

The Luu et al. (2007) study showed that source-localized EEG effects can track brain changes related to trial-and-error learning that are predicted from animal learning models, as well as from human fMRI studies. The results from this study provide unique clues about the time course of regional brain activation associated with the early learning process, as well as the changes that occur with task acquisition. However, this study did not address the brain changes associated with extensive practice. Therefore, a second study was conducted using the digit version of the learning task. In this study, participants completed four sessions over four separate days. In the first session, they learned the digit-response mappings through trial and error. The second and third sessions provided additional practice
to reinforce these digit-response mappings. In the fourth session, subjects were required to learn new digit-response mappings.

Figure 1.9. (Top and Middle) Scalp topography of the P300. Dots on the topographic maps indicate the channel location of the waveform plots below. The time window to quantify the P300 (overlapping with the anterior LIAN window) is indicated by the full box width. (Bottom) Cortical sources (including the hippocampus, the PCC, and the parietal cortex) of the P300.

Based on results obtained with the image analyst data, we anticipated learning-related changes over the occipital cortex at approximately 170 ms after stimulus onset. Data presented in Figure 1.10 confirm this prediction. As subjects learned the task, the amplitude of the N170 increased. The most prominent increase occurred during the first session, as subjects progressed from the prelearned stage to the learned stage. The N170 from the subsequent sessions (second and third) displayed amplitudes similar to those of the learned stage of the first session. When subjects had to learn a new digit-response mapping, N170 amplitude decreased to a level similar to the first prelearned state. These results confirm a stimulus-specific learning effect.

The N170 progressed bilaterally along the temporal lobes, including the fusiform gyrus, and engaged the anterior temporal lobes by 250 ms. In the target detection task, only the experts had shown this temporal lobe engagement. In the extended practice experiment, however, this engagement was observed for all conditions, including the prelearning period. This was perhaps due to the nature of the stimuli (digits) in this experiment; all subjects have extensive experience with numbers, and activity along the fusiform gyrus may reflect this experience with familiar perceptual objects.
Figure 1.10. Waveform plots for a channel located over the left occipital region for correct responses in the four sessions. The prelearned conditions are illustrated for sessions 1 and 4. The gray rectangle identifies the N170.
Following this sequence of electrical events, we observed the development of a midline negativity that peaked at approximately 350 ms post stimulus. This negativity was absent in the prelearned state during the first session (see Figure 1.11), but increased across practice sessions 2 and 3, replicating and extending previous findings on the MFN (see Figure 1.8). Surprisingly, in the fourth session, when subjects were required to learn a new stimulus-response mapping, the enhanced negativity did not disappear, but rather remained, at an amplitude similar to the postlearned state of the first session. This suggests that the N350 reflects experience with the task and not the specific stimulus-response requirements of the task (compare with the N170 effect). Note the similarity between Figures 1.11 and 1.6.

Cortical sources for the midline negativity associated with extensive practice are shown in Figure 1.12. The cortical activity at 300 ms, when the midline negativity begins to develop, is similar to the MFN we observed in the first learning study (see Figure 1.8), particularly for the ACC, medial occipital, and orbitofrontal sources. At 350 ms, the sources are much more posterior (PCC and left parietal cortex; see Figure 1.12). The PCC source may be associated with the contextual representation of task parameters, whereas the parietal lobe activity may reflect the actual action-stimulus representation (Goodale & Milner, 1992).
Figure 1.11. Topographic maps illustrating the MFN response at 350 ms post stimulus for correct responses before learning (sessions 1 and 4, pre), and after learning (sessions 1 through 4, post).
These studies revealed important learning-related brain activation that can be related to the differences observed between expert image analysts and novices. The important findings are summarized here.

First, from the learning studies, brain activity associated with both early and late learning systems was observed. The components of the early learning system, the inferior frontal lobe and the ACC, were engaged early in learning, whereas components of the late learning system, the PCC, the hippocampus, and sensory cortices, were engaged in the late stages of learning.

Second, several learning-related changes consistent with the observed differences between image analysts and novice subjects were observed within the late learning system, including changes in the N170 and the appearance of the centromedial negativity at 350 ms. The N170 indexes stimulus-specific experience, as new learning is associated with a decrease in its amplitude. In contrast, the centromedial negativity reflects the acquisition of general task parameters, because it persists even when subjects are required to learn new stimulus-response mappings in the same task. The similarities between these two learning-related markers and expert-novice differences suggest that certain markers of expertise reflect experience-dependent changes that are a consequence of learning in general.
Figure 1.12. Sources of the MFN. (Top) 300 ms post stimulus. (Bottom) 350 ms post stimulus.
Third, there were expert-related neural signatures that were specific to image analysts. An N250 was present in image analysts' brain responses, but was absent in novices. In these learning studies, the N250 was observed in both the pre- and postlearned states. These results suggest that for the task of image analysis, which involves acquiring expertise in the analysis of uncommon images (at least from the perspective of the experience of the general population), there may be specific changes to the higher order visual cortex. These changes may contribute to rapid conceptual extraction of information from image data, marking the attainment of expertise for image analysts.

EDUCATION AND TRAINING PROTOCOLS

We have theorized that the intrinsic motive biases of the dorsal (contextual feed-forward) and ventral (item based feedback) human memory systems cause human learning to be controlled for specific purposes of adaptive cognition (Luu et al., 2007; Tucker & Luu, 2006). Such learning is thus strategic, and it is tailored to the processing demands of performance. We reviewed EEG studies
of skilled performance that have identified electrophysiological signatures of the engagement of these two circuits in response to processing demands early and late in learning. In translating both the theoretical rationale and the empirical measures into training implications, we apply three principles.

First, effective training should provide instruction appropriate to the learner's current level. This applies not just to content knowledge and performance, but especially to the underlying neural processing. Second, effective training should engage processes during learning and practice that match those later required in real world performance environments. Such transfer-appropriate training entails introducing during learning not simply the external task features, but, significantly, the internal, neural dynamics underlying targeted performance. Third, effective training should monitor and adapt to the operation of, and interaction between, both of these action control systems.
Emphasis Change: Emphasis on Strategic Control

The involvement of the ventral limbic circuit at the early stage of learning indicates the importance of feedback-guided instruction. Very early in learning, while stimulus-response representations are still being acquired, feedback carries a large informational load. As the task is acquired, knowledge of the correct response leads to the generation of expectancies, allowing the learner to employ greater endogenous (feed-forward) control and self-monitoring of performance. Recording of electrical brain responses to errors and feedback thus provides a sensitive index of the development of expectancies through learning. In our research, this progression from prelearned to learned behavior seems to be reflected in a reduction in the feedback-related negativity (exogenous guidance), concurrent with an increase in the error-related negativity (endogenous guidance). During this period, the learner is also developing conceptual short-term representations of the stimuli and their task relevancy (indicated by an increase in the amplitude of the N170), alongside the context of occurrence (indicated by an increase in the P300) and the tracking of global task parameters (indexed by the N350). For task context and parameters to be accurately encoded in brain activity for later transfer to performance, the learner should, therefore, experience the task in its full context during training.

An existing approach to training, the emphasis-change protocol (Gopher, 1996, 2007), is compatible with these objectives. Emphasis-change training comprises a collection of methods, including training with variable priorities, emphasis change, secondary tasks, and task switching (Gopher, 2007). Common to all these methods are the introduction of systematic variability into training, guided task exploration with performance feedback, and experience with the task in its entirety throughout training. Based on cognitive task analysis, emphasis-change protocols attempt to match processing requirements during training to those required in a real world context. This approach is particularly well suited to the training of complex skills and has been applied effectively in several high performance training programs, including the training of air force pilots (Gopher, Weil,
& Bareket, 1994) and helicopter navigation with helmet-mounted displays (Seagull & Gopher, 1997). Whereas emphasis-change methods lead to slower initial learning rates, research indicates that they result in high levels of attainment, with superior transfer to the real world. Research into the neural mechanisms of learning, as described in this chapter, may provide additional insight into the effectiveness of emphasis-change programs and how they may be further adapted to optimize learning and transfer to operational performance.

Rote Training: Emphasis on Automatization

When learners become highly skilled in a given task domain, routine task components become automatized. As noted earlier, automaticity improves the efficiency of performance by reducing demands on cognitive resources and increasing the speed, fluency, and reliability of execution. The N170 and N250 components appear to index the development of automatized stimulus recognition during the later stages of learning and extended practice. Cognitive experimental research indicates that automaticity is most effectively developed through repeated, consistent mapping between stimulus and response (Schneider & Fisk, 1982; Strayer & Kramer, 1994). A common element of many training protocols, therefore, is rote practice to enhance automaticity.

Automaticity, however, may also come at a cost. Automatic processes are triggered by external stimuli and are ballistic; that is, they are difficult to stop once initiated. In the absence of concurrent strategic control, automaticity can lead to rigid performance and potentially critical errors of commission. Moreover, automaticity achieved through rote practice is often abstracted from the real task context and its associated goal-directed processing; consequently, rote training will typically transfer poorly to real performance situations. One challenge for training programs, therefore, is to provide extended practice environments that develop automaticity within a meaningful context, maintaining the goal-directed nature of performance and strategic control dynamics (for example, Gatbonton & Segalowitz, 1988; Gopher, 1996).

Training for Complex, Demanding Environments

Complex environments create highly variable processing demands that require dynamic self-regulation. Although we have stressed the relative contribution of the ACC and PCC circuits to early versus late learning, respectively, variable environments require dynamic reengagement of the ACC circuit, with its feedback control bias, even after learning has been achieved. A similar concept has been put forth by Goschke (2002), who characterized the demands of adaptive behavior as a dynamic balance between competing control dilemmas: persistence versus flexibility, stability versus plasticity, and selection versus monitoring. In cognitive terms, this behavior entails balancing automatization with cognitive flexibility and is the hallmark of an individual who can rapidly perceive, comprehend, and take action in a complex environment. We refer to this capability as flexible expertise. We highlight the importance of this dynamic balance in
relation to two challenges of flexible expertise in complex environments: performance under stress and the maintenance of situational awareness.

High levels of stress in performance situations can arise from multiple sources, including cognitive overload, fatigue, threats to self-preservation, and even performance anxiety. The improved efficiency afforded by automaticity increases the resilience of performance to stress, fatigue, and distraction. It also frees up cognitive resources for allocation to situational awareness and strategic processing, such as monitoring the environment for critical events, assessing the outcomes of action, and coordinating performance with others. The development of automaticity alone, however, does not ensure that essential strategic attention processes will be engaged. Prior research on complex, semiautomated environments has found that most errors stem from lapses in situational awareness, particularly failures to perceive and attend to critical information, to integrate this information, and to revise one's mental model of the situation (Endsley, 1995; Jones & Endsley, 1996). Most of these incidents have involved highly trained personnel. Although the automaticity of expertise affords many benefits, performance risks getting "stuck in set," making it difficult to recognize novelty and to change one's dominant mode of response, despite the greater availability of resources. From the perspective of action-control theory, such suboptimal performance by experts in complex environments reflects an imbalance between the two control circuits, with overreliance on the feed-forward bias of the PCC system.

Training for optimal performance in complex environments must, therefore, exercise strategic switching between these action-control systems by, for example, introducing unpredictability and variability alongside continued practice for automaticity. In addition to guiding new learning, expectancy violation may be essential for maintaining attentional awareness and cognitive flexibility along with the benefits of automaticity. The introduction of variability during practice will trigger the engagement of the expectancy violation mechanism and will train learners to assess the relevance of each violation and to adaptively adjust their response strategies. Such a protocol will promote an optimal, dynamic balance between the feed-forward PCC circuit and the feedback ACC circuit.
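As a concrete illustration of this recommendation, the sketch below (in Python; the subtask names, block length, and switch probability are hypothetical choices, not a validated protocol) generates practice blocks that rotate emphasis across subtasks, in the spirit of emphasis-change training, while injecting unpredictable switch trials intended to trigger the expectancy violation mechanism.

import random

# Sketch of a practice schedule that mixes consistent repetition (to build
# automaticity) with rotating emphasis and unpredictable switch trials
# (to exercise strategic control).
SUBTASKS = ["navigate", "monitor", "communicate"]

def next_block(block_index, switch_probability=0.25):
    emphasis = SUBTASKS[block_index % len(SUBTASKS)]  # rotate priority
    trials = []
    for _ in range(20):
        if random.random() < switch_probability:
            # Expectancy-violating trial: task demands change without warning.
            trials.append(("switch", random.choice(SUBTASKS)))
        else:
            trials.append(("repeat", emphasis))
    return emphasis, trials

emphasis, trials = next_block(0)
print(emphasis, trials[:5])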
INTEGRATION OF NEURAL MEASURES INTO TRAINING AND EDUCATION

Coupled with sophisticated brain monitoring technologies, the neurophysiological model advanced in this chapter offers a new window into adaptive learning and performance processes. It provides sensitive neurometrics for developing online, tailored training protocols and for measuring learner progress and training program success. As described in the preceding sections, we have made initial progress in mapping these learning systems in humans. We think it is now feasible to apply dense-array EEG technology to monitor learning in real time and thereby guide the training process.
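A minimal sketch of this closed-loop logic follows. The marker names, thresholds, and the hard-coded marker values are hypothetical placeholders standing in for a real time EEG pipeline, not an existing API; the point is only that behavioral and neural markers can jointly gate task difficulty.

# Sketch of closed-loop neuroadaptive training: markers gate difficulty.
def adapt_difficulty(difficulty, markers):
    automatized = markers["p300_gain"] > 0.5 and markers["rt_cv"] < 0.15
    overloaded = markers["error_rate"] > 0.3
    if overloaded:
        return max(1, difficulty - 1)   # back off when control is failing
    if automatized:
        return difficulty + 1           # add variability and complexity
    return difficulty                   # keep practicing at this level

difficulty = 3
for markers in [{"p300_gain": 0.6, "rt_cv": 0.10, "error_rate": 0.05},
                {"p300_gain": 0.2, "rt_cv": 0.30, "error_rate": 0.40}]:
    difficulty = adapt_difficulty(difficulty, markers)
print(difficulty)

In an actual system, the markers would be estimated online from the dense-array recording and the behavioral stream rather than supplied as fixed values.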
Our near-term objective is to develop and test neural based assessment and training tools that will increase the rate and efficiency of domain-specific learning through real time adaptive feedback and information delivery, as well as augment domain-general performance. Our protocol targets three interrelated components of cognitive enhancement: automaticity, situational awareness, and cognitive flexibility. Neuroadaptive training will adjust task parameters (such as stimulus and feedback presentations and required responses) to the learning state and cognitive capacity of the learner. The training protocols will specifically target the engagement of the ACC and PCC based learning and memory systems and monitor their responses through online electrophysiological recording. Along with behavioral performance indicators, these neurophysiological markers of flexible expertise and feedback processing can be used as metrics to (1) assess a learner's current level of development in each of these cognitive capacities, (2) select candidates for targeted adaptive training, and (3) predict success in actual performance environments.

REFERENCES

Balleine, B. W., & Ostlund, S. B. (2007). Still at the choice-point: Action selection and initiation in instrumental conditioning. Annals of the New York Academy of Sciences, 1104, 147–171.
Campbell, G. E., & Luu, P. (2007, October). A preliminary comparison of statistical and neurophysiological techniques to assess the reliability of performance data. Paper presented at the 4th Augmented Cognition International Conference, Baltimore, MD.
Chein, J. M., & Schneider, W. (2005). Neuroimaging studies of practice-related change: fMRI and meta-analytic evidence of a domain-general control network for learning. Cognitive Brain Research, 25, 607–623.
Cohen, R. A., Kaplan, R. F., Moser, D. J., Jenkins, M. A., & Wilkinson, H. (1999). Impairments of attention after cingulotomy. Neurology, 53, 819–824.
Crane, S. M., Luu, P., Tucker, D. M., & Poolman, P. (2007). Expertise in the human visual system: Neural target detection without conscious detection. Manuscript in preparation.
Devinsky, O., & Luciano, D. (1993). The contributions of cingulate cortex to human behavior. In B. A. Vogt & M. Gabriel (Eds.), Neurobiology of the cingulate cortex and limbic thalamus (pp. 427–556). Boston: Birkhauser.
Donchin, E., & Coles, M. G. (1988). Is the P300 component a manifestation of context updating? Behavioral and Brain Sciences, 11, 357–374.
Endsley, M. R. (1995). Toward a theory of situation awareness in dynamic systems. Human Factors, 37, 32–64.
Falkenstein, M., Hohnsbein, J., Hoormann, J., & Blanke, L. (1991). Effects of crossmodal divided attention on late ERP components. II. Error processing in choice reaction tasks. Electroencephalography and Clinical Neurophysiology, 78, 447–455.
Gabriel, M. (1990). Functions of anterior and posterior cingulate cortex during avoidance learning in rabbits. Progress in Brain Research, 85, 467–482.
Gabriel, M., Burhans, L., Talk, A., & Scalf, P. (2002). Cingulate cortex. In V. Ramachandran (Ed.), Encyclopedia of the human brain (Vol. 1, pp. 775–791). San Diego, CA: Academic.
Gabriel, M., Sparenborg, S. P., & Stolar, N. (1986). An executive function of the hippocampus: Pathway selection for thalamic neuronal significance code. In R. L. Isaacson & K. H. Pribram (Eds.), The hippocampus (Vol. 4, pp. 1–39). New York: Plenum.
Gatbonton, E., & Segalowitz, N. (1988). Creative automatization: Principles for promoting fluency within a communicative framework. TESOL Quarterly, 22, 473–492.
Gauthier, I., Skudlarski, P., Gore, J. C., & Anderson, A. W. (2000). Expertise for cars and birds recruits brain areas involved in face recognition. Nature Neuroscience, 3, 191–197.
Gehring, W. J., Goss, B., Coles, M. G. H., Meyer, D. E., & Donchin, E. (1993). A neural system for error detection and compensation. Psychological Science, 4, 385–390.
Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and action. Trends in Neurosciences, 15, 20–25.
Gopher, D. (1996). Attention control: Explorations of the work of an executive controller. Cognitive Brain Research, 5(1–2), 23–38.
Gopher, D. (2007). Emphasis change as a training protocol for high-demand tasks. In A. F. Kramer, D. A. Wiegmann, & A. Kirlik (Eds.), Attention: From theory to practice (pp. 209–224). New York: Oxford University Press.
Gopher, D., Weil, M., & Bareket, T. (1994). Transfer of skill from a computer game trainer to flight. Human Factors, 36, 387–405.
Goschke, T. (2002). Voluntary action and cognitive control from a cognitive neuroscience perspective. In S. Maasen, W. Prinz, & G. Roth (Eds.), Voluntary action: An issue at the interface of nature and culture (pp. 49–85). Oxford, United Kingdom: Oxford University Press.
Janer, K. W., & Pardo, J. V. (1991). Deficits in selective attention following bilateral anterior cingulotomy. Journal of Cognitive Neuroscience, 3, 231–241.
Jones, D. G., & Endsley, M. R. (1996). Sources of situation awareness errors in aviation. Aviation, Space and Environmental Medicine, 67, 507–512.
Keng, E., & Gabriel, M. (1998). Hippocampal modulation of cingulo-thalamic neuronal activity and discriminative avoidance learning in rabbits. Hippocampus, 8, 491–510.
Luu, P., Flaisch, T., & Tucker, D. M. (2000). Medial frontal cortex in action monitoring. Journal of Neuroscience, 20(1), 464–469.
Luu, P., & Pederson, S. M. (2004). The anterior cingulate cortex: Regulating actions in context. In M. I. Posner (Ed.), Cognitive neuroscience of attention (pp. 232–244). New York: Guilford Press.
Luu, P., & Tucker, D. M. (2003). Self-regulation and the executive functions: Electrophysiological clues. In A. Zani & A. M. Proverbio (Eds.), The cognitive electrophysiology of mind and brain (pp. 199–223). San Diego, CA: Academic Press.
Luu, P., Tucker, D. M., Derryberry, D., Reed, M., & Poulsen, C. (2003). Electrophysiological responses to errors and feedback in the process of action regulation. Psychological Science, 14, 47–53.
Luu, P., Tucker, D. M., & Stripling, R. (2007). Neural mechanisms for learning actions in context. Brain Research, 1179, 89–105.
McKone, E., Kanwisher, N., & Duchaine, B. C. (2007). Can generic expertise explain special processing for faces? Trends in Cognitive Sciences, 11(1), 8–15.
Miltner, W. H., Lemke, U., Weiss, T., Holroyd, C., Scheffers, M. K., & Coles, M. G. (2003). Implementation of error-processing in the human anterior cingulate cortex: A source analysis of the magnetic equivalent of the error-related negativity. Biological Psychology, 64(1–2), 157–166.
Poremba, A., & Gabriel, M. (2001). Amygdalar efferents initiate auditory thalamic discriminative training-induced neuronal activity. Journal of Neuroscience, 21, 270–278.
Posner, M. I., & DiGirolamo, G. J. (1998). Executive attention: Conflict, target detection, and cognitive control. In R. Parasuraman (Ed.), The attentive brain (pp. 401–423). Cambridge, MA: MIT Press.
Potter, M. C. (1999). Understanding sentences and scenes: The role of conceptual short-term memory. In V. Coltheart (Ed.), Fleeting memories: Cognition of brief visual stimuli (pp. 13–46). Cambridge, MA: MIT Press.
Poulsen, C., Luu, P., Davey, C., & Tucker, D. M. (2005). Dynamics of task sets: Evidence from dense-array event-related potentials. Cognitive Brain Research, 24, 133–154.
Poulsen, C., Luu, P., Tucker, D. M., Scherg, M., Davey, C., & Frishkoff, G. (2003, March). Electrical brain activity during task switching: Neural source localization and underlying dynamics of scalp-recorded potentials. Poster presented at the Tenth Annual Meeting of the Cognitive Neuroscience Society, New York.
Pribram, K. H. (1991). Brain and perception: Holonomy and structure in figural processing. Hillsdale, NJ: Erlbaum.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Rylander, G. (1947). Personality analysis before and after frontal lobotomy. In J. F. Fulton, C. D. Aring, & B. S. Wortis (Eds.), Research publications—Association for Research in Nervous and Mental Disease: The frontal lobes (pp. 691–705). Baltimore, MD: Williams & Wilkins.
Schneider, W., & Fisk, A. D. (1982). Degree of consistent training: Improvements in search performance and automatic process development. Perception and Psychophysics, 31, 160–168.
Schneider, W., & Shiffrin, R. M. (1977). Controlled and automatic human information processing: I. Detection, search, and attention. Psychological Review, 84, 1–66.
Scott, L. S., Tanaka, J. W., Sheinberg, D. L., & Curran, T. (2006). A reevaluation of the electrophysiological correlates of expert object processing. Journal of Cognitive Neuroscience, 18, 1453–1465.
Seagull, F. J., & Gopher, D. (1997). Training head movement in visual scanning: An embedded approach to the development of piloting skills with helmet-mounted displays. Journal of Experimental Psychology: Applied, 3, 463–480.
Segalowitz, N., Poulsen, C., & Segalowitz, S. (1999). RT coefficient of variation is differentially sensitive to executive control involvement in an attention switching task. Brain and Cognition, 40, 255–258.
Segalowitz, N., & Segalowitz, S. (1993). Skilled performance, practice, and the differentiation of speed-up from automatization effects: Evidence from second language word recognition. Applied Psycholinguistics, 14, 369–385.
Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review, 84, 127–190.
Squire, L. R. (1998). Memory systems. Comptes Rendus de l'Académie des Sciences, Série III, Sciences de la Vie, 321, 153–156.
Strayer, D. L., & Kramer, A. F. (1994). Strategies and automaticity: I. Basic findings and conceptual framework. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 318–341.
Swick, D., & Turken, A. U. (2002). Dissociation between conflict detection and error monitoring in the human anterior cingulate cortex. Proceedings of the National Academy of Sciences, 99, 16354–16359.
Tanaka, J. W., & Curran, T. (2001). A neural basis for expert object recognition. Psychological Science, 12, 43–47.
Toni, I., Ramnani, N., Josephs, O., Ashburner, J., & Passingham, R. E. (2001). Learning arbitrary visuomotor associations: Temporal dynamic of brain activity. Neuroimage, 14, 1048–1057.
Tucker, D. M., & Luu, P. (Eds.). (2006). Adaptive binding. New York: Oxford University Press.
Tucker, D. M., Luu, P., Desmond, R. E., Jr., Hartry-Speiser, A., Davey, C., & Flaisch, T. (2003). Corticolimbic mechanisms in emotional decisions. Emotion, 3, 127–149.
Part II: Cognitive/Rational Band
Chapter 2
THE ROLE OF INDIVIDUAL DIFFERENCES IN VIRTUAL ENVIRONMENT BASED TRAINING

Clint Bowers, Jennifer Vogel-Walcutt, and Jan Cannon-Bowers

Most modern theories of learning and instructional design converge on the conclusion that the attributes learners bring to the instructional environment are important ingredients in the learning process. In fact, the notion that learners bring a unique set of knowledge, skills, aptitudes, abilities, preferences, and experiences to a learning environment is captured by a popular approach known as learner-centered instruction (for example, see Cognition and Technology Group at Vanderbilt [CTGV], 2000; Bransford, Brown, & Cocking, 1999; Kanfer & McCombs, 2000; Clark & Wittrock, 2000). Essentially, proponents of this approach argue that characteristics of the learner must be taken into account in the design and delivery of instruction and that an explicit attempt must be made to build on the strengths of the student. Variables that have been implicated in this regard include prior knowledge, prior skill, prior experience, misconceptions, and interests, among others.

In addition, a number of other personal attributes have been shown to affect learning. These include motivation (CTGV, 2000; Clark, 2002; Clark & Wittrock, 2000; Bransford, Brown, & Cocking, 1999), personal agency/self-efficacy (Kanfer & McCombs, 2000; Bandura, 1977, 1986; Gist, 1992), goal orientation (Dweck, 1986; Dweck & Leggett, 1988; Bransford, Brown, & Cocking, 1999; Kanfer & McCombs, 2000), goal commitment (Kanfer & McCombs, 2000), emotional state (Clark, 2002), self-regulation (Kanfer & McCombs, 2000), misconceptions (Gentner, 1983), interest (Kanfer & McCombs, 2000), instrumentality (Tannenbaum, Mathieu, Salas, & Cannon-Bowers, 1991), ability (Mayer, 2001), and spatial ability (Mayer, 2001).

Clearly, the findings cited above (as well as other work) justify the study of learner characteristics as an important variable in technology-enabled learning. Interestingly, the popular notion that people have unique learning styles—that is, that some people learn differently than others—has not been substantiated by the empirical literature (for example, see Cronbach & Snow, 1977). Nonetheless, a host of learner characteristics clearly do affect learning; in fact, it is quite likely
these learner attributes will interact with instructional approaches, such that some interventions may be more effective for some learners than for others.

At the same time, researchers have also discussed the degree to which individual differences might influence an individual's experience in virtual environments. For example, Kaber and his colleagues discuss this problem from the viewpoint of virtual environment (VE) design in their recent chapter (Kaber, Draper, & Usher, 2002). They discuss a variety of factors that might determine the degree to which individuals perceive or experience the elements of a virtual environment as "real." They include a review of variables such as user experience, spatial aptitude, and age. They describe the effects that these variables may have on the degree of immersion and other experiential variables associated with virtual environments and how designers might take them into account.

In this chapter, we seek to combine the two bodies of literature described above to discuss those individual differences that are most likely to influence learning outcomes in virtual environments. As such, we will focus on a subset of the variables mentioned above. In so doing, we will discuss the state of the existing literature and also the research that is required to more fully understand these important relationships.

TYPES OF INDIVIDUAL DIFFERENCES THAT AFFECT LEARNING IN VIRTUAL ENVIRONMENTS

In reviewing the many variables that might be included in this review, it became apparent that there were two distinct classes of individual characteristics that might affect learning in a virtual environment. One class includes a set of immutable characteristics, such as gender, age, and cognitive ability. The second class involves more alterable characteristics, such as attitudes, expectations, and experiences. As shown in Figure 2.1, both classes of variables are predicted to affect learning in a virtual environment. Moreover, at least some of these variables (for example, cognitive ability) may also have a direct impact on learning outcomes. The following sections review what is known about the influence of individual differences on learning, with specific emphasis on virtual environment based learning systems.

IMMUTABLE CHARACTERISTICS

As noted, there are a number of individual variables that are fairly consistent in learners over time and probably not amenable to much alteration. It is important to understand how these features operate, since they can affect the degree of learning expected. Furthermore, it may be worthwhile to design variations into the learning system to accommodate various users. Obviously, concerns such as cost and configuration management will have implications for how much the VE can be altered for different users; however, if learning is affected significantly, the cost may be justified. The following sections summarize what is known about the impact of stable individual differences on VE based learning.
Figure 2.1. Model of Individual Characteristics on Learning in Virtual Environments
Age

Perceptual and cognitive skills decline as people age; consequently, these changes must be considered when creating training in VEs. Specifically, Paas, Van Gerven, and Tabbers (2005) note that aging brings reductions in working memory, cognitive speed, inhibition, and integration, which likely negatively influence working and learning in dynamic, complex computer systems. Several researchers have investigated these issues using principles derived from cognitive load theory, multimedia theory, and the human-computer interaction literature; their findings are summarized below.

Age is a general predictor of learning to use computers (Egan, 1988). Because the aging process impacts perceptual and cognitive skills, it may also impact interactions with VE systems (Birren & Livingson, 1985; Fisk & Rogers, 1991; Hertzog, 1989; Salthouse, 1992, as cited in Kaber, Draper, & Usher, 2002). Given that elderly learners depend on a more limited pool of cognitive resources, they appear to learn at a slower pace than younger learners, and this becomes more pronounced in complex tasks requiring substantial mental processing (Fisk & Warr, 1996; Salthouse, 1994, 1996, as cited in Paas, Van Gerven, & Tabbers, 2005). The areas affected by this decline of resources include working memory, cognitive speed, inhibition, and integration. Reduced working memory capacity means that there is essentially less space available to hold thoughts for assimilation before they are discarded or filed in long-term memory. To address this problem, Paas, Van Gerven, and Tabbers suggest that, in an effort to maximize the efficiency of the available cognitive resources in older adults, efforts should be made to reduce extraneous cognitive load and to increase or focus the learner's attention on the germane cognitive load. Because aging adults experience more
limited capacity for cognitive load and slower mental processing speeds, the cognitive space available may be diminished enough to significantly affect learning in VEs (Clark, Nguyen, & Sweller, 2006). To accomplish this, Paas, Van Gerven, and Tabbers suggest using bimodal presentations (utilizing both the auditory and visual channels), eliminating redundant information (extraneous cognitive load), and, to a lesser degree, using enhanced timing, attention scaffolding, and enhanced layout. However, it is not clear whether this approach would result in improved transfer of learning in virtual environments. Additional research in this regard is needed.

Communication Skills

While many personality factors have been implicated in learning and (particularly) performance, we focus attention here on the collaborative aspects of VEs. Collaborative virtual environments (CVEs) allow people to work together in a shared virtual space and communicate both verbally and nonverbally in real time (Bailenson, Beall, Loomis, Blascovich, & Turk, 2004). When communicating online, people display less social anxiety, fewer inhibitions, and reduced self-awareness. These differences may help to reduce stress, promote a more relaxed atmosphere for learning compared to a classroom setting (Roed, 2003), and support social awareness (Prasolova-Forland & Divitini, 2003a, 2003b).

Self-concept and self-representation may also play a role in VE performance. For example, using an avatar (self-representation) may impact virtual group dynamics and often reflects the true personality of the user. Thus, leaders would generally take on leadership roles in collaborative virtual environments, while followers would take on a more passive role. Other important characteristics that impact performance in collaborative virtual environments include user dedication to virtual tasks, sense of group cohesiveness, and willingness to collaborate (Kaber, Draper, & Usher, 2002). Further, performance in collaborative virtual environments depends not only on the user, but also on the feeling of immersion and group acceptance in the VE (Slater & Wilbur, 1997). Those who are more open to the experience generally perform better in these environments. On the other hand, those who do not effectively collaborate with others in the VE may actually degrade group performance by acting as a distracter from individual task performance (Romano, Brna, & Self, 1998).

Recommendations regarding these issues focus on display transparency, realistic avatars, and individual differences in group work. Attention should be paid to these areas in an effort to improve performance in collaborative virtual environments and to decrease the extraneous cognitive load on participants. Such reductions will help improve focus on the pertinent or germane information for training and improve group dynamics for learning.

Spatial Ability and Memory

As noted, spatial ability appears to be an important potential variable in the effectiveness of virtual environments as agents of learning. Both Garg, Norman,
and Sperotable (2001) and Waller (2000) found moderating effects of spatial ability on learning. Accordingly, Mayer (2001) suggests that the design of multimedia knowledge presentations should be adjusted according to the learner's prior knowledge and spatial ability. Clearly, the demands imposed on learners in an interactive, three-dimensional VE are different from those associated with more passive forms of instruction (for example, listening to a lecture or reading a textbook). Precisely what these differences translate to in terms of learning effects and design implications remains to be seen.

Several studies have investigated the relationship between spatial abilities and the ability to navigate in VEs and/or acquire spatial knowledge from VEs. For example, it has been found that novice users experience more difficulties using VEs, especially when they have to navigate through small spaces such as virtual doorways (Kaber, Draper, & Usher, 2002). These difficulties likely derive from the challenge of learning how to use the controls. As such, designers should attempt to make control interfaces as natural as possible to reduce novices' time to proficiency (Kaber, Draper, & Usher, 2002).

Previously, variable results had been found with regard to the connection between spatial abilities in the real world and the ability of the user to acquire spatial knowledge from a VE (Waller, 2000). Waller's research, however, supports the connection. Specifically, it was found that spatial visualization and spatial orientation skills correlated with the ability to acquire spatial information from a VE. In agreement, scores on field independence tests and figure rotation ability correlate with task performance in a VE (Oman et al., 2000). Thus, it would seem one could administer paper and pencil tests of spatial abilities and better predict users' abilities to function in VEs.

Not surprisingly, Chen, Czerwinski, and Macredie (2000) found that users with lower spatial abilities were slower and produced more errors than the high spatial abilities group, a difference likely due to reduced navigation skills. Interestingly, though, Tracey and Lathan (2001) report that users with lower spatial skills demonstrated a significant positive transfer from simulation based training to real world tasks compared to users with high spatial abilities. Although it appears counterintuitive that users with lower spatial abilities would perform worse in the VE but transfer the skills better to the real world, it is possible that by moving slowly through the system these users were better able to master the skills learned, allowing for better transfer of those skills. However, in a meta-analysis by Chen and Rada (1996), it was determined that users with higher spatial abilities may be able to create structural mental models more quickly than users with lower spatial abilities, as evidenced by a reduced need to review the table of contents. In other words, those with better spatial skills are able to learn the layout of the VE system more quickly and thus perform better and faster. Additionally, this increased speed of knowledge acquisition may act to reduce the extraneous cognitive load of navigation, allowing the user to focus attention on the training materials. More research in this area is needed to better clarify these connections.
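The screening idea in the preceding paragraph amounts to a simple correlation analysis, sketched below in Python with simulated scores; in practice the predictor would come from instruments such as figure rotation or field independence tests, and the simulated effect size here is an arbitrary assumption.

import numpy as np

rng = np.random.default_rng(1)
spatial_scores = rng.normal(50, 10, size=30)              # paper-and-pencil test
ve_performance = 0.6 * spatial_scores + rng.normal(0, 8, size=30)  # VE task score

# A positive Pearson r would support paper-based screening for VE readiness.
r = np.corrcoef(spatial_scores, ve_performance)[0, 1]
print(f"Pearson r = {r:.2f}")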
Cockburn and McKenzie (2002) found a negative correlation between freedom to locate objects in VEs and performance, with users reporting that environments with more dimensions were perceived as more "cluttered" and less efficient. These findings may be explained through the theory of cognitive load: the additional information acts as a distracter from the germane information needed to navigate and perform in the environment. As such, for training development purposes, attention should be paid to balancing the desire for a high level of fidelity or functionality in the VE against the possibility of cognitive overload in the user. Information and functionality choices should be made meaningfully, with the intention of supporting the training goals, to achieve maximum performance.

Spatial Abilities and Gender

The differential impact of gender on training in virtual environments is an area not widely studied in the current literature. However, three main themes have been addressed intermittently and are discussed widely in similar literatures, such as human-computer interaction, interactive and serious games, and online or distance learning. From these areas of research, we draw out three themes: differences in spatial abilities, language proficiency, and differential levels of computer usage.

Generally speaking, not only do females exhibit a lower level of spatial ability (Waller, 2000), but research has shown that they actually use different parts of the brain to attend to spatial tasks (Vogel, Bowers, & Vogel, 2003). Specifically, while males showed a right hemisphere preference for spatial tasks, females failed to show a hemispheric preference, meaning that they use both hemispheres equally. Females' use of the left hemisphere during spatial tasks may reflect language usage during those tasks. The increased level of language proficiency among females may affect how they interact with the system and how they interpret the verbal components of the training. Finally, the amount of time spent on the computer correlates positively with the level of comfort using the system, thereby influencing the learner's perceived cognitive task load during activities (Egan, 1988). Because the average female uses computers less frequently than the average male (Busch, 1995), this difference may influence cognitive task load and has the potential to indirectly impact training in virtual environments.

Differences by gender in spatial abilities may influence the ability to function effectively in a VE and the strategies used to acquire spatial information in a VE (Sandstrom, Kaufman, & Huettel, 1998). Research has shown that individuals with higher spatial skills are generally better able to perform in graphic or spatially oriented interfaces (Borgman, 1989); as such, we would expect females to be at a disadvantage in virtual environments. Because spatial skills are even more important in virtual environments than in the real world (Waller, 2000), these differences may directly impact females' ability to navigate the environment, with males significantly outperforming females (Tan & Czerwinski, 2006). To mitigate these differences, females are often given extended time to work within the system.
Clearly, however, there is a need to see whether this extra practice results in an effective learning outcome. Alternatively, research has shown that increasing the size of the display not only improves spatial performance across genders, allowing users to perform better in the system, but also attenuates the performance differences between the genders (Tan & Czerwinski, 2006). This might be another strategy for equating the genders in terms of learning outcomes.

Finally, gender may interact with other important individual characteristics, such as computer experience. Not only do males use computers more often (Busch, 1995), research also shows that males spend more time playing video games than females (Colwell, Grady, & Rhaiti, 2006). This increased experience level eventually yields greater familiarity with VEs and consequently lessens the extraneous cognitive load placed on the user. Therefore, females may become overwhelmed more quickly in virtual environments and might benefit from easier scenarios in early training trials. Clearly, this is a topic worthy of further study, since it is crucial to understand whether females can benefit equally from VE based training. It is also worth noting that gender differences may eventually fade as girls are exposed to computers, virtual worlds, and computer games at an earlier age. In fact, the gap between the genders in technology use may already be closing.

Cognitive Ability

There is a vast literature implicating trainee cognitive ability as an important consideration in instructional design (see Hunter, 1986; National Research Council, 1999; Ree, Carretta, & Teachout, 1995; Ree & Earles, 1991; Colquitt, LePine, & Noe, 2000; Randel, Morris, Wetzel, & Whitehill, 1992; Quick, Joplin, Nelson, Mangelsdorff, & Fiedler, 1996; Kacmar, Wright, & McMahan, 1997; Warr & Bunce, 1995), so we will not review it here. Suffice it to say that learners with higher cognitive ability tend to fare better in training, but many other variables are also involved. We have no reason to believe this will be different for virtual environments. However, an interesting question with respect to cognitive ability and virtual environments is whether they have the unique characteristic of being able to mitigate the impact of low cognitive ability on learning. For example, as part of their work in anchored instruction, the Cognition and Technology Group at Vanderbilt (CTGV) discovered that material presented with dynamic visual support (as opposed to only linguistic support) was beneficial to young learners with lower linguistic competencies (Sharp et al., 1995). It is quite possible that the flexibility of content presentation afforded by virtual environments can enable them to be tailored to the needs of the learner or to compensate for incoming weaknesses. However, it is clear that specific cognitive abilities might enable learners to gain the maximum from virtual environment based training (and vice versa). For example, Yeh (2007) reported that teachers with specific abilities, such as interpersonal intelligence, obtained more benefit from a simulation based training experience designed to train classroom management skills.
Kizony, Katz, and Weiss (2004) report that several cognitive abilities correlated with performance by stroke patients on virtual psychomotor tasks. The cognitive abilities included attention, visual search, and visual memory; each was related to at least one aspect of performance. Various cognitive abilities are thus predictive of performance in virtual environments. What is not known, however, is which abilities are important in these environments universally and which are related to the elements of the learning task. Thus, there is a need for theory development, as well as the consequent empirical research, to explore this important area more fully.

Alterable States

As mentioned, there is also a class of individual differences that are considered states, rather than traits. Hence, these individual differences can be modified prior to training. In fact, understanding how best to prepare a learner for training can have important implications for learning outcomes. For example, if a learner has poor self-efficacy prior to training, it may be relatively easy (and cheap) to raise it before commencing training. It is likely to be more cost-effective to modify the attitude than to risk wasting time in a sophisticated virtual environment. Hence, efforts should be made to understand and modify these so-called "alterable" differences.

Prior Knowledge and Experience

Several studies have found that a learner's prior knowledge and experience in a domain affect his or her ability to learn new material. In particular, it has been found that learners with less knowledge and experience need a more structured learning environment (Hannafin, Oliver, & Sharma, 2003) and more direct scaffolding (CTGV, 2000). One goal of virtual environment based education, therefore, should be to activate prior knowledge in the creation of new knowledge—an application of the scaffolding approach to training (Kirkley et al., 2004). Another approach would be to provide "pretraining" to ensure that all trainees have an adequate pool of knowledge to enable the virtual experience to be effective (Bingener et al., 2008). Yet another approach involves assessing the prior knowledge of trainees and selecting, or creating, scenarios designed to take the trainee to the next level of knowledge. This approach has been shown to be effective in a variety of settings using simulation based training (for example, Birch et al., 2007). However, it requires not only a large library of scenarios, but also a selection approach to choose scenarios that fit a given level of prior knowledge.
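To make the scenario-selection idea concrete, here is a minimal, hypothetical sketch; the scenario library, the difficulty scale, and the "one step beyond" selection rule are illustrative assumptions rather than a published algorithm.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    difficulty: int  # 1 (novice) .. 5 (expert); an assumed scale

# Assumed library; a real system would need many scenarios per level.
LIBRARY = [
    Scenario("guided walkthrough", 1),
    Scenario("single-room search", 2),
    Scenario("multi-room search", 3),
    Scenario("search with distractors", 4),
    Scenario("free mission, time pressure", 5),
]

def next_scenario(assessed_level: int) -> Scenario:
    """Pick a scenario one step beyond the trainee's assessed
    prior-knowledge level, clamped to the library's range."""
    target = min(assessed_level + 1, max(s.difficulty for s in LIBRARY))
    candidates = [s for s in LIBRARY if s.difficulty >= target]
    return min(candidates, key=lambda s: s.difficulty)

# A trainee assessed at level 2 is routed to a level 3 scenario.
print(next_scenario(2).name)  # -> "multi-room search"
```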
It is also important to consider trainees' experience with the technology and the interface that links that technology to the user. Researchers have shown that previous experience with a virtual environment is a significant predictor of subsequent performance in that environment (Draper, Fujita, & Herndon, 1987). Further, it might be that the process of developing a mental model of the virtual environment interferes with the development of a mental model of the actual learning material (Kaber, Draper, & Usher, 2002). It may be that a trainee must first "learn" the interface before being comfortable enough to have a positive learning experience. However, in one case, differences between those who were familiar with the interface and those who were not were eliminated with a few practice trials (Frey, Hartig, Ketzel, Zinkernagel, & Moosbrugger, 2007). It is not clear whether this type of intervention is routinely effective; further research is needed to better understand this phenomenon.

Goal Orientation

It has been suggested that the experience a learner has in a virtual environment can be affected by a variety of attitudes that the learner brings to the situation. One attitude that seems particularly important in this regard is goal orientation, which refers to the learner's motivation when approaching the learning task. Researchers in this area argue that learners can be either performance or mastery oriented (Brett & VandeWalle, 1999; Ford, Weissbein, Smith, Gully, & Salas, 1998; Phillips & Gully, 1997; Dweck, 1986; Dweck & Leggett, 1988; Elliot & Harackiewicz, 1994; Grieve, Whelan, Kottke, & Meyers, 1994; Kozlowski, Gully, McHugh, Salas, & Cannon-Bowers, 1996; Covington, 2000). Performance-oriented individuals are concerned with their performance outcomes (for example, maximizing their scores in a game), while mastery-oriented individuals are concerned more with the learning process by which task competence is achieved (Kozlowski, Gully, Salas, & Cannon-Bowers, 1996). Further, mastery goals are self-referenced and focus on improving individual skills over past performance on a task (Harackiewicz & Elliot, 1993). Evidence also suggests that mastery goals may lead to faster task acquisition and task self-efficacy; improve perceptions of competence, mood, and motivation; and provide higher levels of enjoyment (see Cannon-Bowers & Salas, 1998). Performance goals, on the other hand, are more immediate and outcome oriented. They focus attention on achieving specific performance targets, but not on the strategies used to obtain them.

There is some evidence to suggest that performance-oriented individuals are more likely to perform better in training but worse on transfer tasks, while the opposite appears to be true for mastery-oriented learners (Dweck, 1986). This is typically explained by the fact that a performance orientation leads learners to settle on one or two strategies that maximize immediate performance, so that they fail to establish more flexible, generalizable strategies. In virtual environments, this variable may be problematic if typical gaming features (such as score keeping and competition) exacerbate a trainee's tendency toward performance goals. To ameliorate this, it may be necessary for designers to explicitly develop early content that emphasizes mastery, or strategies embedded in game play that require trainees to explore more generalizable strategies.

Interestingly, it seems that goal orientation might also influence the manner in which trainees approach virtual environment based training
(Hwang & Li, 2002; Schmidt & Ford, 2003). For example, it has been demonstrated that learners with a mastery orientation achieved better learning outcomes in a computer-mediated learning environment than did those with a performance orientation (Klein, Noe, & Wang, 2006). However, the mechanism underlying this difference is not well understood. It has been suggested that trainees with a performance orientation are more likely to hold an "entity theory" of the learning situation; that is, they are more likely to see ability as a fixed, inflexible quantity. It has also been argued that these individuals are more likely to attribute their failures or frustrations to elements of the virtual environment (Hwang & Li, 2002). Furthermore, there are data suggesting that trainees with a mastery orientation are more likely to enjoy virtual training environments than those with a performance orientation (Yi & Hwang, 2003). One might suggest that a similar relationship would exist between the constructs of goal orientation and presence, mediating the impact on learning. This relationship requires additional study.

It should also be noted that the effectiveness of any virtual learning environment depends on the scenario deployed in that system. An understanding of the impact of learner control on the effectiveness of training might help in the creation of these scenarios. For example, it might be advisable to provide trainees who have a performance orientation with scenarios that present less challenge and risk of error, and to increase this challenge very gradually. More efficient training might be accomplished with mastery-oriented individuals using more challenging scenarios (Heimbeck, Frese, Sonnentag, & Keith, 2003).
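One way to read this design advice is as a simple challenge-ramp policy keyed to goal orientation. The sketch below is illustrative only; the orientation labels, starting difficulties, and step sizes are assumptions, not parameters from the studies cited.

```python
def challenge_schedule(orientation: str, n_trials: int) -> list[float]:
    """Return a per-trial difficulty ramp (0.0 = easy, 1.0 = hard).

    Performance-oriented trainees start easier and ramp slowly to
    limit early errors; mastery-oriented trainees start harder and
    ramp faster.
    """
    if orientation == "performance":
        start, step = 0.2, 0.05  # gentle ramp: low early risk of error
    elif orientation == "mastery":
        start, step = 0.5, 0.10  # more challenge from the outset
    else:
        raise ValueError(f"unknown orientation: {orientation!r}")
    return [round(min(1.0, start + step * t), 2) for t in range(n_trials)]

print(challenge_schedule("performance", 5))  # [0.2, 0.25, 0.3, 0.35, 0.4]
print(challenge_schedule("mastery", 5))      # [0.5, 0.6, 0.7, 0.8, 0.9]
```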
Expectations for Training

Researchers have suggested that preconceived attitudes toward training are effective predictors of subsequent training outcomes (Cannon-Bowers, Salas, Tannenbaum, & Mathieu, 1995; Smith-Jentsch, Jentsch, Payne, & Salas, 1996). This is a potentially troublesome finding for scientists interested in using virtual environments for training. Specifically, it might be that the virtual environment is so different from trainees' experience that it colors their expectations about how effective it can be, adversely influencing downstream learning. Indeed, some trainees may not readily accept the use of a virtual environment to train "serious" knowledge and skills. While there is virtually no research with which to evaluate this hypothesis, it is clear that some students are at least dubious about being educated in computer based environments (Chiou, 1995; Hunt & Bohlin, 1991). For example, it has been demonstrated that trainees who participated in a "simulation" designed to train decision-making skills had higher expectations, and learned more, than trainees who experienced a "game" with the same learning content (Baxter, Ross, Phillips, & Shafer, 2004). Further, it has been suggested that the fidelity of training needs to increase along with the experience of the trainee in order to keep trainees' expectations appropriately high (Merriam & Leahy, 2005).
Similarly, even if the trainee has positive expectations about the outcomes of this type of learning, the effects might be mitigated by the negative attitudes of instructors or coaches who are adjunct to the learning experience (MacArthur & Malouf, 1991). Consequently, there may be benefit in developing training interventions to improve instructors' attitudes toward virtual environments.

Self-Efficacy

Much work has focused on the contribution of the student's self-efficacy in training. Self-efficacy is defined generally as the learner's belief that he or she has the necessary competence to accomplish the task (Bandura, 1982, 1989; Gist, Schwoerer, & Rosen, 1989; Gist, Stevens, & Bavetta, 1991; Cole & Latham, 1997; Eden & Aviram, 1993; Ford et al., 1998; Mathieu, Martineau, & Tannenbaum, 1993; Martocchio, 1994; Martocchio & Webster, 1992; Mathieu, Tannenbaum, & Salas, 1992; Quiñones, 1995; Mitchell, Hopper, Daniels, & George-Falvy, 1994; Phillips & Gully, 1997; Stevens & Gist, 1997; Stajkovic & Luthans, 1998). In general, studies converge on the conclusion that high self-efficacy learners perform better in training than low self-efficacy learners. In addition, self-efficacy appears to affect a host of other motivational variables, such as goal setting, self-regulation, and self-attribution, among others (see Kanfer & McCombs, 2000; Pintrich & Schunk, 1996).

Given these results, the question for virtual environment design is not necessarily whether preexisting self-efficacy will affect training outcomes (evidence suggests that it will), but how the experience might be structured to assess and raise self-efficacy early in training. Although there is little research in this area, some findings suggest that elements of scenario design can be used to increase a trainee's self-efficacy. For example, Holladay and Quiñones (2003) demonstrated that varying aspects of practice were effective in increasing trainee self-efficacy. This might indicate that a variety of scenarios with varying challenges would be most effective for virtual environment based training.

Further, it is important to realize that self-efficacy with the virtual environment technology itself might influence the training outcome. There are data to suggest that psychological states such as computer self-efficacy might mediate the relationship between a computer based education environment and the eventual training outcome from the virtual experience; this mediating relationship has been demonstrated by Spence and Usher (2007). However, it is interesting to note that increasing computer self-efficacy is not as easy as merely providing experience with the system (Wilfong, 2006). Indeed, even a simple training program designed to increase this type of self-efficacy did not have the expected effect on improving this attitude (Ross & Starling, 2005). Consequently, there is a need for further study of how to improve this apparently important mediating variable.

SUMMARY: IMPLICATIONS FOR RESEARCH AND DESIGN

Clearly, much is known about how individual differences operate in learning environments. However, specific investigations of individual differences in VE based learning systems are not common.
Hence, in Table 2.1 we summarize by offering a set of design recommendations that were gleaned from existing literature and also by providing a set of research questions to guide future work.

Table 2.1. Individual Differences in VE Based Learning Systems

Immutable

Gender
Definition/Description: Degree to which gender directly or indirectly affects learning outcomes in VE based learning systems.
Sample Research Question(s): Do males benefit more from training in VEs than females? Will early exposure to computers close any gender gaps?
Preliminary Design Guidance: Provide additional pretraining practice for females; use common interface metaphors.

Age
Definition/Description: Degree to which age-related changes affect learning outcomes in VE based learning systems.
Sample Research Question(s): How do age-related cognitive changes affect learning in VEs?
Preliminary Design Guidance: Reduce extraneous stimuli in the virtual environment; create optimized interfaces.

Personality—Collaboration
Definition/Description: Degree to which users are inclined to work cooperatively with others in a VE.
Sample Research Question(s): Does the tendency to work cooperatively affect learning in a shared virtual space?
Preliminary Design Guidance: Select for collaborative learning; provide pre-exposure team training.

Personality—Immersive Tendencies
Definition/Description: Possible individual difference in a user's tendency to feel immersed in a virtual environment.
Sample Research Question(s): Is immersive tendency a reliable individual difference? If so, does it have an impact on learning in a VE?
Preliminary Design Guidance: Consider non-VE training for low tendency individuals.

Cognitive Ability
Definition/Description: General class of aptitudes associated with general intelligence (for example, reasoning, verbal comprehension, and working memory capacity).
Sample Research Question(s): How does cognitive ability impact the design of VEs for learning? What VE features are needed to accommodate learners of varying cognitive ability? Can VEs compensate for weaknesses in ability?
Preliminary Design Guidance: Provide greater visual support.

Spatial Ability
Definition/Description: Ability to generate, maintain, and manipulate mental visual images.
Sample Research Question(s): How does spatial ability impact the design of VEs? What VE features are needed to accommodate learners of varying spatial ability?
Preliminary Design Guidance: Use 2-D graphics for low spatial ability individuals; provide spatial memory aids.

Alterable

Prior Knowledge and Experience
Definition/Description: Domain-specific knowledge that a learner brings to the learning task.
Sample Research Question(s): How can prior knowledge be automatically assessed in VEs? How can VE design be optimized to accommodate learners with different levels of prior knowledge?
Preliminary Design Guidance: Provide pretraining for low experience individuals; design training scenarios to scaffold learning.

Goal Orientation
Definition/Description: Nature of goals (mastery versus performance) set by trainees that influences their learning strategy and what they emphasize in training.
Sample Research Question(s): How does goal orientation interact with VE design? How can VEs be designed to trigger mastery orientation in learners?
Preliminary Design Guidance: Reinforce mastery orientation behaviors in the virtual environment; provide mastery-oriented feedback.

Expectations for Training
Definition/Description: Beliefs about what the training system/game will be like; can be based on past experience with instruction and/or past experience with virtual environments.
Sample Research Question(s): How do incoming expectations regarding gaming affect the success of VEs? Do trainees' incoming expectations for training affect their reactions to the VE?
Preliminary Design Guidance: Share "success stories" or other orientations before training; demonstrate applicability of training to the job.

Self-Efficacy
Definition/Description: Belief that one has the necessary capability to accomplish a certain level of performance in a given task.
Sample Research Question(s): How must VE design be modified to accommodate learners with varying levels of self-efficacy?
Preliminary Design Guidance: Scaffold learning scenarios to ensure early successes; provide feedback designed to increase efficacy.
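Guidance of the kind in Table 2.1 can be treated as configuration that an adaptive training system consults when initializing a session. The sketch below is a hypothetical illustration; the profile fields, cutoffs, and adaptation names are invented for this example and are not part of any published system.

```python
from dataclasses import dataclass

@dataclass
class LearnerProfile:
    spatial_ability: float   # 0..1, e.g., a normalized pretest score
    prior_knowledge: float   # 0..1, assessed domain knowledge
    goal_orientation: str    # "mastery" or "performance"

def session_adaptations(p: LearnerProfile) -> list[str]:
    """Translate Table 2.1-style guidance into session settings."""
    adaptations = []
    if p.spatial_ability < 0.4:   # assumed cutoff
        adaptations += ["use 2-D map overlay", "enable spatial memory aids"]
    if p.prior_knowledge < 0.3:   # assumed cutoff
        adaptations += ["schedule pretraining module", "scaffold scenarios"]
    if p.goal_orientation == "performance":
        adaptations += ["mastery-oriented feedback", "hide leaderboard"]
    return adaptations

profile = LearnerProfile(spatial_ability=0.3, prior_knowledge=0.8,
                         goal_orientation="performance")
print(session_adaptations(profile))
# -> ['use 2-D map overlay', 'enable spatial memory aids',
#     'mastery-oriented feedback', 'hide leaderboard']
```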
REFERENCES

Bailenson, J. N., Beall, A. C., Loomis, J., Blascovich, J., & Turk, M. A. (2004). Transformed social interaction: Decoupling representation from behavior and form in collaborative virtual environments. Presence, 13(4), 428–441.
Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84(2), 191–215.
Bandura, A. (1982). The assessment and predictive generality of self-percepts of efficacy. Journal of Behavior Therapy and Experimental Psychiatry, 13, 195–199.
Bandura, A. (1986). From thought to action: Mechanisms of personal agency. New Zealand Journal of Psychology, 15, 1–17.
Bandura, A. (1989). Regulation of cognitive processes through perceived self-efficacy. Developmental Psychology, 25, 725–739.
Baxter, H. C., Ross, J. K., Phillips, J., & Shafer, J. E. (2004, December). Framework for assessment of tactical decision making simulations. Paper presented at the Interservice/Industry Training, Simulation & Education Conference (I/ITSEC), Orlando, FL.
Bingener, J., Boyd, T., Van Sickle, K., Jung, I., Saha, A., Winston, J., Lopez, P., Ojeda, H., Schwesinger, W., & Anastakis, D. (2008). Randomized double-blinded trial investigating the impact of a curriculum focused on error recognition on laparoscopic suturing training. The American Journal of Surgery, 195(2), 179–182.
Birch, L., Jones, N., Doyle, P., Green, P., McLaughlin, A., Champney, C., Williams, D., Gibbon, K., & Taylor, K. (2007). Obstetric skills drills: Evaluation of teaching methods. Nurse Education Today, 27(8), 915–922.
Birren, J. E., & Livingston, J. (1985). Cognition, stress, and aging. Englewood Cliffs, NJ: Prentice-Hall.
Borgman, C. L. (1989). All users of information retrieval systems are not created equal: An exploration into individual differences. Information Processing & Management, 25, 237–251.
Bransford, J. D., Brown, A. L., & Cocking, R. R. (1999). How people learn: Brain, mind, experience, and school. Washington, DC: National Academies Press.
Brett, J. F., & VandeWalle, D. (1999). Goal orientation and goal content as predictors of performance in a training program. Journal of Applied Psychology, 84, 863–873.
Busch, T. (1995). Gender differences in self-efficacy and attitudes toward computers. Journal of Educational Computing Research, 12, 147–158.
Cannon-Bowers, J. A., & Salas, E. (Eds.). (1998). Making decisions under stress: Implications for individual and team training. Washington, DC: American Psychological Association.
Cannon-Bowers, J. A., Salas, E., Tannenbaum, S. I., & Mathieu, J. E. (1995). Toward theoretically based principles of trainee effectiveness: A model and initial empirical investigation. Military Psychology, 7, 141–164.
Chen, C., Czerwinski, M., & Macredie, R. (2000). Individual differences in virtual environments—Introduction and overview. Journal of the American Society for Information Science, 51(6), 499–507.
Chen, C., & Rada, R. (1996). Interacting with hypertext: A meta-analysis of experimental studies. Human-Computer Interaction, 11(2), 125–156.
Chiou, G. F. (1995). Reader interface of computer-based reading environment. International Journal of Instructional Media, 22(2), 121–133.
Clark, R. (2002). Learning outcomes: The bottom line. Communication Education, 51(4), 396–404.
Clark, R., Nguyen, F., & Sweller, J. (2006). Efficiency in learning: Evidence-based guidelines to manage cognitive load. San Francisco, CA: Pfeiffer.
Clark, R., & Wittrock, M. C. (2000). Psychological principles in training. In S. Tobias & J. D. Fletcher (Eds.), Training and retraining: A handbook for business, industry, government, and the military (pp. 51–84). New York: Macmillan.
Cockburn, A., & McKenzie, B. (2002). Evaluating the effectiveness of spatial memory in 2D and 3D physical and virtual environments. Proceedings of the Conference on Human Factors in Computing Systems (pp. 203–210). New York: ACM.
Cognition and Technology Group at Vanderbilt (CTGV). (2000). Connecting learning theory and instructional practice: Leveraging some powerful affordances of technology. In H. F. O'Neil, Jr., & R. S. Perez (Eds.), Technology applications in education: A learning view (pp. 173–209). Mahwah, NJ: Lawrence Erlbaum.
Cole, N. D., & Latham, G. P. (1997). Effects of training in procedural justice on perceptions of disciplinary fairness by unionized employees and disciplinary subject matter experts. Journal of Applied Psychology, 82(5), 699–705.
Colquitt, J. A., LePine, J. A., & Noe, R. A. (2000). Toward an integrative theory of training motivation: A meta-analytic path analysis of 20 years of research. Journal of Applied Psychology, 85(5), 678–707.
Colwell, J., Grady, C., & Rhaiti, S. (2006). Computer games, self-esteem and gratification of needs in adolescents. Journal of Community and Applied Social Psychology, 5(3), 195–206.
Covington, M. V. (2000). Goal theory, motivation, and school achievement: An integrative review. Annual Review of Psychology, 51, 171–200.
Cronbach, L. J., & Snow, R. E. (1977). Aptitudes and instructional methods: A handbook for research on aptitude-treatment interactions. New York: Irvington.
Draper, J. V., Fujita, Y., & Herndon, J. N. (1987). Evaluation of high-definition television for remote task performance (Rep. No. ORNL/TM-10303). Blacksburg, VA: Virginia Polytechnic Institute and State University.
Dweck, C. (1986). Motivational processes affecting learning. American Psychologist, 41(10), 1040–1048.
Dweck, C., & Leggett, E. (1988). A social-cognitive approach to motivation and personality. Psychological Review, 95, 256–273.
Eden, D., & Aviram, A. (1993). Self-efficacy training to speed reemployment: Helping people to help themselves. Journal of Applied Psychology, 78(3), 325–360.
Egan, D. (1988). Individual differences in human-computer interaction. In M. Helander (Ed.), Handbook of human-computer interaction (pp. 541–568). Amsterdam: North-Holland.
Elliot, A. J., & Harackiewicz, J. M. (1994). Goal setting, achievement orientation, and intrinsic motivation: A mediational analysis. Journal of Personality and Social Psychology, 66, 968–980.
Fisk, A. D., & Rogers, W. A. (1991). Toward an understanding of age-related memory and visual search effects. Journal of Experimental Psychology: General, 120, 131–149.
Fisk, J. E., & Warr, P. (1996). Age and working memory: The role of perceptual speed, the central executive, and the phonological loop. Psychology and Aging, 11, 316–323.
Ford, J. K., Weissbein, D. A., Smith, E. M., Gully, S. M., & Salas, E. (1998). Relationships of goal orientation, metacognitive activity, and practice strategies with learning outcomes and transfer. Journal of Applied Psychology, 83(2), 218–233.
Frey, A., Hartig, J., Ketzel, A., Zinkernagel, A., & Moosbrugger, H. (2007). The use of virtual environments based on a modification of the computer game Quake III Arena® in psychological experimenting. Computers in Human Behavior, 23(4), 2026–2039.
Garg, A., Norman, G., & Sperotable, L. (2001). How medical students learn spatial anatomy. The Lancet, 357(9253), 363–364.
Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7, 155–170.
Gist, M. E. (1992). Self-efficacy: A theoretical analysis of its determinants and malleability. Academy of Management Review, 17, 183–211.
Gist, M. E., Schwoerer, C., & Rosen, B. (1989). Effects of alternative training methods on self-efficacy and performance in computer software training. Journal of Applied Psychology, 74(5), 884–891.
Gist, M. E., Stevens, C. K., & Bavetta, A. G. (1991). Effects of self-efficacy and post-training intervention on the acquisition and maintenance of complex interpersonal skills. Personnel Psychology, 44, 837–861.
Grieve, F. G., Whelan, J. P., Kottke, R., & Meyers, A. W. (1994). Manipulating adults' achievement goals in a sport task: Effects on cognitive, affective and behavioral variables. Journal of Sport Behavior, 17(4), 227–246.
Hannafin, M., Oliver, K., & Sharma, P. (2003). Cognitive and learning factors in web-based distance learning environments. In M. G. Moore & W. G. Anderson (Eds.), Handbook of distance education (pp. 245–260). Mahwah, NJ: Erlbaum.
Harackiewicz, J. M., & Elliot, A. J. (1993). Achievement goals and intrinsic motivation. Journal of Personality and Social Psychology, 65, 904–915.
Heimbeck, D., Frese, M., Sonnentag, S., & Keith, N. (2003). Integrating errors into the training process: The function of error management instructions and the role of goal orientation. Personnel Psychology, 56(2), 333–361.
Hertzog, C. (1989). Influences of cognitive slowing on age differences in intelligence. Developmental Psychology, 25, 636–651.
Holladay, C. L., & Quiñones, M. A. (2003). Practice variability and transfer of training: The role of self-efficacy generality. Journal of Applied Psychology, 88(6), 1094–1103.
Hunt, N. P., & Bohlin, R. M. (1991). Entry attitudes of students towards using computers. Paper presented at the 70th Annual Meeting of the California Education Research Association, Fresno, CA.
Hunter, J. E. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job performance. Journal of Vocational Behavior, 29(3), 340–346.
Hwang, W. Y., & Li, C. C. (2002). What the user log shows based on learning time distribution. Journal of Computer Assisted Learning, 18, 232–236.
Kaber, D. B., Draper, J. V., & Usher, J. M. (2002). Influence of individual differences on application design for individual and collaborative immersive virtual environments. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 379–402). Mahwah, NJ: Lawrence Erlbaum.
Kacmar, K. M., Wright, P. M., & McMahan, G. C. (1997). The effects of individual differences on technological training. Journal of Managerial Issues, 9, 104–120.
Kanfer, R., & McCombs, B. L. (2000). Motivation: Applying current theory to critical issues in training. In S. Tobias & J. D. Fletcher (Eds.), Training and retraining: A handbook for business, industry, government, and the military (pp. 85–108). New York: Macmillan.
Kirkley, J. R., Kirkley, S. E., Swan, B., Myers, T. E., Sherwood, D., & Singer, M. J. (2004, December). Developing an embedded scaffolding framework to support problem-based embedded training (PBET) using mixed and virtual reality simulations. Paper presented at the Interservice/Industry Training, Simulation & Education Conference (I/ITSEC), Orlando, FL.
Kizony, R., Katz, N., & Weiss, P. L. (2004). Virtual reality based intervention in rehabilitation: Relationship between motor and cognitive abilities and performance within virtual environments for patients with stroke. Proceedings of the 5th International Conference on Disability, Virtual Reality & Associated Technologies (pp. 19–26). Oxford, United Kingdom: University of Reading.
Klein, H. J., Noe, R. A., & Wang, C. (2006). Motivation to learn and course outcomes: The impact of delivery mode, learning goal orientation, and perceived barriers and enablers. Personnel Psychology, 59(3), 665–702.
Kozlowski, S. W. J., Gully, S. M., McHugh, P. P., Salas, E., & Cannon-Bowers, J. A. (1996). A dynamic theory of leadership and team effectiveness: Developmental and task contingent leader roles. In G. R. Ferris (Ed.), Research in personnel and human resource management (Vol. 14, pp. 235–305). Greenwich, CT: JAI Press.
Kozlowski, S. W. J., Gully, S. M., Salas, E., & Cannon-Bowers, J. A. (1996). Team leadership and development: Theory, principles, and guidelines for training leaders and teams. In M. Beyerlein, D. Johnson, & S. Beyerlein (Eds.), Advances in interdisciplinary studies of work teams: Team leadership (Vol. 3, pp. 251–289). Greenwich, CT: JAI Press.
MacArthur, C. A., & Malouf, D. B. (1991). Teacher beliefs, plans and decisions about computer-based instruction. Journal of Special Education, 25, 44–72.
Martocchio, J. J. (1994). Effects of conception of ability on anxiety, self-efficacy, and learning in training. Journal of Applied Psychology, 79(6), 819–825.
Martocchio, J. J., & Webster, J. (1992). Effects of feedback and cognitive playfulness on performance in microcomputer software training. Personnel Psychology, 45(3), 553–578.
Mathieu, J. E., Martineau, J. W., & Tannenbaum, S. I. (1993). Individual and situational influences on the development of self-efficacy: Implications for training effectiveness. Personnel Psychology, 46, 125–147.
Mathieu, J. E., Tannenbaum, S. I., & Salas, E. (1992). Influences of individual and situational characteristics on measures of training effectiveness. Academy of Management Journal, 35, 828–847.
Mayer, R. E. (2001). Multimedia learning. New York: Cambridge University Press.
Merriam, S. B., & Leahy, B. (2005). Learning transfer: A review of the research in adult education and training. PAACE Journal of Lifelong Learning, 14, 1–24.
Mitchell, T. R., Hopper, H., Daniels, D., & George-Falvy, J. (1994). Predicting self-efficacy and performance during skill acquisition. Journal of Applied Psychology, 79(4), 506–517.
National Research Council. (1999). How people learn: Bridging research and practice. Washington, DC: National Academy Press.
Oman, C. M., Shebilske, W. L., Richards, J. T., Tubre, T. C., Beall, A. C., & Natapoff, A. (2000). Three dimensional spatial memory and learning in real and virtual environments. Spatial Cognition and Computation, 2(4), 355–372.
Paas, F., Van Gerven, P. W. M., & Tabbers, H. K. (2005). The cognitive aging principle in multimedia learning. In R. E. Mayer (Ed.), The Cambridge handbook of multimedia learning (pp. 339–354). New York: Cambridge University Press.
Phillips, J. M., & Gully, S. M. (1997). Role of goal orientation, ability, need for achievement, and locus of control in the self-efficacy and goal-setting process. Journal of Applied Psychology, 82(5), 792–802.
Pintrich, P. R., & Schunk, D. H. (1996). Motivation in education: Theory, research, and applications. Englewood Cliffs, NJ: Prentice Hall Merrill.
Prasolova-Forland, E., & Divitini, M. (2003a). Collaborative virtual environments for supporting learning communities: An experience of use. Proceedings of the 2003 International ACM SIGGROUP Conference on Supporting Group Work (pp. 58–67). New York: ACM.
Prasolova-Forland, E., & Divitini, M. (2003b). Supporting social awareness: Requirements for educational CVE. Proceedings of the 3rd IEEE International Conference on Advanced Learning Technologies (pp. 366–367). Los Alamitos, CA: IEEE Computer Society.
Quick, J. C., Joplin, J. R., Nelson, D. L., Mangelsdorff, A. D., & Fiedler, E. (1996). Self-reliance and military service training outcomes. Military Psychology, 8, 279–293.
Quiñones, M. A. (1995). Pretraining context effects: Training assignment as feedback. Journal of Applied Psychology, 80, 226–238.
Randel, J. M., Morris, B. A., Wetzel, C. D., & Whitehill, B. V. (1992). The effectiveness of games for educational purposes: A review of recent research. Simulation and Gaming, 23, 261–276.
Ree, M. J., Carretta, R. T., & Teachout, S. M. (1995). Role of ability and prior job knowledge in complex training performance. Journal of Applied Psychology, 80(6), 721–730.
Ree, M. J., & Earles, J. A. (1991). Predicting training success: Not much more than g. Personnel Psychology, 44, 321–332.
Roed, J. (2003). Language learner behaviour in a virtual environment. Computer Assisted Language Learning, 16(2), 155–172.
Romano, D. M., Brna, P., & Self, J. A. (1998). Influence of collaboration and presence on task performance in shared virtual environments. Paper presented at the 1998 United Kingdom Virtual Reality Special Interest Group (UKVRSIG) Conference, Exeter, England.
Ross, J. A., & Starling, M. (2005, April). Achievement and self-efficacy effects of self-evaluation training in a computer-supported learning environment. Paper presented at the annual meeting of the American Educational Research Association, Montreal, Canada.
Sandstrom, N. J., Kaufman, J., & Huettel, S. A. (1998). Males and females use different distal cues in a virtual environment navigation task. Cognitive Brain Research, 6, 351–360.
Schmidt, A. M., & Ford, J. K. (2003). Learning within a learner control training environment: The interactive effects of goal orientation and metacognitive instruction on learning outcomes. Personnel Psychology, 56(2), 405–429.
Sharp, D. L. M., Bransford, J. D., Goldman, S. R., Risko, V. J., Kinzer, C. K., & Vye, N. J. (1995). Dynamic visual support for story comprehension and mental model building by young, at-risk children. Educational Technology Research and Development, 43(4), 25–42.
Slater, M., & Wilbur, S. (1997). A framework for immersive virtual environments (FIVE): Speculations on the role of presence in virtual environments. Presence, 6(6), 603–617.
Smith-Jentsch, K. A., Jentsch, F. G., Payne, S. C., & Salas, E. (1996). Can pretraining experiences explain individual differences in learning? Journal of Applied Psychology, 81, 909–936.
Spence, D. J., & Usher, E. L. (2007). Engagement with mathematics courseware in traditional and online remedial learning environments: Relationship to self-efficacy and achievement. Journal of Educational Computing Research, 37(3), 267–288.
Stajkovic, A. D., & Luthans, F. (1998). Self-efficacy and work-related performance: A meta-analysis. Psychological Bulletin, 124, 240–261.
Stevens, C. K., & Gist, M. E. (1997). Effects of self-efficacy and goal-orientation training on negotiation skill maintenance: What are the mechanisms? Personnel Psychology, 50, 955–978.
Tan, D. S., & Czerwinski, M. P. (2006). Large displays enhance optical flow cues and narrow the gender gap in 3-D virtual navigation. Human Factors, 48(2), 318–333.
Tannenbaum, S. I., Mathieu, J. E., Salas, E., & Cannon-Bowers, J. A. (1991). Meeting trainees' expectations: The influence of training fulfillment on the development of commitment, self-efficacy, and motivation. Journal of Applied Psychology, 76, 759–769.
Tracey, M. R., & Lathan, C. E. (2001). The interaction of spatial ability and motor learning in the transfer of training from virtual to a real task. In J. D. Westwood, H. M. Hoffman, G. T. Mogel, D. Stredney, & R. A. Robb (Eds.), Medicine meets virtual reality (pp. 521–527). Amsterdam: IOS Press.
Vogel, J. J., Bowers, C. A., & Vogel, D. S. (2003). Cerebral lateralization of spatial abilities: A meta-analysis. Brain and Cognition, 52(2), 197–204.
Waller, D. (2000). Individual differences in spatial learning from computer-simulated environments. Journal of Experimental Psychology: Applied, 6(4), 307–321.
Warr, P., & Bunce, D. (1995). Trainee characteristics and the outcomes of open learning. Personnel Psychology, 48, 347–376.
Wilfong, J. D. (2006). Computer anxiety and anger: The impact of computer use, computer experience, and self-efficacy beliefs. Computers in Human Behavior, 22(6), 1001–1011.
Yeh, Y. C. (2007). Aptitude-treatment interactions in preservice teachers' behavior change during computer-simulated teaching. Computers & Education, 48(3), 495–507.
Yi, M. Y., & Hwang, Y. (2003). Predicting the use of web-based information systems: Self-efficacy, enjoyment, learning goal orientation, and the technology acceptance model. International Journal of Human-Computer Studies, 59(4), 431–449.
Chapter 3
COGNITIVE TRANSFORMATION THEORY: CONTRASTING COGNITIVE AND BEHAVIORAL LEARNING

Gary Klein and Holly C. Baxter

The traditional approach to learning is to define the objectives (the gap between the knowledge a person has and the knowledge the person needs to perform the task), establish the regimen for practice, and provide feedback. Learning procedures and factual data are seen as adding more information and skills to the person's storehouse of knowledge. However, this storehouse metaphor is poorly suited to cognitive skills and does not address the differing learning needs of novices and experts. Teaching cognitive skills requires diagnosing the problem in terms of flaws in existing mental models, not gaps in knowledge. It requires learning objectives that are linked to the person's current mental models; practice regimens that may have to produce "unlearning," enabling the person to abandon current, flawed mental models; and feedback that promotes sensemaking. We propose a Cognitive Transformation Theory to guide the development of cognitive skills. We also present several strategies that might be useful in overcoming barriers to understanding and to revising mental models. Finally, we show the implications of Cognitive Transformation Theory for using virtual environments (VEs; where a "live" student interacts with a "simulated" environment) in training.

INTRODUCTION

How can cognitive skills be improved? The conventional mechanisms of practice, feedback, and accumulation of knowledge rarely apply to cognitive skills in the same way they apply to behavioral skills. In this chapter we argue that cognitive learning requires a different concept of the learning process.

Traditional approaches to learning seem clear-cut: (1) identify what you want the student to learn; (2) provide the knowledge and present an opportunity to practice the skill or concept; and (3) give feedback so the student can gauge whether the learning has succeeded. Educating students in behavioral skills appears to be simply a matter of practice and feedback.
This approach to learning relies on a storehouse metaphor. It assumes that the learner is missing some critical form of knowledge—factual information or procedures. The learner or the instructor defines what knowledge is missing. Together, they add this knowledge via a course, a practice regimen, or simple study. Instructors provide feedback to the learner. Then they test whether the new knowledge was successfully added to the storehouse.

We believe that this storehouse metaphor is insufficient to describe the learning of cognitive skills. The metaphor may be useful for learning factual information or simple procedures, but cognitive learning should help people discover new ways to understand events. We can distinguish different forms of knowledge that people need in order to gain expertise: declarative knowledge, routines and procedures, recognition of familiar patterns, perceptual discrimination skills, and mental models. The storehouse metaphor seems best suited for acquiring declarative knowledge and for learning new routines and procedures. It is less apt for building pattern-recognition skills, and it is least appropriate for teaching people to make perceptual discriminations and for improving the quality of their mental models.

When people build a larger repertoire of patterns and prototypes, they are not simply adding new items to their lists. They are learning how to categorize the new items and are changing categories and redefining the patterns and prototypes as they gain new experience. The storehouse metaphor implies a simple additive process, which would lead to confusion rather than to growth. We encounter this kind of confusion when we set up a new filing system for an unfamiliar type of project and quickly realize that adding more files is creating only more confusion—the initial categories have to be changed.

When people develop perceptual discrimination skills through training in VEs or other methods, they are learning to make distinctions that they previously did not notice. They are learning to "see the invisible" (Klein & Hoffman, 1993) in the sense that they can now make discriminations they previously could not. Perceptual learning depends on refashioning the way we attend and the way we see, rather than just adding facts to our knowledge base.

Cognitive skills depend heavily on mental models. We define a mental model as a cluster of causal beliefs about how things happen. We have mental models for how our car starts when we turn the key in the ignition, for how water is forced out of a garden hose when the spigot is turned on, and for why one sports team has beaten another. In steering a simple sailboat, we have a mental model of why the nose of the boat will turn to the left when we press the tiller to the right: we believe that the water will press against the rudder in a way that swings the back of the boat to the right, creating a counterclockwise rotation in the boat's heading. Therefore, the slower the boat moves, the less the water presses on the rudder and the less pronounced this effect should be. According to Glaser and Chi (1988), mental models are used to organize knowledge. Mental models are also described as knowledge structures and schemata.

Cognitive learning is not simply a matter of adding additional beliefs into existing mental models.
Rather, we have to revise our belief systems and our mental models as experience shows the inadequacy of our current ways of thinking. We discover ways to extend or even reject our existing beliefs in favor of more sophisticated ones.

The scientist metaphor is much better suited to cognitive learning. This metaphor views the learner as a scientist engaged in making discoveries, wrestling with anomalies, and finding ways to restructure beliefs and mental models (Carey, 1986). The scientist metaphor is consistent with the field of science education, where students are taught to replace their flawed mental models with better concepts about how physical, chemical, and biological processes actually work. The scientist metaphor emphasizes conceptual change, not the accumulation of declarative information. Within psychology, the scientist metaphor is epitomized by Piaget (1929), who described conceptual change as a process of accommodation. Posner, Strike, Hewson, and Gertzog (1982) point out that within the philosophy of science, the empiricist tradition that evaluated a theory's success in generating confirmed predictions has been superseded by views that emphasize a theory's resources for solving problems. This replacement fits better within the Piagetian process of accommodation than does the empiricist approach. Posner et al. have described some of the conditions necessary for accommodation to take place: dissatisfaction with existing conceptions, including the difficulties created by anomalies; the intelligibility of new concepts, perhaps through linkage with analogies and metaphors; and the initial plausibility of new conceptions.

Although our own approach sits firmly within the scientist metaphor, we should note some disconnects. The field of science education assumes a knowledgeable teacher attempting to convince students to accept scientifically acceptable theories. In contrast, many cognitive learning situations do not come equipped with knowledgeable teachers, and learners have to discover for themselves where their mental models are wrong and how to replace them with more effective ones.

The next section describes the kinds of sensemaking needed for cognitive learning. Following that, we present the concept of cognitive transformation as an alternative to the storehouse metaphor and as an elaboration of the scientist metaphor. Finally, we offer some implications for achieving cognitive learning in virtual environments.
SENSEMAKING REQUIREMENTS FOR LEARNING COGNITIVE SKILLS

What is hard about learning cognitive skills is that none of the traditional components of learning—diagnosis, practice, feedback, or training objectives—is straightforward. Each of them depends heavily on sensemaking (for example, Weick, 1995). Bloom's (1956) taxonomy includes a component of synthesis: building a structure or pattern from diverse elements and putting parts together to form a whole, with an emphasis on creating a new meaning or structure. This corresponds to the process of sensemaking. We treat cognitive learning as a sensemaking activity that includes four components: diagnosis, learning objectives, practice, and feedback.
These components of sensemaking must be the up-front focus of any VE development in order for effective training transfer to occur.

Diagnosis

Diagnosing the reasons for weak performance depends on sensemaking. The instructor, whether in person or virtual, has to ferret out the reasons why the student is confused and making errors. Sometimes trainees do not even notice errors or weaknesses and may resist suggestions to overcome problems they do not realize they have. Even when trainees do realize something is wrong, the cause/effect mechanisms are subtle and complex. Outcome feedback, the type of feedback most often available in the technologies associated with virtual environments, usually does not provide any clues about what to do differently. That is why instructors and technologies need to be able to provide process feedback as the trainee progresses through the learning process; but first they must diagnose what is wrong with the trainee's thinking.

Diagnosing the reason for poor performance is a challenge for trainees. It is also a challenge for instructors, who may not be able to figure out the nature of the problem and who have no technologies capable of providing a diagnosis at this level. Diagnosis is difficult for instructional developers as well. The classical systems approach to instructional design is to subtract the existing knowledge, skills, and abilities (KSAs) from the needed KSAs. But for cognitive skills, instructional developers need to understand why the students are struggling. The goal of diagnosis goes beyond establishing learning objectives—it depends on discovering what flaw in a mental model needs to be corrected. For cognitive skills, it is very difficult to determine and define the existing problem. Cognitive Task Analysis methods (for example, Crandall, Klein, & Hoffman, 2006) may be needed to diagnose subtle aspects of cognitive skills. Within the framework of science education, Chi, Glaser, and Rees (1982) have discussed the use of misconceptions to understand why students are confused. Similarly, Shuell (1986) described how a student's "buggy algorithms" can lead to misconceptions and how analysis of mistakes can give educators insights into how to repair the flaws.

Learning Objectives

With the storehouse metaphor, learning objectives are clear and succinct: the additional declarative or procedural knowledge to be imparted and the changes in performance that reflect whether the student has acquired the new material. But for cognitive learning, the objectives may be to help students revise their mental models and perhaps reorganize the way they categorize events. Some learning theorists emphasize the importance of integrating new learning with the concepts that are already known. For example, both Kolb (1984) and Dewey (1938) focus on learning through experience.
What is important in Kolb's reflective observation stage is how the learner transforms an experience into learning through reflection. During reflection, the student compares the new learning to what is already known, tries to make it fit with existing knowledge, and sees how to leverage the new knowledge for additional learning. For Dewey, the key is what the learner does with experience. Not all experiences are equal, and not all experiences are educational. According to Dewey, individuals reflect on their experiences to learn what thoughts and actions can change real world conditions that need improving. Dewey thought that people were constantly trying to resolve perplexing intellectual situations and difficult moral situations.

Theorists such as Kolb and Dewey do not view accumulating or storing knowledge as an end state. Instead, knowledge accumulation kicks off a series of cognitive activities by which the individual figures out ways to test the "goodness" of the new learning through active experimentation, or uses the new learning to change an unsatisfactory situation. The field of science education describes this process as "restructuring" (Chi et al., 1982; Shuell, 1986). Carey (1986) draws on the philosophy of science and, in particular, the work of Kuhn (1962), Feyerabend (1962), and Toulmin (1953) to describe conceptual change. When theories change, successive conceptual systems will differ in the phenomena they address, the kinds of explanations they offer, and the concepts they employ. Carey uses the example of theories of mechanics, which historically used different meanings for the terms force, velocity, time, and mass. Thus, Aristotle did not distinguish between average velocity and instantaneous velocity, whereas Galileo highlighted this difference. Carey distinguishes weak restructuring, which simply represents additional relations and schemata (for example, the storehouse metaphor), from strong restructuring, which involves a change in the core concepts themselves. Shuell (1986) uses the term "tuning" to cover Carey's notion of weak restructuring and further notes that both tuning and restructuring resemble Piaget's concept of accommodation.

We further assert that novices may not have mental models for an unfamiliar domain and will struggle to formulate even rudimentary mental models linking causes to effects. Their learning objective is to employ sensemaking to generate initial mental models of cause/effect stories, whereas experts are revising and adding to current mental models. Following Posner et al. (1982), we suggest that accommodation itself may be a key learning objective—creating dissatisfaction with an inadequate conception and openness to a superior replacement.
Practice

Providing students with practice is necessary for gaining proficiency. But with cognitive skills, practice is not sufficient. For cognitive skills, trainees often may not know what they should be watching and monitoring. They need adequate mental models to direct their attention, but until they get smarter, they may fail to spot the cues that will help them develop better mental models.
VE can help trainees gain this needed practice in a context that allows them to build more robust mental models. Waller, Hunt, and Knapp (1998) found that while short VE training periods were no more effective than paper and pencil exercises, with sufficient exposure to a virtual training environment, VE training actually surpassed real world training. Numerous studies have supported the effectiveness of VEs. Brooks, Fuchs, McMillan, Whitton, and Cannon-Bowers (2006) found that VEs can provide a higher density of experiences and the chance to practice rare and dangerous scenarios safely, and Witmer, Bailey, and Knerr (1995) validated the ability of VE training to transfer to real world settings in a study they conducted with the training of dismounted soldiers in virtual environments. Managing attention depends on sensemaking. Feedback will not be useful if the trainee does not notice or understand it—and that requires the trainee to know what to attend to and when to shift attention. Barrett, Tugade, and Engle (2004) have suggested that attention management accounts for many of the individual differences in working memory—the ability to focus attention and not be distracted by irrelevancies. For these reasons, we argue that effective practice, whether in actual or in virtual environments, depends on attention management: seeking information—knowing what to seek and when to seek it—and filtering distracting data.
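Attention management of this kind can be instrumented directly in a VE. The sketch below is our illustration rather than any tool cited in this chapter; the fixation log, cue names, and scoring function are invented. It scores a logged session by the share of fixation time spent on cues the scenario designer marked as task relevant, and by how quickly newly introduced cues were first noticed.

    from dataclasses import dataclass

    @dataclass
    class Fixation:
        cue_id: str      # what the trainee attended to
        start: float     # session time in seconds
        duration: float  # seconds

    def attention_scores(fixations, relevant_cues, cue_onsets):
        """Score one session log.

        relevant_cues: cue ids the designer marked as task relevant.
        cue_onsets: cue_id -> time the cue first appeared in the scenario.
        Returns (share of time on relevant cues, mean detection lag in seconds).
        """
        total = sum(f.duration for f in fixations) or 1.0
        on_relevant = sum(f.duration for f in fixations if f.cue_id in relevant_cues)
        lags = []
        for cue, onset in cue_onsets.items():
            first = min((f.start for f in fixations if f.cue_id == cue and f.start >= onset),
                        default=None)
            if first is not None:
                lags.append(first - onset)
        mean_lag = sum(lags) / len(lags) if lags else float("nan")
        return on_relevant / total, mean_lag

    # A trainee who dwells on an irrelevant cue and is slow to notice a new one.
    log = [Fixation("radio", 0.0, 4.0), Fixation("billboard", 4.0, 6.0),
           Fixation("smoke_plume", 12.0, 3.0)]
    print(attention_scores(log, {"radio", "smoke_plume"}, {"smoke_plume": 5.0}))

Metrics of this sort would let an instructor see whether practice is exercising information seeking and filtering rather than merely task completion.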
Feedback

Providing students with feedback will not be useful if they do not understand it. For complex cognitive skills, such as leadership, time lags between actions and consequences will create difficulties in sorting out what worked, what did not work, and why. Learners need to engage in sensemaking to discover cause-effect relationships between actions taken at time one and the effects seen at time two. To make things more complicated, learners often have to account for other actions and events that are interspersed between their actions and the consequences. They have to figure out what really caused the consequences versus the coincidental events that had nothing to do with their actions. They have to understand the causes versus the symptoms of deeper causes, and they have to sort out what just happened, the factors in play, the influence of these factors, and the time lags for the effects. To add to these complications, having an instructor or training tool provide feedback can actually get in the way of transfer of learning (Schmidt & Wulf, 1997), even though it steepens the learning curve during acquisition. By placing students in an environment where they are given rapid feedback, the students are not compelled to develop skills for seeking their own feedback. Further, students may become distracted from intrinsic feedback because it is so much easier to rely on the extrinsic feedback. As a result, when they complete what they set out to learn, they are not prepared to seek and interpret their own feedback. One of the challenges for cognitive learning is to handle time lags between actions and consequences. VE sessions will compress these time lags, which might clarify relationships but will also reduce the opportunity to learn how to
interpret delayed feedback. To compensate, VE sessions could add distracters that might have potentially caused the effects as a way to sustain confusion about how to interpret feedback. In addition, VE sessions could be structured to monitor how people interpret the feedback. For cognitive learning, one of the complications facing instructional designers is that the flawed mental models of the students act as a barrier to learning. Students need to have better mental models in order to understand the feedback that would invalidate their existing mental models. Without a good mental model, students will have trouble making use of feedback, but without useful feedback, students will not be able to develop good mental models. That is why cognitive learning may depend on unlearning as well as learning.
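The suggestion that VE sessions could preserve delayed feedback and interleave plausible distracters can be expressed directly as scenario logic. The sketch below is a hypothetical event scheduler, not part of any system cited in this chapter; the function name and parameters are ours. Each trainee action produces its consequence only after a configurable lag, with unrelated events injected inside the window so the trainee must sort causes from coincidences.

    import heapq
    import random

    def schedule_consequences(actions, lag_s, distracters_per_action, seed=0):
        """actions: list of (time_s, action_name) pairs from the trainee.
        Returns a time-ordered event queue for scenario playback."""
        rng = random.Random(seed)
        events = []
        for t, name in actions:
            # The real consequence arrives only after the training-relevant lag.
            heapq.heappush(events, (t + lag_s, "consequence_of:" + name))
            for _ in range(distracters_per_action):
                # Distracters land inside the action-to-consequence window.
                heapq.heappush(events, (t + rng.uniform(0.0, lag_s), "distracter"))
        return [heapq.heappop(events) for _ in range(len(events))]

    for when, what in schedule_consequences([(10.0, "reposition_team")],
                                            lag_s=30.0, distracters_per_action=2):
        print("%6.1fs  %s" % (when, what))

A session built this way could also log which event the trainee blames for each outcome, giving the instructor a record of how the feedback was interpreted.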
THE PROCESS OF UNLEARNING

For people to develop better mental models they may have to unlearn some of their existing mental models. The reason is that as people gain experience, their understanding of a domain should become more complex and nuanced. The mental models that provided a rough approximation need to be replaced by more sophisticated ones. But people may be reluctant to abandon inadequate mental models, as they may not appreciate the inadequacies. They may attempt to explain away the inconsistencies and anomalies. A number of researchers have described the reluctance to discard outmoded mental models even in the face of contrary evidence. DeKeyser and Woods (1990) have commented on the way decision makers fixate on erroneous beliefs. Feltovich, Spiro, and Coulson (1997) used a garden path paradigm and identified a range of knowledge shields that pediatric cardiologists employed to discount inconvenient data. Chinn and Brewer (1993) showed that scientists and science students alike deflected inconvenient data. They identified seven reactions to anomalous data that were inconsistent with a mental model: ignoring the data, rejecting the data, finding a way to exclude the data from an evaluation of the theory/model, holding the data in abeyance, reinterpreting the data while retaining the theory/model, reinterpreting the data and making peripheral changes to the theory/model, and accepting the data and revising the theory/model. Only this last reaction changes the core beliefs. The others involve ways to discount the data and preserve the theory. Klein, Phillips, Rall, and Peluso (2006) described the “spreading corruption” that resulted when people distorted data in order to retain flawed mental models. As people become more experienced, their mental models become more sophisticated, and, therefore, people grow more effective in explaining away inconsistencies. Fixations should become less tractable as cognitive skills improve. Therefore, people may have to unlearn their flawed mental models before they can acquire better ones. Sensemaking here is a deliberate activity to discover what is wrong with one’s mental models and to abandon and replace them. Oftentimes, VEs can allow trainees to see the flaws in their mental models by illustrating the potential behavioral outcomes of their current cognitive processes. Being
able to understand these flaws is critical for the unlearning process and enabling accommodation. The process of unlearning that we are presenting resembles the scientific paradigm replacements described by Polanyi (1958) and Kuhn (1962). Another philosopher of science, Lakatos (1976), explained that researchers more readily change their peripheral ideas to accommodate anomalies than their hard-core ideas on which the peripheral ideas are based. As expected, the notion of disconfirmation is central to science education because of the importance and difficulty of changing students’ naive theories. And just as scientists resist changing their theories when exposed to disconfirming evidence, so do students. Eylon and Linn (1988) reviewed studies showing that students can be impervious to contradictions. According to Chinn and Brewer (1993), the more a belief is embedded in supporting data and concepts and is used to support other concepts, the greater the resistance. Further, the anomalous data need to be credible, nonambiguous, and presented in concert with additional data in order to have the necessary impact, which presents additional requirements for effective use of VEs. The term “unlearning” is widely used in the field of organizational learning. Starbuck and Hedberg (2001) stated that “Organizations’ resistance to dramatic reorientations creates a need for explicit unlearning . . . Before attempting radical changes, [organizations] must dismantle parts of their current ideological and political structures. Before they will contemplate dramatically different procedures, policies, and strategies, they must lose confidence in their current procedures, policies, strategies, and top managers” (p. 339). We believe that these observations apply to individuals as well as to organizations and that the concept of unlearning needs to become part of a cognitive learning regimen. Just like organizations, individuals also resist changing their mental models. Chinn and Brewer (1993) refer to Kuhn’s (1962) research to suggest that students will be more likely to abandon a flawed set of beliefs if they have an alternative theory/model available. This method may work best when the alternative model is already part of the students’ repertoire. For example, Brown and Clement (1989) tried to teach students about the balance of forces in operation when a book is resting on a table. The students initially refused to believe that the table exerts an upward force on the book. So they were asked to imagine that they were supporting a book with their hand. Clearly, their hand was exerting force to keep the book from falling. Next, the students were told to imagine that the book was balanced on a spring. Next, they imagined a book balanced on a pliable wooden plank. Eventually, many of the students came to accept that the solid table must be exerting an upward force on the book. This type of gradual introduction of alternative analogies seems very promising. The alternative explanations make it easier to give up the flawed mental model. However, in some situations we suspect that the reverse has to happen. People have to lose confidence in their models before they will seriously consider an alternate. Thus, DiBello and her colleagues developed a two-day program that created a VE to help managers think more effectively about their work (DiBello, 2001). The first day was spent in a simulation of their business designed to have
the managers fail in the same ways they were failing in real life. This experience helped the managers lose confidence in their current mental models of how to conduct their work. The second day gave the managers a second shot at the simulated exercise and a chance to develop and use new mental models of their work. DiBello and her colleagues have recently ported their program to Second Life, an Internet based virtual world video game, as a more effective means of instruction. Schmitt (1996) designed similar experiences for the U.S. Marine Corps. His Tactical Decision Games—low fidelity paper and pencil exercises—put individual marines into situations that challenged their thinking and made them lose confidence in their mental models of tactics and leadership. The exercises, like the more technologically advanced VE, provided a safe environment for rethinking some of their closely held beliefs. When the Tactical Decision Games were presented via a VE format, the stress and training impact appear to have been sustained. Scott, Asoko, and Driver (1991) have described two broad types of strategies for producing conceptual change: creating cognitive conflict and building on existing ideas as analogies. The DiBello and Schmitt approaches fit within the first grouping, creating cognitive conflict. The Brown and Clement work exemplifies the second—introducing analogs as platforms for new ideas. Chinn and Brewer (1993) have also suggested that asking students to justify their models will facilitate their readiness to change models in the face of anomalous data. Rouse and Morris (1986) have voiced concerns about invoking the notion of mental models, arguing that the concept is typically so vague and ambiguous that it has little theoretical or applied value. However, Klein and Hoffman (2008) argue that the term “mental model” is an umbrella that covers a variety of relationships: causal, spatial, organizational, temporal, and so forth. As long as we are clear about which type of relationship we are interested in, much of the murkiness of “mental models” disappears. Doyle and Ford (1998) presented a useful account of mental models of dynamic systems, which they defined as a relatively enduring and accessible, but limited, internal conceptual representation of an external system whose structure maintains the perceived structure of that system. They differentiated their account from the concept of “mental representations,” which covers a variety of cognitive structures such as schemas, images, scripts, and so forth. With regard to cognitive learning, our emphasis is usually on causal relationships. During the learning process, people are engaged in sensemaking to understand and explain how to make things happen. Under the right circumstances, they may also discover better ways to think about causal connections. People have to diagnose their performance problems, manage their attention, appreciate the implications of feedback, and formulate better mental models by unlearning inadequate models. Learners are not simply accumulating more knowledge into a storehouse. They are changing their perspectives on the world.
That is why we hypothesize that these changes are uneven, rather than smooth and cumulative.

COGNITIVE TRANSFORMATION THEORY

In this section we present an account of the transition process for acquiring cognitive skills. We are primarily interested in how people learn better mental models to achieve a stronger understanding of what has been happening and what to do about it. In contrast to a storehouse metaphor of adding more and more knowledge, we offer the notion of cognitive transformation—that progress in cognitive skills depends on successively shedding outmoded sets of beliefs and adopting new beliefs. We call this account of cognitive learning “Cognitive Transformation Theory” (CTT). Our central claim is that conceptual learning is discontinuous rather than smooth. We make periodic advances when we replace flawed mental models with better ones. However, during the process of cognitive development our mental models get harder to disconfirm. As we move further up the learning curve and gain more expertise, we have to put more and more energy into unlearning—disconfirming mental models—in order to accept better ones. We do not smoothly acquire knowledge as in a storehouse metaphor. Our comprehension proceeds by qualitative jumps. At each juncture our new mental models direct what we attend to and explain away anomalies. As a result, we have trouble diagnosing the flaws in our thinking. Because of problematic mental models, people often misdiagnose their limitations and discard or misinterpret informative feedback. The previous mental model, by distorting cues and feedback, acts as a barrier to advancement. So progress may involve some backtracking to shed mistaken notions. In addition, flawed beliefs have also influenced the way people encoded experiences in the past. Simply changing one’s beliefs will not automatically change the network of implications generated from those beliefs. As a result, people may struggle with inconsistencies based on different mental models that have been used at different times in the past. Instructional developers have to design interventions that help trainees unlearn their flawed mental models. We can represent cognitive transformation theory as a set of postulates:
• Mental models are central to cognitive learning. Instruction needs to diagnose limitations in mental models, design interventions to help students appreciate the flaws in their mental models, and provide experiences to enable trainees to discover more useful and accurate mental models.
• Mental models are modular. People have a variety of fragmentary mental models, and they weave these together to account for a novel observation. People are usually not matching events to sophisticated theories they have in memory. They are using fragments and partial beliefs to construct relevant mental models. For most domains, the central mental models describe causal relationships. They describe how events transform into later events. Causal mental models typically take the form of a story.
• Experts have more sophisticated mental models in their domains of practice than novices. Experts have more of the fragmentary beliefs needed to construct a plausible mental model. Therefore, they are starting their construction from a more advanced position. Finally, experts have more accurate causal mental models and have tested and abandoned more inadequate beliefs.
• Experts build their repertoires of fragmentary mental models in a discontinuous fashion. In using their mental models, even experts may distort data, oversimplify, explain away diagnostic information, and misunderstand events. At some point, experts realize the inadequacies of their mental models. They abandon their existing mental models and replace these with a better set of causal beliefs. And the cycle begins again.
• Learning curves are usually smooth because researchers combine data from several subjects. The reason for the smoothness is the averaging of discontinuous curves. (A short simulation following this list illustrates the point.)
• Experts are fallible. No set of mental models is entirely accurate and complete.
• Knowledge shields are the set of arguments learners can use to explain away data that challenge their mental models (Feltovich et al., 1997). Knowledge shields pose a barrier to developing cognitive skills. People are skilled at holding onto cherished beliefs. The better the mental models, the easier it is to find flaws in disconfirming evidence and anomalous observations. The S-shaped learning curve reflects the increasing difficulty of replacing mental models as people’s mental models become more accurate.
• Knowledge shields affect diagnosis. Active learners try to overcome their limitations, but they need to understand what those limitations are. Knowledge shields based on poor mental models can lead learners to the wrong diagnoses of their poor performance.
• Knowledge shields affect feedback. In building mental models about complex situations, people receive a lot of feedback. However, the knowledge shields enable people to discard or neutralize contradictory data.
• Progress depends on unlearning. The better the causal models, the more difficult it is to discover their weaknesses and replace them. In many cases, learners have to encounter a baffling event, an unmistakable anomaly, or an intelligent failure in order to begin doubting their mental models. They have to lose faith in their existing mental models before they can review the pattern of evidence and formulate a better mental model. People can improve their mental models by continually elaborating them, by replacing them with better ones, and/or by unlearning their current mental models. Cognitive development relies on all three processes.
• Individual differences in attitudes toward cognitive conflict will affect success in conceptual change. Dreyfus, Jungwirth, and Eliovitch (1990) noted that bright and successful students responded positively to anomalies, whereas unsuccessful students tended to avoid the conflicts.
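The postulate about learning curves is easy to demonstrate numerically. The simulation below is our illustration, not data from the chapter: every simulated learner improves in a single abrupt jump at a random trial, yet the averaged curve climbs smoothly.

    import random

    def individual_curve(n_trials, jump_trial, low=0.3, high=0.9):
        # A discontinuous learner: flat until the flawed mental model is
        # replaced, then a step up to a better plateau.
        return [low if t < jump_trial else high for t in range(n_trials)]

    def group_mean(n_learners=50, n_trials=40, seed=1):
        rng = random.Random(seed)
        curves = [individual_curve(n_trials, rng.randint(5, 35))
                  for _ in range(n_learners)]
        return [sum(c[t] for c in curves) / n_learners for t in range(n_trials)]

    # Each individual curve has exactly one jump; the group mean rises
    # gradually from about 0.3 toward 0.9.
    print(" ".join("%.2f" % m for m in group_mean()[::5]))

Averaging conceals every individual discontinuity, which is precisely what the postulate predicts researchers will see in aggregated data.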
Cognitive Transformation Theory generates several testable hypotheses. It asserts that individual learning curves will be discontinuous, as opposed to the smooth curves found when researchers synthesize data across several subjects. CTT suggests a form of state-dependent learning. The material learned with one set of mental models may be inconsistent with material learned with a different
mental model. Consequently, learners may be plagued with inconsistencies that reflect their differing beliefs during the learning cycle.

IMPLICATIONS FOR VIRTUAL ENVIRONMENTS

What is difficult about learning cognitive skills in virtual environments? While on the surface there can appear to be tremendous benefits to taking advantage of virtual environments and the associated technologies to support cognitive skill development, Koschmann, Myers, Feltovich, and Barrows (1994) note that technology in such environments often seems to be focused on the capabilities of the technology rather than on the instructional need. In essence, such environments are often technology focused, with learning as an afterthought. Virtual environments are becoming integral to almost all areas of training and educational applications. These virtual environments can include projector based displays, augmented and mixed reality technologies, online structured professional forums, game based learning technologies, and multimodal technologies, to name a few. As with the more traditional types of learning discussed in this chapter, Cognitive Transformation Theory can guide the way we develop and use these technologies. Cognitive Transformation Theory revolves around the principle that mental models are central to cognitive learning. Virtual environments give us the opportunity to examine our mental models and build on them. Simulated environments can allow learners to see how a proposed path of action plays out, thereby allowing them to observe flaws in their mental models and begin the process of improving mental models. In addition, virtual environments allow for both intrinsic and extrinsic feedback. Many simulations offer scoring or an after action review capability that allows learners to see how they did in comparison to other students or some set standard. More important than the extrinsic feedback, these virtual environments give learners the ability to see how their actions play out and the challenges they may run into based on their mental models, allowing for self-assessment, adjustment, and improvement in cognitive learning. Because cognitive learning depends heavily on sensemaking, and sensemaking is often complicated by knowledge shields, virtual environment sessions might benefit from designs using garden path scenarios that elicit knowledge shields and give learners a chance to recover from mistaken mindsets and get off the garden path. In a garden path scenario a person is led to accept a proposition that seems obviously true and is then given increasing amounts of contrary evidence, gradually leading to the realization that the initial proposition is wrong. The paradigm lets us study how long it takes for participants to doubt and then reject the initial proposition—how long they stay on the garden path. Virtual environments may also support some of the strategies that Posner et al. (1982) described for facilitating accommodation: helping instructors diagnose errors, prepare for the defenses trainees might employ as knowledge shields, and track the process of concept change.
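A garden path design can be stated almost verbatim as scenario logic. The sketch below is a hypothetical illustration with invented names: contrary evidence is released from subtle to unmistakable, and the session records how many cues it took before the trainee abandoned the initial proposition, that is, how long he or she stayed on the garden path.

    def present(cue):
        print("inject cue:", cue)

    def run_garden_path(contrary_evidence, trainee_still_believes):
        """contrary_evidence: cues ordered from subtle to unmistakable.
        trainee_still_believes: probe polled after each cue (in a real VE,
        a question or an action that reveals the trainee's assessment).
        Returns the number of cues needed, or None if shields held."""
        for i, cue in enumerate(contrary_evidence, start=1):
            present(cue)
            if not trainee_still_believes():
                return i  # trainee got off the garden path here
        return None       # knowledge shields held to the end

    # Toy trainee model that abandons the belief after three cues.
    seen = {"n": 0}
    def believes():
        seen["n"] += 1
        return seen["n"] < 3

    cues = ["minor inconsistency", "conflicting report", "direct contradiction"]
    print("cues needed:", run_garden_path(cues, believes))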
CONCLUSIONS

Now we can see what is wrong with the storehouse metaphor of learning described at the beginning of this paper. Learning is more than adding additional information. Learning is about changing the way we understand events, changing the way we see the world, changing what counts as information in the first place. The functions of diagnosis, practice, and feedback are all complex and depend on sensemaking. To replace the storehouse metaphor we have presented a theory of cognitive transformation. We claim that cognitive skills do not develop as a continual accumulation. Rather, cognitive skills and the mental models underlying them progress unevenly. Flawed mental models are replaced by better ones, but the stronger the mental models, the more difficult they are to dislodge. As a result, learners explain away anomalies, inconsistencies, and inconvenient feedback, and they misdiagnose their problems. How we teach cognitive skills, therefore, has to help people unlearn their current mental models before helping them develop better ones. If this unlearning process does not occur, the students will use their current mental models to discount the lessons and the feedback. Cognitive Transformation Theory may offer a shift in perspective on cognitive learning. It relies on sensemaking as the core function in learning cognitive skills, as opposed to a storehouse metaphor. These issues pose challenges to the use of VEs for training cognitive skills. The training cannot be treated as a matter of realistically replicating perceptual phenomena. If the technology interferes with diagnosis, distorts cognitive learning objectives, short-cuts the attention management skills needed for practice, or limits the search for and interpretation of feedback, then cognitive learning will be degraded. Fortunately, a VE can provide a platform for unlearning that can be superior to the natural environment. To be effective for cognitive learning, VE approaches will need to move beyond increasing sensory realism and consider the design of scenarios to promote sensemaking. Cognitive Transformation Theory offers some recommendations for how this might be done. By ensuring that the training environment supports diagnosis, attention management, and feedback, virtual environments can become useful and efficient means of achieving cognitive transformations.

ACKNOWLEDGMENTS

The authors would like to thank Joseph Cohn for his support of this project developed under Contract No. M67854-04-C-8035 (issued by MARCORSYSCOM/PMTRASYS). We would also like to thank Sterling Wiggins, Karol Ross, and Jennifer Phillips for their valuable critiques and inputs.
REFERENCES

Barrett, L. F., Tugade, M. M., & Engle, R. W. (2004). Individual differences in working memory capacity and dual-process theories of the mind. Psychological Bulletin, 130(4), 553–573.
Bloom, B. S. (Ed.). (1956). Taxonomy of educational objectives: Handbook I: Cognitive domain. New York: David McKay Company, Inc.
Brooks, F., Fuchs, H., McMillan, L., Whitton, M., & Cannon-Bowers, J. (2006). Virtual environment training for dismounted teams—Technical challenges. In Virtual Media for Military Applications (RTO Meeting Proceedings No. RTO-MP-HFM-136, pp. 22-1–22-10). Neuilly-sur-Seine, France: Research and Technology Organisation.
Brown, D. E., & Clement, J. (1989). Overcoming misconceptions via analogical reasoning: Abstract transfer versus explanatory model construction. Instructional Science, 18, 237–261.
Carey, S. (1986). Cognitive science and science education. American Psychologist, 41, 1123–1130.
Chi, M., Glaser, R., & Rees, E. (1982). Expertise in problem solving. In R. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 1, pp. 7–76). Hillsdale, NJ: Erlbaum.
Chinn, C. A., & Brewer, W. F. (1993). The role of anomalous data in knowledge acquisition: A theoretical framework and implications for science instruction. Review of Educational Research, 63, 1–49.
Crandall, B., Klein, G., & Hoffman, R. R. (2006). Working minds: A practitioner’s guide to cognitive task analysis. Cambridge, MA: The MIT Press.
DeKeyser, V., & Woods, D. D. (1990). Fixation errors: Failures to revise situation assessment in dynamic and risky systems. In A. G. Colombo & A. Saiz de Bustamente (Eds.), Advanced systems in reliability modeling (pp. 231–252). Norwell, MA: Kluwer Academic.
Dewey, J. (1938). Experience and education. New York: Macmillan.
DiBello, L. (2001). Solving the problem of employee resistance to technology by reframing the problem as one of experts and their tools. In E. Salas & G. Klein (Eds.), Linking expertise and naturalistic decision making (pp. 71–93). Mahwah, NJ: Erlbaum.
Doyle, J. K., & Ford, D. N. (1998). Mental models concepts for system dynamics research. System Dynamics Review, 14, 3–29.
Dreyfus, A., Jungwirth, E., & Eliovitch, R. (1990). Applying the ‘cognitive conflict’ strategy for conceptual change—some implications, difficulties and problems. Science Education, 74(5), 555–569.
Eylon, B.-S., & Linn, M. C. (1988). Learning and instruction: An examination of four research perspectives in science education. Review of Educational Research, 58, 251–301.
Feltovich, P. J., Spiro, R. J., & Coulson, R. L. (1997). Issues of expert flexibility in contexts characterized by complexity and change. In P. J. Feltovich, K. M. Ford, & R. R. Hoffman (Eds.), Expertise in context (pp. 125–146). Menlo Park, CA: AAAI/MIT Press.
Feyerabend, P. (1962). Explanation, reduction and empiricism. In H. Feigl & G. Maxwell (Eds.), Minnesota studies in philosophy of science (Vol. 3, pp. 28–97). Minneapolis: University of Minnesota Press.
Glaser, R., & Chi, M. T. H. (1988). Overview. In M. T. H. Chi, R. Glaser, & M. J. Farr (Eds.), The nature of expertise (pp. xv–xxviii). Mahwah, NJ: Lawrence Erlbaum.
Klein, G., & Hoffman, R. R. (2008). Macrocognition, mental models, and cognitive task analysis methodology. In J. M. Schraagen, L. G. Militello, T. Ormerod, & R. Lipshitz (Eds.), Naturalistic decision making and macrocognition (pp. 57–80). Hampshire, England: Ashgate.
Klein, G., Phillips, J. K., Rall, E., & Peluso, D. A. (2006). A data/frame theory of sensemaking. In R. R. Hoffman (Ed.), Expertise out of context: Proceedings of the 6th International Conference on Naturalistic Decision Making (pp. 113–155). Mahwah, NJ: Lawrence Erlbaum.
Klein, G. A., & Hoffman, R. (1993). Seeing the invisible: Perceptual/cognitive aspects of expertise. In M. Rabinowitz (Ed.), Cognitive science foundations of instruction (pp. 203–226). Mahwah, NJ: Lawrence Erlbaum.
Kolb, D. A. (1984). Experiential learning: Experience as the source of learning and development. Englewood Cliffs, NJ: Prentice Hall.
Koschmann, T. D., Myers, A. C., Feltovich, P. J., & Barrows, H. S. (1994). Using technology to assist in realizing effective learning and instruction: A principled approach to the use of computers in collaborative learning. The Journal of the Learning Sciences, 3(3), 227–264.
Kuhn, T. S. (1962). The structure of scientific revolutions. Chicago: University of Chicago Press.
Lakatos, I. (1976). Proofs and refutations: The logic of mathematical discovery. Cambridge, England: Cambridge University Press.
Piaget, J. (1929). The child’s conception of the world. New York: Harcourt Brace.
Polanyi, M. (1958). Personal knowledge: Towards a post-critical philosophy. Chicago: University of Chicago Press.
Posner, G. J., Strike, K. A., Hewson, P. W., & Gertzog, W. A. (1982). Accommodation of a scientific conception: Toward a theory of conceptual change. Science Education, 66(2), 211–227.
Rouse, W. B., & Morris, N. M. (1986). On looking into the black box: Prospects and limits on the search for mental models. Psychological Bulletin, 100(3), 349–363.
Schmidt, R. A., & Wulf, G. (1997). Continuous concurrent feedback degrades skill learning: Implications for training and simulation. Human Factors, 39(4), 509–525.
Schmitt, J. F. (1996, May). Designing good TDGs. Marine Corps Gazette, 96–98.
Scott, P. H., Asoko, H. M., & Driver, R. H. (1991). Teaching for conceptual change: A review of strategies. In R. Duit, F. Goldberg, & H. Niederer (Eds.), Research in physics learning: Theoretical issues and empirical studies. Proceedings of an international workshop (pp. 310–329). University of Kiel, Kiel, Germany: Schmidt & Klannig.
Shuell, T. J. (1986). Cognitive conceptions of learning. Review of Educational Research, 56, 411–436.
Starbuck, W. H., & Hedberg, B. (2001). How organizations learn from success and failure. In M. Dierkes, A. B. Antal, J. Child, & I. Nonaka (Eds.), Handbook of organizational learning and knowledge (pp. 327–350). Oxford, United Kingdom: Oxford University Press.
Toulmin, S. (1953). The philosophy of science: An introduction. London: Hutchinson.
Waller, D., Hunt, E., & Knapp, D. (1998). The transfer of spatial knowledge in virtual environment training. Presence: Teleoperators and Virtual Environments, 7(2), 129–143.
Weick, K. E. (1995). Sensemaking in organizations. Thousand Oaks, CA: Sage Publications.
Witmer, B. G., Bailey, J. H., & Knerr, B. W. (1995). Training dismounted soldiers in virtual environments: Route learning and transfer (ARI Tech. Rep. No. 1022). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences. (DTIC No. ADA292900).
Part III: Social Band
Chapter 4
CREATING EXPERTISE WITH TECHNOLOGY BASED TRAINING

Karol Ross, Jennifer Phillips, and Joseph Cohn

Training that is delivered completely or primarily in a technology based format such as desktop simulation has the potential to offer substantial benefits. It can cost less than full-scale simulations or field exercises and can support multiple practice iterations in a short period of time. It is often engaging and motivating (Druckman, 1995; Garris, Ahlers, & Driskell, 2002; Prensky, 2001), as well as easy to distribute. It provides contexts for decision-making practice that are crucial to building expertise in complicated fields of practice. However, guidelines for designing effective training at advanced levels of learning are lacking, leaving potential benefits of such technology unrealized. The affordances of the technology are only as good as the training design allows them to be. To address this issue, we developed a framework describing performance and the basics of training design for advanced stages of learning. This framework provides insights into how the strengths of technology can best be exploited for learning and lays a foundation for effective training design when technology based training is employed. The framework includes (1) a stage model of cognitive proficiency that describes characteristics of novices, experts, and proficiency levels between the two extremes and (2) guidance for training design to support the advancement of expertise across the stages of performance. The purpose of training is to move individuals from their current state of skill and knowledge to a higher state. Technology provides a powerful means to accelerate this movement by allowing multiple surrogate experiences and shared insights from real-life operations. But, without a commonly recognized account of the stages of performance, we lack a road map to develop that expertise by tailoring technology to the needs of the training audiences. To build the stage model of cognitive performance, we started with an existing model of proficiency (Dreyfus & Dreyfus, 1980, 1986) and added more recent research findings that delineate knowledge and abilities at the stages between novice and expert (Benner, 1984, 2004; Houldsworth, O’Brien, Butler, & Edwards, 1997; McElroy, Greiner, & de Chesnay, 1991). In this effort we focused on professional expertise in complex, ill-structured knowledge domains and especially on learning at the intermediate stages of
development—the most difficult skills to develop and the biggest opportunity for game based and virtual environments. An ill-structured knowledge domain is one in which the following two properties hold: (a) Each case or example of knowledge application typically involves the simultaneous interactive involvement of multiple, wide-application conceptual structures (multiple schemas, perspectives, organizational principles, and so on), each of which is individually complex (i.e., the domain involves concept- and case-complexity); and (b) the pattern of conceptual incidence and interaction varies substantially across cases nominally of the same type (i.e., the domain involves across-case irregularity). (Spiro, Feltovich, Jacobson, & Coulson, 1992, p. 60)
In other words, each individual case is complex, and there is considerable variability across cases. Different situations in these types of professional domains are likely to require application of varying patterns of principles, even in cases of seemingly similar problems or goals. No pat solutions can be employed with regularity. Such domains require professionals to exercise a great deal of judgment to flexibly apply their knowledge. Further, high levels of skill can be acquired only through operational experience with the task. These professions often require that decisions be made under conditions of time pressure and high stakes. Tactical thinking is one such domain, as are many types of medical practice, firefighting, and police work. Surrogate experiences through technology can develop expertise in such complex fields of endeavor when insightful training design takes into account the learning needs at different stages of performance.

THE FIVE-STAGE MODEL OF COMPLEX COGNITIVE SKILL ACQUISITION

The five-stage model describes performance at levels from novice to expert in ill-structured, cognitively complex domains. The Dreyfus and Dreyfus model has been applied to training and instruction within domains such as combat aviation, nursing, industrial accounting, psychotherapy, and chess (Benner, 1984, 2004; Dreyfus & Dreyfus, 1986; Houldsworth et al., 1997; McElroy et al., 1991). Incorporating research findings since 1986 from the applied cognitive psychology and naturalistic decision-making communities into the original model provides the stage model for the framework, summarized in Table 4.1. Each stage is further described below and illustrated by a summary table showing the nature of knowledge and performance, as well as implications for training. We direct the reader to Phillips, Ross, and Cohn (Volume 1, Section 2, Chapter 8) for a discussion of how this model has been applied to the domain of tactical thinking. This application provides a number of examples for all stages of the model.

Stage 1: Novice

Individuals who perform at the novice level have limited or no experience in situations characteristic of the domain in which they seek to gain expertise. They
Table 4.1. Overview of the Stages from Novice to Expert (Reprinted by permission; Lester, 2005)

Novice
–Rigid adherence to taught rules or plans
–Little situational perception
–No discretionary judgment

Advanced Beginner
–Guidelines for action based on attributes or aspects [a]
–Situational perception is still limited
–All attributes and aspects are treated separately and given equal importance

Competent [b]
–Sees actions at least partially in terms of longer-term goals
–Conscious, deliberate planning
–Standardized and routinized procedures
–Plan guides performance as situation evolves [c]

Proficient [d]
–Sees situation holistically rather than in terms of aspects
–Sees what is most important in a situation
–Perceives deviations from the normal pattern
–Uses maxims, whose meanings vary according to the situation, for guidance
–Situational factors guide performance as situation evolves [e]

Expert [f]
–No longer relies on rules, guidelines, or maxims
–Intuitive grasp of situations based on deep tacit understanding
–Intuitive recognition of appropriate decision or action [g]
–Analytic approaches used only in novel situations or when problems occur

Across the stages the table also tracks four dimensions: how knowledge is treated shifts from without reference to context toward in context; recognition of relevance shifts from none toward present; how context is assessed shifts from analytically toward holistically; and decision making shifts from rational toward intuitive.

Notes:
a. Aspects are global characteristics of situations recognizable only after some prior experience.
b. Original item deleted: “Coping with crowdedness.”
c. Item added.
d. Original item deleted: “Decision making less labored.”
e. Item added.
f. Original item deleted: “Vision of what is possible.”
g. Item added.
are typically taught about the situations they will encounter in terms of objective “attributes,” such as the number of soldiers in a unit, the range radius of enemy assets, or other measurable quantities that can be recognized without operational or practical exercise experience. Novices are also taught context-free rules, such as the formula for determining how long it will take personnel carriers to get from point A to point B under normal conditions. Because the novice’s understanding of the domain is based largely in rules, his or her performance is quite limited and inflexible. As the study of nursing by Benner (1984) points out, rule-guided behavior actually prevents successful performance because a set of rules cannot make clear which tasks are relevant or critical in an actual situation. A novice under the Dreyfus and Dreyfus (1980) model may have a great deal of textbook or classroom knowledge of the domain, but what places him or her in stage 1 is the shortage of actual lived experience. There is a clear distinction between the level of performance that results when textbook principles and theories are applied and the superior performance achieved when an experience base guides performance. Novices can be dangerous to themselves and others when thrust into operational situations with no “feel for” the nature of that experience. Table 4.2 provides a summary of the novice stage of development and includes training implications for this stage. Experiential training can actually be useful at the introductory stage of performance, because the facts taught at this level are often forgotten before the learner has any relevant experiences. Learning facts in the context of situations can support the retention and later appropriate recall of such declarative knowledge (Bransford et al., 1990).

Stage 2: Advanced Beginner

Advanced beginners have acquired enough domain experience that their performances can be considered marginally acceptable. At this stage, learners can recognize, either on their own or when pointed out to them by an instructor, recurring meaningful “aspects” of the situation. Aspects are global characteristics that are identifiable only through prior experience; the prior experience serves as a comparison case for the current situation. For example, an advanced beginner would be able to grasp that close air support could be helpful in a particular situation after taking part in a previous exercise in which close air support was utilized. A learner at this stage would not know how, when, or where to employ the air assets to the best advantage, but would recognize their potential to help alleviate the situation. While it is possible to make some of these aspects explicit for an ill-structured domain, it is not possible to form objective rules to govern every situation. Building on the close air support example, it is likely that a different array of factors would determine the applicability of air assets for different situations. A single set of well-defined rules cannot adequately address every instance. With experience, the learner will increasingly pick up on the array of cues that signal opportunities for air support. (For more detailed examples of each stage, see Phillips, Ross, and Cohn, Volume 1, Section 2, Chapter 8.) Technology offers the setting for the practitioner to rapidly build an experience base, understand how similar
Table 4.2. Summary of Stage 1 of Cognitive Skills Acquisition—Novice

Knowledge
• Objective facts and features of the domain (Dreyfus & Dreyfus, 1986).
• Context-free (abstract) rules to guide behavior (Dreyfus & Dreyfus, 1986).
• Domain characteristics acquired through textbooks and classroom instruction (Benner, 1984).

Performance
• Guided by rules; is limited and inflexible (Benner, 1984).
• Shows recognition of elements of the situation without considering the context (Dreyfus & Dreyfus, 1986).
• Is variable and awkward (Glaser, 1996).
• Focuses on isolated variables (Glaser, 1996).
• Consists of a set of individual acts rather than an integrated strategy (Glaser, 1996; McElroy et al., 1991).
• Is self-assessed based on how well he or she adheres to learned rules (Benner, 1984; Dreyfus & Dreyfus, 1986).
• Reflects a sense of being overwhelmed since all stimuli are perceived to be equally relevant (McElroy et al., 1991).

Training Implications
• Must give learners rules to guide performance (Benner, 1984).
• Learners require guidance in the form of instruction or mentoring while developing their experiential knowledge (Houldsworth et al., 1997).
• Dialogue with a mentor or instructor enables learner to make sense of his or her experiences and discover he or she learned more than what he or she may have originally thought (Houldsworth et al., 1997).
• Structured, situation based learning of facts can increase retention and appropriate recall of declarative information in later stages of learning (Bransford, Sherwood, Hasselbring, Kinzer, & Williams, 1990).
cases can vary substantially, and practice understanding different situations and making decisions. Advanced beginners begin to develop their own “guidelines” that stem from an understanding of the domain attributes and aspects. Guidelines are rules that inform behavior by allowing the practitioner to attach meaning to elements of a situation (Dreyfus & Dreyfus, 1980, 1986). For example, a platoon leader at this level of proficiency may know that the first step in conducting an offensive is to set up a base of fire, and he or she may know that the support position should be a certain distance from the primary objective. However, he or she may not understand that he or she needs to take into account not only distance from the objective, but also angles of fire in order to prevent fratricide. And he or she probably cannot distinguish that the rules and factors critical under one set of circumstances are not necessarily decisive in other operational situations. Spiro et al. (1992) note that in complex domains, the application of different patterns of
Table 4.3. Summary of Stage 2 of Cognitive Skills Acquisition—Advanced Beginner

Knowledge
• Some domain experience (Benner, 1984; Dreyfus & Dreyfus, 1986).
• More objective, context-free facts than the novice, and more sophisticated rules (Dreyfus & Dreyfus, 1986).
• Situational elements, which are recurring, meaningful elements of a situation based on prior experience (Dreyfus & Dreyfus, 1986).
• A set of self-generated guidelines that dictate behavior in the domain (Benner, 1984).
• Seeks guidance on task performance from context-rich sources (for example, experienced people, documentation of past situations) rather than rule bases (for example, textbooks) (Houldsworth et al., 1997).

Performance
• Is marginally acceptable (Benner, 1984).
• Combines the use of objective, or context-free, facts with situational elements (Dreyfus & Dreyfus, 1986).
• Ignores the differential importance of aspects of the situation; situation is a myriad of competing tasks, all with same priority (Benner, 1984; Dreyfus & Dreyfus, 1986; Shanteau, 1992).
• Shows initial signs of being able to perceive meaningful patterns of information in the operational environment (Benner, 1984).
• Reflects attitude that answers are to be found from an external source (Houldsworth et al., 1997).
• Reflects a lack of commitment or sense of involvement (McElroy et al., 1991).

Training Implications
• It can be beneficial to take off the training wheels and force learners to analyze the situation on their own (Houldsworth et al., 1997).
• The learner benefits from having his or her attention directed to certain aspects of the situation by an instructor or mentor. This enables him or her to begin forming principles that can dictate actions (Benner, 1984).
• Coaching on cue recognition and discrimination is appropriate and important (Benner, 1984; Benner, 2004).
• Coaching on setting priorities is appropriate (Benner, 1984).
• Employ strategies to calm learner and decrease anxiety in order to enhance performance capacity (Benner, 2004).
• Use diagrams to facilitate the development of accurate mental models (Scielzo, Fiore, Cuevas, & Salas, 2004).
principles varies from situation to situation, and there is substantial interconnectedness among principles that can be seen only by experiencing one type of situation in a number of varying instantiations. At this stage the practitioner has organized his or her knowledge and experience into principles, but has not built the interconnectedness or developed the ability for flexible application. Table 4.3 provides a summary of the advanced beginner stage of development and includes training implications for this stage.
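One of Table 4.3’s training implications, directing the advanced beginner’s attention to aspects of the situation, can be prototyped simply. The sketch below is our invention under that reading; the scenario annotations and prompt text are placeholders an instructor would author. It compares the aspects a learner reports noticing against an expert annotation of the scenario and prompts on the misses.

    # Expert-annotated aspects for one scenario, each with a coaching prompt
    # (invented content standing in for instructor-authored material).
    EXPERT_ASPECTS = {
        "angle_of_fire": "What angles of fire does the support position create?",
        "distance_to_objective": "How far is the support position from the objective?",
        "fratricide_risk": "Where could friendly fires cross?",
    }

    def coach(reported_aspects):
        """Return prompts for expert-recognized aspects the learner did not
        report, directing attention as Benner (1984) suggests."""
        reported = set(reported_aspects)
        return [prompt for aspect, prompt in EXPERT_ASPECTS.items()
                if aspect not in reported]

    for prompt in coach(["distance_to_objective"]):
        print("COACH:", prompt)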
Stage 3: Competent

Stage 3 is marked by the ability to formulate, prioritize, and manage longer-term goals or objectives. This perspective gives the operator a better sense of the relative importance of the attributes and aspects of the situation. The transition from advanced beginner to competent is highlighted by a shift from highly reactive behaviors, where actions are taken right when a problem surfaces, to planned behaviors, where the learner can see the larger picture and assess what actions must be taken immediately and what can wait until later. While a learner at stage 3 is not as quick or flexible as a stage 4 learner, he or she can typically manage a large set of incoming information and task demands. The competent performer acts on the situation with a very analytical, hierarchical approach. Dreyfus and Dreyfus (1986) compare this to the problem solving approach described by proponents of information processing. Based on an initial judgment of what part of the situation is most important, the performer generates a plan to organize and thus simplify the situation to improve his or her performance. However, the drawback for competent performers is that their plans drive their behavior to a greater extent than any situational elements that may arise; they tend to hesitate to change their plans midcourse, despite the introduction of new, conflicting information. Simultaneously, competent performers are more emotionally invested in their performances than novices or advanced beginners. Because they actively choose a plan of action for themselves rather than relying on rules offered by a textbook or instructor, they take great pride in success and are distressed by failure (Dreyfus & Dreyfus, 1986). At this point, technology can provide multiple iterations of scenarios to allow the learners to build connections across cases and understand the limits of their plans and the need for adaptability. Table 4.4 provides a summary of the competent stage of development and includes training implications for this stage.

Stage 4: Proficient

Learners at the proficient level have moved away from perceiving situations in terms of independent aspects and attributes and see the situation as an inseparable whole where aspects and attributes are interrelated and woven together. The situation is not deliberately analyzed for its meaning; an assessment occurs automatically and dynamically because the learner has an extensive experience base from which to draw comparisons. However, decisions regarding appropriate actions continue to require some degree of detached analysis and deliberation. With regard to the situation assessment process, proficient individuals experience the event from a specific perspective, with past experiences in mind. Therefore, certain features of the situation stand out as salient, and others fade into the background as noncritical (see, for example, Crandall & Getchell-Reiter, 1993; Hoffman, Crandall, & Shadbolt, 1998; Klein, 1998). Dreyfus and Dreyfus (1980) assert that at this stage, performers are also positively impacted by new information that is obtained as the situation progresses. While competent performers generally cannot change their plans when faced with conflicting information,
Table 4.4. Summary of Stage 3 of Cognitive Skills Acquisition—Competent

Knowledge
• How to think about the situation in terms of overarching goals or tasks (Benner, 1984).
• The relative importance of subtasks depending on situational demands (Benner, 1984; Dreyfus & Dreyfus, 1986).
• Particular patterns of cues suggest particular conclusions, decisions, or expectations (Dreyfus & Dreyfus, 1986).
• A personalized set of guiding principles based on experience (Houldsworth et al., 1997).
• How to anticipate future problems (Houldsworth et al., 1997).

Performance
• Is analytic, conscious, and deliberate (Benner, 1984; Dreyfus & Dreyfus, 1986).
• Does not rely on a set of rules (Houldsworth et al., 1997).
• Is efficient and organized (Benner, 1984; Dreyfus & Dreyfus, 1986).
• Is driven by an organizing plan that is generated at the outset of the situation (Dreyfus & Dreyfus, 1986).
• Reflects an inability to digress from the plan, even when faced with new, conflicting information (Dreyfus & Dreyfus, 1986).
• Reflects an inability to see newly relevant cues due to the organizing plan or structure that directs attention (Benner, 2004).
• Reflects an emotionally involved performer who takes ownership of successes and failures (Dreyfus & Dreyfus, 1986).
• Focuses on independent features of the situation rather than a synthesis of the whole (Houldsworth et al., 1997).

Training Implications
• Decision-making games and simulations are beneficial at this stage. The scenarios should require the learner to plan and coordinate multiple, complex situational demands (Benner, 1984).
• Coaching should encourage learners to follow through on senses that things are not as usual, or on vague feelings of anxiety. They have to learn to decide what is relevant without rules to guide them (Benner, 2004).
• Use learner’s sense of confusion or questioning (for example, when the plan does not hold up) to improve his or her domain mental models (Benner, 2004).
proficient individuals fluidly adjust their plans, expectations, and judgments as features of the situation change. They have an intuitive ability to recognize meaningful patterns of cues without breaking them down into their component parts for analysis. Dreyfus and Dreyfus (1980) term this ability “holistic similarity recognition.” However, the elements that are holistically recognized must still be assessed and combined using sophisticated rules in order to produce a decision or action that meets the individual’s goal(s). Technology can provide performers
at this advanced stage the opportunity to practice verifying their perceptions in complex situations, practice adaptive behaviors, and gain more fluidity in their performance. Dreyfus and Dreyfus further describe stage 4 performers as being guided by “maxims” that reflect the nuances of a situation (see also Benner, 1984). These maxims can mean one thing under one set of circumstances, but something else under another set of circumstances. As a simplistic example, consider a building in the midst of an urban combat area whose windows are broken out. This cue could indicate that the building is run down and vacant. It could also indicate that the adversary is occupying the building and has broken out the windows to use it as a base of fire. Other situational cues and factors will need to be considered to determine how to interpret the broken out windows—for example, the adversaries’ history of breaking out windows, typical building types that they have utilized in the past, their last known location and projected current location, the presence or absence of undisturbed dust or dirt around the building, and so forth. Table 4.5 provides a summary of the proficient stage of development and includes training implications for this stage.

Stage 5: Expert

The fifth and final stage of the Dreyfus and Dreyfus model is the expert. At this level the individual no longer relies on analytic rules, guidelines, or maxims; performance becomes intuitive and automatic. The expert immediately understands which aspects of the situation are critical and does not waste time on the less significant aspects. He or she knows implicitly what action to take and can remedy a situation quickly and efficiently. Experts typically need highly realistic situations or actual operational experiences to stimulate development. They benefit from real-life, highly complex scenarios and live exercises in which they can combine the various resources and talents of others around them and conduct challenging discussions with other experts. Table 4.6 provides a summary of the expert stage of development and includes training implications for this stage.

TRAINING DESIGN FOR ADVANCED LEARNING

Surrogate Experiences and Fidelity

Technology based training should be used to build an experience base. That experience base will be activated in every situation encountered in the future to yield more powerful operational performance. Good design of advanced learning experiences can support advancement through stages 2, 3, and 4 of the performance continuum. Design without knowledge of expert cognitive performance and how experiences create advanced learning can waste time and resources, or worse, in some cases negatively affect transfer of training to performance. (For more detailed training principles and examples of application, see Phillips, Ross, and Cohn, Volume 1, Section 2, Chapter 8.)
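The stage summaries in Tables 4.2 through 4.6 can be read as a lookup from assessed proficiency to scenario design parameters. The sketch below is our condensation of those training implications into data, with invented field names; it is meant only to suggest how the stage model could serve as the road map for tailoring technology based training to a trainee’s current stage.

    from dataclasses import dataclass

    @dataclass
    class ScenarioDesign:
        complexity: str        # how many interacting demands to present
        guidance: str          # what kind of coaching to supply
        feedback: str          # extrinsic scoring versus self-assessment
        irrelevant_info: bool  # include distracting or insufficient data?

    # Our condensation of the training implications in Tables 4.2-4.6.
    DESIGN_BY_STAGE = {
        "novice": ScenarioDesign("low", "explicit rules plus mentoring",
                                 "extrinsic", False),
        "advanced beginner": ScenarioDesign("moderate", "cue-recognition coaching",
                                            "extrinsic", False),
        "competent": ScenarioDesign("high", "encourage following vague hunches",
                                    "mixed", False),
        "proficient": ScenarioDesign("mirrors real world", "inductive facilitation",
                                     "self-assessment", True),
        "expert": ScenarioDesign("highly realistic or live", "peer discussion",
                                 "self-assessment", True),
    }

    print(DESIGN_BY_STAGE["advanced beginner"])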
Table 4.5. Summary of Stage 4 of Cognitive Skills Acquisition—Proficient

Knowledge
• Typical "scripts" for categories of situations (Klein, 1998).
• How to set expectancies and notice when they are violated (Benner, 1984).
• How to spot the most salient aspects of the situation (Benner, 1984; Dreyfus & Dreyfus, 1986).
• Personalized maxims, or nuances of situations, that require a different approach depending on the specific situation, but not how to apply the maxims correctly (Benner, 1984; Houldsworth et al., 1997).

Stage 4: Proficient Performance
• Reflects a perception of the situation as a whole rather than its component features (Benner, 1984).
• Is quick and flexible (Benner, 1984).
• Reflects a focus on long-term goals and objectives for the situation (Benner, 1984).
• Utilizes prior experience (or intuition) to assess the situation, but analysis and deliberation to determine a course of action (Dreyfus & Dreyfus, 1986; McElroy et al., 1991).
• Reflects a synthesis of the meaning of information over time (Benner, 2004).
• Reflects a more refined sense of timing (Benner, 2004).

Training Implications
• Case studies (that is, scenarios) are valuable, where the learner's ability to grasp the situation is solicited and taxed (Benner, 2004).
• The learner should be required to cite his own personal experiences and exemplars for perspective on his views of case studies (Benner, 2004).
• Teach inductively, where the learner sees the situation and then supplies his or her own way of understanding the situation (Benner, 2004).
• Within the scenarios, the facilitation should exhaust the learner's way of understanding and approaching the situation (Benner, 2004).
• Scenarios should include irrelevant information and, in some cases, insufficient information to generate a good course of action (Benner, 2004).
• Scenarios should contain levels of complexity and ambiguity that mirror real world situations (Benner, 2004).
• Do not introduce context-free principles or rules, or decision analysis techniques, within the context of any training or practice (Benner, 1984).
Table 4.6. Summary of Stage 5 of Cognitive Skills Acquisition—Expert

Knowledge
• How to make fine discriminations between similar environmental cues (Klein & Hoffman, 1993).
• How a range of equipment and resources function in the domain (Phillips, Klein, & Sieck, 2004).
• How to perceive meaningful patterns in large and complex sets of information (Klein, 1998; Dreyfus & Dreyfus, 1986).
• A wide range of routines or tactics for getting things done (Klein, 1998).
• A huge library of lived, distinguishable experiences that impact handling of new situations (Dreyfus & Dreyfus, 1986).
• How to set expectancies and notice when they are violated (Benner, 1984).
• What is typical and atypical for a particular situation (Dreyfus & Dreyfus, 1986; Feltovich, Johnson, Moller, & Swanson, 1984; Klein, 1998).

Stage 5: Expert Performance
• Is fluid and seamless, like walking or talking; "integrated rapid response" (Benner, 1984, 2004; Dreyfus & Dreyfus, 1986).
• The rationale for actions is often difficult to articulate (Benner, 1984).
• Relies heavily and successfully on mental simulation to predict events, diagnose prior occurrences, and assess courses of action (Einhorn, 1980; Klein & Crandall, 1995).
• Consists of more time assessing the situation and less time deliberating a course of action (Lipshitz & Ben Shaul, 1997).
• Shows an ability to detect problems and spot anomalies early (Feltovich et al., 1984).
• Capitalizes on leverage points, or unique ways of utilizing ordinary resources (Klein & Wolf, 1998).
• Manages uncertainty with relative ease, by filling gaps with rational assumptions and formulating information-seeking strategies (Klein, 1998; Serfaty, MacMillan, Entin, & Entin, 1997).
• Shows efficient information search activities (Shanteau, 1992).

Training Implications
• Scenarios require more complex and detailed contextual information to allow experts to build more sophisticated mental models that contain fine discriminations and more elaborate causal links (for example, full-scale simulations, operational environments) (Feltovich et al., 1984).
• Real world planning sessions, which often pull experts from various specialty areas, provide good context within which experts can discuss salient situational cues and preferred actions.
• Mentoring others provides an opportunity for experts to "unpack" their thought processes as they model them for others, giving them a better understanding of their own mental models (Ross, Battaglia, Hutton, & Crandall, 2003).
• As an alternative to full-scale simulations, tabletop exercises conducted with expert-to-expert peer groups can enable experts to compare the importance of situational factors, cues, and preferred actions.
Expertise is built one challenging experience at a time. A useful experience must be cognitively authentic from the user’s point of view. Cognitive authenticity (Ross, Halterman, Pierce, & Ross, 1998) is the quality that a surrogate experience has when it can stimulate and allow the learner to begin using the perceptual and recognitional processes of an expert—learning to notice, in the manner of an expert, what is and is not present in different situations and the patterns those cues suggest, and to identify situational factors that inhibit some actions or create leverage points for others. The context must be authentic in relationship to how practitioners experience and act in real-life settings. Building a context to support authentic domain experience is not the same thing as simulating physical fidelity. Reproducing billowing smoke, elegantly drawn leaves on the trees, or precise shadows is artistically rewarding, but irrelevant if those elements are not used in judgments or decisions typical of that situation. Meanwhile, failing to represent a tiny pile of freshly overturned dirt indicating that the nearby entrance of a cave has been disturbed can interfere with an authentic cognitive experience. A good technology based experience is an opportunity from a first-person perspective to learn to perceive like an expert. Engineering fidelity, the degree to which technology duplicates the physical, functional, and environmental conditions, does not correlate well with psychological fidelity (Fanchini, 1993). Rather than using an engineering approach to fidelity, designs must correspond to the perceptions that practitioners in the domain have of situations. That information comes from the experiences of experts. There is a problem with using subject matter experts (SMEs) to tell us about their cognitive experience. When they are really good, SMEs usually cannot tell us what they are noticing and how they are doing what they do. They say they “just know.” They do not go through an analysis process to assess or decide. Their knowledge is not verbally encoded. They do not use words, even internally, to lay out cues, factors, or actions. Their expertise is so ingrained that when asked about a specific critical incident, experts often feel as if they acted on instinct or intuition that cannot be explained, not recalling their mental paths to an assessment or action. Cognitive Task Analysis (CTA; Crandall, Klein, & Hoffman, 2006) helps experts access their strategies, perceptions, and knowledge through in-depth interviews. CTA allows the training designer to see experiences from the point of view of the expert. Expertise is “unpacked” around a specific, critical incident. This information from the SMEs becomes the basis for training design and represents the elements of authenticity that learners must perceive and the cognitive challenges or dilemmas they must work through in order to exercise cognitive processes that will form the basis for expertise. Advanced training designs must (1) incorporate cognitive challenges found in the field of practice, (2) provide elements of a situation to develop perceptual attunement, and (3) create an immersive experience that stimulates authentic cognitive behavior from the user’s point of view and that replicates the cognition of experts in the field.
Complexity, Multiple Representations, and Case Exploration Inadequate training design for cognitively complex, ill-structured knowledge domains is usually the result of oversimplification and rigid structure. Unfortunately, some training designs that have been used for advanced learning are more suitable for introductory or procedural training, which tends to be linear and hierarchical, and which is frequently task focused, compartmentalized, prepackaged, delivered from one perspective, and comprised of simple analogies. Sometimes training designers simplify and modularize complex knowledge as a means of providing what they believe is easy access to difficult concepts. The designs compartmentalize knowledge, present only clear (and few) cases rather than the many exceptions and variations, and neglect to require application of new knowledge to a variety of situations (Feltovich, Spiro, & Coulson, 1993). The difficulty of representing the complexity of ill-structured domains at the advanced level is often too hard to overcome when using traditional training design processes because these processes encourage oversimplification of information and fail to provide guidance for representing and teaching interrelated concepts where variation among cases is the norm.2 Good design at the advanced level provides for extended exploration of operational situations. Exploration is not unstructured discovery learning, but is introduced through prespecified starting points with aids or frameworks for support. One way frameworks can help learners organize their thoughts about a domain as they explore cases is by providing domain specific themes. These themes are drawn from the domain; they are ways that experts express their conceptual understanding such as “know the enemy.” Themes provide anchors during exploration, but do not prescribe a preferred method for structuring specific knowledge. They are not a prespecified knowledge schema or a set of mental models that the student should adopt. Themes are also used to guide the design of multiple representations and to support the student in exploring how concepts differ in their meaning and application across situations. “Multiple representations” refers to multiple explanations, multiple analogies, and multiple dimensions of analysis. Representations must be open for exploration without prescribed connections and endpoints—what some advocates of game based training refer to as “free play.” However, that free play must be set in well-structured representations of an expert’s perceived reality and supported by guided reflection. Advanced learning is primarily about performance and reflection when grappling with ill-structured problems—those situations that have unclear elements, dynamically evolving goals, and multiple solutions or solution paths. Such problems are emergent dilemmas; they grow from the dynamics of a situational context. Game based or virtual environments are the ideal medium for easy access to multiple explorations of interrelated, dynamic settings needed to build the experience bases that support expertise.
2 For more information on how the learner progresses through the stages of learning, see Klein and Baxter (Volume 1, Section 1, Chapter 3).
REFERENCES

Benner, P. (1984). From novice to expert: Excellence and power in clinical nursing practice. Menlo Park, CA: Addison-Wesley Publishing Company Nursing Division.
Benner, P. (2004). Using the Dreyfus model of skill acquisition to describe and interpret skill acquisition and clinical judgment in nursing practice and education. Bulletin of Science, Technology & Society, 24(3), 189–199.
Bransford, J. D., Sherwood, R. D., Hasselbring, T. S., Kinzer, C. K., & Williams, S. M. (1990). Anchored instruction: Why we need it and how technology can help. In D. Nix & R. Spiro (Eds.), Cognition, education and multimedia (pp. 115–141). Mahwah, NJ: Lawrence Erlbaum.
Crandall, B., & Getchell-Reiter, K. (1993). Critical decision method: A technique for eliciting concrete assessment indicators from the "intuition" of NICU nurses. Advances in Nursing Sciences, 16(1), 42–51.
Crandall, B., Klein, G., & Hoffman, R. R. (2006). Working minds—A practitioner's guide to cognitive task analysis. Cambridge, MA: The MIT Press.
Dreyfus, H. L., & Dreyfus, S. E. (1986). Mind over machine: The power of human intuitive expertise in the era of the computer. New York: The Free Press.
Dreyfus, S. E., & Dreyfus, H. L. (1980). A five stage model of the mental activities involved in directed skill acquisition (Unpublished report supported by the Air Force Office of Scientific Research [AFSC], USAF Contract No. F49620-79-C-0063). Berkeley: University of California at Berkeley.
Druckman, D. (1995). The educational effectiveness of interactive games. In D. Crookall & K. Arai (Eds.), Simulation and gaming across disciplines and cultures: ISAGA at a watershed (pp. 178–187). Thousand Oaks, CA: Sage.
Einhorn, H. J. (1980). Learning from experience and suboptimal rules in decision making. In T. S. Wallsten (Ed.), Cognitive processes in choice and decision behavior (pp. 1–20). Mahwah, NJ: Lawrence Erlbaum.
Fanchini, H. (1993, September). Desperately seeking the reality of appearances: The case of sessions on full-scale simulators. Paper presented at the Fourth International Conference on Human-Machine Interaction and Artificial Intelligence in Aerospace, Toulouse, France.
Feltovich, P. J., Johnson, P. E., Moller, J. H., & Swanson, D. B. (1984). LCS: The role and development of medical knowledge in diagnostic expertise. In W. J. Clancey & E. H. Shortliffe (Eds.), Readings in medical artificial intelligence: The first decade (pp. 275–319). Reading, MA: Addison-Wesley.
Feltovich, P. J., Spiro, R. J., & Coulson, R. L. (1993). Learning, teaching, and testing for complex conceptual understanding. In N. Frederiksen, R. J. Mislevy, & I. I. Bejar (Eds.), Test theory for a new generation of tests (pp. 181–217). Hillsdale, NJ: Lawrence Erlbaum.
Garris, R., Ahlers, R., & Driskell, J. E. (2002). Games, motivation, and learning: A research and practice model. Simulation & Gaming, 33(4), 441–467.
Glaser, R. (1996). Changing the agency for learning: Acquiring expert performance. In K. A. Ericsson (Ed.), The road to excellence (pp. 303–311). Mahwah, NJ: Lawrence Erlbaum.
Hoffman, R. R., Crandall, B. W., & Shadbolt, N. R. (1998). Use of the critical decision method to elicit expert knowledge: A case study in cognitive task analysis methodology. Human Factors, 40(2), 254–276.
Houldsworth, B., O'Brien, J., Butler, J., & Edwards, J. (1997). Learning in the restructured workplace: A case study. Education and Training, 39(6), 211–218.
Klein, G. (1998). Sources of power: How people make decisions. Cambridge, MA: MIT Press.
Klein, G., & Wolf, S. (1998). The role of leverage points in option generation. IEEE Transactions on Systems, Man and Cybernetics: Applications and Reviews, 28(1), 157–160.
Klein, G. A., & Crandall, B. W. (1995). The role of mental simulation in naturalistic decision making. In P. Hancock, J. Flach, J. Caird, & K. Vicente (Eds.), Local applications of the ecological approach to human-machine systems (Vol. 2, pp. 324–358). Mahwah, NJ: Lawrence Erlbaum.
Klein, G. A., & Hoffman, R. (1993). Seeing the invisible: Perceptual/cognitive aspects of expertise. In M. Rabinowitz (Ed.), Cognitive science foundations of instruction (pp. 203–226). Mahwah, NJ: Lawrence Erlbaum.
Lester, S. (2005). Novice to expert: The Dreyfus model of skill acquisition. Retrieved May 1, 2008, from http://www.sld.demon.co.uk/dreyfus.pdf
Lipshitz, R., & Ben Shaul, O. (1997). Schemata and mental models in recognition-primed decision making. In C. Zsambok & G. Klein (Eds.), Naturalistic decision making (pp. 293–304). Mahwah, NJ: Lawrence Erlbaum.
McElroy, E., Greiner, D., & de Chesnay, M. (1991). Application of the skill acquisition model to the teaching of psychotherapy. Archives of Psychiatric Nursing, 5(2), 113–117.
Phillips, J. K., Klein, G., & Sieck, W. R. (2004). Expertise in judgment and decision making: A case for training intuitive decision skills. In D. J. Koehler & N. Harvey (Eds.), Blackwell handbook of judgment & decision making (pp. 297–315). Victoria, Australia: Blackwell Publishing.
Prensky, M. (2001). Digital game-based learning. New York: McGraw-Hill.
Ross, K. G., Battaglia, D. A., Hutton, R. J. B., & Crandall, B. (2003). Development of an instructional model for tutoring tactical thinking (Final Tech. Rep. for Subcontract No. SHAI-COMM-01; Prime Contract No. DASW01-01-C-0039 submitted to Stottler Henke Associates Inc., San Mateo, CA). Fairborn, OH: Klein Associates.
Ross, K. G., Halterman, J. A., Pierce, L. G., & Ross, W. A. (1998, December). Preparing for the instructional technology gap: A constructivist approach. Paper presented at the Interservice/Industry Training, Simulation, and Education Conference, Orlando, FL.
Scielzo, S., Fiore, S. M., Cuevas, H. M., & Salas, E. (2004). Diagnosticity of mental models in cognitive and metacognitive process: Implications for synthetic task environment training. In S. G. Shiflett, L. R. Elliott, & E. Salas (Eds.), Scaled worlds: Development, validation, and applications (pp. 181–199). Burlington, VT: Ashgate.
Serfaty, D., MacMillan, J., Entin, E. E., & Entin, E. B. (1997). The decision-making expertise of battle commanders. In C. Zsambok & G. Klein (Eds.), Naturalistic decision making (pp. 233–246). Mahwah, NJ: Lawrence Erlbaum.
Shanteau, J. (1992). Competence in experts: The role of task characteristics. Organizational Behavior and Human Decision Processes, 53, 252–266.
Spiro, R. J., Feltovich, P. J., Jacobson, M. J., & Coulson, R. L. (1992). Cognitive flexibility, constructivism, and hypertext: Random access instruction for advanced knowledge acquisition in ill-structured domains. In T. Duffy & D. Jonassen (Eds.), Constructivism and the technology of instruction: A conversation (pp. 57–76). Mahwah, NJ: Lawrence Erlbaum.
Chapter 5
CYBERNETICS: REDEFINING INDIVIDUALIZED TRAINING

Elizabeth Biddle, Dennis McBride, and Linda Malone

Adaptive learning has long been discussed and studied with the notion that individualized instruction should be able to optimize learning. However, the application of adaptive learning methods has occurred primarily through various research and development and prototyping efforts rather than standard practice. This is partly due to the numerous means of tailoring training, such as the type and nature of instructional feedback. It is also partly due to the technologies available to analyze, monitor, and recommend training strategies that optimize a student's unique capabilities and traits along with his or her current state.

Virtual environments (VEs) are rapidly being implemented as primary instructional sources for the training of complex, real world tasks (for example, flying and driving). In addition to providing realistic representations of the operational environments, eliminating safety risks, and reducing the costs of operating the actual equipment, VEs allow the instructor—or automated instructor, as will be discussed—to control the environment. This enables the student to obtain experience in responding to catastrophic situations (for example, repeating scenarios to experiment with alternative actions) and the instructor, or automated instructor, to provide the student with feedback in terms of auditory, visual, or other cues that are most appropriate for the situation and the student at that particular moment.

Various approaches (for example, neural nets and blackboard systems) have been implemented for the purpose of providing individualized, automated instructional applications. Due to the complexities, time requirements, and costs involved, there have only been separate, nonrelated implementations. Additionally, these types of systems have, thus far, concentrated on tailoring instruction based on the student's observable behavior. However, cognitive (for example, information processing and decision making) as well as noncognitive (for example, motivation and self-efficacy) activities that are not easily observed affect the learning process (for example, Ackerman, Kanfer, & Goff, 1995) and often vary with respect to the contextual domain (Snow, 1989). Additionally, optimal methods of instruction vary depending upon the student's level of expertise.
Noncognitive processes can be detected through evaluation of the student's affective (emotional) reactions (for example, Ackerman, Kanfer, & Goff, 1995; Bandura, 1997). Physiological and neurophysiological monitoring technologies, for cognitive and noncognitive processes, are widely being investigated today as a means of refining student performance assessment. The future of VE training is to leverage the advances in these technologies to enable a comprehensive assessment of student performance that is used by the training system to adapt the VE to optimize the student's use of the system.

We now elaborate on the above, first by providing a rudimentary treatment of cybernetics and its relationship to learning systems. This section is not intended to transform the reader into a cyberneticist, but rather to provide a top level overview and to suggest what the future may hold for individualized training from a control theoretic perspective. This section will be followed and substantiated with a brief discussion of individual attributes that are important in pedagogy. Finally, we will concentrate on specific physiological phenomenology that, with improving technology, can be measured in real time and exploited further to enhance the learning experience.

CYBERNETICS

Cybernetics is a scientific discipline. It is not a synonym for computers or networks. As we describe below, central to the concept of cybernetic systems is the role of variable feedback in guiding systems toward goals. In the present context, we are focused on the science and technology associated with individually optimized training feedback schema and what they may look like in the future.

The concept and the Greek term "cybernetics" were probably introduced by Plato: κυβερνητική (kybernētikē). The word actually referred to the art of nautical steersmanship, although Plato himself used the term to describe control within the animal, and the roles of what we now call government. The expression was translated into Latin as "governor" and has been used to describe political regulation for centuries. In perhaps the most significant return to the original, technical meaning of the word, James Watt selected the term to describe his 1790 steam engine's mechanism for velocity control. André-Marie Ampère translated it to the French mot juste, "cybernétique," and from about this time (1834) forward, the concept and its associated terminology have experienced tidal episodes in acceptance and in understanding. The perceived successes or failures of artificial intelligence are probably associated with these vicissitudes. The popular literature has underwritten an unintended narrowing of the concept by using the diminutive "cyber" in reference to computers, robots, and numerous Internet notions—from cyborgs to cyberspace. The perception that cybernetics is computer specific is not limited to popular opinion. Vibrant (as it should be) within the Department of Homeland Security, for example, is the National Cyber Security Division. Its mission is to protect the nation's computer/communication networks. This unfortunate lexical trajectory is perhaps out of control, even to the point that the field of cybernetics proper might need to rename itself. For the purposes of this chapter, we argue that training
specialists benefit from understanding the value of cybernetics in its rigorous scientific application. Fortunately, it happens that the feedback systems that are exploited by training systems today are increasingly computer based. For the purposes of this chapter, cyber, popularly implying computer-network oriented, is subsumed by cybernetics. As such, cyber systems represent a truly valuable set of tools for training. We are addressing the more inclusive science of cybernetics, and we will finesse the definitional issue in the material that follows by introducing the blend, neurocybernetics.

Modern Scientific Cybernetics

The father of modern cybernetics is considered to be Norbert Wiener, who was a professor at the Massachusetts Institute of Technology. Wiener, a boy genius, provided invaluable mathematical concepts and solutions to the U.S. military in both world wars during his adulthood. Wiener's (1948) title Cybernetics: or Control and Communication in the Animal and the Machine serves both as the seminal contribution for the field and as a suitable technical definition of cybernetics itself. Guilbaud (1959) referred to the emerging discipline as a "crossroads of the science" because cybernetics found itself penetrating several fields, and deriving from several fields. At the heart of the discipline is the phenomenon of feedback for the purpose of control. We will discuss at a top level the important aspects of cybernetics and control theory, but we encourage the interested reader to consult venerable sources such as Wiener (1948), Ashby (1956, 1960), Guilbaud (1959), and Powers (1973).

The science of cybernetics is about systematic state changes or mechanisms, not necessarily (nor even usually) about physical machines. Ashby (1956) begins his introductory treatment with three important interrelated phenomena, and he provides the following everyday example to illustrate. Pale human skin, when exposed to the sun, tends to tan or darken. The skin is thus defined as an operand (acted upon), the sun is the operator, and the darkened skin is the transform. The process thus described is termed a transition. For investigators who think in terms of systems of systems, the transition above is only one transition among many others that occur naturally with this particular operator. As Ashby reminds us, other transitions in this solar context include cold soil → warm soil, colored pigment → bleached pigment, and so forth. Multiple, related transitions (particularly those with a common operator) are referred to as a transformation (diurnal global reaction in our solar example). For the purposes of this chapter, we will be concerned with change (transition) in student performance (the operand) as a function of training (the operator). Rudimentary training transitions (for example, maintaining aircraft altitude) are components of the larger transformation (achieving solo status). The important concept is that of states and state changes. As systems change systematically, subsystem and system states can be described whether the dynamics are discrete or fluid. One notation scheme used by Ashby (1956) is very simple to learn and use, as in Figure 5.1. Here, for transform U, A transitions to D, B to A, and so forth.
Figure 5.1. Transformation U, with Five Transitions Internally
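To make the notation concrete, here is a minimal sketch (ours, not Ashby's) that represents a transformation as a state-to-state mapping and traces a trajectory by repeated application. Only "A transitions to D, B to A" survives from the text, so the remaining entries of U are assumptions, chosen to agree with the pilot training example that follows.

```python
# Transformation U as a mapping from operand states to transform states.
# A -> D and B -> A come from the text; C -> E, D -> D, and E -> D are
# assumed so that D (straight and level flight) is the steady state.
U = {"A": "D", "B": "A", "C": "E", "D": "D", "E": "D"}

def trajectory(transform, state, max_steps=20):
    """Apply the transform repeatedly (U, U^2, U^3, ...) from a start state
    until the path reaches a steady state or revisits a state (a basin)."""
    path, seen = [state], {state}
    for _ in range(max_steps):
        state = transform[state]
        path.append(state)
        if state in seen:  # stopping point or cycle: a basin, implying stability
            break
        seen.add(state)
    return path

print(trajectory(U, "C"))  # ['C', 'E', 'D', 'D']: the trajectory settles at D
```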
For our pilot training example's purposes, let A = reduce power, forward stick pressure, tolerate −3g (forces); B = discover altitude high; C = discover altitude low; D = trim, fly straight and level; E = add power, back stick pressure, tolerate +3g. Now we can make use of the notated transform for achieving straight and level flight on the assigned altitude, or we can make use of a kinematic diagram, which makes more obvious sense out of the transitions as components of the transformation. In the cybernetic context, feedback techniques may very well be different for each transition and may very well change as a function of the number of iterations that the transformation undergoes (in the above case, U, U², U³, and so forth). This means that feedback for C → E might optimally be primary only, E → D primary and secondary, and so forth (see the Human Development: Learning section in this chapter; see also Figure 5.2).

Figure 5.2. Kinematic (Dynamics) Graph for Transformation U, Where the Ultimate Steady State Is Straight and Level Flight

Virtual environments provide significant opportunity for the design of imaginative and effective, individualized feedback control systems for training optimization. The control theory approach importantly not only allows for, but encourages, the fielding of controlled feedback systems that are dynamic—reinforcement techniques will systematically change in response to measured changes in independent and dependent variables.

Transformations can be much more complex, of course. We provide in Figure 5.3 the following transformation and its associated kinematic graph in order simply to portray the notational relationship between the two. We can trace the elemental transitions and follow them in the kinematic graph. This exercise shows that the transformation consists of kinematics in which a trajectory progresses either to a stopping point or to a cycle. These are called basins, and they imply stability.

Figure 5.3. Transformation T and Its Kinematic Graph

The last rudimentary cybernetic concept is that of mechanism. We borrow again from Ashby (1956, who borrowed from Tinbergen, 1951) the following illustration (Figure 5.4) of the three-spined stickleback mating pattern. The importance here is that this set of transitions represents a machine without input. The behavior is reflexive—it consists of fixed action patterns. In our application to training, we recognize two important points. First, humans are not dominantly reflexive; we are much more complex behaviorally. We are a highly plastic (that is, trainable) species. Second, machines (remember, not a piece of equipment) for training will be designed to function with input, because the goal of training is not to achieve and maintain homeostasis or equilibrium, but rather to produce state changes in the trainee's performance repertoire.

Figure 5.4. The Male-Female Mating Pattern for the Three-Spined Stickleback

That humans are highly trainable as indicated above is obviously important. In order to effectively integrate cybernetics and training for the future, we must discuss the product of training—learning—but this important treatment must be provided in the context of learning's bigger picture: human development.

Human Development: Maturation

Developmental psychology focuses on two interacting dynamics. The first is technically delineated as maturation. The natural phenomenon here is that survival skills like walking emerge or "unfold" in the developing individual when (1) physiological critical periods visit the developing organism and when (2) the organism engages the behavior that is being genetically entrained. When this is successful, a developmental milestone is achieved. Thus technically, the infant cannot possibly learn to walk because the developmental window has not made this optional. But the toddler does not learn to walk either. Walking is a survival skill that is encoded in the organism's DNA, such that when the underlying physiology
is prepared, the child will commence the acquisition process and achieve a developmental milestone for motility. We are reminded of the old adage "When the student is ready, the teacher will come." Disputed in what appears to be more of a political rather than scientific forum, verbal communication, even grammar (not a language per se), may be encoded for maturational debut. Regardless, the underlying physiological processes are becoming clearer: the role of myelin is key. Myelinization in the nervous system is the process whereby, to use pedestrian terms, gray matter becomes white matter. A deep treatment is beyond the scope of this chapter (for more detail, see, for example, McBride, 2005). The product of myelinization is the formation of a gapped, lipid-protein coating (arising from symbiotic glial and oligodendrocyte cells) around the axons of neurons. This electrical insulation produces an increase in nerve conduction speed by as much as three orders of magnitude (that is, from ca. 0.1 m/s (meters per second) to perhaps 100 m/s). A corresponding decrease in neural refractory time (the time in ms during which the neuron cannot refire) means that, effectively, the bandwidth is increased in myelinated cells by as much as a factor of 1,000. For comparison purposes, this transformation is equivalent to upgrading the download of a 4.7 GB (gigabyte) movie via 10 Base-T at 10 Mb/s (megabits per second) to a 10 Gb/s OC-192. The former requires 1.5 h; the latter, 5 s. Based on this high bandwidth and the resulting control, a human can, for example, run coordinately at maximum sprint speed in one direction and throw a football with deadly accuracy to a target running at maximum speed in an oblique direction, all while being chased by several 300+ pound men with obliteration on their agenda. Humans have significantly more (total and relative) white matter than other species—remarkably more than even our nearest genetic neighbor, the chimpanzee. As a result, humanity has produced what is unthinkable to other species: mathematics, music, trips to the moon, and more. It is quite arguable that myelin is what makes Homo sapiens unique. From conception and probably into adulthood, the process of myelinization continues such that when we achieve bandwidth sufficient to our bladders, we get bladder control; to our extremities, motility and manipulation; to our temporal lobes, speech, arguably language itself; and so forth. Many components of the human nervous system do not even begin the myelinization process until well after (as in many years) parturition. As we learn more about the emergence of myelinization, we see the distinct possibility that a significant number of trained behaviors in the ultimate, highly skilled human repertoire are partially matured and then honed through learning (the latter will be defined scientifically later). If so, serious cybernetic consideration should be given to this in terms of the design of critical period challenge environments.1 It is evident that our
1 Indeed, as of this writing, the Office of Naval Research is exploring this prospect with its Combat Hunter program. One hypothesized notion is that during millions of years of human life prior to modernity, many hunting/fighting skills were matured/learned as children grew from the toddler through the juvenile stage and into adulthood. Since this was probably characteristic of life in the Pliocene/Pleistocene, critical periods were met with opportunities to acquire instinctive capabilities. However, in our highly synthetic world, where little game hunting-like activity is engaged, perhaps critical periods come and go without producing developmental milestones.
educational systems at some level recognize and exploit the spontaneous, increasing maturity of the growing child and the growing nervous system therein. Mathematics curricula, for example, follow a progression from numbers to algebra to geometry to trigonometry to calculus, and so forth. If McBride and Uscinski (2008) are correct along this line, identification of the progress of the expanding myelin network, measured noninvasively and directly, or through corresponding behavioral testing, suggests a significantly new way of thinking about training. That is, by providing abundant exposure to skill learning challenges during the near-completion period of myelinization, in theory, the trainee might absorb the skill as more of an instinct than a conditioned response. It should be noted that one can actually acquire a skill even though it did not arise during a developmental critical period, but the behavior will likely be less efficient than it would have been had it matured.
Human Development: Learning

There are several general concepts and technical definitions of learning, but the concepts all converge on the cybernetic notion of directionally transformed behavior. In terms of formal definition, we invoke one that is readily applied at least to perceptual motor skill acquisition. This definition is arguably the most robust for this chapter's purposes because much of cognitive training involves such skill. Learning is the (rather) permanent change in behavior that comes about as the result of reinforced practice. That is, progress derives from trial and error learning, wherein the behavior in question (for example, landing an aircraft) is refined through practice and feedback. In this sense, landing the aircraft is not thought to be the product of mere maturation, though essential maturation (for example, eye-hand coordination) must have been achieved in order to begin the learning process. Rather, landing accuracy improves based on the maturationally prepared trainee's exploitation of feedback.

We must comment on elements of the definition of learning in the previous paragraph. First, the escape words "(rather) permanent" are partially in parentheses because of a technicality. The learning theory community posits that learning is plastic, that is, it grows with feedback and is permanent. Whereas new behavior can be learned that may override previously learned behavior, the originally acquired behavior is said to be "permanently learned." To provide a shallow argument in support, the so-called tip-of-the-tongue phenomenon is an everyday example of this notion of permanence. That is, finally recalling a name corresponding with a face, days after trying relentlessly, hints at the robustness of a permanently learned association. The words "change in behavior" are fundamentally important. Behavioral psychology respects research into the internal correlates of learning, but the field has classically been focused on observable, molar level behavior. The philosophy is that if behavior is to change, behavior must happen. This leads to the last component of our definition of the state change, learning. "Reinforced practice" is central to cybernetics. The trainee brings existing skills as joint products of aptitude and experience to the first trial of a
new training experience. As trials continue, reinforcement (feedback) guides subsequent performance. The key then to learning and to cybernetics is the quality and quantity of feedback provided. We contend that if experimental psychology has learned only one fact, it is that feedback—reinforcement—is sine qua non to the acquisition of skill. Feedback in training contexts comes in many varieties and utilities. The most fundamental feedback is simply knowledge of results, or KR. In order to improve any skill, the student must know the results of his or her efforts. But there is more to feedback than KR. There are myriad schemes (and a rich literature) for improving performance based on other primary sorts of feedback. And there is an abundance of supplementary (secondary and higher order) feedback methods. The science and technology of optimized feedback for human learning—especially its future—will be discussed, but we must first mathematically characterize learning in order to identify the parameters of the learning process that may be exploitable cybernetically.

Figure 5.5 is an idealized, group averaged, learning curve. It portrays accuracy (landing accuracy) as a function of practice. The curve reveals progress as negatively accelerated and monotonic. Moreover, as detailed theoretically by Noble (1978),2 a mathematical expression for such is provided in Eq. (1).

A = C(1 − e^(−kN)) + T    (1)
The variables are described in Table 5.1. Noble indicates that this equation explains 98 percent of the variance for empirical curves for the 10 perceptual motor skills datasets that he analyzed. These skills range from simple ones to more complex ones. Of importance here is the relative contribution of variables N, k, and T. The first, N, represents the quantity of practice engaged by participants. What is clear in the present graph and from interpretation of its mathematical form is that quantity of practice clearly exerts a positive influence on performance. The next independent variable
Figure 5.5. An Idealized, Group-Average Learning Curve

2 A comprehensive treatment of Noble's (1978) theoretical work is beyond the scope of this book. However, the reader is strongly encouraged to pursue an understanding of his work. This approach to the mathematically based study of learning provides significant opportunities for training development beyond our cybernetic approach.
Table 5.1. Variables for Eq. (1)

N = quantity of practice; number of trials; the independent variable
A = accuracy in identifying targets in imagery; positive identifications; the dependent variable
C = constant that transforms the accuracy of an ideal participant after N trials to R
T = theoretical joint contribution of experience and hereditary factors (can be negative also)
k = theoretical rate parameter; representative of individual differences in aptitude, and so forth
of interest is the exponent k. This variable represents the contribution of aptitude factors—these Noble defends empirically and theoretically as the product of genetics. The last variable, T, a joint contribution of experiential and hereditary factors, is also very important, though algebraically rather than geometrically (as is the case for k and N). Thus N and T represent, in whole or in part, environmental contributions to the learning curve. T and k, on the other hand, provide partly and fully, respectively, organismic variability that arises largely from heritable sources. At this point we do not want to fuel an ancient and naive nature versus nurture argument. Clearly both environment and genetics contribute to the acquisition of skilled behavior. The point is that learning, the product of training, can be decomposed into elemental sources of variation with respect to influence. Whether nature-nurture variability is exploited for designing training systems in the future remains to be seen. Our principal point is that training state transformations will comprise multiple, skill-specific transitions, each of which must derive from a technical understanding of how the particular feedback methods succeed or fail for each transition in the transformation.
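A brief numeric sketch (ours; the parameter values are invented for illustration and are not Noble's fitted constants) makes these roles visible: N drives accuracy up a negatively accelerated curve, k sets how quickly the curve approaches its asymptote C + T, and T shifts the curve as a whole.

```python
import math

def accuracy(N, C=80.0, k=0.15, T=5.0):
    """Eq. (1): A = C(1 - e^(-kN)) + T, monotonic with asymptote C + T."""
    return C * (1.0 - math.exp(-k * N)) + T

# Same practice schedule, two hypothetical aptitude (k) values.
for N in (1, 5, 10, 20, 40):
    print(f"N={N:2d}   k=0.15: {accuracy(N, k=0.15):5.1f}   "
          f"k=0.30: {accuracy(N, k=0.30):5.1f}")
```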
Human Development: Variability

Discovery of the sources of variation in variables such as N, k, and T is important for the future of training system development. However, as we stressed above, mathematical analysis of grouped, averaged learning data accounts for what it addresses: the data of a pool of trainees, represented typically as simple mean scores. So, how do we proceed from group averages to designing individualized training systems? On the one hand, experimental psychology generally considers variance to be a problem and methodologically treats it this way: for this field, group mean differences are the concern. Differential psychology, on the other hand, thrives on variability. Later in this chapter, we provide a comprehensive look at many individual and group differences, with an eye toward exploiting them. Moreover, we believe that future success in cybernetically conceived training will accrue based on the two communities working together. Former American Psychological Association president Lee Cronbach pleaded for this in 1957.
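One way to put the two communities to work together is to fit the parameters of Eq. (1) per trainee rather than per group. The sketch below (our illustration; the observations and the fixed C and T are invented) estimates an individual rate parameter k from a single trainee's own trial data by a crude grid search; a fielded system would estimate all three parameters and update them as trials accumulate.

```python
import math

def accuracy(N, C, k, T):
    # Eq. (1) evaluated for candidate parameters.
    return C * (1.0 - math.exp(-k * N)) + T

# Hypothetical (trial, accuracy) observations for one trainee.
observed = [(1, 22.0), (5, 52.0), (10, 70.0), (20, 82.0)]

def sse(k):
    """Sum of squared errors for a candidate k, with C and T held fixed."""
    return sum((a - accuracy(n, C=80.0, k=k, T=5.0)) ** 2 for n, a in observed)

best_k = min((k / 100.0 for k in range(1, 101)), key=sse)
print(f"estimated individual rate parameter k = {best_k:.2f}")
```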
The (semi-) individualized approach has been sensible in terms of designing training systems for differences among groups—for example, the aging versus the young, or males versus females, and so forth. We take this approach to the next step. The optimal training path from y intercept to asymptote for one individual is not necessarily the same optimum as for another individual. Fleishman (1953, 1966) showed over a half century ago that different underlying factors (for example, manual dexterity and balance) are called into play in skill acquisition as a function of the type of task engaged (that is, between skilled tasks) and how far up the learning curve subjects have progressed in training (within skilled tasks). It is reasonable that knowledge of how sub-skills dynamically emerge during training episodes can be used to design differential feedback systems. As we show in the pages that follow, variation among (and within) people in their respective aptitudes, cognitive styles, emotions, motivations, and so on is very considerable. More importantly, these organismic variables are reliably and validly measurable, consistent with means of exploiting them with sufficient lead time. In other words, we are only beginning to foresee the many and rich ways that feedback systems may be designed for individually tailorable, maximal training yield. Much more is being learned about human learning and, importantly, about the very neural correlates that were ignored or inferred in the past. With continued advances in imaging technology such as functional magnetic resonance imaging (fMRI) and functional near-infrared imaging, it is likely that feedback systems will be organized so that (neural and behavioral) state change sequences can be shepherded most efficiently for desired transforms. With the above consideration, we endorse the term neurocybernetics for what we believe will be an explosive science and technology. In the context of training systems, we suggest that neurocybernetics is the cybernetic exploitation of plastic, neurobehavioral systems, such that optimized feedback mechanisms operate on an individual's repertory (operand) in order to effect specifically desired training states.
INDIVIDUAL DIFFERENCES IN LEARNING

That people learn differently has been discussed and researched for decades. In the 1970s, computer scientists began focusing their artificial intelligence research toward the development of intelligent tutoring systems to provide instructorless, individualized instruction. Then in the 1980s, Bloom (1984) reported a two-sigma increase in learning outcomes for students who received instruction by a one-on-one human instructor versus students who received traditional classroom instruction. This effect is explained by a human tutor's ability to evaluate student state in combination with task performance and to use this information to tailor his or her interactions. This seminal work initiated a rebirth of individualized training systems research and development to replicate Bloom's findings with a computerized tutor.
Motivation, personality, and perceived autonomy are considered some of the noncognitive variables that affect student learning (Ackerman, Kanfer, & Goff, 1995; McCombs & Whisler, 1989). Table 5.2 summarizes some of the noncognitive learning variables most commonly discussed, referred to as "affective learning variables" because the effects of these variables on student performance are commonly exhibited through emotional (affective) responses. We understand, and advocate, that many more factors, such as intelligence and past personal experiences for a start, contribute to individual differences in learning, and the potential interactions and relationships may well be infinite. As likely noticed in these brief descriptions, the variables were described as interacting with each other. For instance, self-efficacy increases when student autonomy, motivation, and self-regulation are stronger—although a specific mathematical representation of this relationship has not been identified, and it is doubtful the relationship is constant between and within individuals. In the next section, we discuss potential methods for obtaining these objective measures.

PHYSIOLOGY OF PERFORMANCE AND EMOTION

The curvilinear relationship between arousal and performance, known as the Yerkes-Dodson law (Yerkes & Dodson, 1908), has long been established. The Yerkes-Dodson law states that performance increases with increased arousal, but only to a certain level. Once the arousal level is too high, performance begins to deteriorate. The optimal arousal level differs among individuals and most likely within individuals depending upon the specific task and current state. Although there has been debate as to whether physiology precipitates an emotion or is an aftereffect of a change in emotion or mood, it is commonly accepted that physiological processes are associated with emotion. The physiological description of emotion is based on investigations (for example, Levine, 1986) of the physical activation of arousal (resolved) and stress (unresolved). Arousal is associated with the "fight or flight" phenomenon, in which arousal is increased to either sustain a fight or to enable rapid flight. Regardless of whether the individual fights or flees, he or she resolves the situation and thus reduces his or her arousal. Stress, on the other hand, increases in situations in which individuals feel that they cannot control the outcome of the situation, such as reducing or eliminating the stressor. Therefore, arousal is considered a positive response, while stress is a negative response. However, arousal can also be negative if levels become too high or are sustained for too long (again, exactly how "high" varies depending upon the person and his or her current state). With respect to physiology, arousal and stress are controlled by sympathetic-adrenal secretions of adrenaline (epinephrine) and cortisol (the interested reader is referred to Levine, 1986). Increased epinephrine secretion leads to an increase in arousal and can occur as a response to both positive and negative stimuli, with the intensity of the stimulus and the person's perception of the stimulus's intensity determining how much the epinephrine secretion rate increases (Frankenhaeuser, 1986). The secretion of cortisol is related to uncontrollable and
Table 5.2. Noncognitive (Affective) Learning Variables

Self-Efficacy
• Description: Student's belief in his or her ability to successfully complete an activity to achieve a goal.
• Instructional Recommendations: Promote positive student self-efficacy: sufficiently challenge the student (Snow, 1989); provide positive, timely, and relevant feedback (Bandura, 1997).

Motivation
• Description: Student's desire to succeed that drives him or her to extend an effort to learn.
• Instructional Recommendations: Motivated students typically possess positive self-efficacy and self-regulatory skills (Bandura, 1997). Increased motivation is associated with increased student performance (see Ackerman, Kanfer, & Goff, 1995).

Student Autonomy
• Description: Student control over the instructional process.
• Instructional Recommendations: Placing responsibility for learning on the student increases student autonomy and motivation (Kember, Wong, & Leung, 1999). Empowering students to make decisions regarding the learning process and encouraging student initiatives promote student autonomy (Reeve, 1998).

Self-Regulatory Skills
• Description: Coping behaviors the student uses to maintain task focus, maintain confidence in the face of criticism and difficulty, and ensure that sufficient effort is extended on the task.
• Instructional Recommendations: Metacognitive, emotion, and social skills improve self-regulation (Hattie, Biggs, & Purdie, 1996; Schunk, 1989).

Personality
• Description: Personality traits influence participation in specific types of activities that are optimized for the student's personality and skill set (Matthews, 1999). Prominence in these preferred activities provides the opportunity to build skill sets optimized for the preferred environment.
• Instructional Recommendations: Instructor cognizance of the student's personality traits can be used to guide the selection of instructional intervention type.
uncertain situations, which results in feelings of helplessness and distress (Levine, 1986; Frankenhaeuser, 1986). Therefore, the physiology of emotion can be described as a two-dimensional model (for example, Frankenhaeuser, 1986) of arousal (adrenaline) versus stress (cortisol). The traditional physiological measures used to identify emotion are skin conductance (galvanic skin response), electromyogram, electroencephalogram, heart rate, and respiration. Physiological evaluation and identification of emotion are difficult since the interactions of the sympathetic and parasympathetic response systems during arousal and stress are nonlinear. Additionally, physiological responses to external events differ between individuals and within individuals. Neurophysiological measurement technologies such as dense-array electroencephalography, near-infrared spectroscopy, and fMRI allow for evaluation of cognitive activity, in addition to noncognitive activity. Advances in these technologies are lessening the invasiveness and cumbersomeness of the equipment, thus increasing the potential for integrating these technologies within a VE training environment. Description of the various neurophysiological measurement technologies and measurement approaches is beyond the scope of this chapter (the interested reader is referred to Poulsen, Luu, and Tucker, Volume 1, Section 1, Chapter 1).

CLOSED-LOOP TRAINING: BOUNDED ONLY BY IMAGINATION

As described in many of the preceding chapters of this book, data obtained from a virtual environment can be used to evaluate trainee performance and then used by an instructor or automated system to modify the virtual environment so as to optimize the student's learning experience. Further, dynamic monitoring of physiological responses to environment interactions or other events can be used to identify changes in the student's noncognitive state. These physiological and neurophysiological measures, in association with task performance measures, can be used to adapt the virtual environment to optimize the training experience, as proposed by Sheldon (2001). This approach is to evaluate aspects of student state (for example, stress, frustration, and cognitive activity) and adapt the virtual environment, focusing on the manipulation of a couple of instructional intervention types that have previously been demonstrated to be associated with a specific student state response. This capability is rapidly maturing as recent technology advances have decreased the invasiveness and increased the mobility of physiological and neurophysiological measurement devices. In other words, with the continued advances in physiological and neurophysiological monitoring and measurement, we can obtain real time data regarding the operand—the student's performance—and use this information to apply a transform—instructional feedback—with the goal of achieving a specific transition—a quantifiable improvement in student performance.

To realize the above, student state assessment and diagnosis capabilities must mature so that they can detect and adapt to changes in student state in a manner more similar to that of a human instructor. This will require the integration of learning/artificial intelligence technologies to identify patterns of student state
responses and trends with respect to characteristics of the VE and events that occur during the training session. Instead of using a fixed (known) set of interventions, which are in turn provided in response to a known and quantifiable change in student state, student state assessment and diagnosis technologies will enable the provision of varied types of interventions based on continued monitoring of the student's response to the interventions.

A taxonomy, similar to the taxonomy presented by Klein and Baxter (Volume 1, Section 1, Chapter 3), is needed to provide a basis for automatically adapting the environment to optimize the student's instructional experience. A three-dimensional taxonomy of environment interactions versus physiological response versus instructional intervention type should address various modalities, combinations of modalities, level of stimulation, and connotation of intervention (when applicable). The following list provides a recommendation of potential areas ripe for exploration; a sketch of how such a taxonomy might drive a closed training loop follows the list. Additionally, task-specific interventions, such as injection of scenario events, introduction of a fault or other system change, and so forth, also need to be considered.

• Visual: scene/environment, avatars, fidelity, feedback, clutter, color, and so forth;
• Auditory: task-specific and non-task-specific cues, communication with avatars or other humans in the loop, instructional feedback, music, noise, and so forth;
• Tactile: task-specific and non-task-specific cues, environmental, instructional feedback, memory triggers, and so forth;
• Olfactory: task-specific and non-task-specific cues, environment response, memory triggers, and so forth;
• Any combination of the above.
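The following minimal sketch suggests how such a taxonomy might drive the loop; it is our illustration, and every signal name, threshold, and intervention in it is an invented placeholder rather than a validated mapping.

```python
from dataclasses import dataclass

@dataclass
class StudentState:
    arousal: float      # normalized 0..1, e.g., derived from skin conductance
    stress: float       # normalized 0..1, e.g., a cortisol-linked proxy
    performance: float  # normalized task score from the VE, 0..1

def select_intervention(s: StudentState) -> str:
    """Map a diagnosed state to an intervention drawn from the modality list."""
    if s.stress > 0.7:
        return "reduce scenario complexity; calm, supportive auditory feedback"
    if s.performance < 0.4:
        return "visual cueing toward task-relevant features"
    if s.arousal < 0.3 and s.performance > 0.8:
        return "inject a scenario event (for example, a fault) to raise challenge"
    return "no change; continue monitoring"

# One pass of the loop; in practice this runs continuously during a scenario,
# and the student's response to each intervention is itself monitored.
print(select_intervention(StudentState(arousal=0.2, stress=0.1, performance=0.9)))
```

The particular rules matter less than the shape of the loop (sense, diagnose, intervene, and sense again), which is the cybernetic feedback cycle described earlier in this chapter.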
CONCLUSION

As technologies for interpreting student state advance, methods and technologies for responding optimally to student needs are required to realize the full training potential of VE technology. There is ample evidence that individual differences, in both enduring composition and current state, affect learning and interactions in a training environment. A tremendous amount of research has investigated these differences and methods for adapting instruction to account for them. Many strides have also been made in the physiological and neurophysiological evaluation and understanding of human response. The integration of these fields with performance assessment and instructional methods provides a path toward harnessing the power of VE training applications.

REFERENCES

Ackerman, P. L., Kanfer, R., & Goff, M. (1995). Cognitive and noncognitive determinants and consequences of complex skill acquisition. Journal of Experimental Psychology: Applied, 1(4), 270–304.
Ashby, W. R. (1956). An introduction to cybernetics. London: Chapman & Hall.
Ashby, W. R. (1960). Design for a brain: The origin of adaptive behavior. London: Chapman & Hall.
Bandura, A. (1997). Self-efficacy: The exercise of control. New York: W. H. Freeman & Co.
Bloom, B. S. (1984). The two sigma problem: The search for methods of group instruction as effective as one-to-one tutoring. Educational Researcher, 13(6), 4–16.
Cronbach, L. J. (1957). The two disciplines of scientific psychology. American Psychologist, 12, 671–684.
Fleishman, E. A. (1953). A factor analysis of intra-task performance on two psychomotor tasks. Psychometrika, 18, 45–55.
Fleishman, E. A. (1966). Human abilities and the acquisition of skill. In E. Bilodeau (Ed.), Acquisition of skill (pp. 147–167). New York: Academic Press.
Frankenhaeuser, M. (1986). A psychobiological framework for research on human stress and coping. In M. Appley & R. Trumbull (Eds.), Dynamics of stress: Physiological, psychological, and social perspectives (pp. 101–116). New York: Plenum Press.
Guilbaud, G. T. (1959). What is cybernetics? New York: Grove.
Hattie, J., Biggs, J., & Purdie, N. (1996). Effects of learning skills interventions on student learning: A meta-analysis. Review of Educational Research, 66(2), 99–136.
Kember, D., Wong, A., & Leung, D. (1999). Reconsidering the dimensions of approaches to learning. British Journal of Educational Psychology, 69, 323–343.
Levine, P. (1986). Stress. In M. G. Coles, E. Donchin, & S. W. Porges (Eds.), Psychophysiology: Systems, processes, and applications (pp. 331–353). New York: Guilford Press.
Matthews, G. (1999). Personality and skill: A cognitive-adaptive framework. In P. Ackerman, P. Kyllonen, & R. Roberts (Eds.), Learning and individual differences: Process, trait, and content determinants. Washington, DC: American Psychological Association.
McBride, D. K. (2005). The quantification of human information processing. In D. K. McBride & D. Schmorrow (Eds.), Quantifying human information processing (pp. 1–41). New York: Rowman & Littlefield.
McBride, D. K., & Uscinski, R. (2008). The large brain of H. sapiens pays a price. Manuscript submitted for publication.
McCombs, B. L., & Whisler, J. S. (1989). The role of affective variables in autonomous learning. Educational Psychologist, 24(3), 277–306.
Noble, C. E. (1978). Age, race, and sex in the learning and performance of psychomotor skills. In R. T. Osborne, C. E. Noble, & N. Weyl (Eds.), Human variation: The biopsychology of age, race, and sex (pp. 51–105). New York: Academic Press.
Powers, W. T. (1973). Behavior: The control of perception. Chicago: Aldine.
Reeve, J. (1998). Autonomy support as an interpersonal motivating style: Is it teachable? Contemporary Educational Psychology, 23(3), 312–330.
Schunk, D. H. (1989). Self-efficacy and cognitive skill learning. In C. Ames & R. Ames (Eds.), Research on motivation in education: Vol. 3. Goals and cognitions (pp. 13–44). New York: Academic Press.
Sheldon, E. (2001). Virtual agent interactions. Unpublished doctoral dissertation, University of Central Florida, Orlando.
Snow, R. E. (1989). Aptitude-treatment interaction as a framework for research on individual differences in learning. In P. Ackerman, R. Sternberg, & R. Glaser (Eds.), Learning and individual differences: Advances in theory and research (pp. 13–60). New York: W. H. Freeman & Company.
Tinbergen, N. (1951). The study of instinct. Oxford, United Kingdom: Clarendon.
Wiener, N. (1948). Cybernetics: Or control and communication in the animal and the machine. Cambridge, MA: MIT Press.
Yerkes, R. M., & Dodson, J. D. (1908). The relation of strength of stimulus to rapidity of habit-formation. Journal of Comparative Neurology and Psychology, 18, 459–482.
Part IV: Spanning the Bands
Chapter 6
A THEORETICAL FRAMEWORK FOR DEVELOPING SYSTEMATIC INSTRUCTIONAL GUIDANCE FOR VIRTUAL ENVIRONMENT TRAINING

Wendi Van Buskirk, Jessica Cornejo, Randolph Astwood, Steven Russell, David Dorsey, and Joseph Dalton

In recent years the military has placed increased emphasis on advancing training technology because ineffective training can have disastrous consequences. One area of training technology that has recently received attention is scenario based training (SBT) in virtual environments. In an SBT paradigm, the trainee is presented with scenarios or situations that are representative of the actual task environment (Cannon-Bowers, Burns, Salas, & Pruitt, 1998). SBT offers many advantages, ranging from structured repetition to the development of higher order skills (Oser, Cannon-Bowers, Salas, & Dwyer, 1999). One particular advantage of SBT is that it gives trainees opportunities to practice and receive feedback, which are generally considered important components of any successful training program. These opportunities are why SBT is recommended as a medium for training people to handle situations that occur infrequently in "real life" (for example, nuclear reactor accidents), to handle situations that are life threatening and/or expensive (for example, firefighting, aircraft piloting, and submarine navigation), or to train competencies that "require training in an environment that replicates the critical aspects of the operational setting" (Oser et al., 1999, p. 176).

Despite the advantages of SBT, the science surrounding the implementation of quality instruction within SBT is lacking. Instructional system designers depend heavily on technology to drive the functionality within SBT virtual environments (Oser et al., 1999). However, they also need to focus on creating good instruction to support the learning process during training within these environments. In other words, the mere presence of technology does not equate to successful training, but it is easy to overlook this fact when building expensive, impressive virtual environment training systems. It is therefore important for system designers to remember that "in the absence of a sound learning
methodology, it is possible that training systems may not fully achieve their intended objectives" (Oser et al., 1999, p. 177). Equal emphasis must be placed on the science of training and on the technological medium of training.

We believe this overreliance on technology is due to a lack of guidance regarding which training interventions will result in the most effective training outcomes within SBT environments. For example, an instructor using SBT can give a trainee feedback, but questions remain: Is feedback more effective than another training intervention (for example, deliberate practice)? If so, what type of feedback should be used (for example, outcome, process, velocity, or normative)? When should the instructor introduce the feedback: before, during, or after scenarios? Until guidance is available to address questions such as these, the utility of SBT technology will be limited. In fact, there is a general consensus among training researchers that more research is needed to determine which training interventions correspond with which learning outcomes such that learning is maximized (Tannenbaum & Yukl, 1992; O'Neil, 2003; Salzman, Dede, Loftin, & Chen, 1999); to date this problem has not received adequate attention (Salas, Bowers, & Rhodenizer, 1998; Salas, Bowers, & Cannon-Bowers, 1995; Salas & Cannon-Bowers, 1997). Even in the face of a proliferation of instructional design theories and models and a wealth of accumulated training knowledge, typical training practice (particularly in simulation based and virtual training environments) lags behind our knowledge of the "science of training" (Salas & Cannon-Bowers, 2001). Therefore, a major challenge facing instructional systems designers is to build systems and training programs based on scientific theories and empirical results rather than on individual preferences and methods of convenience. As summarized by Merrill (1997, p. 51), "which learning strategy to use for a particular instructional goal is not a matter of preference, it is a matter of science."

The goal of this chapter is to address the science of training through the development of a systematic, ontological, and comprehensive organization of learning outcomes and training interventions into a single data structure, which we call the Training Intervention Matrix (TIMx). This matrix will provide an organizational structure that researchers can use to integrate training research results and to provide empirically supported guidelines for instructional system designers to follow.

TRAINING INTERVENTION MATRIX

The purpose of the TIMx is to provide instructional system design guidance by linking together two taxonomies: a taxonomy of learning outcomes (LOs) and a taxonomy of training interventions (TIs). Together these two taxonomies form the foundation architecture of the TIMx. Once the framework is formed, it is our hope that the entries in the intersecting cells, which will contain empirical research results, can be used to derive design guidance. The first step in creating the TIMx, however, is to develop the two taxonomies. To develop the taxonomies of LOs and TIs, we relied on the five requirements specified by Krathwohl, Bloom, and Masia (1964):
1. A taxonomy must be organized according to a single guiding principle or a set of principles. 2. A taxonomy should be tested by verifying that it is in concurrence with experimental evidence. 3. The order of a taxonomy should correspond to the actual order that occurs among the pertinent phenomena. 4. A taxonomy should be consistent with sound theoretical views. 5. A taxonomy should point “to phenomena yet to be discovered” (Krathwohl et al., 1964, p. 11).
Following these requirements establishes our LOs and TIs as scientifically classified taxonomies rather than simple lists. In the next sections, we focus on how we met the requirement that our LOs and TIs be organized according to a single guiding principle or set of principles. [To learn how we met Krathwohl et al.'s (1964) four other requirements, see Van Buskirk, Moroge, Kinzer, Dalton, and Astwood (2005).]

X-Axis: Learning Outcomes Taxonomy

We began the development of our taxonomy of LOs by searching the literature for existing taxonomies. Some of the taxonomies we identified included Fleishman, Quaintance, and Laurie's (1984) taxonomy of human performance, Krathwohl et al.'s (1964) taxonomy of educational objectives, O'Neil's (2003) distance learning guidelines, and Wiegmann and Rantanen's (2002) human error classification. All of these taxonomies provided useful guidance in creating lists of learning outcomes to be included in the final taxonomy. However, many were created for different purposes and contained items that were not appropriate for the domain in which we were working, and a few did not satisfy the five taxonomy requirements listed above. Therefore, we set out to create a new taxonomy while leveraging the work of these researchers. To develop our own taxonomy of learning outcomes, we turned to Wei and Salvendy's (2004) human-centered information processing model to serve as our guiding principle (that is, taxonomy requirement #1).

Wei and Salvendy's HCIP Model

Wei and Salvendy's (2004) human-centered information processing (HCIP) model is a worker-oriented model of cognitive task performance that attempts to identify all of the cognitive aspects of human performance in technical work. We chose this model for several reasons. First, it is an input-throughput-output model of human task performance. Second, it has an external feedback loop from response to stimuli; the model thus captures how the trainee perceives, processes, and responds to information. Third, we chose the HCIP model because it breaks memory into its component parts rather than clustering it all into one "memory" box (that is, the model separates long-term, sensory, and working memory). This was important for our purposes because breaking
memory into its component parts should facilitate the organization of the learning outcomes. Finally, we selected the HCIP model because it includes teamwork processes, individual differences, and external factors as components. Teamwork processes are important for our purposes because the TIMx addresses the training of both individuals and teams. The inclusion of individual differences and external factors is important because they account for additional sources of variation: variation within and among individuals and variation due to the situation. In future stages of development, these sources of variation will become caveats or moderators in the cells of our matrix.

Modifications to the HCIP Model

Despite the noted advantages of Wei and Salvendy's (2004) model, we adapted it slightly. The main purpose of these modifications was to ensure that our learning outcomes would be mutually exclusive within each module. For example, we combined Wei and Salvendy's original "mental plan and schedule" module and "mental execution" module to form our "cognitive task execution" module. The original two modules had considerable overlap in their definitions and functions, which did not allow us to cleanly fit learning outcomes into just one module. Our revised version is presented in Figure 6.1 and will be referred to as the human-centered information processing-revised (HCIP-R) model.
Figure 6.1. Revised HCIP Model
Our revised model maintains the advantages of the original HCIP model (that is, its input-throughput-output structure, its separation of memory into components, and its accounting for teams and individual differences) that led us to select it in the first place. The HCIP-R serves as the guiding set of principles for our taxonomy of learning outcomes.

As part of establishing the HCIP-R as our way of organizing the LOs in our taxonomy, we created definitions for each module in the model. We also created conceptual definitions for each LO in order to have a common set of terminology. This common terminology is important so that research and empirical findings are associated with the correct LO; for instance, what one researcher calls "decision making," another researcher may call "planning." A common terminology creates a standard language for discussing and comparing research findings. To create these definitions, we adopted existing definitions, modified them, or created new ones for each of our LOs. Once each LO was defined, we assigned it to the appropriate HCIP-R module. Definitions and module classifications are presented in Table 6.1. It is important to note that this may not be an exhaustive list; other learning outcomes that we inadvertently overlooked could be added.

Y-Axis: Training Interventions Taxonomy

As with the learning outcome taxonomy, we first searched for existing taxonomies of TIs. Unfortunately, our search of the literature revealed no such taxonomies, so we developed our own taxonomy by searching for training strategies in the literature. As with the LO taxonomy, we met the requirement of selecting an organizing framework for our TIs. The organizing framework we selected is the Common Distributed Mission Training Station (CDMTS; Walwanis Nelson, Owens, Smith, & Bergondy-Wilhelm, 2003). CDMTS allowed us to organize each TI into one of four modules, which are defined below (see Table 6.2).

Common Distributed Mission Training Station (CDMTS)

The CDMTS was designed to address simulation and computerized training programs that involve physically distributed teams. It is based on the logic that, regardless of training content, instructors have a common set of needs. The CDMTS is a physical, computerized product that serves as a training aid for instructors. It was developed based on a framework that presents the interrelationships among several dimensions of military training systems. It is this framework that we use to organize our taxonomy of TIs.

We combined the CDMTS architectural framework dimensions into four general training design modules: student-initiated learning, instructional planning, exercise manipulation, and feedback. Each module corresponds to one of the three general stages of an SBT cycle: pre-exercise, during exercise, and post-exercise. Student-initiated learning was included to encompass TIs that take place without the guidance of an instructor, such as when team members review each other's and the overall team's performance. Instructional planning refers to TIs established before the training exercise occurs.
Table 6.1. Learning Outcomes and Definitions for Each HCIP-R Module

Information Interface: The input of information or data for cognitive processing. The information input can be achieved through perceiving stimuli through physical channels such as visual and auditory.
• Perceptual Judgment: Understanding the information implied by the physical properties of an object (Lederman & Wing, 2003).
• Perceptual-Motor Skill: The ability to match voluntary physical motion and perceptual observations (Holden, Flach, & Donchin, 1999).
• SA: Perception (a): The perception of cues in the environment (Endsley, 2000).
• Spatial Orientation: One's awareness of oneself in space and the related ability to reason about movement within space (Hunt, 2002).
• Visual Scanning: Using vision to search for objects or stimuli in the environment (for example, displays, objects, and rooms) in order to gather information (McCarley et al., 2001; Stein, 1992).

Information Handling: Captures lower level human information processing. It involves recognizing and translating input from the information interface module into information usable for higher level processing (that is, the cognitive task execution module, the interpersonal interactions module, and so forth).
• Pattern Recognition: The process of identifying or categorizing large perceptual chunks of information in one's environment (Chase & Simon, 1973; Reisberg, 1997).
• SA: Comprehension (a): "Encompasses how people combine, interpret, store, and retain information. It includes the integration of multiple pieces of information and a determination of their relevance to the person's goals" (Endsley, 2000, p. 7).

Cognitive Task Execution: Higher level cognitive task performance related to examining, evaluating, and reviewing available information in order to determine a course of action (or inaction).
• Organizing/Planning: The process of calculating a set of actions that will allow one to achieve his or her goal (Garcia-Martinez & Borrajo, 2000).
• Problem Solving/Decision Making: The process of gathering, organizing, and combining information from different sources to make a choice between two or more alternatives (Lehto, 1997).
• Resource Management: Assessing characteristics of resources and needs and determining their appropriate allocation (Gustafsson, Biel, & Garling, 1999).
• SA: Projection (a): "The ability to forecast future situation events and dynamics and their implications" (Endsley, 2000, p. 7).

Interpersonal Interactions: Behaviors for sharing information and working with others, including both human and synthetic teammates.
• Assertiveness: Both the willingness and the ability to communicate one's own opinions in a manner that will be persuasive to others (Smith-Jentsch, Salas, & Baker, 1996).
• Backup Behaviors: Not only offering assistance to others, but also requesting assistance when one knows he or she is overloaded (Smith-Jentsch, Zeisig, Acton, & McPherson, 1998).
• Communication: Using proper phraseology, providing complete reports, avoiding excess chatter, and generally using speech to share information and ideas in a manner that others will understand, including listening effectively and fostering open communication (Smith-Jentsch, Johnston, & Payne, 1998).
• Error Correction: Monitoring for errors and taking action to correct these errors when they take place (Smith-Jentsch, Zeisig, et al., 1998).
• Explicit Team Coordination: Using planning or communication mechanisms to manage team task dependencies (Espinosa, Lerch, & Kraut, 2002).
• Implicit Team Coordination: Coordinating team member actions based on shared cognition and unspoken assumptions about what other team members are likely to do (Espinosa et al., 2002).
• Information Exchange: Seeking information from available resources, passing information to appropriate persons, providing accurate "big picture" situation updates, and accurately informing higher commands (Smith-Jentsch, Johnston, & Payne, 1998).
• Leadership: Providing guidance or suggestions to others, stating clear team and individual priorities, and appropriately refocusing others in accordance with situational demands (Smith-Jentsch, Johnston, & Payne, 1998).

Attention: Captures human cognitive task performance related to the allocation of attention resources needed for cognitive tasks. These resources are the limited capacity inventory that supplies attention to the other modules.
• Attention Prioritization: Determining the relative order in which objects or pieces of information are to be focused on or investigated (Shomstein & Yantis, 2004).
• Metacognition: Using knowledge, skills, and beliefs concerning one's own cognitive processes and products (Flavell, 1976).

Memory and Knowledge Acquisition: Captures human cognitive task performance related to retrieving, storing, retaining, and transferring information needed for cognitive tasks. It is composed of two classes: working memory and long-term memory.
• Declarative Knowledge: Factual knowledge, including recollection of words, definitions, names, dates, and so forth (Weiten, 2001).
• Memorization: The rote recall of some material, with no required comprehension and/or ability to integrate that material (Lovett & Pillow, 1995).
• Procedural Knowledge: Knowledge of actions, skills, operations, and conditioned responses; knowing how to execute actions (Weiten, 2001).
• Strategic Knowledge: "The knowledge that enables the formation of strategies; plans of action determining what kinds of knowledge and tactics should be employed in different problem contexts" (Fentem, Dumas, & McDonnell, 1998).

Influences on Individual Differences (b): Includes state, trait, and environmental factors that moderate the effectiveness of training interventions.

a. SA = situational awareness.
b. Because of the potentially unlimited number of internal and external influences, we do not attempt to create a list here.
For example, lectures typically would not occur during an exercise; rather, they typically provide information before an exercise takes place. Therefore, lecture was included in the pre-exercise module, instructional planning. Exercise manipulation involves TIs that an instructor manipulates in real time, during an exercise, to enhance the training opportunities presented to trainees. Last, feedback refers to TIs that provide trainees with information regarding some aspect(s) of an individual's or a team's task and/or team performance. It is important to note that feedback can occur during and/or after an exercise; therefore, it is its own separate module.

In Table 6.2, our TIs are organized into the appropriate CDMTS module and are accompanied by their conceptual definitions. There are two things to note with regard to the TI taxonomy. First, we realize that this may not be an exhaustive list, and other training interventions may have been overlooked inadvertently. Second, the TIs are broken down into individual components; we realize that there are other training strategies that incorporate several of the individual components. For example, adaptive guidance incorporates TIs such as feedback, sequencing, practice, and so forth (Bell & Kozlowski, 2002).

FUTURE RESEARCH NEEDS FOR TIMx DEVELOPMENT

With the framework developed (see Figure 6.2), the next iteration of TIMx development will encompass filling in the cells of the matrix. The most logical approach for filling in the cells is to focus on an LO column: literature describing empirically validated training interventions that can be used to train that specific learning outcome would be identified and reviewed. As the cells are filled, the pattern of research results will indicate which TIs are best suited for training the LO. Additionally, the pattern of research results will indicate other areas that need new or continued research attention. For example, a review of the research could show that no empirical research has been conducted on the effectiveness of using sequencing to train pattern recognition within a virtual environment. (A brief code sketch following Table 6.2 illustrates this cell-filling logic.)
Table 6.2. Training Interventions and Definitions for Each CDMTS Module

Instructional Planning (Pre-Exercise)
• Action Learning: A team engages in a specific work-related problem solving task in which there is an emphasis on learning and problem solving through identification of the root problem and development and implementation of an action plan (Goldstein & Ford, 2002).
• Advanced Organizers: An adjunct aid that presents a structure of what is to be learned so that new content can be more easily organized and interpreted (Ausubel, 1960; Langan-Fox, Waycott, & Alber, 2000; Chalmers, 2003).
• Anchored Instruction: Integrating examples or real world experiences into instructional programs to provide a common frame of reference for learners in order to help them connect to the concepts being taught (adapted from Blackhurst & Morse, 1996).
• Cross-Training: Team members learn the skills of one or several additional jobs in order to foster mental representations of task interdependency, team role structure, and how the two interact (Noe, 1999; Salas & Cannon-Bowers, 1997).
• Error Based Training: Exposing trainees to both positive and negative examples of the behavior being trained (adapted from Baldwin, 1992).
• Event Based Approach to Training: Planned events are introduced into simulated scenarios to evaluate specific skills that would be required in these real-life situations (Salas & Cannon-Bowers, 1997).
• Exploration Based Training: Training that builds situations in which the learner can make an error and then explore, in a trial-and-error way, the cause of the error and alternative strategies for avoiding or fixing it (Frese et al., 1991).
• Guided Reflection: The instructor prompts the trainee to mentally review the affective, cognitive, motivational, and behavioral aspects of the trainee's performance (Cleary & Zimmerman, 2004).
• Heuristic Strategies: Instruction that focuses on teaching general rules of thumb for finding an acceptable approximate solution to a problem when an exact problem solving method is unavailable or too time consuming (Tversky & Kahneman, 2002).
• Intelligent Tutoring Systems: Computer based programs that deliver instruction by first diagnosing a trainee's current level of understanding, then comparing it to an expert model of performance, and finally selecting the appropriate intervention that will advance the trainee's level of understanding (Goldstein & Ford, 2002; Corbett, Koedinger, & Anderson, 1997).
• Lecture: A method in which an instructor verbally delivers training material to multiple students simultaneously.
• Mental Rehearsal: A trainee is instructed to visualize himself or herself reenacting the target behavior or performing it perfectly (Davis & Yi, 2004).
• Modeling/Demonstration: Trainees are presented with desired learning processes and/or outcomes to mimic (Noe, 1999).
• Part Task Training: Training that breaks down a skill or task into components that are practiced separately (Goldstein & Ford, 2002).
• Role-Play: A training method in which trainees are given information about a situation and act out characters assigned to them (Noe, 1999).
• Sequencing: Practice is provided to learners in a particular order, such as proceeding from easy to difficult or from general to specific. Types of sequencing could include massed/blocked practice, spaced practice, and random practice (Shea & Morgan, 1979; Schmidt & Bjork, 1992).

Student Initiated Learning (Pre-Exercise)
• Guided Team Self-Correction: Group reflection in which trainees self-reflect and gain insights into the nature of their own taskwork, teamwork, and the relation of taskwork and teamwork to the team's overall performance (Smith-Jentsch, Zeisig, Acton, & McPherson, 1998).
• Peer Instruction: Students perform tasks individually and then discuss their responses and underlying reasoning (Mazur, 1997).
• Reflection: The trainee takes the initiative to mentally review the affective, cognitive, motivational, and behavioral aspects of his or her own performance (Cleary & Zimmerman, 2004).
• Self-Directed Learning: Learning that takes place outside of a formal training program without the guidance of an actual trainer (Manz & Manz, 1991).

Exercise Manipulation (During Exercise)
• Cueing/Hinting: Techniques that prompt the trainee to access information already known, to carry out the next steps required to reach a correct answer, or to make connections between the training task and the larger context (Goldstein & Ford, 2002; Hume, Michael, Rovick, & Evens, 1996).
• Deliberate Practice: A technique used to provide learning guidance by prompting the learner to perform as needed during skill acquisition or to make connections between the training task and the larger context (Ericsson, Krampe, & Tesch-Romer, 1993).
• Didactic Questioning: A teacher leads students to concepts through a series of ordered questions that tend to have a single answer (Vlastos, 1983).
• Facilitative Questioning: Posing open-ended questions that encourage trainees to generate their own solutions and ideas without direct input from the teacher (PlasmaLink Web Services, 2006).
• Highlighting: A technique used to draw a trainee's attention to a stimulus in order to increase the likelihood of perception of that stimulus (Wickens, Alexander, Ambinder, & Martens, 2004).
• Overlearning: The immediate continuation of practice beyond achievement of the criterion (Rohrer, Taylor, Pashler, Wixted, & Cepeda, 2005; Noe, 1999).

Feedback (During and Post-Exercise)
• After Action Review: An interactive process in which trainees discuss task planning and execution under the guidance of an instructor (Scott, 1983).
• Environmental Feedback: Provides information about the actual/true relationship between the cues in the environment and their associated outcomes (Balzer et al., 1994).
• Normative Feedback: Provides an individual with information about his or her standing relative to others, but is not specific performance-related feedback (Smithers, Wohlers, & London, 1995).
• Outcome Feedback: Provides knowledge of the results of one's actions (Ericsson, Krampe, & Tesch-Romer, 1993; Kluger & DeNisi, 1996; Balzer, Doherty, & O'Connor, 1989).
• Process Feedback: Conveys information about how one performs the task (not necessarily how well; Kluger & DeNisi, 1996).
• Progress/Velocity Feedback: The trainee's performance is compared only with his or her own prior performance on the task, so the trainee can gauge the rate at which a performance goal is being approached (Kozlowski et al., 2001).
• Scaffolding/Faded Feedback: Support or guidance is given on every trial early in practice and then is gradually withdrawn across practice (Schmidt & Bjork, 1992).
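To make the cell-filling process concrete, the TIMx can be sketched as a sparse matrix keyed by learning outcome and training intervention. This is our minimal illustration rather than project code: the two populated cells paraphrase the two studies discussed immediately below, and the query functions show how filled and empty cells would yield, respectively, design guidance and research gaps (such as sequencing crossed with pattern recognition).

```python
# Minimal sketch of the TIMx as a sparse matrix: cells keyed by
# (learning outcome, training intervention) hold lists of empirical findings.
from collections import defaultdict

timx = defaultdict(list)

# Cells paraphrasing the two studies discussed in the text:
timx[("problem solving/decision making", "process feedback")].append(
    {"study": "Buff & Campbell (2002)",
     "result": "improved decision accuracy pre- to post-feedback"})
timx[("problem solving/decision making", "cross-training")].append(
    {"study": "Cannon-Bowers, Salas, et al. (1998)",
     "result": "more correct and faster team decisions"})

def supported_tis(lo):
    """Return TIs with at least one supporting finding for a given LO column."""
    return [ti for (row, ti), findings in list(timx.items())
            if row == lo and findings]

def gaps(lo, candidate_tis):
    """Return TIs with no findings yet: areas needing new or continued research."""
    return [ti for ti in candidate_tis if not timx[(lo, ti)]]

print(supported_tis("problem solving/decision making"))
print(gaps("pattern recognition", ["sequencing", "highlighting"]))
```

In a mature TIMx, the findings in each cell would also carry the internal and external caveats discussed below, so that guidance could be qualified by moderators such as goal orientation or environmental conditions.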
108
Learning, Requirements, and Metrics
To illustrate this further, consider two research studies that investigated the effectiveness of training decision making using scenario based training simulations. Buff and Campbell (2002) examined the appropriate information or content to include in effective feedback. Specifically, they investigated the effectiveness of process and outcome feedback (as compared to a no feedback practice group) on participants' decision-making performance on a simulated radar display task. Their results showed that participants who received process feedback significantly improved their performance (measured as decision accuracy) from pre- to post-feedback sessions. However, neither participants in the outcome feedback condition nor those in the no feedback condition improved their performance. Likewise, Cannon-Bowers, Salas, Blickensderfer, and Bowers (1998) also investigated how to train decision making on a simulated radar display task. However, these researchers examined the use of cross-training to train decision making in teams. Their results showed that cross-trained groups made more correct decisions about contacts and did so more quickly than teams assigned to a no training condition.

Based on these results, there is some indication that process feedback and cross-training may be effective interventions for training decision making. Therefore, we could use these results to fill in the cell intersections of decision making and process feedback as well as decision making and cross-training (see Figure 6.2).

Figure 6.2. Training Intervention Matrix

Within each cell we anticipate that the effectiveness of the training intervention will depend upon certain caveats, including both internal and external factors. Internal factors include individual difference variables such as goal
orientation, self-efficacy, and fatigue. External factors include environmental factors such as weather and ambient temperature, as well as having the required equipment, materials, and organizational support. These variables will ultimately serve as caveats that fit into the internal and external influences module.

Additionally, there is a debate within the literature regarding the distinction between education and training. For example, Kline (1985) suggests that training tends to concentrate on psychomotor skills, while education concentrates on cognitive skills. However, other researchers have found evidence that cognitive skills, such as decision making, can be trained beyond declarative knowledge (Buff & Campbell, 2002; Cannon-Bowers, Salas, et al., 1998). What remains to be addressed is whether these decision-making strategies can be transferred to other domains. Within the TIMx, we have included a number of traditional educational interventions (for example, lecture and the Socratic method) in addition to traditional training interventions (for example, cueing/hinting and part task training). Therefore, it may be possible to use the TIMx as a framework to address this debate empirically.

Finally, future research also needs to take the use of technology into consideration. The impact of using different types of technology should be studied, as should the impact of simulation fidelity. In fact, Muller et al. (2006) argue that research is needed to determine how three types of fidelity (functional, psychological, and physical) interact to provide the most effective training environment. Research should focus on a blended fidelity training solution to "support the examination of training transfer, across low fidelity, high fidelity, and live training environments" that is capable of training both "technical and higher order skill sets" (Muller et al., 2006, p. 10).

SUMMARY

A major challenge facing virtual environment and instructional systems designers is choosing the best way to train particular learning outcomes. Typically, this decision has been based on technology requirements and not on empirical research or the "science of training" (Salas & Cannon-Bowers, 2001). To address this challenge, we created two taxonomies: a learning outcome taxonomy and a training intervention taxonomy. We then linked the taxonomies to create a framework within which we can begin scientifically answering questions such as "what is the best way to train decision making?" It is our hope that other researchers will find this framework to be a useful tool and will use it to guide their own research on instructional interventions. We also hope that as the TIMx begins to fill, we can provide empirically supported guidance to instructional system designers about the most appropriate training strategies to use in virtual environments so that their decisions are based on science and not on individual preferences.

ACKNOWLEDGMENTS

We gratefully acknowledge CDR Dylan Schmorrow at the Office of Naval Research, who sponsored this work (Contract No. N0001407WX20102).
Additionally, we would like to thank Dr. Ami Bolton, Ms. Melissa Walwanis Nelson, and Ms. Beth Atkinson for their insightful comments during the development of the TIMx.

REFERENCES

Ausubel, D. P. (1960). The use of advance organizers in the learning and retention of meaningful verbal material. Journal of Educational Psychology, 51, 267–272.
Baldwin, T. T. (1992). Effects of alternative modeling strategies on outcomes of interpersonal skills training. Journal of Applied Psychology, 76, 759–769.
Balzer, W. K., Doherty, M. E., & O'Connor, R. (1989). Effects of cognitive feedback on performance. Psychological Bulletin, 106, 410–433.
Balzer, W. K., Hammer, L. B., Sumner, K. E., Birchenough, T. R., Martens, S. P., & Raymark, P. H. (1994). Effects of cognitive feedback components, display format, and elaboration on performance. Organizational Behavior and Human Decision Processes, 58, 369–385.
Bell, B. S., & Kozlowski, S. W. J. (2002). Adaptive guidance: Enhancing self-regulation, knowledge, and performance in technology-based training. Personnel Psychology, 55, 267–306.
Blackhurst, A. E., & Morse, T. E. (1996). Using anchored instruction to teach about assistive technology. Focus on Autism and Other Developmental Disabilities, 11, 131–141.
Buff, W. L., & Campbell, G. E. (2002). What to do or what not to do? Identifying the content of effective feedback. Proceedings of the 46th Annual Meeting of the Human Factors and Ergonomics Society (pp. 2074–2078). Santa Monica, CA: Human Factors and Ergonomics Society.
Cannon-Bowers, J. A., Burns, J. J., Salas, E., & Pruitt, J. S. (1998). Advanced technology in scenario-based training. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 365–374). Washington, DC: American Psychological Association.
Cannon-Bowers, J. A., Salas, E., Blickensderfer, E., & Bowers, C. A. (1998). The impact of cross-training and workload on team functioning: A replication and extension of initial findings. Human Factors, 40, 92–101.
Chalmers, P. A. (2003). The role of cognitive theory in human-computer interface. Computers in Human Behavior, 19, 593–607.
Chase, W. G., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4, 55–81.
Cleary, T. J., & Zimmerman, B. J. (2004). Self-regulation empowerment program: A school-based program to enhance self-regulated and self-motivated cycles of student learning. Psychology in the Schools, 41, 537–550.
Corbett, A. T., Koedinger, K. R., & Anderson, J. R. (1997). Intelligent tutoring systems. In M. G. Helander, T. K. Landauer, & P. V. Prabhu (Eds.), Handbook of human-computer interaction (2nd ed., pp. 849–874). Amsterdam: Elsevier.
Davis, F. D., & Yi, M. Y. (2004). Improving computer skill training: Behavior modeling, symbolic mental rehearsal, and the role of knowledge. Journal of Applied Psychology, 89, 509–523.
Endsley, M. R. (2000). Direct measurement of situation awareness: Validity and use of SAGAT. In M. R. Endsley & D. J. Garland (Eds.), Situation awareness analysis and measurement (pp. 147–173). Mahwah, NJ: Lawrence Erlbaum.
Ericsson, K. A., Krampe, R. T., & Tesch-Romer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100, 363–406.
Espinosa, A., Lerch, J., & Kraut, R. (2002). Explicit vs. implicit coordination mechanisms and task dependencies: One size does not fit all. In E. Salas, S. M. Fiore, & J. Cannon-Bowers (Eds.), Team cognition: Process and performance at the inter- and intraindividual level. Washington, DC: American Psychological Association.
Fentem, A. C., Dumas, A., & McDonnell, J. (1998). Evolving spatial representations to support innovation and the communication of strategic knowledge. Knowledge-Based Systems, 11, 417–428.
Flavell, J. H. (1976). Metacognitive aspects of problem solving. In L. B. Resnick (Ed.), The nature of intelligence (pp. 231–236). Hillsdale, NJ: Erlbaum.
Fleishman, E. A., Quaintance, M. K., & Laurie, A. (1984). Taxonomies of human performance: The description of human tasks. San Diego, CA: Academic Press.
Frese, M., Brodbeck, F., Heinbokel, T., Mooser, C., Schleiffenbaum, E., & Thiemann, P. (1991). Errors in training computer skills: On the positive function of errors. Human-Computer Interaction, 6, 77–93.
Garcia-Martinez, R., & Borrajo, D. (2000). An integrated approach to learning, planning, and execution. Journal of Intelligent & Robotic Systems, 29, 47–78.
Goldstein, I. L., & Ford, J. K. (2002). Training in organizations: Needs assessment, development, and evaluation (4th ed.). Belmont, CA: Wadsworth/Thomson Learning.
Gustafsson, M., Biel, A., & Garling, T. (1999). Outcome-desirability bias in resource management problems. Thinking & Reasoning, 5, 327–337.
Holden, J. G., Flach, J. M., & Donchin, Y. (1999). Perceptual-motor coordination in an endoscopic surgery simulation. Surgical Endoscopy, 13, 127–132.
Hume, G., Michael, J., Rovick, A., & Evens, M. (1996). Hinting as a tactic in one-on-one tutoring. The Journal of the Learning Sciences, 5, 23–47.
Hunt, E. (2002). Precis of thoughts on thought. Mahwah, NJ: Erlbaum.
Kline, J. A. (1985, January–February). Education and training: Some differences. Air University Review. Retrieved April 15, 2008, from http://www.airpower.maxwell.af.mil/airchronicles/aureview/1985/jan-feb/kline.html
Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119, 254–284.
Kozlowski, S. W. J., Toney, R. J., Mullins, M. E., Weissbein, D. A., Brown, K. G., & Bell, B. S. (2001). Developing adaptability: A theory for the design of integrated-embedded training systems. In E. Salas (Ed.), Advances in human performance and cognitive engineering research (Vol. 1, pp. 59–123). Amsterdam: JAI/Elsevier Science.
Krathwohl, D. R., Bloom, B. S., & Masia, B. B. (1964). Taxonomy of educational objectives: The classification of educational goals. New York: David McKay.
Langan-Fox, J., Waycott, J. L., & Alber, K. (2000). Linear and graphic advanced organizers: Properties and processing. International Journal of Cognitive Ergonomics, 4, 19–34.
Lederman, S. J., & Wing, A. M. (2003). Perceptual judgment, grasp point selection and object symmetry. Experimental Brain Research, 152, 156–165.
Lehto, M. (1997). Decision making. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (2nd ed., pp. 1201–1248). New York: John Wiley & Sons.
Lovett, S. B., & Pillow, B. H. (1995). Development of the ability to distinguish between comprehension and memory: Evidence from strategy-selection tasks. Journal of Educational Psychology, 87, 523–536.
Manz, C. C., & Manz, K. (1991). Strategies for facilitating self-directed learning: A process for enhancing human resource development. Human Resource Development Quarterly, 2, 3–12.
Mazur, E. (1997). Peer instruction: A user's manual. Upper Saddle River, NJ: Prentice Hall.
McCarley, J. S., Vais, M., Pringle, H., Kramer, A. F., Irwin, D. E., & Strayer, D. L. (2001, August). Conversation disrupts visual scanning of traffic scenes. Paper presented at the Vision in Vehicles conference, Brisbane, Australia.
Merrill, M. D. (1997). Learning-oriented instructional development tools. Performance Improvement, 36(3), 51–55.
Muller, P., Cohn, J., Schmorrow, D., Stripling, R., Stanney, K., Milham, L., et al. (2006). The fidelity matrix: Mapping system fidelity to training outcome. Proceedings of the Interservice/Industry Training, Simulation and Education Conference [CD-ROM]. Arlington, VA: National Training Systems Association.
Noe, R. (1999). Employee training and development. Boston: Irwin McGraw-Hill.
O'Neil, H. F. (2003). What works in distance learning. Greenwich, CT: Information Age Publishing.
Oser, R. L., Cannon-Bowers, J. A., Salas, E., & Dwyer, D. J. (1999). Enhancing human performance in technology-rich environments: Guidelines for scenario-based training. In E. Salas (Ed.), Human/technology interaction in complex systems (Vol. 9, pp. 175–202). Stamford, CT: JAI Press.
PlasmaLink Web Services. (2006, September 4). Glossary of instructional strategies. Retrieved April 16, 2007, from http://glossary.plasmalink.com/glossary.html#F
Reisberg, D. (1997). Cognition: Exploring the science of the mind. New York: W. W. Norton.
Rohrer, D., Taylor, K., Pashler, H., Wixted, J. T., & Cepeda, N. J. (2005). The effect of overlearning on long-term retention. Applied Cognitive Psychology, 19, 361–374.
Salas, E., Bowers, C. A., & Cannon-Bowers, J. A. (1995). Military team research: 10 years of progress. Military Psychology, 7(2), 55–75.
Salas, E., Bowers, C. A., & Rhodenizer, L. (1998). It is not how much you have but how you use it: Toward a rational use of simulation to support aviation training. International Journal of Aviation Psychology, 8, 197–208.
Salas, E., & Cannon-Bowers, J. A. (1997). Methods, tools, and strategies for team training. In M. A. Quinones & A. Ehrenstein (Eds.), Training for a rapidly changing workplace: Applications of psychological research (pp. 249–279). Washington, DC: American Psychological Association.
Salas, E., & Cannon-Bowers, J. A. (2001). The science of training: A decade of progress. Annual Review of Psychology, 52, 471–499.
Salzman, M. C., Dede, C., Loftin, R. B., & Chen, J. (1999). A model for understanding how virtual reality aids complex conceptual learning. Presence: Teleoperators and Virtual Environments, 8, 293–316.
Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training. Psychological Science, 3, 207–217.
Scott, T. D. (1983). Tactical engagement simulation after action review guidebook (Research Rep. No. 83-13). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Shea, J. B., & Morgan, R. L. (1979). Contextual interference effects on the acquisition, retention, and transfer of a motor skill. Journal of Experimental Psychology: Human Learning & Memory, 5, 179–187.
Shomstein, S., & Yantis, S. (2004). Configural and contextual prioritization in object-based attention. Psychonomic Bulletin & Review, 11, 247–253.
Smith-Jentsch, K. A., Johnston, J. H., & Payne, S. C. (1998). Measuring team-related expertise in complex environments. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 61–87). Washington, DC: American Psychological Association.
Smith-Jentsch, K. A., Salas, E., & Baker, D. (1996). Training team performance-related assertiveness. Personnel Psychology, 49, 909–936.
Smith-Jentsch, K. A., Zeisig, R. L., Acton, B., & McPherson, J. A. (1998). Team dimensional training: A strategy for guided team self-correction. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 271–298). Washington, DC: American Psychological Association.
Smithers, J. W., Wohlers, A. J., & London, M. (1995). A field study of reactions to normative versus individualized upward feedback. Group & Organization Management, 20, 61–89.
Stein, E. S. (1992). Air traffic control visual scanning (Rep. No. DOT/FAA/CT-TN92/16). Atlantic City International Airport, NJ: Federal Aviation Administration Technical Center.
Tannenbaum, S. I., & Yukl, G. (1992). Training and development in work organizations. Annual Review of Psychology, 43, 399–441.
Tversky, A., & Kahneman, D. (2002). Judgment under uncertainty: Heuristics and biases. In D. J. Levitin (Ed.), Foundations of cognitive psychology: Core readings (pp. 585–600). Cambridge, MA: MIT Press.
Van Buskirk, W. L., Moroge, J. L., Kinzer, J. E., Dalton, J. M., & Astwood, R. S. (2005). Optimizing training interventions for specific learning objectives within virtual environments: A Training Intervention Matrix (TIMx). Proceedings of the 11th International Conference on Human-Computer Interaction [CD-ROM].
Vlastos, G. (1983). The Socratic elenchus. In A. Price (Ed.), Oxford studies in ancient philosophy (Vol. 1, pp. 27–58). Oxford, England: Blackwell.
Walwanis Nelson, M. M., Owens, J., Smith, D. G., & Bergondy-Wilhelm, M. L. (2003). A common instructor operator station framework: Enhanced usability and instructional capabilities. Proceedings of the 18th Interservice/Industry Training Systems and Education Conference (pp. 333–340). Arlington, VA: American Defense Preparedness Association.
Wei, J., & Salvendy, G. (2004). The cognitive task analysis methods for job and task design: Review and reappraisal. Behaviour and Information Technology, 23, 273–299.
Weiten, W. (2001). Psychology: Themes and variations. Stamford, CT: Thomson Wadsworth.
Wickens, C. D., Alexander, A. L., Ambinder, M. S., & Martens, M. (2004). The role of highlighting in visual search through maps. Spatial Vision, 17, 373–388.
Wiegmann, D. A., & Rantanen, E. (2002). Defining the relationship between human error classes and technology intervention strategies (Tech. Rep. ARL-02-1/NASA-02-1). Urbana-Champaign: University of Illinois, Aviation Research Lab, Institute of Aviation.
SECTION 2
REQUIREMENTS ANALYSIS

SECTION PERSPECTIVE

Kay Stanney

The biggest mistakes in any large system design are usually made on the first day.
Dr. Robert Spinrad (1988), Vice President, Xerox Corporation (as cited in Hooks & Farry, 2001)
It is often suggested that the most important step in the training system development lifecycle is comprehensive requirements specification. Yet it is not uncommon to encounter system solutions that do not meet their intended objectives because of insufficiently articulated requirements (Young, 2001). In fact, it has been estimated that 80 percent of product defects originate during requirements definition (Hooks & Farry, 2001). Requirements should be specified such that training systems can be designed to overcome deficiencies in the knowledge, skills, or abilities (KSAs) required to perform a given job (see Milham, Carroll, Stanney, and Becker, Chapter 9). To develop such systems, designers must understand instructional goals so that the training/learning curriculum, content, layout, and delivery mode are designed to prepare personnel to operate, maintain, and support all job components in the required operational environment. To achieve these objectives, it is beneficial to adopt a systems approach to training systems requirements specification (Young, 2004). A systems approach seeks to identify training requirements based on an analysis of job performance requirements data. Training objectives are then formulated from the collected data; these objectives can in turn be used to assess a trainee's progress toward meeting the targeted training outcomes. By following a systematic requirements specification process, designers can ensure that training systems achieve a desired level of readiness.
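The systems approach just described is, at bottom, a traceable pipeline from job performance data to training objectives to trainee assessment. The fragment below is a minimal sketch of that flow under invented, simplified data; the gap scores, the 0.2 gap cutoff, and the 0.8 proficiency criterion are assumptions for illustration only, not values drawn from this chapter.

```python
# Illustrative pipeline: job performance data -> training objectives ->
# progress assessment. All field names and numeric values are hypothetical.
from dataclasses import dataclass

@dataclass
class TrainingObjective:
    task: str
    criterion: float  # required proficiency (0-1), assumed for illustration

def derive_objectives(ksa_gaps):
    """Formulate training objectives from job performance requirements data."""
    return [TrainingObjective(task, criterion=0.8)
            for task, gap in ksa_gaps.items() if gap > 0.2]

def assess_progress(trainee_scores, objectives):
    """Use the objectives to gauge a trainee's progress toward readiness."""
    return {o.task: trainee_scores.get(o.task, 0.0) >= o.criterion
            for o in objectives}

objectives = derive_objectives({"contact classification": 0.5,
                                "standard phraseology": 0.1})
print(assess_progress({"contact classification": 0.85}, objectives))
```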
LIFECYCLE APPROACHES

Several system development lifecycle (that is, system) approaches can be used to guide the training systems requirements engineering process (Sharp, Rogers, & Preece, 2007; Young, 2004). The most appropriate approach depends on the project type and available resources.

An early approach was the waterfall systems development lifecycle, which follows a linear-sequential model that focuses on completing one phase before proceeding to the next. It is easy to use and manage because of its rigid structure, and it works well for training systems that have very well-defined requirements at the outset or whose costs and schedule must be predetermined, as the progress of system development is readily measurable with this approach. For the same reason, it is not appropriate for complex training systems, where requirements are at moderate to high risk of changing: it is inflexible, slow, costly, and cumbersome.

In an effort to be more flexible and to accommodate changes in requirements more effectively than the waterfall approach, multiple iterations or spirals of requirements planning, gathering, analysis, design, engineering, and evaluation have been adopted (compare the incremental approach, the evolutionary lifecycle, and the spiral approach; Boehm, 1988). The Rapid Applications Development (RAD) approach (Millington & Stapleton, 1995) is an iterative approach that attempts to produce high quality systems quickly through iterative prototyping, active user involvement, and risk reduction via "time boxing" (that is, time-limited cycles of ~6 months to achieve system or partial system builds) and Joint Application Development (JAD) workshops in which users and developers convene to reach consensus on system requirements. With time boxing and JAD, RAD aims to produce systems quickly and with a tight fit between user requirements and system specifications; it can thus yield dramatic savings in time, cost, and development hours. However, the time-limited cycles can sacrifice quality, and there can be a tendency to push difficult problems to future builds in an effort to demonstrate early success. RAD is appropriate for training system development efforts that have a high level of user community support, but not for large, complex projects with distributed teams and limited user community involvement.

Another alternative is the agile system development lifecycle, which involves tight iterations (one to three weeks) through four phases (that is, warm-up, development, release/endgame, and production), with a product delivered for user feedback at the end of each iteration. During each iteration, developers work closely with users to understand their needs and to implement and test solutions that address user feedback. The agile approach is appropriate for training systems that have emergent and rapidly changing requirements, but less fitting for projects that cannot handle the increased system complexity/flexibility and cost often associated with it (for example, training systems with high reliability or safety requirements).

While there are several lifecycle approaches, for training systems development the suitable choice will likely be a flexible approach, such as RAD or agile system development.
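The timeboxed, feedback-driven logic that RAD and agile approaches share can be caricatured in a few lines. The sketch below is our simplification: the backlog items, iteration count, and stubbed feedback rule are invented, and in practice the feedback would come from JAD-style sessions with trainees and instructors rather than a function.

```python
# Caricature of a timeboxed iterative lifecycle: each fixed-length cycle
# builds a slice of the backlog, and user feedback reshapes later cycles.

def user_feedback(delivered):
    # Stub standing in for JAD-style review by trainees and instructors.
    return [f"refine {item}" for item in delivered]

def run_lifecycle(backlog, iterations, per_box):
    releases = []
    for i in range(1, iterations + 1):
        timebox, backlog = backlog[:per_box], backlog[per_box:]  # plan
        build = [f"iter {i}: {req}" for req in timebox]          # develop
        releases.extend(build)                                   # release
        backlog = backlog + user_feedback(build)  # emergent requirements queue up
    return releases

print(run_lifecycle(["scenario authoring", "after action review",
                     "instructor console"], iterations=3, per_box=1))
```

Under a waterfall model, by contrast, the backlog would be frozen up front and user feedback would arrive only after the final build, which is why that approach suits only projects with very well-defined requirements.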
REQUIREMENTS ENGINEERING

Regardless of the lifecycle approach followed during a systems development effort, each approach has a requirements engineering phase in which four central activities are conducted: (1) elicitation, (2) modeling and analysis, (3) specification, and (4) verification and validation (see Figure SP2.1; Young, 2001, 2004). Elicitation focuses on gathering requirements from stakeholders. The collected data are then modeled and analyzed to ensure each requirement is necessary, feasible, correct, concise, unambiguous, complete, consistent, verifiable, traceable, nonredundant, implementation free, and usable (Hooks & Farry, 2001; Young, 2004). Next, the requirements are documented in a specification document. Documented requirements are then verified and validated with stakeholders to ensure that the system being developed meets the needs of the intended stakeholders.

Figure SP2.1. Four Central Activities Involved in Training Systems Requirements Engineering

Elicitation

Requirements elicitation is the process by which the requirements for a training system are discovered. It comprises a set of activities that enables the understanding of the goals for, objectives of, and motivations for building a proposed training system. It seeks to garner an understanding of the target operational domain and associated tasks such that requirements can be elicited that result in a training system that satisfies its intended goals. It involves identifying target
stakeholders, identifying training system goals, identifying the types of requirements to be elicited, and then eliciting the requirements through a set of inquiry techniques.

IDENTIFYING STAKEHOLDERS

Requirements should address the viewpoints of a variety of stakeholders (that is, alternative viewpoints), including users (for example, trainees and instructors), customers (for example, the acquisition community), system developers, quality assurance teams, and requirements analysts. It is thus important to take a multiperspective approach to eliciting requirements to ensure all viewpoints are incorporated (Sharp et al., 2007).

IDENTIFYING TRAINING SYSTEM GOALS

Training system goals can be identified through training needs analyses (TNAs), changes to operational requirements, or changes to equipment or systems. TNA is a process of gathering and interpreting data from the target training community and operational environment in an effort to identify performance gaps and formulate training solutions (see Cohn, Stanney, Milham, Bell Carroll, Jones, Sullivan, and Darken, Volume 3, Section 2, Chapter 17). A comprehensive TNA provides the basis from which to design training solutions that realize substantial improvements in human performance by closing the identified performance gap in a manner that is compatible with current training practices.

Once the training gap is identified, a task analysis (TA) should be conducted in order to gain a complete understanding of the target operational context. The TA will identify specific training goals, including the specific tasks and associated knowledge (that is, foundational information required to accomplish target training tasks), skills (that is, an enduring attribute that facilitates learning on target training tasks), and abilities (that is, an enduring attribute that influences performance on target training tasks) to be trained. The TA will also uncover the supporting tools, resources (that is, references used to perform target training tasks), operational context (that is, platform, system, and environment), sensory modality and fidelity requirements, and use scenarios (see Phillips, Ross, and Cohn, Chapter 8) required for such training (Integrated Learning Environment [ILE], 2006). These goals can then be translated into learning/training and system requirements that drive the training systems development process.

TYPES OF REQUIREMENTS

A requirement is a statement of the capabilities, physical characteristics, or quality factors that a training system must possess to achieve its intended training objectives (Young, 2004). There are several types of requirements, which fall into two broad categories: functional requirements (that is, what the system must do) and nonfunctional requirements (that is, the constraints on the types of solutions that will meet the functional requirements, including user, usability, performance, interface, operational, environment, facility, resource, verification, acceptance, documentation, security, portability, quality, reliability, maintainability, and safety requirements). For training systems (ILE, 2006), additional requirements include trainee requirements, instructor requirements (which define instructor prerequisites and qualifications [for example, education, military rank, civilian grade, and experience] and the number of instructors required per training session [that is, the instructor/student ratio]), and curriculum requirements (for example, mission tasks to be trained, content, layout, delivery mode, course throughput, and so forth).

ELICITATION TECHNIQUES

There are several techniques for eliciting requirements (Zhang, 2007; Zowghi & Coulin, 2005). Traditional techniques include introspection, review of existing doctrine, analysis of existing data, interviews (both open-ended and structured), surveys, and questionnaires. Collaborative techniques include conducting focus groups and/or JAD/RAD workshops, prototyping, brainstorming, and participatory design (see Nguyen and Cybulski, Chapter 11). Cognitive techniques include task analysis, protocol analysis, and knowledge acquisition techniques (for example, card sorting, laddering, repertory grids, and proximity scaling techniques) (see Mayfield and Boehm-Davis, Chapter 7). Contextual approaches include ethnographic techniques (for example, participant observation and ethnomethodology), discourse analysis (for example, conversation analysis and speech act analysis), and sociotechnical methods (for example, soft systems analysis).

In general, structured interviews are one of the most effective elicitation techniques (Davis, Dieste, Hickey, Juristo, & Moreno, 2006). Observational techniques (that is, task analysis, protocol analysis, and contextual approaches) provide means by which to develop a rich understanding of the target domain and are effective in eliciting tacit requirements that are difficult to verbalize during interviews (Zhang, 2007). Each technique has strengths and weaknesses, and thus it is best to use a combination of approaches (Goguen & Linde, 1993) or a synthetic approach that systematically combines several of the techniques (Zhang, 2007). Table SP2.1 provides a comparative summary of requirements elicitation techniques.
Table SP2.1. Comparative Summary of Requirements Elicitation Techniques

Introspection (Traditional)
Strengths: Easy to administer.
Shortcomings: Cannot introspect contextual data; when experts are used as the source for introspection, they may not reflect the behaviors of naive trainees.

Reviewing existing doctrine (Traditional)
Strengths: Provides foundational knowledge from which to reason about tasks, environment, and other data sources; may provide detailed requirements for the current system.
Shortcomings: Doctrine does not always match reality; often verbose and replete with irrelevant detail.

Interviews (Traditional)
Strengths: Rich collection of data, both objective and subjective perceptions; represent multiple stakeholders' opinions; flexible, can probe in depth and adapt on the fly.
Shortcomings: Data volume can be cumbersome to structure and analyze; potential for large variability among respondents; difficult to capture contextual data.

Questionnaires (Traditional)
Strengths: Quickly collect data from a large number of respondents; can administer remotely; can collect subjective perceptions (for example, attitudes and beliefs) and target trainee characteristics.
Shortcomings: Difficult to capture contextual data; must be careful to avoid bias in respondent selection; difficult to analyze open-ended questions.

Focus groups; brainstorming; JAD/RAD workshops (Collaborative)
Strengths: Foster natural interaction between people; gauge reaction to early product concepts; foster stakeholder consensus and buy-in; team dynamics can lead to rich understanding of trainee needs.
Shortcomings: Ad hoc groups may be uncomfortable for participants; a few may dominate discussion; possibility of "groupthink."

Prototyping (Collaborative)
Strengths: Provides an early view of what is feasible; develops understanding of desires and possibilities; good when there is a high level of uncertainty about requirements; good for obtaining early feedback from stakeholders; stimulates discussion.
Shortcomings: Needs a system build to apply; can be costly; may assume trainees accept the prototype design when, in fact, they accept only its behaviors.

Task analysis (Cognitive)
Strengths: Obtains detailed task and rich contextual data.
Shortcomings: If not well structured, can be time consuming, and the rich data can be difficult to analyze.

Protocol analysis (Cognitive)
Strengths: Taps tacit knowledge; obtains contextual data if embedded in the work context; reveals shortcomings of existing systems.
Shortcomings: Based on introspection and thus may be biased or unreliable; cannot capture the social dimension.

Knowledge acquisition (Cognitive)
Strengths: Can elicit tacit knowledge, classification knowledge, hierarchical knowledge, and mental models; elicited knowledge is represented in a standardized format.
Shortcomings: Does not model performance and contextual data; assumes knowledge is structured.

Ethnographic (Contextual)
Strengths: Obtains rich contextual data; uncovers existing work patterns and technology usage; identifies division into and goals of social groups.
Shortcomings: Extremely labor intensive; rich data can be difficult to analyze; difficult to assess proposed system changes.

Discourse analysis (Contextual)
Strengths: Identifies division into and goals of social groups; considers the effect of a new system on the existing social structure; uncovers the value system of the organization; taps tacit knowledge.
Shortcomings: Only applicable to situations with substantial social interaction and verbal data; labor intensive; cannot readily be applied prior to the prototype stage.
Modeling and Analysis

Once requirements data have been elicited, the data must be modeled and analyzed to reduce them into a form amenable to specification. In terms of modeling, an effort is made to express the requirements in terms of one or more models that support further analysis and specification. Through delineation of precise models, details missed during elicitation can be identified; the resulting models can also be used to communicate the requirements to developers (Cheng & Atlee, 2007). There are several approaches to modeling functional requirements, and these can focus on the organization (for example, Enterprise Models), the data (for example, Entity-Relationship-Attribute models), the functional behaviors of stakeholders and systems (for example, object-oriented models), the stakeholders (that is, viewpoint models), or the domain itself (that is, a model of the context in which the system will operate) (Kovitz, 1999; Nuseibeh & Easterbrook, 2000). Nonfunctional requirements are generally more difficult to model, but it is still essential to operationalize them such that they are measurable (see Cohn, Chapter 10). In terms of training system design, the modeling process should allow learning theory and research to be built into the model such that it guides training system design.

Once the requirements data have been modeled, they can be more readily analyzed. Requirements analysis focuses on evaluating the quality of captured requirements and identifying areas where further elicitation and modeling may be required. As with modeling, there are several approaches to requirements analysis, including requirements animation (for example, simulating operational models or goal-oriented models), automated reasoning (for example, goal-structured analysis, analogical and case based reasoning, and knowledge based critiquing), and consistency checking (for example, model checking in terms of syntax, data value and type correctness, circularity, and so forth).

Whichever analysis approach is used, when developing requirements for training systems, the analysis process must first identify the training gap (see Figure SP2.2). Then the analysis should identify triggers that serve as learning or training requirements that fill the gap (ILE, 2006).

Figure SP2.2. Learning Analysis Process (Adapted from ILE, 2006)

For each learning/training requirement, the analysis must identify the critical tasks to be trained, as well as the desired levels of proficiency (see Phillips, Ross, and Cohn, Chapter 8), targeted KSAs, necessary tools, resources, sensory cues, fidelity levels, desired performance outcomes (cognitive, affective, psychomotor, verbal, and social), and training criteria by which to assess acceptable performance (that is, performance standards) (see Milham, Carroll, Stanney, and Becker, Chapter 9). It must also specify the training context (that is, use cases) and desired training conditions (for example, platform, environment, time pressure, and stress level) required to achieve training objectives. The desired training characteristics should also be specified (for example, learning curve, coordination/teaming requirements, chain of command, anticipated performance errors, and remediation strategies), as should the estimated costs of the training and the associated return on investment (see Cohn, Chapter 10). Training tasks can then be grouped by skill requirements and prioritized.

The result of the learning analysis process is the generation of a learning objective statement (LOS), which "establishes content (and training technology) linkage with the full spectrum of work proficiency required for mission readiness and professional expertise" (ILE, 2006). The LOS aligns the identified training gaps, training objectives (that is, targeted KSAs), content, sequencing, delivery mode, student assessment, and program evaluation. In generating the LOS, it is important to recognize that the effectiveness of a training solution for addressing each targeted KSA depends on the fidelity of the training solution. Cohn et al. (2007) suggest developing a "blended" training solution: initial acquisition of declarative knowledge and basic skills is trained via classroom lectures and low fidelity training solutions; basic procedural knowledge and problem solving skills are trained and practiced via medium fidelity training solutions; and consolidation of learned declarative knowledge, basic skills, and procedures, practice of acquired knowledge and skills (for example, mission rehearsal), and development of more advanced strategic knowledge and tactical skills are trained via high fidelity training solutions. If, during the modeling stage, the resultant models are linked to learning theory, this should facilitate the development of a blended training solution.
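The blended-fidelity recommendation of Cohn et al. (2007) can be read as a simple lookup from KSA acquisition stage to training-solution fidelity. The following minimal Python sketch makes that mapping explicit; the stage labels are paraphrases of the text, not an official taxonomy.

```python
# A minimal encoding of the "blended" fidelity mapping suggested by
# Cohn et al. (2007), as summarized above. Stage names are paraphrases.

BLENDED_FIDELITY = {
    "initial acquisition of declarative knowledge and basic skills":
        "classroom lecture / low fidelity",
    "practice of procedural knowledge and problem-solving skills":
        "medium fidelity",
    "consolidation, mission rehearsal, strategic and tactical skills":
        "high fidelity",
}

def recommend_fidelity(ksa_stage: str) -> str:
    """Look up the training-solution fidelity for a KSA acquisition stage."""
    return BLENDED_FIDELITY.get(ksa_stage, "unmapped: revisit learning analysis")

print(recommend_fidelity("practice of procedural knowledge and problem-solving skills"))
```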
Specification

The requirements specification process involves communicating requirements, requirements management, and requirements traceability (Hooks & Farry, 2001; Kovitz, 1999; Nuseibeh & Easterbrook, 2000; Sharp et al., 2007). The manner in which requirements are documented influences how effectively the requirements can be interpreted and realized in the training system design. Each documented requirement should be necessary, feasible, correct, concise, unambiguous, complete, consistent, verifiable, traceable, nonredundant, implementation free, and usable (Young, 2004). There are a number of specification languages, ranging from formal (for example, Z notation and the Vienna Development Method) to semiformal (for example, State-Transition Diagrams, the Unified Modeling Language, and Data Flow Diagrams) to informal (for example, natural language), that can be used to document training system requirements specifications. Companies often adopt specification templates that are specific to their needs. The LOS is a template that can be followed for training system requirements specification. Learning objective statements are generally broken into three main components (Arreola, 1998):

1. An action word that describes the targeted performance (that is, competency) to be demonstrated after training;
2. A statement of the performance criterion (that is, performance standard) that represents acceptable performance;
3. A statement of the conditions under which the trainee is to perform during training.
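Arreola's (1998) three-component structure lends itself to a simple record type. The sketch below is one illustrative way to capture an LOS in Python; the field names and the example objective are hypothetical.

```python
# A sketch of the three-component learning objective statement (LOS)
# structure from Arreola (1998). Field names and example values are
# illustrative assumptions.

from dataclasses import dataclass

@dataclass
class LearningObjectiveStatement:
    action: str      # targeted performance (competency) demonstrated after training
    criterion: str   # performance standard defining acceptable performance
    conditions: str  # conditions under which the trainee performs during training

    def render(self) -> str:
        return f"Given {self.conditions}, the trainee will {self.action} {self.criterion}."

los = LearningObjectiveStatement(
    action="identify all tracks in the assigned sector",
    criterion="with at least 95 percent accuracy",
    conditions="a simulated air defense scenario under time pressure",
)
print(los.render())
```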
Kovitz (1999) provides heuristics that can be followed to improve the quality of requirements documentation, regardless of the format in which requirements are documented. Requirements traceability relates to how readily requirements can be read, navigated, queried, and changed (Gotel, 1995; Nuseibeh & Easterbrook, 2000). It can be achieved by cross-referencing, by using specialized templates that provide links between document versions, or by restructuring the requirements specification according to an underlying network or graph that keeps track of requirements changes. A general-purpose tool (for example, hypertext editor, word processor, and spreadsheet), which supports cross-referencing between documents, or a database management system, which provides tools for documenting, editing, grouping, linking, and organizing requirements, can be used to support traceability.
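As a toy illustration of traceability through cross-referencing, the following Python fragment links requirements to downstream design and test artifacts so that the impact of a requirement change can be queried; all identifiers are hypothetical.

```python
# A toy illustration of requirements traceability via cross-referencing,
# in the spirit of the database approach described above. All identifiers
# are hypothetical.

from collections import defaultdict

trace = defaultdict(list)                    # requirement id -> downstream artifacts
trace["REQ-017"] += ["DES-004", "TEST-031"]  # a design element and a test case
trace["REQ-018"] += ["DES-004"]

def impacted_artifacts(req_id: str) -> list:
    """Query which downstream artifacts must be revisited if a requirement changes."""
    return trace.get(req_id, [])

print(impacted_artifacts("REQ-017"))  # ['DES-004', 'TEST-031']
```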
Verification and Validation

Verification and validation (V&V) is the process of ensuring a training system meets its functional and nonfunctional requirements. Verification focuses on ensuring a training system design satisfies its specified requirements (Young, 2001). Verification seeks to provide evidence, through cognitive walkthroughs, functional and performance testing, and demonstrations, that the requirements have been met in the delivered training system. Requirements validation involves certifying that the requirements model is correct in terms of the instructor's intentions, thereby ensuring that the right problem (that is, the training gap) has been solved in the delivered system. Validation establishes that the requirements model specifies a training system solution that is appropriate for meeting trainees' and instructors' needs and thus often requires the involvement of stakeholders (see Nguyen and Cybulski, Chapter 11). During validation, the focus is on evaluating external consistency (that is, agreement between the requirements model and the problem domain), ambiguity (that is, whether each requirement can be interpreted only one way), minimality (that is, no overspecification), and completeness (that is, no omission of essential information needed to make the requirements model valid).

To ensure the effectiveness of training systems, it is essential that the V&V process consider how best to evaluate training effectiveness (see Cohn, Stanney, Milham, Carroll, Jones, Sullivan, and Darken, Volume 3, Section 2, Chapter 17). The training effectiveness evaluation should assess how effectively the training system design facilitates development and maintenance of targeted training objectives and transfer of training back to the targeted operational environment. Validation can be conducted via a combination of inspection, formal approaches (for example, automated reasoning, such as analogical and case based reasoning, knowledge based critiquing, and consistency checking), and informal techniques (for example, prototyping, running scenarios, animating requirements, and conducting cognitive walkthroughs). The V&V process aims to ensure requirements have been interpreted correctly in the training system solution, are effective and reasonable in filling the training gap, and are recorded in a manner that can be readily verified.

STATE OF THE ART IN REQUIREMENTS SPECIFICATION

Requirements engineering has evolved over the past few decades into a practice supported by various techniques and tools. Over this time, the following aspects have been identified as best practices in requirements specification: (1) conducting requirements modeling and analysis within the organizational and social context in which the training system is to operate; (2) developing requirements specifications that are solution independent, with a focus on modeling stakeholders' goals and scenarios that illustrate how these goals are to be achieved, rather than modeling the desired functionality of a new system (that is, information flows and system state); and (3) placing emphasis on analyzing and resolving conflicting requirements and reasoning with models that contain inconsistencies (Nuseibeh & Easterbrook, 2000).

As we look to the future of training systems requirements engineering (see Table SP2.2), several areas are important to address (Cheng & Atlee, 2007; McAlister-Kizzier, 2004; Nuseibeh & Easterbrook, 2000):

• In terms of elicitation, there is a need to develop technologies that improve the precision, accuracy, and variety of requirements elicited, particularly with regard to techniques for identifying stakeholders (that is, instructors, trainees, and so forth) and their requirements with respect to the particular target context/environment.

• In terms of elicitation, there is a need to develop tools that assist in structuring ethnographic observations and instructor interviews so that they are targeted toward (1) identifying learning objectives, associated KSAs, and performance criteria and (2) better characterizing the instructional setting, group dynamics, communication patterns/modes, organizational/political climate, and other cultural factors that influence instruction.

• In terms of modeling, considerable recent research has focused on improving scenario based modeling approaches, and the focus is now on developing techniques for creating, combining, and manipulating models (for example, model synthesis, model composition, and model merging) and supporting requirements model reuse.

• In terms of modeling, emphasis should be placed on developing models that build learning theory and research into the modeling process to guide training system design.

• In terms of modeling, there is also a need to develop modeling techniques that deal with inconsistent, incomplete, and evolving requirements models; of particular value would be self-managing systems that accommodate varying, uncertain, incomplete, or evolving requirements at run time. This is particularly important given the dynamic nature of CONOPS (concepts of operations).

• In terms of analysis, there is a need to develop techniques that effectively link learning analysis to training system design (that is, processes that enable translation of learning objectives into training system designs). Such techniques should aim to systematically specify a blended fidelity training solution, which integrates the optimal mix of classroom instruction, training technologies, and live events throughout a given course of training, to ensure that a desired level of readiness is achieved.

• In terms of analysis, there is also a need to develop improved techniques for analyzing the quality of elicited requirements, particularly with regard to revealing misunderstandings or questions that require further elicitation, and techniques for prioritizing requirements such that an optimal combination of requirements can be identified and implemented.

• In terms of modeling and analysis, there is a need to develop new techniques for formally modeling and analyzing contextual properties of the target environment in which the training system is to be immersed, and for bridging the gap between current contextualized elicitation approaches and their inputs (for example, multimedia inputs such as video and audio) and more formal specification and analysis techniques.

• In terms of specification, there is a need to develop requirements management techniques that (1) automate the task of documenting traceability links among requirements and between requirements and downstream artifacts, (2) determine the maturity and stability of elicited requirements, (3) support scaling of requirements specification (that is, techniques that allow for organizing and manipulating large numbers of requirements), and (4) support security of requirements databases.

• In terms of specification, there is also a need to develop requirements specification tools that allow instructors to quickly update and modify training system requirements to reflect the changing tactics, techniques, and procedures that target evolving opponent strategies.

• In terms of specification, there is a need to develop techniques for analyzing the impact a particular training system architectural choice (for example, delivery mode) has on the ability to satisfy current and future requirements.

• In terms of specification, there is also a need to support globalization through the development of requirements specification techniques that support outsourcing of downstream development tasks (that is, filling the gap between requirements specification and development teams that are geographically distributed).

• In terms of verification and validation, there is a need to develop techniques for evaluating how effectively a training system design specification facilitates the development and maintenance of targeted training objectives and transfer of training back to the targeted operational environment.

• In terms of verification and validation, there is a need to develop tools for supporting the use of novelty, value, and surprisingness to explore the creative outcome of requirements engineering (see Nguyen and Cybulski, Chapter 11).

• In terms of verification and validation, there is a need to develop improved methods (for example, animations and simulations) for providing information to stakeholders (that is, instructors, trainees, and so forth) to elicit their feedback.

Table SP2.2. Future Needs in Training Systems Requirements Engineering

Elicitation
• Technologies that improve the precision, accuracy, and variety of requirements elicited
• Tools for structuring ethnographic observations and instructor interviews to support requirements elicitation

Modeling
• Techniques for creating, combining, and manipulating models (for example, model synthesis, model composition, and model merging)
• Tools that support requirements model reuse
• Tools that build learning theory and research into the modeling process to guide training system design
• Techniques that deal with inconsistent, incomplete, and evolving requirements models (for example, self-managing systems)

Analysis
• Techniques that effectively link learning analysis to training system design (that is, processes that enable translation of learning objectives into training system designs)
• Techniques for supporting the prioritizing of requirements
• Techniques for bridging the gap between current contextualized elicitation approaches and their inputs (for example, multimedia inputs such as video and audio) and more formal specification and analysis techniques

Specification
• Requirements management techniques that automate traceability, determine the maturity and stability of elicited requirements, support scaling of requirements specification, and support security of requirements databases
• Tools that allow instructors to quickly update and modify training system requirements to reflect the changing tactics, techniques, and procedures that target evolving opponent strategies
• Techniques for analyzing the impact a particular training system architectural choice (for example, delivery mode) has on the ability to satisfy current and future requirements
• Techniques that support outsourcing of downstream development tasks (that is, filling the gap between requirements specification and development teams that are geographically distributed)

Verification and Validation
• Techniques for evaluating how effectively a training system design specification facilitates the development and maintenance of targeted training objectives and transfer of training back to the targeted operational environment
• Tools for supporting the use of novelty, value, and surprisingness to explore the creative outcome of requirements engineering
• Methods (for example, animations and simulations) for providing information to stakeholders (that is, instructors, trainees, and so forth) to elicit their feedback
CONCLUSIONS

This chapter has provided an overview of the training systems requirements specification process (that is, elicitation, modeling and analysis, specification, and verification and validation), discussed some of the issues associated with the process and the value of good requirements specification, and contemplated future directions for the field. In order to avoid the "big mistakes" that often transpire during large training system development efforts, it is essential to adopt rigorous requirements engineering practices that fully characterize the capabilities, physical characteristics, and quality factors that a training system must possess to achieve its intended training objectives.

REFERENCES

Arreola, R. A. (1998). Writing learning objectives. Retrieved July 2, 2007, from http://www.utmem.edu/grad/MISCELLANEOUS/Learning_Objectives.pdf

Boehm, B. W. (1988). A spiral model of software development and enhancement. IEEE Computer, 21(5), 61–72.

Cheng, B. H. C., & Atlee, J. M. (2007). Research directions in requirements engineering. In Future of Software Engineering (FOSE'07) (pp. 285–303). Los Alamitos, CA: IEEE Computer Society.

Cohn, J. V., Stanney, K. M., Milham, L. M., Jones, D. L., Hale, K. S., Darken, R. P., & Sullivan, J. A. (2007). Training evaluation of virtual environments. In E. L. Baker, J. Dickieson, W. Wulfeck, & H. O'Neil (Eds.), Assessment of problem solving using simulations (pp. 81–105). Mahwah, NJ: Lawrence Erlbaum.

Davis, A., Dieste, O., Hickey, A., Juristo, N., & Moreno, A. M. (2006). Effectiveness of requirements elicitation techniques: Empirical results derived from a systematic review. In 14th IEEE International Requirements Engineering Conference (RE'06) (pp. 179–188). Los Alamitos, CA: IEEE Computer Society.

Goguen, J. A., & Linde, C. (1993). Techniques for requirements elicitation. In S. Fickas & A. Finkelstein (Eds.), Proceedings, Requirements Engineering '93 (pp. 152–164). Los Alamitos, CA: IEEE Computer Society.

Gotel, O. (1995). Contribution structures for requirements traceability. London: Imperial College, Department of Computing.

Hooks, I. F., & Farry, K. A. (2001). Customer-centered products: Creating successful products through smart requirements management. New York: American Management Association.
ILE. (2006). Navy ILE instructional systems design and instructional design process (MPT&ECIOSWIT-ILE-GUID-1). Retrieved May 25, 2007, from https://ile-help.nko.navy.mil/ile/contentItems/Navy%20ILE%20ISD%20Process_20070815.pdf

Kovitz, B. L. (1999). Practical software requirements: A manual of contents and style. Greenwich, CT: Manning Publications.

McAlister-Kizzier, D. L. (2004, February). Research agenda to assess the effectiveness of technologically mediated instructional strategies. Paper presented at the 23rd Annual Organizational Systems Research Association (OSRA) Conference, Pittsburgh, PA.

Millington, D., & Stapleton, J. (1995). Special report: Developing a RAD standard. IEEE Software, 12(5), 54–56.

Nuseibeh, B., & Easterbrook, S. (2000). Requirements engineering: A roadmap. In Proceedings of the Future of Software Engineering (pp. 35–46). New York: ACM Press.

Sharp, H., Rogers, Y., & Preece, J. (2007). Interaction design: Beyond human-computer interaction (2nd ed.). Hoboken, NJ: John Wiley & Sons.

Young, R. R. (2001). Effective requirements practices. Boston: Addison-Wesley.

Young, R. R. (2004). The requirements engineering handbook. Boston: Artech House.

Zhang, Z. (2007, March). Effective requirements development: A comparison of requirements elicitation techniques. Paper presented at System Quality and Maintainability (SQM2007), Amsterdam, The Netherlands. Retrieved June 19, 2007, from http://www.cs.uta.fi/re/rem.pdf

Zowghi, D., & Coulin, C. (2005). Requirements elicitation: A survey of techniques, approaches, and tools. In A. Aurum & C. Wohlin (Eds.), Engineering and managing software requirements (pp. 19–46). Heidelberg, Germany: Springer-Verlag.
Part V: Methods
Chapter 7
APPLIED METHODS FOR REQUIREMENTS ENGINEERING

Tom Mayfield and Deborah Boehm-Davis

Virtual environments (VEs) are an exciting and powerful medium that promises much in the way of realistic and accurate operating scenarios, more cost-effective training, and better training transfer (Cohn, Volume 1, Section 2, Chapter 10; Foltz, LaVoie, Oberbreckling, and Rosenstein, Volume 1, Section 3, Chapter 17). Specifically, VEs offer the following potential operating and training benefits:

• Simpler training and reduced operator error through better system/equipment interfaces;
• Reduced downtime, improved efficiency, and economy in personnel during operations;
• Downstream cost savings due to the reduced likelihood of costly design changes;
• Interaction at the knowledge level that leads to improved understanding of the process being controlled and a more proactive approach to operability;
• Reduced reliance on rule behavior, which reduces the risk of mindset incidents and places less emphasis on procedural training to cope with specific emergencies, which rarely occur in the way envisaged.
However, realizing these potential benefits requires careful attention in developing the requirements for the VE. In most respects, developing requirements for VEs is no different from developing requirements for any other complex system. Specifically, developers should follow basic human factors processes, including the use of task analysis techniques to understand the functions that must be accomplished by the system. This chapter will describe methods that can be used to decide on the training requirements and evaluate the VE proposed to meet the tasks underpinning those requirements. Specifically, deciding which tasks and features of tasks are essential to creating a successful training program can be carried out only if a detailed task analysis breakdown is done. Task analysis (TA) is a universal human factors (HF) methodology for decomposing complex activities into understandable “chunks.” TA in its various forms is the only breakdown methodology that will give the VE
designer the detail needed to understand the functions that must be accomplished by the system and to decide on the level of fidelity required. Use of TA techniques is important in developing the constraints that will influence performance in VE training environments. Although development has traditionally taken a technology-centered approach, successful training systems need to be informed by a human-centered approach.

Requirements specification also needs to be informed by the trade-off between cost and realism. In evaluating cost, it is important to evaluate both the cost of producing the trainer and the cost of not adequately training the users. Although the development of a VE for training may be expensive, the cost of failure to train may be exponentially higher. Such trade-offs may become apparent when major emergencies occur, such as at Three Mile Island, where the inability to simulate the exact emergency conditions during training led to wrong assumptions being made that turned a recoverable incident into a major disaster (President's Commission on the Accident at Three Mile Island, 1979), or the British Airways Flight 009 incident, where the captain's unorthodox use of the Boeing-747 training simulator to simulate a four-engine failure helped him to prevent a possible crash into the ocean and loss of life, thanks to his greater understanding of the aircraft's behavior (Diamond, 1986).

Realism covers both the level of fidelity of individual features of the simulation and its overall appearance. The history of simulation suggests that careful attention to the tasks that need to be trained can allow developers to build simulators that represent needed functionality at a reasonable cost. Early aircraft trainers (such as the Link trainers designed by Edward A. Link in the United States in the 1930s; L3 Communications, 2007) were little more than boxes with hoods, with basic aircraft controls and displays, and a few degrees of freedom to simulate flight. However, they were sufficient to train basic familiarity with the controls, and they reduced training time in the actual aircraft. Vehicle manufacturers, whether of cars, tanks, ships, or aircraft, have used mock-ups to show layouts, often in wood, with paper drawings to represent the operating panels. The nuclear industry, through regulatory requirements, provides full-scale simulators of power plants that represent every facet of the design from layout to operational verisimilitude.

Those who plan to use VEs should carefully consider what functions and features need to be represented in the training environment. In the case of VEs, this will include the physical tasks associated with access and egress and with movement and manipulation of tools and controls, as well as the cognitive tasks associated with operating the virtual world. It is also important to understand the elements of tasks to be simulated in the VE and identify "purposeful behavior" (Cockayne & Darken, 2004). This behavior relates mainly to the physical movement component of the task (motor), but extends to knowing what to do or where to go (cognitive) and when to start and stop (perceptual). In some cases, features will be incorporated in the VE because they are central to the tasks that are being trained. In other cases, certain features that are not critical to the task being trained need to be incorporated to create a realistic scenario that will be accepted by the user. For example, in training a pilot, simulating
access and egress to the vehicle may not be critical for training flight skills, but strapping in and safety checks may need to be simulated to set up realistic scenarios. Similarly, learning to train and fire a tank main armament may not require learning the task of removing shell cases, at least in initial training. However, if shell cases create interference that must be addressed in operating the armament, it may need to be included in the final simulation.

The level of fidelity with which individual features and functions must be represented in the synthetic environment in order to effectively convey needed knowledge to the user must also be considered. For example, one might consider whether an object can be shown as a complete entity, that is, with no further breakdown needed, or whether it must be shown as individual parts so that trainees can carry out specific tasks on each part. However, this may be difficult to determine a priori. Take, for example, a situation where a pilot is trained to fly through a particular region using video rather than interactive world mapping. The impact on cognitive processing and effectiveness of operation may not be clear. In this example, the lack of interactivity means that the training experience will bear only a passing resemblance to what the pilot will experience in the real world; however, the impact on performance is not as clear.

In the following sections, TA methods are outlined that will provide techniques for realizing such requirements (see Table 7.1). The methods described in the first two sections (data collection and description) allow the analyst to identify requirements for which tasks need to be represented in the VE. The following three sections identify information that can be used to modify the initial requirements based on issues of risk, system performance, or interactions among different users of the system.

Table 7.1. Task Analysis Methods for Requirements Gathering

Collect Task Requirements Data
Issues Identified: Information on normal, standard, and emergency procedures; information on users (individuals and groups); timing
Outcomes: Collection of individual pieces of data needed to identify tasks to be represented in the VE

Describe the Data Collected
Issues Identified: Information on timing and the appropriate level of analysis for individual tasks
Outcomes: Static or dynamic representation of the tasks required in the VE

Assess Risk
Issues Identified: Information on personal and system safety
Outcomes: Modifications to requirements identified for the VE

Assess System Performance
Issues Identified: Information on system performance and potential improvements
Outcomes: Modifications to requirements identified for the VE

Assess Interactions across Levels of the Organization
Issues Identified: Information on selection, training, and workplace design; communication and allocation of function; mental models and device design
Outcomes: Modifications to the requirements identified for the VE
KEY ELEMENTS TO CONSIDER IN DEVELOPING THE TA

In developing a TA for a virtual environment, several key features should be considered. First is timing. As with many systems, it is important to begin the analysis early in the virtual environment development process. The benefits of early TA include user acceptance and less rework.

A second key element is breadth. In developing a virtual environment, it is important to cover only those requirements necessary to capture the relevant tasks. This means that decisions will need to be made up front about what will be required to make the virtual environment work.

A third element is selection of the appropriate level of analysis. In general, analysis levels range from the macro to the micro level. At the macro level, concerns tend to focus on system-level issues, which means a focus on the physical and behavioral attributes of the system and on issues of person-to-person communication. At the micro level, the focus tends to be at the keystroke level, where the user is interacting with the system. At this level of analysis, cognitive attributes, such as the user's goal in executing particular actions, become important and therefore must be represented in the analysis. This level is where issues of perception, analysis, response times, and human error typically are considered. In developing requirements for a virtual environment, both macro and micro tasks will likely be necessary. Thus, it will be important to select techniques that fit with each of these levels of analysis (Preece, 1994).

A fourth element that must be considered is concurrency; that is, the analysis should take into account when resources may need to be shared and whether sufficient resources exist for such sharing. This concern for sufficient resources can be applied to a variety of purposes: optimizing performance or safety, reducing risk, understanding or improving operations, or developing training. In addition, TA techniques can provide information to specify the allocation of functions, the characteristics of the system users, or the level of staffing needed to improve the ability to share resources. These techniques can also provide information on job organization, task and interface design, and methods for skills and knowledge acquisition.

Finally, the TA process must provide for design feedback; that is, the TA process should be iterative, allowing for design changes based on initial testing or evaluation. For example, initial testing may reveal the possibility of simplifying the VE without compromising the training requirement. Suppose that the designer has been trying to render a complex-shaped piece of equipment. However, initial testing reveals that users need to know only that their access to another component may be blocked by this component during a maintenance operation. It may be sufficient to program this complex shape as a simple block rather than rendering it faithfully.
KEY ELEMENTS FOR A SUCCESSFUL TASK ANALYSIS

• Start early.
• Ensure that only relevant tasks are represented in the VE.
• Ensure the appropriate level of analysis.
• Ensure that sufficient resources are allocated to the system.
• Ensure an opportunity for design feedback.
COLLECTING TASK REQUIREMENTS DATA

Elicitation of task details can be seen as a three-legged stool. The first leg represents operating procedures, which provide a view of the system operation in the way it is expected to be used and which include safety requirements as well as standard and emergency modes of operation. The second leg represents the users' perceptions of the job and task breakdown, which can be obtained from discussion and interviews. Finally, the third leg represents observational data that show how the system is actually used, along with user "fixes" and shortcuts. Without one of the legs, the stool is unbalanced: in knowledge elicitation terms, the data are incomplete and will not reflect the system goals of all three stakeholding entities (design, operations, and users). The following sections discuss the three elements that make up the legs and show how data might be collected to support the development of requirements.

Procedures Data

In most descriptions of task data gathering, little if anything is said about the use of procedures. There are a number of reasons for this, not least that procedures are rarely attempted during the design stage, possibly due to continuous changes that may make procedures outdated as soon as they are written. Although individual pieces of equipment are often delivered with a manual, system operation may not be fully defined. In fact, it is not uncommon to find procedures still being written after installation and even during commissioning. When developing VEs for training, however, it is likely that the system already has been designed and implemented (see Ricci, Owen, Pharmer, and Vincenzi, Volume 3, Section 1, Chapter 2), so there may well be normal, standard, and emergency operating procedures available. These procedures manuals provide a valuable first cut at developing the task analysis. They provide basic data on how the designer expected the system to work and are often further validated by the installation engineers once their job is completed. These data often are in an easily assimilated form and may be broken down by person and place, as well as by control and display. With the availability of modern computing systems, hard copy manuals and procedures will usually be backed up by interactive electronic manuals, which make deriving the task analysis even less time consuming.
The procedural data may need to be transcribed into a more usable form for the data description process, and this process may differ depending on whether the data will be analyzed by the same person who collected the data or someone new. Characterizing the procedural tasks into taxonomies (Fleishman & Quaintance, 1984) is one way of sorting out real world tasks for comparison with their virtual world counterparts to aid the VE designer. Generally, the transcription is done through a spreadsheet or simple database. With some training, direct transcription to a hierarchical task analysis (HTA), task decomposition, or even an operational sequence diagram (OSD) can be achieved (Kirwan & Ainsworth, 1993; Shepherd, 2001).

Data Elicited from Operators/Users

In collecting data from operators/users, it is important to recognize that there may be different "types" of users operating the system. The views of each of the user groups are likely to differ, as their perception of the tasks will differ from both the procedures and the observed actions. It is also important to assess the ability of the operator to use the technology and to recognize that all the users must be matched to the task demands of the system. Further, career-training avenues need to be available within the organization for skill advancement and retention of personnel. Thus, it will be important to collect data from all types of users. Data from these user surveys can be used to build a database of operating information that can be structured into guidance for VE designers. Typical user data elicitation techniques include the following (see Volume 1, Section 2 Perspective):

• Questionnaires: sets of predetermined questions in a fixed order containing closed (Yes/No), open-ended (How do you . . . ?), level of agreement, and/or level of preference or attitude items;
• Structured one-on-one interviews: sets of predetermined questions asked face-to-face so that open-ended questions or indecisive answers can be explored;
• Round table group discussions: may elicit team task behaviors or, with a group of the same users, allow interaction within the group to bring out task behaviors individuals might forget;
• Walkthroughs/talkthroughs: one-on-one interviews structured around describing user actions without the benefit of the equipment or system in front of the user;
• Verbal protocols: one-on-one interviews while using the equipment or system, where the user explains why an action is being taken.
Each method has advantages and disadvantages (Kirwan & Ainsworth, 1993). The first four, with their total reliance on user memory, can all suffer from incomplete information. However, they are helpful as users will provide details of shortcuts, linking activities (not covered in procedures, but found to be necessary to complete a process), and alternative ways of achieving operating goals. Any discrepancies between what the procedures say should be done and what the users say they do will be confirmed by collecting observational data as described below.
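One way to make the three-legged stool concrete is to triangulate the task lists produced by each leg and flag disagreements for follow-up elicitation. The Python sketch below assumes each source has already been reduced to a set of task labels; the tasks themselves are invented for illustration.

```python
# A small sketch of triangulating the three "legs" of task data: tasks named
# in procedures, tasks users report, and tasks actually observed. Task names
# are hypothetical; a real analysis would work at a finer grain.

procedures = {"verify pressure", "open valve A", "log reading"}
interviews = {"open valve A", "log reading", "tap gauge to unstick needle"}  # user shortcut
observed   = {"open valve A", "tap gauge to unstick needle"}

undocumented = (interviews | observed) - procedures  # fixes/shortcuts to follow up
unperformed  = procedures - observed                 # documented steps skipped in practice

print("Undocumented practice:", undocumented)
print("Documented but not observed:", unperformed)
```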
Observational Data

The capacity of any human observer is necessarily limited; thus, the collection of observational data should initially be guided by expert opinion or previous data/research in the context. However, the collection of ancillary data should be as broad as possible (for example, videotapes, activity logs, communication transcripts, and so forth) so that new viewpoints or theories can also be explored on an ad hoc basis. Care should be taken to make both the observation and the ancillary measures as unobtrusive and nonreactive as possible (Webb, Campbell, Schwartz, & Sechrest, 1966).

Generally, naturalistic observation methods and case studies are used in psychology to gain a detailed account of a system's structure and functioning over time. Methods drawn from different disciplines tend to focus on different levels as the basis for observation and analysis. For example, protocol analysis has been used by cognitive psychologists to gain a view of process at a level on the order of seconds to minutes (Ericsson & Simon, 1980, 1984). Cognitive task analysis techniques, also used by cognitive scientists, create a more detailed analysis of behavior at the level of milliseconds (Schraagen, Chipman, & Shalin, 2000). These methods can be supplemented by interviews, questionnaires, or systematic surveys targeted at specific processes or events (Dillman, 1978). At the other end of the spectrum, ethnographic and other contextual methods (see Section 2 Perspective) used in anthropology (Brislin, Walter, & Thorndike, 1973) can give a similarly rich account of the thinking and actions of one or more actors in a cultural environment across longer time spans, such as days, weeks, or months. The use of these methods for preliminary observation will allow the requirements engineer to explore natural human system function and performance based on his or her current understanding while also extracting the maximum possible amount of information to support the formulation and evaluation of additional insights.

Another method for collecting task requirements data is observation of operators as they are engaged in task completion. A good observational technique "capture[s] all the significant visual and audible events, in a form which makes subsequent analysis reasonably easy, and which does not influence the performance of the task" (Kirwan & Ainsworth, 1993, p. 54). Observation can be done through a strict recording of the activities being observed (such as with the use of some form of video recorder), or it may be done through the use of individuals not engaged in the task who record their observations. These observers may be subject matter experts, but this is not a requirement. Data can be collected continuously or intermittently throughout the observational period. The most common form of intermittent observation is application of the activity sampling technique (Hoffman, Tasota, Scharfenberg, Zullo, & Donahoe, 2003), which measures the user's behavior at predetermined times. The technique starts with categorization of the activities that might take place over the course of the task. Then, a sampling schedule is developed. The analyst records the targeted behaviors at intervals, either noting by tally the number of times that
a specific activity is completed in a window of time or by capturing the sequence of activities completed within a time window (a simple sketch of such a tally follows the summary box below). Critical incident technique is a specific form of activity sampling that looks at the key events in an operation rather than preset or random samples. It is often used where the process has specific safety requirements, such as nuclear or chemical operation and processing.

COLLECTING TASK REQUIREMENTS DATA

Keys to Success
• Represent operating procedures that provide a view of the system operation in the way it is expected to be used and that include safety requirements, as well as standard and emergency modes of operation.
• Represent the users' perceptions of the job and task breakdown.
• Represent observational data that show how the system is actually used, along with user fixes and shortcuts.
Outcomes
• Normal, standard, and emergency procedures
• Information on users (individuals or groups, whether supervised or unsupervised)
• Information on activities conducted and timing of activities
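As noted above, here is a minimal illustration of an activity sampling tally in Python: observations recorded at fixed intervals are counted by category. The categories and the observation log are hypothetical.

```python
# A minimal activity-sampling tally, as described in the observational data
# section above: behaviors recorded at predetermined sampling points are
# tallied by category. Categories and the log are invented for illustration.

from collections import Counter

categories = {"monitor display", "operate control", "communicate", "idle"}

# One observation per 30 s sampling point over a 5 minute window:
log = ["monitor display", "operate control", "monitor display", "communicate",
       "monitor display", "idle", "operate control", "monitor display",
       "communicate", "monitor display"]

tally = Counter(obs for obs in log if obs in categories)
for activity, count in tally.most_common():
    print(f"{activity}: {count}/{len(log)} samples")
```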
METHODS FOR DESCRIBING DATA COLLECTED

Having collected the task data, the analyst needs to decide how the tasks should be broken down to provide the human factors input to the requirements analysis. That is, defining the requirements goals for the analysis is an important step in choosing the method to describe the data. In most cases, one specific method will not provide all the answers needed. For instance, HTA is a great method for breaking down complex procedures, but it does not provide a timeline, and concurrent tasks are difficult to show. Link analysis will show interconnecting tasks and provide a spatial representation, but tasks have to be identified separately, and again there is no timeline. Choosing the right method will depend partly on the type of data collection method, but mainly on the expected results from the task analysis. Using TA during design will generally mean that data are less reliable, and more fragmented, than when a TA is done on mature equipment or systems. For VE training systems, the analyst will be more concerned with ensuring that the VE designer has the tasks broken down to a sufficient level of detail such that the designer can program the virtual world with the most efficient use of memory and provide accurate response times. The following sections outline some of the more common task analysis methods suitable for gathering the requirements for the VE.
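Because an HTA is a goal hierarchy, it maps naturally onto a nested data structure. The Python sketch below shows one illustrative encoding, with a traversal that extracts the leaf-level tasks a VE would need to support; the pump-start decomposition is invented for illustration.

```python
# A sketch of a hierarchical task analysis (HTA) as a nested structure:
# goals decomposed into subtasks, with a traversal that enumerates the
# bottom-level tasks. The decomposition shown is hypothetical.

hta = {
    "0. Start pump": {
        "1. Confirm prerequisites": {
            "1.1 Check suction valve open": {},
            "1.2 Check discharge pressure": {},
        },
        "2. Operate pump": {
            "2.1 Press start": {},
            "2.2 Monitor flow for 30 s": {},
        },
    }
}

def leaf_tasks(node: dict) -> list:
    """Return the bottom-level tasks (those with no further decomposition)."""
    leaves = []
    for name, children in node.items():
        leaves += leaf_tasks(children) if children else [name]
    return leaves

print(leaf_tasks(hta))
```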
Task analysis can be seen as a way of taking complex relationships (systems and jobs) and breaking them down into progressively simpler elements (tasks and subtasks), which can then be used as requirements for the operational and training systems. For instance, many of the early methods were concerned with the more physical aspects of system operation and were aimed at breaking tasks down to a level that enabled the controls and displays needed to successfully carry out routine tasks to be defined. Typical of these task decomposition methods are HTA (Shepherd, 2001), OSDs, link analysis, and the task decomposition method itself. These methods are highly structured and demanding in time and resources, but provide a wealth of task-level descriptors that can be used for requirements gathering. Operational sequence diagrams are regularly used in military contexts, or wherever operations are tightly controlled and regulated. Link analysis has been used in a variety of commercial and military applications, with particular success in control room design and air traffic control operations. Hierarchical task analysis has been used to analyze nuclear power plant operations, to provide a basis for training manuals, and to reorganize organizations. It has been shown to work at the multisystem, system, equipment, and component levels.

It is particularly noteworthy that designers of cognitive task analysis methods have used the more physically based hierarchical task analysis as a descriptive basis for the goals, operators, methods, and selection rules used in GOMS (goals, operators, methods, and selection rules), natural GOMS language, and cognitive, perceptual, and motor GOMS (John & Kieras, 1996a, 1996b). This link between physical and cognitive TA is extremely useful when deciding on a level of analysis, particularly with reference to understanding both the physical and cognitive aspects of a system or equipment operation. It also serves as an information collection tool to systematically expand the basic description of the task element activities.

Static Descriptions (For Example, Charting/Networking)

A conventional decomposition method is the "tree" diagram as, for instance, in an organizational chart or a fault tree. Predominantly a static representation, an event is broken down into features that reflect the specific area under analysis. This might be the cause and effect of an accident or how one organizational entity interfaces with another. Generally the data will not describe individual tasks, but a collection of events or, as in the case of petri nets, interactive nodes that model the behavior of a system (Peterson, 1977). The descriptions provided by these methods might be more useful to the VE designer than some of the other task analysis methods, as they may reflect traditional engineering conventions.

Dynamic Descriptions (For Example, Simulations and Computational Models)

Another way of describing data that have been collected is through the use of simulations or models. Simulations reproduce the behavior of a system (or part of a system) and typically allow for interaction with the system. Models represent
some portion of a system that is of interest. Some include representations of user cognition, while others focus more on behavioral outcomes (Barnes & Beevis, 2003; Bisantz & Roth, 2008; Gugerty, 1993). Further, some models provide static predictions of performance (John & Kieras, 1996a, 1996b), while others are computational and stochastic, such as adaptive control of thought–rational (Anderson & Lebiere, 1998), executive process/interactive control (EPIC; Kieras & Meyer, 1997), Soar (Rosenbloom, Laird, & Newell, 1993), and the constraint based optimal reasoning engine (Eng et al., 2006). Most modeling techniques require expertise both in the application domain (for example, supervisory control) and in interface design. Each also makes assumptions about the primitive operations present in the system and about its users, which requires extensive experience with the model on the part of the designer. Fortunately, these models are starting to make the transition from academia to industry. Work is beginning to appear that exercises the models and tests their limits (Gray & Boehm-Davis, 2000; John & Vera, 1992). Gugerty (1993) has described some steps that might be taken to allow these models to be successfully applied in industry; in addition, work is being done to develop tools that can allow designers, even those without a cognitive psychology background, to apply these techniques (John, Prevas, Salvucci, & Koedinger, 2004; John & Salvucci, 2005).

DATA COLLECTION

Keys to Success
• Select analysis method
• Represent the task
a. Statically
b. Dynamically
Outcome
• Static or dynamic representation of the tasks required to be represented in the VE
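To make the contrast between static description and performance prediction concrete, the sketch below shows one way an analyst might encode a hierarchical task breakdown in software and roll rough time estimates up the tree, in the spirit of the static predictions of the GOMS family discussed above. It is a minimal sketch rather than any published tool; the task names and operator times are hypothetical placeholders.

```python
# Minimal sketch: a hierarchical task analysis (HTA) style tree whose leaf
# tasks carry rough time estimates, rolled up to the root in the spirit of
# the GOMS family's static performance predictions. Task names and times
# are hypothetical placeholders, not validated data.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    name: str
    time_s: float = 0.0              # estimate used for leaf tasks only
    subtasks: List["Task"] = field(default_factory=list)

    def estimate(self) -> float:
        """Leaf tasks return their own estimate; parents sum their children."""
        if not self.subtasks:
            return self.time_s
        return sum(t.estimate() for t in self.subtasks)

    def outline(self, depth: int = 0) -> None:
        """Print the decomposition as an indented outline."""
        print("  " * depth + f"{self.name} ({self.estimate():.1f} s)")
        for t in self.subtasks:
            t.outline(depth + 1)

# Hypothetical example: acknowledging a system alarm.
plan = Task("0. Acknowledge alarm", subtasks=[
    Task("1. Locate alarm indicator", time_s=1.1),
    Task("2. Silence audible alarm", subtasks=[
        Task("2.1 Move hand to panel", time_s=0.4),
        Task("2.2 Press acknowledge key", time_s=0.3),
    ]),
    Task("3. Verify alarm cause on display", time_s=2.0),
])

plan.outline()
print(f"Predicted total: {plan.estimate():.1f} s")
```

A representation of this kind also gives the VE designer a machine-readable artifact, echoing the earlier observation that tree based descriptions tend to map well onto traditional engineering conventions.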
METHODS FOR ASSESSING POTENTIAL SOURCES OF RISK

In developing training, it is critical to know what information needs to be conveyed to users or what experiences users need to have in order to remain safe when using the new system. In developing requirements for a VE to be used for training, it is important to identify the risk characteristics of the system so as to provide an accurate simulation in the VE of the problems that might be encountered. A number of methods are available to expert HF appraisers to assess the safety of existing or proposed systems. Barrier analysis and work safety analysis (Kirwan & Ainsworth, 1993) both examine the extent to which protective
measures exist within a system. Event trees (Gertman & Blackman, 1994; Kirwan & Ainsworth, 1993; Kirwan & James, 1989; Park, 1987) and failure modes and effects analyses (Crow, 2002; Kirwan & Ainsworth, 1993), both of which have their origins in systems reliability metrics, examine human reliability and the consequences that derive from human failure. Hazard and operability analyses attempt to identify system design issues based on input from expert personnel (Benedyk & Minister, 1998; Kirwan & Ainsworth, 1993), while influence diagrams (Howard, 1990) provide a graphic representation of the factors that are identified as contributory to safety problems. Finally, the management oversight risk tree technique (Johnson, 1973, 1980) allows for an examination of the influence that the management structure of an organization has on safety. Another approach to evaluating the human interaction with a system is the use of anthropometric models, such as JACK (Badler, 1989; Phillips & Badler, 1988; UGS, 2004), which allow system designers to construct a computer based representation of the product and animate its use based on three-dimensional models of the human. These models may use data from standards or empirical studies (Badler, 1989; Phillips & Badler, 1988; You & Ryu, 2004). They have proven useful in developing methods for training (requirements for how to train). For example, JACK has been used to develop safe procedures for manual handling operations, such as lifting or using objects in awkward postures, where the actual conditions are likely to be hazardous or a full-scale training rig is impracticable.

ASSESSING RISK

Keys to Success
• Apply event trees, failure mode and effects analyses, and anthropometric models to evaluate risk potential
Outcome
• Assessment of risk that can be used to
c. Identify the risk characteristics of the system that must be represented in the VE simulation
d. Modify the design of the system to reduce training requirements
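As an illustration of how failure modes and effects data might feed VE requirements, the sketch below ranks failure modes by the risk priority number convention commonly used in FMEA work (severity, occurrence, and detection each rated 1 to 10 and multiplied together). The failure modes and ratings are invented for illustration only.

```python
# Minimal sketch of a failure modes and effects analysis (FMEA) ranking
# using the common risk priority number convention: RPN = severity x
# occurrence x detection, each rated 1-10. The failure modes and ratings
# below are hypothetical placeholders.
failure_modes = [
    # (failure mode,                       severity, occurrence, detection)
    ("Harness release jams",                     9,          2,         4),
    ("Status display freezes",                   6,          3,         2),
    ("Audible warning masked by noise",          7,          4,         6),
]

def rpn(severity: int, occurrence: int, detection: int) -> int:
    return severity * occurrence * detection

# Rank the failure modes so the riskiest are addressed first, whether in
# the system design or in the VE scenarios that must simulate them.
ranked = sorted(failure_modes, key=lambda fm: rpn(*fm[1:]), reverse=True)
for mode, s, o, d in ranked:
    print(f"RPN {rpn(s, o, d):3d}  {mode}")
```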
METHODS FOR ASSESSING THE SYSTEM (FOR EXAMPLE, CHECKLISTS AND SURVEYS)

If the ultimate aim of task analysis is to improve the interface between humans and machines through a more detailed understanding of the tasks involved in system operations, then there needs to be a way of determining whether an improvement has been made. Typical methods that can help do this are checklists, surveys, and absolute judgment. The first two are fairly straightforward, and a number of
guidance documents exist to help carry out such assessments, such as NUREG-0700 (U.S. Nuclear Regulatory Commission, 1981) for nuclear power systems, MIL-STD 1472 (U.S. Department of Defense, 1999) for the military, and the Questionnaire for User Interface Satisfaction (Chin, Diehl, & Norman, 1987) for general interface concerns. Such documents provide a wealth of information, including what to look for, specific design requirements, and even what questions to ask. If part of the original TA data collection was done by questionnaire, then a basis for comparison is available to help determine whether an improvement has been made. Within the VE, an assessment of the training value can gauge the extent to which training transfer is perceived as effective and efficient (see Cohn, Stanney, Milham, Carroll, Jones, Sullivan, and Darken, Volume 3, Section 2, Chapter 17). Absolute judgment is assessment by a group of experts who reach agreement on a range of criteria applied to the system or equipment. Generally, this technique does not produce the level of accuracy that one might like (Miller, 1956; Nielsen & Phillips, 1993), especially if used instead of normal task analysis data gathering techniques, which yield much more consistent results. Expert assessments also can be limited in identifying usability and safety problems. For example, it has been shown (Rooden, Green, & Kanis, 1999) that any given expert is likely to identify a relatively unique subset of problems and that the problems identified will be influenced by the materials available for review (for example, a real product, a set of drawings, a mock-up of the product, or a video of user trials).

ASSESSING SYSTEM PERFORMANCE

Keys to Success
• Apply checklists, surveys, and expert appraisals to evaluate system (human and machine) performance
Outcome
• Assessment of performance that can be used to
e. Modify the requirements for the VE
f. Modify the design of the system to reduce training requirements
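In the simplest case, questionnaire based comparison reduces to comparing summary ratings gathered before and after a change. The sketch below illustrates that arithmetic with invented ratings; a real assessment would use a validated instrument such as the QUIS and an appropriate statistical test rather than a bare difference of means.

```python
# Minimal sketch: comparing questionnaire ratings gathered before and after
# a design change to judge whether an improvement has been made. All
# ratings are invented for illustration.
from statistics import mean, stdev

before = [4, 5, 3, 6, 4, 5, 4]   # hypothetical satisfaction ratings (1-9 scale)
after  = [6, 7, 5, 7, 6, 8, 6]

def summarize(label: str, ratings: list) -> None:
    print(f"{label}: mean={mean(ratings):.2f}, sd={stdev(ratings):.2f}, n={len(ratings)}")

summarize("Before redesign", before)
summarize("After redesign", after)
print(f"Mean change: {mean(after) - mean(before):+.2f} scale points")
```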
METHODS FOR ASSESSING INTERACTIONS ACROSS LEVELS OF THE ORGANIZATION

The individual analyses that have been described should identify key constraints that influence performance and that need to be represented in requirements. Aspects such as workplace tools and job design, for example, may set boundaries on the performance level that can be achieved in a given situation. However, to get the greatest benefit from this approach to developing requirements, the central framework of the analyses should be extended by a
consideration of the levels of analysis represented within a work organization. For example, this process should determine whether an identified key social cognitive process has plausible links to lower level cognitive psychological or physiological processes or links to upper level industrial/organizational and sociological/anthropological processes. This extension across levels is difficult because theories from individual scientific domains tend to focus on one level to the exclusion of the others. Nevertheless, vertical integration across levels can provide important connections that combine the theoretical views abstracted from different domains into a coherent whole. Consideration of these issues can also be undertaken while the user is undergoing training in the VE. That is, although these issues should be considered when developing requirements for the VE, observation of human performance while immersed in the VE may also provide insights into redesign of the system being modeled.

Organizational Issues—Selection and Training and Workplace Design

Many industries today are opting for “lean” organizations, even while expecting no loss in quality and no worsening of error rates (see Hendrick, 2008, for a review of organizational- and system-level issues). These reductions in staffing have implications for the skills required by operators and for the training they require. For example, staffing studies will require complex scenarios so as to fully understand the ramifications of fewer people. However, there is still a danger in this approach: such scenarios may be sufficient for understanding normal operations, but not the problems that more usually arise in emergency situations and when automated systems have broken down. Understanding individual versus organizational roles and responsibilities and the changes caused by reorganization is also important. Overlaying the organizational structure on top of the TA breakdown for the individual may present the VE designer with an opportunity to show where gaps may lie. Very often, it is the unscheduled tasks that staff members carry out to fill gaps in the process that cause operational problems when roles are removed or subsumed. Using a VE for training may make those gaps more visible. It might also allow for feedback to the organization on more efficient workplace layouts. VE may also allow staff members to train for circumstances when the technology does not work perfectly or for emergency situations.

Small Group Issues—Communication and Function Allocation Issues

An advantage of VE over traditional training environments is the interactive aspect of the experience. The VE allows not only for allocation of function between the user and the system as it will operate in the real environment, but it may also allow for adaptive allocation based on measures of performance. VE also can allow small groups to rehearse communications as they may exist in the actual environment.
Observation of performance in the training VE may also be useful in making recommendations for changes to either function allocation or small group communications. For example, observation of the training environment may make problems with communication among users and/or operators or issues with function allocation more visible.

Individual Issues—Mental Model and Device Design Issues

The goal of all training programs is to impart information or knowledge to the user. Thus, it is important to verify what knowledge the VE training environment has conveyed to the user. Carroll and Olson (1987) propose three basic representations to characterize what users know: (1) simple sequences, (2) methods, and (3) mental models. Simple sequences refer to the sequence of actions that must be taken to perform a given task. These sequences are steps that allow users to get things done. They do not require that the user understand why the steps are being performed. Methods refer to the knowledge of which techniques or steps are necessary to achieve a specific goal. This characterization of knowledge, unlike simple sequences, incorporates the notion that people have general goals and subgoals and can apply methods purposefully to achieve them. Mental models refer to a more general knowledge of the workings of a system. Specifically, mental models are defined as “a rich and elaborate structure, reflecting the user’s understanding of what the system contains, how it works, and why it works that way” (Carroll & Olson, 1987, p. 6). However, mental models are assumed to be incomplete (Norman, 1983); thus, the possession of a mental model for a system trained in a VE does not necessarily mean that the user has a technically accurate or complete representation of the system’s functioning. It is therefore critical to assess user knowledge of the system after training. Any misunderstandings that are common to a number of users can be considered a technical requirement to change the design of the system or the training interface to help more accurately convey system functioning.

ASSESS INTERACTIONS ACROSS LEVELS OF THE ORGANIZATION

Keys to Success
• Evaluate organizational policies on selection and workplace design
• Evaluate communication patterns and allocation of functions within small groups
• Evaluate individual mental models that result from device design
Outcomes
• Assessment of organizational constraints that can be used to
g. Modify the requirements for the VE
h. Modify the design of the system to reduce training requirements
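The earlier observation that misunderstandings common to a number of users can be treated as technical requirements lends itself to a simple tabulation. The sketch below tallies misconceptions recorded in post-training knowledge assessments and flags those shared by at least half of the users; the threshold and all assessment data are illustrative assumptions.

```python
# Minimal sketch: tallying misconceptions observed in post-training
# knowledge assessments and flagging those shared by enough users to be
# treated as a requirement to change the system or training interface.
# All assessment data below are hypothetical.
from collections import Counter

assessments = {
    "user01": {"believes pump restarts automatically", "ignores bypass valve"},
    "user02": {"believes pump restarts automatically"},
    "user03": {"ignores bypass valve", "confuses alarm priorities"},
    "user04": {"believes pump restarts automatically", "ignores bypass valve"},
}

THRESHOLD = 0.5  # flag misconceptions held by at least half of the users

counts = Counter(m for found in assessments.values() for m in found)
for misconception, n in counts.most_common():
    share = n / len(assessments)
    flag = "REQUIREMENT" if share >= THRESHOLD else "monitor"
    print(f"{flag:>11}  {share:4.0%}  {misconception}")
```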
SUMMARY

VEs promise much in the way of realistic and accurate training scenarios. The question is whether this increased realism will lead to more cost-effective training or better transfer of training. This chapter argues that these benefits can be achieved only through the application of the task analysis techniques seen in human factors to support an operator (user)-centered approach to training requirements gathering rather than the more conventional technology-oriented approach.

REFERENCES

Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought. Mahwah, NJ: Erlbaum.
Badler, N. I. (1989, April). Task-driven human figure animation. Paper presented at the National Computer Graphics Association 89, Philadelphia, PA.
Barnes, M., & Beevis, D. (2003). Human system measurements and trade-offs in system design. In H. R. Booher (Ed.), Handbook of human systems integration (pp. 233–263). New York: John Wiley & Sons.
Benedyk, R., & Minister, S. (1998). Evaluation of product safety using the BeSafe method. In N. Stanton (Ed.), Human factors in consumer products (pp. 55–74). London: Taylor & Francis, Ltd.
Bisantz, A., & Roth, E. (2008). Analysis of cognitive work. In D. A. Boehm-Davis (Ed.), Reviews of human factors and ergonomics (Vol. 3, pp. 1–43). Santa Monica, CA: Human Factors and Ergonomics Society.
Brislin, R. W., Walter, J. L., & Thorndike, R. M. (1973). Cross-cultural research methods. New York: John Wiley & Sons.
Carroll, J., & Olson, J. R. (Eds.). (1987). Mental models in human-computer interactions: Research issues about what the user of software knows. Washington, DC: National Academy Press.
Chin, J., Diehl, V., & Norman, K. (1987, September). Development of an instrument measuring user satisfaction of the human-computer interface. Paper presented at the ACM CHI 88, Washington, DC.
Cockayne, W., & Darken, R. P. (2004). The application of human ability requirements to virtual environment interface design and evaluation. In D. Diaper & N. A. Stanton (Eds.), The handbook of task analysis for human-computer interaction (pp. 401–422). Mahwah, NJ: Lawrence Erlbaum.
Crow, K. (2002). Failure modes and effects analysis. Palos Verdes, CA: DRM Associates.
Diamond, J. (1986). Down to a sunless sea: The anatomy of an incident. Retrieved July 19, 2007, from www.ericmoody.com
Dillman, D. A. (1978). Mail and telephone surveys: The total design method. New York: John Wiley & Sons.
Eng, K., Lewis, R. L., Tollinger, I., Chu, A., Howes, A., & Vera, A. H. (2006, April). Generating automated predictions of behavior strategically adapted to specific performance objectives. Paper presented at the Human Factors in Computing Systems, Montreal, Quebec, Canada.
Ericsson, K. A., & Simon, H. A. (1980). Verbal reports as data. Psychological Review, 87, 215–251.
Ericsson, K. A., & Simon, H. A. (1984). Protocol analysis. Cambridge, MA: MIT Press.
Fleishman, E. A., & Quaintance, M. K. (1984). Taxonomies of human performance: The description of human tasks. Orlando, FL: Academic Press.
Gertman, D. I., & Blackman, H. S. (1994). Human reliability & safety analysis data handbook. New York: John Wiley & Sons, Inc.
Gray, W. D., & Boehm-Davis, D. A. (2000). Milliseconds matter: An introduction to microstrategies and to their use in describing and predicting interactive behavior. Journal of Experimental Psychology: Applied, 6, 322–335.
Gugerty, L. (1993). The use of analytical models in human-computer-interface design. International Journal of Man-Machine Studies, 38, 625–660.
Hendrick, H. (2008). Macroergonomics: The analysis and design of work systems. In D. A. Boehm-Davis (Ed.), Reviews of human factors and ergonomics (Vol. 3, pp. 44–78). Santa Monica, CA: Human Factors and Ergonomics Society.
Hoffman, L. A., Tasota, F. J., Scharfenberg, C., Zullo, T. G., & Donahoe, M. P. (2003). Management of patients in the Intensive Care Unit: Comparison via work sampling analysis of an acute care nurse and physicians in training. American Journal of Critical Care, 12(5), 436–443.
Howard, R. A. E. (Ed.). (1990). Influence diagrams. New York: John Wiley & Sons Ltd.
John, B. E., & Kieras, D. E. (1996a). The GOMS family of user interface analysis techniques: Comparison and contrast. ACM Transactions on Computer-Human Interaction, 3(4), 320–351.
John, B. E., & Kieras, D. E. (1996b). Using GOMS for user interface design and evaluation: Which technique? ACM Transactions on Computer-Human Interaction, 3(4), 287–319.
John, B. E., Prevas, K., Salvucci, D., & Koedinger, K. (2004, April). Predictive human performance modeling made easy. Paper presented at the CHI, Vienna, Austria.
John, B. E., & Salvucci, D. D. (2005, October–December). Multipurpose prototypes for assessing user interfaces in pervasive computing systems. IEEE Pervasive Computing, 4, 27–34.
John, B. E., & Vera, A. H. (1992, May). A GOMS analysis of a graphic, machine-paced, highly interactive task. Paper presented at the Human Factors in Computing Systems, Monterey, CA.
Johnson, W. G. (1973). MORT oversight and risk tree (Rep. No. SAN 821-2). Washington, DC: U.S. Atomic Energy Commission.
Johnson, W. G. (1980). MORT safety assurance system. New York: Marcel Dekker.
Kieras, D., & Meyer, D. E. (1997). An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Human-Computer Interaction, 12, 391–438.
Kirwan, B., & Ainsworth, L. K. (Eds.). (1993). A guide to task analysis. London: Taylor & Francis, Ltd.
Kirwan, B., & James, N. J. (1989, June). The development of a human reliability assessment system for the management of human error in complex systems. Paper presented at Reliability 89, Brighton Metropole, England.
L3 Communications. (2007). Link simulation & training: Setting the standard for over 75 years. Retrieved July 16, 2007, from http://www.link.com/history.html
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97.
Nielsen, J., & Phillips, V. L. (1993, April). Estimating the relative usability of two interfaces: Heuristic, formal, and empirical methods compared. Paper presented at the ACM INTERCHI’93 Conference on Human Factors in Computing Systems, Amsterdam, The Netherlands.
Norman, D. (1983). Some observations on mental models. In D. Gentner & A. L. Stevens (Eds.), Mental models (pp. 7–14). Hillsdale, NJ: Lawrence Erlbaum.
Park, K. S. (1987). Human reliability: Analysis, prediction, and prevention of human error. Amsterdam: Elsevier.
Peterson, J. L. (1977, September). Petri nets. ACM Computing Surveys (CSUR), 9, 223–252.
Phillips, C., & Badler, N. I. (1988, October). Jack: A toolkit for manipulating articulated figures. Paper presented at the ACM/SIGGRAPH Symposium on User Interface Software, Banff, Canada.
Preece, J. (1994). Human-computer interaction. New York: Addison-Wesley Publishing Company.
President’s Commission on the Accident at Three Mile Island. (1979). The need for change, the legacy of TMI: Report of the President’s Commission on the accident at Three Mile Island (aka “Kemeny Commission report”). Washington, DC: U.S. Government Printing Office.
Rooden, M. J., Green, W. S., & Kanis, H. (1999, September). Difficulties in usage of a coffeemaker predicted on the basis of design models. Paper presented at the Human Factors and Ergonomics Society 43rd Annual Meeting, Houston, TX.
Rosenbloom, P., Laird, J., & Newell, A. (Eds.). (1993). The Soar papers: Research on integrated intelligence. Cambridge, MA: MIT Press.
Schraagen, J. M., Chipman, S. F., & Shalin, V. L. (Eds.). (2000). Cognitive task analysis. Mahwah, NJ: Lawrence Erlbaum.
Shepherd, A. (2001). Hierarchical task analysis. New York: Taylor & Francis.
U.S. Department of Defense. (1999). Department of Defense design criteria standard (MIL-STD 1472F). Washington, DC: Author.
U.S. Nuclear Regulatory Commission. (1981). Guidelines for control room design reviews (NUREG-0700). Washington, DC: Author.
UGS. (2004). Jack. Retrieved October 18, 2007, from http://www.ugs.com/products/efactory/jack/
Webb, E. J., Campbell, D. T., Schwartz, R. D., & Sechrest, L. (1966). Unobtrusive measures: Nonreactive research in the social sciences. Chicago: Rand McNally.
You, H., & Ryu, T. (2004, September). Development of a hierarchical estimation method for anthropometric variables. Paper presented at the Human Factors and Ergonomics Society 48th Annual Meeting, New Orleans, LA.
Chapter 8
CREATING TACTICAL EXPERTISE: GUIDANCE FOR SCENARIO DEVELOPERS AND INSTRUCTORS

Jennifer Phillips, Karol Ross, and Joseph Cohn

In a previous chapter (see Ross, Phillips, and Cohn, Volume 1, Section 1, Chapter 4), a five-stage model of learning in complex cognitive domains was presented based on the work of Dreyfus and Dreyfus (1986). Characteristics of a learner’s knowledge and ability were described at each of these stages: novice, advanced beginner, competent, proficient, and expert. With regard to the differential skill sets, a notional strategy was provided for enhancing a learner’s progression from one stage to the next. Cognitive Transformation Theory was introduced in Klein and Baxter (Volume 1, Section 1, Chapter 3), postulating learning as a sensemaking activity and describing a learner’s progression to higher levels of proficiency as a function of replacing and improving domain-specific mental models. This chapter extends the research and theory discussed in Chapters 3 and 4 to the question of how virtual environments (VEs) can be optimally employed to hasten the movement of learners along the continuum from novice to expert in the complex cognitive domain of tactical thinking. The premise of this chapter is that the design requirements for training scenarios and instructional strategies differentially depend on the learner’s level of proficiency. Though VEs can be excellent settings for learning when properly developed, the instructional implementation of these training systems today is often far removed from what is known to be effective for intermediate and advanced learning (for example, the advanced beginner, competent, and proficient stages). A road map for successful design and employment of training in VEs does not exist. Simulation developers have spent much time and money reproducing the most faithfully realistic experiences they could, trusting that experience alone in these environments will create expertise. Now, as a result of efforts to integrate research findings and describe the process by which complex cognitive skills develop in naturalistic domains, it is possible to shed more light on learning stage-specific requirements for VE training systems (Ross, Phillips, Klein, & Cohn, 2005). This chapter provides a brief overview of a Cognitive Skill Acquisition framework and then applies the framework to the tactical
thinking domain in order to provide initial actionable guidance to scenario developers and instructors who utilize VE training to improve complex tactical decision and judgment skills.

THE COGNITIVE SKILL ACQUISITION FRAMEWORK

Ross and her colleagues (2005) presented a Cognitive Skill Acquisition framework to describe the learning process in ill-structured, cognitively complex knowledge domains and subsequent training implications for each of the five stages of learning. Readers are referred to Ross, Phillips, and Cohn (Volume 1, Section 1, Chapter 4) for a description of the distinctions among the five levels of performance, which are summarized in Table 8.1. The Cognitive Skill Acquisition framework views learning as the process of moving from one stage to the next. Therefore, its training implications reflect in part the goal of getting learners to exhibit the characteristics of the next stage along the continuum. The framework can be applied to specify training requirements in a range of domains, as it is applied in this chapter to tactical thinking, so long as

• The boundaries of the target domain are clear,
• The presence of cognitively complex challenges in performance is evident (that is, it is a cognitively complex domain rather than a rule-driven, procedural domain), and
• An analysis of cognitive performance has been conducted and the nature of expertise development is understood, such as through a cognitive task analysis.
The process for applying the Cognitive Skill Acquisition framework to a particular domain requires that the general characteristics of performance for each of the five stages be customized to the domain using cognitive task analysis or similar data. It may be most useful to generate these domain-specific characteristics along a set of themes, as is illustrated in the context of the tactical thinking domain by the example in Table 8.2.

Table 8.1. Overview of the Stage Model of Cognitive Skill Acquisition (Reprinted by Permission; Lester, 2005)

Novice
• Rigid adherence to taught rules or plans
• Little situational perception
• No discretionary judgment

Advanced Beginner
• Guidelines for action based on attributes or aspects
• Situational perception is still limited
• All attributes and aspects are treated separately and given equal importance

Competent
• Sees action at least partially in terms of longer-term goals
• Conscious, deliberate planning
• Standardized and routinized procedures
• Plan guides performance as situation evolves

Proficient
• Sees situation holistically rather than in terms of aspects
• Sees what is most important in a situation
• Perceives deviations from the normal pattern
• Uses maxims, whose meanings vary according to the situation, for guidance
• Situational factors guide performance as situation evolves

Expert
• No longer relies on rules, guidelines, or maxims
• Intuitive grasp of situations based on deep tacit understanding
• Intuitive recognition of appropriate decision or action
• Analytic approaches used only in novel situations or when problems occur

Across these stages, how knowledge is treated shifts from without reference to context to in context; recognition of relevance shifts from none to present; assessment of context shifts from analytical to holistic; and decision making shifts from rational to intuitive.

Table 8.2. General and Domain-Specific Characteristics for the Advanced Beginner Stage

STAGE 2: ADVANCED BEGINNER

General Characteristics—Knowledge
• Some domain experience (Benner, 1984; Dreyfus & Dreyfus, 1986)
• More objective, context-free facts than the novice, and more sophisticated rules (Dreyfus & Dreyfus, 1986)
• Situational elements, which are recurring, meaningful elements of a situation based on prior experience (Dreyfus & Dreyfus, 1986)
• A set of self-generated guidelines that dictate behavior in the domain (Benner, 1984)
• Seeks guidance on task performance from context-rich sources (for example, experienced people and documentation of past situations) rather than rule bases (for example, textbooks) (Houldsworth, O’Brien, Butler, & Edwards, 1997)

General Characteristics—Performance
• Is marginally acceptable (Benner, 1984)
• Combines the use of objective, or context-free, facts with situational elements (Dreyfus & Dreyfus, 1986)
• Ignores the differential importance of aspects of the situation; the situation is a myriad of competing tasks, all with the same priority (Benner, 1984; Dreyfus & Dreyfus, 1986; Shanteau, 1992)
• Shows initial signs of being able to perceive meaningful patterns of information in the operational environment (Benner, 1984)
• Reflects an attitude that answers are to be found from an external source (Houldsworth et al., 1997)
• Reflects a lack of commitment or sense of involvement (McElroy, Greiner, & de Chesnay, 1991)

Characteristics in Tactical Thinking Domain
Advanced beginners will show some signs of experiential knowledge, but will still struggle. In urban combat, for example, they are likely to use only the latest intelligence (rather than situational cues) to estimate the enemy’s current strength and location. They will not conceptualize the enemy as a force that could move or take action, with the exception of engaging friendlies from the current position. They will look at buildings and take note of their sizes and locations, but not recognize the implications for the mission—for example, brick buildings will not be interpreted to be better strongholds than buildings constructed with Sheetrock. They will match subordinate units to particular tasks, but are unlikely to mix assets across units (for example, attach a rifle team to an engineer unit for security).

Tactical Thinking Profile Example
• Mission. Understands that own mission must support intent, but is unable to operationalize intent
• Enemy. Understands the impact of the enemy on own mission, but regards the enemy as a static being
• Terrain. Recognizes important terrain features and avoids nonsubtle problem areas such as chokepoints, but remains unable to leverage terrain to own advantage
• Assets. Understands how to apply organic asset capabilities to particular mission requirements
• Timing. Acknowledges that timing and sequencing are important
• Big Picture. Fails to understand how own mission and activities function as a part of the larger organization
• Contingencies. Does not consider contingencies
• Visualization. Is unable to visualize the battlefield

Principles for Learning Progression

Individuals do not develop general, context-independent cognitive skills such as decision making, sensemaking, and problem detection. They get better at these activities when they develop their mental models in a specific domain (Glaser & Baxter, 2000; Spiro, Feltovich, Jacobson, & Coulson, 1992). Their mental models support understanding, reasoning, prediction, and action (Gentner, 2002; Ross et al., 2005). Five principles regarding the learning process for ill-structured, cognitively complex domains have been derived from an extensive review of the research literature addressing expertise and the nature of learning in complex cognitive (or ill-structured) domains (see Klein and Baxter, Volume 1, Section 1, Chapter 3; Ross, Phillips, and Cohn, Volume 1, Section 1, Chapter 4).

Principle 1. The nature of training befitting novices is qualitatively different from training that is effective for advanced learners. Novices respond well to introductory learning that provides rigid rules, structure within the domain, and
modularized facts and knowledge. Advanced beginner, competent, and proficient individuals are more likely to improve their performances as a result of experiential training where they can practice decisions, assessments, and actions and then reflect on their experiences. The following mistakes are often made in the design of training events for advanced learners:

• Simplifying the interrelationships among topics and principles in order to make them easier to understand and employing a single analogy, prototype example, organizational scheme, perspective, or line of argument (Spiro et al., 1992). Training for advanced learners should not seek to simplify concepts that are complex. Simulations should introduce several cases of a single complex principle in order to demonstrate its applicability across a range of circumstances.
• Overemphasizing memory by overloading the learner with the need to retrieve previous knowledge. Training should require learners to apply principles rather than carry the heavy baggage of detailed rules and content (Spiro et al., 1992).
Principle 2. People can improve their mental models by continually elaborating them or by replacing them with better ones. However, at each juncture the existing mental models direct what learners attend to and how they interpret environmental cues. This makes it difficult for learners to diagnose what is lacking in their beliefs and take advantage of feedback. Knowledge shields (Feltovich, Johnson, Moller, & Swanson, 1984) work against learners’ attempts to get better and smarter by permitting them to explain away inconvenient or contradictory data. Training scenarios can support the elaboration of mental models by prohibiting common ways of achieving an outcome so that learners must find viable alternatives. Further, scenarios can break learners out of their knowledge shields by purposefully presenting inconvenient data that turn out to be central to an accurate assessment of the situation.

Principle 3. The most dramatic performance improvements occur when learners abandon previous beliefs and move on to new ones. Some call this “unlearning.” Old mental models need to be disconfirmed and abandoned in order to adopt new and better ones, and this path to expert mental models is discontinuous. Development does not occur as a smooth progression. It often requires some backtracking to shed mistaken notions (see Klein and Baxter, Volume 1, Section 1, Chapter 3). Training scenarios designed with unmistakable anomalies or baffling events can serve to break learners out of their current perceptions.

Principle 4. Learners who can assess their own performance will improve their mental models more quickly than their peers. The knowledge needed to self-assess is built into domain mental models, so by refining this skill learners are also enhancing their understanding of the dynamics of the domain itself. Also, self-assessments are more efficient than assessments provided by a second party; they occur continually, with immediacy, and within each and every experience. Skilled mentors can help learners develop self-assessment skills by helping them diagnose their weaknesses and discover where their mental models are too simplistic.

Principle 5. Experiences are a necessary, but not sufficient, component for the creation of expertise. Training experiences can be a waste of time or even be
harmful when they do not allow adequate opportunity for domain-appropriate mental model building, target the right kind of challenges, support performance, and provide insights. It has been noted that “a key feature of ill-structured domains is that they embody knowledge that will have to be used in many different ways, ways that cannot all be anticipated in advance” (Spiro et al., 1992, p. 66). To develop mental models for such complex performance, the learner must be immersed in multiple iterations of experiences from different vantage points to make numerous connections.
THE COGNITIVE SKILL ACQUISITION FRAMEWORK APPLIED TO TACTICAL THINKING

Eight themes of tactical thinking performance delineated by Lussier and his colleagues as a result of a cognitive task analysis (Lussier, Shadrick, & Prevou, 2003) provide the structure for the Cognitive Skill Acquisition framework for tactical thinking:

• Focus on the Mission and Higher Headquarters’ Intent
• Model a Thinking Enemy
• Consider Effects of Terrain
• Know and Use All Assets Available
• Consider Timing
• See the Big Picture
• Consider Contingencies and Remain Flexible
• Visualize the Battlefield
Developmental sequences vary from domain to domain, and training interventions must match these naturally occurring sequences. In the case of tactical thinking, the first four themes—Mission, Enemy, Terrain, and Assets—are hypothesized to represent mental models and to develop before the last four themes—Timing, Big Picture, Contingencies, and Visualization—which are higher order cognitive processes or mental manipulations of the first four mental models (Ross, Battaglia, Hutton, & Crandall, 2003; Ross, Battaglia, Phillips, Domeshek, & Lussier, 2003). The training implications described below consider this hypothesized developmental sequence in conjunction with the Cognitive Skill Acquisition framework.

Implications for Tactical Thinking Training in Virtual Environments

At the novice stage of development of tactical thinking skills, the training value of VEs is quite different than at the advanced stages. For novices, VE training should provide learners with support in operationalizing the facts, rules, and processes they learn through other forms of training. In other words, VE training must enable them to practically apply the knowledge they are gaining in order to
establish their own experience based mental models about “how things work” on the battlefield and when it is appropriate to use specific procedures and tactics. Advanced stage performers, however, have already developed these basic mental models about friendly assets, mission tasks, terrain features, and the enemy. At these stages, the role of VE training is to facilitate development of a rich base of varied experiences resulting in highly elaborated mental models. Decision making, sensemaking, and other naturalistic cognitive activities can be practiced in complex environments with varied goals, situational constraints, and mission types to produce tactical thinkers who can respond flexibly and effectively in most any situation.

This section is organized by stage of learning. Within each stage, indicators of that proficiency level are described. The indicators should be used by instructors or scenario developers to anticipate how the learner will think through tactical problems and also as a means of comparing characteristics from one stage to the next, with the goal of producing performance associated with the next stage up. In addition, specific scenario design and instructional requirements are presented.

Novices

Indicators of Proficiency Level

A novice is likely to show the following behaviors:

• When asked about his mission, regurgitates the mission order, but fails to reference the commander’s intent,
• When asked about assets at his disposal, provides a textbook or standardized characterization of their capabilities,
• When asked about the enemy, does not know typical tactics or capabilities of the particular enemy in question. The novice may provide a theoretical set of capabilities based on a class of enemy (for example, “A Soviet commander would . . .” or “An insurgent would . . .” or “A Middle Eastern adversary would . . .”), and
• When asked about terrain, goes through a classroom-taught checklist, such as observation, cover and concealment, obstacles, key terrain, and avenues of approach.
Scenario Design Components

Existing military training is very strong for individuals at the novice level. Novices require standard rules to anchor their thinking and knowledge about how to execute procedures. However, introduction of VE simulations into the novices’ training program can assist them in developing an understanding of when and how the rules and procedures apply operationally. Simulations for novices should utilize a ground based (rather than a bird’s eye) perspective to immediately familiarize them with cue sets such as they will find in the real world. Further, scenarios should do the following (a sketch of how these components might be captured in a scenario specification follows the list):

• Focus on utilization of assets and requirements for mission accomplishment. The content of training scenarios should enable novices to practice executing procedures and
tactics (for example, establishing a blocking position, executing 5 meter and 25 meter searches) in context. Learners should practice on a range of scenarios that illustrate how tactics must be implemented somewhat differently depending on the particulars of the situation. Simulations should also require learners to allocate assets to various mission tasks and receive embedded feedback about the effective range of weapons in context and the time it takes to traverse between points given situational factors such as road conditions and weather. They should illustrate that a unit is not a single fused entity as it appears on a tactical map, but rather consists of moving pieces and parts (such as people and vehicles). This enables learners to begin forming mental models of assets that can be split up or attached to other units and to begin conceptualizing groupings that occupy more than a single static grid coordinate on a map. • Incorporate simple aspects of a dynamic enemy. Training scenarios should exhibit that the enemy is not static. For example, design scenarios in which the enemy moves or splits up his forces. For novices, this is sufficient introduction of the enemy. • Incorporate simple but meaningful terrain features. Simple terrain may include features found in rural settings, such as hills or berms and paved or dirt roads. In urban settings, simple terrain might include one- and two-story buildings and intersections. These features are in contrast to more complex terrain, which may include wooded areas in rural settings that could be sparse or dense and therefore have different mobility affordances. In urban settings, highly complex terrain would include underground sewer systems or densely populated areas with several roads and buildings. For novices, scenarios should include such terrain features as hills or other elevated areas that impact line of sight. They should introduce features that will make clear the difference between cover and concealment, such as buildings (cover and concealment, depending on the construction) or automobiles (concealment, but not cover). They should present dirt versus paved roads that differentially affect rates of movement.
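As flagged above, the sketch below renders these novice-level design components in a hypothetical machine-readable scenario specification of the kind a VE scenario generator might consume. The field names and values are illustrative assumptions distilled from this section, not part of any existing tool.

```python
# Minimal sketch: encoding stage-appropriate scenario design components as a
# machine-readable specification. All field names and values are hypothetical,
# distilled from the guidance in this section rather than from any real tool.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ScenarioSpec:
    learner_stage: str
    perspective: str                 # "ground" vs. "birds_eye"
    enemy_behavior: str              # "static", "simple_dynamic", ...
    terrain_features: List[str] = field(default_factory=list)
    mission_tasks: List[str] = field(default_factory=list)
    embedded_feedback: List[str] = field(default_factory=list)

novice_blocking_position = ScenarioSpec(
    learner_stage="novice",
    perspective="ground",            # familiarize novices with real-world cue sets
    enemy_behavior="simple_dynamic", # enemy moves or splits forces, nothing more
    terrain_features=["hill blocking line of sight",
                      "dirt road vs. paved road",
                      "buildings (cover) vs. automobiles (concealment only)"],
    mission_tasks=["establish blocking position",
                   "allocate assets to mission tasks"],
    embedded_feedback=["effective weapon range in context",
                       "traverse time given road conditions and weather"],
)
print(novice_blocking_position)
```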
Instructional Strategies

Novices will be best served by practicing on a range of scenarios with different assets available and different mission requirements. The goal is to support basic mental model development across a wide range of asset and mission types in order to produce an understanding of asset capabilities and mission tasks in context. At the novice level, instructors or coaches are necessary to guide and direct the learning process more so than at the later stages. Following VE training sessions, an instructor-led after action review should focus on the lines of questioning below and probing regarding the learners’ experiences. For every topic addressed in the after action review, the instructor should ask why learners made particular decisions or situation assessments, illuminating the thought process of the learner in each case.

• Asset capabilities. What was learned about how to use the assets’ capabilities in the context of the situation?
• Mission. What actions were taken and why? Was the mission accomplished? Why or why not?
• Enemy. What was learned about the enemy? What was surprising about the enemy? How might learners think about the enemy differently next time?
• Terrain. How did terrain features impact the mission?
Advanced Beginners

Indicators of Proficiency Level

An advanced beginner is likely to show the following behaviors:

• When asked about the mission, describes the mission and the commander’s intent, but is unable to operationalize that intent within the context of the mission and the battlefield environment,
• Is unable to differentiate mission priorities,
• Makes straight matches of assets to mission tasks,
• Articulates the enemy’s capabilities, but does not consider situational factors that impact the enemy’s probable goals or capabilities,
• Identifies basic terrain features that will impact the mission, and
• May experience generalized anxiety about performing well without making mistakes, because he or she has no sense of what part of the mission is most important to perform well (Benner, 2004).
Scenario Design Components

Advanced beginners are ready to make meaning out of the experiences they glean from simulations. At this level, VEs can supplement existing training by enabling learners to practice implementing tactics that have been newly introduced and employing assets whose capabilities they are learning. In addition, VE simulations for advanced beginners should incorporate enemy and terrain models that are more complex than those in the novice scenarios. Specifically,

• Scenarios should reflect an intelligent, dynamic adversary. The enemy should not follow the templates that have been taught in classroom instruction or case study analysis. Enemy forces should move and take action while the learner deliberates about a course of action. The goal is to break the learner out of the mindset that the enemy will be predictable and static.
• Scenarios should incorporate terrain that has significant impact on the workability of potential courses of action. For example, movement along a straight, flat road should result in being spotted and engaged by the enemy. Furthermore, enemy courses of action should leverage terrain features (for example, pin friendly forces in a choke point) to illustrate the role of terrain on the battlefield.
• Asset capabilities should continue to be exercised. Scenarios should incorporate units that are not full strength or units that have had assets attached or detached. At platoon echelon and below, friendly assets must be depicted as individual moving pieces (soldiers/marines and vehicles) rather than as unit icons that move as a whole. Further, some scenarios should reward learners for thinking ahead about what other assets might be needed or keeping a reserve to deal with future events. Other scenarios should reward the decisive employment of the learner’s full force. Learners need to
develop an understanding of the trade-offs of keeping a reserve element or not and begin to project ahead to assess what might happen in the future that will require preparation and readiness. Finally, scenarios can illustrate how assets can be used to acquire information to reduce levels of uncertainty. Learners should receive useful information (for example, about the enemy’s activities or other important battlefield features) from assets that are positioned to see a wider view than the learners themselves; in this way, advanced beginners can develop mental models about how to proactively acquire information.
• The mission required by the scenario should be relatively simple and straightforward and should correspond to tactics and missions that have been taught in classroom or analogous instructional settings. However, some advanced beginner scenarios should incorporate mission tasks that must be prioritized such that learners fail if they do not address the higher priority task first.
Instructional Strategies

Advanced beginners would benefit from practicing with the same scenario several times, with performance feedback following each trial and detailed explanations of how performance has improved or degraded from one trial to the next. Multiple iterations allow learners at this level to understand how different uses of assets and various courses of action impact the outcome. Instructors should encourage experimentation, even with courses of action that are judged to be nonoptimal. It is important for learners to internalize the specific reasons that some courses of action produce better results than others. In addition, learners may find unexpected positive outcomes from a particular course of action. If possible, instructors should be able to introduce small alterations in the environmental conditions to illustrate how variations in situational factors influence the workability and “goodness” of available courses of action. Like novices, advanced beginners still require an instructor to guide and direct their learning process. After action reviews should be instructor led and can address the following lines of questioning:

• Utilization of assets. What worked, what did not work, and what factors need to be considered when deciding how to employ assets (for example, morale? readiness?)?
• Mission tasks. How were the tasks approached, and why? Which approaches were beneficial, and which were not? Why and why not?
• Enemy. What did the enemy do, and why? How did learners know what he was doing? What information led to their assessments? Were their assessments accurate, and why or why not?
• Terrain. What features were noticed during planning? How did terrain impact the mission during execution, and why? How would the learners approach the terrain layout differently next time?
Competent Individuals

Indicators of Proficiency Level

Heightened planning is the hallmark of the competent stage. A competent performer is likely to show the following behaviors:
• Is able to predict immediate futures and therefore takes a planful approach (Benner, 2004),
• Experiences confusion when the plan does not progress as predicted (Benner, 2004),
• Experiences anxiety that is specific to the situation as opposed to generalized anxiety (for example, am I doing this right with regard to this part of the situation?) (Benner, 2004),
• Differentiates mission priorities,
• Deliberately analyzes what has to occur in order for intent to be achieved,
• Considers trade-offs of using assets for various purposes and for keeping a reserve,
• Projects forward about what other assets might be needed as the mission progresses,
• Generates ideas about what the enemy might be thinking and what the objective might be, but does not have a specific assessment that drives decisions,
• Considers the enemy’s capabilities in the context of the terrain and other situational factors, and
• Incorporates terrain features into the plan and considers the effects of the terrain on assets employed or needed.
Scenario Design Components

Scenarios for competent performers should enable continued development of Asset, Mission, Enemy, and Terrain mental models, but in the context of the Consider Timing and Consider Contingencies cognitive processes. That is, scenarios should present situations where success relies on the timing and sequencing of the operation, on planning for contingencies, and on adapting contingency plans as the mission progresses (one way such surprise injects might be scripted is sketched following this list).

• Scenarios should introduce surprises during the execution of missions to provide practice in rapidly responding to changing situations. For example, friendly units could become unable to perform (for example, because they cannot reach their intended position or because a weapon system breaks down), the enemy could move in a nontraditional way or bring a larger force than was expected, key roads could be too muddy to traverse or blocked by locals, or higher headquarters (HQ) could deliver a new fragmentary order based on an opportunistic target.
• Scenarios should present conflicts that require prioritization of mission tasks. Learners need to be forced to determine which part of the mission order is most important to higher headquarters based on the commander’s intent. Success should be contingent on taking actions that support intent.
• Mission orders should incorporate strict time requirements or the need to synchronize assets or effects, and the scenarios should build in realistic timing of force movement and engagement with the enemy. When success relies on appropriate timing of actions, learners will be forced to make judgments about how long the prerequisite tasks or movements will take. These cases will enable learners to strengthen their mental models about the timing of certain tasks and set up opportunities to learn how to adjust when events do not happen in the planned sequence.
• Scenarios should require proper sequencing of tasks in order for the learner to accomplish the mission. That is, learners should be able to see how the mission breaks down
when certain tasks, such as thorough route reconnaissance, are not accomplished prior to other tasks, such as moving forces along a route.
• Scenarios should introduce the utility of nonorganic and nonmilitary assets. Learners can be encouraged to request assets from higher headquarters or another unit by realizing that the mission can be accomplished only by accessing those assets. Also, scenarios can present civilian resources such as host nation police, village elders, or relief workers who can provide information or serve important roles (such as communicating with the local populace).
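As noted above, surprise injects of this kind can be scripted against the scenario clock. The sketch below shows a minimal scheduler that releases scripted surprises as time passes; the event times, descriptions, and interface are hypothetical, since an actual VE would expose its own scenario-scripting facilities.

```python
# Minimal sketch: scripting surprise events into a competent-level scenario
# so the plan cannot simply play out as briefed. Event times, triggers, and
# the callback interface are hypothetical placeholders.
import heapq

class SurpriseScheduler:
    """Queue of scripted surprises keyed by scenario clock time (minutes)."""
    def __init__(self):
        self._events = []

    def add(self, t_min: float, description: str) -> None:
        heapq.heappush(self._events, (t_min, description))

    def due(self, now_min: float) -> list:
        """Pop and return every surprise whose trigger time has passed."""
        fired = []
        while self._events and self._events[0][0] <= now_min:
            fired.append(heapq.heappop(self._events)[1])
        return fired

schedule = SurpriseScheduler()
schedule.add(10, "Lead vehicle breaks down short of intended position")
schedule.add(25, "Enemy moves in unexpectedly large force")
schedule.add(40, "Higher HQ issues fragmentary order: opportunistic target")

for clock in (12, 30, 45):   # simulated checks of the scenario clock
    for event in schedule.due(clock):
        print(f"t={clock} min: INJECT -> {event}")
```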
Instructional Strategies

At the competent level, instructors play a key role in mental model development, but their participation at the competent, proficient, and expert levels is not required as persistently as it is for novices and advanced beginners. In lieu of an instructor, feedback can be delivered by developing expert responses against which learners can compare their own performances. The unit leader could function as facilitator and elicit peer feedback from participants. Alternatively, feedback can be generated within the VE system by illuminating situational cues, factors, or demands that should have prompted learners to change their approaches or move to a contingency plan. Regardless of the instructional medium, the following issues should be addressed with individuals at the competent level:

• Prior to execution, contingencies. What are the different ways the plan could play out, and how would the learner know if that were happening?
• Prior to execution, the enemy. What might the enemy be attempting to do, and why? How might the learner assess the enemy’s objectives as the situation plays out? What information should the learner be seeking?
• Prior to execution, terrain. What are the critical terrain features on the battlefield? How might they impact both friendly and enemy courses of action? How might terrain be leveraged and used against the enemy? How might the enemy leverage terrain features and use them against friendlies?
• Mission plan. Why did the plan break down? What should have been the early indicators that the plan would not play out as intended?
• Situation. What were the cues and factors available? How might they have been interpreted?
• Timing and sequencing. What issues regarding timing and sequencing needed to be considered, and why?
• The Big Picture. What was higher HQ trying to accomplish? What was the learner’s role in accomplishing the larger mission? Did the learner contribute in useful ways to the larger mission?
Proficient Individuals

Indicators of Proficiency Level

The proficient stage is marked by a qualitative leap from being guided by the formal plan to being guided by the evolving situation. Proficient individuals
intuitively recognize changes to the situation, but still deliberate about how to handle the new circumstances (that is, determine what course of action will meet the objectives). Proficient performers are likely to show the following behaviors:

• Describe that they changed their perspective or situation assessment during the course of a situation or notice that the situation is not actually what they anticipated it to be (Benner, 2004),
• Recognize changes in the situation (for example, due to new information) that will impact or interfere with achieving intent,
• Deliberately analyze courses of action to determine the best one for the situation,
• Recognize the utility and importance of nonmilitary assets, such as civilian officials or village elders,
• Consider their organic assets as parts of a larger team of friendly assets working to achieve a common goal,
• Articulate timing and sequencing issues,
• Assess the enemy’s objectives and intent based on situational factors, and
• Describe key aspects of terrain for both friendly and enemy courses of action.
Scenario Design Components

Scenarios for proficient individuals should incorporate high levels of complexity, ambiguity, and uncertainty; sophisticated coordination requirements; and situations that evolve and change rapidly into tough dilemmas. More specifically,

• Scenarios should present situations where accomplishing the commander’s intent requires a different approach than accomplishing the explicit mission tasks.
• Scenarios should incorporate an enemy who uses nonconventional forces and techniques. For example, the enemy could use civilian vehicles, dress deceptively, or otherwise mislead.
• Scenarios should incorporate substantial situational changes during execution to force the learner to revise the existing course of action or develop a new one on the fly. Proficient performers should be skilled at recognizing how the situation has changed, but they require multiple repetitions in order to develop and refine the action scripts within their mental models.
• Scenarios should incorporate feedback on secondary and tertiary consequences of action. For example, in a counterinsurgency mission, an emotion-driven decision to provide assistance to desperate locals rather than to continue with the original mission may have consequences for mission accomplishment and domino into a larger impact on the operation. Depending on the situation, an action like this could prompt locals to set unwarranted expectations about how relief is provided, bog down relief efforts for a greater need elsewhere, or have political ramifications.
• Scenarios should require timing, sequencing, and coordination between and across units rather than only within the learner’s own organic assets. This enables learners to form mental models of friendly forces as a larger team effort and to understand the capabilities and limitations of other dissimilar units (for example, air or artillery).
Instructional Strategies
The facilitation, in whatever form it takes, should exhaust the learner's way of understanding and approaching the situation. Learners should be required to cite their own personal experiences for perspective on their views of the situation depicted in the scenario (Benner, 2004). Benner recommends that instructors teach inductively, where the learners see the situation and then supply their own way of understanding it.
When an instructor is available, semistructured time-outs during execution of the scenarios are beneficial. These periods of inquiry and reflection encourage learners to discuss their current interpretations of the situation, their mental simulations of how the situation may play out, and their ideas about what courses of action can produce the desired results. Discussion among the learners is nearly as valuable for proficient performers as the probes and dialogue with the instructor. Likewise, after action reviews should encourage dialogue and questioning between the instructor and learners about their interpretations of the situation, their mental simulations and visualizations of the battlefield, and especially their consideration of how various courses of action supported or failed to support the mission goals.
When an instructor is unavailable, alternate approaches can provide adequate substitutes. First, individuals at this stage of development can learn quite a bit from their peers. Semistructured after action reviews can be provided to groups of learners to guide their discussion of the exercise. The reviews should focus on the same questions used when an instructor is present—how the situation was assessed, how learners projected into the future, and the rationale for the courses of action employed or adjustments made. In addition, learners should be encouraged to share past experiences that have influenced their thinking about the scenario.
Just as competent learners are likely to benefit from expert responses, proficient learners can also use information generated from experts as another instructor-free approach. For proficient performers, expert responses should include very detailed information about how experts thought about the scenario at multiple intervals within it. This information can be generated by conducting in-depth, cognitive task analysis-like interviews with a few expert tacticians. Learners should be provided with experts' interpretations of the situation, including the cues and factors they recognized pertaining to the enemy objective, the friendly status, and other aspects of the battlefield (for example, terrain or noncombatants). They should see the experts' projections (that is, mental simulations) about how the situation would play out and the rationale for those projections, as well as visualizations of first-, second-, and third-order consequences. There should be a discussion of the courses of action taken by the experts along with a detailed rationale regarding asset allocation, prioritization and primary goal(s), and aspects of timing and/or sequencing.
Other topics to review following VE training sessions, with or without instructor leadership, include the following:
• The larger picture. What is the larger organization trying to accomplish? How can the learner develop opportunities for the larger organization or otherwise feed the overall objective over and above his or her own mission tasks?
• Enemy intent. What is the enemy's likely intent? What aspects of the situation could have revealed clues about that intent? How can intent be denied by friendly forces?
• Contingencies. In what other ways could the situation have played out? What situational cues would suggest those particular outcomes? What responses (that is, courses of action) would be appropriate for the various contingencies?
• Actions. What courses of action could be taken in response to changes in the dynamics of the situation? What are the relative advantages and disadvantages of each?
Experts
Indicators of Proficiency Level
When an individual moves from proficient to expert status, the main change is his or her ability to intuitively recognize a good course of action. Experts typically show the following behaviors:
• Exhibit good metacognition (Kraiger, Ford, & Salas, 1993), meaning they can accurately gauge their own abilities to deal with the situation at hand;
• Intuitively generate a plan and take actions that will achieve the commander's intent;
• Eliminate obstacles to higher headquarters' intent or present opportunities to other units to support achievement of intent;
• Fluidly leverage and coordinate organic, nonorganic, and nonmilitary assets;
• Understand how to deny the enemy intent; and
• Use the terrain to create advantages for friendlies and disadvantages for the enemy.
Scenario Design Components
Experts involved in VE training sessions may reap the greatest benefit by serving as mentors to less experienced tacticians. By coaching and being forced to communicate what they know to others, they reflect on and thus strengthen their existing mental models. It may also be possible to develop garden path scenarios (Feltovich et al., 1984) in VEs to challenge the fine discriminations within experts' mental models.
Instructional Strategies
Experts benefit from peer discussions reflecting on shared real world experiences, full-scale exercises or simulations, or operational planning sessions (for example, plan development to address a potential crisis situation in a real world "hot spot"). Discussions could be structured to address the following:
• Enemy intent. What was (is) the enemy's objective, and why? What situational cues and factors led to that assessment? At what point in the mission did the enemy's intent and course of action become clear? What were the key indicators?
• Big picture. How did (could) individual units, or joint/coalition forces, work together to meet the overarching mission? Were assets shared in ways that supported mission accomplishment? What other configurations of assets could have addressed the larger mission intent rather than unit-specific orders?
• Contingencies. Did the mission play out in unexpected ways that were not imagined in contingency planning sessions? When was the change noticed? Were there early indicators that could have revealed the new direction to commanders sooner?
• Visualization. What were (are) the friendly and enemy leverage points on the battlefield? How did (could) friendly forces deny enemy intent by using the terrain, nonconventional assets (for example, civilians), and other resources or strategies?
SUMMARY
This chapter, Klein and Baxter's chapter (Volume 1, Section 1, Chapter 3), and Ross, Phillips, and Cohn's chapter (Volume 1, Section 1, Chapter 4) set out to integrate a broad range of research and applications to form a coherent framework for improving VE training for tactical decision making. The framework is grounded in empirical research, but it is by no means complete. Several questions remain, especially with regard to application of the principles to specific training development efforts. These chapters provide a starting point from which the training community can research and evaluate the assertions and refine and evolve the framework to make it more useful as a guide for VE training that will effectively prepare tactical decision makers for current and future challenges.
REFERENCES
Benner, P. (1984). From novice to expert: Excellence and power in clinical nursing practice. Menlo Park, CA: Addison-Wesley Publishing Company Nursing Division.
Benner, P. (2004). Using the Dreyfus model of skill acquisition to describe and interpret skill acquisition and clinical judgment in nursing practice and education. Bulletin of Science, Technology & Society, 24(3), 189–199.
Dreyfus, H. L., & Dreyfus, S. E. (1986). Mind over machine: The power of human intuitive expertise in the era of the computer. New York: The Free Press.
Feltovich, P. J., Johnson, P. E., Moller, J. H., & Swanson, D. B. (1984). LCS: The role and development of medical knowledge in diagnostic expertise. In W. J. Clancey & E. H. Shortliffe (Eds.), Readings in medical artificial intelligence: The first decade (pp. 275–319). Reading, MA: Addison-Wesley.
Gentner, D. (2002). Mental models, psychology of. In N. J. Smelser & P. B. Baltes (Eds.), International encyclopedia of the social and behavioral sciences (pp. 9683–9687). Amsterdam, The Netherlands: Elsevier Science.
Glaser, R., & Baxter, G. P. (2000). Assessing active knowledge (Tech. Rep. for Center for the Study of Evaluation). Los Angeles, CA: University of California.
Houldsworth, B., O'Brien, J., Butler, J., & Edwards, J. (1997). Learning in the restructured workplace: A case study. Education and Training, 39(6), 211–218.
Kraiger, K., Ford, J. K., & Salas, E. (1993). Application of cognitive, skill-based, and affective theories of learning outcomes to new methods of training evaluation. Journal of Applied Psychology, 78, 311–328.
Lester, S. (2005). Novice to expert: The Dreyfus model of skill acquisition. Retrieved May 1, 2008, from http://www.sld.demon.co.uk/dreyfus.pdf
Lussier, J. W., Shadrick, S. B., & Prevou, M. I. (2003). Think like a Commander prototype: Instructor's guide to adaptive thinking (ARI Research Product No. 2003-01). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
McElroy, E., Greiner, D., & de Chesnay, M. (1991). Application of the skill acquisition model to the teaching of psychotherapy. Archives of Psychiatric Nursing, 5(2), 113–117.
Ross, K. G., Battaglia, D. A., Hutton, R. J. B., & Crandall, B. (2003). Development of an instructional model for tutoring tactical thinking (Final Tech. Rep. for Subcontract No. SHAI-COMM-01; Prime Contract DASW01-01-C-0039 submitted to SHAI, San Mateo, CA). Fairborn, OH: Klein Associates.
Ross, K. G., Battaglia, D. A., Phillips, J. K., Domeshek, E. A., & Lussier, J. W. (2003). Mental models underlying tactical thinking skills. Proceedings of the Interservice/Industry Training, Simulation, and Education Conference [CD-ROM]. Arlington, VA: National Training Systems Association.
Ross, K. G., Phillips, J. K., Klein, G., & Cohn, J. (2005). Creating expertise: A framework to guide technology-based training (Final Tech. Rep. for Contract No. M67854-04-C-8035 submitted to MARCORSYSCOM/PMTRASYS). Fairborn, OH: Klein Associates.
Shanteau, J. (1992). Competence in experts: The role of task characteristics. Organizational Behavior and Human Decision Processes, 53, 252–266.
Spiro, R. J., Feltovich, P. J., Jacobson, M. J., & Coulson, R. L. (1992). Cognitive flexibility, constructivism, and hypertext: Random access instruction for advanced knowledge acquisition in ill-structured domains. In T. Duffy & D. Jonassen (Eds.), Constructivism and the technology of instruction: A conversation (pp. 57–75). Mahwah, NJ: Lawrence Erlbaum.
Part VI: Requirements Analysis
Chapter 9
TRAINING SYSTEMS REQUIREMENTS ANALYSIS
Laura Milham, Meredith Bell Carroll, Kay Stanney, and William Becker
Recent and upcoming advances in virtual environment (VE) technology provide the infrastructure upon which to build state-of-the-art, interactive training systems (Schmorrow, Solhan, Templeman, Worcester, & Patrey, 2003; Stanney & Zyda, 2002). An important advantage afforded by the use of VE systems is the ability to instruct and practice targeted training objectives otherwise restricted by the resource costs, potential dangers, and/or limited availability of live training. VEs afford instructors the ability to increase training efficiency and effectiveness by capitalizing on several advances in simulator technology (for example, appropriate levels of cue fidelity, immersion, portability, practice iteration, and so forth). However, not all VE training systems realize this potential. Developing operationally, theoretically, and empirically driven requirements for VE training systems is key to ensuring training effectiveness; this chapter discusses two methods for achieving such requirements specifications: operational requirements analysis and human performance requirements analysis.
Operational requirements analysis (ORA) focuses on identifying training goals at contextually appropriate task and expertise levels. Through this process, systems developers can take into account the training goals and gaps to be addressed by the system and allow critical contextual considerations to drive system requirements. Systems developed without regard to the operational context can result in suboptimal training solutions that are ineffective, expensive, and limited in future utility. The ORA process is similar to long-standing system engineering practices that follow a logical sequence of activities to transform an operational need into a specification of the preferred system configuration and associated performance requirements (Goode & Machol, 1957), the difference being context. In ORA, system requirements evolve through a deep understanding of the mission context associated with the operational environment. The outcome from ORA is then fed forward to human performance requirements analysis (HPRA), which translates ORA data into metrics that can be used to assess basic
skilled performance and higher order skill sets, such as the development of situation awareness and the conduct of decision making. HPRA seeks to ensure the training system is meeting its overall intended training goal of providing the skill sets needed to support mission outcomes.
TRAINING SYSTEM REQUIREMENTS ANALYSIS
The overall goal of training systems requirements analysis is to design a system that trains to standards. The process illustrated in Figure 9.1 commences at identification of these standards and cycles through the four stages necessary to ensure that a training system affords attainment of performance that meets these standards. These stages include the following: (1) identification of training goals, (2) development of design specifications that facilitate training goals being met, (3) development of metrics to evaluate if training goals are being met, and (4) development of training management methods that promote attainment of training goals (see Figure 9.2). ORA is concerned with the domain based side of the training system requirements analysis process, which compels consideration of the target operational environment and mission context. HPRA aims to integrate what is known about human performance and learning to measure learning of targeted training objectives. These two distinct, yet interdependent drivers work in cooperation throughout the process to ensure that effective requirements are specified. As such, it is
Figure 9.1. Training System Design Lifecycle (Adapted from Milham, Cuevas, Stanney, Clark, & Compton, 2004)
Figure 9.2. Training System Design: Operational and Human Performance Requirements
necessary to provide an integrated overview of how these two components comprise the training system design lifecycle. The following section provides an illustration and steps through the four stages of the design lifecycle, summarizing the interaction between the two drivers (that is, ORA and HPRA) and the impact each has on the overall design lifecycle.
Training System Design Lifecycle
The first stage in the training system design lifecycle is to understand training goals by analyzing the operational context (see Figure 9.1). In the second stage, the training goals are defined in terms of how the task is performed in the operational environment, and further into multimodal sensory information requirements, from which system design specifications can be derived. The third stage takes these data and feeds them forward to the HPRA process, which decomposes the training goals from a human performance standpoint into the knowledge, skills, and attitudes (KSAs) required for successful performance of the task. From this, metrics are defined that support performance measurement, which is used to assess whether the system is meeting the targeted training goals. From an operational standpoint, metrics target the degree to which mission performance was successful. From a human performance standpoint, the focus is on whether or not learning
is occurring with respect to the required KSAs. The final stage in the training system design lifecycle is training management. In order to ensure training goals are met, not only do trainees need to be able to practice targeted tasks and training objectives effectively (for example, in scenarios), but there needs to be an element of feedback to facilitate learning. As part of training management, scenarios and scenario manipulation variables can be developed to facilitate performance improvement on targeted tasks. Additionally, performance diagnosis based on metrics can feed performance summaries that support instructors in after action review and feedback, facilitating trainee performance improvements. Overviews of the two components of the training system design lifecycle, ORA and HPRA, will now be provided, along with a discussion of why each component is a necessary part of the design lifecycle.
OPERATIONAL REQUIREMENTS ANALYSIS
Operational requirements analysis involves identifying operationally driven training requirements based on the target task or mission. Operational requirements are defined by the skill set or task set that an operator is required to perform in a system and are used to ensure a system operates in a manner consistent with user needs and expectations (Fairley & Thayer, 1997). Operational requirements are the primary building blocks of system requirements, and thus ORA serves as the first and foremost step in ensuring that the mission's targeted spectrum of tasks is supported in the training system.
Need for ORA
Ensuring that training objectives are effectively targeted through proper consideration of operational context is critical, as demonstrated by the success of methods such as the event based approach to training (Fowlkes, Dwyer, Oser, & Salas, 1998), which embeds events in training to provide practice opportunities on targeted training objectives. If developers lack comprehension of mission requirements, including tasks and task requirements, there is potential for training system requirement specifications to omit vital information an operator relies on to complete a task or functionality that is critical to task performance. Further, given that training occurs in a proxy environment (not the operational environment), ORA needs to consider how that environment should be constructed to facilitate effective translation of inputs and outputs in the training environment to the real world. In other words, ORA must specify the sensory cues needed and the appropriate level of fidelity to facilitate transfer. From a practical standpoint, the generally limited resources applied to training necessitate a careful examination of the costly visual, auditory, and haptic human system interfaces that are required. When fidelity levels are too low, trainees may not receive the multimodal information necessary to build situation awareness, make decisions, or even react to an emerging situation in a way that is similar to the operational environment. For example, a ship engineer may be training on how to react to various emergency situations. If the training control panel does not match that
of his own craft, he may learn (or overlearn) how to perform the emergency procedures almost automatically, that is, without thinking about the manual steps he is going through. In that situation, if a button or knob is in a different location, or has an alternate function, he may unconsciously perform the highly trained skill incorrectly without realizing it. When fidelity is too high, expensive state-of-the-art technologies with a high wow factor may be included without real consideration of their impact on training effectiveness. In addition, training resources may be spent on visuals, for example, without considering the impact of auditory cues on performance in a select domain. In these cases, the costs can be excessive compared to the actual training value gained by implementing the highest fidelity system (for example, is an expensive, fully immersive training system the most cost-effective way of training targeted skills? Is there a more cost-effective alternative to train the same skills?). ORA seeks to ensure that fidelity requirements are based on target training goals and tasks. Otherwise, training systems may lead to ineffectively trained skills, untrained skills, negative training (that is, trained skills and procedures that will negatively affect, even impair, task performance in the real world), or unnecessary levels of fidelity in which the cost of the technology is disproportionate to the training value added.
HUMAN PERFORMANCE REQUIREMENTS ANALYSIS
Human performance requirements are those driven by human information processing and human performance, as well as human learning (knowledge and skill acquisition) needs. In terms of KSAs, knowledge refers to long-term memory stores, including declarative/semantic knowledge (facts), episodic knowledge (past events), and procedural knowledge (how to). Knowledge can be a foundation for skill performance (for example, procedural knowledge, situation awareness, and so forth) or more global/abstract, such as declarative knowledge. Skill refers to a level of proficiency on a specific task or limited group of tasks (for example, perceptual skills), which is "acquired through extended practice and training" (Ericsson & Oliver, 1995, p. 3). By decomposing tasks from the task analyses into the required knowledge and skills necessary to facilitate task performance, measures of learning can be developed to ensure effective training of the task. Additionally, by incorporating consideration of how humans learn different knowledge and skill types (for example, declarative knowledge, psychomotor skills, perceptual skills, and so forth), training management strategies can be incorporated to accelerate learning.
Need for HPRA
Consideration of human learning of key knowledge and skills in training system designs is fundamental. In fact, the definition of training is rooted in the view that training encompasses activities aimed at producing skilled behavior. Thus, HPRA considers how target knowledge and skills are best learned. A prevalent
finding in the training literature is that practice does not equal training. Simply building a training system that mimics the operational environment (essentially allowing practice) does not ensure learning will occur. HPRA seeks to provide the means to assess human performance and identify instructional elements, such as feedback, that can be incorporated into the training system design to accelerate learning. As such, the HPRA process involves the incorporation of training management systems, which map instructional design principles onto simulation systems, defining several components that facilitate the detection, monitoring, and diagnosis of trainee performance to drive feedback aimed at calling attention to problem performance areas and allowing remediation opportunities. Now that overviews of ORA and HPRA have been provided, the stages of the training system design lifecycle will be discussed in detail.
Stage 1: Training Needs/Goals Identification
The first stage in the training system design lifecycle is to understand training goals by conducting a training needs analysis (TNA), through which an understanding of the operational context (that is, understanding the user/trainee and task characteristics) is attained. By addressing operational context, designers can create system requirements to target not only the appropriate tasks (for example, shoot the enemy), but also the appropriate training objectives associated with each task (for example, procedural steps to arm a weapon system). TNA is a process of gathering and interpreting data from the target training community and operational environment in an effort to identify performance gaps and formulate training solutions (Cohn et al., 2007). TNA focuses on providing data concerning current versus desired performance and knowledge in an identified gap area, attitudes toward the targeted performance gap, causes of or contributing factors toward the performance gap, and potential solutions. A comprehensive TNA provides the basis from which to design training solutions that realize substantial improvements in human performance by closing the identified performance gap in a manner that is compatible with current training practices.
Training Goals
There are several different aspects of training goals that must be considered when performing a TNA, including those that describe the intended users and those that describe the intended use. One aspect is mission scope.
• For instance, is the goal to train a full mission or to simply target a specific part of a mission (part task)?
Another aspect is mission characteristics.
• For instance, given a task that involves both individual and team coordination, is the goal only to train individual skills or team skills as well?
• Is the intention for the trainee to be able to perform the task under stress or in a basic pedagogical environment?
Many training courses, particularly in the military, have clearly defined training goals, including mission scope and characteristics, described in such documents as a training and readiness manual. Training and readiness manuals often provide very detailed descriptions of the prerequisites to a course, the tasks that will be targeted in the course at both individual and team levels, and even the level of performance on the task required to successfully complete training. As described through terminal learning objectives, enabling learning objectives, and mission essential task lists, both the desired end state and the incremental steps necessary to get there are often provided, including expected performance in a live-fire environment upon completion of training. In the case where these are not available, not defined clearly, or not at a granular enough level, such requirements must be identified through observation and collaboration with the instructor to ensure the training objectives identified are in line with the organization's training goals. Also of importance is to identify and characterize the range of target users.
• Will trainees all be novice users who will be learning the task for the first time?
• Are the trainees at a high enough expertise level that refresher training can be targeted?
• What are trainee attitudes toward the task being trained and use of training technology?
Subject matter expert (SME) interviews and questionnaires can be used to obtain information to develop user profiles and ensure the training solution is designed to be compatible with the target training community, environment, and culture. Table 9.1 provides an example of a user profile for a U.S. Marine Corps Fire Support Team (FiST) trainee.
Table 9.1. User Profile for U.S. Marine Corps Fire Support Team Trainee
Demographics
— Age 18–35, 100 percent male
— Male population indicates risk of color blindness
— English as primary language, U.S. culture
— High school graduate to college education
Knowledge, Skill Levels
— No to low level experience in FiST operations
— Declarative and procedural knowledge from classroom and practical application training, practice on target skills in live fire exercises
— Low number of deployments
Attitudes
— Perceived importance of task high due to predeployment status
— Motivated to use training technology to learn task
— Little experience with VE training systems, so few biases
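For teams that maintain such profiles across several requirements activities, it can help to keep them machine readable. The following sketch (in Python) is purely illustrative; the class name and fields are assumptions, and the example values are transcribed from Table 9.1 rather than taken from any published tooling.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class UserProfile:
    """Trainee profile captured during training needs analysis (TNA)."""
    community: str
    demographics: List[str] = field(default_factory=list)
    knowledge_skill_levels: List[str] = field(default_factory=list)
    attitudes: List[str] = field(default_factory=list)

# Populated from the FiST profile in Table 9.1.
fist_trainee = UserProfile(
    community="U.S. Marine Corps Fire Support Team (FiST)",
    demographics=["Age 18-35", "English as primary language, U.S. culture",
                  "High school graduate to college education"],
    knowledge_skill_levels=["No to low level experience in FiST operations",
                            "Low number of deployments"],
    attitudes=["Perceived importance of task high (predeployment)",
               "Motivated to use training technology to learn task"],
)
print(fist_trainee.community)
```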
Another consideration in identifying training goals is identification of training gaps (Rossett, 1987; Tessmer, McCann, & Ludvigsen, 1999). Often training systems are designed either to replace or to be integrated into a curriculum with other training solutions. In such cases, training goals may be defined by training gaps that the training system can effectively bridge. These could be complete mission tasks that existing training solutions could not target, or areas in which current training solutions are not producing sufficient training results. These gaps can provide training goals for which the training system has the opportunity to have the most impact. Through examination of the training curriculum and SME interviews, untargeted tasks can be identified. Through discussions with instructors and examination of performance records, it is possible to identify the tasks for which there are consistent patterns of suboptimal performance that are in need of augmentation or acceleration of learning. Considerations to help identify gaps by examining curriculum insertion points include the following:
• Is this system being designed to be used in a succession of training simulators that will build on each other?
• If so, is it preceded only by classroom training, in which case the trainees will likely have only declarative knowledge on which to build?
• Which tasks currently are/are not targeted with each training system in the curriculum?
Through SME interviews and questionnaires, practitioners can identify target performance gaps, current versus desired performance and knowledge in each gap area, attitudes toward the targeted gap, causes of or contributing factors toward the gap, and potential solutions (see the example in Table 9.2). In summary, TNA determines the who (that is, target training community), what (that is, target training tasks), when (that is, point of insertion), and where (that is, context of insertion) of the envisioned training solution. Past field work suggests the following lessons learned:
• The identification of target trainees is critical to define the expertise of skill sets to be learned.
• The TNA should help focus ideas on content and methods of delivery.
• Dividing training gaps into must achieve, desirable, and desirable but not necessary can help drive design.
• TNA should be informed not just by existing training solutions, but also by the training and education literature.
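The third lesson learned above suggests a simple triage of gaps by priority. A minimal sketch of that triage, assuming a three-tier scheme; the gap names are hypothetical examples in the FiST domain:

```python
from enum import Enum

class GapPriority(Enum):
    MUST_ACHIEVE = 1
    DESIRABLE = 2
    NICE_TO_HAVE = 3  # "desirable but not necessary"

# Hypothetical gaps; in practice these come from SME interviews,
# questionnaires, and performance records gathered during the TNA.
training_gaps = {
    "SEAD planning under time pressure": GapPriority.MUST_ACHIEVE,
    "Correction from mark": GapPriority.DESIRABLE,
    "Night variant of CFF planning": GapPriority.NICE_TO_HAVE,
}

# Design effort is allocated to must-achieve gaps first.
for gap, priority in sorted(training_gaps.items(), key=lambda kv: kv[1].value):
    print(f"{priority.name}: {gap}")
```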
Stage 2: User-Centered Design Specification
Once the training goals have been identified, the next stage is to develop user-centered design specifications through task analyses. The task analysis identifies how target training objectives are achieved in the operational environment. In short, task analysis involves reviewing documentation to gain familiarity with
Table 9.2. Training Needs Analysis Questions and Example Answers to Identify Performance Gaps for U.S. Marine Corps Fire Support Teams
Some questions that might be asked to determine training needs (cast a broad net) are as follows:
• What specific skills and abilities do your personnel need or need to improve on?
Team communication/coordination skills
• If you could change one thing in the manner in which your personnel currently perform their tasks, what would it be?
Information sharing, which involves the lead plotting all pertinent information on a battle board and personnel checking the battle board for relevant information.
• What knowledge, skills, and attitudes would you most like for your personnel to be trained on?
Knowledge: coordination and deconfliction methods
Skills: communication skills
Attitudes: confidence/assertiveness
Narrow Down: • Which tasks are currently not performed at ideal performance levels (note the most important tasks)?
9 line planning
Suppression of enemy air defense (SEAD) planning
Call for fire (CFF) planning
Mission communication
Correction from mark
Visual acquisition and terminal control of aircraft
• What currently prevents personnel from performing these tasks at ideal performance levels?
Lack of taking all pertinent information into account when planning
Incorrect communication procedures
Incorrect scanning methods
Difficulty of detecting aircraft
• What are the ideal performance levels for these tasks?
Time to plan 9 line (<10 min), inclusion of all 9 lines, and coordination with FiST lead
Time to plan SEAD (<10 min), deconfliction with aircraft path, and appropriate timeline
Discuss Potential Solutions: • How could technology potentially support your personnel in supporting their training needs?
Technology solution: laptop team trainer that presents all team members with terrain view, FiST lead with virtual battle board, forward observer (FO) with virtual CFF sheet and tools, and forward air controller (FAC) with virtual 9 line form and tools. Laptops can be set up next to each other to support operationally relevant team coordination.
practices and procedures and to identify task flows, and leveraging observation opportunities and SME interviews to characterize task knowledge requirements. The task analysis should identify trainee information processing requirements, including precise characterization of the inputs (for example, system and environmental cues and feedback) a trainee must receive and outputs (for example, actions, responses, and communication) the trainee must convey, which provide a basis from which to derive fidelity requirements. This starts with identifying mission goals:
• What are the trainees trying to achieve in their missions or part missions being trained?
• What are the desired outcomes (for example, kill the enemy, report information, and so forth)?
The task analysis continues with the missions being decomposed into tasks and subtasks, as well as task flow, performer(s), performer responsibilities, and tool requirements. Given the mission goal,
• What are the steps the performer(s) has to take to successfully complete the mission?
• What are the sequence and flow of information?
• Who are the key players?
• Who is primarily responsible for which tasks?
• What tools do they depend on to complete the mission?
The process of identifying the answers to these questions often starts with a review of relevant documentation. With respect to military training, this is typically the training and readiness manual, as well as other military doctrine publications, which provide doctrinal guidance and detailed information on tactics, techniques, and procedures to be employed in different mission types (for example, Joint Publication 3-09.3, Joint Tactics, Techniques, and Procedures for Close Air Support). Upfront documentation review allows a practitioner to be better prepared to collect data via observation and interviews as it facilitates development of a general framework from which to work (typically general task flow), a foundational knowledge on which to build, an understanding of the nomenclature, and an idea of gaps in knowledge to allow direct queries. After
documentation review, observation of operational performance (rare opportunities in the military) or training exercises (limited opportunities) can provide extremely detailed information with respect to task decomposition into subtasks, team member roles, tools utilized, and more concrete task flow information. Instructor SME interviews are best utilized to fill gaps and drill down to very detailed levels (for example, exceptions to doctrine procedure). As an SME's time is typically extremely limited, it is critical to utilize this time wisely. If a practitioner queries an SME on basic information he or she knows is easily accessible via documentation provided, the expert may not be so generous with his or her time in the future. Additionally, if SME time is so limited that a structured interview is not possible, a second option is to develop questionnaires that the SME can complete incrementally as time provides. Table 9.3 provides a brief list of types of data resulting from a task analysis.
As a next step in developing requirements, the outcome from the task analysis can be used to feed a sensory task analysis, which is used to identify the multimodal cues and functionalities experienced during real world performance. Here is where rich contextual data are gathered and leveraged in the training system design. Sensory task analysis is conducted to determine how trainees gather information from the operational environment and how they act upon the environment in the real world. For each task and subtask, one must identify the multimodal cues (visual, auditory, haptic, and so forth) that the operator relies upon to perceive and comprehend the surrounding environment in order to successfully complete the task. From the tap on the shoulder from a teammate, to the geometry of an incoming aircraft, to the crunch of the ground beneath a tiptoeing enemy's foot, relevant multimodal information requirements must be identified for each training objective.
Table 9.3. Example Types of Data from Task Analysis
• Task descriptions, derived from observational analysis and associated documentation (for example, doctrine, field manuals, flow charts, training materials), provide a user-centered model of tasks as they are currently performed. The data to focus on include
a. What is the general flow of task activity?
b. What is the timing of each task step?
c. How frequently is the task performed?
d. How difficult or complex is the task?
e. How important is the task to overall human-system performance?
f. What are the consequences of task errors or omission of the task?
• Is the task performed individually or as part of a collective set of tasks, or does it require coordination with other personnel?
g. If part of a collective set, what are the interrelationships between the set of tasks?
h. If coordination is required, what are the roles and responsibilities of each individual in accomplishing the task?
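One lightweight way to hold the outputs summarized in Table 9.3 is a recursive task record that preserves task flow, performers, and attributes such as frequency, difficulty, and error consequences. This is a sketch under the assumption that a simple tree suffices; the field names and the close air support fragment are illustrative, not drawn from an actual FiST analysis.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Task:
    name: str
    performer: Optional[str] = None          # who is primarily responsible
    frequency: Optional[str] = None          # how often the task is performed
    difficulty: Optional[str] = None         # how difficult/complex it is
    error_consequence: Optional[str] = None  # consequence of error/omission
    requires_coordination: bool = False
    subtasks: List["Task"] = field(default_factory=list)

# Hypothetical fragment of a close air support decomposition.
cas = Task(
    name="Close air support mission",
    subtasks=[
        Task("Plan 9 line", performer="FAC", difficulty="high",
             requires_coordination=True),
        Task("Plan SEAD", performer="FO", requires_coordination=True),
    ],
)

def walk(task: Task, depth: int = 0) -> None:
    """Print tasks in decomposition order to show task flow."""
    print("  " * depth + task.name)
    for sub in task.subtasks:
        walk(sub, depth + 1)

walk(cas)
```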
Knowing the multimodal cues the performer depends on is not enough to capture the context of the operational environment; it is also necessary to deduce which aspects of the cues are relied upon and how the cues are used. For example, do personnel on the ground rely on merely a spot of black in the sky to detect incoming aircraft, or do they have to be able to see the wing positions in order to make fine discriminations of aircraft dynamics to assess if the aircraft is pointed at the correct target? It is important to define the task at this level of detail in order for operational requirements to be matched with KSAs to ensure cues are presented at an appropriate level of fidelity to allow successful performance of tasks and effective learning, without unnecessary technology costs. To conduct the sensory task analysis, working from the task/subtask framework, one can use a structured interview or questionnaire to probe an SME or instructor with the following types of questions: • For this subtask, what cues in the environment do you have to see to perform the task? • What sounds do you have to hear? • What physical cues do you have to feel? • Which aspects of the cues are relied upon and how are the cues used? • What actions do you have to perform in response?
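A sensory task analysis record can capture exactly this level of detail: per subtask, the modality, the cue, the aspect of the cue relied upon, and how it is used. The structure and entries below are hypothetical illustrations of such record keeping:

```python
from dataclasses import dataclass

@dataclass
class SensoryCue:
    subtask: str
    modality: str          # "visual", "auditory", "haptic", ...
    cue: str               # what the trainee perceives
    aspect_relied_on: str  # which property of the cue carries information
    use: str               # how the cue drives assessment or action

# Hypothetical entries echoing the aircraft detection example above.
cues = [
    SensoryCue("Visual acquisition of aircraft", "visual",
               "inbound aircraft", "wing position/geometry",
               "judge whether the aircraft is pointed at the correct target"),
    SensoryCue("Team coordination", "haptic",
               "tap on the shoulder from teammate", "presence of the tap",
               "shift attention to the teammate"),
]
for c in cues:
    print(f"[{c.modality}] {c.subtask}: relies on {c.aspect_relied_on}")
```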
Taken together, outcomes from the task analysis and sensory task analysis, as well as an examination of the literature (compare Milham, Hale, Stanney, & Cohn, 2005; Jones, Stanney, & Foaud, 2005; Hale & Stanney, 2006; Samman, Jones, Stanney, & Graeber, 2005), can be used to determine how the identified cues should be presented to the trainee to afford training of the task, detailing the levels of functional and physical fidelity required to allow the task to be effectively trained. Functional fidelity describes the degree to which a simulation imitates the information or stimulus/response options present in the real world (Swezey & Llaneras, 1997). Systems that require high levels of functional fidelity must have authentic relationships between operator inputs and outputs, but may not require spatially or physically accurate representations of system components (that is, physical fidelity). Physical fidelity is the degree to which a simulation imitates the multisensory (that is, visual, auditory, haptic, and olfactory) characteristics present in the real world. In systems that require high physical fidelity, it is important to capture cues beyond just the visual that are required to develop situation awareness about a mission. Further, accurate spatial and physical models may be required to develop targeted skill sets. To illustrate, if weapons handling objectives are targeted, a gun may have to be physically identical to the operational weapon so that trainees learn exactly how to manually interact with it and develop automated behaviors with select weapons. If shoot/no-shoot decision making is the objective, however, then functional fidelity may suffice, where the only requirement is that the trainee can input a shoot decision to an input device and receive feedback on the shot. In developing fidelity requirements, the following should be considered:
• In perceptual tasks, are multimodal cues presented in sufficiently complex environments to facilitate search and detection?
• In procedural tasks, what are the implications of multimodal cues that are not accurately spatially represented? If skills are overlearned and become automatic, is it feasible that negative transfer will occur?
• In tasks that require the development of situation awareness, are subtle real world cues represented that are early indicators of unfolding situations?
• In tasks that require the gathering and interpretation of multimodal information, how are haptic, olfactory, or other less common cues represented?
• For team training objectives, is the team represented with enough fidelity to support realistic interactions?
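The weapons handling versus shoot/no-shoot example above suggests that fidelity decisions can be framed, at least as a first pass, as a rule keyed to the targeted skill type. The heuristic below is a deliberately simplified sketch, not a validated selection procedure:

```python
def required_fidelity(skill_type: str) -> str:
    """Rough heuristic: skills that become automatic tend to need physical
    fidelity; judgment tasks often need only functional fidelity (authentic
    stimulus-response relationships)."""
    physical = {"psychomotor", "procedural"}      # e.g., weapons handling
    functional = {"decision_making", "judgment"}  # e.g., shoot/no-shoot
    if skill_type in physical:
        return "physical fidelity (spatially accurate controls and cues)"
    if skill_type in functional:
        return "functional fidelity (authentic input-output mapping)"
    return "analyze further against the task and sensory task analyses"

print(required_fidelity("procedural"))
print(required_fidelity("decision_making"))
```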
Based on past work (Stanney, Graeber, & Milham, 2002; Milham, Gledhill-Holmes, Jones, Hale, & Stanney, 2004; Bell et al., 2006; Milham & Jones, 2005; Jones & Bell, 2006), a list of lessons learned on collecting task data and specifying fidelity levels is provided below:
• When provided with access to current and past training curricula and to SMEs to fill in any unresolved questions, a comprehensive task analysis can be accomplished with just a few one- to three-day trips to the field.
• It is beneficial to interview two to three SMEs (whether instructors or otherwise), as each informant's perception of training needs may differ slightly; the core needs can then be identified by cross-referencing across multiple SMEs.
• When conducting a sensory task analysis, it may be difficult for SMEs to articulate the multimodal cues they rely on, as they may not realize the extent of the process they perform; therefore, it is important to have structured questions to attain this information and potentially to immerse them in scenarios to elicit contextually derived responses.
• Often a task is performed under various environmental conditions (for example, day/night and high/low visibility), and it is important to extract the multimodal cues relied upon across the range of environmental conditions as they may vary.
• The goals of the task and targeted skill sets should drive the selection of fidelity.
Stage 3: Metrics Development
Metrics are a key component of the training system requirements specification process, allowing collection of relevant data and the synthesis, summary, and diagnosis of trainee strengths and weaknesses. Metrics allow identification of specific tasks and procedures with which trainees struggle to achieve sufficient performance levels. Metrics can be used to identify training gaps and redefine training goals to drive training system design. Building from the task analysis, metrics to assess performance on each subtask can be defined. Specifically, for each target task resulting from the ORA process, further decomposition into the knowledge and skills required for successful performance of the task is performed within the HPRA process. Identifying applicable knowledge and skills provides insight not only with respect to what is required for operational
performance, but also how best to train each task. Specifically, there may be different levels of fidelity best suited for targeting competencies (for example, is the individual learning how to detect subtle anomalies to build situation awareness, in which case visual features are critical, or is the individual learning how to use a new display/control interface, in which case out-the-window views are not as critical as cockpit fidelity?). Decomposing tasks into knowledge and skills can require an understanding not only of the operational tasks, but also of the competency literature, to create a mapping between task characteristics (for example, visual detection and auditory localization) and underlying information processing competencies/KSA constructs. To accomplish this, tasks are evaluated against a taxonomy of human performance, defining whether the skills are individual or team, perceptual, procedural, decision making, and so forth. From this, categories of tasks are classified into the type of skill set they are related to in order to further drill down into the target skill. For example, if the task analysis defines a set of perceptual tasks that is critical to task performance, the sensory task analysis can be reviewed to identify the multimodal cues that are related to performing the perceptual skill. Given the highly intensive nature of this manual process (requiring a review of literature on information processing knowledge and skills, then a mapping to domain tasks), Ahmad et al. (2007) designed a tool that begins to facilitate this mapping for tasks that rely on perceptual skills through the development of a multimodal cue, training objective, and cost matrix that utilizes a sensory-perceptual objective task taxonomy (Champney, Carroll, Milham, & Hale, in press) at its core. This taxonomy links generalizable operational multimodal task characteristics with human sensory and perceptual competencies. With this type of tool, task and subtask information is gathered from the domain and broken down to a level that can be matched against the sensory and perceptual knowledge and skills targeted by the learning system to facilitate the development of metrics to monitor learning across KSAs.
Metrics often include assessment of (1) occurrence, (2) time, and (3) accuracy. For instance: Did the trainee perform the required task (for example, engage the enemy)? Did the trainee shoot the enemy in a timely fashion (for example, time to shoot)? Did the trainee effectively engage the enemy (for example, accuracy of shot and kill/no kill)? Metrics can be derived from documentation. Typically, however, development of an effective set of performance metrics requires input from an instructor or SME. Instructors may have performance measures they employ to evaluate training. There are occasions, however, when no concrete metrics are available. For many complex military tasks, there are numerous ways that a task can be performed, and while some involve tactics more in line with doctrine than others, if the mission is successful, instructors may not drill down to technicalities. Mission outcome metrics, however, may not be granular enough to assess training system effectiveness with respect to different training objectives (that is, on which tasks or training objectives is there suboptimal performance?). As a result, it may be necessary to work with instructors to identify more granular, process-level metrics. Also of importance with respect to metrics
are performance thresholds. In order to gauge good versus poor performance, as opposed to merely changes over time, it is necessary to identify a threshold of acceptable performance. This often proves challenging, as such thresholds frequently depend on an array of environmental conditions. In these cases, attempts should be made to identify approximate performance thresholds or if/then contingency thresholds for use in evaluating trainee performance. This requires stepping through each subtask and asking the instructor to describe how he or she assesses performance on each subtask. Example questions to extract performance metric data include the following:
• How is performance on this task currently assessed by instructors (that is, what are the current performance metrics)?
• What behaviors distinguish good versus poor performance?
• What are the current versus desired performance levels on these tasks?
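A minimal encoding of the occurrence/time/accuracy triad with an if/then contingency threshold might look like the following. The metric reuses the under-10-minute 9 line planning threshold from Table 9.2; the night-conditions relaxation rule is invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Metric:
    name: str
    kind: str                   # "occurrence", "time", or "accuracy"
    threshold: Optional[float]  # acceptable performance level, if known
    better: str = "lower"       # "lower" (e.g., time) or "higher" (accuracy)

    def acceptable(self, value: float, conditions: dict) -> bool:
        t = self.threshold
        if t is None:
            return True  # no threshold defined; track trends only
        # If/then contingency threshold: hypothetical relaxation at night.
        if self.kind == "time" and conditions.get("visibility") == "night":
            t *= 1.5
        return value <= t if self.better == "lower" else value >= t

# "Time to plan 9 line (<10 min)" from Table 9.2.
plan_9line = Metric("time to plan 9 line (min)", "time", threshold=10.0)
print(plan_9line.acceptable(12.0, {"visibility": "day"}))    # False
print(plan_9line.acceptable(12.0, {"visibility": "night"}))  # True
```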
Once operational performance metrics are defined, these metrics can be linked to associated KSAs and, from the human performance side, metrics can be derived to evaluate learning of these KSAs. Once these linkages have been established, patterns of learning can be tracked to determine if learning on competencies is occurring over time or whether there are breakdowns at the task level (that is, the skill learning is relatively stable, except when the skill is applied to a specific mission task) or at the skill level (that is, the trainee consistently underperforms on a target competency, which affects groups of mission tasks that utilize the skill). The following lessons learned have been identified in past metric development efforts (Milham, Gledhill-Holmes, et al., 2004; Bell-Carroll, Jones, Milham, Delos-Santos, & Chang, 2006):
• Outcome metrics often revolve around mission success, but it is critical to define process metrics to determine whether or not tasks were accomplished the correct way.
• Parameters of performance (target levels) may vary based on the expertise of the individual.
• Time and accuracy metrics should be clearly understood to reflect the specific goals of the mission (they tend to be inversely related).
• Given many performance metrics, it may prove beneficial to have instructors rate the criticality of or prioritize performance metrics.
• KSA metrics may be more meaningfully interpreted when they are mapped onto specific mission events (understanding if decision making was poor for a difficult event versus an easy event).
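The task level versus skill level distinction described above can be checked by aggregating the same observations two ways, by mission task and by underlying skill. A toy version, assuming each scored observation is tagged with both; the task names, skills, and scores are invented:

```python
from collections import defaultdict

# Each record: (mission_task, skill, score in [0, 1]); all values invented.
observations = [
    ("Plan 9 line", "coordination", 0.90),
    ("Plan SEAD", "coordination", 0.40),
    ("Mission communication", "coordination", 0.85),
    ("Plan SEAD", "deconfliction", 0.45),
]

def mean_by(index: int) -> dict:
    """Average scores grouped by task (index 0) or by skill (index 1)."""
    groups = defaultdict(list)
    for record in observations:
        groups[record[index]].append(record[2])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

by_task, by_skill = mean_by(0), mean_by(1)
# A low task mean alongside a healthy skill mean suggests a task level
# breakdown; a uniformly low skill mean suggests a skill level breakdown.
print(by_task)
print(by_skill)
```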
Stage 4: Training Management Component Development
Training management describes the process of understanding training objectives within a domain, creating training events to allow practice of targeted objectives, measuring and diagnosing performance, and providing feedback to trainees (Oser, Cannon-Bowers, Salas, & Dwyer, 1999). Done well, the training management cycle provides trainers with an opportunity to define and measure complex
training objectives (for example, situation awareness), track performance gains of trainees, and help diagnose performance deficits (Pruitt, Burns, Wetteland, & Dumestre, 1997). The training management component relies on training objectives and metrics derived from ORA, and on relevant KSAs and metrics derived from HPRA, to facilitate development of scenario events and scenario manipulation variables, performance diagnosis methods, and performance summaries and training feedback that target training objectives.
The primary training management components that result from the ORA process are scenarios that effectively target training objectives. Scenarios incorporate task models and procedures associated with utilization of target KSAs. In scenario based training, "the scenario itself is the curriculum" (Cannon-Bowers, Burns, Salas, & Pruitt, 1998, p. 365). To build a training scenario, it is key to identify those variables that require trainees to adjust or adapt their strategies to facilitate development of expertise (Prince & Salas, 1993) on the KSAs. These variables can be used to adjust difficulty through workload, target density, or time provided to accomplish tasks. Variables can include such factors as the ambiguity of targets, distractions, or variations in other factors, such as day or night missions, or even the equipment available to support the task. Each variable can be developed into an array of scenario events along a difficulty continuum, resulting in a library of scenario curriculum events from which scenarios can be developed given a targeted difficulty level and training objective. Effective scenario variables can often be identified by examining instructor-developed scenarios that range in difficulty. These variables can also be identified by an SME or instructor through a structured interview or questionnaire similar to the process for identifying metrics. For each subtask, practitioners should ask what the instructor would include in easy, medium, and hard scenarios aimed at different training objectives. Example questions to guide the development of scenarios include the following:
• Are there differences in events that would be presented at different levels of difficulty (for example, the number of enemies)?
• Are there differences in information that would be presented to trainees based on difficulty (for example, intelligence suggests there is an enemy air defense in the area versus the trainee having to detect the enemy without aid)?
• Are there stressors that would be added as difficulty increases?
These questions typically result in numerous variables and variations that can be used in scenario development. Scenarios can be developed from these variables that target different training objectives over a range of difficulty levels. Additionally, to accelerate learning, scenario selection and manipulation can be driven by learning performance. By diagnosing performance on training objectives via performance metrics, future scenario selection or online scenario manipulation can tailor training to provide continued practice on training objectives that show performance decrements, or to increase or decrease scenario difficulty in response to performance stabilization or performance decreases.
The primary training management components resulting from HPRA are performance diagnosis and the display of this information to instructors and trainees.
Based on performance during scenarios, as measured by mission performance and learning metrics, performance data can be used to support trainers by providing summaries of performance data to facilitate after action review, or the data can be used to provide feedback directly to trainees. As a trainer tool, it is important to determine the information that trainers will need to facilitate quick and to-the-point illustrations of performance breakdowns. The tool should decrease trainer workload and be able to illustrate key information related to training goals identified by the trainer. Trainers may require highly adaptive interfaces that can meet changing goals and foci. They may want to be able to replay performance from different perspectives, uncover patterns in performance across members of a team or individuals across an entire class, or monitor performance over time to investigate training impact on emerging needs (for example, new tactics). To get these data, it is important to identify how trainers currently conduct training, performance measurement and diagnosis, and after action review. From this, tools can be developed that support trainer needs in diagnosing trainee performance. Questions to ask trainers may include the following:
• What training goals are you assessing?
• Which metrics help you assess those goals?
• How can these data be displayed to provide a quick look to facilitate after action review?
As a tool to provide feedback directly to trainees, the data must trigger interventions or provide modules aimed at illustrating performance, supporting the identification of errors or breakdowns, and providing learning aids to facilitate improvement on target competencies. Trainees may need self-paced tools that provide them not only with performance feedback, but that also adjust the training scenario itself to provide directed training aimed at decreasing gaps in performance. Questions to consider for trainees may include the following:
• What can the trainee take away from the displayed data?
• What does the trainee need to understand to improve learning on deficiencies?
• What kinds of data are provided to trainees by trainers to improve performance?
Previous work in developing training management components has resulted in many lessons learned (Jones, Bell, & Milham, 2006; Carroll, Champney, et al., 2007; Carroll, Milham, Champney, Eitelman, & Lockerd, 2007); a partial list includes the following:

• Scenario manipulation variables should be clearly mapped to training objectives to facilitate training that targets these objectives.
• It is important to understand how multiple scenario manipulations interact to affect difficulty (for example, the interaction between time of day and weather conditions).
• Observing after action reviews and talking with instructors are key to determining the data to present after simulation performance; a list of all metrics is not typically useful.
• Trainees may not benefit from knowledge of results alone; further remediation may be required to facilitate learning on breakdowns in performance.
• General training strategies, such as providing feedback, can be used across a host of KSAs; however, the most effective learning is achieved when diagnosis (that is, where the breakdown occurred and what the deviation is from expert/expected performance) is tied to specific feedback (for example, knowledge of correct versus observed performance).
CASE STUDY: MOT2IVE SYSTEM DEVELOPMENT EFFORT

This section illustrates the training system design lifecycle by presenting its application to the design and evaluation of the Multi-Platform Operational Team Training Immersive Virtual Environment (MOT2IVE) system.

Stage 1: MOT2IVE Training Goals/Needs Identification

During initial MOT2IVE system development, the target training audience was predeployment and deployed marines who were participating in or had completed FiST training at the Tactical Training Exercise Control Group at 29 Palms, California (see Table 9.1). Having performed these skills in low fidelity training systems and/or live-fire exercises, these marines would have moderate skills and would be in the process of consolidating them. To ensure that MOT2IVE targeted appropriate training goals, SMEs were interviewed to develop a set of training gaps, known as friction points, to guide training system requirements specification, so that MOT2IVE could target areas in which trainees typically demonstrated performance deficiencies. The target training goals identified were close air support and call for fire missions conducted as a three-man team in support of a commander's intent, focusing on the friction point training gaps (for example, engagement of all targets and suppression of enemy air defense).

Stage 2: MOT2IVE Task Analysis and Design Specifications

Utilizing military doctrine and training and readiness manuals, training observation, and SME interviews, the close air support and call for fire missions were decomposed into tasks and subtasks, responsible team members, and associated training gaps. Tasks and subtasks were structured in a sequential manner to indicate task flow, as illustrated in the suppression of enemy air defense (SEAD) subtask example in Table 9.4.
Table 9.4. FiST Task Decomposition Examples

Mission Phase: Mission Planning. Task: Evaluate and communicate SEAD effectiveness.

Training Gap/Goal: SEAD
Subtask: Locate where SEAD rounds hit
Responsibility: Forward Observer (1), Forward Air Controller (2)

Training Gap/Goal: SEAD
Subtask: Determine if the SEAD mission was effective
Responsibility: Forward Observer (1), Forward Air Controller (2)

Training Gap/Goal: Engage all targets
Subtask: Communicate whether the suppression was effective to the pilot and advise whether to continue mission or not
Responsibility: Forward Air Controller (1), FiST Lead (2)
Building from the FiST task decomposition, subtasks were further broken into the individual and team knowledge and skills required to successfully perform the tasks (Table 9.5). This was completed by first identifying a set of high level knowledge and skills thought to be most relevant to FiST operations, including the following: coordination (information exchange), coordination (mutual performance monitoring), coordination (communication), leadership (guidance), leadership (initiative), team situation awareness (threat/friendly awareness), team situation awareness (timeline awareness), adaptive decision making (asset allocation), adaptive decision making (conflict resolution), spatial/relational knowledge/skills, perceptual knowledge, strategic knowledge and decision making, and procedural knowledge/skills. From this list, the FiST task was examined subtask by subtask through a focus group to determine which knowledge and skills were relevant to each subtask. Through training observation at the Tactical Training Exercise Control Group at 29 Palms, California, and instructor interviews, FiST subtasks were further decomposed into multimodal cue and capability requirements via a sensory task analysis, to identify all information necessary to perform the close air support and call for fire tasks, along with the associated interaction requirements (see Table 9.6). Based on the FiST sensory task analysis, operational multimodal cue and capability requirements were transformed into fidelity requirements by examining human information processing requirements and mapping them to knowledge and skill training objectives for each subtask, then validating the results with an expert.
Table 9.5. FiST Knowledge and Skills Decomposition Examples

Mission Phase: Mission Planning. Task: Evaluate and communicate SEAD effectiveness.

Subtask: Locate where SEAD rounds hit
Individual Knowledge and Skills: Perceptual skills: visual detection of cues to indicate SEAD, auditory localization of SEAD round, detection of auditory comms from indirect fire agency

Subtask: Determine if the SEAD mission was effective
Individual Knowledge and Skills: Perceptual skills: fine visual detail discrimination. Spatial skills: visual distance estimation, spatial orientation. Decision making: assess effectiveness

Subtask: Communicate whether the suppression was effective to the pilot and advise whether to continue mission or not
Individual Knowledge and Skills: Decision making: assess what actions pilot should take
Team Knowledge and Skills: Team coordination: information exchange
Table 9.7 details the fidelity requirements for the tools/interfaces by which trainees receive and act upon the environment, and then maps those requirements to the training goals identified in the first step.

Stage 3: MOT2IVE Metrics Development

In order to assess the degree to which MOT2IVE facilitated trainees' meeting of mission goals, operational performance metrics had to be identified.
Table 9.6. FiST Multimodal Cue and Capability Requirements Example

Task: Evaluate and communicate SEAD effectiveness

Subtask: FAC/FO locates where SEAD rounds hit
Required Multimodal Cues: Visual: SEAD round on deck; magnified view through binoculars; map. Auditory: sound of SEAD round flying over head and hitting deck; communications from indirect fire agency
Required Capability: Ability to scan terrain

Subtask: FAC/FOs determine if the SEAD mission was effective
Required Multimodal Cues: Visual: terrain; SEAD round on deck; damage to EAD

Subtask: FAC communicates whether the suppression was effective to the pilot and advises whether to continue mission or not
Required Multimodal Cues: Auditory: communications from the pilot
Required Capability: Ability to communicate with pilot
Through instructor interviews, metrics that could be used to discriminate good and poor performance levels were identified, including the thresholds separating the two. Table 9.8 provides example metrics for the SEAD evaluation task. Next, operational performance metrics were mapped to associated knowledge and skills (K&S) to assess the degree to which MOT2IVE facilitated learning. If tracked over multiple examples or over time, patterns in learning on these K&S dimensions can be determined (Table 9.8). With MOT2IVE, these metrics were used to develop a performance measurement and diagnostic tool, described in the next section, which tracked learning trends over time, KSA training objectives, and mission phases (see Figure 9.3). When examined together with subtask performance, the outputs of this tool can be used to determine whether decrements in performance are due to a mission level task or to lack of knowledge of a skill set.

Stage 4: MOT2IVE Training Management Component Development

To monitor trainees' performance and track progress toward training goals, training management strategies and components were developed to inject key events into scenarios, calculate metrics, diagnose breakdowns in performance, and illustrate these breakdowns to instructors.
Table 9.7. FiST Fidelity Requirements Example

Task: Evaluate and communicate SEAD effectiveness

Subtask: FAC/FO locates where SEAD rounds hit
Knowledge & Skills—Training Objective: Perceptual skills
Human Information Processing Requirements: Visual: detect SEAD round on deck, potentially outside field of view. Audio: localize sound of SEAD round flying over head and hitting deck; detect communications from IDF agency. Response: ability to scan terrain
Fidelity Requirements: Visual display: low resolution, wide field of view. Audio: spatialized

Subtask: FAC/FO determines if the SEAD mission was effective
Knowledge & Skills—Training Objective: Perceptual skills; spatial skills
Human Information Processing Requirements: Visual: discriminate distance and depth perception; visually localize; discriminate fine detail
Fidelity Requirements: Visual display: high resolution, normal field of view
Scenarios were developed by creating scenario manipulation variables associated with each friction point training gap, across an array of difficulty levels (that is, crawl, walk, and run). These variables and difficulty variations were used to develop scenarios. An example is provided in Table 9.9. The performance assessment and diagnostic tool (PAST; Carroll, Champney, et al., 2007) training management component was developed to measure and diagnose performance by identifying the root errors associated with mission failures and tracing how they propagated through to mission outcomes. This tool illustrates the chain of events leading up to each error to identify earlier mistakes or contributing factors that may have led to performance errors. Detailed logic was created to chronologically link specific metrics to root causes, and metrics were categorized by the training goal to which they related (see Figure 9.4).
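The root error diagnosis logic can be sketched in code. The fragment below is a simplified Python illustration, with invented data structures and field names rather than PAST's actual design: each metric result is stamped with a time and a training goal, so walking backward chronologically through failed metrics that share a goal recovers a chain of potential contributing factors.

```python
from dataclasses import dataclass

@dataclass
class MetricResult:
    name: str            # e.g., "Time to locate SEAD rounds"
    training_goal: str   # e.g., "SEAD"
    timestamp: float     # seconds into the scenario
    passed: bool

def root_error_chain(results, failure):
    """Chronological chain of earlier failed metrics sharing the failed
    metric's training goal; candidates for root causes and contributing
    factors of a mission-level failure."""
    chain = [r for r in results
             if not r.passed
             and r.training_goal == failure.training_goal
             and r.timestamp <= failure.timestamp]
    return sorted(chain, key=lambda r: r.timestamp)
```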
Table 9.8. FiST Performance Metric Examples

Task: Evaluate and communicate SEAD effectiveness

Subtask: FAC/FO locates where SEAD rounds hit
Performance Metrics: Did FAC/FO visually locate SEAD round? Time to locate SEAD rounds
Performance Thresholds: Yes; < 2 s
Knowledge and Skills: Spatial knowledge; perceptual knowledge

Subtask: FAC/FO determines if the SEAD mission was effective
Performance Metrics: Did FAC/FO correctly evaluate effectiveness of SEAD? Time to evaluate SEAD effectiveness
Performance Thresholds: Yes; < 2 s
Knowledge and Skills: Spatial knowledge/skill; perceptual knowledge/skill; decision making

Subtask: FAC communicates whether the suppression was effective to the pilot and advises whether to continue mission or not
Performance Metrics: Did FAC communicate effectiveness of SEAD to pilot?
Performance Thresholds: Yes
Knowledge and Skills: Team coordination (information exchange)
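Because metrics of the kind shown in Table 9.8 pair an observable with a pass/fail threshold, they are straightforward to score automatically. The Python sketch below encodes the table's thresholds; the function and parameter names are invented for illustration and do not reflect MOT2IVE's implementation.

```python
def score_sead_evaluation(located_round: bool, time_to_locate_s: float,
                          correct_effectiveness_call: bool,
                          time_to_evaluate_s: float,
                          communicated_to_pilot: bool) -> dict:
    """Score the SEAD evaluation subtasks against Table 9.8 thresholds."""
    return {
        "located SEAD round": located_round and time_to_locate_s < 2.0,
        "evaluated SEAD effectiveness": (correct_effectiveness_call
                                         and time_to_evaluate_s < 2.0),
        "communicated effectiveness to pilot": communicated_to_pilot,
    }
```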
This case study has illustrated the successful application of the training system design lifecycle to the development of requirements for a U.S. Marine Corps FiST trainer (MOT2IVE), which has been embraced by the operational community and incorporated into multiple U.S. Marine Corps training curricula due to its operational relevance.

CONCLUSION

This chapter discusses in detail two components of the training system design lifecycle: operational requirements analysis and human performance requirements analysis. Through the ORA process discussed herein, system requirements can be informed by the training needs/gaps the system must address, as well as by the multisensory information and interaction requirements necessary to target these training gaps. Additionally, the HPRA process seeks to ensure that systems are designed to effectively support training through the development of metrics to assess whether training goals are being met and, with the provision of training management strategies, to assist in the attainment of these goals.
Figure 9.3. FiST Knowledge and Skill Training Objective Learning Trends
Both processes are concerned with understanding the goals of the system from the trainee's perspective, developing the environment to support and monitor knowledge and skill development, and using measures of performance to provide trainer tools or trainee tools to reduce decrements. By grounding the development of training systems in ORA and HPRA, training effectiveness can be built into the system from the ground up.

Table 9.9. FiST Scenario Variation Example

Training Gap/Goal: SEAD
Scenario Manipulation Variable: Type of enemy air defense, location of enemy air defense, and whether enemy air defense was preidentified or unidentified
Crawl Variation: One enemy air defense that is identified to FiST in the scenario (for example, commander's intent briefing, fire support control center info, and so forth)
Walk Variation: One enemy air defense not identified to FiST, so FiST has to detect it
Run Variation: Multiple enemy air defenses, one identified to FiST in the scenario and one not identified, so FiST has to detect it
Figure 9.4. Performance Assessment and Diagnostic Tool Root Error Diagnostic Tree
Doing this is critical to ensure that overall training goals are met by providing trainees with opportunities to conduct operationally relevant missions and by supporting the learning of the knowledge and skills needed to successfully achieve mission outcomes.
REFERENCES

Ahmad, A., Carroll, M., Champney, R., Milham, L., Parrish, T., & Chang, D. (2007). Virtual reality training system development guidance tool for multimodal information fidelity level selection: Tool for the Optimization of Multimodal Cues for the Advancement of Training System Design—TOMCAT (Phase 1 Final Rep., Contract No. N00014-07-M-0231). Arlington, VA: Office of Naval Research.
Bell, M., Jones, D., Chang, D., Milham, L., Becker, W., Sadagic, A., & Vice, J. (2006). Fire Support Team (FiST) task analysis surrounding eight friction points (VIRTE Program Rep., Contract No. N00014-04-C-0024). Arlington, VA: Office of Naval Research.
Bell-Carroll, M., Jones, D., Milham, L., Delos-Santos, K., & Chang, D. (2006). Multi-level timeline logic, metric definitions and DIVAARS tags (VIRTE Program Interim Rep., Contract No. N00014-04-C-0024). Arlington, VA: Office of Naval Research.
Cannon-Bowers, J. A., Burns, J. J., Salas, E., & Pruitt, J. S. (1998). Advanced technology in scenario based training. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 365–374). Washington, DC: American Psychological Association.
Carroll, M. B., Champney, R., Jones, D., Milham, L., Delos-Santos, K., & Chang, D. (2007). Metric toolkit deliverable: Performance Assessment and Diagnostic Tool (PAST) design, development and evaluation (VIRTE Program Technical Rep., Contract No. N00014-04-C-0024). Arlington, VA: Office of Naval Research.
Carroll, M. B., Milham, L., Champney, R., Eitelman, S., & Lockerd, A. (2007). ObSERVE initial training strategies report (Program Interim Rep., Contract No. W911QY-07-C0084). Arlington, VA: Office of Naval Research.
Champney, R. K., Carroll, M. B., Milham, L. M., & Hale, K. S. (in press). Sensory-perceptual objective task (SPOT) taxonomy: A task analysis tool. Proceedings of the Human Factors and Ergonomics Society 52nd Annual Meeting. Santa Monica, CA: Human Factors and Ergonomics Society.
Cohn, J. V., Stanney, K. M., Milham, L. M., Jones, D. L., Hale, K. S., Darken, R. P., & Sullivan, J. A. (2007). Training evaluation of virtual environments. In E. L. Baker, J. Dickieson, W. Wulfeck, & H. O'Neil (Eds.), Assessment of problem solving using simulations (pp. 81–105). Mahwah, NJ: Lawrence Erlbaum.
Ericsson, K. A., & Oliver, W. L. (1995). Cognitive skills. In N. J. Mackintosh & A. M. Colman (Eds.), Learning and skills (pp. 19–36). London: Longman Group Limited.
Fairley, R. E., & Thayer, R. H. (1997). The concept of operations: The bridge from operational requirements to technical specifications. In M. Dorfman & R. H. Thayer (Eds.), Software engineering (pp. 44–54). Los Alamitos, CA: IEEE Computer Society Press.
Fowlkes, J., Dwyer, D. J., Oser, R. L., & Salas, E. (1998). Event-based approach to training. The International Journal of Aviation Psychology, 8(3), 209–221.
Goode, H. H., & Machol, R. E. (1957). Systems engineering: An introduction to the design of large-scale systems. New York: McGraw-Hill.
Hale, K. S., & Stanney, K. M. (2006). Enhancing spatial awareness with tactile cues in a virtual environment. Proceedings of the Human Factors and Ergonomics Society 50th Annual Meeting (pp. 2673–2677). Santa Monica, CA: Human Factors and Ergonomics Society.
Jones, D., & Bell, M. (2006). MOT2IVE V2 system requirements (VIRTE Program Interim Rep., Contract No. N00014-04-C-0024). Arlington, VA: Office of Naval Research.
Jones, D., Bell, M., & Milham, L. (2006). VIRTE demo 3 scenario manipulation variable matrix (VIRTE Program Interim Rep., Contract No. N00014-04-C-0024). Arlington, VA: Office of Naval Research.
Jones, D. L., Stanney, K., & Foaud, H. (2005, August). Optimized spatial auditory cues for virtual reality training systems. Paper presented at the 2005 APA Convention, Washington, DC.
Milham, L., Gledhill-Holmes, R., Jones, D., Hale, K., & Stanney, K. (2004). Metric toolkit for MOUT (VIRTE Program Rep., Contract No. N00014-04-C-0024). Arlington, VA: Office of Naval Research.
Milham, L., Hale, K., Stanney, K., & Cohn, J. (2005, July). Using multimodal cues to support the development of situation awareness in a virtual environment. Paper presented at the 1st International Conference on Virtual Reality, Las Vegas, NV.
Milham, L., & Jones, D. (2005). MOUT room clearing cross reference of sensory cues (operational and metaphoric) to environmental information required for tasks: Interim technical report (VIRTE Program Interim Rep., Contract No. N00014-04-C-0024). Arlington, VA: Office of Naval Research.
Milham, L. M., Cuevas, H. M., Stanney, K. M., Clark, B., & Compton, D. (2004). Human performance measurement thresholds (Phase I Final Rep. No. N00178-04-C-3019). Dahlgren, VA: Naval Surface Warfare Center.
Oser, R. L., Cannon-Bowers, J. A., Salas, E., & Dwyer, D. J. (1999). Enhancing human performance in technology-rich environments: Guidelines for scenario-based training. In E. Salas (Ed.), Human/technology interaction in complex systems (Vol. 9, pp. 175–202). Stamford, CT: JAI Press.
Prince, C., & Salas, E. (1993). Training and research for teamwork in the military aircrew. In E. L. Wiener, B. G. Kanki, & R. L. Helmreich (Eds.), Cockpit resource management (pp. 337–366). Orlando, FL: Academic Press.
Pruitt, J. S., Burns, J. J., Wetteland, C. R., & Dumestre, T. L. (1997). ShipMATE: Shipboard Mobile Aid for Training and Evaluation. Proceedings of the Human Factors and Ergonomics Society 41st Annual Meeting (pp. 1113–1117). Santa Monica, CA: Human Factors and Ergonomics Society.
Rossett, A. (1987). Training needs assessment. Englewood Cliffs, NJ: Educational Technology Publications.
Samman, S. N., Jones, D., Stanney, K. M., & Graeber, D. A. (2005, July). Speech, Earcons, Auditory Spatial Signals (SEAS): An auditory multi-modal approach. Paper presented at the HCI International Conference, Las Vegas, NV.
Schmorrow, D., Solhan, G., Templeman, J., Worcester, L., & Patrey, J. (2003, June). Virtual combat training simulators for urban conflicts and performance testing. Paper presented at the International Applied Military Psychology Symposium, Brussels, Belgium. Retrieved April 8, 2003, from http://www.iamps.org/praha/PaperSchmorrow2.doc
Stanney, K. M., Graeber, D., & Milham, L. (2002). Virtual environment landing craft air cushion (VELCAC) knowledge acquisition/engineering (VIRTE Program Rep., Contract No. N0001402C0138). Arlington, VA: Office of Naval Research.
Stanney, K. M., & Zyda, M. (2002). Virtual environments in the 21st Century. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 1–14). Mahwah, NJ: Lawrence Erlbaum.
Swezey, R. W., & Llaneras, R. E. (1997). Models in training and instruction. In G. Salvendy (Ed.), Handbook of human factors (pp. 514–577). New York: Wiley.
Tessmer, M., McCann, D., & Ludvigsen, M. (1999). Reassessing training programs: A model for identifying training excesses and deficiencies. Educational Technology Research and Development, 47(2), 86–99.
Chapter 10
BUILDING VIRTUAL ENVIRONMENT TRAINING SYSTEMS FOR SUCCESS

Joseph Cohn

There are many steps between the first ideation of a system and the last hammer blow to the final to-be-delivered system. These include the following:

• Concept generation,
• Requirements definition,
• Metrics development,
• System validation,
• Return on investment determination.
These steps are interdependent and collectively represent the different elements that comprise the training system development cycle. While it is possible to enter this cycle at any step—for example, a development team may be asked to calculate a return on investment for a commercial off-the-shelf product to determine if it satisfies an identified need—more often than not this cycle is entered at either the concept generation step or the requirements definition step. Concept generation teams are often tasked with a very general mandate to identify leap-ahead technologies or to anticipate new challenges before they emerge. So, when a design and development team enters this cycle at the concept generation step, it often does so with only a vague notion of the problem it is trying to solve. This is a costly and time consuming approach, but the intellectual freedom it provides, with wide latitude in coming up with viable solutions, may also make it the one most likely to produce a leap-ahead solution. Starting at this step represents a high risk and (potentially) high reward decision. Contrastingly, when a team enters this cycle at the requirements definition step, the team members are typically provided with a performance challenge to solve. Here, they must use the specified parameters as the starting point for developing their solutions. This approach is more constrained, but, because of its bounded nature, it is more likely to return a viable solution for a given investment. Starting at this step represents a moderate risk, moderate reward decision.
The focus of this chapter is on system development cycles that begin with the requirements definition step. We first explore different ways of developing requirements; next, we look at different approaches for generating concepts that will satisfy these requirements and for building metrics to help developers evaluate progress and success; we follow this with a short discussion of system validation and return on investment calculations and conclude with a real world example of how these different steps may be applied to developing a notional virtual environment (VE) based infantry training system.

REQUIREMENTS

A requirement is simply a description of what a system/product/tool must do (Ross & Schoman, 1977). There are many different ways of addressing requirements, but, typically, four basic questions must be answered:

1. What must the system do?
2. How will the system be built?
3. Why build this particular system?
4. When should success be declared?
Some of the more common methods for addressing these questions include

• Knowledge elicitation,
• Knowledge analysis, and
• Knowledge validation.
Knowledge Elicitation

There are two main objectives that any knowledge elicitation effort must achieve: (1) identifying the critical gap that the to-be-developed system must address and (2) obtaining the necessary information to characterize the target users and their associated environment. Both objectives may be met with the following:

• Obtaining and reviewing all relevant user community documents (training manuals, system documentation, newsletters, and so forth),
• Conducting subject matter expert (SME) interviews in order to
  a. Understand performance gap(s) and
  b. Determine where to insert a given solution,
• Using questionnaires to
  c. Determine consensus points relating to proposed performance gap(s) and
  d. Characterize attitudes toward proposed performance gap(s) and proposed system solutions,
• Convening focus groups to drive consensus at points of dissension,
• Using both questionnaires and SME interviews to develop user community profiles that characterize
  e. The targeted community in terms of demographics, knowledge, skills, and abilities and
  f. The targeted environment in terms of the culture and operational context into which the envisioned solution will be integrated.
Knowledge Analysis

Once the necessary information is gathered, it must be processed and formatted to support data-driven design decisions. The most common, general tool for formally representing this information is the task analysis (Kirwan & Ainsworth, 1992; Chipman, Schraagen, & Shalin, 2000), which comes in many different forms and is used for a wide range of purposes, including

• Identifying task flows,
• Characterizing user knowledge requirements,
• Determining information processing requirements (that is, inputs to the system and outputs from the system), and
• Developing system fidelity requirements.
Knowledge Validation

Gathering and representing the information is only part of the challenge. Typically, the team that interacts with the user community is not the team that develops the system that will be delivered to the user community. Consequently, the user information must be represented in such a manner that it supports the development team as it designs and constructs the actual solution. This requires developing models and formalizations that

• Characterize task interdependencies in terms of workflow and timelines,
• Capture performance specifications, and
• Develop use case scenarios that link these models and formalizations so that developers will be better able to anticipate the impact of design trade-offs on user performance.
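One lightweight way to hand such models to a development team is to encode task interdependencies and timing as data. The Python sketch below is one possible formalization, with invented field names, and assumes the dependency graph is acyclic; it is an illustration rather than a prescribed format.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    duration_s: float                                     # expected time on task
    depends_on: list = field(default_factory=list)        # names of upstream tasks
    performance_spec: str = ""                            # e.g., "locate target in < 2 s"

def critical_path_length(tasks: dict, name: str) -> float:
    """Longest dependency chain ending at `name`; a rough timeline estimate.
    Assumes the dependency graph contains no cycles."""
    task = tasks[name]
    if not task.depends_on:
        return task.duration_s
    return task.duration_s + max(critical_path_length(tasks, d) for d in task.depends_on)

# Example: a two-step workflow feeding a use case scenario
tasks = {
    "detect": Task("detect", 2.0, performance_spec="detect target in < 2 s"),
    "report": Task("report", 5.0, depends_on=["detect"]),
}
print(critical_path_length(tasks, "report"))  # 7.0
```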
Once these models and formalizations are constructed, they may then be validated. This includes returning to the same user groups that provided the initial information and reviewing with them the captured knowledge; it may also involve identifying other user groups, from the same community, who have not yet provided their inputs, providing an additional source for comment and review.

CONCEPT GENERATION

Developing concepts to create solutions that will satisfy a given set of requirements is often the most difficult step of the system development cycle. The
critical challenge is taking the various pieces of information—from user needs to system specification—that form the bulk of the requirements development step and envisioning a new approach that will satisfy these needs. The main obstacle is that any design and development team is limited by its own understanding of the problem and solution spaces. The concept generation step focuses on enabling teams to move outside of their collective decision loops and consider both spaces from new perspectives. One approach for arriving at new solutions is to seed new concepts by understanding current solutions and identifying their inefficiencies (Scanlan, 2007). An existing solution will be made up of different components—be they individual technologies, such as a laptop's hard drive, battery, and so forth, or individual methodologies, such as during action or after action feedback. Each of these components has a set of operating parameters, which combine to create that system's capabilities. Changes to a component will impact system efficiency. Therefore each of these components can be analyzed in terms of the specified requirements and assessed in terms of their pros and cons. Some general approaches for moving through the concept generation phase include the following (after MacLean, 2006; Durfee, 1999):

• Brainstorming: developing a variety of concepts that provide the seed material for follow-on discussions,
• Scenario Generation and Role-Playing: exploring different concepts in terms of how they might be applied by the user community to solve the identified gap and stepping through the design and development of actual tools derived from these concepts, and
• Prototyping: developing simple, low cost "models" (for example, storyboards) to quickly demonstrate how the concepts being considered will impact, and be impacted by, the user.
When done properly, concept generation will not only lead to new solutions that satisfy identified requirements, but it will also provide testable hypotheses that can be used later, during the validation step, in order to demonstrate the benefits of using the newly developed system.
METRICS

The requirements step provides the information that will be used by the design and development teams as they construct their solutions; using this information, the concept generation step provides a way for developers to envision possible solutions. As system development proceeds, it is important to ensure that progress may be continually assessed and that the answer to "when should you declare success?" may be clearly obtained throughout the cycle, not just at the time of system delivery. In order to do this, we must develop metrics that not only quantify end-user performance on the system but also support the system designers and developers who are creating the system, and the decision makers who represent the broader interests of the organization (or community) for which the system is being developed. These metrics will be developed based to a large
extent on the data captured during the requirements step and bounded by the concept generation step. Metrics provide the means for discussing system merits. Done well, a comprehensive set of metrics will provide system designers and developers with a continuous snapshot of system performance to weigh progress against requirements; they will also help system users determine system effectiveness, enabling them to provide constructive feedback to the design and development team, and they will help decision makers (such as those who must purchase the final product) establish long-term returns on investment. Kirkpatrick (1998) defined four levels of metrics that focus on evaluating training systems across these different levels and that are captured over different time horizons:

1. Reaction: immediate responses to the system, in terms of usability, acceptance, and satisfaction (Nielsen, 1993; Stanney, Mollaghasemi, Reeves, Breaux, & Graeber, 2003); best used for obtaining quick snapshots of system progress and alignment with user specified goals.
2. Learning: near-term responses of individual users following exposure to the system; best used as part of a near-term effectiveness evaluation paradigm to ensure that the identified performance gap is bridged at the level of the individual user in response to his or her interaction with the system (Lathan, Tracey, Sebrechts, Clawson, & Higgins, 2002).
3. Behavior: midterm responses of the community in response to large-scale interaction of individual members with the system; best used as a midterm assessment of the system impact across the user community.
4. Results: long-term impact to the larger organization as a result of sustained, community-wide exposure to the system; best used as part of a return on investment assessment.
Interleaved with these levels are different time requirements for capturing these metrics. Some of these levels may be satisfied over relatively short time spans, while others may require longer time investments in order to be fully populated. As we will see later, the notion of time plays an important role in determining return on investment—how long one is willing (or able) to wait to populate a set of metrics will directly impact the usefulness of a given set of metrics.

Building Metrics: Different Types

Regardless of the role that a particular metric may be used for (quick assessment of system usability or longer term determination of system impact), all metrics must be built by first determining the kinds of information they must represent (Department of Energy, 2005):

• Process or outcome: Process metrics capture behaviors that evolve during an event, while outcome metrics capture behaviors following the event (Salas, Burke, Fowlkes, & Priest, 2004);
• Measures of performance (MOPs) or measures of effectiveness (MOEs; Sproles, 2000): MOPs measure system performance, providing the engineer’s view of the system, while MOEs measure user performance, providing the user’s view of the system.
Figure 10.1 illustrates the interrelationship between these types of metrics. Both the system and the user views may be captured using either process or outcome measures. The time needed to populate each type of metric becomes critical when considering the cost-benefit trade-offs of using a given metric to satisfy a particular level of metric. The longer it takes to capture the metric, the greater the cost and the less likely it is to be used in the final assessment.

Figure 10.1. Interrelationship between Different Types of Metrics
Building Metrics: The Process

Developing metrics is as much an art as a science. A general process for creating any metric will build on the earlier development cycle steps and incorporate new ones, including the following:

1. Metric Qualification: Using information gathered during both the concept generation and requirements definition steps to identify intended effects. This is oftentimes best done using a prose/descriptive format rather than a mathematical or algorithmic one (for example, how accurate will a user be in performing a given task?).
2. Metric Quantification: Using the prose/descriptive format to develop pseudorepresentations (for example, the user is able to fire 20 rounds in 5 s with 70 percent accuracy).
3. Metric Development: Translating (1) and (2) into an actual metric. This may be done by crafting pen/paper assessments, tapping into the actual system's data outputs, or
by capturing the user’s output using an external system (for example, a motion capture system to observe user behavior).
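To make this three step progression concrete, consider the marksmanship pseudorepresentation above. The Python sketch below shows what step 3 might look like when tapping a system's data outputs; the log format and function name are invented for illustration.

```python
def marksmanship_criterion(shot_log: list, window_s: float = 5.0,
                           required_rounds: int = 20,
                           required_accuracy: float = 0.70) -> bool:
    """shot_log entries look like {"t": seconds, "hit": bool}. Returns True
    if, within any window_s-second window, the user fired at least
    required_rounds with at least required_accuracy hits."""
    shots = sorted(shot_log, key=lambda s: s["t"])
    for anchor in shots:
        window = [s for s in shots
                  if anchor["t"] <= s["t"] <= anchor["t"] + window_s]
        if len(window) >= required_rounds:
            if sum(s["hit"] for s in window) / len(window) >= required_accuracy:
                return True
    return False
```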
When building metrics, it is important to ensure that any final measure has the following attributes (adapted from Rosenberg & Hyatt, 1997):

• Specific: Measures must provide clear and focused indications of an identified behavior.
• Assessable: Measures must be statistically analyzable.
• Attainable: Measures must characterize properties that are neither too hard nor too easy to capture and that are expected to vary with different contexts and conditions.
• Realistic: Measures must clearly link back to identified desired effects.
• Timely: Measures must be recordable within a time frame that allows the resultant information to inform the desired level of user.
RETURN ON INVESTMENT

Return on investment (ROI) calculations are meant to answer a single question: Is the system development effort worthwhile? Recalling the four different levels of metrics, the answer to this question may be calculated in different ways, using different metrics and providing different answers:

• At the reaction and learning levels, ROI may tell developers that a particular system configuration is not providing the desired level of response, suggesting that an alternative set of components may be better.
• At the behavior level, ROI may provide community leaders with deeper insight into the relationship between a given system and the desired impact on the end-user community's long-term performance.
• At the results level, ROI may enable decision and policy makers to determine if the investment in the system is worth the overall cost.
It is important to understand who the end users of the calculation will be and what information they will need from it. Reaction level metrics provide a wealth of information pertaining to specific aspects of the system as it is being developed (for example, usability assessments, demographic information, and so forth)—a lot of information pertaining to a little bit of the larger system and how it will be used. This information will be used primarily by system developers, who must have ready access to such minutiae in order to fine-tune the system. On the other hand, results level metrics typically provide a summative assessment of the system as a whole, long after the system has been developed—a little bit of information that will be applied to decisions concerning the entire system. This information will be used primarily by key decision makers as they make determinations on whether to continue using the system, to request large-scale upgrades or modifications, or to change how the system is used within the organization.
This division of metrics leads to a trade-off in how ROI assessments are used, which can be qualified using a notional cost of waiting function. ROIs based on reaction and learning level metrics can be performed in a relatively short amount of time; consequently, the cost of waiting to capture these metrics is small. On the other hand, ROIs based on behavior or results level metrics, which often take months if not years to capture, have a much larger associated cost of waiting—by the time these metrics are finally quantified, it is likely too late to make any significant system design changes. Consequently, a major challenge in performing ROIs is finding the right balance between the level of metric and the intended use of the ROI (see Figure 10.2). Depending on one's perspective, the cost of waiting to obtain a given level of metric will vary with time. Here, the cost to wait is evaluated from the perspective of an organizational decision maker, for whom the results level is the most critical metric. Populating this metric requires a significant investment in time, which may introduce other trade-offs.

Figure 10.2. Cost of Waiting Function

VALIDATION

How a system is validated depends to a large extent on what purpose it was built to serve. Because validation may oftentimes take the form of an experiment, in which the utility of a given system is compared to the utility of other systems across a set of metrics, it is crucial to develop a testable hypothesis—or
hypotheses—relating to the anticipated impact of the system. For example, if one were to propose building a training system meant to improve situation awareness, one hypothesis for validating this system would be to compare performance using this system against performance using the traditional approach(es). As mentioned earlier, these hypotheses are meant to be formulated during the concept generation step. In terms of building VE simulations, a generic hypothesis would be that training on a given system provides a level of improvement that translates to enhanced performance of the real world task being simulated, above that which would be obtained using other types of training interventions (Lathan et al., 2002). One approach to validating this hypothesis is to show that training in a VE produces enhanced performance on real world tasks, using the transfer of training (ToT) method (Lathan et al., 2002). This method estimates simulator effectiveness as a comparison of two groups of trainees: an experimental group that receives simulator training and a control group that receives all of its training in the real world. Performance between the two groups is then compared along a set of metrics (Boldovici, 1987; Cohn, Helmick, Meyers, & Burns, 2000). The specific nature of a given ToT study depends on the results of the earlier system development cycle steps. Requirements help define who may serve as the subject pool, and how; metrics provide insight into the expected differences in behaviors between individuals who receive VE training and those who do not when performing operational tasks in the real world environment; and even return on investment provides a pathway for making use of the final results.

EXAMPLE

To see how these different steps may be linked, let us now turn to a real world example: developing a VE based tool that will train small teams of infantry in the art of room clearing. In keeping with the flow of this chapter, let us assume that we enter the system development cycle at the requirements step given the following information:

1. What must the system do?
   • Allow users to immediately immerse themselves without prior training
   • Provide an environment for learning new room clearing methods and practicing already learned ones
   • Permit use by infantrymen with a wide range of knowledge, skills, and abilities
2. How will the system be built?
   • Multimodal sensory stimulation (spatialized audio, haptic feedback, and wide field tracking)
   • Portable and low energy consumption
3. Why build this particular system?
   • Live training is becoming increasingly costly and dangerous
   • Training time and space are becoming more difficult to find
4. When should success be declared?
   • Engineer's View: Achieve desired system specifications
   • User's View: Long-term retention and recall of skills; reduced time to train; increased throughput in training pipeline.
Our challenge then is to develop new concepts for achieving these requirements, build the right metrics to help assess progress, and conduct a return on investment assessment and a validation study.

Concepts

Our challenge is to build a more effective training tool using VE technologies: a tool that is easy to use, low cost, and performance enhancing. These criteria allow us to develop a set of hypotheses, two of which are highlighted below:

1. Hypothesis 1: Our VE will reduce the total cost of training.
2. Hypothesis 2: System fidelity has a significant impact on how well the VE will deliver the desired training.
Using our brainstorming tools, we start by decomposing existing VE systems into their components and capabilities. These include tracking systems, display technologies, and human computer interaction technologies. Considering how these systems may be improved upon provides an inroad to achieving our requirements. Figure 10.3 illustrates one way of conceptualizing the system components. In line with our hypotheses, we consider both high end and low end components, which may either be pulled directly from the commercial marketplace (commercial off-the-shelf products) or require significant development efforts (total new product). After role-playing and paper prototyping different combinations of solutions—and referencing the knowledge models derived from interviews with the end user—we propose a middle-of-the-road solution that trades high end display and tracking systems for low end ones in order to develop an entirely novel type of natural human computer interface.

Figure 10.3. Results of the concept generation step. (Top) Possible solutions for achieving the desired requirements. (Bottom) Proposed solutions for achieving the desired requirements based in part on guiding hypotheses.

Metrics

For each of the four levels of metrics—reaction, learning, behavior, and results—a range of specific measures could be developed. Table 10.1 presents some of these in terms of both MOPs and MOEs and process and outcome. At the most basic level, reaction, consider metrics that characterize the display system. Process metrics at this level include how quickly the scene is updated (times per second; MOP because it refers to the system properties) and how much information the user is able to continually glean from this display (field of view; MOE because it refers to the user's experience). Outcome measures include how well the system holds up under sustained use (ruggedness; MOP) and how well the
user holds up under continued exposure to an artificial visual stimulus (side effects; MOE). Scaling up to the next level, learning, consider metrics that capture how the scene unfolds as the system provides training. Process metrics include how well the system moves the user from scene to scene without jarring and abrupt transitions (scene transitions; MOP) and how effectively the user exploits the training being provided (maintaining situational awareness; MOE).
Table 10.1. Examples of MOPs and MOEs for a Virtual Environment Training System, Assessed at Each of the Four Levels of Training Effectiveness Evaluation, in Terms of Processes and Outcomes

Reaction (Factor: Display)
  Process: update rates (MOP); field of view (MOE)
  Outcome: rugged (MOP); side effects (MOE)

Learning (Factor: Scenario)
  Process: scene transitions (MOP); maintain situational awareness (MOE)
  Outcome: event tagging (MOP); combined scores (MOE)

Behavior (Factor: Curriculum)
  Process: technology integration (MOP); smoother team interactions (MOE)
  Outcome: streamlined training (MOP); faster training completion times (MOE)

Results
  Reduced cost of overall training (MOP); increased survivability, faster and larger throughput (MOE)
Outcome metrics include how well the system was able to capture and tag key events occurring within training scenarios (event tagging; MOP) and how much better the user was at performing a set of tasks following completion of the training (combined scores; MOE). Continuing to the next level of metric, behavior, consider how the system impacts multiple users over the long term. Process metrics at this level include evaluating the degree to which these new training tools are integrated into the organization's larger training plan (technology integration; MOP) and the degree to which making this training more readily available impacts how individuals work together (smoother team interactions; MOE). Outcome metrics may focus on the ease with which this training is delivered (streamlined training; MOP) and how delivering training using VE technology reduces the amount of time the organization devotes to delivering training (faster training completion times; MOE). Finally, at the highest level of metric, results, the distinction between process and outcome becomes blurred, with corresponding metrics emphasizing such long-term benefits as reduced cost of overall training (MOP) and increased survivability (of users when in actual combat; MOE).

Return on Investment

Our initial hypotheses can be combined into a single statement: using higher end VEs reduces training cost compared to other systems or to no training at all. Specifically, we suggest that a high end VE system will train users to an acceptable level of performance faster than other approaches. Consider a marksmanship task, in which a passing score is 80 percent of total rounds fired landing within a particular region of a target. Assume it is determined that
without any training, students require eight hours of practice on the actual task to achieve this performance level. Further assume that when students are given access to a high end VE for four hours, it is determined that they subsequently require three hours of training on the real task to achieve the desired performance level, a training time savings of five hours. In contrast, when students use the low end VE for four hours, it is determined that they subsequently require four hours of training on the real task to achieve the desired performance level, a training time savings of four hours. To assess the effectiveness of using VEs in this application, we use the training effectiveness ratio (TER; Wickens & Hollands, 2000). The TER is defined as

TER = (real task training time required without the VE − real task training time required following VE training) / (time spent training in the VE)
A TER less than 1 indicates the VE is less effective than the current approach, a TER greater than 1 indicates the VE is more effective than the current approach, and a TER equal to 1 indicates no difference in effectiveness. In this example,

TER (high end VE) = (8 − 3) / 4 = 1.25,

while

TER (low end VE) = (8 − 4) / 4 = 1.0.
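The same calculation is easily scripted. The short Python sketch below simply transcribes the ratio above, using the chapter's example values:

```python
def training_effectiveness_ratio(baseline_hours: float,
                                 real_task_hours_after_ve: float,
                                 ve_hours: float) -> float:
    """TER = real task training time saved / time spent training in the VE."""
    return (baseline_hours - real_task_hours_after_ve) / ve_hours

print(training_effectiveness_ratio(8, 3, 4))  # high end VE -> 1.25
print(training_effectiveness_ratio(8, 4, 4))  # low end VE  -> 1.0
```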
Here the TER for the low end VE is 1, indicating essentially no benefit in terms of time (with the VE, students still devote a total of eight hours to training, divided equally between practicing in the VE and on the actual task), while the TER for the high end VE is 1.25, indicating some benefit in terms of time saved (with the VE, students devote only seven hours to training). This is only a partial assessment of the effectiveness of either VE system. When metrics from other levels, such as behavior and results, are considered, the benefits (or costs) of using the VE system become more evident. Such factors as the cost of range time (that is, the number of personnel required to ensure safety, the cost of rounds, wear and tear on the actual weapons, and so forth), the availability of time on the range (sure to be highly restrictive) compared to the availability of time on the VE (depending on how rugged the system is, potentially 24/7), and the cost of VE development, production, and tech support may provide additional insights into which VE solution, if any, is ideal.

Validation

The data used to populate the ROI calculations must come from experimentation. Here, in order to capture the depth of data needed to create the trade-off curves, the ideal experiment design would proceed in several phases. Phase 1 would involve iterative usability and human factors assessments. These would
be embedded in the system development plan to occur around the points at which key system functionalities are planned for delivery. The outcome of these assessments would be rank-ordered recommendations for redesign that the developers could then incorporate into their next system build. Phase 2 would include the ToT types of studies. As Roscoe (1980) suggests, in order to develop the actual curves, the design would require exposing multiple groups of trainees to incrementally longer training regimens in the VE and subsequently assessing their performance in a suitable transfer environment (here, a training regimen may be defined either in terms of number of trials or trial length; the distinction depends on how the actual training is sequenced). Phase 3 would involve longer-term tracking of the impact of the training system on the organization.

SUMMARY

There is no simple recipe for ensuring successful development of a VE training system, and there are no simple equations for calculating success. Training systems that produce enhanced performance in laboratory settings may fail to yield similar results in the field, or they may fail to be adopted by the community for reasons having nothing to do with their training effectiveness and everything to do with their failure to become successfully integrated into the community's concept of operations. Nevertheless, certain steps may be identified that, when followed, will increase the likelihood of a system being effective in achieving its performance enhancement goals and successful in becoming part of a user community's training repertoire. The steps discussed in this chapter provide one way of realizing this goal.

REFERENCES

Boldovici, J. A. (1987). Measuring transfer in military settings. In S. M. Cormier (Ed.), Transfer of learning: Contemporary research and learning (pp. 239–260). San Diego, CA: Academic Press, Inc.
Chipman, S. F., Schraagen, J. M., & Shalin, V. L. (2000). Introduction to cognitive task analysis. In J. Maarten Schraagen, S. F. Chipman, & V. L. Shalin (Eds.), Cognitive task analysis (pp. 3–23). Mahwah, NJ: Lawrence Erlbaum.
Cohn, J. V., Helmick, J., Meyers, C., & Burns, J. (2000, November). Training-transfer guidelines for virtual environments (VE). Paper presented at the 22nd Annual Interservice/Industry Training, Simulation and Education Conference, Orlando, FL.
Department of Energy. (2005, October). How to measure performance: A handbook of techniques and tools. Retrieved June 6, 2007, from http://www.orau.gov/pbm/handbook/
Durfee, W. (1999). Concept generation. Retrieved June 3, 2007, from http://www.me.umn.edu/courses/me4054/lecnotes/generate.html
Kirkpatrick, D. L. (1998). Evaluating training programs. San Francisco: Berrett-Koehler Publishers.
Kirwan, B., & Ainsworth, L. K. (1992). A guide to task analysis. London: Taylor & Francis.
Lathan, C. E., Tracey, M. R., Sebrechts, M. M., Clawson, D. M., & Higgins, G. A. (2002). Using virtual environments as training simulators: Measuring transfer. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 403–414). Mahwah, NJ: Lawrence Erlbaum.
MacLean, K. (2006). Concept generation and prototyping. Retrieved June 3, 2007, from http://www.ugrad.cs.ubc.ca/~cs344/resources/supp-concGen&Prototype.html
Nielsen, J. (1993). Usability engineering. Boston: Academic Press.
Roscoe, S. N. (1980). Aviation psychology. Ames: Iowa State University Press.
Rosenberg, L., & Hyatt, L. (1997). Developing a successful metrics program. Retrieved June 7, 2007, from http://satc.gsfc.nasa.gov/support/ICSE_NOV97/iasted.htm
Ross, D. T., & Schoman, K. E. (1977). Structured analysis for requirements definition. IEEE Transactions on Software Engineering, SE-3(1), 6–15.
Salas, E., Burke, C. S., Fowlkes, J. E., & Priest, H. A. (2004). On measuring teamwork skill. In J. C. Thomas (Ed.), Handbook of psychological assessment: Industrial and organizational assessment (pp. 428–442). Hoboken, NJ: John Wiley & Sons.
Scanlan, J. (2007). Concept generation. Retrieved June 1, 2007, from http://www.soton.ac.uk/~jps7/Lecture%20notes/Lecture%202%20Concept%20generation.pdf
Sproles, N. (2000). Coming to grips with measures of effectiveness. Systems Engineering: The Journal of the International Council on Systems Engineering, 3(1), 50–58.
Stanney, K. M., Mollaghasemi, M., Reeves, L., Breaux, R., & Graeber, D. A. (2003). Usability engineering of virtual environments (VEs): Identifying multiple criteria that drive effective VE system design. International Journal of Human-Computer Studies, 58(4), 447–481.
Wickens, C. D., & Hollands, J. G. (2000). Engineering psychology and human performance (3rd ed.). Upper Saddle River, NJ: Prentice Hall.
Chapter 11
LEARNING TO BECOME A CREATIVE SYSTEMS ANALYST

Lemai Nguyen and Jacob Cybulski

The important role of creativity has increasingly been recognized in requirements engineering (RE), an early stage in the lifecycle of systems development. Although creativity plays an important role in the discovery, exploration, and structuring of the conceptual space of the requirements problem, it has not yet been accepted as an essential ingredient of teaching and learning in RE. This chapter describes a novel approach to learning in RE that synthesizes different dimensions of constructivist learning and creativity education theory to support creative problem exploration and solving in RE. This learning approach will be illustrated through a training environment consisting of face-to-face classroom and online activities, as well as computer based simulation.

LEARNING CREATIVE REQUIREMENTS ANALYSIS

The development and introduction of a new information system to a business or military organization is an opportunity for innovating or reinventing that organization's practice, processes, or products in order to leverage their benefits and create value. Requirements engineering is an early process in the systems development lifecycle where innovation plays an especially important role. In general, RE involves the creation of a vision for the future system through the discovery, analysis, modeling, and validation of user requirements. Specifically in the context of this book, RE involves the elicitation, modeling and analysis, specification, and verification and validation (see Volume 1, Section 2 Perspective) of training system requirements. During this process, the systems analyst (requirements engineer) works with various systems development teams and stakeholders, often including management, business people and users (for example, educators and learners of the training system), technology vendors, and possibly the organization's business partners and/or customers. The requirements engineering topic is covered extensively in the literature in terms of process models, requirements elicitation and modeling techniques, support tools, approaches to validating and managing requirements, and documentation and templates. Interested readers are directed to various textbooks
Learning to Become a Creative Systems Analyst
209
(for example, Robertson & Robertson, 2005; Dennis, Wixom, & Tegarden, 2004; Kotonya & Sommerville, 1998; Sommerville & Sawyer, 1997) or research reviews (for example, Nuseibeh & Easterbrook, 2000; Gervasi, Kamsties, Regnell, & Achour-Salinesi, 2004; Opdahl, Dubois, & Pohl, 2004). Tremendous effort has focused on describing and supporting the systems analyst in the construction of a requirements specification that reflects the real world problem situation through understanding and solving the problem as perceived by the user. This chapter, however, focuses on an alternative view of requirements engineering—creativity—and proposes an approach to training creative systems analysts. Recently, it has been argued that to be effective the systems analyst should also be an inventor (Robertson, 2005), and it is essential that the RE process itself is creative as well (Nguyen & Swatman, 2006). These two emerging arguments open a challenge to the RE community: how best to train and learn to be a creative systems analyst. This chapter addresses this challenge by describing the creativity aspects of RE, discusses advantages and limitations of current education approaches in RE, and proposes a new approach to learning to become a creative systems analyst.
TEACHING AND LEARNING IN REQUIREMENTS ENGINEERING

Overview of Current Teaching and Learning Approaches

Overall, there are three major approaches to learning RE: (1) taking an industry-intensive course (often ranging from half a day to several days), (2) taking requirements engineering as one of the subjects in a tertiary course (graduate diploma, graduate, or postgraduate degree), or (3) workplace learning (often working alongside expert systems analysts). Each of these learning approaches has advantages and disadvantages, and an analysis of these advantages and disadvantages supports the need to incorporate and promote creative thinking within each approach.

Learning RE through an Industry-Intensive Course

Industry-intensive courses (or workshops) are often provided by professional associations and consulting or training companies. They are typically instructor led and can sometimes be delivered via a computerized learning system. These courses aim to provide formal knowledge (about processes, techniques, notations, and tools) over a short period of time, with small illustrative exercises that allow learners to apply that knowledge and acquire some practical skills. Limitations of such courses are the unrealistic setting of the exercises and the condensed delivery of rich material (RE knowledge). Due to these limitations, the learner often faces a gap between the knowledge acquired from the course and its application in practice, or a mismatch between "approved" practice and "actual" practice (Nguyen, Armarego, & Swatman, 2005).
Learning RE through a Subject Included in a Tertiary Course

Students enrolled in degree programs such as Information Systems (IS), Software Engineering, or Computer Science often learn RE in a course such as Systems Analysis and Design or in a dedicated Requirements Engineering course. Such subjects are commonly offered in a single semester with a wide range of classroom, as well as self-paced, learning activities. Typically, lectures and tutorials (or laboratory work) allow the teacher to transfer formal knowledge (processes, techniques, notations, and sometimes tools) and allow students to apply the knowledge received through illustrative exercises or discussion questions. Assignments are often used to enable students to learn independently by drawing on and applying relevant knowledge to a given problem. Common advantages of the tertiary learning approach include the acquisition of rich knowledge and opportunities to work on practical exercises repeatedly during a semester. Many problem based assignments are conducted in groups; therefore, they allow learners to interact with each other to discuss and share their learning. Common limitations of this approach include the controlled setting of class activities (especially time), unrealistic practical exercises, and assignments with a predefined, teacher-designed problem space (see, for example, Minor & Armarego, 2004).

To overcome the lack of realistic practical exercises and assignments, many universities provide learners with a project based or an industry placement course, often scheduled near the completion of their qualifications. In such courses, learners are engaged in small, self-managed projects with an assigned client or work with a team of professionals at their workplaces. While such project based or industry placement courses support experiential learning, they are also a major source of problems, including inadequate provision of teaching and technical resources, elevated teaching costs, a lack of available industry partners/projects, and uncertainties in the workplace environment, which may interfere with the curriculum program and course syllabus set by the teacher and/or the university.
Learning RE at the Workplace (On the Job Training)

Many practitioners learn on the job by working alongside more capable experts. Advantages of this approach include the learner's participation in realistic cases, adoption of real roles and responsibilities, and acquisition of experience in dealing with real clients in real organizational settings; all of these provide a rich experiential learning environment that enables an authentic vocational knowledge acquisition process. However, limitations of this approach include a lack of access to formal (and appropriate) knowledge, the lack of a pedagogical process taking the learner from simple to complex tasks, and a reluctance of industry participants to share their knowledge (Billett, 1995). Due to many business impediments and management's reluctance to accept the high risks associated with innovative ideas (Cybulski, Nguyen, Thanasankit, & Lichtenstein, 2003), the workplace cannot be treated as a safe learning "playground" in which a flexible and constraint-free environment would allow the learner to try out different (and potentially dangerous) strategies when learning different concepts and techniques. The characteristics of the three learning approaches are summarized in Table 11.1.

Table 11.1. Characteristics of Learning Approaches

Industry intensive courses
• Instructor led, classroom activities, formal structured knowledge, unrealistic setting, and condensed materials within a short time

Formal tertiary courses
• Instructor led, a range of activities from classroom to project based, formal structured knowledge, and rich knowledge on a semester basis
• Project based learning attempts to address the unrealistic setting at the expense of providers' resources

Workplace learning
• Self-learning, real setting, and rich experiences gained
• Lack of formal structured knowledge, lack of a pedagogical approach, and high risk in "experimenting" with ideas

Discussion

A range of approaches to learning RE have been developed and adopted in professional training and higher education. Each comes with its own benefits and limitations. Formal education (course based learning) aims primarily at the acquisition of RE processes (analysis and modeling), techniques, notations, requirements management, and other general abilities (such as communication and team skills); however, it lacks exposure to realistic and collaborative industry projects (Minor & Armarego, 2004). A literature survey by Dallman (2004) noted a lack of learning support for creative thinking, cognitive flexibility, and metacognitive learning strategies in current formal education. Workplace learning, while providing realistic projects, lacks access to formal knowledge and pedagogical processes (Billett, 1995). At the same time, practitioners, who are well positioned to effectively transfer their professional experience to RE learners, are not well informed of creativity techniques that may apply to their RE practice (Maiden & Robertson, 2005).

The Creativity Problem-Based Learning framework (Armarego, 2004) was developed to integrate cognitive flexibility, metacognitive learning strategies, and constructivist learning elements; to allow the learner to learn in a situated experiential environment; and to provide cognitive apprenticeship through work with an expert coach. Armarego's approach provides a rich and flexible learning environment that enables authentic knowledge acquisition and encourages creative thinking. While benefits have been reported (Armarego, 2004), it is as yet unclear how the framework supports the inclusion of creativity theory. Creativity processes and techniques to generate ideas and solutions, to extend the conceptual space, and to evaluate the creative outcome can be included within such a framework in an informed and structured way.
CREATIVITY IN RE

There seem to be two distinct views of the RE process within the RE community. The first view, held by many authors, considers problem solving in RE as a systematic, structured, and evolutionary process, during which the problem is gradually explored, refined, and structured into the requirements model. Various methods have been proposed to guide the systems analyst to decompose the user's problem and compose the requirements model using different decomposition approaches, modeling techniques, and notations (for example, see Jackson, 2005; Kotonya & Sommerville, 1998; Dennis, Wixom, & Tegarden, 2004). The second—and newer—view of the requirements process emerged from action research and case studies (Nguyen & Swatman, 2003; Nguyen, Carroll, & Swatman, 2000; Nguyen, Swatman, & Shanks, 1999), which reveal episodes of insight-driven reconceptualization and restructuring of the requirements model during the generally incremental development of the model. These restructuring episodes can be characterized as "Aha!" moments during which the systems analyst unexpectedly sees a new perspective on the problem and, as a result, restructures the requirements model significantly. These studies confirm the Gestalt psychology theory of insight and restructuring in ill-structured problem understanding and solving (Mayer, 1992; Ohlsson, 1984). Furthermore, this newer view of the requirements process emphasizes that the problem in RE is not given (not there waiting to be elicited) but instead emerges as the systems analyst enters the situation, learns, explores, and discovers different problem areas while interacting with the situation and various stakeholders. Hence, the RE process itself can be seen as a constructivist process.

Nguyen and Shanks (2006b) noted two analogous views of the design process held within the design studies community—where the design process is seen either as a rational problem solving process (Simon, 1992) or as a constructivist process (Schön, 1996). These two views represent two forces of problem solving: the enforcement of a structured process to avoid chaos and errors, as opposed to the relaxation of constraints in dealing with the emergent problem space by taking advantage of the opportunistic cognitive behaviors and heuristics of participating professionals (Nguyen & Shanks, 2006b). Nguyen and Shanks further suggested that these two views are complementary and need to be integrated to support a collaborative process consisting of cycles of structured building and opportunistic restructuring of the requirements model.

Robertson (2005) challenged RE practice to recognize the importance of discovery and invention of new ideas in the requirements acquisition process, rather than simply relying on passive elicitation and analysis of what users say they need. This challenge spurred a review of the role of the systems analyst during the elicitation process, which has now been described as the requirements discovery process. A series of creativity workshops in RE was conducted by Maiden and his colleagues at City University, London, United Kingdom (Maiden & Robertson, 2005; Maiden, Manning, Robertson, & Greenwood, 2004; Maiden & Gizikis, 2001), in which they demonstrated how various creativity techniques, such as brainstorming, domain mapping, analogy reasoning, and constraint
removal, to name a few, can be incorporated within structured RE processes to discover and explore ideas and requirements. Other creativity techniques have also been suggested for incorporation within requirements elicitation by other researchers (Mich, Anesi, & Berry, 2004; Schmid, 2006). Nguyen and Shanks (2006a) reviewed different characteristics of the creative processes in the creativity literature and design studies, related them to the RE process, and called for an integrated process and tool environment to support the systems analyst in adopting creative techniques and tools capable of exploring and structuring the problem space in RE. From a combination of collaborative and cognitive perspectives, a group of researchers at the University of South Australia is currently investigating and developing an ICT (information and communication technology) enabled environment to support creative team problem solving on the theoretical foundation of distributed cognition (Blackburn, Swatman, & Vernik, 2006).

Another challenge in supporting creativity in RE in an organizational setting was discussed at length by RE practitioners and business and IT managers participating in a focus group (Cybulski et al., 2003): management practice and organizational culture strongly influence not only the development, but also the appraisal and adoption, of creative IT-enabled solutions to business problems. According to Nguyen and Shanks's (2006a) creativity framework for RE, novelty, value, and surprisingness can be used as three characteristics by which to recognize and evaluate the creative outcome in RE. Novelty refers to the extent to which the new system differs from existing systems. Value refers to the usefulness, correctness, and fit (appropriateness) of the system in the context of use. Surprisingness refers to the unexpected features of the system. Research is currently under way to define ways to assess these characteristics.
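Because methods for assessing these characteristics are still being defined, the following is no more than a toy illustration of how the three characteristics might be combined into a single index; the 1-to-5 rating scale and the weights are invented for this sketch and are not part of Nguyen and Shanks's framework.

```python
# Purely illustrative: combine hypothetical 1-5 ratings of novelty, value,
# and surprisingness into one weighted index. Scale and weights are invented.

def creative_outcome_score(novelty: int, value: int, surprisingness: int,
                           weights=(0.4, 0.4, 0.2)) -> float:
    """Combine 1-5 ratings of the three characteristics into one index."""
    ratings = (novelty, value, surprisingness)
    if not all(1 <= r <= 5 for r in ratings):
        raise ValueError("ratings are expected on a 1-5 scale")
    return sum(w * r for w, r in zip(weights, ratings))


# Example: a requirements model rated novel (4), useful and well fitted to
# its context of use (5), and moderately surprising (3).
print(creative_outcome_score(4, 5, 3))  # -> 4.2
```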
To support creativity in RE, it is also important to appreciate changes to norms that have traditionally been accepted within, practiced by, and grounded in the organizational culture (Regev, Gause, & Wegmann, 2006). In a similar vein, the Creativity in Requirements Engineering framework classifies and describes various individual and organizational factors that influence creativity by systems analysts (Cybulski et al., 2003; Dallman, Nguyen, Lamp, & Cybulski, 2005).

Overall, creativity has recently received increasing interest within the RE research community. Creativity techniques and tools can be integrated within various requirements engineering approaches to inform and support systems analysts in their collaborative effort to invent and develop requirements for new information systems, including virtual environments for training and education in the military. While such integrated approaches promise potential benefits, the primary focus of this chapter is the creative systems analyst as an expected outcome of a training environment. Creativity plays an important role for those systems analysts who want to add novelty and value while exploring and constructing the problem space and subsequently solving the problem. However, fostering creativity in RE practice and teaching creative methods in RE education have so far received very little attention (Dallman, 2004; Armarego,
2004; Nguyen et al., 2005). As a result, experienced practitioners and RE learners are not well informed of how to practice RE creatively.

LEARNING TO BECOME A CREATIVE SYSTEMS ANALYST

Learning Process

Throughout the RE literature, there has been common agreement that the RE process can be characterized as application domain specific, technical, and contextual, that is, embedded within a specific organizational and social setting (Sutcliffe & Maiden, 1998; Jackson, 2005; Coughlan & Macredie, 2002; Checkland & Scholes, 1999; Goguen, 1997). Further, RE can be seen as a process of solving "wicked" problems that involve technical issues, social complexity, and dynamics (Conklin, 2006). The problem solving activity in RE requires both problem understanding and problem solving (Visser, 1992). The problem solver continually interprets the problem situation, constructs a knowledge representation of the problem, and forms and evaluates possible solutions. This intertwining of problem understanding and solving is reflected in the incremental structuring and occasional restructuring of the requirements model. There are important implications in viewing the systems analyst as a learner:

• The emergence of the problem situation suggests that RE itself is a learning process, more specifically, a constructivist learning process during which the learner constructs his or her knowledge by structuring and reflecting upon the emergent problem (Gero, 1996; Schön, 1996; Armarego, 2004; Robillard, 2005).

• Creativity plays an important role in the exploration, construction, and expansion of the problem space. Indeed, creativity is defined as an internal process of exploration and transformation of the conceptual space in an individual mind (Boden, 1991, 1998).

• The problem in RE is of a technical as well as social nature (Conklin, 2006). Therefore, the systems analyst's learning process takes place in a domain specific, social, and collaborative context.
The above implications led us to believe that the fundamental objectives of RE education must also be reevaluated, and to ground the RE learning approach in a synthesis of constructivist learning and creativity education theories. Based on Piaget's (1950) theory, constructivist learning refers to the authentic and personal building up of knowledge. This knowledge building process occurs in the individual learner's mind through two mechanisms: assimilation and accommodation. Assimilation occurs when the learner interprets and incorporates new learning into an existing conceptual framework representing his or her knowledge of a topic area. Accommodation occurs when the learner cannot fit the new learning into his or her existing framework; as a result, he or she reframes (restructures) the existing conceptual framework. These two mechanisms are consistent with the structuring and restructuring activities in RE (Nguyen & Swatman, 2006). Vygotsky (1978) stresses the important role of a
combination of collaboration among learners (through which the learner receives feedback and coaching) and practical exercises (through which the learner constructs knowledge and gains skills). These underpinning theories of constructivist learning have been synthesized into the three dimensions of endogenous, exogenous, and dialectic constructivism (Moshman, 1982):

• Endogenous dimension: The learner learns through an individual construction of knowledge. Accommodation and assimilation are the two mechanisms that enable the endogenous construction of knowledge. The teacher can play a facilitator role, but the learner takes the more active role and assumes ownership of his or her learning and knowledge building.

• Exogenous dimension: The learner learns from a combination of formal instruction and realistic, relevant exercises, refining knowledge on the basis of the instruction and feedback received from the teacher while undertaking practical exercises.

• Dialectic dimension: The learner learns through collaboration and interaction with teachers (experts) and peers in realistic experiences. The scaffolding provided by the more capable collaborators is especially important.
Dalgarno (2005) developed a three-dimensional learning environment that incorporated elements from these three dimensions of constructivist learning. His successful application of this learning environment in teaching chemistry encouraged us to pursue a rich RE learning environment in which the learner is supported with elements from the above constructivism dimensions. The learning should take place through a range of learning activities: knowledge acquisition from formal instruction, practical exercises and project based realistic experiences, as well as collaborative and individual construction of knowledge.

While extrapolating this view of constructivist learning, we examined the issue of creativity education, where there has been an argument about whether creativity is a domain-specific or domain-general ability. One strong view holds that creativity is inherently associated with a certain type of intelligence and that domain expertise is required to identify to what extent a creative product extends a domain knowledge boundary (Solomon, Powell, & Gardner, 1999; Gardner, 1993); therefore, creativity should be seen as domain specific, and creativity education should be adapted to a specific domain. However, Root-Bernstein and Root-Bernstein (2004) argued that creativity should rather be seen as domain general because it is inherently associated with common intuitive and metacognitive capabilities; therefore, creativity education should target intuitive and metacognitive learning. Baer and Kaufman (2005) argued that creativity includes both domain-general and domain-specific capabilities. They developed the Amusement Park Theoretical (APT) model for creativity education, which integrates both domain-general and domain-specific creativity elements. In this theory, domain-general creativity elements include intelligence, motivation, and environment, that is, a creativity-supported culture, whereas domain specific creativity elements are categorized from a general thematic area down to a domain and microdomain subarea. APT has been suggested as
having potential in creativity education in RE (Nguyen & Shanks, 2006a). We adapt APT specifically to the RE domain:

• At the level of general thematic creativity: intelligence can be characterized as problem understanding and solving and social and communication skills; individual motivation should be recognized and linked to learning objectives; and learning environment elements need to be identified and linked to the constructivism dimensions.

• At the level of RE specific creativity: business knowledge (for example, training programs and processes in the military), technology knowledge, analysis and modeling techniques and tools, and creativity techniques and tools should be integrated to generate creative ideas and to recognize and evaluate creative products.
Having synthesized and adapted the above constructivism dimensions and APT theory to RE, we propose a learning environment comprising the elements shown in Figure 11.1 to support a constructivist learning approach that incorporates creativity learning for systems analysts.

Figure 11.1. Incorporating Creativity Learning within Constructivist Learning

A creativity-supported culture is identified as an element at the general creativity level in APT (Baer & Kaufman, 2005). This element is adapted in our approach as a simulated learning environment to support different constructivism dimensions (by promoting flexibility in framing and reframing knowledge and collaborative creativity). Different levels of (domain-general and domain-specific) creativity elements are integrated within this learning environment. This adaptation assists the learner in recognizing and understanding the constructivism
dimensions supported by a particular learning program and in taking advantage of how the program can support his or her learning process. For example, a course based program (formal education) would typically support the exogenous dimension of learning, in which case the learner should apply the formal instruction from the learning program while working on RE exercises and, based on feedback from the instructor, refine and clarify the knowledge received. With RE workplace learning, by contrast, the learner should apply the accommodation and assimilation mechanisms proactively in more realistic experiences and seek collaboration and formal approval of the knowledge constructed from time to time. As different constructivism dimensions are integrated within our proposed environment, the learner needs to recognize them and take advantage of their integration (see the next section).

At the level of general problem solving creativity, our proposed approach includes the following:

• Elements for intelligence building, such as problem understanding and solving capabilities (for example, problem recognition, strategy planning, idea generation and brainstorming, solution formulation and evaluation, and so forth).

• Elements for identifying and communicating motivations. Individual learners' motivations and learning objectives need to be identified and communicated to the teacher (or coach) so that reward mechanisms can be aligned with individual learning objectives and motivations.

• Support for social interactions and collaboration in problem based projects, within- and cross-team communication, and communication with facilitator(s). A combination of social software and face-to-face interactions can be used to facilitate communication and collaboration and to allow the learner to acquire social and communication skills (team building, negotiation, exchange of information, group collaborative support, and so forth).
At the RE specific level, our proposed approach integrates the following (a schematic sketch of these elements follows the list):

• Support for the learner to learn through relevant experiences (small exercises, case studies, and projects) by providing appropriate knowledge and instruction—processes (such as Waterfall, Rapid Application Development, Agile development, and so forth), elicitation techniques (such as scenario based interviews and observation), modeling techniques (such as use case, object oriented, data flow diagram, entity relationship, and so forth), and requirements management tools (see Volume 1, Section 2 Perspective).

• Support for individual as well as collective creativity. Creativity techniques, such as brainstorming, imagination, search for ideas, idea association, analogical thinking, and play, as well as the use of creativity tools, will be integrated within the RE lifecycle.

• Support for flexible cognitive processes and for monitoring the evolutionary structuring and insight-driven restructuring of the requirements model.
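As a compact summary, the following schematic (our own construction, not an artifact of the proposed framework) records the elements above as a plain data structure that a course designer might instantiate and check off; the keys and entries simply restate the lists above.

```python
# Hypothetical configuration summarizing the proposed learning environment.
learning_environment = {
    "constructivism_dimensions": ["endogenous", "exogenous", "dialectic"],
    "general_creativity": {
        "intelligence_building": ["problem recognition", "strategy planning",
                                  "idea generation", "solution evaluation"],
        "motivation": "elicit learner objectives; align reward mechanisms",
        "collaboration": ["problem based team projects",
                          "within- and cross-team communication",
                          "social software", "face-to-face interaction"],
    },
    "re_specific_creativity": {
        "processes": ["Waterfall", "Rapid Application Development", "Agile"],
        "elicitation": ["scenario based interviews", "observation"],
        "modeling": ["use case", "object oriented", "data flow diagram",
                     "entity relationship"],
        "creativity_techniques": ["brainstorming", "idea association",
                                  "analogical thinking", "play"],
    },
}
```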
The next section illustrates the proposed conceptual framework through a case in which creativity was incorporated within a constructivist learning approach by students undertaking an RE subject in various master's degree programs, including business, commerce, and information systems.
A Case of Learning Creative RE Using a Simulated Learning Environment

A project in RE (Cybulski, Parker, & Segrave, 2006a, 2006b) was designed to enhance and enrich learners' abilities to discover and elicit information systems requirements (from both business and technology viewpoints)—an essential skill of the information systems professional. While teaching requirements elicitation is common in information systems and software engineering schools, such teaching is usually limited to conducting simple interviews and formalizing the collected information into a requirements specification. The more challenging requirements elicitation skills, which unfortunately are very often neglected, include detecting conflicting and redundant information, handling the omission of essential facts, and dealing with the absence of management approval and customer feedback to fully validate the collected and analyzed requirements. Through our project, the learner was engaged in various activities to overcome the above-mentioned problems, to independently and collaboratively seek solutions to them, and to apply some creative approaches to dealing with the shortcomings of the specified requirements. In this way, the project was designed to support endogenous and dialectic constructivism and creativity learning at the general thematic level of (business) problem solving creativity.

Our RE project (codenamed FAB ATM) required the learners to work in teams to produce a specification of a banking product (a new generation of automated teller machines [ATMs]). In the initial stages of the project, the teams used a computer simulation (henceforth called the FAB ATM simulation) of a virtual meeting room, where the learners had an opportunity to meet the simulated staff of a hypothetical banking organization (FAB—the First Australian Bank). During the meetings, the teams conducted a series of interviews with a view to collecting requirements for their project. The simulated interviews allowed project teams to first design interview questionnaires and then engage simulated interview participants in a lengthy conversation (see Figure 11.2).

Figure 11.2. Simulated RE Interviews with Multiple Participants in a Corporate Environment

The requirements elicited in the process of such interviews represented the distinct viewpoints of the bank staff, for example, a technical officer or a branch manager. After the interviews, the learners had to analyze the collected requirements and identify redundancies, conflicts, and omissions, all of which had to be reconciled, removed, or filled in with information obtained through self-directed research. Results of these activities were eventually presented to the bank manager (role-played in the real world) for validation and final approval and were subsequently sealed in the form of a consistent specification document.

Immersive educational simulations, games, and role-playing were central to the conduct of our RE project. To support different constructivism dimensions, we used a blended simulation learning environment, where some activities were conducted in classes (lectures and tutorials), some in project teams (face-to-face meetings, online discussion boards, and chat rooms), and yet others with the use of a virtual (meeting) environment—named Deakin LiveSim (Cybulski et al., 2006a, 2006b). Through formal classes (lectures and tutorials), the blended learning environment supported exogenous constructivism. Through a range of inter- and cross-team communication activities and project-related consultation
(provided by teachers), the blended environment supported dialectic constructivism. Through simulated interviews, documentation collection, and interpretation, the learners acquired general problem understanding and problem solving skills. In the qualitative feedback provided, the learners generally praised the experiences gained with the simulation; for example, they said:

"The interview CD is really a very good idea, which offered us a virtual interview environment via multiple media technology."

"The interview stimulation program did offer us a chance to be a part of interview, to touch it, to feel it and to experience it."

"The actual interview simulation session was very informative and convenient allowing some flexibility in the actual interview technique."
To incorporate RE creativity learning, a typical RE lifecycle was adopted in the FAB ATM simulation project (see Figure 11.3).

Figure 11.3. RE Activities Supported by the Simulated Learning Environment

The process took the learners through a learning "funnel," which led them from the fuzziest knowledge (completely open to imagination and creativity—for example, extending the ATM with share trading, Web based, or human-touch user interfaces) to the most formal and constrained (which requires breaking technology and business dogmas to arrive at workable solutions). Individual learners started their project work
by investigating the problem domain (research) and then exploring and planning their projects in groups (that is, both individual and collective brainstorming and scenario and solution exploration). These were followed by gathering requirements (via a face-to-face backgrounder and interviews with simulated people); analyzing the discovered and elicited requirements (including their formalization and scoping); and later investigating information systems requirements, alternative solutions, and possible business alignment issues (using user context and user voice analysis). The next stage involved the verification and validation of requirements (using a formal presentation and feedback collection), the integration of new requirements with adapted legacy requirements (in a specification document), and finally project completion. All learners were also asked to reflect upon, elaborate on, and document the knowledge and experience gained.

In the FAB ATM project, the simulation was used to confront the learner's (often unstoppable) creativity, imagination, preconceptions, and ideas with "reality." Information was gathered by asking questions and listening to the answers provided by the simulated people, observing their body language and
passing judgment on the degree of trust that could be vested in them, taking notes, working with vastly incomplete data under time pressure, and engaging in independent investigation and collaboration with team members and the simulated people.

The tasks that specifically required learners to invoke creative problem solving can be found across the entire project and the RE lifecycle, but they are concentrated in a number of problem domains: aiming at business/IT alignment, coping with the richness of the stakeholder base, overcoming deficiencies of the legacy system, setting requirements for technology reuse, dealing with technology selection and innovation, and facing the challenges of imminent business change.

While this blended simulated learning environment (such as that used in the FAB ATM project) cannot completely replace student placement in a real organization, it provides learners with a safe environment in which they can experiment with different possible outcomes (Cybulski et al., 2006a, 2006b). The FAB ATM project adopts a partial view of reality, which can be referred to as "circumscribed" reality. Such circumscribed reality simulations attach only key aspects of authenticity to their objects and environment. While they sacrifice some degree of reality, they never cross the threshold of acceptability to the learner.

The FAB ATM simulation provides learners with rich interactivity, which relies on a state machine implemented in Macromedia Flash that combines, in real time, video fragments of live people to deliver conversational characters with meaningful behavior. While the media form and interaction are simple for the learner, the complexity is created in the learner's mind rather than in the technology used to support the environment.
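The characters described above were authored as a Flash state machine; the following is only an illustrative analogue in Python, not the authors' implementation. Every state, question topic, and video clip name in this sketch is invented.

```python
# Illustrative sketch: a minimal finite-state machine for a simulated
# interview character. Each state maps a recognized question topic to a
# video clip reference and a successor state, so scripted "conversational"
# behavior emerges from simple per-state transition tables.

from dataclasses import dataclass, field


@dataclass
class Response:
    clip: str        # video fragment to play (hypothetical file name)
    next_state: str  # state the character moves to after answering


@dataclass
class InterviewCharacter:
    state: str = "greeting"
    # transitions[state][topic] -> Response
    transitions: dict = field(default_factory=dict)

    def ask(self, topic: str) -> str:
        """Answer a question on `topic`, or deflect if it is out of scope."""
        options = self.transitions.get(self.state, {})
        if topic not in options:
            return "deflect.mp4"  # the character evades unanswerable questions
        response = options[topic]
        self.state = response.next_state
        return response.clip


# A toy branch manager with a few invented topics.
manager = InterviewCharacter(transitions={
    "greeting": {"atm_vision": Response("vision.mp4", "open"),
                 "budget": Response("budget_refusal.mp4", "greeting")},
    "open": {"security": Response("security_concerns.mp4", "open"),
             "customers": Response("customer_needs.mp4", "open")},
})

print(manager.ask("atm_vision"))  # -> vision.mp4
print(manager.ask("security"))    # -> security_concerns.mp4
```

The design point the sketch tries to capture is that simple transition tables are enough to give a scripted character apparently meaningful, context-dependent behavior, leaving the interpretive complexity to the learner.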
Finally, any educational computer simulation ought to be part of a larger educational framework encompassing many aspects of the learner experience. Hence, our FAB ATM computer simulation supported endogenous constructivism. In addition, it provided the teacher with an opportunity to control educational outcomes (by defining objectives to be reached) and processes (by setting tasks to be undertaken, stages to be completed, and methods to be used by learners) and to achieve comparability of gained experience, which is hard to attain in real-life projects and student placement situations.

The FAB ATM project provided us with many opportunities to apply innovative and effective learning styles (see Figure 11.4).

Figure 11.4. Learning Styles Supported by the Simulated Learning Environment

In addition to the traditional ways of learning by "being told" in lectures, "by discovery" in tutorials, and "by doing" in projects, the FAB ATM project also provided avenues for learners to learn by experiencing work and by taking on the professional roles of business consultants and systems analysts. All these learning styles are actively pursued in lectures (via demonstrations), tutorials (via discussions), and projects (via a virtual environment). This is achieved by learners being immersed in an authentic and believable simulation environment (such as that used in the FAB ATM simulation) and by conducting realistic tasks that allow students to learn "by observing" people's behavior in a complex corporate setting, "by playing" the professional roles, and "by communicating" and "by collaborating" with their team members
and with the simulated characters (in both virtual and real contexts). Finally, learners also took on the responsibility of teaching each other in face-to-face meetings and online discussion. The richness of the available learning styles offered RE teachers alternative paths to students' minds, to the seamless creation of new knowledge and skills, and, most importantly, to the effective development of professional experience. Through all of these, our learning environment supported elements of endogenous, dialectic, and exogenous constructivist learning and teaching.

While the simulation system in our FAB ATM project simulated a technology innovation project in the banking domain, it was based on a typical RE lifecycle. Therefore, the system has the potential to simulate a requirements project in other domains, for example, to develop requirements for a VE training system. The blended approach to educating creative systems analysts provided us with an opportunity to arrive at a compromise between educational outcomes (acquiring knowledge, developing skills, embracing creativity, and gaining experience) and environmental constraints (time, costs, labor, and quality). We relaxed the confines of the problem settings to foster students' creativity and then confronted them with a reality whose rigidity could be overcome only by breaking technical and business dogmas in a creative fashion. By circumscribing the learner's reality, we used a combination of simulated reality and virtuality to
immerse individuals in an authentic and believable problem situation, yet we were able to control educational outcomes and provide the safety of a protected educational context. We used a variety of media and learning approaches to support the learning process, not only to help students gain skills, knowledge, and creativity, but also to achieve these objectives creatively.

CONCLUSION

This chapter weaves a story of requirements engineering education. Like many other good stories, it provides a lesson for the reader, be you a practitioner, an educator, or a student. As the nature of information systems changes, so does the role of systems analysts, who are now required to act not only as human repositories for users' wishes, demands, and requirements, but also to become inventors, innovators, and learners and to be facilitators of such innovativeness and scholarship among their clients and users. Thus, the shift from requirements elicitation to requirements discovery poses new challenges for RE practice, and a major problem is the apparent lack of the creative knack in the systems analysts' skill portfolio. This is also a challenge for RE educators, who need to expose their students to authentic and believable situations in which learners can be immersed in realistic problems and in which they can truly experience the processes of domain learning and problem solving and the wickedness of the social and organizational complexities, which constantly redefine the problem and rescope its many solutions. Games, role-playing, and simulations could become part of the answer to these newly posed challenges. However, yet again we may find that it is of fundamental importance for learners and educators to be inventive and open to self-improvement and learning. And so we, too, need to take the creative path of risk and innovation and to employ new and exciting approaches to using learning and teaching technologies.

REFERENCES

Armarego, J. (2004, December). Learning requirements engineering within an engineering ethos. Paper presented at the 9th Australian Workshop on Requirements Engineering (AWRE'04), Adelaide, Australia.
Baer, J., & Kaufman, J. C. (2005). Bridging generality and specificity: The Amusement Park Theoretical (APT) model of creativity. Roeper Review, 26, 158–163.
Billett, S. (1995). Workplace learning: Its potential and limitations. Education + Training, 37(5), 20–27.
Blackburn, T., Swatman, P., & Vernik, R. (2006, May). Cognitive dust: Linking CSCW theories to create design processes. Paper presented at the 10th Computer Supported Cooperative Work in Design (CSCWD'06), Nanjing, China.
Boden, M. A. (1991). The creative mind: Myths and mechanisms. New York: Basic Books.
Boden, M. A. (1998). Creativity and artificial intelligence. Artificial Intelligence, 103, 347–356.
Checkland, P., & Scholes, J. (1999). Soft systems methodology in action: A 30-year retrospective. New York: Wiley.
Conklin, J. (2006). Dialogue mapping: Building shared understanding of wicked problems. Hoboken, NJ: John Wiley & Sons.
Coughlan, J., & Macredie, R. D. (2002). Effective communication in requirements elicitation: A comparison of methodologies. Journal of Requirements Engineering, 7(2), 47–60.
Cybulski, J., Nguyen, L., Thanasankit, T., & Lichtenstein, S. (2003, July). Understanding problem solving in requirements engineering: Debating creativity with IS practitioners. Paper presented at the Pacific Asia Conference on Information Systems (PACIS 2003), Adelaide, Australia.
Cybulski, J., Parker, C., & Segrave, S. (2006a, December). Touch it, feel it and experience it: Developing professional IS skills using interview-style experiential simulations. Paper presented at the 17th Australasian Conference on Information Systems (ACIS 2006), University of South Australia, Adelaide.
Cybulski, J., Parker, C., & Segrave, S. (2006b, December). Using constructivist experiential simulations in RE education. Paper presented at the 11th Australian Workshop on Requirements Engineering (AWRE'06), University of South Australia, Adelaide.
Dalgarno, B. (2005). The potential of 3D virtual learning environments: A constructivist analysis. e-Journal of Instructional Science and Technology (e-JIST), 5(2).
Dallman, S. (2004). What creativity do students demonstrate when undertaking requirements engineering. Melbourne, Australia: Deakin University.
Dallman, S., Nguyen, L., Lamp, J., & Cybulski, J. (2005, May). Contextual factors which influence creativity in requirements engineering. Paper presented at the 13th European Conference on Information Systems (ECIS 2005), Regensburg, Germany.
Dennis, A., Wixom, B. H., & Tegarden, D. (2004). Systems analysis and design with UML Version 2.0: An object-oriented approach (2nd ed.). Hoboken, NJ: John Wiley & Sons.
Gardner, H. (1993). Frames of mind: The theory of multiple intelligences (10th anniversary ed.). New York: Basic Books.
Gero, J. S. (1996). Creativity, emergence and evolution in design: Concepts and framework. Knowledge-Based Systems, 9(7), 435–448.
Gervasi, V., Kamsties, E., Regnell, B. O., & Achour-Salinesi, C. B. (2004, June). Ten years of REFSQ: A quantitative analysis. Paper presented at the International Workshop on Requirements Engineering: Foundation for Software Quality (REFSQ'05), Riga, Latvia.
Goguen, J. A. E. (1997). Towards a social, ethical theory of information: In social science research. In G. Bowker, L. Gasser, L. Star, & W. Turner (Eds.), Technical systems and cooperative work: Beyond the Great Divide (pp. 27–56). Mahwah, NJ: Lawrence Erlbaum.
Jackson, M. (2005). Problem frames and software engineering. Information & Software Technology, 47(14), 903–912.
Kotonya, G., & Sommerville, I. (1998). Requirements engineering: Processes and techniques. West Sussex, England: John Wiley & Sons.
Maiden, N., & Gizikis, A. (2001). Where do requirements come from? IEEE Software, 18(5), 10–12.
Maiden, N., Manning, S., Robertson, S., & Greenwood, J. (2004, August). Integrating creativity workshops into structured requirements processes. Paper presented at the 2004 Conference on Designing Interactive Systems, Cambridge, MA.
Maiden, N., & Robertson, S. (2005, September). Integrating creativity into requirements engineering process: Experiences with an air traffic management system. Paper presented at the 13th IEEE International Conference on Requirements Engineering (RE'05), Paris, France.
Mayer, R. E. (1992). Thinking, problem solving, cognition (2nd ed.). New York: W. H. Freeman and Company.
Mich, L., Anesi, C., & Berry, D. M. (2004, June). Requirements engineering and creativity: An innovative approach based on a model of the pragmatics of communication. Paper presented at the Requirements Engineering: Foundation of Software Quality (REFSQ'04), Riga, Latvia.
Minor, O., & Armarego, J. (2004). Requirements engineering: A close look at industry needs and model curricula. Proceedings of the 9th Australian Workshop on Requirements Engineering (AWRE'04; pp. 9.1–9.10). Available from http://awre2004.cis.unisa.edu.au/
Moshman, D. (1982). Exogenous, endogenous, and dialectical constructivism. Developmental Review, 2, 371–384.
Nguyen, L., Armarego, J., & Swatman, P. (2005, December). Understanding the requirements engineering process: A challenge for practice and education. Paper presented at the 7th International Business Information Management Association, Cairo, Egypt.
Nguyen, L., Carroll, J., & Swatman, P. A. (2000, January). Supporting and monitoring the creativity of IS personnel during the requirements engineering process. Paper presented at the 33rd Hawaii International Conference on System Sciences (HICSS-33), Maui, HI.
Nguyen, L., & Shanks, G. (2006a, December). A conceptual approach to exploring different creativity facets in requirements engineering. Paper presented at the 17th Australasian Conference on Information Systems (ACIS 2006), Adelaide, Australia.
Nguyen, L., & Shanks, G. (2006b, September). Using protocol analysis to explore the creative requirements engineering process. Paper presented at the Information Systems Foundations: Theory, Representation and Reality, 3rd Biennial ANU Workshop on Information Systems Foundations, Canberra, Australia.
Nguyen, L., & Swatman, P. A. (2003). Managing the requirements engineering process. Requirements Engineering, 8(1), 55–68.
Nguyen, L., & Swatman, P. A. (2006). Promoting and supporting requirements engineering creativity. In A. H. Dutoit, R. McCall, I. Mistrik, & B. Paech (Eds.), Rationale management in software engineering (pp. 209–229). Berlin, Germany: Springer-Verlag.
Nguyen, L., Swatman, P. A., & Shanks, G. (1999). Using design explanation within formal object-oriented method. Requirements Engineering, 4(3), 152–164.
Nuseibeh, B. A., & Easterbrook, S. M. (2000). Requirements engineering: A roadmap. Proceedings of the Conference on the Future of Software Engineering (pp. 35–46). New York: ACM Press.
Ohlsson, S. (1984). I. Restructuring revisited: Summary and critique of the Gestalt theory of problem solving. Scandinavian Journal of Psychology, 25, 65–78.
Opdahl, A. L., Dubois, E., & Pohl, K. (2004, June). Ten years of REFSQ: Outcomes and outlooks. Paper presented at the International Workshop on Requirements Engineering: Foundation for Software Quality (REFSQ'05), Riga, Latvia.
Piaget, J. (1950). The psychology of intelligence. New York: Routledge.
Regev, G., Gause, D. C., & Wegmann, A. (2006, September). Creativity and the age-old resistance to change problem in RE. Paper presented at the 14th IEEE International Requirements Engineering Conference (RE'06), Minneapolis, MN.
Robertson, J. (2005, January/February). Requirements analysts must also be inventors. IEEE Software, 22(1), 48, 50.
Robertson, S., & Robertson, J. (2005). Requirements-led project management: Discovering David's slingshot. Boston, MA: Addison-Wesley.
Robillard, P. N. (2005, November/December). Opportunistic problem solving in software. IEEE Software, 22(6), 60–67.
Root-Bernstein, R., & Root-Bernstein, M. (2004). Artistic scientists and scientific artists: The link between polymathy and creativity. In R. J. Sternberg, E. G. Grigorenko, & J. L. Singer (Eds.), Creativity: From potential to realization (pp. 127–151). Washington, DC: American Psychological Association.
Schmid, K. (2006). A study on creativity in requirements engineering. Softwaretechnik-Trends, 26(1). Retrieved August 2006, from http://pi.informatik.uni-siegen.de/stt/26_1/
Schön, D. A. (1996). Reflective conversation with materials. In T. Winograd (Ed.), Bringing design to software (pp. 171–184). New York: ACM Press.
Simon, H. A. (1992). The sciences of the artificial. Cambridge, MA: MIT Press.
Solomon, B., Powell, K., & Gardner, H. (1999). Multiple intelligences. In M. A. Runco & S. R. Pritzker (Eds.), Encyclopedia of creativity (Vol. 2, pp. 273–283). San Diego, CA: Academic Press.
Sommerville, I., & Sawyer, P. (1997). Requirements engineering: A good practice guide. Chichester, England: John Wiley & Sons.
Sutcliffe, A., & Maiden, N. (1998). The domain theory for requirements engineering. IEEE Transactions on Software Engineering, 24(3), 174–196.
Visser, W. (1992). Designers' activities examined at three levels: Organisation, strategies and problem-solving processes. Knowledge-Based Systems, 5(1), 92–104.
Vygotsky, L. S. (1978). Mind and society: The development of higher mental processes. Cambridge, MA: Harvard University Press.
SECTION 3
PERFORMANCE ASSESSMENT

SECTION PERSPECTIVE
Eduardo Salas and Michael A. Rosen

Performance measurement and assessment are fundamental components of effective training. They drive the provision of feedback and decisions about remediation. Specifically, in order to maximize learning by providing corrective feedback and to decide what future training is needed, the trainee's current competency must be assessed and diagnosed. This is as true for training that occurs in virtual environments (VEs) and simulations as it is for training that occurs in a classroom. However, while performance assessment is rarely a simple task in any context, there are unique and challenging issues in assessing performance for training in VEs. Therefore, the chapters in this section are dedicated to exploring a wide variety of performance assessment issues in VEs for training. The goal of this Section 3 Perspective is to place these contributions into context by providing an overview of the fundamentals of performance assessment and measurement in VEs and training. Specifically, we address three main goals. First, we provide an overview of the need for performance assessment and measurement in VEs for training, as well as some of the major challenges to doing this effectively. Second, we describe the concept and process of performance diagnosis, the goal of assessment and measurement in training. Third, we review some guiding principles for developing performance assessment and measurement systems that are diagnostic of trainee competencies.

PERFORMANCE IN VEs FOR TRAINING: WHY DO IT, AND WHAT IS SO HARD ABOUT IT ANYWAY?

Simulations have long been used to prepare people for tasks and conditions of performance, particularly those that are important, infrequent, or dangerous. For example, archaeological evidence suggests that leather birthing models (much like modern plastic birthing simulators) were used to teach maneuvers to assist in childbirth in prehistoric times (Macedonia, Gherman, & Satin,
2003). Additionally, the board game chaturanga, the ancestor of modern chess developed in seventh century India, provided a simulation of battle and was used to develop strategic and tactical thinking in military commanders. The historical and archaeological record is filled with examples such as these, all the way up to and including modern simulation and training beginning in the early part of the last century (for example, flight simulation for the training of military pilots). Using simulations for developing skills, therefore, is not new or novel by any account. However, the modern use of simulations and VEs for training is distinct from this longer historical tradition in two important ways: increased sophistication of the learning environment and increased use for systematic training. Both of these differences have important implications for performance assessment and measurement.

First, the technological sophistication of the simulations and VEs used for modern training vastly exceeds that of preceding learning tools and environments. Spurred primarily by the geometrically increasing power of computers and their ability to represent more and more of the real world with higher levels of physical fidelity, the gap between the simulation and the simulated is ever narrowing. This increased sophistication of the technology used for training in simulations and VEs offers an increasingly robust palette for the design of learning environments. It has been clearly demonstrated that more physical fidelity (that is, a closer approximation of the physical detail of a targeted real world task) is not always necessary for reaching desired learning outcomes (Hays & Singer, 1989). In fact, some intentional deviations from exact replication of the real world task can produce better training results for some tasks (for example, above real time training for pilots; Lane, 1994) and some trainees (for example, increasing fidelity as the learner progresses from novice to expert; Dreyfus & Dreyfus, 1986). Ultimately, though, the greater the power and control afforded the developer of learning environments, the greater the potential for increased training effectiveness. However, this poses a significant challenge: the more complex the task or learning environment, the more complex the performance of the trainees will become, complicating the matter of measuring performance and generating feedback during practice. Essentially, the complexity of the learning environment can now equal or surpass that of the real world task environment. Consequently, many of the measurement problems associated with measuring performance in the real world (Arvey & Murphy, 1998) are present in VEs.

Second, VEs are used for systematic training and not just experiential learning. That is, VEs are used as part of a training delivery method for the acquisition of specified knowledge, skill, and attitude (KSA) competencies that underlie effective performance. This is an explicit process of identifying the type and level of performance the trainees should acquire during training, designing scenarios that afford opportunities to practice and acquire these competencies, and providing trainees with performance feedback. This is in contrast to unguided practice, where the learner may not be practicing correct forms of performance or receiving feedback. Practice and experience are the primary means by which expertise
develops, but this practice must be structured, guided, and accompanied by feedback (Ericsson, Krampe, & Tesch-Romer, 1993).

Taken together, these two points highlight (1) the great promise of training with VEs and (2) a great challenge to achieving this promised effectiveness. That is, training with VEs can potentially accelerate the development of expertise, resulting in a workforce with more individuals functioning at higher levels of effectiveness; however, in order to do this, practice opportunities in VEs must be systematically engineered and trainees must receive corrective feedback. In the following sections, we discuss performance diagnosis, the process of measuring performance to determine the causes of effective and ineffective performance. This information is necessary to make good decisions during the training process.

PERFORMANCE DIAGNOSIS: HOW DO YOU ASSESS PERFORMANCE IN VEs?

Performance is the actions involved in completing a task (Fitts & Posner, 1967); it is not the results or outcomes of these actions, but the actions themselves (Campbell, 1990). This is an important distinction to make, especially in the context of training. It is the processes of performance, the actions taken while completing a task (that is, how the task is done), that are trained, not the outcomes (that is, the results of all of those actions). Consequently, the processes of performance are what must be assessed during training. Most fundamentally, performance assessment involves (1) capturing performance in some manner (quantitative or qualitative data), (2) comparing that representation of performance to some standard, and (3) making decisions based upon the comparison between observed performance and the standard or target performance. In the context of training in VEs, an idealized version of this process proceeds as follows (a schematic sketch of the loop appears after the list):

1. Trainee performance is measured during practice activities within the VE.

2. These metrics are compared to prespecified learning objectives and lists of competencies targeted for training to determine the degree to which the trainee possesses the targeted competencies and has met the learning objectives (that is, has the trainee learned the appropriate performance?).

3. The results of this comparison are used to make decisions about what feedback to give and what future training is required to ensure that the trainee reaches the specified learning objectives.
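A minimal sketch of this three-step loop follows; the competency names, scores, and mastery thresholds are invented for illustration, and in a real system steps 1 and 2 would be driven by the instrumentation of the VE and by the training needs analysis.

```python
# Step 1 would populate `measures` from instrumentation of the VE scenario.

def assess(measures: dict[str, float],
           objectives: dict[str, float]) -> dict[str, str]:
    """Step 2: compare captured measures against target learning objectives."""
    return {c: ("met" if measures.get(c, 0.0) >= target else "deficient")
            for c, target in objectives.items()}


def plan_feedback(assessment: dict[str, str]) -> list[str]:
    """Step 3: turn the comparison into feedback/remediation decisions."""
    return [f"Remediate and re-practice: {c}"
            for c, status in assessment.items() if status == "deficient"]


measures = {"communication": 0.82, "situation_assessment": 0.44}
objectives = {"communication": 0.75, "situation_assessment": 0.70}

for action in plan_feedback(assess(measures, objectives)):
    print(action)  # -> Remediate and re-practice: situation_assessment
```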
The above three-step process appears simple enough, but can be quite complex to carry out effectively. In many cases, difficulty results from such issues as the multidimensional and dynamic nature of performance being trained and measured, and such practical constraints as the availability of observers and trainers. Additionally, many potential purposes drive the development and implementation of performance assessment or measurement systems for use in VEs. These include selection of personnel, tests for certification or qualification, feedback
and remediation during training, assessment of interventions, validation of virtual environments, and others. Developing any measurement system involves tradeoffs, and the purpose of the measurement system determines how these tradeoffs are made. For training in VEs, the ultimate purpose of performance measurement is diagnosis; that is, performance measurement should allow for correct inferences to be made about the causes of effective and ineffective performance (Salas, Rosen, Burke, Nicholson, & Howse, 2007; Cannon-Bowers & Salas, 1997). Knowing why an individual or team performed a certain way provides the necessary information for generating and providing the feedback needed for learning. In order to do this, a performance measurement system must capture a broad array of performance measures, a performance profile. This performance profile is illustrated in Figure SP3.1 and is briefly described here. For a more detailed discussion of these issues, see Salas and colleagues (2007).

Figure SP3.1. Characteristics of Measures Involved in Creating a Performance Profile (From Salas et al., 2007)

Different measures have different “informational yields” (Swing, 2002); that is, some measures are more informative than others, but no one measure tells the entire story. Therefore, for a measurement system to be diagnostic of the reasons behind an observed performance, it should incorporate a broad range of measures from different sources. This performance profile can then be used to make more informed decisions during training. To that end, a performance profile should have measures that (1) capture performance at multiple levels of analysis (for example, team, individual, and multiteam systems), (2) are grounded in the context of the task being trained (for example, they are not abstract performance dimensions), (3) are linked to the competencies (that is, the KSAs targeted for training), (4) are descriptive of the processes of performance, (5) are captured
from multiple sources, and (6) capture performance over time. These measures are captured during practice activities (step 1 of the process outlined above). The resulting data are used to determine the degree to which trainees possess the competencies targeted for training (step 2). This assessment is used to make decisions about feedback and remediation targeted at a specific trainee’s needs (that is, the deficiencies in competencies that may exist). In the following section, we provide a description of some guiding principles for carrying out performance diagnosis.

GUIDING PRINCIPLES FOR PERFORMANCE DIAGNOSIS IN VEs

In the preceding sections, we have identified why performance assessment is critical for training in VEs, as well as some general challenges to effective performance measurement and assessment. In this section we summarize existing best practices and guidelines for developing diagnostic measurement systems in scenario based training (for example, Salas, Rosen, Held, & Weissmuller, in press; Rosen, Salas, Wilson, et al., 2008; Oser, Cannon-Bowers, Salas, & Dwyer, 1999).

Measures Should Be Based in Theory

Theory provides guidance in the form of descriptions of performance. These descriptions are useful for deciding what to measure, one of the most critical and difficult tasks in designing a performance measurement system. For example, if a VE curriculum is designed to build decision-making skills during urban combat, relevant models of decision making (for example, the recognition-primed decision-making model; Klein, 1998) provide a basis for understanding the critical processes involved (for example, situation assessment, pattern recognition, and mental simulation). Similarly, if the VE curriculum is designed to develop teamwork skills, theoretical models are available to guide the development of performance measures for capturing the essential components of performance (for example, communication, mutual support, leadership, situation monitoring, and team orientation; Salas, Sims, & Burke, 2005).

Measures Should Capture Competencies and Be Designed to Meet Specific Learning Outcomes

Performance measures should be driven by the competencies being trained and the targeted learning outcomes. Theory can be used to develop a general understanding of what must be measured, but subsequently the KSAs targeted for acquisition in a given scenario must be explicitly defined. Most types of performance (for example, teamwork) are far too complex to be fully mastered in one scenario; therefore, the specific competencies targeted must be measured as specifically as possible. For example, if a scenario provides opportunities to practice leadership (but not the other aspects of teamwork), then it makes little sense to measure and provide feedback on the other aspects of teamwork that are not required for performance during a scenario.
Measures Should Capture Multiple Levels of Performance when Appropriate

Performance is often interdependent. Individuals rarely act alone in modern organizations. Consequently, when interdependent work is being trained in VEs, the performance measurement system must be able to differentiate between individual taskwork and coordinated teamwork (Cannon-Bowers & Salas, 1997). For example, if a VE is used to train teamwork for medical teams responding to trauma, the performance measurement system must distinguish between the individual competencies necessary for taskwork (for example, can the medic start an IV [intravenous therapy]?) and the team level aspects of performance (for example, communication). If an IV is started late, was it because the medic was inefficient or because communication was ineffective? To answer this and related questions, it is necessary to measure multiple levels of performance.

Measures Should Be Linked to Events in the VE

As noted earlier, the complexity of the task environments represented by VEs rivals that of the real on-the-job environment. Assessing performance on the job is notoriously difficult due to this complexity (Arvey & Murphy, 1998). Consequently, many of the same challenges arise in VEs. However, a significant advantage in VEs is that the training developer has a great deal of control over the environment. This control can be used to create opportunities for performance diagnosis by inserting critical events into the scenario. Trainee responses to these events are indicative of the presence or absence of targeted competencies (see Rosen, Salas, Silvestri, Wu, & Lazzara, 2008; Fowlkes, Dwyer, Oser, & Salas, 1998).

Measures Should Focus on Observable Behaviors

One of the primary advantages of using VEs for training is that they afford the opportunity for dynamic practice, that is, for the trainee to exhibit the actual processes and behaviors of performance. Focusing on observable behaviors during these practice opportunities has several advantages for the present purposes. First, it greatly increases the reliability of ratings made by observers (Bakeman & Gottman, 1997). Second, it increases the likelihood that performance measurement processes can be captured automatically. Additionally, sensors and input devices of various types frequently are used for individuals to interface with and function in the VE; this input to the system takes the form of observable behaviors exhibited by the trainees. Such input is already captured automatically and, with careful planning, can be used for performance measurement.

Use Multiple Measures from Multiple Sources

Triangulation is a valuable strategy for dealing with the complexity of performance in VEs. That is, viewing performance from multiple “angles” (that is,
different measurement tools) will provide a more robust understanding of what is happening and allow for more certain inferences to be made about the causes of performance. For example, Campbell, McCloy, Oppler, and Sager (1993) propose that performance is determined by three factors: declarative knowledge, procedural knowledge and skill, and motivation. Therefore, if a trainee does not exhibit the targeted performance, it could be because of low levels in one or more of these factors. Different situations (for example, high declarative and procedural knowledge but low motivation; high declarative knowledge and motivation but low procedural knowledge) require different feedback and future training to correct the deficiencies. Determining which of these cases underlies the observed performance deficiency requires measuring all three factors.

Focus Measures on the Processes of Performance

Consider the case of training naval aviators to land on an aircraft carrier. The ultimate criterion of interest is, “Did the pilot land the plane safely on the carrier?” However, measuring just this variable, the outcome of the pilot’s performance processes, does not provide enough information to provide detailed corrective feedback. If the pilot does not land safely (that is, is waved off or crashes into the carrier or water), that definitely indicates that there is a problem, but it provides no information on how to fix it. Therefore, the processes of performance should be measured during practice activities and not just the outcomes.

Train Observers and Structure Observation Protocols

When observations are necessary, and they frequently are in training with VEs, steps must be taken to ensure that the data generated by observers are reliable and valid. Two fundamental approaches to doing this include training raters and providing structured tools to guide observation. First, rater training helps to ensure that the judgments made by one rater are the same as (or highly similar to) those made by a second observer rating the same performance. Second, structured observation protocols help to guide the observers’ attention to the critical aspects of performance, reducing their overall level of workload and increasing the accuracy of ratings. These approaches seek to eliminate bias from the observer, reduce the overall level of error in the data, and subsequently increase the quality of data upon which decisions are made.

Facilitate Post-Training Debriefs and Training Remediation

The overarching goal for most performance measurement systems in VEs for training is to maximize learning, that is, to increase the rate at which individuals acquire the targeted competencies. Therefore, the data captured during practice activities should enable rapid decision making about feedback and remediation. It should provide trainers with the fuel for specific, timely, and targeted feedback (for example, visual aids for after action reviews).
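Several of these principles (event linkage, observable behaviors, and competency-specific scoring) come together in event-based checklists of the kind described by Fowlkes, Dwyer, Oser, and Salas (1998). The sketch below is a simplified illustration under invented event and behavior names, not a reimplementation of any published tool.

```python
# Each scripted event affords observable target behaviors linked to a competency.
EVENT_CHECKLIST = {
    "casualty_report_received": [
        ("acknowledges report on radio", "communication"),
        ("assigns medic to casualty", "leadership"),
    ],
    "enemy_contact": [
        ("issues contact report", "communication"),
        ("repositions fire team", "decision making"),
    ],
}

def score_trial(observed: set) -> dict:
    """Hit rate per competency: observed target behaviors over afforded opportunities."""
    hits, opportunities = {}, {}
    for event, behaviors in EVENT_CHECKLIST.items():
        for behavior, competency in behaviors:
            opportunities[competency] = opportunities.get(competency, 0) + 1
            if (event, behavior) in observed:
                hits[competency] = hits.get(competency, 0) + 1
    return {c: hits.get(c, 0) / n for c, n in opportunities.items()}

# One observer's record of what actually happened during the scenario.
observed = {("enemy_contact", "issues contact report")}
print(score_trial(observed))  # communication: 0.5, leadership: 0.0, decision making: 0.0
```

Checklists of this form can also be duplicated across observers, which makes it straightforward to check agreement between raters.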
CONCLUDING REMARKS

We hope the chapters contained within this section provide the community with the motivation, information, and needed tools to ensure VEs promote learning when used for training purposes. We also hope the principles provided in this Section 3 Perspective are validated, expanded, modified, and refined as research is conducted and as practitioners try to apply them. We can all benefit from collaboration between the researchers and the practitioners involved in designing and delivering VE systems for training.

ACKNOWLEDGMENTS

This work was partially supported by the Office of Naval Research Collaboration and Knowledge Interoperability (CKI) Program and ONR MURI Grant No. N000140610446 (Dr. Michael Letsky, Program Manager).

REFERENCES

Arvey, R. D., & Murphy, K. R. (1998). Performance evaluation in work settings. Annual Review of Psychology, 49, 141–168.
Bakeman, R., & Gottman, J. M. (1997). Observing interaction: An introduction to sequential analysis (2nd ed.). Cambridge, United Kingdom: Cambridge University Press.
Campbell, J. P. (1990). Modeling the performance prediction problem in industrial and organizational psychology. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology. Palo Alto, CA: Consulting Psychologists Press.
Campbell, J. P., McCloy, R. A., Oppler, S. H., & Sager, C. E. (1993). A theory of performance. In N. Schmitt & W. Borman (Eds.), Personnel selection in organizations (pp. 35–70). San Francisco, CA: Jossey-Bass.
Cannon-Bowers, J. A., & Salas, E. (1997). A framework for developing team performance measures in training. In M. T. Brannick, E. Salas, & C. Prince (Eds.), Team performance and measurement: Theory, methods, and applications (pp. 45–62). Mahwah, NJ: Erlbaum.
Dreyfus, H. L., & Dreyfus, S. E. (1986). Mind over machine: The power of human intuition and expertise in the era of the computer. New York, NY: The Free Press.
Ericsson, K. A., Krampe, R. T., & Tesch-Romer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100(3), 363–406.
Fitts, P. M., & Posner, M. I. (1967). Human performance. Belmont, CA: Brooks/Cole.
Fowlkes, J. E., Dwyer, D. J., Oser, R. L., & Salas, E. (1998). Event-based approach to training (EBAT). The International Journal of Aviation Psychology, 8(3), 209–221.
Hays, R. T., & Singer, M. J. (1989). Simulation fidelity in training system design. New York: Springer-Verlag.
Klein, G. (1998). Sources of power: How people make decisions. Cambridge, MA: MIT Press.
Lane, N. E. (1994). Above real-time training (ARTT): Rationale, effects, and research recommendations (No. NEL-TR-94-01). Orlando, FL: Naval Air Warfare Center Training Systems Division.
Macedonia, C. R., Gherman, R. B., & Satin, A. J. (2003). Simulation laboratories for training in obstetrics and gynecology. Obstetrics & Gynecology, 102(2), 388–392.
Oser, R. L., Cannon-Bowers, J. A., Salas, E., & Dwyer, D. J. (1999). Enhancing human performance in technology-rich environments: Guidelines for scenario-based training. In E. Salas (Ed.), Human/technology interaction in complex systems (Vol. 9, pp. 175–202). Stamford, CT: JAI Press.
Rosen, M. A., Salas, E., Silvestri, S., Wu, T., & Lazzara, E. H. (2008). A measurement tool for simulation-based training in emergency medicine: The simulation module for assessment of resident targeted event responses (SMARTER) approach. Simulation in Healthcare, 3(3), 170–179.
Rosen, M. A., Salas, E., Wilson, K. A., King, H. B., Salisbury, M., Augenstein, J. S., Robinson, D. W., & Birnbach, D. J. (2008). Measuring team performance for simulation-based training: Adopting best practices for healthcare. Simulation in Healthcare, 3(1), 33–41.
Salas, E., Rosen, M. A., Burke, C. S., Nicholson, D., & Howse, W. R. (2007). Markers for enhancing team cognition in complex environments: The power of team performance diagnosis. Aviation, Space, and Environmental Medicine Special Supplement on Operational Applications of Cognitive Performance Enhancement Technologies, 78(5), B77–B85.
Salas, E., Rosen, M. A., Held, J. D., & Weissmuller, J. J. (in press). Performance measurement in simulation-based training: A review and best practices. Simulation & Gaming: An Interdisciplinary Journal.
Salas, E., Sims, D. E., & Burke, C. S. (2005). Is there a big five in teamwork? Small Group Research, 36(5), 555–599.
Swing, S. R. (2002). Assessing the ACGME general competencies: General considerations and assessment methods. Academic Emergency Medicine, 9(11), 1278–1288.
Part VII: Purpose of Measurement
Chapter 12
MEASUREMENT AND ASSESSMENT FOR TRAINING IN VIRTUAL ENVIRONMENTS

Jared Freeman, Webb Stacy, and Orlando Olivares

This chapter of the handbook presents the fundamental knowledge of measurement and assessment and strategies for using them to improve the design of training in virtual environments (VEs). We define the role of assessment in training venues (VEs included), the components of assessment, the functions measurement and assessment can serve in training, what is assessed, how assessments are made, and how the quality of measures themselves can be evaluated. We close by describing promising future directions for measurement and assessment in VEs.

THE IMPORTANCE OF MEASUREMENT AND ASSESSMENT

Virtual environments are often rich worlds in which to practice skills. This does not necessarily mean that VEs train effectively. To train, VEs must provide frequent practice focused on essential skills, plus feedback. These conditions grow expertise (Ericsson, Krampe, & Tesch-Romer, 1993), and they are necessary for learning in any venue: classroom training, multimedia training, intelligent tutoring systems, Web-based training, and live exercises. Measurement and assessment are essential to all three of these conditions: to determine how much practice is required, to select experiences that train deficient skills, to populate feedback, and for other functions. Assessment helps ensure that VE systems train effectively.

VEs rarely have sufficient measurement and assessment capabilities to support training, however. In 1997, the Office of the Inspector General of the U.S. Department of Defense issued an audit stating that the military had not demonstrated the training value of its simulators, despite huge investments in the technology. More than a decade later, few VEs assess the knowledge and skills of trainees, though many document the effects that trainees achieve (for example, the number of enemies killed or the health of the synthetic patient after virtual surgery).

WHAT IS AN ASSESSMENT?

We define assessment as a value judgment concerning the adequacy of the learner’s performance. An assessment concerning mastery of a training objective
(for example, to triage virtual patients accurately) is generated by applying a performance standard (for example, scores above 95 percent denote expertise) to a measurement (for example, 98 percent) computed from data concerning the behavior of an agent in context. Each of these terms is essential to the concept of assessment and to its practice. VEs that fail to address all of these aspects in their assessments either require the instructor to do so manually or they leave the trainee to infer, often erroneously (Dunning, Johnson, Ehrlinger, & Kruger, 2003), the state of his or her own skill and the lessons to be learned.

An agent is the person or organization associated with an assessment. We identify agents with assessments in order to know whom to credit (or blame) or to whom to provide feedback that improves performance. Just who is an agent varies with the function of assessment. If the objective is to improve trainee performance, an assessment might be associated with the trainee, such as a human pilot (or team of pilots) in training. If the objective is to assess the expertise of observers or the agreement between them, an assessment would be associated with the observer(s) monitoring trainee performance. If the objective is to improve the quality of software agents (or nonplayer characters, in gaming terms), an assessment would be associated with the agent or its author. Such distinctions are important in complex VEs in which, for example, a pilot’s failure to fire on a ground target might be attributed to (1) poor performance by the pilot, (2) poor observation by the trainer, or (3) failure of a synthetic agent to clear the pilot to fire on the target.

The performance context is the setting in which the agent(s) take the measured action (or inaction). The context may be defined by the state of the simulated ambient environment (for example, day/night or raining/not raining). It may be defined by the entities in that environment (for example, four incoming enemy aircraft, one friendly wingman, and two enemy surface-to-air missile (SAM) installations). It may be defined by the actions and interactions of those entities (for example, the formation in which enemy aircraft approach or the destruction of the wingman by enemy missiles). It may be defined by all of these and by their relationships in time and space (for example, the proximity of the enemy SAM, the order in which one encounters the enemy SAM and aircraft, or the latency between those encounters).

Data are observations of action. Data may be provided directly by the trainee (for example, in a survey), an observer (such as a trainer), a measurement instrument (such as an eye-tracking device), or the simulation system. Data relevant to behaviors or skills—which VEs are designed to elicit—may concern the state of an object, its actions, or its interactions. For example, data concerning the location of a missile strike may be generated by the simulator as latitude, longitude, and altitude above sea level. Data concerning an avatar of a human operator may concern its location, utterances, posture or gesture, and actions.

A measure is a formula that transforms these data so that they lie on a defined measurement scale. Its product is a measurement. For example, a formula for calculating the accuracy of a missile strike may compute the distance between the location of a target and the impact location of a missile. Scales are of several
types. A nominal scale consists of discrete categories (for example, a missile “hit” or “missed” its target). An ordinal scale consists of ordered, discrete categories (for example, the miss was within 50 m (meters) of the target, 50–200 m, or more than 200 m). An interval scale consists of ordered, equidistant points (for example, the bombs struck 10 m apart). A ratio scale consists of ordered, equidistant points with a meaningful definition of zero, to support ratio calculations (for example, the missile struck 40 m from the target, twice the allowed distance of 20 m). The units of measures for scales (length, mass, time, temperature, information, area, and so forth) are more or less exhaustively enumerated in the Suggested Upper Merged Ontology of Measures (Teknowledge, 2007).

A performance standard partitions the measurement scale into meaningfully labeled sections. It imbues the measurement with value and utility relative to some training objective or performance objective: 80 percent of missiles fired struck the target = journeyman performance. Often, the standard is conditioned on the context. For example, 60 percent accuracy may qualify as journeyman performance in difficult terrain, adverse weather conditions, or when achieved under fire.

An assessment is rarely specified in all of its complexity (above) in simulation-based training. This is unfortunate. Assessment necessarily includes the components described above. If they are not specified explicitly, they are implicit; trainers and trainees instantiate them by assumption. It is difficult to understand such assessments, and to control their quality, because assumptions may vary between scorers. Thus, one assessor may consider an environmental factor important that another ignores; one may assess trainee actions differently from another; one may draw a different line between novices and journeymen, or journeymen and experts. Such assessment conditions require evaluators and the designers of evaluation systems to do the best they can; often this is good enough to ensure learning, but these conditions can produce unreliable assessments.

Two strategies help to ensure that assessments are reliable. The first is to define measures systematically to ensure that they bear on the objectives of training and to refine them experimentally to pare away those that cannot be taken reliably (MacMillan, Entin, & Morley, in press). The second is to specify each of these aspects of measures in a format or formal language (Stacy, Ayers, Freeman, & Haimson, 2006), so that they can be critiqued by experts for completeness and correctness, implemented unambiguously (even automatically), and refined over time. To some extent, however, the decision of how tightly to define an assessment—no less than the decision of what to measure and how—depends on how it will be used. We turn next to the functions of assessments in VE for training.
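Before turning to those functions, a short sketch may help make the terms concrete. It implements the missile strike example: a measure (a formula over simulator data) yields a ratio-scale measurement, and a performance standard, conditioned on context, partitions that scale into labeled categories. The thresholds below are invented for illustration; they are not doctrinal values.

```python
import math

def miss_distance(target: tuple, impact: tuple) -> float:
    """Measure: transforms raw location data into a ratio-scale measurement (meters)."""
    return math.dist(target, impact)

def standard(distance_m: float, adverse_weather: bool = False) -> str:
    """Performance standard: partitions the scale, conditioned on context."""
    allowed = 40.0 if adverse_weather else 20.0  # hypothetical thresholds
    if distance_m <= allowed:
        return "expert"
    if distance_m <= 2 * allowed:
        return "journeyman"
    return "novice"

d = miss_distance(target=(0.0, 0.0), impact=(30.0, 40.0))  # 50.0 m
print(standard(d))                        # "novice" in clear weather
print(standard(d, adverse_weather=True))  # "journeyman" under a looser standard
```

WHAT FUNCTIONS DOES ASSESSMENT SUPPORT IN VE TRAINING SYSTEMS?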
Assessment in Training

Assessments are content for training feedback. It is common to supplement assessments in feedback with descriptions or replays of the assessed behaviors, their context, and their effects. Feedback may also include inferences from assessed behavior concerning the knowledge and skill of the trainee. Objective assessments in feedback inform the trainee about his or her skill and are catalysts for subjective assessment. For example, one feedback system for authoring and delivering debriefs to distributed military teams (Wiese, Freeman, Salter, Stelzer, & Jackson, in press) presents objective (automatically generated) assessments using stoplight icons (red and green) beside each training goal. Selecting an assessment replays the events in the air mission scenario at the moment of measurement. Instructors and trainees can then subjectively assess their performances in context.

Measurements and assessments can support diagnosis of performance failures. Measurement systems (human or machine) that consider the context of performance can discriminate between two sources of error: an actor (trainee, observer, or synthetic entity) who failed to respond when the context required it, and poor scenario design or control, which either failed to present the conditions that would test trainee knowledge and skill or included extraneous or inappropriate design elements that inadvertently obscured the relevant conditions (for example, distracted the trainee). Measurement systems that represent causal relations between events can help diagnose the root causes of error. A very detailed and rich specification of assessment context can subsume this causal model. For example, it might specify that a pilot may release weapons only after receiving the communication “free and clear” from a ground controller. If the controller fails to issue that permission and the pilot fails to release weapons, then the root cause of a failure to release weapons lies with the controller, and this is evident in the assessment of both the controller (who receives a failing assessment on issuing the permission) and the pilot (who receives no credit or demerit for failing to release weapons because the context lacked the required permission).

The complexity of diagnosis rises with the complexity of the training environment, the number of participants, and the distribution of their actions over time. Consider the case of a VE in which four pilots fly in formation (that is, coordinating with each other) on a bombing run, controlled by a ground surveillance team, which in turn coordinates with a team that supplies intelligence imagery and other information. This is an example of the direction in which some VE training is headed: training distributed, multifunctional teams of teams. Semi-automated measurement and assessment may be required to assess the interactions between trainees in these settings because the coordination between distributed trainees simply may not be observable by a human trainer in any one location. Assessment techniques and technology to support such training are largely in the research stage. Meanwhile, assessment technology may serve trainers best by cueing them to make assessments or diagnoses at moments when coordination is occurring across distributed trainees.
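The “free and clear” example above lends itself to a small rule-based diagnosis, sketched below with hypothetical event names. The point is that a context-aware rule assigns the failing assessment to the controller when clearance was never issued, and withholds both credit and demerit from the pilot.

```python
def diagnose(events: list) -> dict:
    """Root-cause assessment of a failure to release weapons (hypothetical events)."""
    cleared = "controller_says_free_and_clear" in events
    released = "pilot_releases_weapons" in events
    results = {"controller": "pass" if cleared else "fail: clearance never issued"}
    if not cleared:
        results["pilot"] = "no credit or demerit: context lacked required clearance"
    else:
        results["pilot"] = "pass" if released else "fail: cleared but did not release"
    return results

print(diagnose(["target_acquired"]))                 # controller at fault
print(diagnose(["controller_says_free_and_clear"]))  # pilot at fault
```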
Assessments can support the prognosis of performance success and failure. Such predictions can help human and automated controllers of VEs to adapt the difficulty of the training experience so that it provides an appropriate challenge within what Vygotsky (1978) called the zone of proximal development, the range of challenge that the trainee can master with appropriate support.

Assessments can drive instructional prescriptions. Prescriptions are recommendations, such as the suggestion that a trainee read a specific manual before resuming training, engage in certain part-task skills training, or execute a particular scenario next. Prescriptions select and schedule training. Thus, they address the requirement, given at the beginning of this chapter, that training provide frequent practice of essential or targeted skills. The benchmarked experiential system for training provides one example of this capability. The authors (Levchuk, Shebilske, & Freeman, 2007; Gildea, Levchuk, Freeman, Narakesari, & Shebilske, 2007; Shebilske, Gildea, Levchuk, & Freeman, 2007) developed optimizations and other benchmarks of performance in a complex team air warfare task; assessed performance against those benchmarks; entered those assessments into a decision aid that modeled the team knowledge state and the impact of specific training events on that state (both knowledge state and training effects were represented probabilistically using a partially observable Markov decision process); and used the output of that aid to select one of tens of scenarios available for training. This integration of assessments with a mathematical model of instruction strategy produced reliably greater learning, in a laboratory experiment, than did a scenario selection strategy that implemented a hierarchical part-task approach.

Assessments can drive training events and context. An assessment that the trainee is performing at or above the highest measurable level in a scenario (a “ceiling effect”) may drive VE controllers (human or automated) to introduce events or a context that increases the level of challenge to the trainee or challenges a different and presumably unmastered skill. These manipulations can take many forms. A VE training system that employs a human trainee as pilot, a synthetic wingman, a synthetic coach, and a synthetic simulation controller (Bell, Ryder, & Pratt, in press) might increase the challenge for a trainee assessed to be an expert by handicapping the performance of the synthetic wingman, decreasing the guidance provided by the coach, or, as a synthetic simulation controller, introducing a thunderstorm to diminish the quality of the environment, decreasing the capabilities of the aircraft, increasing the number of SAM sites that threaten the pilots, and so forth. Such manipulation is common in intelligent tutoring systems, but it is rare for VEs to be imbued with the intelligence to make instructional decisions. One explanation may lie in the complex chain of events this invokes. Changes to context influence which measures are taken and the standards by which performance is assessed, which in turn shape feedback during and after the action.
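A much simplified version of this assessment-driven adaptation appears below. The cited benchmarked experiential system used a partially observable Markov decision process; this sketch substitutes a plain threshold policy with invented bounds, nudging difficulty so that challenge stays within an assumed zone of proximal development.

```python
def adapt_difficulty(level: int, recent_scores: list,
                     low: float = 0.5, high: float = 0.9) -> int:
    """Raise challenge on a ceiling effect; lower it when the trainee is overwhelmed."""
    mean = sum(recent_scores) / len(recent_scores)
    if mean >= high:   # ceiling: add SAM sites, weather, handicap the wingman...
        return level + 1
    if mean <= low:    # floor: simplify the scenario, increase coaching
        return max(1, level - 1)
    return level       # within the zone of proximal development: hold steady

print(adapt_difficulty(3, [0.95, 0.92, 0.97]))  # -> 4
```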
Assessment for Training Management

Measurement and assessment can also serve managerial functions. They can help designers to evaluate and incrementally improve software agents, such as synthetic pilots, coaches, and simulation controllers (above). A training system can archive assessments of the state of these agents, the actions of agents, and the actions of trainees; analyze these; and refine the logic or the mathematics of agents so that they make finer distinctions between trainee actions, more accurate diagnoses of failure, better predictions of training outcomes, and improved selection of feedback content and presentation.

Assessments can help training managers to select, assess, and train VE controllers, who must manage scenario conditions during trials; observers, who must take reliable measures and assessments of trainee performance; and trainers, who must convey essential knowledge and skills to trainees or cue them to exercise them in VE training.

Assessments can help instructional designers to evaluate VE scenarios, which must present appropriate conditions for learning. Instructional designers should recognize that the effectiveness of a training scenario is partly a function of the trainee; individual differences often moderate the training-outcome relationship. Measures of individual differences can be used to assess trainee characteristics (for example, reaction time, reading speed, and visual acuity) before training, and these characteristics can be used to select training content well suited to that trainee.

Finally, assessments can help software engineers to verify that a simulator is performing as designed. We have witnessed at least one training exercise in which simulation controllers launched an enemy fighter aircraft against human trainees, observed trainees react to it as if it were a commercial aircraft, and later discovered that the software engineers had mistakenly applied the label for that fighter aircraft to a model of a commercial airliner. A very complex measurement scheme is needed to capture these rare errors, one that correlates data concerning the supposed characteristics of an object (for example, the flight characteristics of an enemy fighter aircraft) with the characteristics as they are perceived by the trainee (for example, through tactical displays).

WHAT IS ASSESSED IN VIRTUAL ENVIRONMENTS?

The objects of assessment in VE training are, at the first order, the end state of a task or mission, the process by which it was executed, the capabilities or attributes of the performing agents, and the state of entities in and out of the VE. At the second order are assessments of trends, such as the rate of improvement or learning.

A measure of the end state of a task or mission is referred to as a measure of effectiveness (MOE). MOEs address “what” was done and typically include the occurrence, accuracy, or quality of outcomes and/or the timeliness with which they were achieved. A measure of the steps executed in response to an event or in pursuit of an outcome is referred to as a measure of performance (MOP). MOPs address “how” the outcome was achieved. The distinction between an MOE and an MOP is, thus, largely a function of what one considers an outcome, that is, of the training objectives. One trainee’s MOP may be another’s MOE.

Assessments of the capabilities or attributes of the trainee are often inferred from MOPs or fine-grained MOEs. They typically concern the state of declarative
knowledge (knowledge of what) and procedural knowledge (knowledge of how). Declarative knowledge is commonly, and probably best, assessed using conventional instruments (for example, pen and paper multiple-choice tests), provided that they recreate realistic cues to elicit recognition or recall of the associated knowledge of interest. In some cases, however, the trainee must interact with an object to demonstrate his or her declarative knowledge; medical trainees do this when they practice diagnostics using a real mannequin or virtual patient. Procedural knowledge is appropriately evaluated in VEs because VEs provide rich cues to skill selection, controls for skill execution, and a realistic tempo for sequences of actions. Such assessments may describe the individual or the team or organization that collectively demonstrates the coordination of the target skills.

The state of trainees themselves is sometimes measured using instruments that assess personality, intelligence, or other relatively stable individual differences. Trainee physiological state is increasingly measured using techniques such as eye tracking and electroencephalography to assess otherwise unobservable cognitive states, such as arousal, anxiety, concentration, and insight (Bowden, Jung-Beeman, Fleck, & Kounios, 2005). These factors may moderate training effects, and so assessment on these factors can be used to customize training to the individual.

Second order measures and assessments combine data from the primary measures and assessments above. Assessments of efficiency consider the number of effects achieved or processes executed per resource expended or per unit time. Assessments of reliability address accuracy over trials. Measures of learning represent the rate of change in effectiveness, process, or capabilities over time or training trials.
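A second order measure of learning can be as simple as the slope of scores over trials. The least squares sketch below is one illustrative formula; operational systems would likely fit richer learning-curve models.

```python
def learning_rate(scores: list) -> float:
    """Slope of a least squares line through (trial, score): score gained per trial."""
    n = len(scores)
    x_mean = (n - 1) / 2
    y_mean = sum(scores) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(scores))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

print(learning_rate([0.40, 0.55, 0.65, 0.72]))  # ~0.106 per trial
```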
Other forms of training assessment are possible, of course. The discussion above addresses assessments of skill or learning in training scenarios. This corresponds to the second level, called learning, in Kirkpatrick’s (1994) hierarchy. VE training can be assessed from trainee reactions (level one) and often is assessed only in this way. It can be assessed from records of performance on the job (level three) and from the results (for example, the return on investment or rate of mission success) that it has on the organization as a whole (level four).

In basic research concerning VEs, the characteristics of the environment itself are often measured. Central to this mission is understanding whether the VE delivers the distinctive experience it is intended to deliver: a sense of presence (Mikropoulos & Strouboulis, 2004), which is enabled by immersion and interactivity (Schloerb, 1995; Steuer, 1992). Presence enhances training of interpersonal communication skills in a medical VE (Johnson, Dickerson, Raij, Harrison, & Lok, 2006). Presence in VE training is even more clearly critical when the “real world” environment is largely virtual, as is the case in telemedicine or endoscopic surgery (Zhang, Zhao, & Xu, 2003). Presence is a metric of the ecological validity of VEs, but it is not necessarily correlated with training effectiveness. For example, many experienced aviators believe that high fidelity motion systems are essential for flight training simulators, but the research evidence indicates that they are not (Lintern, 1987). There is even some evidence that the use of a motion system in a flight-training simulator can interfere with transfer to real flight (Lintern & McMillan, 1993). Recent research attempts to model this nonmonotonic relationship between fidelity and training effectiveness (Estock, Alexander, Gildea, Nash, & Blueggel, 2006) to support the design and acquisition of VEs.

Measurement of the environment also has value in training applications. Measurement can identify the cues to which trainees are exposed and (fail to) respond. This is particularly important in VEs in which trainees create their own experiences, environments in which it is not possible to plan measurement opportunities in advance. For example, Forterra Systems Inc.’s Online Interactive Virtual Environment presents an urban environment in which trainees command avatars to execute military operations, and trainers or their confederates control civilian and insurgent avatars who populate the virtual city. There is no formally scripted sequence of events (that is, software code). An assessment engine was developed (Haimson & Lovell, 2006) to identify emergent situations in which trainees could exercise doctrinal maneuvers. These assessments serve as bookmarks used by trainers during debriefing to leap to situations in which knowledge of doctrinal maneuvers can be discussed and evaluated.
HOW ARE ASSESSMENTS MADE IN VE TRAINING?

Three methods are used to make assessments in VE training: trainee responses to questions and other probes, observer data collection, and collection by automated instruments. These methods have been categorized as subjective measurement (trainee response) and objective measurement (observational and automated measurement; Wilson & Nichols, 2002). What makes a measure objective or subjective has been a point of discussion and contention for many years (see Annett, 2002a, 2002b; Wilson & Nichols, 2002), with subjectivity carrying the connotation, at least for ergonomists, of being second-rate, soft science (Annett, 2002b). Nonetheless, subjective measurement is often the only way in which the experiences and opinions of participants/trainees can be captured. Examples include measuring the side effects and aftereffects of VE participation, known as VRISE (virtual reality induced symptoms and effects; Cobb, Nichols, Ramsey, & Wilson, 1999), and the sense of presence (Steuer, 1992). In contrast, there are situations in which trainees/participants are not aware of, or are not good judges of, effects that they are objectively measured to have experienced (for example, postural instability; Wilson & Nichols, 2002), and here subjective measures are deprecated.

Measurement and assessment are typically conducted by human observers, even in the largest and most expensive simulation systems, those that most faithfully reproduce the physical controls (such as a cockpit), environment (for example, landscapes and weather), and object attributes (for example, missile flight characteristics). Assessment in these settings is done largely by observers using paper or electronic forms. Observers may validate that a specific training experience has been delivered, rate or evaluate trainee actions, and record their observations
and judgments concerning second order characteristics (such as efficiency) and inferences about unobservable states (such as situational awareness). The reliance on observers is in part an historical artifact of assessment practices in real (not virtual) exercise environments. In part it is a necessity: real and artificial environments are rarely well instrumented for automated measurement and assessment, and it is often difficult and expensive to automatically assess the complex tasks executed in these environments. It is common in military simulation exercises, for example, to have a large cadre of domain experts score a larger assembly of trainees. Naturally, an added benefit of using observers to assess performance in a VE is that those same measures (if developed appropriately) can also be used in live training exercises (Wiese, Nungesser, Marceau, Puglisi, & Frost, 2007). This ensures that trainees are assessed similarly across environments, and it can facilitate analysis of the effectiveness of live and virtual training environments.

Subjective measures and assessments are sometimes gathered from trainees using paper or electronic surveys to assess individual differences, reactions to training, and evaluations of their own performances. The latter are not necessarily reliable (Dunning et al., 2003), but often have the valuable side effect of encouraging trainees to recall key training events and think critically about them. Trainees may also be asked to assess the state or performance of other teammates, the team, or the scenario events. In this case, they are acting as observers, ones who have information and perspectives that may differ significantly from those of trained observers.

Automated measures and assessments are taken by instrumentation within the VE or external to it. Physiological measurement systems such as eye tracking and functional near-infrared imaging typically are not tightly integrated with each other or with the VE. Instrumentation within the VE typically taps the system’s databases or data bus. Many military systems use the high level architecture (HLA), which enables measurement instruments to read simulation data (for example, the state of aircraft and their interaction with missiles) and write measures or assessments to the bus as a federate with a status similar to that of any trainee operating a station on the system. For example, a measurement technology developed for assessing human performance in military simulations (Stacy et al., 2006) subscribes to specific data, captures those data when they are delivered across the HLA bus, computes measures of performance from those data using rules typically derived from experts or authoritative sources, and makes measurements and assessments available to debriefing systems using Web services after the training simulation system has been shut down.
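That data flow can be sketched generically: subscribe to selected simulation data, compute measures with expert-derived rules, and retain the results for debriefing. The bus interface below is hypothetical and stands in for HLA federate plumbing, which is considerably more involved; the rule and its threshold are likewise invented.

```python
class MeasurementFederate:
    """Toy stand-in for a measurement component listening on a simulation data bus."""

    def __init__(self):
        self.rules = {}          # topic -> rule function
        self.assessments = []    # computed measures, kept for later debriefing

    def subscribe(self, topic, rule):
        self.rules[topic] = rule

    def on_message(self, topic, data):
        """Called by the (hypothetical) bus whenever subscribed data are published."""
        rule = self.rules.get(topic)
        if rule is not None:
            result = rule(data)
            if result is not None:
                self.assessments.append(result)

# Expert-derived rule: flag slow evasive maneuvers after a missile launch.
def evasion_rule(data):
    latency = data["maneuver_time"] - data["launch_time"]
    return {"measure": "evasion_latency_s", "value": latency, "pass": latency < 5.0}

fed = MeasurementFederate()
fed.subscribe("missile_engagement", evasion_rule)
fed.on_message("missile_engagement", {"launch_time": 100.0, "maneuver_time": 103.2})
print(fed.assessments)  # available later to debriefing tools
```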
HOW CAN ONE ASSESS ASSESSMENT?

The quality of measures and assessments is a function of their feasibility, their utility for the function at hand, and their statistical properties.

The feasibility of a measure or assessment determines whether it can be made at all, and made efficiently. Assessments in some experimental training
settings—often those involving virtual environments—are subject to federal regulations that protect the privacy and health of human subjects. To be taken at all, assessments in these settings must pass muster with an institutional review board concerned with disclosure to subjects, safe conditions of administration, and anonymous or secure storage of data about individuals.

The feasibility of assessment is also a function of the cost of developing measures and assessments (below), validating them, administering them, and using them. For example, the cost of developing automated measures may include modifying a simulator data model to distribute data concerning the state of a display or control with which the user interacts to demonstrate a skill and developing software that computes measurements and assessments from those data. This can take months given the complexities of simulation hardware and software, the social and political environment of a simulation engineering team, and the business processes designed to prevent spurious changes to the simulation system. Feasibility is also a function of the cost of administering measures. This cost is low when assessment is automated. The price of observer assessment is driven by the cost of hiring and training a sufficient number of domain experts to make measurements and assessments reliably.

The utility of an assessment is determined by whether it fulfills its function. Measures and assessments developed primarily for instructional functions, for example, must be meaningful to trainees and actionable, thus instructionally effective. To achieve this, developers of measures must often analyze performance in the domain at hand, using observation, interviews, surveys, and document analysis. For example, the Mission Essential Competency technique developed and used by the U.S. Air Force employs group interview techniques and confirmatory surveys to define critical competencies, knowledge, and skills in a domain. Experiments can refine this understanding of key competencies. Methods of developing measures and assessments from these findings (MacMillan et al., in press; Carolan et al., 2003) specify the observable behaviors that denote these competencies in a mission simulation. At a minimum, these methods of defining assessments ensure that the trainers and trainees understand them. Instructional science and art can be applied to ensure that these meaningful assessments help trainees to learn, whether the assessments are delivered directly as feedback or influence the content or schedule of training (as described above). Assessments that fulfill their function are said to have consequential validity (Messick, 1995).

Measures should also have certain statistical properties: reliability and validity, with reliability being a precondition for validity. A measure is reliable if repeated administration to a subject produces similar scores (for example, of accuracy of communication in foreign language training) or if different evaluators apply the measure to the same data and produce similar scores. The former definition of reliability is confounded with the concept of learning, in which scores on measures rise with training. Thus, the reliability of a measure must be tested in conditions that preclude learning (for example, by controlling expertise or time on task).
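Inter-rater reliability, the second sense above, can be checked with simple agreement statistics. The sketch below computes Cohen’s kappa, a chance-corrected agreement index, for two observers’ nominal ratings; the ratings themselves are invented.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same trials."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["pass", "pass", "fail", "pass", "fail", "pass"]
b = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(cohens_kappa(a, b))  # one disagreement in six trials -> kappa ~0.67
```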
A measure can be valid in several respects (Anastasi, 1988). Criterion validity exists when a measure accurately assesses current state or accurately predicts a future state. A piloting test has criterion validity if it predicts job performance ratings, rank, or some other indicator of proficiency. Content validity exists when a measure addresses all of a given domain, but only that domain. For example, an assessment of aircraft piloting skill has content validity if it addresses flight preparation, takeoff, cruise, landing, and taxi, and it omits passenger relationship management. Construct validity exists when a measure assesses the unobservable, theoretical construct that it claims to measure (for example, intelligence). Measures with construct validity should correlate with measures that are known to be related to the construct, should not correlate with measures known to address distinct constructs, and should respond as predicted to the state of other constructs. (These tests are known as convergent, divergent, and nomological validation, respectively.) Consequential validity (Messick, 1995) exists when an assessment fulfills its intended function for the individual or the organization, as when a test of customer service skills, when used to select new employees, improves customer service. There is an axiological component to consequential validity; the assessments that an organization performs and the functions that assessment serves are dynamic. They change as the organization evolves and as its missions change. Thus, it is a significant challenge for organizations to continuously align their assessment efforts with their states and missions.

Unfortunately, it is unusual for VE training developers to test the reliability and validity of assessments and measures used in applied (not experimental) training. Often, the apparent relevance of a measure, its face validity, suffices as an endorsement of the measure or assessment. This is disappointing in part because measures developed for their face validity alone often have deep statistical flaws and limited utility. For example, observers may give all teams passing assessments regardless of their performance because the performance standards are vaguely defined, the measures are based on observer inferences and not observable events, or for other reasons.
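Criterion validity, the first property above, is usually quantified as a correlation between the measure and its criterion. A Pearson correlation sketch follows; the scores are invented.

```python
def pearson_r(xs, ys):
    """Correlation between a training measure and an external criterion."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

sim_scores = [62.0, 71.0, 75.0, 80.0, 90.0]  # VE piloting test scores
job_ratings = [2.9, 3.4, 3.3, 3.9, 4.4]      # later job proficiency ratings
print(pearson_r(sim_scores, job_ratings))    # r ~0.97 supports criterion validity
```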
WHAT IS THE FUTURE OF ASSESSMENT FOR VE TRAINING?

The growth of VE training systems presents opportunities to improve how, how frequently, and how well we assess human performance. We envision a future for VE training in which (1) assessment-driven simulation ensures that virtual environments provide training, not just practice, and (2) simulation-driven assessment ensures that simulations provide sound conditions for evaluating human performance.

Assessment-driven simulation will address a problem, common in large-scale military simulation acquisitions and in PC (personal computer) gaming environments repurposed for training, that the virtual environment is designed to provide practice, but without regard for its effectiveness as a training device. Large-scale systems (full motion aircraft cockpit simulators, for example) are
designed to maximize the fidelity of controls, displays, and the physics of entities represented in the environment on the assumption that trainers will put these to good use. PC gaming environments (such as Neverwinter Nights and Second Life) are designed to maximize engagement using dramatic visuals, storylines, and rapid reward schedules. Neither class of system generally supports the core functions of instructional design: (1) creating conditions for learning and practice of essential skills and (2) assessing competency on those skills.

We look forward to a future in which training requirements drive the design of simulation controls, displays, and physics models. Scenario authoring tools will help instructors put these features to use by constraining the design of scenarios to relevant conditions, ensuring, for example, that scenarios for training medics to respond to mass casualties present a mix of cases that challenges skills in diagnosis and personnel tasking, and that does so at a pace that demands learning. Recent research explores this vision (Stacy, Walwanis, & Colanna-Romano, 2007) by formally representing aspects of instructional design (for example, representing training objectives, conditions, and measures in XML [Extensible Markup Language] schemas) and applying constraint logic programming techniques to schedule instructional experiences before and during simulation runs.

Simulation-driven assessment will resolve the difficulty of accessing and using simulator data to measure and assess human performance. Current simulations typically provide few assessments and few measures, mainly MOEs rather than MOPs. Large training institutions supplement these assessments with observer assessments and debriefs structured (in the best cases) to elicit subjective assessments. The data required to assess processes often are represented only in replay “video,” and they are not available directly to measurement systems. For example, data may be available that a missile is fired at an aircraft and that the pilot maneuvers to avoid it, but rarely are data available concerning the instruments with which the pilot perceives that event (the auditory signal from a radar warning receiver or the visual signal on a radar display) or the controls with which the pilot makes the maneuver. Thus, it is possible to automatically assess that a pilot makes avoidance maneuvers, but the details required to diagnose failures are lost.

We look forward to a future in which a rich data stream is available with which to automate some of the measurement and assessment work that human observers now take on. This will free observers to apply their expert judgment to evaluate other trainee competencies (for example, verbal communications) that cannot reliably be assessed automatically or that require subtle interpretation only humans can make.
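A tiny illustration of assessment-driven scenario selection appears below. The cited work uses XML schemas and constraint logic programming; this sketch merely filters candidate scenarios so that the next practice opportunity covers the objectives a trainee has not yet mastered. Scenario names and objectives are invented.

```python
def pick_scenario(scenarios, unmastered):
    """Prefer the scenario whose training objectives best cover unmastered skills."""
    viable = [s for s in scenarios if s["objectives"] & unmastered]
    return max(viable, key=lambda s: len(s["objectives"] & unmastered), default=None)

scenarios = [
    {"name": "mass_casualty_triage", "objectives": {"triage", "tasking"}},
    {"name": "single_patient", "objectives": {"diagnosis"}},
]
print(pick_scenario(scenarios, unmastered={"triage", "diagnosis"}))
```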
CONCLUSION

This chapter described foundations of measurement and assessment for training in virtual environments. We defined an assessment as a complex object (a value judgment on a measurement computed from data generated by an agent acting in context to attain a training objective), an object that can be formally
specified to clarify or even automate assessment. We described ways in which researchers and trainers can apply measures and assessments in training (as feedback, to diagnose and forecast performance, and to drive instruction) and training management (to evaluate software agents, human training staff, scenarios, and the simulator itself). We specified that the proper objects of measurement and assessment are the trainee (his or her declarative knowledge, procedural knowledge, physiological state, and individual psychological differences) and the training environment itself. We described the fundamental approaches (subjective and objective) to measurement and assessment, and the standards for assessing assessment (on feasibility, utility, reliability, and validity). Finally, we defined future directions for measurement and assessment, in which assessment drives simulation and in which simulation drives assessment. We hope that this framework helps training researchers and designers to instrument VEs so that they measure human performance in context and apply these measures in ways that enhance the power of VEs to train.

ACKNOWLEDGMENTS

We thank several colleagues who contributed insights and editorial comments: Cullen Jackson, Ph.D., Emily Wiese, Ph.D., Eileen Entin, Ph.D., and Diane Miller.

REFERENCES

Anastasi, A. (1988). Psychological testing. New York: Macmillan Publishing Co.
Annett, J. (2002a). Subjective rating scales: Science or art? Ergonomics, 45, 966–987.
Annett, J. (2002b). Subjective rating scales in ergonomics: A reply. Ergonomics, 45, 1042–1046.
Bell, B., Ryder, J., & Pratt, S. (in press). Communications and coordination training with speech-interactive synthetic teammates: A design and evaluation case study. In D. Vincenzi, J. Wise, P. Hancock, & M. Mouloua (Eds.), Human factors in simulation and training. Mahwah, NJ: Lawrence Erlbaum.
Bowden, E. M., Jung-Beeman, M., Fleck, J., & Kounios, J. (2005). New approaches to demystifying insight. Trends in Cognitive Sciences, 9, 322–328.
Carolan, T., MacMillan, J., Entin, E., Morley, R. M., Schreiber, B. T., Portrey, A., Denning, T., & Bennett, W., Jr. (2003). Integrated performance measurement and assessment in distributed mission operations environments: Relating measures to competencies. Proceedings of the Interservice/Industry Training, Simulation & Education Conference [CD-ROM]. Arlington, VA: National Training and Simulation Association.
Cobb, S. V. B., Nichols, S. C., Ramsey, A. D., & Wilson, J. R. (1999). Virtual reality-induced symptoms and effects (VRISE). Presence: Teleoperators and Virtual Environments, 8, 169–186.
Dunning, D., Johnson, K., Ehrlinger, J., & Kruger, J. (2003). Why people fail to recognize their own incompetence. Current Directions in Psychological Science, 12(3), 83–87.
Ericsson, K. A., Krampe, R. T., & Tesch-Romer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100(3), 363–406.
Estock, J. L., Alexander, A. L., Gildea, K. M., Nash, M., & Blueggel, B. (2006). A model-based approach to simulator fidelity and training effectiveness. Proceedings of the Interservice/Industry Training, Simulation, and Education Conference [CD-ROM]. Arlington, VA: National Training and Simulation Association.
Gildea, K., Levchuk, G., Freeman, J., Narakesari, S., & Shebilske, W. (2007). BEST: A benchmarked experiential system for training. Proceedings of the 12th Annual International Command & Control Research & Technology Symposium. Retrieved August 29, 2007, from http://www.dodccrp.org/events/12th_ICCRTS/CD/iccrts_main.html
Haimson, C., & Lovell, S. (2006). Pattern recognition for cognitive performance modeling. In K. Murray & I. Harrison (Eds.), Capturing and using patterns for evidence detection: Papers from the 2006 Fall symposium (Tech. Rep. No. FS-05-02; pp. 120–126). Menlo Park, CA: Association for the Advancement of Artificial Intelligence.
Johnson, K., Dickerson, R., Raij, A., Harrison, C., & Lok, B. (2006). Evolving an immersive medical communication skills trainer. Presence, 15, 33–46.
Kirkpatrick, D. L. (1994). Evaluating training programs: The four levels. San Francisco: Berrett-Koehler.
Levchuk, G., Shebilske, W., & Freeman, J. (2007). A model-driven instructional strategy: The benchmarked experiential system for training (BEST). Manuscript submitted for publication.
Lintern, G. (1987). Flight simulation motion systems revisited. Human Factors Society Bulletin, 30(12), 1–3.
Lintern, G., & McMillan, G. (1993). Transfer for flight simulation. In R. Telfer (Ed.), Aviation instruction and training (pp. 130–162). Aldershot, United Kingdom: Ashgate.
MacMillan, J., Entin, E. B., & Morley, R. (in press). Measuring team performance in complex and dynamic military environments: The SPOTLITE method. Military Psychology.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749.
Mikropoulos, T. A., & Strouboulis, V. (2004). Factors that influence presence in educational virtual environments. Cyberpsychology & Behavior, 7, 582–591.
Schloerb, D. W. (1995). A quantitative measure of telepresence. Presence, 4, 64–80.
Shebilske, W., Gildea, K., Levchuk, G., & Freeman, J. (2007). Training experienced teams for new experiences. Proceedings of the Human Factors and Ergonomics Society 51st Annual Meeting [CD-ROM]. Santa Monica, CA: Human Factors and Ergonomics Society.
Stacy, W., Ayers, J., Freeman, J., & Haimson, C. (2006). Representing human performance with Human Performance Measurement Language (HPML). Proceedings of the Interservice/Industry Training, Simulation and Education Conference [CD-ROM]. Arlington, VA: National Training and Simulation Association.
Stacy, W., Walwanis, J. M., & Colanna-Romano, J. (2007). Using pedagogical information to provide more effective scenarios. Proceedings of the Interservice/Industry Training, Simulation & Education Conference [CD-ROM]. Arlington, VA: National Training and Simulation Association.
Steuer, J. (1992). Defining virtual reality: Dimensions determining telepresence. Journal of Communication, 42, 73–93.
Teknowledge. (2007). Overview of the SUMO. Retrieved August 29, 2007, from http://ontology.teknowledge.com/arch.html#Measure
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.
Wiese, E. E., Freeman, J., Salter, W. J., Stelzer, E. M., & Jackson, C. (in press). Distributed after action review for simulation-based training. In D. A. Vincenzi, J. A. Wise, M. Mouloua, & P. A. Hancock (Eds.), Human factors in simulation and training. Mahwah, NJ: Lawrence Erlbaum.
Wiese, E. E., Nungesser, R., Marceau, R., Puglisi, M., & Frost, B. (2007, November). Assessing trainee performance in field and simulation-based training: Development and pilot study results. Proceedings of the Interservice/Industry Training, Simulation and Education Conference [CD-ROM]. Arlington, VA: National Training and Simulation Association.
Wilson, J. R., & Nichols, S. C. (2002). Measurement in virtual environments: Another dimension to the objectivity/subjectivity debate. Ergonomics, 45, 1031–1036.
Zhang, G., Zhao, S., & Xu, Y. (2003). A virtual reality based arthroscopic surgery simulator. Proceedings of the IEEE International Conference on Robotics, Intelligent Systems and Signal Processing (Vol. 1, pp. 272–277). Los Alamitos, CA: IEEE.
Chapter 13
TRAINING ADVANCED SKILLS IN SIMULATION BASED TRAINING
Jennifer Fowlkes, Kelly Neville, Razia Nayeem, and Susan Eitelman Dean

Achieving the coordination of numerous weapons systems and personnel units is a principal goal of the U.S. Marine Corps (USMC). The USMC is small compared with the other branches of the U.S. Armed Services and owns fewer resources, and it thus relies heavily on an ability to bring those resources to bear in a strategically coordinated manner, that is, to achieve combined arms. Accordingly, significant USMC training resources are directed toward preparing personnel to execute combined arms operations. Distributed, simulation based training systems, in which elements of combined arms teams are represented and linked, are among the critical enabling technologies for training combined arms and other distributed operations. Despite the importance of simulation, it is difficult to achieve pedagogically sound training using simulation based training, for reasons pertaining to the complexity of the training environment and the difficulty of training such higher order skills as coordination and adaptability. The objective of the present chapter is to describe the challenges to conducting simulation based training for large teams and then to present instructional strategies that can be used to help address those challenges. The strategies are based on the results of a cognitive task analysis conducted to identify training needs for Fire Support Teams, combined with principles for advanced learning. The Fire Support Team is one of the USMC's resources for providing combined arms. The cognitive task analysis was used to elicit detailed knowledge from seven experienced instructors with respect to a series of challenging combined arms events. The strategies identified can be used to augment existing simulation systems for combined arms training and potentially for training other large tactical teams. In addition, it is hoped that the strategies can be used to guide the design of instructional features in emerging systems.
COMPLEX TRAINING ENVIRONMENTS

The complexity of the combined arms environment is one of the challenges to providing effective simulation based training. Training combined arms teams may involve tens, or even hundreds, of system operators. As Lane and Alluisi (1992) noted,

The players in this simulated battlefield environment are not only the weapon system operators, but also the commanders, staffs, logisticians, support units, intelligence personnel, and decision makers at all levels—in short, all the combat, combat support, and combat-service support elements assigned to the battle force and its support. (p. 23)
Indeed, simulation based training systems used for combined arms represent complex environments that, because of their emergent properties, can undermine effective training that is predicated on controlling training experiences to meet training objectives. That is, training event flow is largely determined by the give-and-take, real time interactions of players and simulated entities in the exercise. While such environments are probably representative of the actual battlefield, they are difficult to engineer as effective training exercises because task content is largely left to chance (Fowlkes, Lane, Dwyer, Willis, & Oser, 1995). There are other consequences of complexity for training. Table 13.1 lists elements of complex environments taken from Feltovich, Hoffman, Woods, and Roesler (2004) and identifies some of the implications for training. Some of the implications follow: effective training and measurement systems must take into account the goal-directed nature of team performance and the integrated manner in which individuals and subteams support overall team goals; training, assessment, and feedback must occur at the individual, subteam, and large team levels; there are multiple ways in which a system can break down, and there are multiple ways in which effective performance can occur; and, finally, to be truly diagnostic and valid, training and measurement systems must take into account the context in which performance occurs. These factors would tax any training system. The scenario based training (SBT) method (Cannon-Bowers, Burns, Salas, & Pruitt, 1998) is one approach that can be used to exert some control over the training environment and to address some of the implications in Table 13.1. Scenario based training is founded on the notion of arranging or constraining scenarios, in a realistic fashion, to optimally support learning. The critical underlying assumptions of SBT are that (a) each of the training components must be linked to achieve an effective learning environment and (b) training opportunities are not left to chance. Thus, in SBT, events are systematically identified or introduced to provide known opportunities to support training. The SBT framework does not ameliorate all of the issues associated with the complexity shown in Table 13.1, but it makes many of them more tractable, including controlling task context, being able to anticipate important training and measurement opportunities, and reducing instructor workload.
Table 13.1. Characteristics of Complex Systems and Implications for Training

Element of system complexity: Processes occur continuously.
Implication for training: Relevant performance strengths and weaknesses (a) are detected by assessing quantitative and qualitative changes over time rather than at specific points (the latter is easier) and (b) may manifest at any time, challenging human and machine measurement systems to stay on guard.

Element of system complexity: Multiple processes occur at any one time.
Implication for training: Instructor personnel must monitor, control, and assess multiple personnel or subteams to be trained, as well as the entities with which they are interacting, imposing a heavy workload and requiring methods for optimally focusing attention.

Element of system complexity: There are interdependencies among the systems represented.
Implication for training: Training and assessment systems must unravel the multiple contributions to performance outcomes. In addition, knowledge from multiple domains (for example, from ground and air systems) is needed to critique performance.

Element of system complexity: Diverse explanatory principles are needed to account for performance.
Implication for training: The complex contributions to performance include trainees (at team, subteam, and individual levels), trainee perspective, status of simulation systems (fully functional, partially functional, and nonfunctional), trainee familiarization with the capabilities and limitations of the simulation system, and a host of scenario context factors such as information ambiguity.

Element of system complexity: Explanatory principles rely on a total system view.
Implication for training: Judgments of performance strengths and weaknesses of an individual or team must consider the context in which the performance took place and the interactions with the other system components. It is not as easy to isolate the causes of performance effects within complex systems.

Element of system complexity: Cases in the domain display variability.
Implication for training: Optimal or fixed solution paths are difficult to identify because of the myriad ways that performance can break down as well as recover. This challenges both human and machine performance assessment systems.

Note: Elements of system complexity are from Feltovich et al. (2004).
Below, we describe the basic SBT method. We then describe how it can be used as a framework for addressing the training of higher order skills within simulation based training environments. The components of SBT are shown in Figure 13.1. Training is supported initially by identifying a master set of tasks or training objectives that describe the complete set of competencies for which the trainer is used. Generally in the military these are task lists (for example, the mission essential task list).
Figure 13.1. Scenario Based Training Framework (Adapted from Cannon-Bowers et al., 1998)
Next, to support a specific training event, a subset of training objectives from the master list is selected. In military training exercises, the subset used would be driven by a variety of factors, including the company commander’s or Fire Support Team leader’s objectives and the need to prepare for an upcoming deployment. The training objectives selected for a training event in turn drive the development of the scenario events and related products (for example, mission objectives and orders). In some cases, events are deliberately introduced to achieve a specific learning purpose. For example, in teaching a Fire Support Team member the skills needed to control close air support aircraft, events might include the aircraft being on or off the timeline, not carrying the ordnance expected, and having degraded systems. The idea is to produce or encourage situations that allow rich practice opportunities related to the training objectives. In addition to deliberately introducing events, events can be identified that will occur naturally as a result of the interactions between participants and simulated entities. For these events, timing cannot be known a priori, but instructors can be primed (or prompted) to recognize their occurrences. For example, in an effort assessing army Fire Support Teams, naturally occurring battle events that served as training opportunities included calls for fire, intelligence reports, survivability moves, and battle geometry updates (forward line of own troops and coordinated fire lines) (Fowlkes, Dwyer, Milham, Burns, & Pierce, 1999). In the SBT model, performance assessment is accomplished by evaluating trainee or team responses to the scenario events. Knowing important events that will occur during training allows instructors to develop a priori expectations for performance assessment and to focus their attention appropriately. These expectations can be used to develop automated performance assessment tools or job aids. In addition, using events as a guide for assessment serves to focus measurement so that not everything has to be observed. This in turn reduces instructor workload and creates a more economical expenditure of time. Feedback provided during after action reviews is also organized around events. Finally, the
information collected pertaining to trainee performance (for example, skill inventory and performance history) can be incorporated into learning management systems to guide future training. Examples of training topics addressed by SBT include team and tactical training in aviation (Colegrove & Alliger, 2001; Jentsch, Abbott, & Bowers, 1999; Salas, Fowlkes, Stout, Milanovich, & Prince, 1999), team dimensional training (TDT) for navy shipboard teams (Smith-Jentsch, Zeisig, Acton, & McPherson, 1998), emergency management (Schaafstal, Johnston, & Oser, 2001), and advanced skills training for military teams (Salas, Priest, Wilson, & Burke, 2006). We now examine the use of SBT as a framework to support the implementation of advanced learning principles.
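As a concrete, purely illustrative reading of the SBT chain just described, the sketch below links a master set of training objectives to scenario events and to a priori expected responses from which an observer checklist can be generated. The objective names, events, and expected responses are assumptions made for the example; they are not drawn from any fielded system or from the sources cited above.

```python
# Hypothetical sketch of the SBT chain: selected objectives drive events, and
# events carry a priori expectations that instructors or software score against.
MASTER_OBJECTIVES = {
    "control_cas": "Control close air support aircraft",
    "call_for_fire": "Execute calls for fire",
}

# Events deliberately introduced to exercise particular objectives.
SCENARIO_EVENTS = [
    {"event": "aircraft_off_timeline", "objective": "control_cas",
     "expected_response": "revise the attack timeline and notify the supported unit"},
    {"event": "unexpected_ordnance", "objective": "control_cas",
     "expected_response": "re-task the aircraft or adjust the target assignment"},
    {"event": "call_for_fire_request", "objective": "call_for_fire",
     "expected_response": "transmit a complete, correctly formatted call for fire"},
]

def assessment_checklist(selected_objectives):
    """Build an observer checklist from events tied to the selected objectives."""
    return [(e["event"], e["expected_response"])
            for e in SCENARIO_EVENTS if e["objective"] in selected_objectives]

for event, expected in assessment_checklist({"control_cas"}):
    print(f"When '{event}' occurs, look for: {expected}")
```

Because expectations are attached to events before the exercise runs, observation can focus on known opportunities rather than on everything at once, which is the workload reduction described above.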
ADVANCED LEARNING

Besides the issue of complexity, another reason that it is difficult to achieve pedagogically sound training in simulation based training systems is that they, either implicitly or explicitly, are focused on supporting advanced learning. Feltovich, Spiro, and Coulson (1993) define advanced learning as "acquiring and retaining a network of concepts and principles about some domain that accurately represents key phenomena and their interrelationships, and that can be engaged flexibly when pertinent to accomplish diverse, sometimes novel objectives" (p. 181). This is a good characterization of the decision making and performance required of combined arms teams. For combined arms, effective coordination of multiple systems and units in a dynamic tactical environment to achieve intended goals requires vast amounts of knowledge (for example, systems, organizational, environmental, and enemy knowledge) and a great deal of experience and skill. Further, the challenge associated with conducting this type of warfare is growing as new weapons systems and technologies continue to be introduced and as the enemy becomes increasingly resourceful and skilled at developing plans of aggression that are not easily predicted. Feltovich et al. (1993) argue that advanced learning is not facilitated very well in any training or education setting. For example, in most educational settings, topics are treated in isolation, whereas in reality they are interdependent; and easily understood examples are provided to illustrate concepts, whereas in the real world, examples are much less clear-cut. In military communities, an additional complication is that operational tempo is so high that it is difficult to get enough time to spend on the higher level curriculum segments. This is a problem with the building-block approach to training. One consequence is that trainees are often unprepared to take advantage of the few advanced skills training opportunities that exist. For example, predeployment training in the military offers critical opportunities for team members to learn to work with other units and warfighting specialties prior to deploying. However, in these settings, instructors find they have to focus on prerequisite skills rather than on the integration skills that the predeployment opportunities are designed to teach (Rasmussen, 1996). Thus, the trainees
have less opportunity to acquire advanced knowledge, and instructors have less opportunity to develop training techniques focused on advanced integration principles. In the remainder of the chapter we highlight the use of advanced learning strategies for training combined arms teams based on the principles identified by Feltovich et al. (1993) and within the context of the scenario based training framework. The strategies are summarized in Table 13.2.
Training Objectives Master Set

Mission essential task lists provide the core and indisputably important training objectives for military training. But in support of advanced learning and measurement, other skill decompositions are also necessary to identify the complex, higher order skills needed to support complex performance. Examples of these types of decompositions include the U.S. Air Force's mission essential competencies, which define tactical team competencies at various levels of abstraction, from upper military echelons to specific aircrew skills (for example, Colegrove & Alliger, 2001). Fowlkes, Dwyer, Oser, and Salas (1998) used mission-oriented constraints to help decompose complex skills for aviation teams. From the cognitive task analysis for Fire Support Teams, the authors identified the skills and knowledge needed to support effective performance. Figure 13.2 summarizes the results, as well as the dynamic manner in which we would expect the skills and knowledge to be combined to support effective performance. In Figure 13.2, the skill categories situation assessment, planning, and plan execution represent key functions performed by Fire Support Team members. Each of the functional areas was associated with exemplars of the specific behaviors performed to implement the functions. The functions are usually performed in parallel, so that situation assessment, for example, is performed on an ongoing basis. Fire support planning is also performed throughout a battle, as are plan execution behaviors. These represent the top-tiered functions in Figure 13.2. Information exchange, adaptability, and team coordination are also skills required of Fire Support Team members. In our view, each of these supports the upper tiered functions, hence the arrangement of the pyramid. Information exchange, for example, is critical to situation assessment, planning, and plan execution. In the same way, adaptability enhances the upper level functions, as does having an effectively coordinated team. Finally, as illustrated in Figure 13.2, performance of all the skills is facilitated by knowledge associated with expertise. Examples of the types of knowledge identified from the cognitive task analysis as facilitating performance include knowledge of scheme of maneuver, resource capabilities, enemy asset capabilities, enemy tactics, team members, limitations affecting operations, and rules of engagement. Figure 13.2 is also meant to illustrate, via the twisting nature of the pyramid, that a given situation requires some context-specific combination of skills and knowledge to form an effective response.
Table 13.2. Advanced Learning Strategies Summarized Using the SBT Framework

SBT component: Training objectives master set.
Strategy: • Use cognitive task analysis and related methods to identify complex skills and knowledge needed for effective performance. • Identify known trainee misunderstandings.

SBT component: Identify training objectives for training event.
Strategy: • Address experience level of trainees. • Address known trainee misunderstandings.

SBT component: Design training events.
Strategy: • Insert events (for example, disequilibrium events) that directly challenge trainee misconceptions about the domain. • Provide event sets that can provide opportunities for comparison and contrast to build usable (versus inert) knowledge. • Consider building events that can be compared across scenarios or training events. • Provide situation variations to enhance the identification of regularities.

SBT component: Performance assessments.
Strategy: • Measures should address qualitative changes in knowledge representation in addition to incremental changes in knowledge accumulation. • Measures should help trainees and instructors compare situations, potentially both within and across scenarios, facilitate understanding, and carry forward important learning themes. • Measures should help alert instructors to important training and assessment opportunities.

SBT component: Performance diagnosis and debrief/after action review.
Strategy: Facilitate discussions that • enable trainees to assess the linkages between their situation recognition, responses, and the underlying skills and knowledge; • reveal trainee misconceptions with regard to the knowledge they possess or its application to realistic situations (that is, the events in the scenario); • encourage trainees to examine similar cases and link their training experiences to other knowledge they possess.
Event based techniques can be used to create known contexts for which instructors can anticipate the specific combination of skills and knowledge required for an effective response.
Identify Training Objectives for Training Event

The identification of a subset of training objectives is the first step in the preparation for a specific training event using the SBT model. Considerations for selecting training objectives include the level of experience of the trainees, the input of training officers, and the requirements obtained from team leaders (for example, the company commander; Stretton & Johnston, 1997).
Figure 13.2. Relationship of Skills and Knowledge Associated with Expertise Based on Findings from the Cognitive Task Analysis in the Combined Arms Domain
Another way to select training objectives to support advanced learning is to target known biases and misunderstandings that exist among novices. Feltovich et al. (1993) argue that a simplification bias exists in learners as they acquire information about a domain, which may result in faulty mental models. The misunderstandings are often difficult to identify: neither trainees nor instructors may be aware that they exist, and the biases may persist even though information to the contrary is presented. A role of cognitive task analysis is to reveal patterns of misunderstandings among trainees and to use this information to help design events that can be used to directly challenge the misunderstandings during scenario based training. For the combined arms cognitive task analysis, a number
of these examples were identified. For example, synchronizing combined arms is clearly one of the most difficult tasks faced by inexperienced Fire Support Teams. One of the experts interviewed for the cognitive task analysis noted that an inexperienced Fire Support Team leader would be inclined to "bring everything on," referring to artillery, mortars, and close air support, and then would have difficulty with coordination and deconfliction. A scenario could be designed to directly address this novice tendency. Another area of misunderstanding related to seeing the "big picture" in terms of the scheme of maneuver and how fire support can be used to support it. One of the experts interviewed said, "If they [trainees] don't understand the concept of how you were moving, and why you were moving, they can't possibly understand what you want as far as fire support."

Develop or Identify Scenarios and Events

Scenario events can be used to directly challenge trainee misunderstandings of combined arms operations, as suggested above, and more generally to train the skills and knowledge that have been linked to effective performance. For example, Fire Support Team activities to be trained and assessed can be identified within the following mission phases: (a) before contact, (b) after contact and during preparation of the package, (c) during package execution, and (d) during attack continuation. In addition to nominal battlefield events, events can also serve as prompts to observe behaviors that occur infrequently or that might not otherwise be observable (Fowlkes et al., 1999). An example is a pilot pressuring a forward air controller (part of the Fire Support Team) to run an attack. Instructors would look to the forward air controller to utilize knowledge of the aircraft's time on station, other resources available, and ways the aircraft might have to be supported (for example, the controller might have to talk the aircraft to the target) in forming a response to pressure from the pilot. SBT can also be used to structure other training strategies for facilitating advanced learning.

Pretraining Interventions

Trainees come to a training exercise with knowledge, attitudes, and expectations for what the training will entail. These can dramatically affect training outcomes (Smith-Jentsch, Jentsch, Payne, & Salas, 1996). Our interviews with domain experts revealed that trainees do not fully understand how they will benefit from simulation based training. Thus, a simple intervention would be to familiarize trainees with the purpose of the training and what they are expected to learn. Moreover, if a building-block approach is used, trainees can be provided an overview of the knowledge and skills they will be acquiring across the scenario based training events, to avoid isolating the training topics and to provide a useful framework for future learning, which is the primary function of an advance organizer (AO; Mayer, 1989). SBT provides an excellent complement to AOs: because SBT permits control over the practice environment, the framework elicited via an AO is likely to remain relevant to the training exercise. Without this control, the give-and-take of team members may affect the
direction the scenario takes, potentially making the framework provided by the AO less relevant to the exercise.

Principles for Advanced Learning

Feltovich et al. (1993) suggest that experiential learning opportunities utilize numerous cases in instruction and emphasize relations among cases and between cases and concepts. The comparison and contrast notion figures prominently in many strategies for training complex skills. For example, Bransford, Franks, Vye, and Sherwood (1989) argue that we can tell trainees about important cues and even allow them practice in recognizing the cues, but still the trainees may not be able to apply their knowledge to new situations. This is known as inert knowledge. Bransford et al. (1989) suggest that perceptual learning may be enhanced by providing perceptual contrasts during experiential learning. These can be used to facilitate "noticing" as well as to enhance the underlying conceptual knowledge. Trainees can be provided with opportunities to assess how new situations are the same as or different from situations previously encountered. SBT approaches can help to implement these strategies by engineering situations to be compared. Scenarios can also be designed to directly challenge trainee misconceptions. That is, instructors and exercise controllers introduce novel events, unexpected events, and disequilibrium events (that is, events of a difficulty level with which trainees have not yet gained experience and which cause them to question and reevaluate their assumptions and strategies; Ross & Pierce, 2000). Klein and Baxter (2006) suggest that scenarios can be designed to reveal real world flaws, and then, through activities promoting sensemaking and subsequent practice, more accurate mental models can be instantiated. The domain experts we interviewed described how instructors step into ongoing scenarios to make similar interventions. For example, a what-iffing strategy (whereby an instructor or controller might opportunistically say, "OK, what if this happened. Now what would you do?") is sometimes used. It may be worth noting that a computer based training system would have difficulty generating the wealth of situations that experienced instructors can generate. In addition, instructors introduce events "on the fly" to challenge trainees, as one instructor described:

So the older guys have been there, done that, seen enough things that they can throw that in there. Therefore we're enhancing the learning curve to share that experience. We may not think of everything and all of a sudden, "hey, there's an opportunity to do this and we'll throw that into the mix." The more we can make them think, the more they have to coordinate, the more they learn. . . . we constantly throw these variations into our model, because . . . we find out when someone is good at it and then we try to ratchet it up so they are continuing to learn.
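The "ratcheting" this instructor describes can be read as a simple adaptive policy: inject the next event at a difficulty just above demonstrated performance. The sketch below is a toy illustration of that policy; the difficulty levels, scores, and event names are invented for the example, and, as noted above, no automated controller matches the variety of situations experienced instructors can generate.

```python
# Toy sketch of difficulty ratcheting for injected (for example, disequilibrium)
# events. Levels, event names, and the 1-3 scoring scale are hypothetical.
import random

EVENT_POOL = {
    1: ["routine call for fire"],
    2: ["aircraft off timeline", "degraded radio"],
    3: ["conflicting air and artillery missions", "unexpected ordnance load"],
}

def next_injection(recent_scores, rng=random):
    """Pick an event one level above average recent performance (scores 1-3)."""
    if not recent_scores:
        level = 1
    else:
        level = min(3, int(sum(recent_scores) / len(recent_scores)) + 1)
    return rng.choice(EVENT_POOL[level])

print(next_injection([2, 2, 3]))  # average ~2.3, so inject a level 3 event
```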
Feltovich et al. (1993) also argue that real world complexity should be represented in training experiences. Military education and training often focus on a "building-block" approach to training. A building-block approach is likely to be
relatively effective for the USMC and has been used to train thousands of soldiers. However, USMC training may be made more effective by complementing it with additional training strategies. A building-block approach is not necessarily optimal for all tasks or skills (for example, Klein & Pierce, 2001). Notably, it may not be ideal for building more advanced levels of expertise, such as adaptive expertise (for example, Holyoak, 1991; Kozlowski, 1998). Hence, other training strategies may be used in concert with the building-block approach to facilitate the acquisition of this critical form of expertise, which is quite relevant to the conduct of fire support in a dynamic tactical environment. Examples of such training strategies include those that expose trainees to novel situations that challenge normative skills (Kozlowski, 1998) and to unexpected situations and failure (Ross & Pierce, 2000). These training approaches differ from a building-block approach, in which skills are gradually developed, not challenged and potentially discarded in the face of completely new situations. Klein and Baxter (2006) argue that what is needed for the training of such complex skills as planning and decision making is cognitive transformation. That is, important learning includes not just the accumulation of facts, but also a transformation of the way causal connections are viewed, a form of insight learning. Evidence for the importance of complexity in perceptual learning was found in a study by Doane, Alderton, Sohn, and Pellegrino (1996). These researchers found that when training requires difficult discriminations between highly similar stimuli, discriminations of novel stimuli are both faster and more accurate than when training involves discriminations between less similar stimuli. They concluded that study participants acquired different discrimination strategies depending on whether training discriminations were difficult or easy. Specifically, the harder training on difficult discriminations resulted in the use of more precise and detail-oriented comparison strategies even when easily discriminated stimuli were later presented, whereas the training on easy discriminations resulted in the use of an imprecise or global comparison strategy. They also found the use of these strategies to persist for a long period following their acquisition. Doane et al. (1996) assert that their findings support the hypothesis that initial training difficulty significantly influences the development of strategic knowledge, a training philosophy that had been advocated by others involved in educational and training research (for example, Schmidt & Bjork, 1992). Similarly, research conducted by Gopher, Weil, and Bareket (1994) demonstrated that certain aspects of expertise acquired using a building-block approach may not transfer to the real and complex task performance environment. Gopher and his colleagues investigated emphasis-change training, whereby instructions and feedback are used to focus a trainee's attention on specific, different aspects of task performance across the training period. According to Gopher et al., this method teaches trainees alternate ways of coping with a high workload task. Furthermore, evidence obtained in their research suggests that the emphasis-change training technique may lead to the development of attention control strategies that generalize across task performance conditions.
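As a mechanical illustration of emphasis-change training, the sketch below rotates which aspect of a multicomponent task is weighted in feedback across training blocks. The task aspects, weights, and scoring scheme are assumptions made for the example; Gopher, Weil, and Bareket (1994) applied the method to a complex game based flight trainer, not to this toy computation.

```python
# Hypothetical sketch of emphasis-change training: feedback weights rotate so
# each block emphasizes a different aspect of the same whole task.
EMPHASIS_SCHEDULE = [
    {"maneuvering": 0.8, "weapons": 0.1, "resource_management": 0.1},
    {"maneuvering": 0.1, "weapons": 0.8, "resource_management": 0.1},
    {"maneuvering": 0.1, "weapons": 0.1, "resource_management": 0.8},
]

def emphasized_feedback(block, raw_scores):
    """Weight component scores so feedback highlights the emphasized aspect."""
    weights = EMPHASIS_SCHEDULE[block % len(EMPHASIS_SCHEDULE)]
    return sum(weights[k] * raw_scores[k] for k in weights)

scores = {"maneuvering": 0.6, "weapons": 0.9, "resource_management": 0.4}
for block in range(3):
    print(f"Block {block}: emphasized feedback score = "
          f"{emphasized_feedback(block, scores):.2f}")
```

The design point is that the whole task is practiced throughout; only the attentional emphasis changes, which is what is hypothesized to build generalizable attention control strategies.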
Develop Performance Measures That Capture Responses to Events

When implementing SBT, performance measurement tools are developed around the scenario events to provide links between measurement objectives and the diagnosis of performance. This reduces the load on instructors during a training event in that the judgment about the elements of acceptable performance has already been made, either by the instructor or by his or her peers. Johnston, Cannon-Bowers, and Smith-Jentsch (1995) described a variety of individual and team process measures designed to capture responses to events for navy shipboard teams. These included behavioral observation scales, assessment of latencies to events and errors, and ratings. Although these measures were used to assess the performance of navy shipboard teams, they can be adapted for use in a variety of settings. In general, indices of acceptable responses (for example, behaviors and latencies) to events can be developed a priori and incorporated into measurement tools or job aids. The foregoing suggests two additional emphases for measurement: measures should address qualitative changes in knowledge representation, not just incremental changes in knowledge accumulation (Klein & Baxter, 2006); and measures should help trainees and instructors compare situations, potentially both within and across scenarios, to facilitate understanding and carry forward important learning themes.

Performance Diagnosis and Feedback

Applying the SBT model for diagnosis and feedback, observations and performance assessments are provided to trainees as the events in the scenario are reviewed. Discussions can be facilitated by instructors or by the team members themselves using guided team self-correction (Smith-Jentsch et al., 1998). Smith-Jentsch et al. (1998) presented the TDT method, which can serve as a model for combined arms training focusing on team skills. To implement TDT, a model of team performance is introduced at the outset of training, providing an advance organizer. In response to events, trainee strengths and weaknesses with respect to the team skills or team model are noted by instructors, using handheld data collection tools in either computer based or paper-and-pencil format. After the event, instructors facilitate discussion. Specifically, instructors (a) establish a professional climate and (b) prompt team members to systematically address the team skills introduced at the outset of training. This approach can be used to build a coherent model among trainees of effective task performance. Similarly, for SBT in the combined arms domain, instructors can prompt trainees to examine how well they utilized the targeted knowledge and skills in response to events in the simulation, whether the events were preplanned, inserted "on the fly" by an instructor, or occurred by chance through the interactions of trainees with the simulated entities. This approach can be used by an instructor or even by other trainees using team self-correction. A procedure for this would be to query trainees regarding the cues they noticed and the considerations and information they used as the basis for their decisions and actions. A
structured discussion, resembling knowledge elicitation, can be used to reveal strengths and deficiencies in the trainees' knowledge and skills. The key is to encourage trainees to link key concepts (for example, scheme of maneuver and resource capabilities) to situational cues and to expand the experience base from which they can draw in future situations. For example, consider the following interaction between a trainee and an instructor in response to the trainee's decision to use close air support (CAS).

Instructor: "You decided to use fixed wing CAS. What did you notice about the situation that caused you to go that way?"
Fire Support Team Leader: "I keyed in on the ranges of the tanks to my mortar and artillery."
Instructor: "Why? What was important about the ranges?"
Fire Support Team Leader: "Mortars aren't going to do squat against them. Arty will suppress them, but won't affect the tanks."
To augment the discussions, instructors should be ready to prompt trainees to consider similar situations or cases from the exercise and to relate their training experiences to other experiences they have had. Creating these comparisons and linkages is a method that can be used to facilitate advanced learning and to develop the flexible knowledge structures characteristic of dynamic expertise (Feltovich et al., 1993; Kozlowski, 1998). Teams naturally self-correct during periods of low workload in task performance, a tendency that can be taught, encouraged, and even trained and measured.

DISCUSSION

We have argued that SBT combined with advanced learning strategies can be used to enhance experiential learning, because together they mitigate some of the challenges to training within complex environments and address the training of the advanced, higher order skills that simulation based training environments bring within the training community's reach. For the most part, the strategies discussed are not expensive or difficult to implement, and they have a strong research base. In addition, they are among the types of strategies that have been shown to produce improved retention and transfer (Schmidt & Bjork, 1992). However, there may be disadvantages as well. Although the combined approach is likely to result in improved transfer and retention, it might also suppress acquisition performance compared to using a building-block approach or simpler scenarios. As Schmidt and Bjork (1992) describe, many of the strategies for improving transfer produce difficulties for learners during acquisition. In addition, the strategies (for example, the feedback strategies) might take more time than alternative methods. Such factors may affect trainee and instructor perceptions of the training, which are weighted heavily in training evaluation. The disadvantages seem worth confronting, though. There are few guidelines on how to structure simulation based training environments, and this chapter begins to provide such guidance. The strategies reviewed are among those that can be considered in the thoughtful design of
complex training environments and are likely to have a high payoff in the development of resilient and high performing teams.

REFERENCES

Bransford, J. D., Franks, J. J., Vye, N. J., & Sherwood, R. D. (1989). New approaches to instruction: Because wisdom can't be told. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 470–497). New York: Cambridge University Press.
Cannon-Bowers, J. A., Burns, J. J., Salas, E., & Pruitt, J. S. (1998). Advanced technology in scenario-based training. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 365–374). Washington, DC: American Psychological Association.
Colegrove, C. M., & Alliger, G. M. (2001). Mission essential competencies: Defining combat mission readiness in a novel way. Paper presented at the SAS-038 NATO Working Group Meeting, Brussels, Belgium.
Doane, S. M., Alderton, D. L., Sohn, Y. W., & Pellegrino, J. W. (1996). Acquisition and transfer of skilled performance: Are visual discrimination skills stimulus specific? Journal of Experimental Psychology: Human Perception and Performance, 22, 1218–1248.
Feltovich, P. J., Hoffman, R. R., Woods, D., & Roesler, A. (2004). Keeping it too simple: How the reductive tendency affects cognitive engineering. IEEE Intelligent Systems, 19(3), 90–94.
Feltovich, P. J., Spiro, R. J., & Coulson, R. L. (1993). Learning, teaching, and testing for complex conceptual understanding. In N. Frederiksen, R. J. Mislevy, & I. I. Bejar (Eds.), Test theory for a new generation of tests (pp. 181–217). Hillsdale, NJ: Lawrence Erlbaum.
Fowlkes, J. E., Dwyer, D. J., Milham, L. M., Burns, J. J., & Pierce, L. G. (1999). Team skills assessment: A test and evaluation component for emerging weapons systems. Proceedings of the 1999 Interservice/Industry Training, Simulation and Education Conference [CD-ROM]. Arlington, VA: National Training Systems Association.
Fowlkes, J. E., Dwyer, D. J., Oser, R. L., & Salas, E. (1998). Event-based approach to training (EBAT). The International Journal of Aviation Psychology, 8, 209–221.
Fowlkes, J. E., Lane, N. E., Dwyer, D. J., Willis, R. P., & Oser, R. (1995). Team performance measurement issues in DIS-based training environments. Proceedings of the 17th Interservice/Industry Training Systems and Education Conference (pp. 272–280). Arlington, VA: American Defense Preparedness Association.
Fowlkes, J., Owens, J., Hughes, C., Johnston, J. H., Stiso, M., Hafich, A., & Bracken, K. (2005). Constraint-directed performance measurement for large tactical teams. Proceedings of the Human Factors and Ergonomics Society 49th Annual Meeting (pp. 2125–2129). Santa Monica, CA: Human Factors and Ergonomics Society.
Gopher, D., Weil, M., & Bareket, T. (1994). Transfer of skill from a computer game trainer to flight. Human Factors, 36, 387–405.
Holyoak, K. J. (1991). Symbolic connectionism: Toward third-generation theories of expertise. In K. A. Ericsson & J. Smith (Eds.), Toward a general theory of expertise (pp. 301–336). Cambridge, England: Cambridge University Press.
Jentsch, F., Abbott, D., & Bowers, C. (1999). Do three easy tasks make one difficult one? Studying the perceived difficulty of simulation scenarios. Proceedings of the Tenth International Symposium on Aviation Psychology [CD-ROM]. Columbus: The Ohio State University.
Johnston, J. H., Cannon-Bowers, J. A., & Smith-Jentsch, K. A. (1995). Event-based performance measurement system for shipboard command teams. In Proceedings of the First International Symposium on Command and Control Research and Technology (pp. 274–276). Washington, DC: The Center for Advanced Command and Technology.
Klein, G., & Baxter, H. C. (2006). Cognitive transformation theory: Contrasting cognitive and behavioral learning. Proceedings of the Interservice/Industry Training Simulation and Education Conference [CD-ROM]. Arlington, VA: National Training Systems Association.
Klein, G., & Pierce, L. G. (2001). Adaptive teams. In Proceedings of the 6th ICCRTS collaboration in the information age track 4: C2 decision-making and cognitive analysis. Retrieved from http://www.dodccrp.org/6thICCRTS/
Kozlowski, S. W. J. (1998). Training and developing adaptive teams: Theory, principles, and research. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 115–153). Washington, DC: American Psychological Association.
Lane, N. E., & Alluisi, E. A. (1992). Fidelity and validity in distributed interactive simulation: Questions and answers (IDA Document No. 1066). Alexandria, VA: Institute for Defense Analysis.
Mayer, R. E. (1989). Models for understanding. Review of Educational Research, 59, 43–64.
Rasmussen, E. (1996). Fallon air wing training curriculum. Aimpoint, 12, 38–44.
Ross, K. G., & Pierce, L. G. (2000). Cognitive engineering of training for adaptive battlefield thinking. In IEA 14th Triennial Congress and HFES 44th Annual Meeting (Vol. 2, pp. 410–413). Santa Monica, CA: Human Factors and Ergonomics Society.
Salas, E., Fowlkes, J., Stout, R., Milanovich, D., & Prince, C. (1999). Does CRM training improve teamwork skills in the cockpit? Two evaluation studies. Human Factors, 41, 326–343.
Salas, E., Priest, H. A., Wilson, K. A., & Burke, C. S. (2006). Scenario-based training: Improving military mission performance and adaptability. In A. B. Adler, C. A. Castro, & T. W. Britt (Eds.), Military life: The psychology of serving in peace and combat: Vol. 2. Operational stress (pp. 32–53). Westport, CT: Praeger Security International.
Schaafstal, A. M., Johnston, J. H., & Oser, R. L. (2001). Training teams for emergency management. Computers in Human Behavior, 17, 615–626.
Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training. Psychological Science, 3, 207–217.
Smith-Jentsch, K. A., Jentsch, F. G., Payne, S. C., & Salas, E. (1996). Can pre-training experiences explain individual differences in learning? Journal of Applied Psychology, 81, 110–116.
Smith-Jentsch, K. A., Zeisig, R. L., Acton, B., & McPherson, J. A. (1998). Team dimensional training: A strategy for guided team self-correction. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 271–297). Washington, DC: American Psychological Association.
Stretton, M. L., & Johnston, J. H. (1997). Scenario-based training: An architecture for intelligent event selection. Proceedings of the 19th Interservice/Industry Training Simulation and Education Conference (pp. 108–117). Arlington, VA: National Training Systems Association.
Chapter 14
EXAMINING MEASURES OF TEAM COGNITION IN VIRTUAL TEAMS
C. Shawn Burke, Heather Lum, Shannon Scielzo, Kimberly Smith-Jentsch, and Eduardo Salas

The use of work teams in organizations is no longer a distinct competitive advantage held by only the most successful companies, but a common practice driven by the complexity of the work environment. As the use of work teams has increased, teams have taken many different forms in order to meet the needs of a dynamic environment. One form that has become prevalent is the virtual team, with 60 percent of professional employees reporting that they work in virtual teams (Kanawattanachai & Yoo, 2002). Virtual teams have most recently been defined as "teams whose members use technology to varying degrees in working across locational, temporal, and relational boundaries to accomplish an interdependent task" (Martins, Gilson, & Maynard, 2004, p. 808). Virtual teams have been argued to be a mechanism that can reduce the travel time and costs associated with bringing together distributed members working on a common task. While virtual teams offer organizations flexibility, they also create challenges. Difficulties have been identified in the areas of planning and coordination across time zones, cultural differences (Kayworth & Leidner, 2000), effective communication (Sproull & Kiesler, 1986), team monitoring, and backup behavior (Martins et al., 2004). Underlying many of these challenges are differences in the cognitive processes and states that emerge as a result of individuals enacting their respective roles. Although individuals assigned to virtual teams are often experts in their individual roles, the processes and states that emerge do not always serve to promote effective team performance. While progress has been made in understanding the knowledge structures and cognitive processes that promote effective coordination within conventional teams, this has been a relatively neglected area within virtual teams (Martins et al., 2004). Many of the challenges mentioned with regard to teamwork within virtual teams have at their root problems in building compatible knowledge structures that allow members to be anticipatory in their prediction of member needs in the face of degraded social cues. While examining team cognition is often not easy, much progress has been made in this area (for example, Cooke, Salas,
Kiekel, & Bell, 2004; Lewis, 2003). However, organizations often fail to leverage what is known. In light of the above, the purpose of the current chapter is to create a frame within which measures of team performance, specifically team cognition, can be assessed with regard to their applicability to virtual teams. In building the requisite framework, virtual teams and the role that team cognition occupies in their effectiveness will first be defined. Next, the basic components of team performance measurement systems will be described. Finally, performance measurement characteristics will be used to review how current measures of team cognition may apply within virtual teams, culminating in a set of guidelines.
WHAT ARE VIRTUAL TEAMS?

Since their inception, virtual teams have been defined in several ways. Driskell, Radtke, and Salas (2003) define virtual teams as those teams "whose members are mediated by time, distance, or technology" (p. 297). While there are variations across definitions (see Priest, Stagl, Klein, & Salas, 2005, for a review), a fair amount of consistency exists in how the boundaries between virtual and traditional teams have been described. Bell and Kozlowski (2002) argue for two primary characteristics that distinguish virtual and conventional teams: spatial distance and mode of communication. Contrary to conventional teams, virtual teams are not colocated, but are geographically and, often, temporally distributed. The second boundary condition, communication mode, refers to the fact that while conventional teams may augment face-to-face communication with other, more technologically enabled forms, communication within virtual teams must be technologically mediated. Thus, it is not the task itself that distinguishes virtual from conventional teams, but the manner in which tasks are accomplished given the configural properties of virtual teams. Researchers have recently begun to argue that virtual teams lie along a continuum and vary in their virtualness (see Bell & Kozlowski, 2002; Priest et al., 2005). Four properties have been identified that, when combined, result in virtual team types that vary in workflow patterns and task complexity: member roles, boundaries, lifecycle, and temporal distribution (Bell & Kozlowski, 2002). The first characteristic is the degree to which team members hold singular or multiple roles. As the number of roles team members hold increases, so does the potential for role conflict and ambiguity. The second distinguishing characteristic, at the team level, is the team's boundaries. Virtual teams can cross functional, organizational, and cultural boundaries or, like conventional teams, be bounded within a single organizational, cultural, or functional boundary (Bell & Kozlowski, 2002). As the degree to which the team crosses different boundaries increases, it becomes more difficult to establish and maintain team identity, cohesion, and leadership. The third characteristic that has been argued to distinguish between types of virtual teams is their lifecycle. Within virtual teams, members often rotate in and out, disrupting team development and engendering a shorter lifecycle than is typical in most conventional teams (Bell &
Kozlowski, 2002). Finally, the temporal distribution may range from operation in real time due to tightly coupled interdependencies to more sequential and asynchronous interaction for those teams that are more loosely coupled. Given this brief examination of the distinguishing features of virtual teams, the next logical question arises: What are the competencies needed to facilitate successful navigation of the virtual team terrain?
TEAM COGNITION WITHIN VIRTUAL TEAMS

While the research literature on virtual teams is relatively young, it is reasonable to expect that the core competencies (that is, teamwork and taskwork; see also Marks, Mathieu, & Zaccaro, 2001) identified within conventional teams are a necessary, but not sufficient, condition for success within virtual teams. While taskwork knowledge, skills, abilities, and other characteristics (KSAOs) provide the initial foundation for performance, teamwork KSAOs provide the mechanism by which members are able to coordinate to accomplish the task. It is often not taskwork that poses the greatest challenge for virtual teams, as members are often purposely selected based on their taskwork capabilities, but teamwork. A number of processes and states have been identified as necessary for virtual team effectiveness (Martins et al., 2004); however, the focus here will be on those that theoretically underlie a team's ability to implicitly coordinate its actions. This focus was chosen because the environment within which virtual teams operate often degrades the quality and number of the nonverbal cues that guide coordination within traditional teams; coordination within virtual teams is therefore often most similar to the notion of implicit coordination as seen within traditional teams. A focus on measures of team cognition and how they might apply within virtual teams is therefore warranted.
Team Cognition Defined

Team cognition has been defined as the interaction of internalized and externalized processes, which emerge from individual cognition, team interactions, and process behaviors (Fiore & Schooler, 2004). It has been characterized as a type of awareness used to bind a team's actions (Gutwin & Greenberg, 2004) and communication. Recently, another term, macrocognition, has begun to emerge to describe many of the cognitive processes and states that comprise team cognition. Macrocognition has been defined as "the internalized and externalized high level mental processes employed by teams to create new knowledge during complex, one of a kind, collaborative problem solving" (Letsky, Warner, Fiore, Rosen, & Salas, 2007, p. 7). Major processes have been argued to include the following: individual knowledge building, team knowledge building, developing shared problem conceptualizations, team consensus development, and outcome appraisal (Warner, Letsky, & Cowen, 2005).
Within the current chapter, we focus on the intersection of macrocognition and team cognition as traditionally defined. Components of team knowledge building and of developing shared problem conceptualizations will be the focus, as they move beyond individual knowledge building and thereby provide the foundation for coordination within virtual teams. Within team knowledge building, measurement developments related to transactive memory and shared mental models will be examined, while team situation awareness will be examined within developing shared problem conceptualizations.

Shared Mental Models

Shared mental models (SMMs) have been defined as organized knowledge structures that are held by more than one team member and involve the integration of information and the comprehension of a given phenomenon (Johnson-Laird, 1983; Cannon-Bowers, Salas, & Converse, 1993). These structures do not have to be shared in the truest sense, but instead represent compatible knowledge structures. It has been argued that compatible mental models facilitate the effective coordination of action, as well as promote a similar method for processing new information within the team (Klimoski & Mohammed, 1994). SMMs have been found to increase members' abilities to recognize causal relationships, make sound inferences, and create better explanations regarding the task and team member actions (see Mathieu, Heffner, Goodwin, Salas, & Cannon-Bowers, 2000).

Transactive Memory Systems

The construct of transactive memory is an expansion on most conceptualizations of the SMM in that it speaks to the storage of information, not the interrelationships among the stored items. Wegner (1986) argues that a transactive memory system (TMS) is composed of each individual's knowledge plus a collective awareness of with whom particular knowledge resides. Others argue that team members' metaknowledge, consensus/agreement, and accuracy are necessary components (Yoo & Kanawattanachai, 2001; Austin, 2003). The TMS is founded on the idea that, due to complexity, individuals know only part of what the team as a whole knows, and this team knowledge is distributed unequally among members (Moreland, Argote, & Krishnan, 1998). Initial research indicates that a TMS enhances a team's performance, particularly when the task is complex and requires a considerable contribution of knowledge from individual team members (Faraj & Sproull, 2000).

Team Situation Awareness

The importance of situational awareness in complex tasks has received extensive empirical support, both at the individual and team levels (for example, Endsley, 1995; Salas, Prince, Baker, & Shrestha, 1995). Situation awareness has been
defined as the "perception of the elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future" (Endsley, 1995, p. 36). Although often analyzed at an individual level, researchers have begun to conceptualize team situation awareness (TSA). Salas et al. (1995) suggest that TSA is more than the sum of the individual members' situation awareness; it includes team process behaviors as well.

Given the argued importance of team cognition, how do those charged with developing and maintaining virtual team effectiveness diagnose the aforementioned aspects of team cognition? Team performance measurement systems are a proven method that can guide instructors, practitioners, and team members themselves in that diagnosis.
PROPERTIES OF TEAM PERFORMANCE MEASUREMENT SYSTEMS

Team performance measurement is by no means new; however, it is often one of the most overlooked and misunderstood components of team development. The term is somewhat of a misnomer, as it refers not only to the measurement of team performance as an outcome, but also to the processes and states that comprise such performance. Team performance measurement systems serve multiple purposes within organizations, not the least of which is as a diagnostic aid. Quality team performance measurement systems are the only method by which teams can be systematically evaluated to catch problems early, before they become ingrained in members' thought processes and actions. The information gained from systematically designed and soundly implemented team performance measurement systems can also serve as a basis for the design, delivery, and choice of interventions to further team development. For a review of the basic properties of team performance measurement systems, see Figure 14.1.

Figure 14.1. Team Performance Measurement System Components

When examining team performance measures to assess fit for a particular purpose, it is important to realize that each measure can be delineated into its component parts. The content, the elicitation source, the elicitation method, and the indexing/aggregation method combine to create a single measure. It is not uncommon for these four components to be spoken of as one entity; however, they are distinct components that should each be taken into account when determining the best measure for a situation. The content of a measure refers to what the measure is actually attempting to capture (that is, knowledge, behavior, or attitudes). This can be further subdivided into the specific content of that knowledge, behavior, or attitude. For example, when talking about shared cognition, the content may be knowledge of the equipment, task, team member roles/responsibilities, expertise, or situation. The second component of a performance measure is the source from which the response is elicited (for example, subjective source—peer, participant/self-report, or observer; objective source—equipment). The third component, method, refers to the manner in which the information is extracted (for example, interviews, checklists, card sorts, Likert scales, vignettes, or think-alouds). Finally, when dealing with teams, the method of aggregation becomes an important component. This refers to the manner in which the information is scored and compiled (for example, percentage, mean, sum, distance, or correlation). These four components form the basis for understanding any diagnostic measure.

TEAM PERFORMANCE MEASUREMENT IN VIRTUAL TEAMS

While there are differences between conventional and virtual teams, the components of a diagnostic metric and the associated decisions to be made remain the same. What differs is not the manner in which metrics for virtual teams are designed, but the task and team characteristics that drive the information provided to the metrics. Next, the high level questions to be asked are identified, and prescriptive guidance is offered as to how metrics may look depending on the characteristics of virtual teams.

What to Measure?

The first question that must be answered within any diagnostic endeavor is what the metric should capture. The answer is driven by two factors: the construct
being diagnosed and the important components within that construct. As we have argued that team cognition is especially important due to its role in promoting coordination within virtual teams, we first briefly review the various ways in which content is conceptualized with regard to SMMs, TMSs, and TSA. Next, we specify how content is driven by the defining characteristics of virtual teams.

Shared Mental Models—Content

With regard to SMMs, the literature has primarily argued for knowledge structures to be compatible around four foci or content areas: the equipment, task, team (Rouse, Cannon-Bowers, & Salas, 1992), and team interaction (Cannon-Bowers et al., 1993). Metrics whose content is equipment knowledge focus on how the equipment with which the team interacts works. This knowledge allows team members to predict what the equipment is likely to do and when to make a response. Conversely, metrics that focus on diagnosing task knowledge structures query knowledge relating to the basic attributes of the task and how to accomplish it. Content within such metrics focuses on diagnosing knowledge and beliefs regarding task procedures, goals, strategies, and the interrelationships among this content. Compatibility in terms of task knowledge has been argued to allow members to describe why task performance is important and what situations may occur, to explain task procedures, and to predict the consequences of performance (Rouse et al., 1992). The last two types of content contained within metrics of SMMs are team and team interaction (see Cannon-Bowers et al., 1993). The team mental model (TMM) contains knowledge about team member characteristics, including their task knowledge, skills, abilities, and preferences. The team interaction model (TIM) contains information about the team in relation to the individual and collective requirements needed for effective team interaction. While the TMM allows members to form expectations and predict future performance, the TIM allows members to anticipate and sequence their collective actions. Compatibility between these last two types of content (that is, team and team interaction) has been argued to be most important for coordinated team action (see Cannon-Bowers et al., 1993). As these knowledge structures have been argued to be hierarchical in nature (Rentsch & Hall, 1994), metrics have tended to focus on the content indicative of knowledge structures higher in the hierarchy (for example, task, team, and team interaction).

Transactive Memory Systems—Content

TMSs are a relatively new development within the team literature. Existing metrics either directly (for example, Rau, 2006) or indirectly (for example, Lewis, 2003) diagnose TMS. For example, Austin (2003) created a questionnaire that directly assessed the team's knowledge content by measuring the team's collective knowledge, as well as specialization of, agreement about the location of, and accurate perceptions of who possesses said knowledge, in relation to a
specific topic. Other measures of TMS have included content that is indirectly related to this construct (that is, memory differentiation, task coordination, and task credibility; see, for example, Moreland et al., 1998).

Team Situation Awareness—Content

Situation awareness has been argued to comprise three levels: perceiving components in the environment, comprehending the situation, and predicting future scenarios (Endsley, 1995). Perceiving requires one to be familiar with environmental features and the changes that occur. The second level, comprehending, engages working memory to interpret the significance of complex environmental cues. The third level, predicting future actions, requires team members not only to perceive and comprehend the environment, but to apply a mental model of the surroundings. Building from this conceptualization, the content within TSA measures revolves around the perception of key elements within the environment, comprehension of their meaning, and projection of their status in the near future. Most measures focus more heavily on the perception of elements than on meaning and projection.

Application to Virtual Teams

While the preceding paragraphs have afforded a brief review of the content contained within measures of team cognition, the exact content that should be included depends on the task and team characteristics that drive performance. In this regard there are four primary characteristics that may impact the content to be assessed: variations in roles, lifecycle, temporal distribution, and interdependence. Beginning with differences in role configurations, Bell and Kozlowski (2002) argued that as teams move along the virtual continuum, the degree to which members occupy multiple roles is likely to vary. As the number of roles increases, there is a greater propensity for role ambiguity and corresponding role conflict. This role conflict is likely to spread to other team members. Specifically, it becomes increasingly likely that members will hold different conceptualizations of the true nature of particular member roles and responsibilities. This ambiguity often results in less compatibility among the knowledge structures that serve to guide coordination. Given the above, measures designed for virtual teams should, at a minimum, be focused on capturing the content within TMSs, as well as team models.

Guideline 1: Assess the degree to which members of virtual teams have singular or multiple roles within and across virtual teams.

Guideline 1a: Design measures to directly assess a collective awareness of who knows what and areas of expertise (that is, TMS); this becomes increasingly important as role complexity increases (a scoring sketch follows these guidelines).

Guideline 1b: Design measures to assess knowledge about team member characteristics and responsibilities (that is, TMM); this becomes increasingly important as role complexity increases.
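As a concrete illustration of Guideline 1a, the consensus/agreement and accuracy components of a TMS (Yoo & Kanawattanachai, 2001; Austin, 2003) can be scored from members' ratings of one another's expertise. The following Python sketch is a minimal, hypothetical illustration rather than Austin's (2003) actual scoring procedure; the member names, ratings, and test scores are invented, and a real instrument would span multiple knowledge domains.

```python
from statistics import mean

# Hypothetical data for a three-person team and a single knowledge topic.
# ratings[rater][target] is the rater's estimate (0-10) of the target's
# knowledge; truth[target] is the target's score on an objective test.
ratings = {
    "A": {"A": 9, "B": 3, "C": 5},
    "B": {"A": 8, "B": 4, "C": 6},
    "C": {"A": 7, "B": 2, "C": 7},
}
truth = {"A": 9, "B": 3, "C": 6}

def consensus(ratings):
    """Mean within-target spread of ratings; a lower spread indicates
    greater agreement about where knowledge resides."""
    members = list(ratings)
    spreads = [max(ratings[r][t] for r in members) -
               min(ratings[r][t] for r in members) for t in members]
    return mean(spreads)

def accuracy(ratings, truth):
    """Mean absolute error between metaknowledge and ground truth;
    a lower error indicates more accurate perceptions of expertise."""
    errors = [abs(ratings[r][t] - truth[t]) for r in ratings for t in truth]
    return mean(errors)

print(consensus(ratings), accuracy(ratings, truth))
```

Low spread combined with low error would suggest a well calibrated TMS; high spread flags disagreement about where knowledge resides, which, per Guideline 1a, becomes increasingly diagnostic as role complexity increases.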
A second characteristic that may differentiate virtual teams is the team's lifecycle. Within virtual teams there is a tendency toward rotating membership, resulting in a shorter lifecycle (Bell & Kozlowski, 2002). As a result, it becomes more difficult to maintain the compatible knowledge and affective structures that guide behavior. While mental models at the higher levels (that is, team and team interaction) have been shown to be primarily responsible for the seamless coordination and adaptation within effective teams, these mental models often take more time to fully develop than equipment or task models. Therefore, the mental models that play the heaviest role in implicit coordination and adaptation (skills that are often more challenging for virtual teams due to degraded social cues, loss of face-to-face contact, distribution, and added role complexity) may not be fully developed when lifecycles are short or membership is changing.

Guideline 2: Assess the degree to which virtual team members rotate in and out of the team (that is, a short versus a long lifecycle).

Guideline 2a: Design measures to include content related to knowledge of the team and team interaction and within-team compatibility of this knowledge; this information becomes more diagnostic within short lifecycles.

Guideline 2b: Target the entire content within TMS to better diagnose virtual teams with short lifecycles.

The distributed nature of many virtual teams also impacts the content that should be included within measures of team cognition. As members become further distributed in time and space, it becomes more difficult to maintain a common awareness of the environmental elements that impact team action. Virtual team members may see very different elements of the situation; consequently, "ground truth" regarding TSA must be determined. This argues for measures of content related to both individual and team level situation awareness. Content at the individual level will assist in diagnosing where a breakdown actually occurred (for example, perception, meaning assignment, or communication). Research has indicated that shared mental models allow team members to predict the needs of their teammates (Mathieu et al., 2000), oftentimes without explicit communication (Entin & Serfaty, 1999). This becomes important because, as distribution increases, explicit coordination becomes more difficult, creating an increased reliance on implicit coordination.

Guideline 3: Assess the degree of temporal and physical distribution within virtual teams.

Guideline 3a: Design content to capture the knowledge contained within team and team interaction models; these models become more difficult to maintain as distribution increases.

Guideline 3b: Design content to capture both individual and team level situation awareness; it will assist in diagnosing whether decrements reflect failures of perception and meaning at the individual level or of communication at the team level, each of which becomes more challenging as distribution increases.

Finally, virtual teams may vary in the level of required task interdependence. As task interdependency increases from pooled to team (see Saavedra, Earley,
& Van Dyne, 1993), there is a corresponding need to coordinate and synchronize member actions. Additionally, there is a tendency for tasks to become more complex as teams move up the task interdependence hierarchy. Teams operating under higher levels of interdependency have more freedom in how the task and member roles are structured.

Guideline 4: Assess the degree of task interdependence required within the virtual team prior to designing or choosing measures.

Guideline 4a: Design measures to include content related to team and team interaction knowledge for virtual teams that require moderate to high degrees of task interdependence.

Where to Collect Information?

Once the content of measures is decided upon, a second decision that often serves to categorize a metric is the source of the content. Within the larger team performance measurement literature, the predominant elicitation source is the individual being diagnosed. Although other sources are used (for example, supervisors, peers, and trained raters), self-report is the most common. While it may be argued that no one person has a better understanding of an individual's cognition than the individual in question, even experts may not have sufficient insight to successfully verbalize the information, and self-reports have repeatedly been criticized for biases.

Application to Virtual Teams

The determination of whom to elicit data from should be driven by the research question, as well as by an awareness of who has the opportunity to observe the behavior being diagnosed. Recently researchers have begun to argue that team cognition manifests itself in the behavioral actions that teams enact (Cooke et al., 2004). Consequently, information can be collected from sources in addition to the targeted individual. Within virtual teams it is especially important to use a multisource approach to knowledge elicitation, as distributed members often have very different perceptions serving as input to their knowledge structures. By gathering information from multiple personnel and from objective indices it becomes possible to assess where breakdowns in team cognition are occurring. For example, with regard to TSA, do breakdowns originate from individual members misperceiving their unique perspectives or from unsuccessful sharing of critical information?

Guideline 5: Collect information from a variety of sources, as the triangulation of information will provide a fuller picture of the state of team cognition, especially as distribution increases.

How to Elicit Information?

Elicitation methods span numerous dimensions: qualitative to quantitative, subjective to objective, and explicit to implicit. Looking across the three cognitive constructs discussed within the current chapter, elicitation methods are
primarily subjective and explicit. In particular, methods used to elicit SMMs include questionnaires, network scaling methods, concept mapping, causal mapping, card sorts, content analysis of communication, and observation. Questionnaires, concept mapping, and card sorts all tend to be explicit/intrusive, rely heavily on self-report data, and use paper-and-pencil instruments. Several programs have emerged that allow the electronic delivery of card sorts and concept mapping (see Hoeft et al., 2003). More indirect methods include network scaling, communication content analysis, and observation. While the former tends to use computer algorithms to score content relationships and network structures, the basis for the input is normally team members' relatedness ratings of concepts that tap equipment, task, team, or team interaction knowledge. Conversely, content analysis of communication and observations are normally conducted post hoc or in near real time via the use of trained observers.

As some have argued that TMSs are a subset of shared mental models, it is not surprising that the methods of extraction appear very similar, albeit slightly more limited. Specifically, the predominant methods used to assess TMS are questionnaires, observations, and communication analysis.

Finally, with regard to TSA, the most common forms of elicitation are query methods, followed by self/peer ratings, event based observation, and communication analysis. Query methods are programs that ask specific questions about the situation while a participant is performing a task, with methods varying in the query's obtrusiveness (see the Situation Awareness Global Assessment Technique [SAGAT], Endsley, 2000; Situation-Present Assessment Method [SPAM], Durso, Hackworth, Truitt, Crutchfield, & Nikolic, 1999). The SAGAT method freezes the display to ask the question, whereas SPAM presents queries during the task, allowing the user to obtain information from the task environment (Cooke, Stout, & Salas, 2001). Often using the same referent as query methods are rating scales in which perceived levels of TSA are assessed. For example, the Situational Awareness Rating Scale (SARS; Bell & Waag, 1995; Waag & Houck, 1994) and the Situational Awareness Rating Technique (SART; Taylor, 1989) have been used to obtain self-report data. The SARS has also been used to collect peer ratings. Less obtrusive methods include event based observation and post hoc communication analysis. Event based observation methods use trained raters to assess the presence of behavioral markers of TSA. This method has been used extensively within a variety of environments and specifies a priori defined events within which markers are created (see Dwyer, Fowlkes, Oser, & Lane, 1997). Finally, communication analysis may overcome limitations associated with query methods and event based ratings, which can pose challenges within field environments where there is much ambiguity. The use of communication analysis, whereby the content and pattern of a team's communication are analyzed, can provide an assessment of TSA.

Application to Virtual Teams

The heavy reliance on self-report and paper-and-pencil measures, which can be cumbersome in distributed virtual teams, argues for a need to move elicitation
methods beyond the sole use of traditional methods. Moreover, the distributed nature of virtual teams makes it difficult to conduct observations within the context of the entire team. Thus, it is recommended that, in addition to broadening our toolbox, the technology present in such teams be leveraged to translate traditional measures to electronic formats.

Guideline 6: Design measures to reduce the additional burdens placed on those assessing teams distributed across time and space.

Guideline 6a: Take advantage of the technology embedded within virtual teams to translate paper-and-pencil measures to electronic formats.

Guideline 6b: Use embedded measurement and design integration mechanisms within the system to further reduce the burden on assessors.

While there is a wide variety of methods, there is also a need to move beyond existing methods, many of which are subjective and rely on self-report. The technology present within virtual teams should be leveraged not only to reduce the obtrusiveness and cumbersome nature of existing measures, but also to incorporate techniques from other domains. One such nontraditional, emerging tool is psychophysiological measurement. The term "psychophysiological measurement" is a blanket expression for measures that examine changes in physiological data and how those changes may translate into differences in psychological states. This provides unique information that can be streamlined into simulation experimental designs (Cacioppo, Berntson, Sheridan, & McClintock, 2000). One downfall is that there is no direct link between psychological and physiological processes, so a certain level of inference must be used to analyze the data. The following psychophysiological tools may prove the most beneficial in team cognition assessment: eye tracking, electroencephalogram, and vocal characteristics. As the first two have the most direct connection to team cognition, they are briefly discussed here.

An eye tracker measures eye movements, pupil size, focus, and other characteristics of one or both eyes while a person is engaged in a task. An eye tracker can capture metrics that are important indicators of cognitive and social processes (Lum, Feldman, Sims, & Salas, 2007). This device provides information regarding an individual's gaze, allowing a researcher to identify where a person is looking at any given time as well as eye movement patterns across an entire task (Poole & Ball, 2006). Previous studies have measured information exchange by indexing when, and how often, a piece of information was passed from one team member to another, yet this may not indicate that the information was received and used by the member for whom it was intended. Employing an eye tracker, along with traditional measures, might more definitively determine whether the information was received and what was done with it. This would be especially useful in diagnosing virtual team performance, as information often gets lost due to the temporal and physical distribution inherent in the medium. Another potential application is the examination of individual and group differences in shared mental model generation in a virtual environment. For example, eye-tracking data could be collected while participants perform a simulation, and eye patterns while performing the simulation could be used (for example, some people may look more
at certain events or information) to predict how team members interact in the simulation.

The electroencephalogram (EEG) is another possible objective method, albeit an intrusive one. EEG measures the electrical activity of the brain using electrodes strategically placed on the scalp, which relay the small voltage fluctuations they record to a digital or analog acquisition device. EEG is a more proximal indicator of cognitive activity than some of the other psychophysiological metrics because it measures actual cortical activation (Davidson, Jackson, & Larson, 2000). Certain EEG metrics, such as power in specific frequency bands, are able to discriminate between general arousal and focused attention (Klimesch, 1999). This may have implications for diagnosing virtual teams with regard to TSA, as well as changes in arousal levels during stressful situations and related performance decrements.

While these methods have not been validated as measures of team cognition, comparing validated traditional measures with these newer methods would increase confidence. For example, the measurement of team processes and macrocognition may be aided by psychophysiological techniques that detect how often and exactly when teammates look at stimuli in their environment during a team task. Studies that have used team level indices employ measures of physiological compliance (that is, the similarity of physiological activity between team members) and show that compliance positively predicts team coordination efforts and team performance (Henning & Korbelak, 2005; Henning, Boucsein, & Gil, 2001). These alternative methods may not only pick up on some of the truly challenging aspects of virtual teams, but also provide a fuller picture of team cognition.

Guideline 7: Think outside the box when deciding on methods to employ; consider psychophysiological methods as a means of augmenting traditional methods of gathering information pertaining to team cognition.

Guideline 7a: Use eye tracking to obtain objective indices of the perception of environmental elements and, in turn, diagnose the components of TSA.

Guideline 7b: Pair psychophysiological measures with traditional measures to begin to establish convergent and/or divergent validity among methods of assessing team cognition (a compliance-scoring sketch follows).
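To make the compliance notion concrete, one simple way such similarity can be operationalized is as the correlation between two members' physiological time series. The sketch below uses hypothetical heart rate samples; published compliance indices (for example, Henning, Boucsein, & Gil, 2001) rest on more elaborate signal processing than a single correlation, so this is illustrative only.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical heart rate samples (beats/minute) for two team members
# recorded over the same task segment.
member1 = [72, 75, 80, 86, 84, 78, 74]
member2 = [70, 74, 79, 88, 85, 77, 73]

compliance = pearson(member1, member2)  # values near +1 = high compliance
print(f"physiological compliance: {compliance:.2f}")
```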
How to Index and Aggregate Elicited Information?

A final component of every measure of team cognition is the level (that is, individual or team) at which the construct of interest is captured. The nature of the specific construct of interest should drive the decision as to the level of measurement to target. Within measures of team cognition, it is most common to collect information at the individual level and then aggregate to the team level, typically through averaging. Recently there has been an increased focus on the manner in which team level constructs, when measured at the individual level, should be aggregated. Kozlowski and Klein (2000) have argued that there are two primary ways in which individual constructs may manifest themselves at the team level: compilation and composition. In the current
situation, composition describes a process whereby an individual construct (that is, situation awareness) emerges upward to the team level (that is, team situation awareness) but essentially remains the same. When this happens and within-unit agreement is demonstrated, aggregation to the team level can be represented by the mean or sum. Conversely, compilation is the process whereby similar but distinctly different lower level properties combine into a higher level (for example, team) property that is related to, but different from, its diverse lower level constituent parts (Kozlowski & Klein, 2000). Constructs that emerge through compilation do not represent shared properties across levels, but rather are qualitatively different (that is, the constructs are characterized by patterns). Thereby, constructs that emerge in this manner are best represented by the minimum or maximum, indices of variation, profile similarity, or multidimensional scaling, along with a number of other related techniques (Kozlowski & Klein, 2000, p. 34). The specific manner in which team cognition materializes, and how that materialization is operationalized, is contingent upon organizational context, work-flow interdependencies, and other situational factors (Kozlowski & Klein, 2000). Given this, some researchers have argued that capturing individual level cognition and aggregating it to the team level through a mean index may not always be the most appropriate approach. Cooke, Kiekel, Bell, and Salas (2002) argue that as role specialization increases within a team, it is no longer appropriate to use the mean as an aggregating index. Thereby, Cooke et al. (2002) have proposed a more holistic assessment of team cognition that results from "the interplay of the individual cognition of each team member and team process behaviors" (p. 85). Although this may be more difficult to develop, it may ultimately be a better way of determining certain aspects of team knowledge. Within this framework content is directly assessed at the team level.

Guideline 8: Recognize that there is not one correct manner in which to index measures of team cognition; it depends on task and team characteristics.

Guideline 8a: Use task interdependence and role structure to assist in guiding the choice of aggregation and indexing method (a brief aggregation sketch follows).
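The distinction can be made concrete with a minimal sketch: the same individual situation awareness scores aggregated under a composition assumption (the mean) versus compilation-style indices (the minimum and the dispersion). The scores are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical individual situation awareness scores (0-100) for one team.
sa_scores = [82, 79, 85, 47]

# Composition: the construct is essentially the same at the team level,
# so (given demonstrated within-team agreement) the mean or sum applies.
tsa_mean = mean(sa_scores)

# Compilation: the team-level property is a pattern, so extreme values
# and dispersion are more diagnostic than central tendency.
tsa_min = min(sa_scores)       # the weakest link
tsa_spread = stdev(sa_scores)  # within-team variability

print(tsa_mean, tsa_min, tsa_spread)
```

Note how the mean (73.25) masks the member scoring 47; the minimum and the dispersion index preserve exactly the pattern-based information that compilation-style aggregation is meant to capture, echoing the Cooke et al. (2002) caution about averaging under high role specialization.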
CONCLUDING COMMENTS

As organizations continue to invest in technology and location becomes less of an issue in selecting team members, the use of virtual teams will continue to rise. While many of the lessons that have been learned concerning effectiveness in conventional teams are expected to hold within virtual teams, virtual teams pose some unique challenges. Due to their distributed nature, breakdowns in team process, and in the knowledge structures that guide such process, may propagate over time without members being aware until the breakdowns have become rather large and ingrained. While systematic, frequently delivered diagnosis with corresponding feedback has been argued to be essential for teams to continually evolve and adapt, it is even more important in situations where errors can easily propagate, as is the case with virtual teams.
Within the current chapter we have identified several characteristics that distinguish virtual from conventional teams, as well as among varieties of virtual teams. Knowledge about performance measurement within conventional teams was then leveraged against these characteristics to identify a set of guidelines for metrics of team cognition within virtual teams. While we acknowledge that there is much to learn about virtual teams, we hope that what is offered within the current chapter will begin to foster thinking concerning measurement within such teams. Finally, we hope that the current chapter encourages those responsible for team performance measurement to remember that in the creation and implementation of diagnostic instruments, one must consider not only psychometric properties, but also the series of decisions related to content, elicitation source, elicitation method, and indexing/aggregation.

ACKNOWLEDGMENTS

The views expressed in this chapter are those of the authors and do not necessarily reflect official U.S. Navy policy. This work was supported in part by ONR MURI Grant No. N000140610446 (Dr. Michael Letsky, Program Manager).

REFERENCES

Austin, J. R. (2003). Transactive memory in organizational groups: The effects of content, consensus, specialization, and accuracy on group performance. Journal of Applied Psychology, 88, 866–878.

Bell, B. S., & Kozlowski, S. W. J. (2002). A typology of virtual teams: Implications for effective leadership. Group & Organization Management, 27(1), 14–49.

Bell, H. H., & Waag, W. L. (1995). Using observer ratings to assess situational awareness in tactical air environments. In D. J. Garland & M. R. Endsley (Eds.), Experimental analysis and measurement of situation awareness (pp. 93–99). Daytona Beach, FL: Embry-Riddle Aeronautical University Press.

Cacioppo, J. T., Berntson, G. G., Sheridan, J. F., & McClintock, M. K. (2000). Multilevel integrative analyses of human behavior: Social neuroscience and the complementing nature of social and biological approaches. Psychological Bulletin, 126, 829–843.

Cannon-Bowers, J. A., Salas, E., & Converse, S. A. (1993). Shared mental models in expert team decision making. In N. J. Castellan, Jr. (Ed.), Current issues in individual and group decision making (pp. 221–246). Hillsdale, NJ: Lawrence Erlbaum.

Cooke, N. J., Kiekel, P. A., Bell, B., & Salas, E. (2002, October). Addressing limitations of the measurement of team cognition. Proceedings of the 46th Annual Meeting of the Human Factors and Ergonomics Society (pp. 403–407). Santa Monica, CA: Human Factors and Ergonomics Society.

Cooke, N. J., Salas, E., Kiekel, P. A., & Bell, B. (2004). Advances in measuring team cognition. In E. Salas & S. M. Fiore (Eds.), Team cognition (pp. 83–106). Washington, DC: American Psychological Association.

Cooke, N. J., Stout, R. J., & Salas, E. (2001). A knowledge elicitation approach to the measurement of team situation awareness. In M. McNeese, E. Salas, & M. Endsley (Eds.), New trends in cooperative activities: Understanding system dynamics in
complex environments (pp. 114–139). Santa Monica, CA: Human Factors and Ergonomics Society.

Davidson, R. J., Jackson, D. C., & Larson, C. L. (2000). Human electroencephalography. In J. T. Cacioppo, L. G. Tassinary, & G. G. Berntson (Eds.), Handbook of psychophysiology (pp. 27–52). Cambridge: Cambridge University Press.

Driskell, J. E., Radtke, P. H., & Salas, E. (2003). Virtual teams: Effects of technological mediation on team performance. Group Dynamics: Theory, Research, and Practice, 7(4), 297–323.

Durso, F. T., Hackworth, C. A., Truitt, T. R., Crutchfield, J., & Nikolic, D. (1999). Situation awareness as a predictor of performance in en route air traffic controllers (Rep. No. DOT/FAA/AM-99/3). Washington, DC: Office of Aviation Medicine.

Dwyer, D. J., Fowlkes, J. E., Oser, R. L., & Lane, N. E. (1997). Team performance measurement in distributed environments: The TARGETs methodology. In M. T. Brannick, E. Salas, & C. Prince (Eds.), Team performance assessment and measurement: Theory, methods, and applications (pp. 137–153). Mahwah, NJ: Lawrence Erlbaum.

Endsley, M. R. (1995). Toward a theory of situation awareness in dynamic systems. Human Factors, 37, 32–64.

Endsley, M. R. (2000). Direct measurement of situation awareness: Validity and use of SAGAT. In M. R. Endsley & D. J. Garland (Eds.), Situation awareness analysis and measurement (pp. 147–174). Mahwah, NJ: Lawrence Erlbaum.

Entin, E. E., & Serfaty, D. (1999). Adaptive team coordination. Human Factors, 41(2), 312–325.

Faraj, S., & Sproull, L. (2000). Coordinating expertise in software development teams. Management Science, 46, 1544–1568.

Fiore, S. M., & Schooler, J. W. (2004). Process mapping and shared cognition: Teamwork and the development of shared problem models. In E. Salas & S. M. Fiore (Eds.), Team cognition: Understanding the factors that drive process and performance (pp. 133–152). Washington, DC: American Psychological Association.

Gutwin, C., & Greenberg, S. (2004). The importance of awareness for team cognition in distributed collaboration. In E. Salas & S. M. Fiore (Eds.), Team cognition: Understanding the factors that drive process and performance (pp. 177–201). Washington, DC: American Psychological Association.

Henning, R., & Korbelak, K. T. (2005). Social-psychological compliance as a predictor of future team performance. Psychologia, 45(2), 84–92.

Henning, R. A., Boucsein, W., & Gil, M. C. (2001). Social-physiological compliance as a determinant of team performance. International Journal of Psychophysiology, 40, 221–232.

Hoeft, R. M., Jentsch, F. G., Harper, M. E., Evans, A. W., Bowers, C. A., & Salas, E. (2003). TPL-KATS—concept map: A computerized knowledge assessment tool. Computers in Human Behavior, 19(6), 653–657.

Johnson-Laird, P. (1983). Mental models. Cambridge, MA: Harvard University Press.

Kanawattanachai, P., & Yoo, Y. (2002). Dynamic nature of trust in virtual teams. Journal of Strategic Information Systems, 11, 187–213.

Kayworth, T., & Leidner, D. (2000). The global virtual manager: A prescription for success. European Management Journal, 18(2), 183–194.

Klimesch, W. (1999). EEG alpha and theta oscillations reflect cognitive and memory performance: A review and analysis. Brain Research Reviews, 29, 169–195.
Klimoski, R., & Mohammed, S. (1994). Team mental model: Construct or metaphor? Journal of Management, 20, 403–437.

Kozlowski, S. W. J., & Klein, K. J. (2000). A multilevel approach to theory and research in organizations: Contextual, temporal, and emergent processes. In K. J. Klein & S. W. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations: Foundations, extensions, and new directions (pp. 3–90). San Francisco: Jossey-Bass.

Letsky, M., Warner, N., Fiore, S. M., Rosen, M. A., & Salas, E. (2007). Macrocognition in complex team problem solving. Paper presented at the 11th International Command and Control Research and Technology Symposium (ICCRTS), Cambridge, United Kingdom.

Lewis, K. (2003). Measuring transactive memory systems in the field: Scale development and validation. Journal of Applied Psychology, 88, 587–604.

Lum, H., Feldman, M., Sims, V., & Salas, E. (2007). Eye tracking as a viable means to study augmented team cognition. In D. D. Schmorrow, D. M. Nicholson, J. M. Drexler, & L. M. Reeves (Eds.), Foundations of Augmented Cognition (4th ed., pp. 190–196). Arlington, VA: Strategic Analysis, Inc.

Marks, M. A., Mathieu, J. E., & Zaccaro, S. J. (2001). A temporally based framework and taxonomy of team processes. Academy of Management Review, 26(3), 356–376.

Martins, L. L., Gilson, L. L., & Maynard, M. T. (2004). Virtual teams: What do we know and where do we go from here? Journal of Management, 30, 805–835.

Mathieu, J. E., Heffner, T. S., Goodwin, G. F., Salas, E., & Cannon-Bowers, J. A. (2000). The influence of shared mental models on team process and performance. Journal of Applied Psychology, 85, 273–283.

Moreland, R. L., Argote, L., & Krishnan, R. (1998). Training people to work in groups. In R. S. Tindale, L. Heath, J. Edwards, E. J. Posavac, F. B. Bryant, Y. Suarez-Balcazar, E. Henderson-King, & J. Myers (Eds.), Theory and research on small groups (pp. 37–60). New York: Plenum.

Poole, A., & Ball, L. J. (2006). Eye tracking in HCI and usability research. In C. Ghaoui (Ed.), Encyclopedia of human-computer interaction (pp. 211–219). Hershey, PA: Idea Group, Inc.

Priest, H. A., Stagl, K. C., Klein, C., & Salas, E. (2005). Creating context for distributed teams via virtual teamwork. In C. A. Bowers, E. Salas, & F. Jentsch (Eds.), Creating high tech teams (pp. 185–212). Washington, DC: American Psychological Association.

Rau, D. (2006). Top management team transactive memory, information gathering, and perceptual accuracy. Journal of Business Research, 59, 416–424.

Rentsch, J. R., & Hall, R. J. (1994). Members of great teams think alike: A model of team effectiveness and schema similarity among team members. Advances in Interdisciplinary Studies of Work Teams, 1, 223–261.

Rouse, W. B., Cannon-Bowers, J. A., & Salas, E. (1992). The role of mental models in team performance in complex systems. IEEE Transactions on Systems, Man, and Cybernetics, 22, 1296–1308.

Saavedra, R., Earley, P. C., & Van Dyne, L. (1993). Complex interdependence in task-performing groups. Journal of Applied Psychology, 78(1), 61–72.

Salas, E., Prince, C., Baker, D. P., & Shrestha, L. (1995). Situation awareness in team performance: Implications for measurement and training. Human Factors, 37(1), 123–136.

Sproull, L., & Kiesler, S. (1986). Reducing social context cues: Electronic mail in organizational communications. Management Science, 32(11), 1492–1512.
Taylor, R. M. (1989). Situational awareness rating technique (SART): The development of a tool for aircrew system design. In Situational Awareness in Aerospace Operations (AGARD-CP-478; pp. 3.1–3.17). Neuilly-sur-Seine, France: NATO-AGARD.

Waag, W. L., & Houck, M. R. (1994). Tools for assessing situational awareness in an operational fighter environment. Aviation, Space, and Environmental Medicine, 65(5), A13–A19.

Warner, N., Letsky, M., & Cowen, M. (2005). Cognitive model of team collaboration: Macro-cognitive focus. Proceedings of the Human Factors and Ergonomics Society 49th Annual Meeting (pp. 269–273). Santa Monica, CA: Human Factors and Ergonomics Society.

Wegner, D. M. (1986). Transactive memory: A contemporary analysis of the group mind. In B. Mullen & G. R. Goethals (Eds.), Theories of group behavior (pp. 185–208). New York: Springer-Verlag.

Yoo, Y., & Kanawattanachai, P. (2001). Developments of transactive memory systems and collective mind in virtual teams. The International Journal of Organizational Analysis, 9(2), 187–208.
Chapter 15
VIRTUAL ENVIRONMENT PERFORMANCE ASSESSMENT: ORGANIZATIONAL LEVEL CONSIDERATIONS

Robert D. Pritchard, Deborah DiazGranados, Sallie J. Weaver, Wendy L. Bedwell, and Melissa M. Harrell

Measuring performance in virtual environments (VEs) is a critical part of VE training design and application. The premise of our chapter is that while individual and team level measurement issues are important (covered in Fowlkes, Neville, Nayeem, and Eitelman, Volume 1, Section 3, Chapter 13; Burke, Lum, Scielzo, Smith-Jentsch, and Salas, Volume 1, Section 3, Chapter 14), there are also several organizational level issues that must be considered for optimal VE training performance measurement. This level of analysis issue is an important topic (Klein & Kozlowski, 2000) because individuals and teams should contribute to the broader organization. Without considering the broader organization, important aspects of measuring performance are missed. Organization level measurement is defined as measurement at levels beyond individuals and teams. It includes multiple team coordination (for example, teams of teams; see Marks, DeChurch, Mathieu, Panzer, & Alonso, 2005), collections of units, broader departments or divisions, and ultimately the entire organization.

In this chapter, we first discuss general topics about performance and performance measurement, highlighting problems especially relevant to the organizational level of analysis. We next present a series of specific organizational level issues to be considered in designing any performance measurement system, including VEs used for training. We conclude with some ideas for implementing these suggestions.

PERFORMANCE, PERFORMANCE MEASUREMENT, AND MANAGEMENT

The primary reason for measuring performance is to maximize performance. One approach is utilizing performance measures to provide feedback to trainees,
evaluate effects of an intervention, make decisions about resource allocations, or assess the contributions of a large department to the organization. As behavioral scientists, our focus is on the performance of people in specific situations. The goal is to encourage people to behave in a manner that produces outputs or results of maximal value to the organization. The organization's performance management system is designed to do exactly that: manage people so they generate results of maximal value (DeNisi, 2000). This occurs through three conceptually and operationally distinct organizational systems: the measurement, evaluation, and reward systems.

The measurement system is what the organization chooses to measure, containing descriptive information about how much of which results are being produced (that is, number of trainees trained, trainee criterion test scores, or mean cost per trainee). The measurement system is important because it defines what evaluators, such as supervisors or training directors, believe are important organizational results. The evaluation system takes measures from the measurement system and places them on a continuum from good to bad. Stating that 78 percent of trainees are meeting a criterion is a measurement; noting that this is unsatisfactory is an evaluation. The measurement system indicates how much was done; the evaluation system indicates how good that amount is. This continuum is ideally a translation of how much is done into how valuable it is for the organization. Finally, the reward system is the process or set of rules by which outcomes are tied to evaluations. If outcomes of value are tied to evaluations that accurately reflect level of output, greater work motivation can result (Latham, 2006; Pritchard & Ashwood, 2008).

PERFORMANCE MEASUREMENT: ORGANIZATIONAL LEVEL ISSUES

With this background in mind, we now turn to specific organization level issues to be considered in VE training performance measurement. The well-known Kirkpatrick (1998) framework for evaluating training focuses on four levels of evaluation: reactions, learning, behavior, and results. Our discussion of organizational level issues is not inconsistent with that framework; our arguments can apply to any of the four levels of evaluation.

Alignment with Organizational Objectives

The single most important organizational level factor for performance measurement is alignment of performance measures with the organization's overall strategic objectives. This means that what is measured in VE training, and how, should be consistent with what is truly of value to the organization, that is, with what will lead to meeting the organization's strategic objectives. While this point may seem obvious, it is more difficult than it appears.

One example comes from a military maintenance unit's development of a performance measurement system (Pritchard, Jones, Roth, Stuebing, & Ekeberg, 1989). In this example, a key measure was average time to complete repairs. This implied that taking less time to do repairs added more value to the overall
organization. However, after careful analysis, unit personnel realized that meeting repair item demand, not average repair time, was most important for the organization. If demand was low, it was better to do a more thorough repair, including preventative maintenance. If things were busy, it was better to do the minimum required to get the repaired item operational. This led to changing the average repair time measure to percentage of demand met. In a training context, training should focus not only on how to do timely repairs, but also on how to balance rapid repairs with preventative maintenance. Understanding this difference would help align learning from the training with organizational goals.

It is important to realize that the critical issues in performance measurement in VEs are very similar to those in any other training situation; VE systems are not "paradigmatically" different from other training systems (Caird, 1996). Training content should focus on the knowledge, skills, and attitudes critical to achieving organizational objectives (Caird, 1996). Therefore, just as in other types of training, alignment with organizational goals should drive training design.

Assessing Alignment

While scholars have argued the importance of aligning employee contributions with the organization's strategic objectives (for example, Boswell & Boudreau, 2001; Latham, 2006), there is no clear, objective method of assessing the alignment of performance measures with organizational goals. This assessment requires careful, logical analysis of each measure. One approach involves examining what would happen if the measure were maximized. Specifically, what would the person or team do to maximize the score on the measure, and how would that impact the broader organization? Consider the maintenance unit repair example. If the measure is average time to complete repairs, to maximize the measure, unit personnel should minimize the work done on each repair, avoid any extra maintenance, avoid working on repairs known to take longer to complete, and, when workload is low, relax until more items come in for repair. These work strategies would not be consistent with organizational goals.

Another approach to assessing alignment is to ensure upper management supports the use of the measure. In other words, are members of higher management committed to maximizing that measure, and do they agree that the measure will be used to evaluate the individual or unit? Such discussions help identify areas lacking alignment.

Measurement Characteristics for Alignment: Identifying and Communicating Value

This discussion suggests measures must be aligned, that is, accurately reflect organizational value, and this organizational value must be communicated accurately to managers, supervisors, and unit personnel/trainees. Identifying value means identifying the value of different levels of each measure. That is, how much value is being added when trainees' final score on a
performance test is 80 percent? How much value is created if 50 or 100 percent of a field unit is trained? To accurately communicate value to trainees and training managers, the value of different levels of performance on each measure must be known. This issue reflects the evaluation system: the system that places each level of a measure on a good-bad continuum by identifying how much value that level of performance has for the overall organization. This identification of value to the organization is critical to alignment. In fact, specifying this value actually operationalizes organizational value, allowing it to be communicated to personnel at all levels. This identified value must accurately describe value to the organization. If the evaluation system does not match what is of value to the organization, the employee's behavior will be consistent with the evaluation system, not with what is optimal for the organization (for example, DeNisi, 2000). For example, suppose what is measured and evaluated in VE training is the number of tasks completed. However, if the communication and backup behaviors by which the team accomplishes those tasks are actually most valuable, the evaluation system is not consistent with what is of value to the organization. We now turn to characteristics needed in performance measurement systems to produce alignment and the accurate identification and communication of value.

Measure All Important Aspects of Performance

For alignment to be present, the measures as a set must cover all important aspects of performance valuable to the organization. A typical shortcut is to use easy-to-collect measures and ignore important, but difficult to measure, aspects of performance (Borman, 1991). For example, it is often easier to measure quantity than quality, but both are valuable to organizations. A VE training situation might measure the performance of a physician treating a virtual cardiac arrest patient by whether the correct medications were given, at the correct time, and in the correct dosage, but not measure how well the physician coordinated the activities of other medical team members, because this is more difficult to measure. There is no objective methodology for assessing whether all aspects of value to the organization are included in the set of measures. This assessment requires careful, logical analysis by the people who do the work, immediate supervisors, management, and key customers. In VEs this is particularly challenging, as VE tasks are often complex. For example, tasks that can be trained utilizing VEs include nonroutine procedures, planning and coordination, decision making, and dealing with situations hazardous to trainee health (Caird, 1996). Regardless of difficulty, each aspect of performance should be measured in these situations to accurately reflect the value of VE training to the organization.

Include Descriptive and Evaluative Information

Any assessment of training performance will include descriptive measures (that is, scores on performance tests, time to reach criterion performance, or
percentage of failed trainees). These descriptive measures are part of the measurement system. They are result measures, indicating how much was done. To identify, assess, and communicate organizational value, descriptive measures should be translated into evaluative measures identifying how good the output is and its organizational value. One approach, in general terms, is to define what the broader organization considers poor, adequate, and excellent levels of results. Most training managers have a sense of this level of evaluation. However, it is feasible to be more precise, using quantitative measures to identify the value of each level of each measure. We will discuss techniques for doing this later in this chapter, but for now we want to make the point that quantification has important, practical advantages for performance assessment.

An Overall Index of Performance

Quantifying value allows for an overall index of performance. Performance in VE training of any complexity will be assessed with multiple, qualitatively different measures. For example, measures of (1) control of a virtual military aircraft in space, (2) correct identification of friends and foes, (3) use of offensive weapons, and (4) use of defensive weapons will produce a series of measures not easily combined. It is important that scores on separate components of performance be measured and used for a variety of purposes (that is, feedback and training). It is also valuable to have an overall index of performance (Pritchard, 1990; Salas, Rosen, Burke, Nicholson, & Howse, 2007). An overall index can be used for monitoring training performance over time, providing ongoing feedback, and evaluating overall training effectiveness (for example, the mean across trainees). Additionally, if developed correctly, it is an index of the overall value the training creates for the organization. Suppose levels of output, such as correctly identifying friends 80 percent of the time, are accurately placed on a scale of value to the organization, and this is done for each output measure. If these conditions were met, it would be easy to convert scores on each measure to their corresponding value scores and sum the value scores for an overall performance index. As a hypothetical example, if correctly identifying friends 80 percent of the time gets a value score of 10, correctly identifying foes 97 percent of the time receives a value score of 75, and having 12 hits with offensive weapons results in a value score of 35, the overall score is the sum of the three, 120 (a brief sketch following the Identifying Nonlinearities discussion illustrates this roll-up). We will discuss specific techniques for doing this later in the chapter.

Identifying Relative Importance

Not all measures of performance are equally important. In other words, different measures do not contribute equal value to the organization. A good assessment system will identify and incorporate this differential importance. In VE systems, the greatest importance might be placed on tasks that are not easily
trained via traditional methods, yet are of greatest value to the organization (Caird, 1996). One approach to differential importance is weighting measures by their relative importance. For example, each measure could be standardized and multiplied by an importance weight; the resulting products are summed to produce an overall evaluation. The standardization allows the measures to be added (now on a common scale), and the weight applies the differential importance. Using this approach, weights are determined by organizational personnel, who identify how much value is added to the organization.

Identifying Nonlinearities

There are limitations to this weighting approach. It assumes that the importance of each measure is always the same, no matter how good or poor performance is. Consider the example of VE training in the operation of an electrical power plant, where one measure is the time to effectively respond to system-generated warning alarms. It may be important that the response occur within 5 minutes, and critical that it occur within 15 minutes. If the response takes more than 15 minutes, damage has been done, so responses slower than 15 minutes are no worse than a response time of 15 minutes. If the response takes less than 5 minutes, this is no better than 5 minutes, because for the first 5 minutes of the warning nothing serious occurs. The point is that there is a nonlinear relationship between the amount (minutes before response) and value. In this example, value to the organization is equal for performance between 1 and 5 minutes and for performance of 15 minutes or longer, but it varies greatly between 5 and 15 minutes. A simple weighting approach will not capture this nonlinearity; it assumes a change of any given size in the measure will be equally valuable at all points along the scale. A number of scholars have argued for the importance of incorporating nonlinearities (Campbell & Campbell, 1988; Pritchard, 1992). Research on performance assessment systems by Pritchard and his colleagues has shown that the vast majority of performance measures have nonlinear relationships with value (Pritchard, Paquin, DeCuir, McCormick, & Bly, 2002). For optimal alignment with organizational values, the system must account for nonlinearities; a brief sketch follows.
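The alarm example can be expressed as a piecewise value function, which also shows how value scores roll up into the overall index described earlier. The sketch below is hypothetical: the anchor values (100 value points at 5 minutes or faster, 0 at 15 minutes or slower, linear in between) are invented stand-ins for values that would be elicited from organizational personnel.

```python
def alarm_response_value(minutes):
    """Piecewise-linear value curve for the alarm response example:
    flat (maximal) up to 5 minutes, declining from 5 to 15 minutes,
    flat (minimal) beyond 15 minutes."""
    if minutes <= 5:
        return 100.0
    if minutes >= 15:
        return 0.0
    return 100.0 * (15 - minutes) / 10  # interpolate between the anchors

# Summing value scores yields an overall index, mirroring the chapter's
# 10 + 75 + 35 = 120 illustration.
value_scores = {
    "identify friends 80% of the time": 10,
    "identify foes 97% of the time": 75,
    "12 hits with offensive weapons": 35,
}
overall_index = sum(value_scores.values())  # 120

print(alarm_response_value(4), alarm_response_value(10), overall_index)
```

A single importance weight applied to raw minutes could not reproduce the two flat regions of this curve; that is the nonlinearity argument in miniature.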
Agreement across Evaluators

There are multiple important evaluators for every person within an organization. These include supervisors, peers, subordinates, higher management, internal and external customers, and the employee himself or herself. Using our conceptualization, these diverse evaluators value results differently. Subordinates value different things than peers, one manager values different things than another manager, and the employee may value different things than the supervisor. In fact, role conflict is exactly this: important evaluators valuing different things from the individual. Role conflict has several negative effects (Jackson & Schuler, 1985; Tubre & Collins, 2000). A necessary condition for alignment is that different evaluators agree on what is important. Perfect agreement of all evaluators is unrealistic, but it is important that supervisors, higher management, and critical customers largely agree. If important evaluators knowingly or unknowingly place different value on the same levels of output, this sends a conflicting message, making optimal performance difficult. In VE training, subject matter experts (SMEs), trainers, the training supervisor, higher management, and important customers of the trainees must agree on (1) the measures used to assess performance and (2) the organizational value of different measures. In VEs, measures typically center on accuracy, task completion time, or both (Nash, Edwards, Thompson, & Barfield, 2000). Evaluators should agree on the relative importance of these measures. If different evaluators place different importance on training measures, when trainees return to the job they will most likely behave according to the importance communicated by their most influential evaluator, for example, the person completing their performance appraisals. Therefore, it is critical that managers, supervisors, and trainers agree on the importance of different training performance measures. The first test of agreement is whether there is consensus on the measures used to evaluate performance. All SMEs should agree that each measure is a good one and that the set as a whole covers all the important aspects of the work or training. While time consuming, this is not typically difficult for personnel to do once measures are identified. The second test is that SMEs agree on the organizational value of each level of performance for each measure.

Identifying Improvement Priorities

One characteristic needed for maximizing performance is the identification of improvement priorities. Performance is multidimensional; therefore, it is difficult to improve all aspects of performance at once, in training or in normal job performance (Brannick, Prince, & Salas, 1993; Smith-Jentsch, Zeisig, Acton, & McPherson, 1998). At any single point in time, trainees or job incumbents focus improvement efforts on a subset of the task. To do this effectively, the value of the subtasks to the organization must be known. Supervisors and incumbents or trainees need to know how much value is placed on different improvements in order to focus improvement efforts on areas of maximum value to the organization. If the value of different levels of performance on each measure has been identified, this information can be used to identify improvement priorities. Specifically, if the value of each level of performance has been accurately quantified, the value of improvement on each measure can be calculated and communicated to the incumbent or trainee.

Sensitivity to Changing Objectives or Mission

While it is important that measurement and evaluation systems be stable over reasonable periods of time, it is also important that they be sensitive to changes in objectives/missions. For example, in the simulation based training system GUARD FIST II, the trainee can be trained on (a) locating targets or (b) calling for and adjusting indirect fire support.
The objective of this training system can change to emphasize one or both tasks. If mission flexibility is important to the job, the VE needs to train for the changing mission and accurately assess performance in different missions. It is sometimes the case that measures change when missions change, and, therefore, new measures must be developed. Often, however, the measures do not change; it is the organizational value placed on them that changes. For example, in VE training of military flight crews, the measures may consist of aircraft control and navigation, identification of friend or foe, and use of offensive and defensive weapons. In most missions, all factors have some importance, but the importance varies from mission to mission. For one type of mission, effective use of offensive weapons may be essential, but for a reconnaissance mission it may be less important. Navigation is always important, but if the mission is to run a search grid to locate survivors in a small raft, it is especially critical. When missions change, the ability to quickly change the organizational value placed on the measures is necessary. If the set of possible missions can be identified in advance, different sets of quantitative indices of value can be developed in advance. If changing missions are unpredictable, a cost-effective way to identify value that can quickly be customized for unique missions is required.
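The develop-in-advance option is easy to sketch, assuming the weighting approach described earlier: each known mission gets its own set of importance values, and the assessment system swaps in the matching set when the mission changes. Mission names and weights below are invented, echoing the flight crew example above.

```python
# Per-mission importance values prepared in advance (invented numbers).
mission_weights = {
    "strike":            {"offensive_weapons": 3.0, "navigation": 1.0, "id_friend_foe": 2.0},
    "reconnaissance":    {"offensive_weapons": 0.5, "navigation": 2.0, "id_friend_foe": 2.0},
    "search_and_rescue": {"offensive_weapons": 0.1, "navigation": 3.0, "id_friend_foe": 1.0},
}

# The measures stay the same; only the value placed on them changes.
active_weights = mission_weights["search_and_rescue"]
print(active_weights["navigation"])  # navigation is especially critical here
```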
Other Measurement Characteristics

In the prior section we discussed organizational level measurement characteristics related to alignment with organizational value. We now turn to other measurement characteristics at the organizational level.

The Organizational Reward System

We discussed the issue of making the evaluation system consistent with what is actually of value to the broader organization. It is also important that the organization’s reward system match what is of value to the organization. For example, if the reward system leads to formal or informal rewards for speed but not for quality, a mismatch between value and reward occurs. In such situations, employee behavior is usually consistent with the evaluation and reward systems, not with what is of value to the organization (Johnston, Brignall, & Fitzgerald, 2002). Furthermore, the closer the reward system is tied to these evaluations, the more the person will behave consistently with the evaluation and reward systems. It is ironic that with a mismatch between evaluations and actual organizational value, the stronger the reward system, the worse things become for the organization. This is an especially salient point for VE training. Utilizing a VE is a costly endeavor (Caird, 1996). If the focus of VE training is not on behaviors consistent with valued organizational goals appropriately tied to the reward system, the training will be less effective.

Controllability

Measures used to evaluate training performance must be largely under the control of the trainees. We define controllability as the degree to which trainees or incumbents can control the level of their performance measures by varying the amount of effort allocated to the tasks that lead to those performance measures (Pritchard et al., 2008). As Pritchard and his colleagues noted, lack of controllability leads to a variety of negative outcomes and ultimately influences motivation and performance. Often, measures with good face validity are used, but people in the training program or job have limited control over their performance on the measure. For example, the rate of medication errors is a highly face valid measure of physician performance. However, in reality the physician may have a low level of control over this measure because the errors may be due to mistakes by pharmacists or nurses. To improve physician control, changes would need to occur at the organization level such that variance due to other people and processes in the medication system itself is removed, leaving only variance due to the physician’s own actions. This is especially relevant to VE situations. If trainees do not feel they can affect their levels of performance within a VE training situation, they will not fully engage in training. This has implications for VE training design as well as assessment. The target training behavior should be measured in a manner that reflects varying levels of effort from the trainee.

Buy-In

One notable problem with VE training in particular is buy-in. Trainees sometimes do not perceive measures as valid, due in part to negative transfer or hindered correct real world performance (Rose, Attree, Brooks, Parslow, & Penn, 2000). Rose and colleagues (2000) provide an example of teaching children to cross a street using a VE. While training may provide valuable skills, such as looking both ways and the timing involved in actually crossing a street with traffic, it is nearly impossible to include the possibility of injury from oncoming traffic. One way to improve buy-in is to include measures of all the important aspects of performance. In the street crossing example this suggests simulating what would happen to the child if hit by a vehicle and measuring this knowledge.
IMPLEMENTING ORGANIZATION LEVEL FACTORS

In the sections above, we identified a series of important organizational level considerations in designing performance measurement for VE training, and for measuring performance in general. The final section of this chapter discusses specific ways to deal with these issues. A number of the factors are fairly straightforward, and implementation issues are discussed. Assessing alignment of measures, measuring all important aspects of performance, matching evaluation and reward systems with what is of value to the organization, and ensuring controllability of measures can all be handled through logical analysis of the setting. Some specific tests are presented; however, there is no objective methodology. Each requires discussion and judgment by relevant people, especially those doing the work and SMEs.
All other factors have one thing in common: they require determining how much value to the organization is produced by each level of performance on each performance measure. We now turn to ways to operationalize this value.

Identifying and Communicating Value: Specific Techniques

The question remains of how to accurately attach value to different levels of performance on performance measures. There are multiple ways to do this. The simplest is to attach some number of points to each level of performance on each measure. For example, five errors might be worth 0 points, four errors 5 points, and three errors 10 points. If points are determined for each level of each measure, and these points reflect true value to the organization, a measure of value is generated. If we wanted an overall measure of performance, we could simply sum the points an individual or team earned for a given training trial or performance period. This is essentially how the Balanced Scorecard works (Kaplan & Norton, 1996). This approach has the advantage of simplicity, but it has disadvantages as well. What we really need is a value scale that is comparable across performance measures. We want a value score of 20 to represent the same value to the organization no matter which measure it comes from, so that 20 points on an error measure has the same value as 20 points on a speed measure. This means a single scale common to all performance measures is needed. Giving points to each level of performance separately, the way it is typically done with approaches like the Balanced Scorecard, does not ensure the value scale is comparable across measures. The second disadvantage is that this approach does not formally consider nonlinearities.

ProMES Contingencies

An approach that overcomes these limitations is that used by the first author and his associates in the productivity measurement and enhancement system (ProMES) intervention (Pritchard, 1990; Pritchard, Harrell, DiazGranados, & Guzman, 2008; Pritchard et al., 1989). ProMES is designed as an intervention to measure performance and to provide feedback used to improve performance through a continuous improvement model. Once the set of performance measures is developed in ProMES, the next step is to develop contingencies. These contingencies capture value to the organization. ProMES contingencies are derived from theory; they operationalize the results-to-evaluation contingencies in the Naylor, Pritchard, and Ilgen (1980) and Pritchard and Ashwood (2008) theories of motivation. A ProMES contingency is a type of graphic utility function that relates the amount of each performance measure to organizational value. Example contingencies from a hypothetical VE training of combat aircraft teams are shown in Figure 15.1. The contingency in the upper left is for a measure of aircraft control, the distance in miles the aircraft is from its intended location. The horizontal axis shows levels of the measure ranging from a low performance level of 10 miles away to a high performance level of 0 miles away from the intended location.
Figure 15.1. Example Contingencies
The vertical axis is called effectiveness, defined as the amount of contribution being made to the organization. The effectiveness scale ranges from −100, which is highly negative, through zero, which is meeting minimum expectations, to +100, which is highly positive. The function itself defines how each level of the measure is related to effectiveness. As depicted in Figure 15.1, a contingency is generated for each performance measure. Contingencies are developed using a discussion-to-consensus process. A design team is formed, composed of job incumbents, at least one level of supervision, and a facilitator.
In the case of VE training, the design team would be composed of subject matter experts, the training manager, a facilitator, and possibly some key customers. The basic idea is for a facilitator to break the development of contingencies into a series of steps that the design team can do. For example, the first step is to identify the maximum and minimum realistic levels for each measure. In the example aircraft control contingency, the design team decided that the minimum realistic value was 10 miles from the intended location; the maximum possible value was 0 miles. Next, the design team decides a minimum level of acceptable performance, defined as the point of just meeting minimum expectations. Members of the design team discuss this value until a consensus is reached. This point becomes the point of zero effectiveness in the contingency, 5 miles in the aircraft control example. The design team then continues through a set of steps that ultimately lead to the creation of the set of contingencies. More detail on how contingencies are developed can be found in Pritchard (1990) and Pritchard, van Tuijl et al. (2008).

Advantages of Contingencies

Contingencies have several important features that meet the criteria for value assessment identified above. The relative importance of each measure is captured by the overall range in effectiveness scores. Those measures with larger ranges, such as identifying friend or foe (see Figure 15.1), can contribute to or detract from the organization in greater amounts and are thus more important than those with smaller ranges. Contingencies also translate measurement into evaluation by identifying how each level of each measure contributes to effectiveness. For example, if the unit had a use of weapons score of 80 percent, this translates into an effectiveness score of +50, quite positive and well above minimum expectations. Contingencies also identify nonlinearities. Figure 15.1 shows several forms of nonlinearity. The control contingency indicates there is a point of diminishing returns where the slope decreases once the aircraft is within 2 miles, indicating that improvements in location accuracy beyond that point add less value. The identify (ID) friend or foe contingency is a critical mass type of contingency, where organizational value is very low until performance reaches a certain point (at least 96 percent) and then increases rapidly. Contingencies also help identify priorities for improvement. One can readily calculate the gain in effectiveness if the unit improved on each measure. For example, suppose the flight crew has a score of 3 miles from the intended location and an ID friend or foe score of 96 percent. The contingency indicates going from 3 miles from the intended location to 2 miles produces a gain of 10 effectiveness points; going from an ID score of 96 percent to a score of 97 percent shows a gain of 20 effectiveness points. Improvement in ID is worth twice as much to the organization as improvement in location. This can be done for each measure. Effectiveness gain scores are a quantification of how valuable each improvement is. Finally, contingencies rescale each measure to the common metric of effectiveness, so a single, overall effectiveness score can be formed by summing each indicator’s effectiveness scores.
Table 15.1. Organizational Level Performance Measurement Guidelines

1. Alignment with organizational objectives
Definition: What and how things are measured should be consistent with what is truly valued by the organization.
Key points for VE: • Identify value for different levels of each measure. • Ensure value accurately reflects true value to the organization. • Design VE training to focus on knowledge, skills, and attitudes (KSAs) that reflect the most value to the organization.

2. Measure all important aspects of performance
Definition: The set of measures covers all dimensions of performance that are of value to the organization.
Key points for VE: • VE training is often more complex (that is, situations therein are too hazardous to train in the “real world”). • Measures must capture all aspects of performance within complex tasks that are of value to the organization.

3. Descriptive and evaluative information
Definition: Result measures designed to indicate how much was done (descriptive) and how good it was (evaluative).
Key points for VE: • The goal of any training should be objective measures of performance that provide feedback to the trainee, especially in a VE situation. • Feedback regarding how much was accomplished and how good it was relative to organizational goals is important.

4. Overall index of performance
Definition: Summation of all quantified indices of performance.
Key points for VE: • Provides meaningful information on the effectiveness of VE training. • Provides meaningful information on the value of training to the organization.

5. Identify relative importance
Definition: Differentiating the contribution of each measure to the value of the organization.
Key points for VE: • Provides a manner in which to prioritize training in VE systems. • Greatest weight should be placed on tasks that are of highest value to the organization, but not easily trained utilizing training methods other than VE.

6. Agreement across evaluators
Definition: Shared mental models regarding the value of output.
Key points for VE: • Evaluators must agree on the most valued levels and utilize measures that accurately reflect VE training performance based on desired levels.

7. Identify improvement priorities
Definition: Determination of which aspect of performance is most critical for improvement at any given moment.
Key points for VE: • Utilizing the values placed on different levels of performance, it is easy to determine which area requires work. • VE systems can therefore be utilized more effectively if they can target the required KSAs.

8. Sensitive to changing objectives or missions
Definition: The measurement system should be stable (when considering the evaluation system), yet flexible to changes in objectives/mission.
Key points for VE: • Should be able to utilize the VE system for just-in-time training to maximize the value of training to the organization.

9. Organizational reward system tie
Definition: Measures should tie to the organizational reward system.
Key points for VE: • For effective transfer of VE training, evaluation of resulting behaviors should be directly tied to the organizational reward system.

10. Controllability
Definition: Degree to which trainees can control the level of performance measures by varying the amount of effort.
Key points for VE: • If trainees do not feel they can affect their levels of performance within a VE training situation, they will not engage fully in training. This has implications for VE training design as well as assessment.

11. Buy-in
Definition: Acceptance (or lack thereof) of the applicability of training to the real world.
Key points for VE: • All important aspects of work must be measured to ensure buy-in, even those aspects that are difficult to train or measure (for example, bodily injury that can result from children not paying attention when crossing the street is not easy to demonstrate in a VE training system).
For example, if the effectiveness score for the location indicator was +60, it would be added to effectiveness scores from other indicators. If there were 12 indicators, there would be 12 effectiveness scores summed to create the overall effectiveness score. This overall effectiveness score provides a single index of overall productivity. Because contingency development includes inputs from a variety of different subject matter experts, levels of supervision and management, and key customers, it maximizes agreement across evaluators. Finally, contingencies allow for changing objectives or missions. If the mission of the aircraft crew changed from search and destroy to recovery, the contingency for location control might become steeper and the expected minimum performance might change. Once a design team has created contingencies, it takes little time to make such changes when missions adjust.
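The mechanics of contingencies are easy to sketch in code. The following is a minimal illustration, assuming each contingency is stored as a set of (measure level, effectiveness) anchor points with straight-line interpolation between them. The anchor values are invented, chosen only to be consistent with the aircraft control and ID friend or foe examples discussed above (3 to 2 miles yields +10; 96 to 97 percent yields +20); they are not taken from any published ProMES study.

```python
def effectiveness(anchors, x):
    """Piecewise-linear effectiveness for measure level x, clamped at the ends."""
    anchors = sorted(anchors)
    if x <= anchors[0][0]:
        return anchors[0][1]
    if x >= anchors[-1][0]:
        return anchors[-1][1]
    for (x0, y0), (x1, y1) in zip(anchors, anchors[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Aircraft control: miles from intended location (0 is best), with diminishing
# returns inside 2 miles; zero effectiveness (minimum expectations) at 5 miles.
control = [(0, 80), (2, 60), (3, 50), (5, 0), (10, -100)]

# ID friend or foe: a critical mass shape, with little value below 96 percent.
id_friend_foe = [(90, -100), (96, 0), (97, 20), (100, 80)]

crew = {"control_miles": 3, "id_percent": 96}
e_control = effectiveness(control, crew["control_miles"])   # 50
e_id = effectiveness(id_friend_foe, crew["id_percent"])     # 0
overall = e_control + e_id  # common metric, so scores can simply be summed

# Improvement priorities: the effectiveness gain from a feasible improvement.
gain_control = effectiveness(control, 2) - e_control        # +10
gain_id = effectiveness(id_friend_foe, 97) - e_id           # +20, higher priority
print(overall, gain_control, gain_id)
```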
CONCLUSION

While it is essential to consider a variety of issues at the individual and team levels in developing measures of performance in VE training settings, there are also issues of importance at the organizational level (see Table 15.1 for a summary). The most difficult is the determination of organizational value for different levels of performance on each measure. If value can be accurately determined, a series of advantages can be realized. Utilizing these in VE training design will allow for more effective VE training.

REFERENCES

Borman, W. C. (1991). Job behavior, performance, and effectiveness. In M. D. Dunnette & L. Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2, pp. 271–326). Palo Alto, CA: Consulting Psychologists Press.
Boswell, W. R., & Boudreau, J. W. (2001). How leading companies create, measure, and achieve strategic results through “line of sight.” Management Decision, 39, 851–859.
Brannick, M. T., Prince, A., & Salas, E. (1993). Understanding team performance: A multimethod study. Human Performance, 6(4), 287–308.
Caird, J. K. (1996). Persistent issues in the application of virtual environment systems to training. Proceedings of the 3rd Symposium on Human Interaction with Complex Systems—HICS (pp. 124–132). Dayton, OH: IEEE.
Campbell, J. C., & Campbell, R. J. (1988). Industrial-organizational psychology and productivity: The goodness of fit. In J. C. Campbell & R. J. Campbell (Eds.), Productivity in organizations (pp. 82–94). San Francisco: Jossey-Bass.
DeNisi, A. S. (2000). Performance appraisal and performance management. In K. J. Klein & S. W. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations (pp. 121–156). San Francisco: Jossey-Bass.
Jackson, S. E., & Schuler, R. S. (1985). A meta-analysis and conceptual critique of research on role ambiguity and role conflict in work settings. Organizational Behavior and Human Decision Processes, 33, 1–21.
Johnston, R., Brignall, S., & Fitzgerald, L. (2002). ‘Good enough’ performance measurement: A trade-off between activity and action. Journal of the Operational Research Society, 53, 256–262.
Kaplan, R. S., & Norton, D. P. (1996). Translating strategy into action: The balanced scorecard. Boston: Harvard Business School Press.
Kirkpatrick, D. L. (1998). Evaluating training programs. San Francisco: Berrett-Koehler Publishers, Inc.
Klein, K. J., & Kozlowski, S. W. J. (Eds.). (2000). Multilevel theory, research, and methods in organizations. San Francisco: Jossey-Bass.
Latham, G. P. (2006). Work motivation: History, theory, research, and practice. Thousand Oaks, CA: Sage Publications.
Marks, M. A., DeChurch, L. A., Mathieu, J. E., Panzer, F. J., & Alonso, A. (2005). Teamwork in multi-team systems. Journal of Applied Psychology, 90, 964–971.
Nash, E. B., Edwards, G. W., Thompson, J. A., & Barfield, W. (2000). A review of presence and performance in virtual environments. International Journal of Human-Computer Interaction, 12(1), 1–41.
Naylor, J. C., Pritchard, R. D., & Ilgen, D. R. (1980). A theory of behavior in organizations. New York: Academic Press.
Pritchard, R. D. (1990). Measuring and improving organizational productivity: A practical guide. New York: Praeger.
Pritchard, R. D. (1992). Organizational productivity. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2, pp. 443–471). Palo Alto, CA: Consulting Psychologists Press.
Pritchard, R. D., & Ashwood, A. (2008). A manager’s guide to diagnosing and improving motivation. New York: Psychology Press.
Pritchard, R. D., Harrell, M. M., DiazGranados, D., & Guzman, M. J. (2008). The productivity measurement and enhancement system: A meta-analysis. Journal of Applied Psychology.
Pritchard, R. D., Jones, S. D., Roth, P. L., Stuebing, K. K., & Ekeberg, S. E. (1989). The evaluation of an integrated approach to measuring organizational productivity. Personnel Psychology, 42, 69–115.
Pritchard, R. D., Paquin, A. R., DeCuir, A. D., McCormick, M. J., & Bly, P. R. (2002). Measuring and improving organizational productivity: An overview of ProMES, the Productivity Measurement and Enhancement System. In R. D. Pritchard, H. Holling, F. Lammers, & B. D. Clark (Eds.), Improving organizational performance with the Productivity Measurement and Enhancement System: An international collaboration (pp. 3–50). Huntington, NY: Nova Science.
Pritchard, R. D., van Tuijl, H., Bedwell, W., Weaver, S., Fullick, J., & Wright, N. (2008). Maximizing controllability in performance measures. Manuscript submitted for publication.
Rose, F. D., Attree, E. A., Brooks, B. M., Parslow, D. M., & Penn, P. R. (2000). Training in virtual environments: Transfer to real world tasks and equivalence to real task training. Ergonomics, 43(4), 494–511.
Salas, E., Rosen, M. A., Burke, S. C., Nicholson, D., & Howse, W. R. (2007). Markers for enhancing team cognition in complex environments: The power of team performance diagnosis. Aviation, Space, and Environmental Medicine, 78(5), B77–B85.
Smith-Jentsch, K., Zeisig, R. L., Acton, B., & McPherson, J. A. (1998). Team dimensional training: A strategy for guided team self-correction. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 271–297). Washington, DC: American Psychological Association.
Tubre, T. C., & Collins, J. M. (2000). Jackson & Schuler (1985) revisited: A meta-analysis of the relationships between role ambiguity, role conflict, and job performance. Journal of Management, 26, 155–169.
Part VIII: Methods in Performance Assessment
Chapter 16
ASSESSMENT MODELS AND TOOLS FOR VIRTUAL ENVIRONMENT TRAINING

William L. Bewley, Gregory K. W. K. Chung, Girlie C. Delacruz, and Eva L. Baker

Although many consider the effectiveness of virtual environment (VE) training to be self-evident, the sad truth is that some training systems, including VE training systems, do not work, and some trainees do not learn, even from VE training systems. Assessments of learner performance can provide evidence for the effectiveness of training systems and for trainee learning, as well as information supporting training system improvement, guidance of instruction, trainee placement decisions, and certification of skill. It is also sadly true, however, that in some cases where assessments are used, they do not work. The problem is poor design. Assessments of learner performance must be designed to measure the entire range of knowledge and skills addressed by the training, and they must be validated as sources of evidence to support the interpretations and uses of assessment results. This chapter describes a model based approach to the design and validation of assessments of performance, with a focus on assessments in VE training. It begins with a discussion of validity, the essential requirement for any assessment, and then describes Baker’s (1997) model based approach to design and validation of performance assessments. The chapter concludes with a discussion of future directions in assessment for VE training and a summary and discussion.

INTRODUCTION

Virtual environment systems enable interaction with a simulated, often three-dimensional computer-generated environment. A typical VE includes representations of objects, people, paths, tools, and information sources and provides facilities for interaction with the environment through gestures and other body movements, speech, and such input devices as a glove, a mouse or joystick, or a keyboard. In VE training, the VE represents the real environment at some level of fidelity.
It supports training on complex tasks that traditionally have required hours, days, weeks, or even months of training and practice in the real environment to learn, with assistance and feedback from a human instructor or mentor. Examples of such tasks are firefighting, architectural design, surgery, equipment maintenance, ship handling, flying an airplane, and battle planning. This is not to say that VE training can replace training in the real environment supervised by a knowledgeable instructor—nobody would want a surgeon who had trained only in a VE—but a useful level of knowledge and skill can be developed cost-effectively and safely with VE training in preparation for training in the real environment. What these examples have in common is the complexity of the task in terms of environmental cues, trainee responses, and the interaction of trainee responses and the behavior of the environment over an extended period of time. They have all the characteristics of complex tasks as defined by Williamson, Bejar, and Mislevy (2006): multiple, nontrivial, domain-relevant steps and/or cognitive processes, high potential variability in task performance, and interdependent task features. In addition, performance involves recognition and use of complex situational cues and affordances represented in the environment and feedback providing new situational cues. In ship handling, for example, maneuvers such as underway replenishment and docking require the use of perceptual cues for ship location and speed relative to other objects in the environment, and environmental characteristics such as ship dynamics in response to trainee actions, weather and ocean conditions, and the predicted effects of commands to other humans during different task phases.

Why Use VE for Training?

VE training is usually faster, less costly, and less dangerous than training in the real environment—compare training in a space shuttle simulator to on-the-job training en route to the space station. The benefits of cost and risk avoidance are convincing, but there are also benefits to learning due to the ability to unobtrusively collect detailed data on the process used by the learner in performing the task, data providing assessment information that can be used to automatically score performance and diagnose learning problems. The VE can also be used to provide an experience not possible in the real environment that benefits learning. One example is practice with parts of a task that cannot be isolated in the real world, for example, repeated takeoffs and landings under controlled conditions in a flight simulator (Carretta & Dunlap, 1998). Another is the ability to see the environment from a viewpoint not possible without a VE, for example, an external view of the airplane to observe the effects of actions during a landing (Wickens & May, 1994).

Why Assess Performance in VE Training?

Although some enthusiasts consider the effectiveness of VE training to be self-evident, the sad truth is that some training systems, including VE training systems, do not work, and some trainees do not learn, even from VE training systems (Clark & Estes, 2002).
Performance assessments produce results that can be used to draw inferences about ability or competence, inferences that may be used for multiple purposes. They may be used to determine ability or competence at the beginning of instruction in support of placement decisions, to diagnose knowledge and skill gaps during training in order to guide instructional decisions, to predict future performance in other settings, to certify competence after training, to evaluate the training program as a whole, or to measure the impact of specific attributes of the training program (see Lampton, Bliss, & Morris, 2002).

Overview of the Chapter

This chapter describes a model based approach to the design and validation of assessments of performance in VE training. It begins with a discussion of validity and then describes a model based approach to the design and validation of assessments. The chapter concludes with an overview of future directions in assessment for VE training and a summary and discussion.

VALIDITY

Assessments of learner performance in VE training environments must be designed to measure the entire range of knowledge and skills, at the same level of complexity, addressed by the training, and they must be validated for the purposes and situations to which they are applied. Validation is the fundamental requirement in assessment development. Its importance cannot be overemphasized. Assessments can be developed with little difficulty. Developing valid assessments—assessments that have been demonstrated to provide evidence appropriate to the uses of their results—requires a rigorous methodology involving significant analysis and testing. Just as assessments are used for different purposes, the evidence to support the validation of assessments will differ depending on the intended use. The validity of an assessment is the key indicator of its technical quality and provides evidence for the appropriateness of the interpretations and uses of the assessment results. Validity is not a general quality of the assessment that applies to all uses for all time, nor is it based on a single procedure, such as always correlating an existing set of scores with those of another measure. Rather, the validity of an assessment depends on the context and inferences to be drawn. Validation should be thought of more as the job of an attorney making a legal case than as the calculation of a statistic. A validity argument must be developed that marshals a wide range of evidence to make the case (American Educational Research Association, American Psychological Association, and National Council for Measurement in Education, 1999). This is very different from early models of validity, in which specific validity types are considered. Some traditional validity types and the question each answers are listed below (see Messick, 1993, pp. 16–19, for a discussion of early conceptions of validity):
• Face validity: Does the test performance look like what is supposed to be measured? • Content validity: Is the performance measured related to content goals or domains? • Predictive validity: Do people with higher scores do better on a distal criterion measure? • Criterion validity: Does performance on the new measure relate in predictable ways with an existing measure of known quality?
While all these questions may be considered in making a validity argument, one no longer looks at a list of validity types and chooses one or two as most appropriate or, more likely, easiest to implement. The validity of assessment is often seen as separate from the instruction and instructional goals, something to do after the instructional materials are developed. In the interest of saving time and money, developers may decide to use an existing, readily available assessment without attempting to validate it for a specific purpose. This approach is indeed fast and easy, but it brings the significant risk that the measure selected will not be valid for the tasks, learners, situations, or purposes addressed by the training. One likely consequence is that the measure is not aligned with the goals of the instruction, which makes the assessment irrelevant to its planned use. Another is that the assessment is insensitive to the training experiences of interest—that is, people who receive instruction perform as well as people who do not. Because assessments are used for placement, guidance of instruction, certification, and training program improvement, their quality should be the last element of the training program to be compromised. The measures, their characteristics, and how they are to be used must be considered an integral part of the vision that motivates the training program design. The design of assessments and training should be an integrated activity rather than be approached separately.

ASSESSMENT DESIGN

Baker’s (1997) model based assessment design methodology helps ensure that assessments are valid. As shown in Figure 16.1, it begins with the specification of the type of learning outcomes to be measured—the cognitive demand—and then describes how cognitive demand influences the domain representation, the task representation, and the scoring model.

Cognitive Demand

The cognitive demands are the domain-independent and domain-dependent knowledge and cognitive skills that an assessment should target (American Educational Research Association et al., 1999; Baker, 1997; Baker & Mayer, 1999). They must be clearly defined at the beginning of the assessment design process. Only after designers have addressed what the assessment is intended to measure should decisions be made about details that too often are the only concern in assessment design: format (for example, multiple-choice or true-false questions), number of items, and the amount of testing time.
Figure 16.1. Baker’s model based assessment design methodology begins with specification of the cognitive demand, which influences the domain representation, the task representation, and the scoring model.
Specification of cognitive demand helps determine how the assessment task can elicit performance demonstrating the desired knowledge and skills. For example, if we are measuring factual knowledge, then an appropriate test format might be multiple-choice or true-false questions, but if we are measuring conceptual understanding, an essay or knowledge map may be appropriate (Baker, Freeman, & Clayton, 1991). But for tasks requiring VE training, the cognitive demand is typically complex, requiring high level cognitive processing involving activity that would be placed in the create and evaluate categories of Anderson and Krathwohl (2001), in Mayer’s (1992) strategic and schematic categories, or in van Merriënboer’s (1997) strategic knowledge supported by cognitive schemata; it may include the use of perceptual cues embedded in the environment, with complex interaction of trainee responses and the behavior of the environment, and the use of precise motor skills.

Domain Representation

A domain representation combines cognitive demand and a representation of domain content. It is an explicit description of the content, knowledge, skills, abilities, interests, and attitudes contained in the construct to be assessed (American Educational Research Association et al., 1999; Baker, 1997; Baker & Mayer, 1999; Baker & O’Neil, 1987). It should be explicit, precise, and externalized, and it should capture the essential elements of what is to be tested with respect to the target environment.
As pointed out by Williamson et al. (2006), the emphasis is on “essential elements.” Detailed models are not required if the purpose of the assessment is to support summative decisions. The granularity of the model depends on the targets of inference. A simple estimate of ability may be sufficient for a summative decision. A complex model would be required for cognitive diagnosis and prescription of remediation, such as might be used by an intelligent tutoring system. The domain representation is the basis for sampling assessment tasks and the referent against which to evaluate the relevance and representativeness of the tasks, and it reflects the universe that assessment performance represents. It can also support identification of student deficiencies, which could be used to guide remediation efforts, with domain-referenced comparisons providing diagnostic information on what individuals can and cannot do. For example, if a domain is composed of different subdomains (for example, threat-assessment and threat-sector definition skills belonging to the larger domain of air defense planning), then tasks can be sampled from each subdomain and scales developed for each. Student performance on the scales can be used to infer the degree to which students have mastered those skills. Finally, with an explicit domain model, the model itself can be tested and validated (Gitomer & Yamamoto, 1991). See Baker and O’Neil (1987) for an in-depth discussion of domain-referenced testing issues, and see Baker, Chung, and Delacruz (2008) for a discussion of approaches to developing domain representations, including ontologies, the rule-space method, and Bayes nets.
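A minimal sketch of the subdomain scales just described, assuming a domain representation that simply lists the tasks sampled from each subdomain. The subdomain names echo the air defense planning example; the scores are invented.

```python
# Domain representation: subdomains and the tasks sampled from each.
domain = {
    "threat_assessment": ["task_1", "task_2", "task_3"],
    "threat_sector_definition": ["task_4", "task_5"],
}

# Proportion-correct scores for one student on each sampled task (invented).
scores = {"task_1": 1.0, "task_2": 0.5, "task_3": 1.0, "task_4": 0.0, "task_5": 0.5}

# Scale score per subdomain: mean over that subdomain's sampled tasks.
scales = {sub: sum(scores[t] for t in tasks) / len(tasks)
          for sub, tasks in domain.items()}
print(scales)  # a domain-referenced comparison that can guide remediation
```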
Task Representation

The task representation specifies what the assessment task asks the examinee to do. For cognitive demand at the level of facts and concepts, the task could be based on selected-response test items (for example, multiple-choice questions) or constructed-response items (for example, short answer, essay question, or knowledge maps). For high level cognitive processing, it might be a simulation or VE based task. The assessment task is designed to elicit performance that will provide evidence of the examinee’s knowledge and skills. It is the testbed used to observe and gather evidence about performance and is the basis for drawing inferences about the examinee’s competence (American Educational Research Association et al., 1999; Baker, 2002; Baker et al., 1991; Baker & Herman, 1983; Messick, 1995). Task design is driven by the domain representation. If the tasks are not aligned with the domain representation, or the domain representation is not representative of the cognitive demand and domain content, then inferences based on students’ performance on the tasks will not be valid. And, of course, when examinees are tested on content they have not been exposed to, or not tested on content they have been exposed to, the assessment results will not accurately reflect achievement (Baker et al., 1991).
To maximize fidelity to the target task, assessments in VE training should be based on tasks identical to the training tasks, which should be designed to be as similar as possible to the task as performed in the real environment. One of the great advantages of an assessment embedded in a VE or a simulation is the ability to unobtrusively measure performance of the task as it is delivered in the training. This can provide valuable information on the process used by the examinee, a potentially valuable addition to measures focused on the outcome of the process such as a rating of overall success, the number of errors, and time to complete. In tasks performed by manipulating objects on a computer screen, this may be done by recording the clickstream—the responses of the examinee in performing the task, usually mouse clicks, with the associated location, time, and task context of the clicks as appropriate. In tasks performed by manipulating simulated controls in a VE, for example, turning a wheel, moving a lever, or pulling a trigger, the manipulations are recorded, again with the associated location, time, and task context. Additional measures—mostly unobtrusive—can also be collected to correlate with process measures, such as video of the trainee’s performance, audio of think-alouds, sensor based measures of gaze (using eye trackers), motor performance (using pressure and motion sensors attached to the device operated by the trainee), and psychophysiological measures of stress and attention, such as electroencephalography, electromyography, an electrocardiogram, electrodermal activity, blood pressure, respiration, or heart rate. Bewley, Lee, Munro, and Chung (2007) describe the use of clickstream data in assessments of air defense planning knowledge and skill. Plans are generated by selecting and moving plan elements, for example, ships, aircraft, and threat sectors. The time and location of each selection and placement are recorded, and a scoring algorithm reduces the data for interpretation by comparison to an expert’s judgment of appropriate planning behavior. Another approach to unobtrusively measuring task performance is the use of sensors. Sensor based measures are being used by Greg Chung and colleagues at CRESST (National Center for Research on Evaluation, Standards, and Student Testing) in assessments of rifle marksmanship performance on a virtual shooting range. The trainee fires a rifle instrumented with pressure sensors on the trigger to measure trigger squeeze, a motion sensor on the muzzle to measure steadiness, and an eye tracker to determine focus at the time of the shot. All measures are correlated with the location of the strike on a laser target. Measures being added in current research include an electroencephalograph, galvanic skin response, heart rate, and respiration.
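A minimal sketch of the kind of unobtrusive clickstream capture described above, assuming the training environment can call log_event() whenever the trainee manipulates an object; all field and object names are invented for illustration.

```python
import json, time

clickstream = []

def log_event(action, target, x, y, context):
    """Record one trainee response with its location, time, and task context."""
    clickstream.append({
        "t": time.time(),      # time of the response
        "action": action,      # e.g., "select", "place", "trigger_pull"
        "target": target,      # object manipulated, e.g., "threat_sector_2"
        "x": x, "y": y,        # screen or world coordinates
        "context": context,    # task phase in which the response occurred
    })

# Example: a trainee selects and places a threat sector during plan layout.
log_event("select", "threat_sector_2", 412, 310, "plan_layout")
log_event("place", "threat_sector_2", 520, 285, "plan_layout")

# The raw stream is later reduced by a scoring algorithm, for example by
# comparison against an expert's placements.
print(json.dumps(clickstream, indent=2))
```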
Scoring Model

A disadvantage of the ability to easily measure process performance in assessments for VE training is the ease of collecting too much data. The model based approach to assessment helps mitigate this risk by requiring that tasks and measures map to the domain representation, which maps back to the cognitive demand and content representation—the purpose and goals of the assessment—which helps ensure that data collected will be relevant to the interpretations and uses of the assessment.
But a scoring model must also be developed that translates observations of examinee performance into scores that can be used to draw inferences about knowledge or skills. A scoring model includes an information measurement scale, scoring criteria, performance descriptions of each criterion at each point on the scale, and sample responses that illustrate the various levels of performance (American Educational Research Association et al., 1999). Scoring issues for assessments in VE training are complex, as such assessments can generate a rich set of observations that are fine grained, interrelated, and process oriented (Baker & Mayer, 1999; Bennett, 1999; Chung & Baker, 2003b; Clauser, 2000; National Research Council, 2001). It is important to define how the observations are combined and how they are scored and scaled. Evidence needs to be collected on how the measures relate to other measures of the construct and how the measures discriminate between high and low performers. Three major approaches to automated scoring have been used: expert based methods, data-driven methods, and domain-modeling methods.

Expert Based Methods

There are two expert based methods: using expert performance and modeling expert judgment. In the first approach, actual expert performance is considered the gold standard against which to compare student performance (Baker, 1997; Baker et al., 1991), not what experts say should be competent performance or how experts rate student performance. This approach has been used to develop tasks for content understanding using essays (Baker et al., 1991) and knowledge maps (Herl, O’Neil, Chung, & Schacter, 1999). A related approach is to model experts’ ratings of examinees’ performance on various task variables. Here expert judgment, not actual expert performance, is considered the gold standard against which to compare student performance. This scoring approach has been used successfully to model expert and rater judgments in a variety of applications, including essays (Burstein, 2003), patient management skills (Margolis & Clauser, 2006), and air defense planning (Bewley et al., 2007). One of the major issues with expert based scoring is the selection of the expert (Bennett, 2006; Bennett & Bejar, 1998). Problems include experts’ biases, the influences of the experts’ content and world knowledge, linguistic competency, expectations of student competency, and instructional beliefs (Baker & O’Neil, 1996).
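A minimal sketch of the second expert based method, modeling expert judgment: fit a simple model that predicts an expert's rating from a task variable, then score new responses with the model. The data are invented, and a single-variable linear fit stands in for the richer models cited above (requires Python 3.10+ for statistics.linear_regression).

```python
import statistics

# Task variable (errors committed) and the expert's rating of each response.
errors = [0, 1, 2, 3, 5, 8]
expert_rating = [95, 90, 82, 75, 60, 40]   # invented ratings

slope, intercept = statistics.linear_regression(errors, expert_rating)

def modeled_rating(n_errors):
    """Score a new response the way the modeled expert would."""
    return intercept + slope * n_errors

print(round(modeled_rating(4), 1))  # predicted expert rating for 4 errors
```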
Data-Driven Techniques

In data-driven techniques, performance data are subjected to statistical or machine learning analyses (for example, an artificial neural network with hidden Markov models). Using artificial neural network and hidden Markov model technologies, Ron Stevens and colleagues have developed a method for identifying learner problem solving strategies and modeling learning trajectories, or sequences of performance states (Stevens, Soller, Cooper, & Sprang, 2004). Applying the method to chemistry, they were able to identify trajectories revealing learning problems, for example, not thoroughly exploring the problem space early, reaching a performance state that makes it unlikely to reach a more desirable end state, and reaching a state from which the learner could transition to a better or worse state with equal likelihood. With this information, it may be possible to perform a fine-grained diagnosis of what learners do not know and to use learning trajectories to guide the sequence of instruction and the type and form of remediation, and to do it on the fly. Validation of data-driven methods is complicated because there is no a priori expectation of what scores mean and no inherent meaning in the classification scheme. Interpretation is post hoc, which creates the potential for the introduction of bias in assignments to groups after the groups have been defined. A second problem is that machine learning techniques can be highly sample dependent, and the scoring process is driven by statistical rather than theoretical issues (Bennett, 2006). Because of these issues, validity evidence is particularly important when using data-driven techniques to score student responses.

Domain Modeling

This approach attempts to model the cognitive demands of the domain itself. The model specifies how knowledge and skills influence each other and the task variables on which observations are being made. The approach relies on a priori linking of student performance variables to hypothesized knowledge and skill states. Student knowledge and skills are then interpreted in light of the observed student performance. This approach has been used successfully in a variety of domains and modeling types, from canonical items (for example, Hively, Patterson, & Page, 1968), to Tatsuoka’s rule-space methodology (for example, Birenbaum, Kelly, & Tatsuoka, 1993), to the use of Bayes nets to model student understanding in such domains as dental hygiene skills, hydraulic troubleshooting, network troubleshooting, Web searching, circuit analyses, and rifle marksmanship (for example, Bennett, Jenkins, Persky, & Weiss, 2003; Chung, Delacruz, Dionne, & Bewley, 2003; Mislevy & Gitomer, 1995; Mislevy, Steinberg, Breyer, Almond, & Johnson, 2002; Williamson, Almond, Mislevy, & Levy, 2006). The most important issue in domain modeling is identifying the essential concepts and their interrelationships. This can be addressed through cognitive task analyses and direct observation of performance, but it is critical to gather validity evidence to validate the structure of and inferences drawn by the Bayes net. For examples of empirical validation techniques, see Chung, Delacruz, et al. (2003) and Williamson, Almond, and Mislevy (2000).
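A minimal sketch of the domain-modeling idea, reduced to a single binary skill node updated by Bayes' rule from one observed task outcome at a time. The probabilities are invented and the model is far simpler than the Bayes nets cited above, but it shows the a priori link from a hypothesized skill state to observable performance.

```python
p_skill = 0.5                    # prior probability of skill mastery
p_correct_given_skill = 0.9      # mastered trainees usually succeed
p_correct_given_no_skill = 0.2   # unmastered trainees occasionally succeed

def update(p, correct):
    """Posterior probability of mastery after observing one task outcome."""
    like_s = p_correct_given_skill if correct else 1 - p_correct_given_skill
    like_n = p_correct_given_no_skill if correct else 1 - p_correct_given_no_skill
    return like_s * p / (like_s * p + like_n * (1 - p))

# Observe the trainee succeed twice, then fail once.
for outcome in (True, True, False):
    p_skill = update(p_skill, outcome)
print(round(p_skill, 3))  # ~0.717: mastery still more likely than not
```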
FUTURE DIRECTIONS

The need for efficient and cost-effective development of quality assessments has motivated several efforts to create automated or partially automated supports for assessment design (Baker, 2002; Chung et al., 2008). Some have focused on very specific topics, such as algebra (Koedinger & Nathan, 2004), some involve systems of training (Mislevy & Riconscente, 2005), and others have focused on the development of cognition and content templates and objects. CRESST’s Assessment Design and Delivery System (ADDS) provides the capability to create assessments using assessment components, for example, new or preexisting prompts and information sources (Vendlinski, Niemi, & Wang, 2005). ADDS users have been found to focus more on measuring conceptual knowledge and to create more appropriate rubrics and coherent prompts that address critical ideas. Additional work should be done to develop assessment design tools.

A second important trend is the use of formative assessments embedded in training to diagnose knowledge gaps and guide instructional decisions. Complex modeling can be used during instruction, post-instruction, retention trials, and generalization and transfer measurement to understand and locate specific performance problems and to diagnose the causes as a combination of lack of knowledge, attention, motivation, or integration of content and skill. Using clickstream methods (Chung & Baker, 2003a; Stevens & Casillas, 2006), one can now pinpoint some of these areas. Because of the growing sophistication of computationally supported data collection, and the importance of formative information about the trainee’s process during learning, the future of outcome assessment will merge with process information to create learner profiles rather than scores or classifications. We anticipate that these will have domain-independent components that may predict a learner’s likely success in a range of other tasks. We also expect greater use of ontologies for domain representation and see the study of expertise continuing to add to our knowledge of performance measurement and its validity. Finally, we predict an increased use of artificial intelligence and advanced decision analysis techniques to support assessment. These include ontologies, Bayes nets, artificial neural networks, hidden Markov models, lag sequential analysis, and constraint networks.
SUMMARY AND DISCUSSION

This chapter has described a model based approach to assessment design and validation, with a focus on assessments for VE training environments. We have argued for the need to assess performance in VE training, and to do so using valid assessments. We have discussed the concept of validity and described Baker’s (1997) model based methodology for assessment design and validation. The methodology begins with defining cognitive demand, which combines with the content representation to influence definition of the domain representation. The domain representation influences the task representation. Finally, the scoring model specifies how observations of task performance are translated to scores that can be used to draw inferences about the knowledge, skills, abilities, attitudes, and other properties of the construct to be assessed—the cognitive demands.
The chapter’s central take-away message is the importance of validity as the fundamental requirement for any assessment and the key indicator of the assessment’s technical quality. To ensure training effectiveness, learner performance must be assessed, and the assessments must be valid. The validity message is linked to three supporting ideas:

1. Validity is not a general quality of an assessment. An assessment does not possess a general quality called validity. An assessment’s validity depends on the context of its use and the inferences to be drawn based on the results. A validity argument must be made using a wide range of evidence for the appropriateness of the inferences for the particular context.

2. Begin with a definition of cognitive demand. The first step in assessment design (and instructional design) is a definition of the cognitive demands of the task—the set of processes and performances required for success. This leads to designing methods of measuring these processes and learning outcomes, including designing tasks that will elicit the desired performance, defining performance measures, and operationalizing the scoring algorithm for measuring constructs such as “understanding” or “problem solving,” and then validating the approach with empirical evidence.

3. Designing a valid assessment cannot be separated from the design of instruction. Assessment design is not something to do after the training is developed. The assessment, including tasks, measures, and scoring, and how the results are to be used, must be included in the training program design. The design of assessments and training should be an integrated activity rather than approached separately.
Virtual environment training has great promise for training complex, high value tasks that have in the past required extended periods of training using expensive equipment and manpower, sometimes in environments placing trainees and instructors in harm’s way. Great promise and impressive technical capability are not sufficient to conclude effectiveness, however. To realize the promise, practitioners must assess the systems and the learning they help produce, and the assessments must be valid. The model based assessment methodology can help make this happen.

REFERENCES

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Anderson, L. W., & Krathwohl, D. R. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom’s taxonomy of educational objectives. New York: Addison Wesley Longman.

Baker, E. L. (1997). Model-based performance assessment. Theory Into Practice, 36(4), 247–254.
Baker, E. L. (2002). Design of automated authoring systems for tests. In National Research Council, Board on Testing and Assessment, Center for Education, Division of Behavioral and Social Sciences and Education (Eds.), Technology and assessment: Thinking ahead—Proceedings from a workshop (pp. 79–89). Washington, DC: National Academy Press.

Baker, E. L., Chung, G. K. W. K., & Delacruz, G. C. (2008). Design and validation of technology-based performance assessments. In J. M. Spector, M. D. Merrill, J. J. G. van Merriënboer, & M. P. Driscoll (Eds.), Handbook of research on educational communications and technology (pp. 595–604). New York: Erlbaum.

Baker, E. L., Freeman, M., & Clayton, S. (1991). Cognitive assessment of history for large-scale testing. In M. C. Wittrock & E. L. Baker (Eds.), Testing and cognition (pp. 131–153). Englewood Cliffs, NJ: Prentice-Hall.

Baker, E. L., & Herman, J. L. (1983). Task structure design: Beyond linkage. Journal of Educational Measurement, 20, 149–164.

Baker, E. L., & Mayer, R. E. (1999). Computer-based assessment of problem solving. Computers in Human Behavior, 15, 269–282.

Baker, E. L., & O’Neil, H. F., Jr. (1987). Assessing instructional outcomes. In R. M. Gagné (Ed.), Instructional technology (pp. 343–377). Hillsdale, NJ: Erlbaum.

Baker, E. L., & O’Neil, H. F., Jr. (1996). Performance assessment and equity. In M. B. Kane & R. Mitchell (Eds.), Implementing performance assessment: Promises, problems, and challenges (pp. 183–199). Mahwah, NJ: Erlbaum.

Bennett, R. E. (1999). Using new technology to improve assessment. Educational Measurement: Issues and Practice, 18(3), 5–12.

Bennett, R. E. (2006). Moving the field forward: Some thoughts on validity and automated scoring. In D. M. Williamson, I. I. Bejar, & R. J. Mislevy (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 403–412). Mahwah, NJ: Erlbaum.

Bennett, R. E., & Bejar, I. I. (1998). Validity and automated scoring: It’s not only the scoring. Educational Measurement: Issues and Practice, 17(4), 9–17.

Bennett, R. E., Jenkins, F., Persky, H., & Weiss, A. (2003). Assessing complex problem solving performances. Assessment in Education: Principles, Policy & Practice, 10, 347–359.

Bewley, W. L., Lee, J. J., Munro, A., & Chung, G. K. W. K. (2007, April). The use of formative assessments to guide instruction in a military training system. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.

Birenbaum, M., Kelly, A. E., & Tatsuoka, K. K. (1993). Diagnosing knowledge states in algebra using the rule-space model. Journal of Educational Measurement, 20, 221–230.

Burstein, J. (2003). The e-rater scoring engine: Automated essay scoring with natural language processing. In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 113–122). Mahwah, NJ: Erlbaum.

Carretta, T. R., & Dunlap, R. D. (1998). Transfer of effectiveness in flight training: 1986 to 1997 (Rep. No. AFRL-HE-AZ-TR-1998-0078). Mesa, AZ: U.S. Air Force Research Laboratory.

Chung, G. K. W. K., & Baker, E. L. (2003a). An exploratory study to examine the feasibility of measuring problem-solving processes using a click-through interface. Journal of Technology, Learning, and Assessment, 2(2). Available from http://jtla.org
Chung, G. K. W. K., & Baker, E. L. (2003b). Issues in the reliability and validity of automated scoring of constructed responses. In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 23–40). Mahwah, NJ: Erlbaum.

Chung, G. K. W. K., Baker, E. L., Delacruz, G. C., Bewley, W. L., Elmore, J., & Seely, B. (2008). A computational approach to authoring problem-solving assessments. In E. L. Baker, J. Dickieson, W. Wulfeck, & H. F. O’Neil (Eds.), Assessment of problem solving using simulations (pp. 289–307). Mahwah, NJ: Erlbaum.

Chung, G. K. W. K., Delacruz, G. C., Dionne, G. B., & Bewley, W. L. (2003). Linking assessment and instruction using ontologies. Proceedings of the I/ITSEC, 25, 1811–1822.

Clark, R. E., & Estes, F. (2002). Turning research into results: A guide to selecting the right performance solutions. Atlanta, GA: CEP Press.

Clauser, B. E. (2000). Recurrent issues and recent advances in scoring performance assessments. Applied Psychological Measurement, 24, 310–324.

Gitomer, D. H., & Yamamoto, K. (1991). Performance modeling that integrates latent trait and class theory. Journal of Educational Measurement, 28, 173–189.

Herl, H. E., O’Neil, H. F., Jr., Chung, G. K. W. K., & Schacter, J. (1999). Reliability and validity of a computer-based knowledge mapping system to measure content understanding. Computers in Human Behavior, 15, 315–334.

Hively, W., Patterson, H. L., & Page, S. H. (1968). A “universe defined” system of arithmetic achievement tests. Journal of Educational Measurement, 5, 275–290.

Koedinger, K. R., & Nathan, M. J. (2004). The real story behind story problems: Effects of representations on quantitative reasoning. Journal of the Learning Sciences, 13, 129–164.

Lampton, D. R., Bliss, J. P., & Morris, C. S. (2002). Human performance measurement in virtual environments. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 701–720). Mahwah, NJ: Erlbaum.

Margolis, M. J., & Clauser, B. E. (2006). A regression-based procedure for automated scoring of a complex medical performance assessment. In D. M. Williamson, I. I. Bejar, & R. J. Mislevy (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 123–167). Mahwah, NJ: Erlbaum.

Mayer, R. E. (1992). Thinking, problem solving, cognition (2nd ed.). New York: W. H. Freeman and Company.

Messick, S. (1993). Validity. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Phoenix, AZ: The Oryx Press.

Messick, S. (1995). Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues and Practice, 14(4), 5–8.

Mislevy, R., & Gitomer, D. H. (1995). The role of probability-based inference in an intelligent tutoring system. User Modeling and User-Adapted Interaction, 5, 253–282.

Mislevy, R., & Riconscente, M. (2005). Evidence-centered assessment design: Layers, structures, and terminology (PADI Tech. Rep. No. 9). Menlo Park, CA: SRI International.

Mislevy, R. J., Steinberg, L. S., Breyer, F. J., Almond, R. G., & Johnson, L. (2002). Making sense of data from complex assessments. Applied Measurement in Education, 15, 363–389.

National Research Council. (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academy Press.
Stevens, R., Soller, A., Cooper, M., & Sprang, M. (2004). Modeling the development of problem solving skills in chemistry with a web-based tutor. Proceedings of the 7th International Conference on Intelligent Tutoring Systems (pp. 580–591). Berlin: Springer-Verlag.

Stevens, R. H., & Casillas, A. (2006). Artificial neural networks. In D. M. Williamson, I. I. Bejar, & R. J. Mislevy (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 259–312). Mahwah, NJ: Erlbaum.

van Merriënboer, J. J. G. (1997). Training complex cognitive skills: A four-component instructional design model for technical training. Englewood Cliffs, NJ: Educational Technology Publications.

Vendlinski, T., Niemi, D., & Wang, J. (2005). Learning assessment by designing assessments: An on-line formative assessment design tool. In C. Crawford, R. Carlsen, I. Gibson, K. McFerrin, J. Price, & R. Weber (Eds.), Proceedings of Society for Information Technology and Teacher Education International Conference 2005 (pp. 228–240). Norfolk, VA: AACE.

Wickens, C. D., & May, P. (1994). Terrain representation for air traffic control: A comparison of perspective with plan view displays (Tech. Rep. No. ARL-94-10/FAA-94-2). Savoy: University of Illinois, Aviation Research Laboratory.

Williamson, D. M., Almond, R. G., & Mislevy, R. J. (2000). Model criticism of Bayesian networks with latent variables. In C. Boutilier & M. Goldszmidt (Eds.), Uncertainty in artificial intelligence: Proceedings of the 16th conference (pp. 634–643). San Francisco: Morgan Kaufmann.

Williamson, D. M., Almond, R. G., Mislevy, R. J., & Levy, R. (2006). An application of Bayesian networks in automated scoring of computerized simulation tasks. In D. M. Williamson, I. I. Bejar, & R. J. Mislevy (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 201–257). Mahwah, NJ: Erlbaum.

Williamson, D. M., Bejar, I. I., & Mislevy, R. J. (2006). Automated scoring of complex tasks in computer-based testing: An introduction. In D. M. Williamson, R. J. Mislevy, & I. I. Bejar (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 1–13). Mahwah, NJ: Erlbaum.
Chapter 17
AUTOMATED PERFORMANCE ASSESSMENT OF TEAMS IN VIRTUAL ENVIRONMENTS

Peter Foltz, Noelle LaVoie, Rob Oberbreckling, and Mark Rosenstein

Multiplayer virtual environments provide an excellent venue for distributed team training. They provide realistic, immersive, engaging situations that can elicit the complex behaviors that encompass teamwork skills. These environments give trainers the opportunity to target particular skills in order to assess and improve a team’s performance in situations that are difficult to create in live environments. In addition, because virtual environments provide fine-tuned control of the training situation and automate the collection of data, training teams in virtual environments can save effort and money when compared to live training. As the military and other large collaborative organizations adopt more network centric methods, operations, tactics, and technologies, virtual environments become an essential means to monitor, train, and assess teams.

However, there are numerous challenges in effectively identifying, tracking, analyzing, and reporting on teams in complex virtual environments. For example, many current methods of assessing team and group performance rely on both global outcome metrics and handcrafted assessment techniques. These metrics often lack information rich enough to diagnose failures, detect critical incidents, or suggest improvements for teams to use in their collaborative aids. Because these techniques rely on time consuming hand coding, it is also difficult for them to produce assessments in the near real time frame that is necessary for effective training feedback. Thus, while there has been an explosive increase in the availability of team information that can be obtained from a virtual environment, there needs to be a concomitant development in tools that can leverage the data to monitor, support, and enhance team performance.

In this chapter, we discuss the issues of evaluating teams in virtual environments, describe an automated communications based analysis approach that we have found fruitful in tackling these issues, and finally detail the application and evaluation of this approach in predicting team performance in the context of three task domains.
TEAM PERFORMANCE MEASUREMENT IN VIRTUAL ENVIRONMENTS

Complex team virtual environments provide an ideal venue for team training. Orasanu and Salas (1993) identify a number of critical characteristics for training teams, including having interdependent members with defined roles, using multiple information sources, and sharing common goals. Because of the inherent automation in virtual environments, they make it easier to measure the performance of teams, recording both what is done by the team and what is communicated by team members. Nevertheless, while a virtual environment can produce a record of what team members have done and said, there are challenges in converting that information into measures of performance and difficulties in determining how those measures can be used to give feedback.

Team performance can be seen as a combination of taskwork and teamwork. Taskwork, the work a team does to accomplish its mission, is often more amenable to automated analysis from a virtual environment event log. For example, a system can report whether a person moved from x to y at time t and whether an objective was completed. Teamwork, on the other hand, encompasses how the team members coordinate with each other. In order to measure teamwork within virtual environments, the critical aspects of teamwork must be identified, along with how they can be measured, assessed, and trained (for example, Salas & Cannon-Bowers, 2001). These skills include leadership, monitoring, backup behavior, coordination, and communication (for example, Cannon-Bowers, Tannenbaum, Salas, & Volpe, 1995; Curtis, Harper-Sciarini, DiazGranados, Salas, & Jentsch, 2008; Freeman, Diedrich, Haimson, Diller, & Roberts, 2003; Hussain et al., 2008).

Curtis et al. (2008) identify three teamwork processes that have major impacts on teamwork and that appear to be strongly predictive of team performance: communication, coordination, and team leadership. These processes are typically assessed by subject matter experts (SMEs) watching and checking off behaviors associated with the processes. This protocol can be quite time consuming and is often performed after the exercise is completed rather than in real time, limiting the ability to incorporate teamwork performance measurement into virtual environments or to provide timely feedback to teams. Thus, methods are required to automatically measure teamwork in an accurate and responsive manner. This chapter focuses on the aspects of communication that can be used to predict performance and on how analyses of communications can be automated to provide rapid measurement of teamwork.

COMMUNICATION AS AN INDICATOR OF PERFORMANCE

Networked teams in virtual environments provide a rich source of information about their performance through their verbal communication. The communication data contain information both about the actual structure of the network and about the flow of meaning through the network over time. The structure and communication patterns of the network can provide indications of team member roles, paths of information flow, and levels of connectedness within and across teams.
The content of the information communicated provides detailed indications of the information team members know, what they tell others, whom they tell, and their current situation. Thus, communication data provide information about team cognitive states, knowledge, errors, information sharing, coordination, leadership, stress, workload, intent, and situational status. Indeed, within the distributed training community, trainers and subject matter experts typically rely on listening to a team’s communication in order to assess that team’s performance. Nevertheless, to effectively exploit the communication data, technologies need to be available that can assess both the content and patterns of the verbal information flowing in the network and convert the analyses into results that are usable by teams, instructors, and commanders. In this chapter, we provide an overview of ongoing research and development of a set of tools for the automatic analysis of team verbal communication and discuss their application in measuring team performance in virtual environment training systems. The tools exploit team communication data and use language technologies to analyze the content of communication, thereby permitting characterization of the topics and quality of information being transmitted. To explore these ideas further, we describe how these tools were incorporated into three application environments and the results of their use.
VERBAL COMMUNICATION ANALYSIS

The overall goal of automated verbal communication analysis is to apply a set of computational modeling approaches to verbal communication in order to convert the networked communication into useful characterizations of performance. These characterizations include metrics of team performance, feedback to commanders, or alerts about critical incidents related to performance. This type of analysis has several prerequisites. The first is the availability of sources of verbal communication. Second, there must be performance measures that can be used to associate the communication with standards of actual team performance. These prerequisites can then be combined with computational approaches to perform the analysis. These computational approaches include computational linguistics methods to analyze communication, machine learning techniques to associate communication with performance measures, and finally cognitive and task modeling techniques.

By applying the computational approaches to the communication, we have a complete communication analysis pipeline, as represented in Figure 17.1. Proceeding through the tools in the pipeline, spoken and written communication are converted directly into performance metrics that can then be incorporated into visualization tools to provide commanders and soldiers with applications, such as automatically augmented after action reviews (AARs) and briefings, near real time alerts of critical incidents, timely feedback to commanders of poorly performing teams, and graphic representations of the type and the quality of information flowing within a team. We outline the approach to this communication analysis below.
Figure 17.1. The Communication Analysis Pipeline
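Read as software, the pipeline of Figure 17.1 has roughly the following skeleton. This is our schematic reading, not the authors’ code; every function body is a placeholder, and the actual feature set is described later in the chapter.

# Schematic skeleton of the communication analysis pipeline: capture ->
# feature extraction -> learned performance model -> alarms and AAR material.

from dataclasses import dataclass
from typing import List

@dataclass
class Utterance:
    speaker: str
    start_time: float
    text: str                  # typed chat, or ASR output from the audio stream
    asr_confidence: float = 1.0

def extract_features(utterances: List[Utterance]) -> List[float]:
    # Placeholder: the real toolset computes semantic (LSA), syntactic,
    # relational, and statistical features; here, just simple counts.
    words = sum(len(u.text.split()) for u in utterances)
    return [float(words), float(len(utterances)), words / max(len(utterances), 1)]

def predict_metric(features: List[float]) -> float:
    # Placeholder for a model trained against SME ratings or objective scores.
    return 0.0

def alarm(team_metric: float, threshold: float) -> bool:
    # Flag a team for the after action review when it falls below threshold.
    return team_metric < threshold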
Communication Data

For analysis purposes, communication data include most kinds of verbal communication among team members. Typed communication (for example, chat, e-mail, or instant messages) can be automatically formatted for input into the analysis tools. Audio communication includes the capture of many kinds of spoken data, including use of voice over Internet protocol systems, radios, and phones. Because a majority of communication in virtual environments is typically spoken, two classes of information can be gleaned from the audio stream: content and audio features. Automatic speech recognition (ASR) systems convert speech to text for analysis of content, while audio analysis extracts such characteristics as stress or excitement levels from the audio. ASR systems often also provide measures such as rate of speech and ASR uncertainty. All this processed information can be input into the communication analysis system.

Performance Metrics

In order to provide feedback on team performance, the toolset learns to associate team performance metrics with the communication streams from those teams. Thus, the system typically requires one or more metrics of team performance. There is a wide range of issues in determining appropriate metrics for measuring team performance (for example, Brannick, Salas, & Prince, 1997). For example,
metrics need to be associated with key outcomes or processes related to the team’s tasks; they should indicate and provide feedback on deficiencies for individuals and/or teams; and they need to be sufficiently reliable that experts can agree both on the value of the metric and on how it should be scored for different teams (Paris, Salas, & Cannon-Bowers, 2001).

Objective measures of performance can be used as metrics to indicate specific aspects of team performance. These measures can include threat eliminations, deviations from optimal solution paths, number of objectives completed, and measures derived from task-specific artifacts, such as SALUTE (size, activity, location, unit, time, and equipment) reports and similar standardized reports. One advantage of computer based environments is that they are able to automatically track and log events and then generate such objective measures.

Subjective measures of performance can also be used as metrics. These can include subject matter experts’ ratings of such aspects as command and control, management of engagement, following doctrine, communication quality, and situation understanding. Additionally, SME evaluations of AARs and identification of specific critical incidents, failures, or errors can be used to measure performance. Care must be taken, as all metrics will have varying levels of reliability as well as validity. For new metrics, it is often advisable to obtain ratings from more than one SME in order to determine reliability.

Computational Modeling Tools

Communication data are converted into a computational representation that includes measures of the content (what team members are talking about), quality (how well team members seem to know what they are talking about), and fluency (how well team members are talking about it). This process uses a combination of computational linguistics and machine learning techniques that analyze semantic, syntactic, relational, and statistical features of the communication streams. While we will discuss a number of tools, the primary underlying technology used in this analysis is a method for mimicking human understanding of the meaning of natural language called Latent Semantic Analysis (LSA) (see Landauer, Foltz, & Laham, 1998, for an overview of the technology).

LSA is automatically trained on a body of text containing knowledge of a domain, for example, a set of training manuals and/or domain relevant verbal communication. After such training, LSA is able to measure the degree of similarity of meaning between two communication utterances in a way that closely mimics human judgments. This capability can be used to understand the verbal interactions in much the same way a subject matter expert compares the performance of one team or individual to others. The technique has been widely used in other machine understanding applications, including commercial search engines, automated scoring of essay exams, and methods for modeling human language acquisition.

The results from the LSA analysis are combined with other computational language technologies, including techniques to measure syntactic complexity, patterns of interaction and coherence among team members, audio features, and
statistical features of individual and team language (see Jurafsky & Martin, 2000). The computational representation of the team language is then combined with machine learning technology to predict the team performance metrics. In a sense, the overall method learns which features of team communication are associated with different metrics of team performance and then predicts scores for new sets of communication data employing those features.

Performance Prediction with the Communication Analysis Toolkit

Tests of the toolkit’s use for communication analysis have shown great promise. Tests are performed by training the system on one set of communication data and then testing its prediction performance on a new dataset. This procedure verifies that the models generalize to new communication. Over a range of communication types, the toolkit is able to provide accurate predictions of overall team performance and of individual team metrics. It makes reliable judgments of the type of statement each team member is making, and it can predict team performance problems based on the patterns of communication among team members (Foltz, 2005; Gorman, Foltz, Kiekel, Martin, & Cooke, 2003). In addition to the approaches described above, other approaches to analyzing communication in teams have also shown great promise, including modeling communication flow patterns to predict team performance and cognitive states (see Gorman, Weil, Cooke, & Duran, 2007; Kiekel, Gorman, & Cooke, 2004).

The communication analysis toolkit has been tested in many environments, including an unmanned aerial vehicle synthetic task environment (see Gorman et al., 2003; Foltz, Martin, Abdelali, Rosenstein, & Oberbreckling, 2006), in air force simulators of F-16 missions (Foltz, Laham, & Derr, 2003; Foltz et al., 2006), and in Navy Tactical Decision Making Under Stress (TADMUS) exercises (Foltz et al., 2006). The tools predicted both objective team performance scores and SME ratings of performance at very high levels of reliability (correlations ranged from r = 0.5 to r = 0.9 over 20 tasks). It should be noted that the agreement between the toolkit’s predictions and SMEs is typically within the range of agreement between one SME and another. In addition, the tools are able to characterize the type of communication in individual utterances (for example, planning, stating facts, and acknowledging; Foltz et al., 2006).

Issues and Current Limitations of This Approach

While the next section describes successful applications of this approach, there are a number of issues and current limitations. For verbal communication, the approach requires automatic speech recognition, and that technology currently has a number of limitations. The state of the art requires building acoustic models and speaker-independent but task-specific language models, which currently require about 20 hours of speech to train the ASR system and thus increase the startup time for a new task domain.
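As a concrete, if toy, illustration of the LSA similarity computation described under Computational Modeling Tools, the sketch below uses scikit-learn’s TF-IDF and truncated SVD as stand-ins for the LSA machinery of Landauer, Foltz, and Laham (1998). The corpus and dimensionality are far too small to be meaningful; a real space would be trained on a large body of domain text.

# Toy LSA-style semantic space and utterance similarity (a sketch, not the
# toolkit's implementation).

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [  # stand-in for training manuals and transcribed missions
    "convoy halts and reports enemy contact at checkpoint two",
    "lead vehicle requests a route change around the obstacle",
    "team acknowledges the order and resumes movement north",
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
lsa = TruncatedSVD(n_components=2)    # hundreds of dimensions in practice
lsa.fit(X)

def similarity(a: str, b: str) -> float:
    """Cosine similarity of two utterances in the reduced semantic space."""
    va, vb = lsa.transform(vectorizer.transform([a, b]))
    return float(cosine_similarity([va], [vb])[0, 0])

print(similarity("enemy contact reported", "we have contact at the checkpoint"))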
The second prerequisite of the approach is performance measures. If objective measures are available, then as soon as the ASR is available, teams can begin to execute the task, communication and performance data can be collected, and a performance model can be built. If expert ratings are preferred, protocols for scoring communications must first be developed, and SMEs must then score a set of missions to be used as a training set; this limitation, however, affects any approach that relies on experts. Besides these startup costs, there is also the issue of the accuracy of the ASR. The communication analysis technologies have been tested with ASR input from a number of datasets of spoken communication (see Foltz et al., 2003). The results indicate that even with typical ASR systems degrading word recognition by 40 percent, model prediction performance degraded by less than 10 percent. Thus, the approach appears to be quite robust to typical ASR errors.
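Pulling the two prerequisites together, the train-on-one-set, test-on-another procedure described earlier can be sketched as follows. The feature matrices are random placeholders for communication features, and ridge regression is only one plausible choice of model, not necessarily what the toolkit uses.

# Schematic train/test evaluation: fit a model on one set of missions,
# then correlate its predictions with SME ratings on held-out missions.

import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
true_w = rng.normal(size=10)                  # unknown "true" relationship
X_train = rng.normal(size=(45, 10))           # 45 training missions, 10 features
y_train = X_train @ true_w + rng.normal(scale=0.5, size=45)   # fake SME ratings
X_test = rng.normal(size=(6, 10))             # 6 held-out missions
y_test = X_test @ true_w + rng.normal(scale=0.5, size=6)

model = Ridge(alpha=1.0).fit(X_train, y_train)
r, p = pearsonr(model.predict(X_test), y_test)
print(f"held-out correlation with SME ratings: r = {r:.2f}")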
APPLICATIONS OF THE COMMUNICATION ANALYSIS TOOLKIT IN VIRTUAL ENVIRONMENTS

A number of applications have been developed to test the performance and validate the use of the toolkit in virtual and live training situations. Below we describe three applications: one monitoring and assessing learning in online discussion environments, another providing real time analyses and visualizations of multinational stability and support operation simulation exercises, and a third providing automated team performance metrics and detection of critical incidents in both convoy operations in simulators and in live training environments. These applications cover the range of immersion in virtual environments. At one end are collaborative discussion environments, which permit use and evaluation of the planning, communication, and coordination aspects of teams but do not provide the full immersive qualities of a simulator environment. At the other end are virtual convoy environments and similar live training environments, where the approach is tested as teams move from virtual to real world training and operations.
Knowledge Post

In large networked organizations, it is difficult to track performance in distributed exercises. Knowledge Post is designed for monitoring, moderating, and assessing collaborative learning and planning. The tools within Knowledge Post have been tested in a series of studies at the U.S. Army War College and the U.S. Air Force Academy (LaVoie, Psotka, Lochbaum, & Krupnick, 2004; LaVoie et al., in press; Lochbaum, Streeter, & Psotka, 2002). The application consists of an off-the-shelf threaded discussion group that has been substantially augmented with latent semantic analysis based functionality to evaluate and support individual and team contributions in simulated planning operations. Knowledge Post supports the following abilities:
• To automatically notify the instructor when the discussion goes off track (a sketch of this kind of alert follows the list),
• To enhance the overall quality of the discussion and the consequent learning level by having expert comments or library articles automatically interjected into the discussion at appropriate places,
• To locate material in the discussion or electronic library similar in meaning to a given posting,
• To automatically summarize contributions, and
• To assess the quality of contributions made by individuals and groups.
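The first ability in this list, flagging off-track discussion, can be caricatured in a few lines. A real implementation would use LSA similarity (as sketched earlier) and a threshold tuned against human on-topic/off-topic codes; here a crude word-overlap score and an invented threshold stand in so the example is self-contained.

# Toy off-topic alert: flag a posting whose similarity to the topic falls
# below a threshold. Topic text and threshold are invented for illustration.

def overlap_similarity(a: str, b: str) -> float:
    # Jaccard word overlap as a crude stand-in for LSA similarity.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

TOPIC = "courses of action for the stability and support operation"
THRESHOLD = 0.2   # hypothetical; tuned against rater codes in practice

for post in ["we need more courses of action for the operation",
             "did anyone catch the game last night"]:
    if overlap_similarity(post, TOPIC) < THRESHOLD:
        print("ALERT moderator, off-topic posting:", repr(post))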
The utility of each of the aforementioned functions was empirically evaluated with senior officers, either in research sessions or participating in distributed learning activities at the U.S. Army War College, or with cadets at the U.S. Air Force Academy. Among the findings of the studies were the superiority of learning in a Knowledge Post environment over a face-to-face discussion, with significantly improved quality of discussion, and the usefulness to participants of the Knowledge Post searching and summarizing features (Lochbaum et al., 2002).

The research conducted with the Army War College established the usefulness and accuracy of a software agent that automatically alerts moderators when groups and individuals are floundering by identifying on- and off-topic comments in a discussion (LaVoie et al., 2004). A human rater coded over 1,000 comments as either on topic or off topic. A second rater coded a random 10 percent of these comments. The correlation between the two raters for this task was r(162) = .85, p < .001, while the correlation between the LSA based model and one human rater was r(1,605) = .72, p < .001, showing that the model was able to accurately determine when a group’s discussion was off topic.

The work with the Air Force Academy demonstrated improved solution quality in a group of cadets as a result of exposure to automatically interjected relevant expert comments (LaVoie et al., in press). Cadets participated in a discussion of a challenging leadership scenario. The discussion was conducted in one of three ways: (1) face-to-face in a classroom with a live human moderator, (2) in Knowledge Post with an automated moderator that added relevant comments from experts, or (3) in Knowledge Post without the automated moderator. The quality of the discussions was evaluated by using LSA to determine the similarity of the cadets’ discussion to that of senior military officers, and the highest quality discussions were found for groups that used Knowledge Post with the automated moderator (see Figure 17.2). Although customized for distributed learning activities, the tools developed within Knowledge Post can be incorporated into other virtual environments for automated analysis and monitoring of teams performing planning based discussions.

TeamViz

The TeamViz application provides teams and evaluators ways of monitoring performance in large collaborative environments using a set of visualization tools and enhancements built on the Knowledge Post toolset. TeamViz ran live during
Figure 17.2. Quality of Discussion Comments in the Three Discussion Conditions
a U.S.–Singapore simulation exercise designed to evaluate collaboration among joint, interagency, and multinational forces conducting combat and stability operations (Pierce et al., 2006). The system automatically analyzed the content and patterns of information flow of the networked communication. It also provided automated summarizations of the ongoing communications as well as network visualization tools to improve situation understanding of team members. Analyses showed that the technology could track the flow of commander’s intent among the team members by comparing the commander’s briefing to the content of communication of different parts of the team. For example, the commander stressed the importance of naval facility defense in his briefing to three groups: two brigades under his command and the coalition task force (CTF) command staff. Comparing the content of the communications in each group following this briefing shows that Brigade 1 followed the commander’s intent more closely than did Brigade 2 (see Figure 17.3). It was also possible to detect the effects of scenario information injects on performance within the coalition task force and brigades by comparing the communication within each group to the content of the scenario inject. Figure 17.4 shows the response to an inject about a chemical weapons attack. It is clear that the coalition task force responded more quickly to the inject, and with a greater degree of discussion, than did either brigade.
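The intent-tracking analysis just described reduces to scoring each group’s communication, window by window, against the commander’s briefing. The sketch below does this with plain TF-IDF cosine similarity as a stand-in for the LSA space; all of the text is invented, so only the shape of the computation is meaningful.

# Toy intent tracking: similarity of each group's messages to the briefing.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

briefing = "defense of the naval facility is the top priority this phase"
windows = {  # per-group messages pooled into successive time windows
    "Bde1": ["set security positions around the naval facility",
             "naval facility defense is established"],
    "Bde2": ["continue movement to phase line blue",
             "send up the fuel status report"],
}
vec = TfidfVectorizer().fit([briefing] + [m for msgs in windows.values() for m in msgs])
b = vec.transform([briefing])
for group, msgs in windows.items():
    sims = cosine_similarity(vec.transform(msgs), b).ravel()
    print(group, [round(float(s), 2) for s in sims])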
Figure 17.3. Communication analysis shows that Brigade 1 (Bde1) followed the commander’s intent more closely than Brigade 2 (Bde2).
Singapore staff officers used TeamViz in real time to monitor the communication streams and to inform their commanders of important information flowing in the network, as well as to indicate perceived information bottlenecks. Overall, the TeamViz technologies permit knowledge management of large amounts of communication, as well as improved cognitive interoperability in distributed operations where communication among ad hoc teams is critical.

Competence Assessment and Alarms for Teams

Convoy operations require effective coordination among a number of vehicles and other elements, while maintaining security and accomplishing specific goals. However, in training for convoy operations it is difficult to monitor and provide feedback to team members in this complex environment. The Defense Advanced Research Projects Agency (DARPA) Automated Competence Assessment and Alarms for Teams (DARCAAT) program was designed to automate performance assessment and provide alarms for live and virtual convoy operations training. The DARCAAT program collected voice communication data during convoy training operations and then collected SME based performance measurements on that data. From these, the program developed specialized tools to assess and visualize convoy operation performance. Two sources of data were
Figure 17.4. Communication analysis shows that the coalition task force responded more quickly to a scenario inject than either brigade.
used: one from teams in a virtual environment and one from teams in live training environments. The goal was to evaluate how well performance assessment tools could be applied to a single domain across both virtual and live training. For the virtual environment, communication data were collected from the Fort Lewis Mission Support Training Facility, which uses the DARWARS Ambush! virtual environment for convoy training. DARWARS Ambush! is a widely used game based training system and has been integrated into training for many brigades prior to deployment in Iraq (Diller, Roberts, Blankenship, & Nielsen, 2004; Diller, Roberts, & Willmuth, 2005). DARWARS Ambush! provides an excellent environment for team training and performance analysis because it offers reasonably controlled scenarios and environments and the ability to instrument teams for voice communications, video, and environmental event data collection. In this environment, up to 60 soldiers can jointly practice battle drill training and leader/team development during convoy operations. Figure 17.5 shows the training environment for DARWARS Ambush!, and Figure 17.6 shows a typical user’s view during training. In addition to the virtual environment DARWARS Ambush! data, the DARCAAT program collected live convoy Situational Training Exercise lane training
data from the National Training Center (NTC) at Fort Irwin. The data included digital audio recordings of FM radio communication among the convoy team members, as well as videos of the convoy operations. Using the virtual and live convoy communications data, subject matter experts rated team performance on a number of metrics (battle drills, adherence to standard operating procedures [SOPs], situation understanding, command and control, and overall team performance) and indicated places in the scenario in which a critical event occurred (that is, “an event that significantly alters the battleground”). Prediction models were then built by analyzing the communication data using the full team communication analysis pipeline shown in Figure 17.1.

The results indicate that the DARCAAT toolset is able to accurately match SME ratings of team performance as well as detect critical events (for example, performance alarms). Using the DARWARS Ambush! data, the system could automatically detect 87 percent of the SME-rated critical events with a false positive rate of 19 percent. Thresholds for detecting critical events can be adjusted so that they serve as performance alarms, enabling a commander to set lower thresholds that provide alerts for any case in which a team might be having performance problems, at the cost of only a slightly higher false alarm rate. The DARCAAT model also predicted the SME ratings of team performance on each of the performance metrics. Table 17.1 shows the correlations between the SME ratings of team performance and the predictions generated by the DARCAAT toolset from analyzing the teams’ communications, based on
Figure 17.5. DARWARS Ambush! at the Fort Lewis Mission Support Training Facility
Figure 17.6. Screen from DARWARS Ambush! Training Scenario
Table 17.1. Correlation between SME Ratings and DARCAAT Predictions for Overall Team Performance

Metric                     NTC & Ambush! (n = 51)   Ambush! (n = 45)
Battle drills              0.74                     0.73
Command and control        0.71                     0.70
Situation understanding    0.83                     0.81
SOPs                       0.73                     0.79
TEAM                       0.78                     0.72
45 Ambush! missions and 6 NTC missions (all significant, p < .01). It should be noted that the correlations between the SMEs and the toolset were equivalent to those found between multiple SMEs rating the same missions. As a demonstration of the application of the DARCAAT toolset, an after action review application was developed that could be integrated into a training program to allow observer/controllers (OCs) and commanders to monitor teams and receive feedback on a team’s performance. The application provides efficient automatic augmentation of AARs, assisting the OCs in choosing the most appropriate segments of missions to illustrate training points. Figure 17.7 shows one screen from the AAR tool.
Figure 17.7. Visualization of Team Performance Scores from the AAR Tool
The tool processes the incoming communication data from a team and then allows an OC or commander to load any mission, providing immediate access to several critical pieces of information (a toy sketch of the event list follows):

• The top left portion of the AAR tool displays the mission divided into a list of sequenced events with highlighted critical events.
• Each event in the list is scored on a series of metrics: CC (command and control), SA (situation awareness), SOP (adherence to standard operating procedures), CA (combat action/battle drills), and TP (overall team performance).
• The event list can be sorted by score, allowing rapid identification of the most serious issues.
• The lower portion of the AAR tool shows a mission timeline linked to the event list, with facilities to play audio files and view an ASR transcript of each event.
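A toy version of this event list, with sorting to surface the most serious issues, might look as follows. The field names follow the metric abbreviations above; the scores, times, and transcript are invented and do not come from the DARCAAT tool.

# Toy scored event list for an AAR tool: sort by overall team performance (TP)
# so the worst moments of the mission surface first.

from dataclasses import dataclass
from typing import Dict

@dataclass
class MissionEvent:
    time: float                  # seconds into the mission
    scores: Dict[str, float]     # keys: CC, SA, SOP, CA, TP
    critical: bool = False
    transcript: str = ""

events = [
    MissionEvent(102.0, {"CC": 3.1, "SA": 2.0, "SOP": 3.5, "CA": 3.0, "TP": 2.8}),
    MissionEvent(455.5, {"CC": 1.2, "SA": 0.9, "SOP": 2.0, "CA": 1.5, "TP": 1.1},
                 critical=True, transcript="contact left! all vehicles halt"),
]
worst_first = sorted(events, key=lambda e: e.scores["TP"])
for e in worst_first:
    flag = "CRITICAL " if e.critical else ""
    print(f"{flag}t={e.time:.0f}s TP={e.scores['TP']}")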
Overall, the results from the DARCAAT project illustrate that performance measures can be automatically and accurately generated from communication in teams performing in multiuser virtual and live environments. These performance measures can then be incorporated into visualization and training tools that permit trainers to monitor and assess team status in real time.

CONCLUSIONS

Communication is the glue that holds teams together in networked virtual environments. It is also one of the richest sources of information about the performance of the team. The content and patterns of a team’s communication provide a window into the performance and cognitive states of the individuals and of the team as a whole. Analysis of the complex cascades of communication requires tools that can assess both the content and the patterns of information flowing in the network. The approach described in this chapter can automatically convert the communication into specific metrics of performance, thereby permitting a better picture of the state of teams in virtual environments at any point in time. The tools use language technologies to analyze the content of communication, thereby permitting characterization of the topics and quality of information being transmitted.

The toolkit allows the analysis and modeling of both objective and subjective performance metrics, and it is able to work with large amounts of communication data. Indeed, because of its machine learning foundation, it works better with more data. The toolkit can automatically extract measures of performance by modeling how subject matter experts have rated similar communication in similar situations, as well as by modeling objective performance measures. Further, because the methods used are automatic and do not rely on any hand-coded models, they allow performance models to be developed without the extensive effort typically involved in standard task analysis or cognitive modeling approaches. Notably, the approach can be integrated with traditional assessment methods to develop objective and descriptive models of distributed team performance. Overall, the toolset has the ability to provide near real time (within seconds)
assessment of team performance, including measures of situation understanding, knowledge gaps, workload, and detection of critical incidents. It can be used for tracking teams’ behaviors and cognitive states, for determining appropriate feedback, and for automatically augmenting after action reviews.

New Directions

There remain a number of challenges to incorporating automated analysis of the content of communication into full-scale virtual environments for training venues. First, virtual environments must provide technology that allows easy collection of communication data for analysis by toolsets. In addition, virtual environments need to make log files of participant actions, locations, and movements easily accessible so that tools can derive and analyze additional performance measures. Second, while the results described in this chapter come from teams ranging in size from 3 to 70 soldiers, it is important to understand the challenges of scaling up to even larger operations. Finally, a number of other technologies can be included to improve and help generalize performance modeling. These include better modeling of network structures, incorporation of additional modalities of information (for example, event and action information), improved computational modeling tools, and leveraging of other advances in measuring performance in complex virtual environments.

The automated analysis of communication can be applied in a wide range of virtual environment applications beyond those described here. This approach can be integrated into, and make possible, adaptable training systems that automatically adjust the level of difficulty of the training based on the performance of the team. Finally, the overall approach helps in understanding the role of communication in complex human networks. Results from analyses of teams in realistic situations can help clarify both how communication affects team performance and how performance is reflected through communication.

ACKNOWLEDGMENTS

This work was supported in part by grants and contracts from DARPA, the U.S. Army Research Institute, the U.S. Army Research Laboratory, the Office of Naval Research, and the Air Force Research Laboratory. The authors are grateful for the contributions of Terry Drissell, Marita Franzke, Brent Halsey, Kyle Habermehl, Tim McCandless, Chuck Panaccione, Manju Putcha, and David Wroblewski to development and data analyses.

REFERENCES

Brannick, M. T., Salas, E., & Prince, C. (1997). Team performance assessment and measurement: Theory, methods, and applications. Mahwah, NJ: Erlbaum.

Cannon-Bowers, J. A., Tannenbaum, S. I., Salas, E., & Volpe, C. E. (1995). Defining team competencies and establishing team training requirements. In R. Guzzo & E. Salas
(Eds.), Team effectiveness and decision making in organizations (pp. 330–380). San Francisco: Jossey-Bass.

Curtis, M. T., Harper-Sciarini, M., DiazGranados, D., Salas, E., & Jentsch, F. (2008). Utilizing multiplayer games for team training: Some guidelines. In H. F. O’Neil & R. S. Perez (Eds.), Computer games and team and individual learning (pp. 145–165). Oxford, United Kingdom: Elsevier.

Diller, D. E., Roberts, B., Blankenship, S., & Nielsen, D. (2004). DARWARS Ambush!—Authoring lessons learned in a training game. Proceedings of the Interservice/Industry Training, Simulation and Education Conference. Arlington, VA: National Training Systems Association.

Diller, D. E., Roberts, B., & Willmuth, T. (2005, September). DARWARS Ambush! A case study in the adoption and evolution of a game-based convoy trainer with the U.S. Army. Paper presented at the Simulation Interoperability Standards Organization, Orlando, FL.

Foltz, P. W. (2005). Tools for enhancing team performance through automated modeling of the content of team discourse. Proceedings of the HCI International Conference. Saint Louis, MO: Mira Digital Publishing.

Foltz, P. W., Laham, R. D., & Derr, M. (2003). Automated speech recognition for modeling team performance. Proceedings of the 47th Annual Human Factors and Ergonomics Society Meeting. Santa Monica, CA: Human Factors and Ergonomics Society.

Foltz, P. W., Martin, M. A., Abdelali, A., Rosenstein, M. B., & Oberbreckling, R. J. (2006). Automated team discourse modeling: Test of performance and generalization. Proceedings of the 28th Annual Cognitive Science Conference. Bloomington, IN: Cognitive Science Society.

Freeman, J., Diedrich, F. J., Haimson, C., Diller, D. E., & Roberts, B. (2003). Behavioral representations for training tactical communication skills. Proceedings of the 12th Conference on Behavior Representation in Modeling and Simulation. Scottsdale, AZ.

Gorman, J., Weil, S. A., Cooke, N., & Duran, J. (2007). Automatic assessment of situation awareness from electronic mail communication: Analysis of the Enron dataset. Proceedings of the Human Factors and Ergonomics Society 51st Annual Meeting (pp. 405–409). Santa Monica, CA: Human Factors and Ergonomics Society.

Gorman, J. C., Foltz, P. W., Kiekel, P. A., Martin, M. A., & Cooke, N. J. (2003). Evaluation of Latent Semantic Analysis-based measures of communications content. Proceedings of the 47th Annual Human Factors and Ergonomics Society Meeting. Santa Monica, CA: Human Factors and Ergonomics Society.

Hussain, T. S., Weil, S. A., Brunyé, T. T., Sidman, J., Alexander, A. L., & Ferguson, W. (2008). Eliciting and evaluating teamwork within a multi-player game-based training environment. In H. F. O’Neil & R. S. Perez (Eds.), Computer games and team and individual learning (pp. 77–104). Oxford, United Kingdom: Elsevier.

Jurafsky, D., & Martin, J. (2000). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. New York: Prentice Hall.

Kiekel, P. A., Gorman, J. C., & Cooke, N. J. (2004). Measuring speech flow of co-located and distributed command and control teams during a communication channel glitch. Proceedings of the Human Factors and Ergonomics Society 48th Annual Meeting.

Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to Latent Semantic Analysis. Discourse Processes, 25(2&3), 259–284.
LaVoie, N., Psotka, J., Lochbaum, K. E., & Krupnick, C. (2004, February). Automated tools for distance learning. Paper presented at the New Learning Technologies Conference, Orlando, FL.

LaVoie, N., Streeter, L., Lochbaum, K., Wroblewski, D., Boyce, L., Krupnick, C., & Psotka, J. (in press). Automating expertise in collaborative learning environments. Journal of Asynchronous Learning Networks.

Lochbaum, K., Streeter, L., & Psotka, J. (2002, December). Exploiting technology to harness the power of peers. Paper presented at the Interservice/Industry Training, Simulation and Education Conference, Orlando, FL.

Orasanu, J., & Salas, E. (1993). Team decision making in complex environments. In G. A. Klein, J. Orasanu, R. Calderwood, & C. E. Zsambok (Eds.), Decision making in action: Models and methods (pp. 327–345). Norwood, NJ: Ablex Publishing.

Paris, C. R., Salas, E., & Cannon-Bowers, J. A. (2001). Teamwork in multi-person systems: A review and analysis. Ergonomics, 43(8), 1052–1075.

Pierce, L., Sutton, J., Foltz, P. W., LaVoie, N., Scott-Nash, S., & Lauper, U. (2006, July). Technologies for augmented collaboration. Paper presented at the CCRTS, San Diego, CA.

Salas, E., & Cannon-Bowers, J. A. (2001). The science of training: A decade of progress. Annual Review of Psychology, 52, 471–499.
Chapter 18
A PRIMER ON VERBAL PROTOCOL ANALYSIS

Susan Trickett and J. Gregory Trafton

We have been using verbal protocol analysis for over a decade in our own research, and this chapter is the guide we wish had been available to us when we first started using this methodology. The purpose of this chapter is simply to provide a very practical, “how-to” primer for the reader in the science—and art—of verbal protocol analysis. It is beyond the scope of this chapter to discuss the theoretical grounding of verbal protocols; however, several resources do so, as well as providing some additional practical guidelines for implementing the methodology: Ericsson and Simon (1993); van Someren, Barnard, and Sandberg (1994); Chi (1997); Austin and Delaney (1998); and Ericsson (2006). In addition, numerous studies—too many to list here—have effectively used verbal protocols, and these may provide additional useful information about the application of this technique.

Briefly put, verbal protocol analysis involves having participants perform a task or set of tasks and verbalize their thoughts (“talking aloud”) while doing so. The basic assumption of verbal protocol analysis, to which we subscribe, is that when people talk aloud while performing a task, the verbal stream functions effectively as a “dump” of the contents of working memory (Ericsson & Simon, 1993). According to this view, the verbal stream can thus be taken as a reflection of the cognitive processes in use and, after analysis, provides the researcher with valuable information not only about those processes, but also about the representations on which they operate. In addition, verbal protocols can reveal information about misconceptions and conceptual change; strategy acquisition, use, and mastery; task performance; affective response; and the like.

Verbal protocol analysis can be applied to virtual environments in several ways. For example, verbal protocols collected during performance by both experts and novices can inform the design of virtual environments to be used for training. By comparing expert and novice performance, a designer could identify specific areas where learners would benefit from support, remediation, or
instruction. Similarly, having experts talk aloud while performing a task could help a curriculum designer identify and develop the content to be delivered in the virtual environment. Verbal protocols are likely to be especially useful in evaluation—first, in determining how effective the virtual environment is as a training tool, and second, in assessing the student’s learning and performance, where verbal protocols can provide valuable information about the learner’s cognitive processes, beyond simple measures of accuracy and time on task.

Before explaining the “nuts and bolts” of verbal protocol analysis, we should note that there are several caveats to using this method. First, it is important to distinguish between concurrent and retrospective verbal protocols. Concurrent protocols are delivered at the same time as the participant performs the task and are ideally unprompted by the experimenter. Retrospective protocols are provided after the task has been completed, in response to specific questions posed by the experimenter, such as “How did you solve this problem, and why did you choose that particular strategy?” The chief problem with retrospective protocols is that people do not necessarily have access either to what they did or to why they did it (see Nisbett and Wilson, 1977, for a full discussion of this issue; Ericsson and Simon, 1993, for an overview of research on retrospective versus concurrent protocols; and Austin and Delaney, 1998, for a discussion of when retrospective protocols may be reliably used).

Second, it is important to differentiate between having a participant think aloud while problem solving and having him or her describe, explain, or rationalize what he or she is doing. The main problem with participants providing explanations is that this removes them from the process of problem solving, causing them to think about what they are doing rather than simply doing it; as a result, such explanations may change their performance. Ericsson and Simon (1993) provide a complete discussion of the effects of such “socially motivated verbalizations” on problem solving behavior, and later in this chapter we suggest ways to reduce the likelihood that they will happen.

Third, it is important to be aware of the potential for subjective interpretation at all stages of collecting and analyzing verbal protocol data, from interpreting incomplete or muttered utterances to assigning those utterances a code. The danger is that a researcher may think he or she “knows” what a participant intended to say or meant by some utterance and thus inadvertently misrepresent the verbal protocol. We will describe ways to handle the data that reduce the likelihood of researcher bias affecting interpretation of the protocols, and we will suggest some methodological safeguards that can also help minimize this risk.

Another important consideration for the researcher is the level of time commitment involved in verbal protocol studies. Simply put, this type of research involves a serious investment of time and resources, not only in collecting data (resources in terms of experimenters, participants, and equipment), but also in transcribing, coding, and analyzing data. It is simply not possible for one researcher to manage all the tasks associated with verbal protocol analysis of a complex task/domain with many participants. Furthermore, both collecting
and coding the data require a high level of training, and thus it is important that members of the research team are willing to make a commitment to complete the project. A coder who quits midstream causes a major setback to the research, because a new recruit must be found and trained before the work can continue.

Verbal protocol data are “expensive data.” Depending on the task to be performed, experimental sessions generally last one or two hours, and participants must usually be “run” singly. In some studies, participants in an undergraduate psychology pool are not suitable candidates for the research; thus, time must be spent identifying and recruiting appropriate participants. In the case of expert studies or “real world” performance, data collection often involves travel to the participant’s work site and may require coordination with supervisors and coworkers, or even special permission to enter a site. Those who agree to participate in the research usually must give up valuable work time, and schedule changes or other unexpected occurrences may prevent their participation at the last minute. Thus, perhaps even more than in laboratory research, it is crucial to plan as much as possible and to take into account the many contingencies that can arise. Such contingencies range from the mundane (for example, triple-checking that the equipment is functioning properly, that backup batteries are available, and that all the relevant forms and materials are in place) to the unpredictable (for example, having backup plans for cancellation or illness).

Despite these caveats, we believe that collecting verbal protocol data is a flexible methodology that is appropriate for any research endeavor in which the goal is to understand the processes that underlie performance. We think it will be especially fruitful in the domain of virtual reality (VR)/virtual environments (VEs). Not much protocol analysis has been done in this area; however, the technique offers many insights into cognitive processing and is likely to be very useful in comparing VR/VE performance to real world performance, for example.

Researchers involved in education and training, including VE training, may have many goals—for example, understanding how effective a given training program is, what its strengths and weaknesses are, how it can be improved, how effectively students use the system, whether students actually learn what the instruction is designed to teach, what difficulties they have in mastering the material to be learned, what difficulties they have using the delivery system, and so on. Outcome measures, such as time on task or accuracy on a knowledge assessment test, provide only partial answers, and only to some of these questions. Process measures, on the other hand, provide a much richer, more detailed picture of what occurs during a learning or problem solving session and allow a much finer-grained analysis of both learning and performance. Verbal protocols are the raw data from which extremely useful process data can be extracted. Although our emphasis in this chapter is only on verbal protocol data, combining different types of process data (such as verbal protocols, eye-track data, and mouse-click data) can create an extremely powerful method to understand not only what is learned, but also how learning occurs.
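As a small illustration of combining process-data streams, the sketch below merges timestamped protocol utterances with mouse-click logs into a single ordered record. The record formats are invented for the example; real eye-track or click logs would of course be richer.

# Toy merge of two timestamped process-data streams into one timeline.

utterances = [(12.4, "utterance", "okay so I need the map first"),
              (20.1, "utterance", "why is this panel greyed out")]
clicks = [(13.0, "click", "open_map"), (19.7, "click", "panel_settings")]

timeline = sorted(utterances + clicks)   # tuples sort by timestamp first
for t, kind, payload in timeline:
    print(f"{t:6.1f}s  {kind:9s}  {payload}")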
DATA COLLECTION

A number of issues related to data collection should be addressed before the first participant even arrives! Several confidentiality issues surround the act of collecting data, which stem from the potentially sensitive nature of video and/or audio recording participants as they perform tasks. First, in obtaining informed consent, it is crucial that participants fully understand the recording process and have a specific option to agree or disagree to have their words, actions, and bodies recorded. Thus, consent forms need to contain two lines, as follows:

____ I agree to have my voice, actions, and upper body videotaped.
____ I do not agree to have my voice, actions, and upper body videotaped.

Of course, the actual wording will depend on the scope of recording planned—it might additionally include gestures or walking patterns, for example. Second, agreement to have one's words and actions recorded should not be taken as agreement to have one's words and actions shared with others. Although participants might be quite willing to participate in a study and to be recorded while doing so, they might be uncomfortable at the thought of their data being shared, for example, at conferences and other research presentations. If the researcher is planning to use video clips as illustrations during research talks, it is important to have an additional section of the consent form that specifically asks participants whether or not they agree to have their data shared. They have the right to refuse, and this right must be honored. As mentioned earlier, refusal does not necessarily disqualify participants from the study; it just means that a participant's video cannot be shared with others. The third issue relating to confidentiality is that of storing the data. It is especially important to reassure participants that the data will be stored confidentially—that data will be identified by a number rather than a participant's name, that data will be locked away or otherwise secured, and that access to the data will be strictly limited to members of the research team.

One of the advantages of verbal protocol data is its richness; the downside of this richness is that the data quickly become voluminous. Even a relatively short session can result in pages of transcribed protocol that must be coded and analyzed. In addition, participants, such as experts, may be scarce and difficult to recruit. For these reasons, verbal protocol studies generally have fewer subjects than other kinds of psychological studies. How many participants are needed? The answer depends on the nature of the study. We (and others) have had as few as one person participate in a particular study. Such single-subject case studies are extremely useful for generating hypotheses that can then be tested experimentally on a larger sample. In general, however, we have found that the more expert the participants, the less variance there is in relevant aspects of performance. With less expert participants, we generally find that slightly higher numbers of participants work well. Ideally, we try to include 5 to 10 novices, although this is not always possible. Studies with larger numbers of participants generally involve less exploratory research, for which a coding scheme is already well established and for which only a subset of the data needs to be coded and
analyzed. As with all research, the precise number of participants will rest on the goals of the study (exploratory, confirmatory, case study, and so forth), the nature of the participants (experts, novices, or naive subject pool participants), and practical considerations concerning recruitment and accessibility of participants.

COLLECTING DATA

Adequate recording equipment is a must. Although we have occasionally used audio recording alone, we have found that the additional information contained in video recording can be very valuable when it comes to analyzing the data. Furthermore, we have found that having more than one camera is often worthwhile, for several reasons. First, an additional camera (or cameras) provides an automatic backup in the case of battery or other equipment failure. In addition, a second camera can be aimed at a different area of the experimental setup, thus reducing the need for the experimenter to attempt to follow the participant around while he or she performs the task. In some cases, we have even used three cameras, as additional cameras recording from different angles allow the experimenter to reconstruct more precisely what was happening. We are also somewhat extravagant in our use of videotapes. Even when a session is quite short and leaves a lot of unused space on a tape, we use a new videotape for each participant or new session. Especially with naturalistic tasks, it is difficult to anticipate how long a participant will take. Some people work quite slowly, and underestimating the time they will take can lead to having to change a tape mid-session. Our view is that videotapes and batteries are relatively cheap, whereas missing or lost data are extremely expensive. In short, we have found that thinking ahead to what data we might need has helped us to have fewer regrets once data collection is completed.

The most critical aspect of verbal protocol data is the sound, and a high quality lapel microphone is well worth the cost. Many participants are inveterate mutterers, and most participants become inaudible during parts of the task. These periods of incoherence often occur at crucial moments during the problem solving process, and so it is important to capture as much of the speech as possible so that it can be transcribed. The built-in microphone on most cameras is simply not powerful enough to do so. Using a lapel microphone also allows the researcher to place the camera at a convenient distance behind the participant, capturing both the sound and the participant's interactions with experimental materials but not his or her face. Attaching an external microphone (such as a zoom or omnidirectional microphone) to any additional cameras that are used will allow the second, or backup, camera to capture sound sufficiently well in most cases, although muttering or very softly spoken utterances may be lost. Having such a reliable backup sound recording system is always a good idea. It is crucial to begin the session with good batteries and to check, before the participant begins the task, that sound is actually being recorded. The experimenter should also listen in during the session to make sure that nothing has gone
amiss. Although this point may seem obvious, most researchers who collect verbal protocols have had the unhappy experience of reviewing a videotape only to find there is no sound. We hope to spare you that pain!

As with any psychological research, participants should be oriented to the task before beginning the actual study. In the case of verbal protocols, this orientation includes training in actually giving verbal protocols. We have a standard training procedure that we use with all participants.¹ During the task, if a participant is silent for more than three or four seconds, it is important to jump in with a prompt. The researcher should simply say, "Please keep talking" or even "Keep talking." More "polite" prompts, such as "What are you thinking?" are actually detrimental, because they invite a more social response from the participant (such as, "Well, I was just wondering whether . . .") and thus remove the participant from the process of problem solving into the arena of social interaction.

Occasionally, a participant will be unable to provide a verbal protocol, and this difficulty generally manifests itself in one of two ways: either the participant remains essentially silent throughout the task, talking aloud only when prompted and then lapsing back into silence, or the participant persists in explaining what he or she is doing or about to do. In the first case, there is very little the experimenter can do, and depending on the situation, it may be better to bail out rather than subject the participant to this uncomfortable situation, since the data will be unusable in any case. One way to help prevent this from happening is to ensure beforehand that participants are fully at ease in whatever language the protocol is to be given. In the second case, the experimenter must make a judgment call as to whether to pause the session and provide some retraining. Retraining will be disruptive and may make the participant even more nervous; however, it is unlikely that a participant will change strategy midstream. In either case, protocols that consist mostly of explanations cannot be used. These problems should not be confused with protocols that contain a great deal of muttering, broken grammatical structures, or incomplete or incoherent thoughts. Although a challenge to the transcriber and the coder, such protocols are quite acceptable and, in fact, represent "good" protocols, since people rarely think in complete sentences.

The objective during verbal protocol collection is to keep the participant focused on the task, without distraction. One obvious way to do this is to make sure that the room in which the session is taking place is not in a public place and that a sign on the door alerts others not to interrupt. Less obviously, it means that the experimenter should resist the urge to ask questions during the session. If domain related questions do arise, the experimenter should make a note of them and ask after the session is completed. Querying participants' explicit knowledge after task completion does not jeopardize the integrity of the verbal protocols, because it taps into stable domain knowledge that is not altered by the retrospective process. Asking questions during problem solving, however, is disruptive and may alter the participants' cognitive processes.

¹ A copy of this procedure is available at http://www.nrl.navy.mil/aic/iss/aas/cog.complex.vis.php
PROCESSING THE DATA

Once data collection is completed, verbal protocols must be transcribed and coded before data analysis can begin. Unfortunately, there are no consistently reliable automated tools to perform transcription. Fortunately, transcription can be carried out by less-skilled research assistants, with a couple of caveats. First, transcribers should be provided with a glossary of domain relevant terms; otherwise, when they encounter unfamiliar words, they are likely to misinterpret what is said. Second, transcribers should be trained in segmenting protocols. Both these steps will reduce the need for further processing of the transcriptions before coding can begin. How protocols are segmented depends on the grain size of the analyses to be performed and will therefore vary from project to project. Careful thought should be given to the appropriate segmentation, because this process will affect the results of the analyses. It is beyond the scope of this chapter to illustrate in detail the many options for segmenting protocols; however, Chi (1997) provides a thorough discussion of different methods and levels of segmentation and their relationships with the type of analyses to be performed. In general, we have found that segmenting according to complete thought provides us with the flexibility to code data on a number of different dimensions. By "complete thought" we basically mean a clause (a subject and a verb), although given the incomplete nature of human speech, utterances do not always fall neatly into a clausal structure. However, we have had excellent agreement among transcribers by using the complete thought rubric. Table 18.1 shows an example of this segmentation scheme from the meteorology domain.

Table 18.1. Example of Segmenting by Complete Thought

Utterance
OK, 0Z surface map shows a low pressure in the Midwest moving up
there's a little bubble high ahead of it
that's pretty much stacked well with 500 millibars with another low
so that's looking at severe weather in the Midwest, easily
it's at 700 millibars
just shows increasing precipitation
clouds are going to be increasing for the next 24 hours in western Ohio
Oh, what level are you?
850
they're kind of in a little bit of a bubble high
so 0Z looking at low cloud cover
no precipitation at all

As the illustration shows, the length of each utterance can vary from a single word (for example, "850") to fairly lengthy (for example, "clouds are going to be increasing for the next 24 hours in western Ohio"). In this domain, we are generally interested in the types of mental operation participants perform on the visualizations, such as reading off information, transforming information (spatially or otherwise), making comparisons, and the like, and these map well to our segmentation scheme. The utterances "it's at 700 millibars" and "clouds are going to be increasing for the next 24 hours in western Ohio" would each be coded as one "read-off information" event, even though they differ quite a bit in length. Of course, it would be possible to subdivide longer utterances further (for example, "clouds are going to be increasing // for the next 24 hours // in western Ohio"); however, according to our coding scheme this is still one read-off information event, which would now span three utterances. Having a single event span more than one utterance makes data analysis harder. The two most important guidelines in dividing protocols are (1) consistency and (2) making sure that the segments map readily to the codes to be applied.

Although transcription cannot be automated, a number of video analysis software programs are available to aid in data analysis, and some of these programs allow protocols to be transcribed directly into the software. We have not found the perfect protocol analysis program yet, though we have used MacShapa, Transana, and Noldus Observer. Looking ahead and deciding what kind of analysis software will be used will also save a great deal of time by reducing the need for further processing of transcripts in order to successfully import them to the video analysis software. Another option that we have successfully used is to transcribe protocols in a spreadsheet program, such as Excel, with each segment on a different row. We then set up columns for our different coding categories and can easily perform frequency counts, create pivot tables, and import the data into a statistical analysis program. This method works very well if the video portion of the protocol is irrelevant; however, in most cases we are interested not only in what people are saying, but what they are doing (and looking at) while they are saying it. In this case, using the spreadsheet method is disadvantageous, because the transcription cannot easily be aligned with the video. Video analysis software has the advantage that video timestamps are automatically entered at each line of transcription, thus facilitating synchronizing text and video.

Once data have been transcribed and segmented, they are ready for coding. Unless the research question can be answered by some kind of linguistic coding (for example, counting the number of times a certain word or family of words is used), a coding scheme must be developed that maps to the cognitive processes of interest. Establishing and implementing an effective and reliable coding scheme lies at the heart of verbal protocol analysis, and this is usually the most difficult and time consuming part of the whole process. In some cases, researchers will have strong a priori notions about what to look for. However, verbal protocol analysis is a method that is particularly useful in exploratory research, and in these cases, researchers may approach the protocol with only a general idea.
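Before turning to how coding schemes are developed and refined, here is a minimal sketch in Python (using the pandas library) of the segment-per-row, column-per-code tabulation described earlier. The utterances come from Table 18.1, but the code labels and the particular coding assignments are our own illustrative assumptions, not a published coding scheme.

```python
import pandas as pd

# Each row is one segment (a "complete thought"); the code column applies a
# hypothetical mental-operation scheme (read-off, transform, compare).
protocol = pd.DataFrame(
    [
        ("it's at 700 millibars", "read-off"),
        ("just shows increasing precipitation", "read-off"),
        ("clouds are going to be increasing for the next 24 hours "
         "in western Ohio", "read-off"),
        ("they're kind of in a little bit of a bubble high", "transform"),
        ("so 0Z looking at low cloud cover", "compare"),
    ],
    columns=["segment", "code"],
)

# Frequency count per code, the analogue of a spreadsheet pivot table.
print(protocol["code"].value_counts())
```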
Regardless of the stage of research, coding schemes must frequently be elaborated or refined to match the circumstances of a given study. To illustrate this point, consider a study we performed to determine how meteorologists handle uncertainty in the forecasting task and in the visualizations they use. Our hypothesis was that when uncertain, they would use more spatial transformations than when certain. Our coding scheme for spatial transformations was already in place. However, we had several possible ways that we could code uncertainty. One option was to code it linguistically. We developed a range of linguistic pointers to uncertainty, such as markers of disfluency (um, er, and so forth), hedging words (for example, "sort of," "maybe," and "somewhere around"), explicit statements of uncertainty (for example, "I have no idea" and "what's that?"), and the like. Although this scheme proved highly reliable, with very little disagreement between coders, it turned out to be less useful for the research question we were trying to answer. A forecaster could be highly uncertain overall, but because he or she had no doubt about that uncertainty, the utterances contained few, if any, linguistic markers of uncertainty. Table 18.2 shows an example of this mismatch. The participant was trying to forecast precipitation for a particular period, and the two models (ETA and Global Forecast System [GFS]) she was using disagreed. Model disagreement is a major source of uncertainty for forecasters, so we were confident that at a more "global" level, the participant was highly uncertain at this juncture. Her uncertainty was also supported by her final utterance (that the forecast was going to be hard). However, the individual utterances that comprise this segment express no uncertainty on a linguistic level.

Table 18.2. Mismatch between Linguistic and Global Coding Schemes: Uncertain Episode but Certain Utterances

Utterance:
OK, now let's compare to the ETA
. . . and they differ
Interesting
Well, they don't differ too much
they differ on when the precipitation's coming
Let's go back to 36 hours
OK, 42, 48, 54
Yeah, OK, so they have precip coming in 48 hours from now
Let me try to go back to GFS and go to 48 hours and see what they have
Well, OK, well they don't differ
Well, yeah, cause they're calling for more so maybe this is gonna be a hard precip forecast

Linguistic Coding of Uncertainty: No linguistic uncertainty expressed in any utterance.
Global Coding of Uncertainty: Model disagreement indicates overall uncertainty (explicitly articulated in the last utterance).

Our hypothesis about the use of spatial transformations mapped to a more global concept of uncertainty than could be captured by this linguistic coding scheme. By using contextual clues (the model disagreement, the participant's dithering between whether the models disagreed or not, and her frustration at the difficulty of the forecast), we were able to correctly code this sequence as an episode of uncertainty. Conversely, a forecaster could be overall highly certain about something, but use a lot of uncertain linguistic markers. Table 18.3 illustrates this aspect of the problem.

Table 18.3. Certain Episode but Uncertain Utterances (Linguistic Markers of Uncertainty in Italics)

Utterance                                          Linguistic Marker of Uncertainty
Look at the AVN MOS
. . . and they're in agreement
say it gets down to 43 degrees tonight
um, scattered clouds, it may be partly cloudy      um; it may be
uh, tomorrow 72                                    uh
. . . the other model had upper 60s, lower 70s
that's about right                                 about right
uh, Wednesday night . . . clouds                   uh
I think it said getting around, down, around 52    I think; around; around
Thursday, 64 maybe                                 maybe

Global Coding of Uncertainty: Models agree, indicating the forecaster is certain about the temperature (explicitly articulated in the utterance "That's about right").

We solved our coding problem by dividing the protocol into segments, or episodes, based on participants' working on a particular subtask within the general forecasting task, such as figuring out the expected rainfall or temperature for a certain period, or resolving a discrepancy between two models. We then coded each episode as overall certain, uncertain, or mixed. Tables 18.2 and 18.3 illustrate uncertain and certain episodes, respectively. Using this more global coding of uncertainty, we could evaluate the use of spatial transformations within certain and uncertain episodes. This level of coding uncertainty was a much better match for our purposes, because spatial transformations occur within a broader context of information uncertainty than can be captured by mere expressions of linguistic uncertainty.

Unfortunately, it is often only when the scheme is applied that its weaknesses emerge; consequently, developing a coding scheme is a highly iterative process. Generally, our procedure is to begin with the smallest possible number of codes, which are then described in rather broad terms. We then attempt to apply the coding scheme to a small portion of the data. At this stage, we usually find problems
coding specific utterances. It is important to keep very careful, detailed notes during this phase as to the nature of the coding problem. We can then look for patterns in the problematic segments, which often leads to subdividing or otherwise revising a particular code.

There are two equally important goals to keep in mind in developing a coding scheme. First, the scheme must be reliable; that is, it should be sufficiently precisely delineated that different coders will agree on the codes applied to any given utterance. Second, the scheme must be useful; that is, it must be at the right "level" to answer the question of interest. The linguistic scheme for coding uncertainty described above was highly reliable, but did not answer our need. The episodic coding scheme captured the relevant level of uncertainty, but required several refinements before we were confident that it was reliable. First, we had to obtain agreement on our division of the protocols into episodes, and then we had to obtain agreement on our coding of each episode as certain, uncertain, or mixed. Obtaining agreement, or establishing inter-rater reliability, is essential in order to establish the validity of a coding scheme (see Cohen, 1960, for a discussion of establishing agreement).

Once a coding scheme appears to be viable, the next step is to have two independent coders code the data and then compare their results. Obviously, coders must be well trained prior to this exercise. We establish a set of training materials, based on either a subset of the data or, preferably, a different dataset to which the same coding scheme can be applied. Using a different dataset reduces bias when it comes to the final coding of the data, because the coders are not prejudiced by previously having seen and worked with—and having discussed—the transcripts. The training materials consist of a set of written instructions describing each of the codes and a set of examples and nonexamples of the code. We find examples that are as clear and obvious as possible, but we also use "rogue" examples that meet some aspect of the coding criteria, but are nonetheless not illustrations of the code. These "near-misses" help define the boundaries of what does and does not count as a particular example of a code. Beside each example or nonexample, we provide an explanation of the coding decision.

Once the coders believe they understand the coding scheme, they code a small subset of the data, and we meet to review the results. Once again, it is important for the coders to keep detailed notes about issues they encounter during this process. The more specific the coder can be, the more likely the fundamental cause of the coding problem can be addressed and resolved. Initially, we look simply at points of agreement and disagreement in order to get a general sense of how consistently the codes are being applied. Although it is tempting to ignore points of agreement, we find it useful to look at individual coders' reasoning in these instances in order to make sure that agreement is as watertight as possible. Points of disagreement, of course, provide a wealth of useful information in refining and redefining the coding scheme. Often, resolution is simply a matter of providing a crisper definition of the code. Sometimes, however, resolution involves subdividing one coding category into two or collapsing two separate codes into one when the distinction proves blurry or otherwise not
useful. The purpose of this stage is to identify and resolve the coders' uncertainties about the coding scheme in order that inter-rater reliability (IRR) might be established. In order to obtain IRR, one coder codes the entire dataset, and the second coder codes a subset of the data. How much double-coding must be done depends in some measure on the nature of the transcripts and the coding scheme. The more frequently the codes occur, the less double-coding is necessary—usually 20–25 percent of the data is sufficient. For codes that occur rarely, a larger portion of the data must be double-coded in order to have sufficient instances of the code. If a code is infrequent, it is less likely to occur in any given subset of the data, and coders are more likely to be in agreement. However, they are agreeing that the code did not occur, which does not speak to the consistency with which they identify the code. After both coders have completed this exercise, we investigate their level of agreement.

There are two approaches to establishing agreement. The first is simply to count the number of instances in which coders agreed on a given code and calculate the percent agreement. This approach, however, generally results in an inflated measure of agreement, because it does not take into account the likelihood of agreement by chance. A better measure of agreement is Cohen's kappa (Cohen, 1960). This involves constructing a contingency table and marking the number of times the coders agreed on the code's occurrence or nonoccurrence² and the number of instances in which each coder said the code occurred but the other coder did not. Even if codes are scarce and the majority of instances fall into the "no-occurrence" cell, Cohen's kappa makes the appropriate adjustment for agreement by chance. Cohen's kappa is easily calculated as follows:

κ = (Po − Pc) / (1 − Pc)

where Po is the proportion of observed agreement and Pc is the proportion of agreement predicted by chance. Some researchers define poor reliability as a kappa of less than 0.4, fair reliability as 0.4 to 0.6, good reliability as 0.6 to 0.8, and excellent as greater than 0.8. We believe that kappa values lower than 0.6 indicate unacceptable agreement and that in such cases the coding scheme must undergo further revision. A low value for kappa need not be considered a deathblow to the coding scheme. Most coding schemes go through a cycle of construction, application, evaluation, and revision several times before they can be considered valid. Although the process can be tedious, it is a vital part of using verbal protocols, because of the dangers of subjectivity and researcher bias raised earlier in this chapter. It should also be noted that coding schemes will most likely never achieve perfect agreement, nor is perfect agreement necessary.

² Nonoccurrence is coded in addition to occurrence in order for coding to be symmetrical. Overlooking agreement about nonoccurrence can lead to a lower estimate of agreement than is warranted. This is especially an issue if instances of a code are relatively rare.

When coders disagree, there are two options. First, all disagreements can be excluded from analysis. The advantage of this approach is that the researcher can be confident that the included data are reliably coded. The disadvantage is obviously that some valuable data are lost. A second option is to attempt to resolve disagreements by discussion. Often, for example, one coder may simply have misunderstood some aspect of the transcript; in such cases, the coder may decide to revise his or her original coding. If, after discussion, coders still disagree, these instances will have to be excluded from analysis. However, the overall loss of data using this method will be less than if disagreements are automatically discarded. Either method is acceptable, although it is important to specify how disagreements were handled when presenting the results of one's research. After IRR is established and all the data are coded, the analysis proceeds as with any other research, using quantitative statistical analyses or qualitative analytical methods, depending on the research questions and goals.
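To illustrate the calculation, the following is a minimal Python sketch of Cohen's kappa computed from a 2 × 2 occurrence/nonoccurrence contingency table; the cell counts here are hypothetical.

```python
def cohens_kappa(n11, n10, n01, n00):
    """Cohen's kappa for two coders' occurrence/nonoccurrence judgments.

    n11: both coders say the code occurred
    n00: both coders say the code did not occur
    n10: coder A says occurred, coder B says not
    n01: coder B says occurred, coder A says not
    """
    n = n11 + n10 + n01 + n00
    p_o = (n11 + n00) / n                    # observed agreement
    p_a = (n11 + n10) / n                    # coder A's "occurred" rate
    p_b = (n11 + n01) / n                    # coder B's "occurred" rate
    p_c = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    return (p_o - p_c) / (1 - p_c)

# Hypothetical double-coded subset with a relatively rare code: nonoccurrence
# (n00) dominates, which is why raw percent agreement (here 0.965) would
# overstate reliability.
kappa = cohens_kappa(n11=18, n10=4, n01=3, n00=175)
print(round(kappa, 2))  # 0.82, "excellent" by the thresholds given above
```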
PRESENTING RESEARCH RESULTS

There are some specific issues in presenting research based on verbal protocol analysis. Although the use of verbal protocols has been established as a sound method and has therefore gained increased acceptance in the last few years, such research may nevertheless be difficult to publish. Some reviewers balk at the generally small sample sizes. One approach to the sample size issue is to conduct a generalizability analysis (Brennan, 1992), which can lay to rest concerns that variability in the data is due to idiosyncratic individual differences rather than to other, more systematic factors.

Another concern is the possibility of subjectivity in coding the data and the consequential danger that the results will be biased by the researcher's predilection for drawing particular conclusions. There are two ways to address this concern. The first is to take extreme care not only in establishing IRR, but also in describing how the process was conducted, making sure, for example, that a sufficient proportion of the data was double-coded and that this is reported. Specifying how coders were trained, how coders worked independently, without knowledge of each other's codings, and how disagreements were resolved can also allay this concern. The second action is to provide very precise descriptions of the coding scheme, with examples that are crystal clear, so that the reader feels that he or she fully understands the codes and would be able to apply them. Although this level of detail consumes a great deal of space, many journals now have Web sites where supplementary material can be posted. When describing a coding scheme, good examples may be worth many thousands of words!

Even with these safeguards, research using verbal protocols may still be difficult to publish, especially if it is combined with an in vivo methodology. In vivo research is naturalistic in that it involves going into the environment in which participants normally work and observing them as they perform their work. The stumbling block to publication is that the research lacks experimental control. However, the strength of the method is precisely that it is less likely to change people's behavior, as imposing experimental controls may do. We have found that an excellent solution to this vicious cycle is to follow our in vivo studies with
controlled laboratory experiments that test the conclusions we draw from the in vivo work. In one project, we observed several expert scientists in a variety of domains as they analyzed their scientific data. Our observations suggested that they used a type of mental model we called “conceptual simulation” and that they did so especially when the hypothesis involved a great deal of uncertainty. However, our conclusions were correlational; in order to test whether the uncertainty was causally related to conceptual simulation, we conducted an experimental laboratory study in which we manipulated the level of participants’ uncertainty. In this type of follow-up study, it is especially important to design a task and materials that accurately capture the nature of the original task in order to elicit behavior that is as close as possible to that observed in the natural setting.
CONCLUSION

We have found verbal protocol analysis an invaluable tool in our studies of cognitive processes in several domains, such as scientific reasoning, analogical reasoning, graph comprehension, and uncertainty. Our participants have ranged from undergraduates in a university subject pool, to journeymen meteorology students, to experts with advanced degrees and decades of experience. Although we have primarily used verbal protocol analysis as a research tool, we believe that it can be used very effectively in applied settings, such as the design, development, and evaluation of virtual environments for training, where it is important to discern participants' cognitive processes.

There are other methods of data collection that also require participants' verbal input, such as structured and unstructured interviews, knowledge elicitation, and retrospective protocols. These methods assume that people have accurate access to their own cognitive processes; however, this is not necessarily the case. Consider, for example, the difficulty of explaining how to drive a stick-shift vehicle, compared with actually driving it. People frequently have faulty recall about their actions and motives or may describe what they ought—or were taught—to do rather than what they actually do. In addition, responses may be at a much higher level of detail than the researcher needs, or they may skip important steps in the telling that they would automatically perform in the doing. Although in some interview settings, such as semistructured interviews, the interviewer has the opportunity to probe the participant's answers in more depth, unless the researcher already has some rather sophisticated knowledge of the task, the questions posed may not elicit the desired information.

The strength of concurrent verbal protocols is that they can be considered a reflection of the actual processes people use as they perform a task. Properly done, verbal protocol analysis can provide insights into aspects of performance that might otherwise remain inside a "black box" accessible only by speculation. Although they are time consuming to collect, process, and analyze, we believe that the richness of the data provided by verbal protocols far outweighs the costs. Ultimately, as with all research, the purpose of the research and the resources available to the researcher will determine the most appropriate method; however,
the addition of verbal protocol analysis to a researcher's repertoire will open the door to a potentially very productive source of data that invariably yield interesting and often surprising results.

REFERENCES

Austin, J., & Delaney, P. F. (1998). Protocol analysis as a tool for behavior analysis. Analysis of Verbal Behavior, 15, 41–56.
Brennan, R. L. (1992). Elements of generalizability theory (2nd ed.). Iowa City, IA: ACT Publications.
Chi, M. T. H. (1997). Quantifying qualitative analyses of verbal data: A practical guide. The Journal of the Learning Sciences, 6(3), 271–315.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Ericsson, K. A. (2006). Protocol analysis and expert thought: Concurrent verbalizations of thinking during experts' performance on representative tasks. In K. A. Ericsson, N. Charness, P. J. Feltovich, & R. R. Hoffman (Eds.), The Cambridge handbook of expertise and expert performance (pp. 223–241). New York: Cambridge University Press.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (2nd ed.). Cambridge, MA: MIT Press.
Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231–259.
van Someren, M. W., Barnard, Y. F., & Sandberg, J. A. C. (1994). The think aloud method: A practical guide to modeling cognitive processes. London: Academic Press.
Part IX: Capturing Expertise in Complex Environments
Chapter 19
DEVELOPMENT OF SIMULATED TEAM ENVIRONMENTS FOR MEASURING TEAM COGNITION AND PERFORMANCE
Jamie Gorman, Nancy Cooke, and Jasmine Duran
ASSESSING TEAMS IN SIMULATED ENVIRONMENTS

From network centric warfare and intelligence analysis to emergency response and modern medical care, today's sociotechnical systems are rife with tasks comprising teams of human and machine players interacting on many different levels while working interdependently toward a common goal. Catastrophic failures in today's high technology sociotechnical systems are often due to factors in the social components of these systems (Weir, 2004). This highlights the need for a deeper scientific understanding of the social and team related causes of such failures. However, given the level of complexity of interactions among human and machine players in these environments, application of scientific principles is challenging from both traditional laboratory and naturalistic study perspectives. Through the parallel development of embedded measurement and scenarios highlighting interaction behaviors of interest, simulations provide a middle ground between the real world and the lab for advancing the scientific study of teams of human and machine players and, ultimately, an understanding of how to train or intervene in order to correct deficient team interaction behaviors.

Teams have been defined specifically as "a distinguishable set of two or more people who interact dynamically, interdependently, and adaptively toward a common and valued goal/object/mission, who have each been assigned specific roles or functions to perform, and who have a limited life span of membership" (Salas, Dickinson, Converse, & Tannenbaum, 1992, p. 4). Team cognition is the process by which the heterogeneously skilled team members interact to reason, decide, think, plan, and act as a unit. The necessity of teams for accomplishing complex tasks and an appreciation for the uniqueness of the team as a unit have turned attention toward team research and training. Team members are generally exposed to some individual training followed by team training either using classroom approaches, such as Crew Resource Management (Salas, Wilson, Burke, &
Wightman, 2006), and/or via exposure to the team task through multiplayer simulations or games (Cannon-Bowers, Salas, Duncan, & Halley, 1994; Salas, Bowers, & Rhodenizer, 1998). Simulations and virtual environments not only provide training solutions for teams, but also serve as testbeds for research on teams. Research conducted in these testbeds is typically more ecologically valid than the highly controlled tasks of traditional laboratory research. However, simulations and games need to be designed to exercise the targeted behavior. Further, there is little value in this technology without some means to measure and assess the constructs of interest, in this case, team performance and team cognition. Our definition of team cognition as the process by which heterogeneously skilled team members interact to reason, decide, think, plan, and act as a unit equates interaction processes to team cognition. Because process variance is unique to teams and directly related to team performance, we will emphasize process measurement (holistic) rather than the aggregate cognition of the team members (collective). In this chapter we show how simulations can be designed to optimally exercise team cognition and how measurement can be integrated within these team simulations.

Team Cognition

Theoretically, team cognition has been viewed either as collective cognition, which is the sum of the cognitive resources and abilities each individual brings to the task (for example, Langan-Fox, Code, & Langfield-Smith, 2000), or holistically as an emergent property of the interaction processes of team members (for example, Cooke, Gorman, & Rowe, in press). Most views do not entirely exclude the influence of collective or holistic cognition, but vary more as a matter of emphasis of one over the other. Whether a collective or holistic perspective on team cognition is taken has particular implications for how team cognition is measured and the types of interventions chosen to augment team cognition. Collective cognition may work well for homogeneous groups of individuals (for example, juries). However, this perspective can be inadequate for explaining cognition in teams with highly heterogeneous team members. Alternatively, empirical evidence in the domain of heterogeneous team command and control suggests that interactions, which are the focus of the holistic perspective, are predictive of team performance, whereas collective knowledge metrics are not (Cooke et al., 2004; Cooke, Gorman, Duran, & Taylor, 2007; Cooke, Gorman, Pedersen, & Bell, 2007; Cooke, Kiekel, & Helm, 2001). Based on these findings, in this chapter we emphasize measures based on the holistic perspective of team cognition. Likewise it is important to develop simulation testbeds that are capable of exploiting team interactions for the purpose of measurement.

Scope of Team Simulation

Team simulation encompasses a variety of possible technologies. Table 19.1 lays out some of the possibilities in a matrix in which the simulation can be focused on the team members, the team task, or both.
Table 19.1. Space of Possible Training Technologies Involving Teams

                        Real Team Members                  Simulated Team Members

Real Team Task          Operational environment and        Synthetic agents and training
                        on-the-job training                by computer

Simulated Team Task     Games, training environments,      Synthetic tasks with one or more
                        and synthetic tasks                synthetic agents or opposition forces
As exemplified in Table 19.1, simulation technology can be used to reproduce the task (or aspects of the task) or team members and in some cases entire teams (for example, simulated opposition forces). The upper left quadrant reflects the case in which team training occurs in the actual context (on the job) with the actual team members. Due to the domain's demands for personnel and to resource shortages, training often occurs in this "seat-of-the-pants" context. If we accept very high fidelity simulation as close to a "real" task, then this quadrant may also include live training exercises, such as red versus blue force exercises in the military and emergency response exercises for emergency management. In these exercises measurement is critical, and more of it is needed; however, it is difficult to stage such exercises, and thus it is difficult to conduct controlled research in real team task environments. The use of virtual environments, including simulations, therefore becomes perhaps the most viable option for meeting the challenge of balancing experimental control with ecological validity. In this chapter we do not address the upper left quadrant, but instead advocate the use of simulation in cases when adequate resources (for example, team members) exist.

Most of the research and development on team simulation falls into the lower left quadrant of Table 19.1, and this chapter deals primarily with that quadrant: the case in which the task or part of the task is simulated and the actual team members interact in this context for training or research purposes. Some examples include the Air Force Research Laboratory's F-16 four-ship simulator (Schreiber & Bennett, 2006), gaming (for example, multiplayer Internet video games such as Counter-Strike), and STEs (synthetic task environments) for research (Schiflett, Elliott, Salas, & Coovert, 2004). It is important to point out, however, that simulation of team members or entire teams is becoming increasingly prevalent. A simulated team member may interact with an actual team member in an operational setting (top right quadrant of Table 19.1). For example, a simulated team member could serve as a synthetic agent that meets the specific needs and requests of an individual (for example, searching for relevant information at the right time). The simulated team member or agent could also reproduce a source of expertise that is missing from the team. GPS (global positioning system) navigation systems in cars provide an example of this kind of simulated team member that interacts in the context of a real situation. Finally, as represented by the bottom right quadrant in Table 19.1, another increasingly common trend in simulation technology for teams is to simulate
the task as well as one or more team members (Gluck et al., 2006). Although this chapter focuses on synthetic tasks and real team members, some of the measurement methods considered are also applicable to this last quadrant (that is, synthetic tasks with synthetic team members).

Purpose of Assessing Team Cognition through Simulation

Even if synthetic environments and their simulated scenarios are designed to elicit team level thinking or to exercise team level cognitive skills, this technology is of little use without comparable measurement technology. It is critical to measure team cognition in these team simulations for several reasons: (1) The results of measures of team cognition will not only provide feedback on whether we are achieving training objectives (that is, team performance is improving), but should also provide diagnostic information that can help in understanding the basis for successes or failures in this regard. (2) The assessment of team cognition can also be used to evaluate the success or failure of design or training interventions directed at improving team cognition or to compare two or more such interventions. (3) The assessment of team cognition, if embedded in the task and processed in real or near real time, can facilitate online monitoring of team cognition and performance and in some cases lead to real time intervention (for example, monitoring team communications may suggest that a team is losing team situation awareness, and actions can be taken to mitigate this possibility).

ISSUES AND CHALLENGES OF MEASUREMENT IN TEAM SIMULATION

Simulated environments provide a rich context for a variety of measurement opportunities. Issues concerning measurement and data collection, such as deciding what to measure, embedding measures, and balancing scenario flexibility with experimental control, are particularly relevant when conducting research in simulated environments. These issues pose their own unique sets of challenges.

What to Measure

Of primary relevance to assessment are criterion measures of team performance. Performance measurement is primary because a valid and reliable performance measure provides a benchmark for a researcher to disambiguate experimental treatments, as well as to make inferences about relationships with other team cognitive variables, such as team process. Some recommendations for developing performance measures in simulations include unobtrusiveness (for example, not interrupting the task) and the collection of objective measures at the team level (holistic) versus generating a composite (collective) team member score. Further, embedding performance measures into the task prevents having to use subjective measures while providing instant performance feedback
to participants. Later in the chapter, we describe a team performance measure that is based on several parameters embedded in a command and control team simulation.

From the holistic perspective, assessment of team process provides perhaps the most direct insight into sources of variance that are most specific to team cognition: team member interactions. A number of interaction-oriented process behaviors have been classed as team process behaviors. Using the multitrait-multimethod approach (Campbell & Fiske, 1959), Brannick, Prince, Prince, and Salas (1995) demonstrated construct validity for team process behaviors related to assertiveness, decision making, adaptability, situation awareness, leadership, and communication. Thus a variety of behaviors can be measured independently as team process, although high correlations between process dimensions may necessitate collapsing across dimensions (for example, Cooke, Gorman, Duran, et al., 2007; compare Smith-Jentsch, Johnston, & Payne, 1998). Emphasizing the holistic perspective, the most direct route to measuring any aspect of team process is through interaction/communication data. Later in the chapter we present team process measures of coordination, team situation awareness (team SA), and communication in detail.
Embedding Measures

Wherever possible, embedded measures are desirable. There are several benefits to embedding measures in the simulation. Embedded measures mitigate subjective bias in post-processing. Since embedded measures occur during task performance, they do not require retrospective judgments to be made. Further, the use of embedded measurement allows for ease of data collection and processing. For example, if a researcher is interested in measuring team communication, then he or she can record team communication in such a way that timestamps are collected in real time. This saves the time and cost of having someone go through communication data to insert timestamps during post-processing.
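As a simple illustration of an embedded, timestamped communication log, here is a minimal Python sketch; the field names and file format are our own assumptions, not the logging software used in any particular testbed.

```python
import csv
import time

class CommLogger:
    """Writes one timestamped row per communication event as it occurs,
    so no timestamps need to be inserted during post-processing."""

    def __init__(self, path):
        self.start = time.monotonic()
        self.file = open(path, "a", newline="")
        self.writer = csv.writer(self.file)

    def log(self, speaker, listener, channel):
        elapsed = time.monotonic() - self.start  # seconds since session start
        self.writer.writerow([f"{elapsed:.3f}", speaker, listener, channel])
        self.file.flush()  # flush immediately so a crash loses nothing

# Example: the pilot keys the intercom to address the photographer.
log = CommLogger("team_comms.csv")
log.log(speaker="pilot", listener="photographer", channel="intercom")
```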
Flexibility versus Control

Often, research conducted in simulated environments is less controlled compared to laboratory experiments due to the very nature of the phenomena being studied. One may view simulated environments as the midpoint of a continuum anchored by naturalistic observation at one end and the laboratory at the other. Conventional psychological laboratories are highly controlled in order to isolate causality under very specific conditions. Although these characteristics are important for experimental validity, such control often comes at the expense of ecological validity; specifically, it is difficult to study realistic team phenomena under such tightly controlled conditions. With respect to validity, simulated environments may therefore entail a need for higher ecological validity at the expense of some experimental validity. As a result, simulated team environments should strike a
balance between control and ecological validity in order to elicit behaviors that emerge only under comparatively “real world” conditions.
STEPS TOWARD ADDRESSING ISSUES AND CHALLENGES: A CASE STUDY

Uninhabited Aerial Vehicle Synthetic Task Environment (UAV-STE)

A synthetic task environment for teams in the context of UAV ground control was developed for the purpose of studying team performance and cognition (Cooke & Shope, 2004). This work has been greatly influenced by the assumption that synthetic tasks provide ideal environments for cognitive engineering research on complex tasks in that they serve as a middle ground between the difficult to control naturalistic study and the highly controlled tasks typically found in the lab. The UAV-STE was designed to facilitate experimentation, and to this end special attention was given to exercising team cognition in the context of UAV-STE scenarios and to its measurement. Therefore, this simulation is a good example of the principles and methods discussed in this chapter.

The UAV-STE development was based on a cognitive task analysis (Gugerty, DeBoom, Walker, & Burns, 1999) of ground control operations for the Predator at Indian Springs, Nevada (Cooke, Rivera, Shope, & Caukwell, 1999; Cooke & Shope, 2005; Cooke & Shope, 2002a; Cooke & Shope, 2002b; Cooke & Shope, 1998; Cooke, Shope, & Rivera, 2000). The UAV-STE emphasizes team aspects of the task, such as planning, replanning, decision making, and coordination. In fact, the scenarios emphasize or exercise these types of team level cognitive activities at the expense of individually oriented activities, such as takeoff and landing. In addition, it was important that team level cognition had the opportunity to occur multiple times throughout the scenario. This was accomplished by introducing multiple target waypoints as landmarks associated with team cognition during the course of any scenario run. That is, most of the team level cognitive activity takes place in the context of the target landmarks, with the activity being initiated some time prior to arrival at the landmark and ending shortly after leaving the landmark. Multiple landmarks thus provide multiple opportunities to observe team cognition.

The UAV-STE is a three team-member (pilot, navigator, and photographer) task in which each team member is provided with distinct, though overlapping, training; has unique, yet interdependent roles; and is presented with different and overlapping information during the mission. The overall goal is to fly the UAV to designated target areas (the landmarks) and to take acceptable photos at these areas. The pilot controls airspeed, heading, and altitude and monitors UAV systems. The photographer adjusts camera settings, takes photos, and monitors the camera equipment. The navigator oversees the mission and determines flight paths under various constraints. To successfully complete a UAV-STE mission, which comprises multiple targets, the team members need to share information with one another in a coordinated fashion multiple times.
Most communication is done via microphones and headsets, although some involves computer messaging. Measures taken include audio records, video records, digital information flow data, embedded performance measures, team process behavior measures, situation awareness measures, and a variety of individual and team knowledge measures. Hardware and software features of the UAV-STE relevant to measurement considerations include the following experimenter console features:

• Video and audio recording equipment (including digital audio) for communication analysis,
• Intercom and software for logging communications flow,
• Embedded performance measures,
• Ability to disable or insert noise in channels of communication intercom,
• Easy to change start-up parameters and landmark/waypoint library that define a scenario,
• Software to facilitate measurement of team process behaviors,
• Software to facilitate team SA measurement,
• Coordination logging software, and
• Numerous possibilities for inserting team SA measurement opportunities (roadblocks) into a scenario.
Also relevant to measurement are the following participant console features:

• Participant computer event logging capabilities,
• Training software modules with tests,
• Software modules for offline knowledge measurement (participant ratings), and
• Software for administering questionnaires (participant debriefing, NASA TLX [National Aeronautics and Space Administration Task Load Index], and so forth).
The UAV-STE design features that illustrate some of the concepts described in this chapter include the following:

1. Training needs. STEs allow the trainer or researcher to take liberties in order to best exercise or study the behavior of interest. In the case of the UAV-STE, we were interested in eliciting team cognition. For that reason, it was important to minimize individual training and skill acquisition so that team members could be in a position to quickly acquire the skills needed to contribute to team cognition. In the UAV-STE, individuals train to criterion in 1.5 hours, after which they are ready to begin interacting as a team. By comparison, the actual Predator interface is complex, requiring many months of individual training. To accomplish the goal of rapid individual skill acquisition, we modified the interface to simplify individual skill acquisition while preserving the functionality required for the team level cognitive tasks. Thus, part of designing the STE required that the team behaviors of interest be exercised, often at the expense of individual level fidelity.

2. Experimental control. As we have noted, the design of synthetic tasks and their scenarios entails a balancing act between ecological and experimental validity.
However, trade-offs also need to be managed between measurement and realism, or cognitive fidelity. In the case of the UAV-STE, the interface was modified as mentioned previously in order to focus the task on team, rather than individual, cognition. As a result, the individual operator fidelity is low; however, the focus on team aspects of the task allows for the measurement of team member interaction and team cognition. In this case a balance between measurement and fidelity involved maximizing the opportunity for measuring team level cognition at the expense of individual cognitive fidelity.
3. Performance measure. Although we were primarily interested in measuring team cognition, these measures should be validated with respect to team performance. Therefore, we sought a performance measure that would allow us to evaluate team cognition-related interventions tied to variance in team performance. Because the team performance measure is fundamental, it was important to identify a robust, reliable measure of team performance in the context of the UAV-STE. In the following sections we describe team performance and team cognition measurement in the UAV-STE in detail.
Team Performance

Team performance is measured using a composite score based on critical mission variables, including the time each individual spent in an alarm state, the time each individual spent in a warning state, the rate at which critical waypoints were acquired, and the rate at which targets were successfully photographed. Penalty points for each of these components are weighted a priori in accord with their importance to the task and subtracted from a maximum score of 1,000 (see Cooke et al., 2004, for the scoring algorithm). Each individual role within a team (pilot, photographer, and navigator) also has a score based on various mission variables, including time spent in an alarm or warning state, as well as variables that are unique to that role. As with team performance, penalty points for each of the components are weighted in accord with importance to the task and subtracted from a maximum score of 1,000. For example, the most important components for the pilot are time spent in an alarm state and course deviations; for the navigator, they are critical waypoints missed and route planning errors; and for the photographer, duplicate good photos, time spent in an alarm state, and number of bad photos are the most important components. Like team performance, individual performance data for a role are collected within the context of each UAV-STE scenario run (that is, mission). The team performance measure has been used in several UAV-STE studies and has been modified to take into account workload differences across scenarios (Cooke et al., 2004). The team performance measure serves as the criterion for assessing experimental interventions, as well as for validating relationships with team cognition as indexed by coordination, team SA, and communication measures.
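To make the weighted-penalty logic concrete, the sketch below subtracts weighted penalty points from the 1,000-point maximum. The component names and weights are illustrative placeholders of our own, not the published scoring algorithm (for that, see Cooke et al., 2004).

```python
# A minimal sketch of weighted-penalty composite scoring. Component names and
# weights are illustrative assumptions, not the published UAV-STE algorithm.

MAX_SCORE = 1000.0

# Hypothetical penalty weights, ordered by assumed importance to the task.
TEAM_WEIGHTS = {
    "alarm_seconds": 2.0,      # time members spent in an alarm state
    "warning_seconds": 0.5,    # time members spent in a warning state
    "waypoints_missed": 50.0,  # critical waypoints not acquired
    "targets_missed": 100.0,   # targets not successfully photographed
}

def composite_score(mission_totals: dict[str, float],
                    weights: dict[str, float] = TEAM_WEIGHTS) -> float:
    """Subtract a priori weighted penalty points from the maximum score."""
    penalty = sum(w * mission_totals.get(name, 0.0)
                  for name, w in weights.items())
    return max(0.0, MAX_SCORE - penalty)

# Example mission: 30 s in alarm, 120 s in warning, one missed waypoint.
print(composite_score({"alarm_seconds": 30, "warning_seconds": 120,
                       "waypoints_missed": 1}))  # 830.0
```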
Coordination Measure

We define coordination as the timely pushing and pulling of information among team members. For measurement, coordination consists of the temporal relationships between team members performing different aspects of the team task, relative to a procedural model. To develop the measure, a procedural model of team coordination was first constructed based on the standard operating procedure for taking pictures of the ground target landmarks. Essentially, the standard operating procedure is a function of the ordering, timing, and mode of task elements for each target. Ordering corresponds to the sequential ordering of task elements for each target. Timing corresponds to the onset time (measured in seconds) of each element for each target within a mission. Mode corresponds to the nature of the element, that is, information mode versus negotiation mode versus feedback mode (Figure 19.1). The procedural model provides a blueprint for team coordination for the repetitive task of taking pictures of ground targets, and the coordination score provides a measure of variation in the application of the procedural blueprint.

In the procedural model (Figure 19.1), the optimal coordination procedure for taking a picture of a ground target begins with the navigator giving the pilot information concerning the upcoming target's restrictions (task elements a through c in Table 19.2). The pilot and the photographer then negotiate the appropriate altitude and airspeed for taking the photograph through back-and-forth exchanges (task elements d through g in Table 19.2). Finally, the photographer tells the navigator and the pilot that the target has been photographed (task element h in Table 19.2) and, thus, that the UAV may continue to the next target, which starts the coordination cycle over again.
Figure 19.1. Model for Standard Operating Procedure for Photographing Uninhabited Aerial Vehicle–Synthetic Task Environment Ground Targets
356
Learning, Requirements, and Metrics
Table 19.2. Uninhabited Aerial Vehicle–Synthetic Task Environment Target Procedure Task Elements

Information (I)
  a. Navigator tells pilot target restrictions
  b. Navigator tells pilot target radius
  c. Navigator tells pilot target name

Negotiation (N)
  d. Photographer coordinates altitude with pilot
  e. Photographer coordinates airspeed with pilot
  f. Pilot coordinates altitude with photographer
  g. Pilot coordinates airspeed with photographer

Feedback (F)
  h. Photographer acknowledges good photo
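For logging and analysis, the task elements in Table 19.2 can be encoded directly as data. The sketch below is one possible representation; the mode labels and descriptions come from the table, while the encoding itself is ours.

```python
# Table 19.2 encoded as a lookup from task element to (mode, description).
# The contents come from the table above; the representation is ours.
TASK_ELEMENTS: dict[str, tuple[str, str]] = {
    "a": ("I", "Navigator tells pilot target restrictions"),
    "b": ("I", "Navigator tells pilot target radius"),
    "c": ("I", "Navigator tells pilot target name"),
    "d": ("N", "Photographer coordinates altitude with pilot"),
    "e": ("N", "Photographer coordinates airspeed with pilot"),
    "f": ("N", "Pilot coordinates altitude with photographer"),
    "g": ("N", "Pilot coordinates airspeed with photographer"),
    "h": ("F", "Photographer acknowledges good photo"),
}
```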
Coordination scores (κ's; Gorman, Amazeen, & Cooke, under review) were obtained by evaluating the relationship between the onset times of the task elements in the procedural model at each target waypoint: κ = (F − I)/(F − N), where I, N, and F are the onset times of the information, negotiation, and feedback elements, respectively. The timestamps for κ are collected by an experimenter monitoring team communication in real time using a coordination logger. The logger consists of one panel for each target landmark; each button on a target panel corresponds to one of the three procedural model task elements (information, negotiation, or feedback), and pressing a button logs that element's timestamp (Figure 19.1; Table 19.2). The dynamics of κ have provided evidence that teams that are very rigid in their coordination do not respond well to unexpected team SA roadblocks (Gorman et al., under review).
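A minimal sketch of the κ computation for a single target follows, assuming onset times in seconds taken from the coordination logger; the function name and example values are ours.

```python
# Coordination score for one target: kappa = (F - I) / (F - N), with onset
# times in seconds for the information (I), negotiation (N), and feedback (F)
# task elements. Example values are hypothetical.

def kappa(info_onset: float, negotiation_onset: float,
          feedback_onset: float) -> float:
    """Return the coordination score for a single target waypoint."""
    if feedback_onset == negotiation_onset:
        raise ValueError("F and N onsets coincide; kappa is undefined")
    return (feedback_onset - info_onset) / (feedback_onset - negotiation_onset)

# Example: target briefing at t=10 s, altitude/airspeed negotiation at t=40 s,
# good-photo feedback at t=55 s.
print(kappa(10.0, 40.0, 55.0))  # 3.0
```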
Coordinated Awareness of Situation by Teams (CAST)

The coordination based measure of team SA (CAST) is taken in the context of a UAV-STE mission. During a mission, a UAV-STE experimenter introduces a "roadblock" to team coordination at a prespecified event or time. The team is not told about the roadblock. Team communications are monitored for exchanges relevant to overcoming the roadblock. Interactions between specific team members, in response to the roadblock, are checked off by an experimenter on a CAST scoring sheet. Three types of interactions are checked off: firsthand perception (recognition of some aspect of the roadblock without being told by another team member), coordinated perception (being told about some aspect of the roadblock by another team member), and coordinated action (a sequence of interaction that mitigates the roadblock). Additionally, whether or not the roadblock was overcome is checked off if the team coordinated around it (Figure 19.2; Gorman, Cooke, Pedersen, Connor, & DeJoode, 2005). Figure 19.2 shows two CAST score sheets for a five-minute communication channel glitch from navigator to pilot (but not pilot to navigator), which was introduced when teams reached a designated point during a UAV-STE mission (Kiekel, Gorman, & Cooke, 2004). The CAST scoring procedure consisted of listening to team communications around the five-minute glitch and then checking appropriate boxes on the CAST score sheet.
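One way to see the structure of a CAST score sheet is as a small record type. The sketch below is our illustrative encoding of the paper checklist described above; the field names are hypothetical, not the published instrument (Gorman et al., 2005).

```python
# An illustrative encoding of a CAST score sheet. Field names are our own;
# the published instrument is a paper checklist (Gorman et al., 2005).
from dataclasses import dataclass, field

@dataclass
class CastScoreSheet:
    roadblock: str
    # Roles that noticed some aspect of the roadblock without being told.
    firsthand_perception: set[str] = field(default_factory=set)
    # (teller, told) pairs: who told whom about the roadblock.
    coordinated_perception: set[tuple[str, str]] = field(default_factory=set)
    # Interaction sequence that mitigates the roadblock.
    coordinated_action: list[str] = field(default_factory=list)
    overcome: bool = False

# Hypothetical sheet for the one-way navigator-to-pilot glitch.
sheet = CastScoreSheet(roadblock="navigator-to-pilot channel glitch")
sheet.firsthand_perception.add("pilot")
sheet.coordinated_perception.add(("pilot", "navigator"))
sheet.coordinated_action = ["navigator tells photographer",
                            "photographer relays to pilot"]
sheet.overcome = True
```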
Figure 19.2. CAST Team SA Scoring Sheet; Panel A: Optimal Solution to Communication Glitch; Panel B: Suboptimal Solution to Communication Glitch
In Figure 19.2, the score sheet in panel A shows a CAST result where the pilot perceived the glitch and coordinated his perceptions with the navigator: namely, that the pilot could not hear the navigator, but the navigator could hear the pilot. This led to the successful coordinated action of the navigator channeling communications to the pilot through the photographer. The score sheet in panel B shows an instance where both the navigator and the pilot perceived the glitch and established a coordinated perception via the photographer. Subsequently, the photographer was involved in the coordinated action as a bidirectional conduit of pilot-navigator communications. While both solutions overcome the glitch, the panel A solution is more efficient than the panel B solution, corresponding to better team SA overall. The panel A solution is more efficient because it accurately reflects the true state of the world (that is, a one-way, not a two-way, glitch) and requires less effort on the part of the team. In fact, the team that reached the solution shown in panel B missed a target because the photographer had dedicated himself to relaying messages back and forth and missed an opportunity for a good target picture.

Communication Analysis and Real Time Assessment

Interaction based measures, such as team communication measurement, capture a team "thinking out loud" and are thus akin to a team verbal protocol analysis.
In addition to operationalizing the behavior of interest in a holistic manner, communication based measures are amenable to embedded measurement, because team communication occurs naturally during the course of task performance. The challenge that remains is to capture and assess these naturally occurring team member interactions. The design of the UAV-STE provides facilities not only for capturing communication flow data, but for capturing it in real time, because team members must communicate over headsets. Communication flow (ComLog) data are generated unobtrusively in the background during UAV task performance as team members open communication channels using push-to-talk buttons. In the UAV-STE there are nine possible communication channels across the team members, where each team member has access to three channels (for example, pilot → photographer, pilot → navigator, and pilot → all). The state of the communication channels, characterized as a three-by-three matrix, is sampled eight times per second in order to capture all significant team member interactions, including very brief ones (for example, acknowledgements). Current efforts are focused on reproducing the coordination metric depicted in Figure 19.1 from the ComLog data. Success in this challenging endeavor would represent a significant step forward in the assessment of team cognition from relatively low level communication flow data. Specifically, the goal tied to real time communication analysis is to achieve real time automated assessment of team cognition, such that online interventions can be used to augment deficient team cognition (for example, Figure 19.2, panel B).
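The sampling loop can be sketched as follows; read_channel_states is a hypothetical interface standing in for the intercom hardware, and the loop ignores the timer drift that a production logger would correct for.

```python
# A sketch of ComLog-style sampling: poll the 3x3 push-to-talk channel matrix
# eight times per second and timestamp each sample.
# read_channel_states is a hypothetical interface to the intercom hardware.
import time

SAMPLE_HZ = 8  # fast enough to catch very brief acknowledgements

def log_communication_flow(read_channel_states, duration_s: float) -> list:
    """Collect (timestamp, 3x3 matrix of 0/1 channel states) samples."""
    samples = []
    for _ in range(int(duration_s * SAMPLE_HZ)):
        samples.append((time.monotonic(), read_channel_states()))
        time.sleep(1.0 / SAMPLE_HZ)  # a real logger would correct for drift
    return samples
```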
CONCLUSION

Teams play an integral role in the operation of many of today's high technology sociotechnical environments. Teams are brought together by the need for multiple, heterogeneous, yet interdependent operators to control these complex systems. In addition, the underlying mechanisms of team cognition (for example, communication) are ubiquitous across a variety of these task settings. The need for a scientific understanding of, and support for, team cognition is a pressing challenge for both basic and applied research. Simulated environments may help us address team cognition on both of these research fronts concurrently.

Nevertheless, teams remain a challenging subject of study. With respect to possible trade-offs between ecological validity and control, team simulations that are too complex may lead to such poor control that it is difficult to replicate results, while simulations that are too highly controlled may not elicit the interaction behaviors of interest. A balance must be struck between ecological and experimental validity such that the simulated environment is as real as possible while providing a replicable research context. The issues and challenges involved in developing a simulated team task were addressed in the context of the UAV-STE case study. First, the UAV-STE is flexible enough to provide ecological validity in terms of certain aspects of UAV operations (for example, reconnaissance and monitoring flight systems) while providing experimental control (for example, training interventions) that has
produced replicable findings (Cooke et al., 2004). Second, the UAV-STE supports the elicitation of team cognition organized around critical landmarks (targets) while providing facilities for monitoring and measuring team interaction and team performance. Overall, this case study emphasizes the need for embedded measurement tied to the simulation.

In the UAV-STE case study, then, we outlined the scope of, and some measurement priorities for, the assessment of team cognition and performance in simulated environments. These priorities include designing valid and reliable performance measures and embedding measurement in the context of the simulated environment wherever possible. Valid and reliable performance measures serve several purposes in simulated environments. First, performance measures facilitate hypothesis testing. That is, they serve as dependent variables when manipulating various independent variables, such as training interventions. Second, valid and reliable performance measures make validation of knowledge, process, and other measures possible. With respect to embedded measurement, we emphasize the need for integration of scenario design and assessment during the development of the simulated environment. That is, a scenario designed to elicit specific behaviors at specific times (for example, team coordination organized around landmarks and team SA organized around roadblocks) suggests specific events around which measurement, and therefore assessment, can be organized. Having insight into how the scenario is designed to elicit specific types of behavior at specific times gives researchers the ability to identify what to measure and when. These features of simulated environments make them amenable to the development of valid, reliable, embedded real time measures of team cognition, which we see as the next most important step toward mitigating the effects of deficient team interaction in complex sociotechnical systems, before catastrophic failures can occur.

ACKNOWLEDGMENTS

Support for this work was provided by AFOSR Grant No. FA9550-04-1-0234, AFRL Grant No. FA8650-04-6442, and ONR Grant No. N00014-05-1-0625.

REFERENCES

Brannick, M. T., Prince, A., Prince, C., & Salas, E. (1995). The measurement of team process. Human Factors, 37, 641–651.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Cannon-Bowers, J. A., Salas, E., Duncan, P., & Halley, E. J. (1994). Application of multimedia technology to training for knowledge-rich systems. Proceedings of the 16th Annual Interservice/Industrial Training Systems Conference (pp. 6–11). Washington, DC: National Training and Simulation Association.
Cooke, N. J., DeJoode, J. A., Pedersen, H. K., Gorman, J. C., Connor, O. O., & Kiekel, P. A. (2004). The role of individual and team cognition in uninhabited air vehicle
command-and-control (Tech. Rep. for AFOSR Grant Nos. F49620-01-1-0261 and F49620-03-1-0024). Mesa: Arizona State University East.
Cooke, N. J., & Gorman, J. C. (2005). Assessment of team cognition. In International encyclopedia of ergonomics and human factors (2nd ed., pp. 271–275). Boca Raton, FL: CRC Press.
Cooke, N. J., Gorman, J. C., Duran, J. L., & Taylor, A. R. (2007). Team cognition in experienced command-and-control teams. Journal of Experimental Psychology: Applied, 13, 146–157.
Cooke, N. J., Gorman, J., Pedersen, H., & Bell, B. (2007). Distributed mission environments: Effects of geographic distribution on team cognition, process, and performance. In S. Fiore & E. Salas (Eds.), Toward a science of distributed learning (pp. 147–167). Washington, DC: American Psychological Association.
Cooke, N. J., Gorman, J. C., & Rowe, L. J. (in press). An ecological perspective on team cognition. In E. Salas, J. Goodwin, & C. S. Burke (Eds.), Team effectiveness in complex organizations: Cross-disciplinary perspectives and approaches. Mahwah, NJ: Erlbaum.
Cooke, N. J., Kiekel, P. A., & Helm, E. (2001). Measuring team knowledge during skill acquisition of a complex task. International Journal of Cognitive Ergonomics: Special Section on Knowledge Acquisition, 5, 297–315.
Cooke, N. J., Rivera, K., Shope, S. M., & Caukwell, S. (1999). A synthetic task environment for team cognition research. Proceedings of the Human Factors and Ergonomics Society 43rd Annual Meeting (pp. 303–307). Santa Monica, CA: Human Factors and Ergonomics Society.
Cooke, N. J., & Shope, S. M. (1998). Facility for cognitive engineering research on team tasks (Report for Grant No. F49620-97-1-0149). Washington, DC: Bolling AFB.
Cooke, N. J., & Shope, S. M. (2002a). Behind the scenes. UAV Magazine, 7, 6–8.
Cooke, N. J., & Shope, S. M. (2002b). The CERTT-UAV task: A synthetic task environment to facilitate team research. Proceedings of the Advanced Simulation Technologies Conference: Military, Government, and Aerospace Simulation Symposium (pp. 25–30). San Diego, CA: The Society for Modeling and Simulation International.
Cooke, N. J., & Shope, S. M. (2004). Designing a synthetic task environment. In S. G. Schiflett, L. R. Elliott, E. Salas, & M. D. Coovert (Eds.), Scaled worlds: Development, validation, and application (pp. 263–278). Surrey, England: Ashgate.
Cooke, N. J., & Shope, S. M. (2005). Synthetic task environments for teams: CERTT's UAV-STE. In Handbook of human factors and ergonomics methods (pp. 46-1–46-6). Boca Raton, FL: CRC Press.
Cooke, N. J., Shope, S. M., & Rivera, K. (2000). Control of an uninhabited air vehicle: A synthetic task environment for teams. Proceedings of the Human Factors and Ergonomics Society 44th Annual Meeting (p. 389). Santa Monica, CA: Human Factors and Ergonomics Society.
Gluck, K. A., Ball, J. T., Gunzelmann, G., Krusmark, M. A., Lyon, D. R., & Cooke, N. J. (2006, September). A prospective look at a synthetic teammate for UAV applications. Invited talk for the AIAA "Infotech@Aerospace" Conference on Cognitive Modeling, Arlington, VA.
Gorman, J. C., Amazeen, P. G., & Cooke, N. J. (under review). Dynamics of team coordination. Manuscript submitted to Physics Letters A.
Gorman, J. C., Cooke, N. J., Pedersen, H. K., Connor, O. O., & DeJoode, J. A. (2005). Coordinated awareness of situation by teams (CAST): Measuring team situation
awareness of a communication glitch. Proceedings of the Human Factors and Ergonomics Society 49th Annual Meeting (pp. 274–277). Santa Monica, CA: Human Factors and Ergonomics Society.
Gugerty, L., DeBoom, D., Walker, R., & Burns, J. (1999). Developing a simulated uninhabited aerial vehicle (UAV) task based on cognitive task analysis: Task analysis results and preliminary simulator data. Proceedings of the Human Factors and Ergonomics Society 43rd Annual Meeting (pp. 86–90). Santa Monica, CA: Human Factors and Ergonomics Society.
Kiekel, P. A., Gorman, J. C., & Cooke, N. J. (2004). Measuring speech flow of co-located and distributed command and control teams during a communication channel glitch. Proceedings of the Human Factors and Ergonomics Society 48th Annual Meeting (pp. 683–687). Santa Monica, CA: Human Factors and Ergonomics Society.
Langan-Fox, J., Code, S., & Langfield-Smith, K. (2000). Team mental models: Techniques, methods, and analytic approaches. Human Factors, 42, 242–271.
Salas, E., Bowers, C. A., & Rhodenizer, L. (1998). It is not how much you have but how you use it: Toward a rational use of simulation to support aviation training. The International Journal of Aviation Psychology, 8, 197–208.
Salas, E., Dickinson, T. L., Converse, S. A., & Tannenbaum, S. I. (1992). Toward an understanding of team performance and training. In R. W. Swezey & E. Salas (Eds.), Teams: Their training and performance (pp. 3–29). Norwood, NJ: Ablex.
Salas, E., Wilson, K. A., Burke, C. S., & Wightman, D. C. (2006). Does CRM training work? An update, extension, and some critical needs. Human Factors, 48, 392–412.
Schiflett, S. G., Elliott, L. R., Salas, E., & Coovert, M. D. (Eds.). (2004). Scaled worlds: Development, validation, and applications. Hants, England: Ashgate.
Schreiber, B. T., & Bennett, W., Jr. (2006). Distributed mission operations within-simulator training effectiveness baseline study: Summary report (Rep. No. AFRL-HE-AZ-TR-2006-0015-Vol I, 1123AS03). Mesa, AZ: Air Force Research Laboratory, Warfighter Readiness Research Division.
Smith-Jentsch, K. A., Johnston, J. H., & Payne, S. C. (1998). Measuring team-related expertise in complex environments. In J. A. Cannon-Bowers & E. Salas (Eds.), Decision making under stress: Implications for individual and team training (pp. 61–87). Washington, DC: American Psychological Association.
Weir, D. (2004). Catastrophic failure in complex socio-technical systems. International Journal of Nuclear Knowledge Management, 1, 120–130.
Chapter 20
AFFECTIVE MEASUREMENT OF PERFORMANCE

James Driskell and Eduardo Salas

The goal of this chapter is to examine the use of affective measures of performance in simulation and training. Affect has emerged as a central topic in psychology relatively recently, and some have termed this resurgence of interest an "affective revolution" in psychology. It is therefore informative to briefly note some historical antecedents of this resurgence of interest in affect, especially as it relates to simulation and training. Over 50 years ago, Bloom and associates attempted to develop a taxonomy of educational objectives, culminating in the publication of separate handbooks addressing the cognitive domain (Bloom, Engelhart, Furst, Hill, & Krathwohl, 1956) and the affective domain (Krathwohl, Bloom, & Masia, 1964). Gagne (1984) also proposed multiple categories of learning outcomes, including attitudinal outcomes. Drawing on these perspectives, Kraiger, Ford, and Salas (1993) presented a comprehensive scheme for classification of learning outcomes, emphasizing cognitive, skill based, and affective learning outcomes.

Even early on, it was noted that structuring the affective domain was a difficult task. Krathwohl et al. (1964) defined the affective domain as comprising those learning outcomes that "emphasize a feeling tone, an emotion, or a degree of acceptance or rejection" (p. 7) and as including a large number of objectives, such as interests, attitudes, beliefs, and values. It is also useful to examine how the term "affective measures" is used in practice. In other words, when other researchers discuss affect or affective measures, how do they describe this domain? In various reports that have examined affective measures, the topics addressed include affect, emotions, and moods (Humrichouse, Chmielewski, McDade-Montez, & Watson, 2007); feelings or sentiments (Heise, 2002); beliefs (Robinson & Clore, 2002); temperament (Ilies & Judge, 2005); personality or disposition (Thoresen, Kaplan, Barsky, Warren, & de Chermont, 2003); interests and self-perceptions (Cassady, 2002); attitudes and motivational outcomes, such as self-efficacy and goal setting (Kraiger et al., 1993); and affective states, such as trust, collective orientation, and cohesiveness (Stagl, Salas, & Day, 2008). So, we can conclude that affective constructs refer to emotions, moods, beliefs, interests, dispositions, attitudes, motivational states,
self-perceptions, and preferences—quite a daunting list. Although some have drawn a simple distinction between cognition (thinking) and affect (feeling), others have noted that in practice, the use of the term affective is largely intuitive (Diener, Smith, & Fujita, 1995). Those who study emotions per se offer more exacting definitions of affective state (see Yik, Russell, & Barrett, 1999). However, for our purposes, it may be appropriate to simply concur with Krathwohl et al. (1964) by noting that the affective domain reflects an important but broad domain of learning and includes a host of constructs, such as attitudes, motivation, self-efficacy, and other noncognitive constructs—all of which are important for determining the efficacy of simulation based training.

The renewed attention to the affective domain reflects recent attempts to identify characteristics of the individual (and of teams) other than cognitive ability that determine skill acquisition and learning in simulation based training environments. Traditionally, affective measures have been viewed with at least some degree of reservation regarding their value or usefulness. First, affective constructs, such as expectations, attitudes, feelings, or beliefs, are theoretical constructs that are not directly observable, although one may observe their indicators. Nevertheless, these constructs are critical for learning (for example, motivation to learn; Salas & Cannon-Bowers, 2001). A second area of concern relates to the individual's access to and accuracy in reporting introspective data. For example, Robinson and Clore (2002) note that when asked to report current feelings, individuals rely on accessible episodic memories, whereas when asked to report on feelings not currently experienced (via prospective or retrospective queries), people access their beliefs about their affect rather than the affect itself. On the one hand, then, questions have been raised regarding the validity or accuracy of various types of affective measures. On the other hand, researchers argue that these types of measures are especially meaningful because they provide a window into the individual's affective experience and his or her reactions to the training or simulation.

Gaining this perspective on the individual's affective state is valuable for several reasons in simulation based training. The first is to determine the effect of training on the affective state of the trainee in instances in which learning outcomes include changes in attitude, motivation, self-efficacy, and so on. This common use of affective measures provides valuable information on whether the goals of training have been met or what needs to change in order to engage and/or motivate the trainee to learn. A second way in which attention to the affective domain in training is useful is the examination of affective constructs as determinants of training effectiveness. Such affective constructs as self-efficacy or anxiety can be examined as predictors of training effectiveness and as mediators of training effects (Salas & Cannon-Bowers, 2001).
THE AFFECTIVE DOMAIN: A TRAINING MODEL

Figure 20.1 presents a framework to organize the various ways in which affective measures can inform our understanding of training effectiveness. Elaborating
Figure 20.1. The Affective Domain: A Training Model
the discussion in the previous paragraph, we believe that affective measures are useful in answering several different types of questions. First, affective measures can be utilized to assess training outcomes in the affective domain. Kraiger et al. (1993) noted the distinction between the processes of training effectiveness and training evaluation. Training evaluation refers to the examination of whether a training event has achieved certain learning outcomes. Affective outcomes are one type of learning outcome. Thus, the use of affective measures allows us to address questions related to training outcome and, specifically, whether a training intervention has or has not accomplished its stated learning objectives. As shown in Figure 20.1, affective outcomes that are typically examined include measures of motivation, self-efficacy, and satisfaction.

Training effectiveness refers to the examination of why training does or does not accomplish its intended learning outcomes. For example, if we think in input-process-output terms, questions related to training effectiveness involve attention to input and process variables that may determine training outcome. In Figure 20.1, we draw attention to both distal and proximal determinants of training effectiveness. Affective measures can be used to assess distal antecedents of training effectiveness by measuring such individual difference trait-like constructs as disposition, general self-efficacy, or goal orientation that impact training as more distal input variables. Furthermore, affective measures can be used to assess more proximal antecedents of training effectiveness by measuring such state-like constructs as task-specific self-efficacy or state anxiety that are more proximal to performance and that may emerge during training.
Moreover, certain constructs, such as goal orientation, may be viewed both as a distal individual difference trait-like variable (at the input level) and as a proximal state-like variable that emerges during training (at the process level). In sum, we believe affective measures may be usefully examined as distal measures (measures of input variables, such as individual differences), proximal measures (measures of training process), and outcome measures (measures of training outcome). In the following sections, we discuss the use of affective constructs within each of these three categories.
AFFECTIVE CONSTRUCTS AS DISTAL ANTECEDENTS

In addressing training effectiveness—why training works—there is considerable practical value in understanding the role that individual differences play in predicting training outcomes. Most individual difference models focus on differences in cognitive ability, and there is good reason for this. Research indicates that predictions of training success from cognitive measures are consistent and positive; for the most part, higher ability individuals perform better in training—they learn more and faster. However, individuals differ not only in ability, but also in terms of achievement orientation, conscientiousness, and ambition—noncognitive factors that are likely to affect training outcome. Yet, Barrick and Mount (1991) have noted that "very little research has investigated the relation of individual measures of personality to measures of training readiness and training success" (p. 22). There are at least two reasons for this.

The first concerns the conceptual foundation of personality and its early uses. Historically, personality theory and research emphasized psychopathological and neuropsychic conceptualizations of personality structure. Personality was equated with psychopathology, and personality measurement was used to assess some underlying set of neurotic structures governing behavior. There was considerably less interest in understanding task performance in normal populations, and specifically in identifying desirable characteristics that define effective performance (Driskell, Hogan, & Salas, 1987). A second reason that personality has made relatively little contribution to the examination of training performance is methodological in nature. Until recently, personality psychologists failed to reach any consensus regarding how to define personality and, accordingly, how it should be measured. Every theory of personality provided its own set of variables or constructs and its own measurement procedures. However, beginning with Fiske (1949), Tupes and Christal (1961), and Norman (1963), personality researchers have converged on five broad dimensions that constitute normal personality. The multitude of personality descriptors identified in previous literature can be expressed in terms of these five factors, often termed the "Big Five."

This development is significant for several reasons. First, research has accumulated that supports the robustness of this five-factor model across different populations and settings. Second, this model establishes a common vocabulary for
both describing and measuring personality and can serve as a useful taxonomy for classifying personality. Third, research suggests that these factors are relatively independent of measures of cognitive ability (McCrae & Costa, 1987), and thus these factors may contribute unique variance to the prediction of training performance.

Although there is some divergence on how personality traits should be labeled and organized, personality theorists are in general agreement on the nature of the structure of personality. Most theorists propose a hierarchical model of personality, with broad, higher order factors or traits that subsume and organize more specific lower level facets (compare Saucier & Ostendorf, 1999). For example, the Big Five factor model represents a broad set of traits that are themselves a collection of many facets that have something in common. Whereas the broad, higher level constructs offer an efficient and parsimonious way of describing personality, the more specific facets can offer higher fidelity of trait descriptions and greater predictive validity (Saucier & Ostendorf, 1999; Stewart, 1999). The higher level Big Five dimensions include the following:

Neuroticism. This trait is also termed emotional stability or adjustment and refers to a lack of anxiety and nervous tendencies. Those who are emotionally stable tend to be well-adjusted, calm, secure, and self-confident. Viewed from the negative pole of neuroticism, those who score low on this trait tend to be moody, anxious, paranoid, nervous, insecure, depressed, and high-strung (Barrick & Mount, 2001). Adjustment has been defined by Hogan (1986) as freedom from anxiety, depression, and somatic complaints. Watson, Clark, and Tellegen (1988) have viewed lack of adjustment as negative affect, a general dimension of subjective distress and unpleasurable engagement. Hogan and Hogan (1989) found that adjustment was a significant predictor of success in naval explosive ordnance disposal training.

Extraversion. The higher level Big Five trait of extraversion has been viewed as a combination of assertiveness/dominance and sociability/affiliation. Some theorists view dominance as the primary marker of extraversion, and some view sociability as the primary component of extraversion (Hough, 1992; Saucier & Ostendorf, 1999). The dominance component has also been referred to as ascendance, assertiveness, or surgency (Costa & McCrae, 1992), and high scores reflect those who are active, outgoing, and gregarious. The sociability component describes those who are sociable, friendly, interested in social interaction, and interpersonally adept. Persons low on sociability are withdrawn, reserved, and aloof and prefer solitary tasks to social interactions, in which they are less comfortable. In their meta-analysis, Barrick and Mount (1991) found that extraversion was positively related to training proficiency and noted that this relationship likely stemmed from the tendency for those who excelled in training to be both more active/outgoing and more sociable. Of course, this relationship would be expected to be stronger in training situations that require more collaboration and weaker in training situations that do not involve social interaction.

Openness. This dimension has been termed openness to experience, intellect, or intellectance and reflects intellectual, cultural, or creative interests. From the
positive pole, openness refers to a preference for intellectual curiosity and interest in new ideas and experiences. McCrae and Costa (1997) claimed that, from the negative pole, the trait of openness is related to rigidity in behavior and unwillingness to accept change. Barrick and Mount (1991) noted that those who score high on openness are likely to have more positive attitudes toward learning and are more willing to engage in training experiences. In fact, Barrick and Mount (1991) and Hough, Eaton, Dunnette, Kamp, and McCloy (1990) found that openness was positively related to training performance. Gully, Payne, Koles, and Whiteman (2002) reported a positive relationship between openness and training outcome, stating that those high on this characteristic are likely to excel in training environments because they are more curious and imaginative and willing to engage in new approaches to learning. Driskell, Hogan, Salas, and Hoskin (1994) found that the intellectance and ambition scales from the Hogan Personality Inventory predicted training performance in naval electronics training. They also reported that personality variables provided incremental prediction of training success above that provided by cognitive predictors alone, and further that personality predicted other training difficulties, such as nonacademic infractions, that were not predicted by cognitive measures.

Agreeableness. The trait of agreeableness is defined by such terms as kindness, trust, and warmth versus selfishness, distrust, and hostility. Persons high on agreeableness are considerate, honest, helpful, and supportive. Persons low on agreeableness are uncaring, intolerant, unsympathetic, and critical. Some researchers have claimed that agreeableness may be the best primary predictor of performance in interpersonal settings (Mount, Barrick, & Stewart, 1998). Thus, agreeableness seems to have high predictive validity for tasks that involve cooperation and smooth relations with others. In a study of Australian Air Force trainees, Sutherland and Watkins (1997) found that agreeableness, conscientiousness, and neuroticism all made significant contributions to predicting training performance beyond that accounted for by cognitive ability.

Conscientiousness. The Big Five trait of conscientiousness comprises two primary components. Moon (2001) has noted that some researchers emphasize the achievement orientation component of conscientiousness (that conscientious persons persevere and are motivated to achieve), whereas others have viewed conscientiousness in terms of responsibility/dependability (that conscientious persons are dependable, reliable, responsible, and trustworthy). Dependability refers to a tendency toward planfulness and discipline in carrying out tasks to completion. Those high in dependability are responsible, organized, planful, careful, and trustworthy. Those low in dependability are irresponsible, disorganized, and impulsive. The achievement component of the Big Five trait of conscientiousness refers to the desire to work hard to achieve goals and to master difficult tasks. Those who score high on this trait set challenging goals, work hard to achieve these goals, and persist in the face of hardships rather than give up or quit. Those who score low on this trait avoid difficult or challenging tasks, work only as hard as necessary, and give up when faced with difficult obstacles.
Although the Big Five approach is the most current and influential formulation of individual differences in personality, it is certainly not the only useful model, nor does it capture all there is to say about personality. Although Saucier and Goldberg (1998) argued that virtually all facets of personality fall within the Big Five factor space, Paunonen and Jackson (2000) have claimed that there are a number of traits not represented within the Big Five model. We concur with McAdams and Pals (2006), who note that although the Big Five model is arguably the most recognizable contribution personality psychology has to offer, understanding personality in a more finely grained sense requires going beyond the personality trait concept to include other motivational, social-cognitive, and developmental concerns.

AFFECTIVE CONSTRUCTS AS PROXIMAL ANTECEDENTS

We have noted that, from a dispositional perspective, there are several traits that directly predict learning outcome. However, this brief overview masks the more complex relationships among trait-like individual differences, state-like individual differences, and training outcomes. For example, DeShon and Gillespie (2005) note that such a construct as goal orientation has been viewed as a relatively stable individual difference trait and as a more malleable quasi-trait, among other definitions. These authors feel that this conceptual confusion provides an unstable foundation for understanding the construct and note that "the literature on this construct is in disarray" (p. 1096). However, this may also simply reflect the fact that a number of such variables may be viewed as having both a more distal impact and a more proximal impact on training outcome, or, as Payne, Youngcourt, and Beaubien (2007) note, may exist as both a trait and a state. Furthermore, Colquitt and Simmering (1998) argue that such variables as conscientiousness and goal orientation may serve as distal variables that influence training through more proximal mechanisms, such as motivation to learn.

Goal orientation may take two forms: (a) a learning orientation characterized by the desire to increase competence by developing new skills and (b) a performance orientation characterized by a desire to gain success and meet standards in a task setting. Further, Colquitt and Simmering note that this construct has both trait and state properties. That is, individuals may have dispositional goal orientations that we may view as input factors that they bring with them to the situation, as well as state goal orientations that are impacted by situational and training variables. Chen, Gully, Whiteman, and Kilcullen (2000) also propose a more comprehensive training model in which distal individual differences serve as predictors of proximal motivational processes and performance. They present a model in which trait-like constructs that are distal from performance (such as general self-efficacy and goal orientation) impact state-like constructs that are proximal to performance (such as state self-efficacy and state anxiety) to determine training outcomes. They further note that the primary value of such trait-like
constructs as general self-efficacy stems from their ability to predict state-like constructs rather than from directly influencing training outcome. Payne et al. (2007) present a similar overarching model that views goal orientation as a "compound" trait that is composed of various facets of the Big Five, including achievement, self-esteem, and general self-efficacy. Goal orientation, viewed as one's dispositional goal preferences, influences proximal outcomes, such as state goal orientation, state self-efficacy, and state anxiety, which in turn impact more distal consequences, such as learning or training outcomes. In their research, they identified three dimensions of goal orientation: (a) learning goal orientation, or LGO; (b) prove performance goal orientation, or PPGO, defined as the desire to prove one's competence and gain favorable judgments; and (c) avoid performance goal orientation, or APGO, defined as the desire to avoid disproving one's competence and to avoid negative judgments. They found that LGO was positively related to learning outcomes, APGO was negatively related to learning outcomes, and PPGO was unrelated to learning outcomes. Moreover, they found that LGO was most strongly predicted by high openness and conscientiousness, whereas APGO and PPGO were both associated with low emotional stability.
AFFECTIVE CONSTRUCTS AS MEASURES OF LEARNING OUTCOME

As noted, learning is a dynamic, multidimensional, and multilevel phenomenon. Learning is the desired outcome of any simulation based training, so careful consideration needs to be given to its assessment and measurement. The "triangulation" of different kinds of measures is needed. The simulation based training field has paid much attention to the skills, behaviors, and cognitive actions exhibited by trainees during simulations (what trainees "do" and "think"), but little to affective measures (what trainees "feel"). We argue that these could be as diagnostic about learning as the more skill based or cognitively based measures. The training effectiveness field has moved in that direction, toward a deeper understanding of the role affective constructs play in learning and skill acquisition. The simulation based training community could benefit by incorporating the findings of emergent research in this domain. We illustrate some of that research next.

Turning now to the question of training evaluation (that is, what types of learning outcomes are achieved by training), there have been several models of training evaluation that address affective measures. Affective measures are examined as a training outcome measure, at least in a basic sense, in Kirkpatrick's (1976) model as training reaction criteria. Reaction measures represent the trainee's subjective evaluation of satisfaction with the training experience. Although such reaction measures are often viewed as a narrow and somewhat superficial way to assess training outcome, Sitzmann, Brown, Casper, Ely, and Zimmerman (2008) reported results of a meta-analysis indicating that trainee reactions predicted the cognitive learning outcomes of declarative and procedural knowledge, as well as pre-to-post-training changes in motivation and self-efficacy.
Although these results provide evidence counter to the prevailing notion that trainee reaction measures are not useful, Kirkpatrick's model was limited in that it did not consider affect other than as satisfaction with training, and it conceptualized trainee reactions as distinct from learning outcomes. Kraiger et al. (1993) have presented a more comprehensive model of training that views learning outcomes as multidimensional and includes (a) cognitive, (b) skill based, and (c) affective outcomes. This construct-oriented approach defines several key affective learning outcomes, including attitudinal outcomes and motivational outcomes, such as disposition, self-efficacy, and goal setting. According to this perspective, affective measures can be viewed as specific goals or objectives of training and simulation interventions. That is, according to this model, affective measures can be viewed as learning outcomes, in addition to being viewed as distal or proximal determinants of learning. Kraiger et al. (1993) described several types of affective learning outcomes, including the development of attitudes (for example, toward safety) and changes in such motivational outcomes as goal orientation, self-efficacy, and goal setting. We will briefly describe two areas in which the emphasis on affective outcome measures is particularly salient: stress training and team training.

Stress exposure training (SET) is a simulation based approach to mitigating negative stress effects that has been developed for military training applications (see Driskell, Salas, Johnston, & Wollert, 2008; Driskell, Salas, & Johnston, 2006). Extensive laboratory research has documented the effectiveness of the SET training approach in reducing stress effects and enhancing performance (Inzana, Driskell, Salas, & Johnston, 1996; Saunders, Driskell, Johnston, & Salas, 1996). SET incorporates three stages or phases of training: (a) information provision, an initial training stage in which information is provided to the trainee regarding stress, stress symptoms, and likely stress effects in the performance setting; (b) skills acquisition, in which specific skills required to maintain effective performance in a stress environment are taught and practiced; and (c) application and practice, the final stage of application and practice of these skills under simulated conditions that increasingly approximate the criterion environment. Johnston and Cannon-Bowers (1996) defined two specific types of affective training outcomes in the SET training model: a decrease in anxiety and an increase in performance confidence.

One primary objective of stress exposure training is a reduction in anxiety. The construct of anxiety has most often been operationally defined in terms of self-report responses. These self-reports have typically taken the form of Likert-type rating scales or adjective checklists variously labeled as anxiety, tension, or arousal. Although these scales and checklists may be variously labeled, they all require self-report on highly similar sets of items (for example, uneasy, anxious, restless, tense, aroused, and nervous). These items tend to be of roughly equivalent emotionality ratings and to share relatively high free-association frequency (for example, John, 1988), which supports the notion that these various self-report indexes are tapping into a common underlying construct of arousal/anxiety; a scoring sketch for such a checklist appears at the end of this section. A second primary objective of stress exposure training is an increase in
confidence or self-efficacy. Self-efficacy refers to the belief in one's capacity to perform successfully in a range of task situations (Chen et al., 2000). Self-efficacy is related to perceptions of confidence, capability, mastery, and control. Bennett, Alliger, Eddy, and Tannenbaum (2003) reported that the predictive power of measures of confidence was remarkably strong (r's of 0.68 and 0.86) in evaluating the effectiveness of two military training programs.

A substantial body of research has accumulated in recent years on team performance and team training (see Salas, Nichols, & Driskell, 2007; Stagl et al., 2008). Although team performance outcomes are multifaceted, considerable recent emphasis has been placed on affective measures of team functioning. Salas, Sims, and Burke (2005) have described the Big Five components of teamwork, focusing attention on the importance of measures of leadership, adaptability, mutual performance monitoring, backup behavior, and team orientation. Stagl et al. (2008) have described team learning outcomes related to trust, collective orientation, collective efficacy, and cohesion. For example, Driskell and Salas (1992) found that collective orientation, the extent to which team members attend to one another's task inputs, was a critical factor in effective team performance, noting that collectively oriented team members "benefit from the advantages of teamwork, such as the opportunity to pool resources and correct errors—factors that make teamwork effective" (p. 285). Driskell, Goodwin, Salas, and O'Shea (2006) posed the question "What makes a good team player?" and suggested that team training interventions focus on learning outcomes such as cooperation, flexibility, responsibility, and cohesiveness. Moreover, attention to affective outcomes in teams may be particularly important in virtual environments, given that the technological mediation of team interaction can impact such affective processes as cohesion and trust (Driskell, Radtke, & Salas, 2003).
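As promised above, here is a minimal sketch of how a Likert-type adjective checklist of the kind used to assess state anxiety might be scored. The items, the 1-5 scale, and the function are illustrative assumptions of ours, not a validated instrument.

```python
# Illustrative scoring of a Likert-type adjective checklist for state anxiety.
# Items and the 1-5 scale are hypothetical, not a validated instrument.

ANXIETY_ITEMS = ["uneasy", "anxious", "restless", "tense", "aroused", "nervous"]

def checklist_score(ratings: dict[str, int],
                    items: list[str] = ANXIETY_ITEMS,
                    lo: int = 1, hi: int = 5) -> float:
    """Mean rating across checklist items (higher = more anxious)."""
    values = [ratings[item] for item in items]
    if any(v < lo or v > hi for v in values):
        raise ValueError("rating outside scale range")
    return sum(values) / len(values)

# Example: a trainee's hypothetical post-scenario self-report.
print(checklist_score({"uneasy": 4, "anxious": 3, "restless": 2,
                       "tense": 4, "aroused": 3, "nervous": 3}))  # ~3.17
```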
CONCLUDING REMARKS

We conclude this brief overview of the use of affective measures of performance in simulation based training by emphasizing three points. First, we submit that affective constructs can be usefully examined as distal measures (measures of input variables, such as individual differences), proximal measures (measures of training process), and outcome measures (measures of training outcome). Second, although Humrichouse et al. (2007) noted that the basic issues in assessing affect are no different from those involved in assessing any psychological construct, there are critical concerns of validity and reliability that must be addressed by the researcher. Finally, we laud the new generation of training models that take a multidimensional view of learning as incorporating cognitive, behavioral, and affective outcomes.

The science of simulation based training is evolving and maturing at a rapid pace. The more we learn about how, when, and what to measure in individuals and teams during training, the greater the benefit of simulation based training will be to such complex settings as health care, aviation, and the military—where people's lives depend on effective skill and on cognitive and affective performance. We hope
this chapter motivates more research into the diagnostic value of affective measures.

REFERENCES

Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1–26.
Barrick, M. R., & Mount, M. K. (2001). Select on conscientiousness and emotional stability. In E. A. Locke (Ed.), Handbook of principles of organizational behavior (pp. 15–28). Malden, MA: Blackwell.
Bennett, W., Alliger, G. M., Eddy, E. R., & Tannenbaum, S. I. (2003). Expanding the training evaluation criterion space: Cross aircraft convergence and lessons learned from evaluation of the Air Force Mission Ready Technician program. Military Psychology, 15, 59–76.
Bloom, B. S., Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational objectives, Handbook I: Cognitive domain. New York: David McKay.
Cassady, J. (2002). Learner outcomes in the affective domain. In J. Johnston & L. Baker (Eds.), Assessing the impact of technology in teaching and learning (pp. 35–65). Ann Arbor, MI: Institute for Social Research, University of Michigan.
Chen, G., Gully, S. M., Whiteman, J., & Kilcullen, R. N. (2000). Examination of relationships among trait-like individual differences, state-like individual differences, and learning performance. Journal of Applied Psychology, 85, 835–847.
Colquitt, J. A., & Simmering, M. J. (1998). Conscientiousness, goal orientation, and motivation to learn during the learning process: A longitudinal study. Journal of Applied Psychology, 83, 654–665.
Costa, P. T., & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) professional manual. Odessa, FL: Psychological Assessment Resources.
DeShon, R. P., & Gillespie, J. Z. (2005). A motivated action theory account of goal orientation. Journal of Applied Psychology, 90, 1096–1127.
Diener, E., Smith, H., & Fujita, F. (1995). The personality structure of affect. Journal of Personality and Social Psychology, 69, 130–141.
Driskell, J. E., Goodwin, G. F., Salas, E., & O'Shea, P. G. (2006). What makes a good team player? Personality and team effectiveness. Group Dynamics, 10, 249–271.
Driskell, J. E., Hogan, R., & Salas, E. (1987). Personality and group performance. In C. Hendrick (Ed.), Review of personality and social psychology (Vol. 9, pp. 91–112). Newbury Park, CA: Sage.
Driskell, J. E., Hogan, J., Salas, E., & Hoskin, B. (1994). Cognitive and personality predictors of training performance. Military Psychology, 6, 31–46.
Driskell, J. E., Radtke, P. H., & Salas, E. (2003). Virtual teams: Effects of technological mediation on team performance. Group Dynamics, 7, 297–323.
Driskell, J. E., & Salas, E. (1992). Collective behavior and team performance. Human Factors, 34, 277–288.
Driskell, J. E., Salas, E., & Johnston, J. (2006). Decision-making and performance under stress. In T. W. Britt, C. A. Castro, & A. B. Adler (Eds.), Military life: The psychology of serving in peace and combat: Vol. 1. Military performance (pp. 128–154). Westport, CT: Praeger.
Driskell, J. E., Salas, E., Johnston, J. H., & Wollert, T. N. (2008). Stress exposure training: An event-based approach. In P. A. Hancock & J. L. Szalma (Eds.), Performance under stress (pp. 271–286). London: Ashgate.
Fiske, D. W. (1949). Consistency of the factorial structures of personality ratings from different sources. Journal of Abnormal and Social Psychology, 44, 329–344.
Gagne, R. M. (1984). Learning outcomes and their effects: Useful categories of human performance. American Psychologist, 39, 377–385.
Gully, S. M., Payne, S. C., Koles, K., & Whiteman, J. (2002). The impact of error training and individual differences on training outcomes: An attribute-treatment interaction perspective. Journal of Applied Psychology, 87, 143–155.
Heise, D. R. (2002). Understanding social interaction with affect control theory. In J. Berger & M. Zelditch (Eds.), New directions in sociological theory (pp. 17–40). Boulder, CO: Rowman & Littlefield.
Hogan, J., & Hogan, R. (1989). Noncognitive predictors of performance during explosive ordnance disposal training. Military Psychology, 1, 117–133.
Hogan, R. (1986). Hogan Personality Inventory. Minneapolis, MN: National Computer Systems.
Hough, L. M. (1992). The "Big Five" personality variables—construct confusion: Description versus prediction. Human Performance, 5, 139–155.
Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D., & McCloy, R. A. (1990). Criterion-related validities of personality constructs and the effect of response distortion on those validities. Journal of Applied Psychology, 75, 581–595.
Humrichouse, J., Chmielewski, M., McDade-Montez, E., & Watson, D. (2007). Affect assessment through self-report methods. In J. Rottenberg & S. Johnson (Eds.), Emotion and psychopathology: Bridging affective and clinical science (pp. 13–34). Washington, DC: American Psychological Association.
Ilies, R., & Judge, T. A. (2005). Goal regulation across time: The effects of feedback and affect. Journal of Applied Psychology, 90, 453–467.
Inzana, C. M., Driskell, J. E., Salas, E., & Johnston, J. (1996). Effects of preparatory information on enhancing performance under stress. Journal of Applied Psychology, 81, 429–435.
John, C. H. (1988). Emotionality ratings and free association norms of 240 emotional and nonemotional words. Cognition and Emotion, 2, 49–70.
Johnston, J. H., & Cannon-Bowers, J. A. (1996). Training for stress exposure. In J. E. Driskell & E. Salas (Eds.), Stress and human performance (pp. 223–256). Mahwah, NJ: Erlbaum.
Kirkpatrick, D. L. (1976). Evaluation of training. In R. L. Craig (Ed.), Training and development handbook: A guide to human resource development (2nd ed., pp. 18-1–18-27). New York: McGraw-Hill.
Kraiger, K., Ford, J. K., & Salas, E. (1993). Application of cognitive, skill-based, and affective theories of learning outcomes to new methods of training evaluation [Monograph]. Journal of Applied Psychology, 78, 311–328.
Krathwohl, D. R., Bloom, B. S., & Masia, B. B. (1964). Taxonomy of educational objectives, Handbook II: Affective domain. New York: David McKay.
McAdams, D. P., & Pals, J. L. (2006). A new Big Five: Fundamental principles for an integrative science of personality. American Psychologist, 61, 204–217.
McCrae, R. R., & Costa, P. T. (1987). Validation of the five-factor model of personality across instruments and observers. Journal of Personality and Social Psychology, 52, 81–90.
McCrae, R. R., & Costa, P. T. (1997). Conceptions and correlates of openness to experience. In R. Hogan, J. Johnson, & S. Briggs (Eds.), Handbook of personality psychology (pp. 825–847). San Diego, CA: Academic Press.
Moon, H. (2001). The two faces of conscientiousness: Duty and achievement-striving within escalation of commitment dilemmas. Journal of Applied Psychology, 86, 533–540.
Mount, M. K., Barrick, M. R., & Stewart, G. L. (1998). Five-factor model of personality and performance in jobs involving interpersonal interactions. Human Performance, 11, 145–165.
Norman, W. T. (1963). Toward an adequate taxonomy of personality attributes: Replicated factor structure in peer nomination personality ratings. Journal of Abnormal and Social Psychology, 66, 574–583.
Paunonen, S. V., & Jackson, D. N. (2000). What is beyond the Big Five? Plenty! Journal of Personality, 68, 821–835.
Payne, S. C., Youngcourt, S. S., & Beaubien, J. M. (2007). A meta-analytic examination of the goal orientation nomological net. Journal of Applied Psychology, 92, 128–150.
Robinson, M. D., & Clore, G. L. (2002). Belief and feeling: Evidence for an accessibility model of emotional self-report. Psychological Bulletin, 128, 934–960.
Salas, E., & Cannon-Bowers, J. A. (2001). The science of training: A decade of progress. Annual Review of Psychology, 52, 471–499.
Salas, E., Nichols, D., & Driskell, J. E. (2007). Testing three team training strategies in intact teams: A meta-analysis. Small Group Research, 38, 471–488.
Salas, E., Sims, D. E., & Burke, C. S. (2005). Is there a "Big Five" in teamwork? Small Group Research, 36, 555–599.
Saucier, G., & Goldberg, L. R. (1998). What is beyond the Big Five? Journal of Personality, 66, 495–524.
Saucier, G., & Ostendorf, F. (1999). Hierarchical subcomponents of the Big Five personality factors: A cross-cultural replication. Journal of Personality and Social Psychology, 76, 613–627.
Saunders, T., Driskell, J. E., Johnston, J., & Salas, E. (1996). The effect of stress inoculation training on anxiety and performance. Journal of Occupational Health Psychology, 1, 170–186.
Sitzmann, T., Brown, K. G., Casper, W. J., Ely, K., & Zimmerman, R. D. (2008). A review and meta-analysis of the nomological network of trainee reactions. Journal of Applied Psychology, 93, 280–295.
Stagl, K. C., Salas, E., & Day, D. V. (2008). Assessment of team learning outcomes: Improving team learning and performance. In V. I. Sessa & M. London (Eds.), Work group learning (pp. 369–392). Mahwah, NJ: Erlbaum.
Stewart, G. L. (1999). Trait bandwidth and stages of job performance: Assessing differential effects of conscientiousness and its subtraits. Journal of Applied Psychology, 84, 959–968.
Sutherland, L., & Watkins, J. (1997). The role of personality in training performance in two military samples. Paper presented at the International Military Testing Association Conference, Sydney, Australia.
Thoresen, C. J., Kaplan, S. A., Barsky, A. P., Warren, C. R., & de Chermont, K. (2003). The affective underpinnings of job perceptions and attitudes: A meta-analytic review and integration. Psychological Bulletin, 129, 914–945.
Tupes, E. C., & Christal, R. E. (1961). Recurrent personality factors based on trait ratings (Rep. No. ASD-TR-61-97). San Antonio, TX: USAF Personnel Laboratory, Lackland Air Force Base.
Watson, D., Clark, L. A., & Tellegen, A. (1988). Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology, 54, 1063–1070.
Yik, M., Russell, J., & Barrett, L. (1999). Structure of self-reported current affect: Integration and beyond. Journal of Personality and Social Psychology, 77, 600–619.
Chapter 21
PROVIDING TIMELY ASSISTANCE: TEMPORAL MEASUREMENT GUIDELINES FOR THE STUDY OF VIRTUAL TEAMS

Susan Mohammed and Yang Zhang

Because the time based features of distributed teams have generally been treated as an afterthought, we argue that temporal dynamics should be brought into the forefront of virtual team research. Summarized in the form of 10 temporal measurement guidelines, we address the key issues of what, how frequently, and when to measure in a virtual team context. Through adopting a multilevel approach and highlighting several exciting opportunities for future research, we hope to stimulate a more systematic and comprehensive approach to capturing temporal dimensions in distributed team studies.

A common feature in many organizations is increasing levels of team virtuality, in which members utilize technology to work interdependently across locational, temporal, and relational boundaries (Martins, Gilson, & Maynard, 2004). In response, there has been a proliferation of research on virtual teams in the last decade, and a recent monograph on team effectiveness identified geographically dispersed teams as one of two emerging trends most likely to affect critical team processes (Kozlowski & Ilgen, 2006). Despite this burgeoning literature, however, "there is little theory and comparatively few deliberate studies of the effects of temporal dimensions on computer-mediated communication. Yet, temporal effects are crucial" (Walther, 2002, p. 251). Therefore, adopting a multilevel approach, this chapter offers 10 temporal measurement guidelines that address the key issues of what, how frequently, and when to measure in a distributed team context.

Virtual teams are characterized by distinct temporal qualities, including differences in working hours and time zones, the choice of synchronous (same time) or asynchronous (different time) technologies, and increased time pressure resulting from the rigid time limits imposed on temporary teams. Despite the multifaceted time based qualities of distributed teams, however, temporal dynamics have not been comprehensively incorporated into the conceptualization or measurement
of virtuality. For example, although temporal boundaries are discussed by a small subset of researchers (for example, Espinosa, Cummings, Wilson, & Pearce, 2003; Martins et al., 2004), electronic dependence and geographic dispersion have received the most emphasis in defining virtuality (Gibson & Gibbs, 2006). Nevertheless, the distinction between teams that are distributed across space and time versus those that are distributed across space, but collocated in time, can be substantive (Bell & Kozlowski, 2002), although both would be subsumed under the general heading of "virtual." Likewise, teams differ significantly with respect to lifecycle, history of interaction, and the timing and frequency of face-to-face (FTF) interactions, even though the current conceptualization of virtuality is not nuanced enough to take these temporal dimensions into account.

Failure to recognize the critical role of time in virtual team functioning could cause root problems to go undiagnosed or be misattributed to other factors. For example, unattended synchronous meetings, unexpected delays, or missed deadlines may be interpreted as a lack of team dedication when the real source may be confusion resulting from lean communication media, diverse time based orientations, and/or time zone differences. Therefore, in a virtual context, temporal factors should be taken into account in determining what and how to measure. Below, time based features are identified and phrased as questions that researchers should ask as they embark on virtual team studies.

Adopting a multilevel approach, this chapter starts with the individual level of analysis and addresses temporal characteristics (time based individual differences and cultural background). At the team level, the discussion includes the team temporal mindset (prior history and life span), time based implications of technology use (synchronous and asynchronous), and temporal process mechanisms (coordination and leadership). At the macrolevel, the focus is on the role of the temporal context (time zone differences, deadlines, and time pressure). After addressing what to measure, attention shifts to how frequently and when to measure. Ten guidelines for measurement are discussed in the text and summarized in Table 21.1.

WHAT TO MEASURE: TEMPORAL DIMENSIONS OF VIRTUAL TEAMS

To What Extent Do Team Members Differ on Temporal Characteristics?

What is the configuration of time based individual differences in the team? Because they are so deeply ingrained, time based characteristics have been recognized as one of the fundamental parameters of individual differences (Bluedorn & Denhardt, 1988). For example, time urgency relates to the need to have control over deadlines, as well as the feeling of being driven and chronically hurried (for example, Conte, Mathieu, & Landy, 1998). Time perspective refers to the tendency to be past, present, or future oriented (for example, Zimbardo & Boyd, 1999). Member diversity on time based individual differences may have potent influences in virtual teams. For example, because time-urgent members are
Table 21.1. Summary of Temporal Measurement Guidelines for the Study of Virtual Teams

What to Measure

Temporal Characteristics (individual level of analysis)
1. Recognize that diversity of time based individual differences (for example, time urgency and time perspective) can exert considerable influence on virtual team processes and outcomes.
2. Recognize that time is defined differently, depending on the cultural background of team members, and that these time based differences may significantly affect group processes and performance.

Team Temporal Mindset (team level of analysis)
3. Assess and report the length of time the group has been together prior to the measurement of study variables, including whether this contact has been FTF or distributed in nature.
4. Ensure alignment between the life span of virtual teams and the nature of variables to be investigated (for example, use temporary virtual teams for constructs that emerge quickly and ongoing teams for processes that evolve over time).

Technology Use (team level of analysis)
5. Measure the frequency of asynchronous and synchronous media use in virtual teams, as well as the match between the technology and the task type (for example, asynchronous communication for less complex tasks and synchronous communication for more complex tasks). Also consider assessing the timing and rhythm of FTF meetings in ongoing teams.

Temporal Process Mechanisms (team level of analysis)
6. Explore temporal process mechanisms (for example, temporal coordination and leadership) as moderators in virtual team studies because they may ameliorate many of the problems caused by differing time zones, variability on time based individual differences and cultural backgrounds, as well as heavy reliance on asynchronous communication.

Temporal Context (macrolevel of analysis)
7. Account for the type and degree of time separation within teams, as well as the mechanisms employed to handle time zone differences.
8. Expand performance measurement (beyond quality and quantity) to include the timeliness of work completion and whether external deadlines have been met. Consider perceptions of time pressure as additional temporal measures.

How Frequently to Measure

Virtual Teams over Time (various levels of analysis)
9. Make every effort to measure constructs over time, taking theoretical (for example, how often the variable is predicted to change) and logistical (for example, managing participant burden) concerns into account in determining the frequency of measurement.

When to Measure

Determining Appropriate Intervals for Measurement (various levels of analysis)
10. Carefully plan the timing of measurement in virtual teams and align assessment with the time when critical processes are occurring.
chronically hurried and time-patient members underestimate the passage of time, the mix of the two within a team may generate dysfunctional conflict (Mohammed & Angell, 2004). In addition, individuals with future time perspectives may perceive collaborators with present time perspectives as undisciplined, whereas individuals with present time perspectives may perceive collaborators with future time perspectives as uptight and demanding (Waller, Conte, Gibson, & Carpenter, 2001). Because temporal individual differences are subtle and often remain in the background of thought processes, it is likely that they will be misattributed to more explicitly addressed personality traits and stereotypes, even when they spark serious conflict in the team (Mohammed & Harrison, 2007). Failure to identify the underlying source of team difficulties can cause teams to apply incorrect solutions to problems. Heavy use of mediated communication and the distribution of collaborators across locations can exacerbate misperceptions about team members concerning time based individual differences. According to social identity/deindividuation theory, the reduced number and quality of cues available to communicators in lean media causes an overreliance on a few social cues (Lea & Spears, 1991). Therefore, scanty information provides the basis for social categorizations that exert considerable influence on how team members are perceived and treated.
Although temporal characteristics likely operate beneath conscious awareness even in collocated teams, the potential problems caused by this form of diversity in virtual contexts are multiplied. Nevertheless, the way a team resolves the asynchronies resulting from temporal diversity may be a potentially important determinant of performance (Mohammed & Harrison, 2007). Although the effect of individual differences on virtual team performance is often ignored (for example, Powell, Piccoli, & Ives, 2004), examining diversity of personality traits has been identified as a promising area for future research (for example, Martins et al., 2004). The study of temporal characteristics in distributed contexts is likely to be especially fruitful. Guideline 1 is stated in Table 21.1.

What are the temporal implications of diverse cultural backgrounds in virtual teams? Temporal individual differences derive, in part, from culture, and one of the defining characteristics of virtuality is nationally diverse members (Gibson & Gibbs, 2006). Because some of the most significant nonlanguage difficulties in cross-cultural interactions arise from temporal differences (Bluedorn, Kaufman, & Lane, 1992), members of globally dispersed teams are almost guaranteed to encounter divergence in how time and schedules are interpreted (Saunders, Slyke, & Vogel, 2004). For example, perceptions of how much margin there is around deadlines vary from culture to culture. In a series of three cross-national studies, Levine, West, and Reis (1980) found that public clocks and watches were less accurate in Brazil, and Brazilians expressed less regret over being late than Americans. Whereas Americans tend to apologize if they are 5 minutes late, Saudi Arabians do not feel the need to apologize unless they are 20 minutes late (Brislin & Kim, 2003). Quick service is often equivalent to good service in the United States, but many other countries do not place as great an emphasis on speed (Brislin & Kim, 2003). Although Latin America and southern Europe subscribe to event time in which schedules are fluid and meetings take as long as needed, North America and northern Europe subscribe to clock time in which events follow prespecified schedules, and time is tightly allocated (Saunders et al., 2004). In addition, cultures with a short-term orientation (for example, the United States and Russia) focus on the present and immediate gratification, but cultures with a long-term orientation (for example, Japan and China) are more concerned with persistence (Hofstede, 2001). Furthermore, Asians and Pacific Islanders are comfortable with silence because it allows them to carefully plan the next step, but Americans and Western Europeans are often annoyed by lengthy gaps in interaction (Brislin & Kim, 2003).

As these examples illustrate, time is culturally bound (Saunders et al., 2004), and cultural orientation shapes the beliefs, preferences, and values of group members toward time. Nevertheless, because failure to communicate important contextual information has been identified as a key hindrance to establishing mutual knowledge in distributed teams (Cramton, 2001), it is unlikely that team members will spontaneously discuss differences in temporal orientations. Therefore, Saunders and colleagues (2004) advocate creating an awareness of differences and developing team norms on punctuality as solutions to handling
Providing Timely Assistance: Temporal Measurement Guidelines
381
variability on temporal perceptions among remote team members. In their review on multinational and multicultural (MNMC) virtual teams, Connaughton and Shuffler (2007) state that although culture is frequently examined in terms of nationality, race, and sex, there is a need to “move beyond unidimensional views of culture and beyond static, dichotomous views of distribution to reflect the complexities of MNMC distributed team characteristics and processes” (p. 408). Examining the time based features of culture would begin to address this research call. Guideline 2 is stated in Table 21.1.
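Guidelines 1 and 2 do not prescribe how diversity on a time based trait should be scored. One common operationalization in the diversity literature is within-team dispersion. The following sketch is illustrative only (it is not drawn from this chapter, and the team rosters and 1-to-7 time urgency scores are invented):

```python
# Hypothetical sketch: operationalizing "diversity of time based individual
# differences" (Guideline 1) as within-team dispersion on a survey measure.
from statistics import mean, stdev

def temporal_diversity(scores):
    """Within-team standard deviation of a time based trait (e.g., time urgency)."""
    return stdev(scores) if len(scores) > 1 else 0.0

time_urgency = {
    "team_a": [6.5, 6.0, 2.0, 2.5],  # polarized mix of hurried and patient members
    "team_b": [4.0, 4.5, 4.0, 3.5],  # relatively homogeneous team
}

for team, scores in time_urgency.items():
    print(team, "mean:", round(mean(scores), 2),
          "diversity:", round(temporal_diversity(scores), 2))
```

Depending on the theory being tested, other aggregations (for example, the maximum separation between members) may be more appropriate; the point is that the configuration of scores, not just the team mean, should be captured.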
What Is the Team Temporal Mindset Concerning Past and Present Interaction?

How much prior history does the team have? Familiarity facilitates interpersonal interaction and influences performance (for example, Harrison, Mohammed, McGrath, Florey, & Vanderstoep, 2003). Clearly, knowledge of the habits and abilities of team members implies a different level of development than teams without a prior history. Because participants may sign up for experiments with friends or be in classroom teams with previous work partners, researchers should take previous history into account. In the case of field studies with ongoing virtual teams, group (and not just organizational) tenure should be reported. In addition, as the level of team maturity affects measurement choices, it would also be useful to identify where the groups are in terms of social development (for example, forming, storming, norming, performing, and adjourning; Tuckman & Jensen, 1977), as well as task progress in relation to the deadline (Gersick, 1988).

While past member interaction is important for all team types, of particular interest in the virtual context is the extent to which FTF interaction is part of the distributed team's history (Connaughton & Shuffler, 2007). As there is substantial evidence that computer-mediated groups are less efficient, requiring more time and effort to achieve a common understanding (for example, Hightower & Sayeed, 1996), team members that have FTF contact, particularly at the beginning of the work cycle, are at a substantial advantage over those who have never met in person. For example, Furst, Reeves, Rosen, and Blackburn (2004) reported that working virtually slowed progress through the formation stage of project team development by reducing opportunities to communicate. Guideline 3 is listed in Table 21.1.

What is the expected life span of the team? Although the prototypical virtual team is generally characterized as an ad hoc group that adjourns after tasks with finite time limits are completed, longer-term virtual teams are predicted to become more common as globalization dictates their necessity (Zakaria, Amelinckx, & Wilemon, 2004). Longevity has been identified as an important characteristic in the distributed context because temporary and ongoing teams have diverse structures and processes (Saunders & Ahuja, 2006). Specifically, anticipating future interaction causes electronic partners to alter their communication strategies, resulting in improved interpersonal relationships (Walther, 2002).
Indeed, no significant differences in relational communication emerged between computer-mediated and FTF groups when there was an ongoing expectation of continued interaction (Walther, 1994). Therefore, "temporal effects can outweigh media effects on decision quality when groups have a chance to develop common experiences" (Walther, 2002, p. 251). In contrast to the primary focus on task accomplishment in temporary teams, ongoing virtual teams are more socially oriented and have time to develop norms, establish deeper trust, and resolve conflict (Saunders & Ahuja, 2006).

Therefore, a mismatch occurs when longer-term processes that take time to unfold are examined in shorter-term teams that have no expectation of future interaction. However, reviews of the virtual team literature concluded that most studies utilized short-term student samples that met for an average of four to five weeks (Powell, Piccoli, & Ives, 2004), yet interpersonal processes of trust building and conflict resolution received the most emphasis (Martins et al., 2004). Certain processes emerge only after the team has worked together for an extended period and may be salient only in teams that expect continued contact. Clearly, the temporary versus ongoing nature of virtual teams permits the meaningful measurement of some variables, while constraining other possible choices. For example, long-term virtual teams are better suited to the investigation of variables such as group identity, social integration, group norms, and psychological safety. Guideline 4 is listed in Table 21.1.
What Are the Temporal Implications of Technology Use and Task Type?

Temporal differences are embedded in the technology that is used, with asynchronous (delayed or different time) teams allowing for more response time than synchronous ("real" or same time) teams (for example, Warkentin, Sayeed, & Hightower, 1997). Specifically, synchronous media (for example, FTF communication, telephone, and chat) facilitate turn taking, allow for subtle cues to be conveyed, and provide instantaneous feedback, but impose several constraints on when and where members can participate (Montoya-Weiss, Massey, & Song, 2001). In contrast, information exchange takes longer with asynchronous technology (for example, e-mail, Internet newsgroups, and electronic bulletin boards) because members can reflect on received messages and carefully compose and edit responses (Warkentin et al., 1997). However, communication may become disjointed when feedback is delayed and interruptions or long pauses occur (Montoya-Weiss et al., 2001). Cramton (2001) identified differences in speed of access to information and difficulty interpreting the meaning of silence as key hindrances to establishing mutual knowledge in distributed teams.

Although both asynchronous and synchronous communication have strengths and weaknesses, their effectiveness is determined, in part, by the nature of the team task in which they are employed. For example, asynchronous technology is particularly well suited for straightforward tasks, such as idea generation, because it overcomes the limitation of only one person being able to speak at a time (for example, Valacich, Dennis, & Connolly, 1994). However, due to the
increase in media richness, synchronous technology is generally recommended for tasks requiring detailed information sharing, reciprocal interdependence, and high coordination (for example, Bell & Kozlowski, 2002). To illustrate, Maznevski and Chudoba (2000) found that effective global virtual teams sequenced FTF coordination meetings at various intervals when there was a need to conduct complex decision making that required intensive interaction. In contrast, ineffective teams utilized expensive in-person contact to collect simple data. Therefore, the correspondence between the technology and the task type is a significant factor in determining virtual team effectiveness.

Whereas temporary virtual teams with straightforward objectives may achieve task accomplishment by relying solely on electronic communication, FTF exchanges can significantly enhance the performance of ongoing teams (Saunders & Ahuja, 2006). In-person contact provides opportunities to "clear the air" interpersonally, deal with long-standing conflicts, and handle complex issues, as well as rejuvenate motivation on extended projects (for example, Furst et al., 2004). The timing of FTF meetings is a significant consideration, with more frequent in-person contact advised when the task requires high interdependence and when members have not developed shared mental models of teamwork (Maznevski & Chudoba, 2000). Indeed, a longitudinal study of global virtual teams concluded that FTF contact set the basic temporal rhythm for team interaction. Specifically, FTF "coordination meetings served as a heartbeat, rhythmically pumping new life into the team's processes before members circulated to different parts of the world and task" (Maznevski & Chudoba, 2000, p. 486). Guideline 5 is stated in Table 21.1.
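Guideline 5 calls for measuring how often a team actually relies on each class of media. A minimal sketch of such a tally is given below; the log format and the assignment of media to the synchronous and asynchronous categories are assumptions made for illustration, not a published coding scheme:

```python
# Illustrative sketch for Guideline 5: tallying synchronous versus asynchronous
# interaction incidents from a (hypothetical) team communication log.
from collections import Counter

SYNCHRONOUS = {"chat", "telephone", "video", "ftf_meeting"}
ASYNCHRONOUS = {"email", "newsgroup", "bulletin_board"}

def media_use_profile(events):
    """Count interaction incidents by the temporal mode of the medium used."""
    profile = Counter()
    for event in events:
        if event["medium"] in SYNCHRONOUS:
            profile["synchronous"] += 1
        elif event["medium"] in ASYNCHRONOUS:
            profile["asynchronous"] += 1
    return profile

log = [{"medium": "email"}, {"medium": "chat"}, {"medium": "email"},
       {"medium": "ftf_meeting"}, {"medium": "bulletin_board"}]
print(media_use_profile(log))  # Counter({'asynchronous': 3, 'synchronous': 2})
```

Cross-tabulating the same counts by task type would make it possible to assess the technology-task match described above.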
What Temporal Process Mechanisms Are Employed in Virtual Teams?

Whereas temporal patterns surface naturally in synchronous groups, asynchronous groups necessitate an explicit focus on synchronization (Massey, Montoya-Weiss, & Hung, 2003). Therefore, recent attention has been given to temporal coordination as a mechanism for improving asynchronous collaboration (for example, Im, Yates, & Orlikowski, 2005; Montoya-Weiss et al., 2001). In a study on global virtual project teams, temporal coordination, defined as a process intervention for directing the "pattern, timing, and content of interaction incidents in a team," enhanced convergence oriented behaviors and was associated with higher performance (for example, Massey et al., 2003, p. 131). In addition, there has been increased discussion on the role of leadership in helping distributed collaborators to manage performance in the virtual team literature (for example, Bell & Kozlowski, 2002). The notion of temporal leadership reflects the extent to which leaders prioritize and set milestones, as well as pace the team so that work is finished on time (Nadkarni & Mohammed, 2007). Indeed, temporal coordination and leadership may play a pivotal role in determining virtual team success or failure, especially in the face of time constraints, asynchronous communication, and diverse time based characteristics. Therefore, it is expected that interest in these
constructs will continue to grow in the distributed team literature. Guideline 6 is stated in Table 21.1.
What Is the Temporal Context?

How do differences in time separation impact virtual team functioning? The challenges presented by varying time zones are substantial in internationally dispersed teams, including restricted possibility for synchronous interaction, miscommunication from reliance on asynchronous interaction, as well as delay and rework costs (for example, Carmel, 2006). Even a time zone difference of just one hour can reduce overlapping time by several hours because of divergence in when the workday starts and ends, as well as when lunch is taken (Espinosa & Carmel, 2003). Time zone differences can be exploited by operating "round-the-clock," in which dispersed collaborators pass off their work at the end of the day to team members around the world who continue to work while they sleep (Carmel, 2006). However, because near-flawless coordination and communication are required before these benefits can be realized, many teams are not able to sustain round-the-clock operations over the long term (Espinosa & Carmel, 2003). Indeed, most research reports that work in teams is disrupted rather than assisted by varying time zones (for example, Carmel, 2006; Munkvold & Zigurs, 2007; Sarker & Sahay, 2004).

Although receiving the most emphasis, time zone differences are only one type of time separation. Differences in work schedules, shifts, and lunch breaks, as well as nonoverlapping weekends and holidays, can also be significant hurdles to overcome in virtual teams (Espinosa & Carmel, 2003). For example, while a nine-to-five schedule in a Monday to Friday workweek is standard for Americans, Friday is not a workday in Arab countries, and Spaniards work until after 7 P.M. because they start later and take a longer lunch break (Espinosa & Carmel, 2003). In addition, the diversity in national holidays also presents coordination challenges, especially when remote partners fail to communicate this kind of contextual information (Cramton, 2001).

The extent to which global virtual teams will be disadvantaged depends on the type and degree of time separation. For example, having multiple collaborators widely dispersed across several time zones is more difficult than members being collocated within two sites that are working in different time zones (Espinosa & Carmel, 2003). In addition, the lower the number of overlapping hours, the greater the restriction of when synchronous technology can be utilized. For team members in India, coordinating synchronous communication with remote partners in New York is far more taxing than with partners in Europe or Asia because of the magnitude of the time separation (Carmel, 2006). Therefore, many global virtual teams may rely on suboptimal media such as e-mail for complex forms of collaboration when a simple phone call would provide much needed clarification. Significant project delays can ensue when the lack of media richness in asynchronous communication perpetuates miscommunication and misinterpretation (Espinosa & Carmel, 2003).
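The overlap arithmetic described above can be made concrete. The sketch below is a simplified illustration (the site schedules are hypothetical, and lunch breaks and workdays that wrap past midnight are ignored); it maps each site's local working hours into UTC and computes the daily window available for synchronous interaction:

```python
# Hypothetical sketch of time separation: each site is described as
# (UTC offset in hours, local workday start, local workday end).
def overlap_hours(site_a, site_b):
    """Daily overlapping working hours, with both schedules mapped to UTC."""
    def to_utc(offset, start, end):
        return start - offset, end - offset
    a0, a1 = to_utc(*site_a)
    b0, b1 = to_utc(*site_b)
    return max(0.0, min(a1, b1) - max(a0, b0))

new_york = (-5.0, 9, 17)
bangalore = (5.5, 9, 17)
madrid = (1.0, 10, 19)   # later start and later end than the U.S. schedule

print(overlap_hours(new_york, bangalore))  # 0.0 -- no shared synchronous window
print(overlap_hours(new_york, madrid))     # 4.0 despite a six hour clock offset
```

Even this crude calculation shows why an India-New York pairing leaves virtually no room for synchronous media, consistent with the observations above.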
In addition to the configuration of time zone differences and the number of overlapping hours, another factor determining the extent to which time separation will affect team functioning is the effectiveness of strategies used to overcome these difficulties. Existing interventions include stating the clock time for all involved countries for each task, establishing liaison roles to facilitate interaction across locations, and instituting messaging norms to enhance communication (Espinosa & Carmel, 2003; Sarker & Sahay, 2004). Global information technology firms have also invested in technological tools to better structure data and have created organizational cultures that encourage employees to work longer hours in order to maximize overlapping hours with remote partners (Carmel, 2006). Guideline 7 is listed in Table 21.1.

What are the team's external deadlines, and how much time pressure are members experiencing? Deadlines are a significant component of a group's temporal context, and virtual team success is heavily dependent on the timeliness of work completion (for example, Sarker & Sahay, 2004). Nevertheless, the length of time taken to finish projects is commonly omitted as a criterion variable in virtual team research (for example, Carte, Chidambaram, & Becker, 2006; Montoya-Weiss, Massey, & Song, 2001), with more attention given to the quality and quantity of performance. In addition, insufficient attention has been paid to subjective assessments of time pressure, despite its importance in the virtual context. One of the most robust results in this literature is that computer-mediated communication takes longer than FTF interaction (for example, Walther, 2002). This finding, coupled with the fact that temporary distributed teams are often required to execute tasks with short time limits, highlights the heightened time pressure in many distributed teams. Furthermore, time limits have been shown to have a significant impact on the interpersonal dynamics of dispersed collaborators (for example, Walther, 2002). Therefore, subjective assessments of workload intensity that capture whether team members perceive that they do not have enough time, have just enough time, or have more than enough time for team tasks should be collected. Guideline 8 is listed in Table 21.1.
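Guideline 8's timeliness criterion can be scored directly from project records. As a minimal sketch under assumed data (the milestone dates below are invented), an on-time rate and a mean slip in days might be computed as follows:

```python
# Illustrative sketch for Guideline 8: measuring timeliness of work completion
# against external deadlines. The milestone records are hypothetical.
from datetime import date

milestones = [
    {"deadline": date(2008, 3, 1),  "completed": date(2008, 2, 27)},
    {"deadline": date(2008, 4, 15), "completed": date(2008, 4, 20)},
    {"deadline": date(2008, 5, 30), "completed": date(2008, 5, 30)},
]

# Positive slip means the deadline was missed; zero or negative means on time.
slips = [(m["completed"] - m["deadline"]).days for m in milestones]
on_time_rate = sum(s <= 0 for s in slips) / len(slips)
mean_slip = sum(slips) / len(slips)

print(f"on-time rate: {on_time_rate:.2f}; mean slip: {mean_slip:.1f} days")
# on-time rate: 0.67; mean slip: 0.7 days
```

Pairing such archival indicators with the subjective time pressure ratings described above captures both the objective and the perceived sides of the temporal criterion space.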
HOW FREQUENTLY TO MEASURE: VIRTUAL TEAMS OVER TIME

Having discussed what to measure in terms of the temporal dimensions of virtual teams, we now turn our attention to the issue of how frequently to measure. Some of the concerns with cross-sectional studies stem from the potential for type I and type II temporal errors (McGrath, Arrow, Gruenfeld, Hollingshead, & O'Connor, 1993). Type I temporal errors occur when the conclusions from short-lived teams are not sustained over a longer term. For example, global virtual teams experience a pattern of "swift," but fragile, trust that erodes over time (Jarvenpaa & Leidner, 1999). Similarly, McGrath and colleagues (1993) found that performance losses commonly ascribed to the use of computer-mediated versus FTF communication disappeared by the third or fourth week that a team worked together. Type II temporal errors occur when the effects from
longer-term teams do not occur in short-lived teams. To illustrate, Yoo and Kanawattanachai (2001) indicated that a virtual team's collective mind (a social cognitive system in which individuals heedfully interrelate their actions) developed only in the later stages of a project's life after a transactive memory was in place. Given the compelling nature of these results, researchers should aggressively seek opportunities to conduct longitudinal studies. Despite the empirical demonstration that observing teams over time is critical to uncovering team effects, however, a recent review concluded that "research on virtual teams has been predominantly conducted using single work sessions, thus ignoring the role of time on group processes and outcomes" (Martins et al., 2004, p. 819).

Clearly, one reason for the prevalence of cross-sectional studies is the formidable logistical obstacles encountered in doing longitudinal research on distributed teams. The greater the frequency of data collection, the greater the likelihood of participant fatigue and attrition. Therefore, the theoretical need to capture changes over time in dynamic variables must be balanced with the logistical likelihood of gaining sufficient participation from a majority of members in each team for each wave of data collection. Guideline 9 is stated in Table 21.1.

WHEN TO MEASURE: DETERMINING APPROPRIATE TIME INTERVALS FOR MEASUREMENT

Based on the convenience of data collection, the timing of measurement may be somewhat arbitrary in many virtual team studies. Nevertheless, researchers are increasingly advocating for greater specificity regarding when measures are collected and whether data are gathered at appropriate times (for example, Marks, Mathieu, & Zaccaro, 2001). Because computer-mediated teams require more time and effort by members to achieve the same level of shared understanding as FTF teams, measurement must allow sufficient time for distributed members to adapt to one another and the communication medium. For example, it is possible to assess team processes too early, before the team has had adequate time for meaningful interaction, as well as too late, well after key communications have already occurred. A time-sensitive team task analysis, as well as qualitative research, can assist in identifying appropriate times for measurement.

When timing data collection, researchers should also be cognizant of critical events in the distributed team context (for example, project deadlines and major team conflicts). As the rhythm of FTF meetings has been shown to play a pivotal role in global virtual teams (Maznevski & Chudoba, 2000), it is imperative that investigators understand the implications for measurement. Whether data should be collected before, during, and/or after FTF interactions will depend on the purpose of the study and the nature of the variables under investigation. Guideline 10 is stated in Table 21.1.

CONCLUSION

Summarized in Table 21.1, the 10 temporal guidelines derived above alert investigators to the temporal variables that could potentially influence virtual
team measurement. As an initial step, researchers should catalog the various ways in which time could play a role in distributed team functioning for their particular study so that assessment tools can be designed to tap key temporal dimensions. Process-related guidelines 4 (alignment between team life span and study variables), 9 (frequency of measurement), and 10 (when to measure) should be explicitly considered in the design of all virtual team studies. At a minimum, variables such as team tenure (guideline 3), the type and degree of time separation (7), the frequency of asynchronous and synchronous media use (5), and the timeliness of work completion (8) should be descriptively reported as a matter of course in studies of ongoing virtual teams. Content-related guidelines 1 (time based individual differences), 2 (member cultural background), and 6 (temporal process mechanisms) are more contingent on a study's nature and purpose, but are recommended as promising avenues for future investigation. In addition, several guidelines, including matching technology and task type (5), as well as assessing temporal coordination, leadership (6), and time pressure (8), are practically oriented to facilitate virtual team success. Despite the added complexities and costs incurred by more comprehensively incorporating time into measurement, it is predicted that some of the most fruitful research streams in the coming years will occur at the intersection of virtual teams and temporal dynamics.

REFERENCES

Bell, B. S., & Kozlowski, S. W. J. (2002). A typology of virtual teams: Implications for effective leadership. Group & Organization Management, 27(1), 14–49.
Bluedorn, A. C., & Denhardt, R. B. (1988). Time and organizations. Journal of Management, 14(2), 299–320.
Bluedorn, A. C., Kaufman, C. F., & Lane, P. M. (1992). How many things do you like to do at once? An introduction to monochronic and polychronic time. Academy of Management Executive, 6, 17–26.
Brislin, R. W., & Kim, E. S. (2003). Cultural diversity in people's understanding and uses of time. Applied Psychology, 52(3), 363–382.
Carmel, E. (2006). Building your information systems from the other side of the world: How Infosys manages time-zone differences. MIS Quarterly Executive, 5(1), 43–53.
Carte, T. A., Chidambaram, L., & Becker, A. (2006). Emergent leadership in self-managed virtual teams: A longitudinal study of concentrated and shared leadership behaviors. Group Decision and Negotiation, 15, 323–342.
Connaughton, S. L., & Shuffler, M. (2007). Multinational and multicultural distributed teams: A review and future agenda. Small Group Research, 38(3), 387–412.
Conte, J. M., Mathieu, J. E., & Landy, F. J. (1998). The nomological and predictive validity of time urgency. Journal of Organizational Behavior, 19, 1–13.
Cramton, C. D. (2001). The mutual knowledge problem and its consequences for dispersed collaboration. Organization Science, 12(3), 346–371.
Espinosa, J. A., & Carmel, E. (2003). The impact of time separation on coordination in global software teams: A conceptual foundation. Software Process: Improvement and Practice, 8, 249–266.
Espinosa, J. A., Cummings, J. N., Wilson, J. M., & Pearce, B. M. (2003). Team boundary issues across multiple global firms. Journal of Management Information Systems, 19(4), 157–190.
Furst, S. A., Reeves, M., Rosen, B., & Blackburn, R. S. (2004). Managing the life cycle of virtual teams. Academy of Management Executive, 18(2), 6–20.
Gersick, C. J. G. (1988). Time and transition in work teams: Toward a new model of group development. Academy of Management Journal, 31, 9–41.
Gibson, C. B., & Gibbs, J. L. (2006). Unpacking the concept of virtuality: The effects of geographic dispersion, electronic dependence, dynamic structure, and national diversity on team innovation. Administrative Science Quarterly, 51(3), 451–495.
Harrison, D. A., Mohammed, S., McGrath, J. E., Florey, A. T., & Vanderstoep, S. W. (2003). Time matters in team performance: Effects of member familiarity, entrainment, and task discontinuity on speed and quality. Personnel Psychology, 56(3), 633–669.
Hightower, R. T., & Sayeed, L. (1996). Effects of communication mode and prediscussion information distribution characteristics on information exchange in groups. Information Systems Research, 7(4), 451–465.
Hofstede, G. (2001). Culture's consequences: Comparing values, behaviors, institutions, and organizations across nations. Thousand Oaks, CA: Sage.
Im, H., Yates, J., & Orlikowski, W. (2005). Temporal coordination through communication: Using genres in a virtual start-up organization. Information Technology & People, 18(2), 89–119.
Jarvenpaa, S. L., & Leidner, D. E. (1999). Communication and trust in global virtual teams. Organization Science, 10, 791–815.
Kozlowski, S. W. J., & Ilgen, D. R. (2006). Enhancing the effectiveness of work groups and teams. Psychological Science in the Public Interest, 7(3), 77–124.
Lea, M. R., & Spears, R. (1991). Computer-mediated communication, deindividuation and group decision making. International Journal of Man-Machine Studies, 34, 283–301.
Levine, R. V., West, L. J., & Reis, H. T. (1980). Perceptions of time and punctuality in the United States and Brazil. Journal of Personality and Social Psychology, 38, 541–550.
Marks, M. A., Mathieu, J. E., & Zaccaro, S. J. (2001). A temporally based framework and taxonomy of team processes. Academy of Management Review, 26(3), 356–376.
Martins, L. L., Gilson, L. L., & Maynard, M. T. (2004). Virtual teams: What do we know and where do we go from here? Journal of Management, 30(6), 805–835.
Massey, A. P., Montoya-Weiss, M. M., & Hung, Y. (2003). Because time matters: Temporal coordination in global virtual project teams. Journal of Management Information Systems, 19(4), 129–155.
Maznevski, M. L., & Chudoba, K. M. (2000). Bridging space over time: Global virtual team dynamics and effectiveness. Organization Science, 11(5), 473–492.
McGrath, J. E., Arrow, H., Gruenfeld, D. H., Hollingshead, A. B., & O'Connor, K. M. (1993). Groups, tasks, and technology: The effects of experience and change. Small Group Research, 24, 406–420.
Mohammed, S., & Angell, L. (2004). Surface- and deep-level diversity in workgroups: Examining the moderating effects of team orientation and team process on relationship conflict. Journal of Organizational Behavior, 25, 1015–1039.
Mohammed, S., & Harrison, D. (2007, August). Diversity in temporal portfolios: How time-based individual differences can affect team performance. Paper presented at the Academy of Management Conference, Philadelphia, PA.
Montoya-Weiss, M. M., Massey, A. P., & Song, M. (2001). Getting it together: Temporal coordination and conflict management in global virtual teams. The Academy of Management Journal, 44(6), 1251–1262.
Munkvold, B. E., & Zigurs, I. (2007). Process and technology challenges in swift-starting virtual teams. Information & Management, 44(3), 287–299.
Nadkarni, S., & Mohammed, S. (2007, December). Diversity on temporal individual differences and team performance: The moderating role of temporal leadership. Paper presented at the 21st annual meeting of the Australian and New Zealand Academy of Management, Sydney, Australia.
Powell, A., Piccoli, G., & Ives, B. (2004). Virtual teams: A review of current literature and directions for future research. The DATA BASE for Advances in Information Systems, 35(1), 6–36.
Sarker, S., & Sahay, S. (2004). Implications of space and time for distributed work: An interpretive study of US-Norwegian systems development teams. European Journal of Information Systems, 13(1), 3–20.
Saunders, C., Slyke, C. V., & Vogel, D. (2004). My time or yours? Managing time visions in global virtual teams. The Academy of Management Executive, 18(1), 19–31.
Saunders, C. S., & Ahuja, M. K. (2006). Are all distributed teams the same? Differentiating between temporary and ongoing distributed teams. Small Group Research, 37(6), 662–700.
Tuckman, B. W., & Jensen, M. A. C. (1977). Stages of small-group development revisited. Group and Organization Studies, 2, 419–427.
Valacich, J. S., Dennis, A. R., & Connolly, T. (1994). Idea generation in computer-based groups: A new ending to an old story. Organizational Behavior and Human Decision Processes, 57, 448–467.
Waller, M. J., Conte, J. M., Gibson, C. B., & Carpenter, M. A. (2001). The effect of individual perceptions of deadlines on team performance. Academy of Management Review, 26(4), 586–600.
Walther, J. B. (1994). Anticipated ongoing interaction versus channel effects on relational communication in computer-mediated interaction. Human Communication Research, 20, 473–501.
Walther, J. B. (2002). Time effects in computer-mediated groups: Past, present and future. In P. Hinds & S. Kiesler (Eds.), Distributed work (pp. 236–257). Cambridge, MA: MIT Press.
Warkentin, M. E., Sayeed, L., & Hightower, R. (1997). Virtual teams versus face-to-face teams: An exploratory study of a Web-based conference system. Decision Sciences, 28(4), 975–996.
Yoo, Y., & Kanawattanachai, P. (2001). Developments of transactive memory systems and collective mind in virtual teams. International Journal of Organizational Analysis, 9(2), 187–208.
Zakaria, N., Amelinckx, A., & Wilemon, D. (2004). Working together or apart? Building a knowledge-sharing culture for global virtual teams. Creativity and Innovation Management, 13, 15–29.
Zimbardo, P. G., & Boyd, J. N. (1999). Putting time in perspective: A valid, reliable individual-differences metric. Journal of Personality and Social Psychology, 77(6), 1271–1288.
ACRONYMS
AAR  after action review
ACC  anterior cingulate cortex
ADDS  Assessment Design and Delivery System
AO  advance organizer
APGO  avoid performance goal orientation
APT  Amusement Park Theoretical
ARI  Army Research Institute
ARL  Army Research Laboratory
ASR  automatic speech recognition
ATM  automated teller machine
CAS  close air support
CAST  coordinated awareness of situation by teams
CC  command and control
CDMTS  Common Distributed Mission Training Station
CFF  call for fire
CONOPS  concept of operations
CTA  Cognitive Task Analysis
CTF  coalition task force
CTGV  Cognition and Technology Group at Vanderbilt
C2  command and control
CTT  Cognitive Transformation Theory
CVE  collaborative virtual environment
DARPA  Defense Advanced Research Projects Agency
DIS  distributed interactive simulation
EAD  enemy air defense
EEG  electroencephalogram/electroencephalography
EPIC  executive process/interactive control
FAB  First Australian Bank
FAC  forward air controller
FiST  Fire Support Team
fMRI  functional magnetic resonance imaging
FO  forward observer
FTF  face-to-face
GFS  Global Forecast System
GOMS  goals, operators, methods, and selection rules
GPS  global positioning system
HCIP  human-centered information processing
HCIP-R  human-centered information processing-revised
HF  human factors
HLA  high level architecture
HPRA  human performance requirements analysis
HQ  headquarters
HTA  hierarchical task analysis
ICT  information and communication technology
ID  identify/identification
IRR  inter-rater reliability
IS  Information Systems
IV  intravenous
JAD  Joint Application Development
K&S  knowledge and skills
KR  knowledge of results
KSAs  knowledge, skills, and abilities
KSAOs  knowledge, skills, abilities, and other characteristics
LGO  learning goal orientation
LIAN  lateral inferior anterior negativity
LOs  learning outcomes
LOS  learning objective statement
LSA  Latent Semantic Analysis
MFN  medial or mediofrontal negativity
MIT  Massachusetts Institute of Technology
MNMC  multinational and multicultural
MOEs  measures of effectiveness
MOPs  measures of performance
MOT2IVE  Multi-Platform Operational Team Training Immersive Virtual Environment
NASA  National Aeronautics and Space Administration
NTC  National Training Center, Fort Irwin
OC  observer/controller
ONR  Office of Naval Research
ORA  operational requirements analysis
OSD  operational sequence diagram
PAST  performance assessment and diagnostic tool
PC  personal computer
PCC  posterior cingulate cortex
PPGO  prove performance goal orientation
ProMES  productivity measurement and enhancement system
RAD  Rapid Applications Development
RE  requirements engineering
ROI  return on investment
SA  situation awareness
SAGAT  Situation Awareness Global Assessment Technique
SAM  surface-to-air missile
SARS  Situational Awareness Rating Scale
SART  Situational Awareness Rating Technique
SBT  scenario/simulation based training
SEAD  suppression of enemy air defense
SET  stress exposure training
SME  subject matter expert
SMMs  shared mental models
SOPs  standard operating procedures
SPAM  Situation-Present Assessment Method
STE  synthetic task environment
TA  task analysis
TADMUS  Tactical Decision Making Under Stress
TDT  team dimensional training
TER  training effectiveness ratio
TI  training intervention
TIM  team interaction model
TIMx  Training Intervention Matrix
TLX  NASA Task Load Index
TMM  team mental model
TMS  transactive memory system
TNA  training needs analysis
ToT  transfer of training
TP  team performance
TSA  team situation awareness
UAV  unmanned aerial vehicle
UAV-STE  unmanned aerial vehicle synthetic task environment
UML  unified modeling language
USMC  U.S. Marine Corps
VR  virtual reality
VRISE  virtual reality induced symptoms and effects
XML  Extensible Markup Language
INDEX
AARs (automatically augmented after action reviews), 316, 318, 327–28, 328f
Absolute judgment, 142
ACC (anterior cingulate cortex) based circuit, 10–11, 12–13, 13f, 16–17, 21–22, 25–27
Accommodation, 52, 54, 214
Action regulation system, 10f
Activity sampling technique, 137–38
Adaptive behavior, 25–26, 140
Adaptive expertise, 261
Adrenaline (epinephrine), 91
Advanced beginners, 69–71, 71t, 152, 154, 156–57. See also Cognitive Skill Acquisition framework
Advanced training, 255–63; described, 255–56; develop or identify scenarios and events, 259; identify training objectives for training event, 257–59, 258t; performance diagnosis and feedback, 261–62; pretraining interventions, 259–60; principles for advanced training, 260–61; training objectives master set, 256–57, 257t
Affective learning variables, 91, 92t
Affective measurement of performance, 362–72; affective domain, described, 362–63; distal antecedents, 364–68; measurement of performance, 369–71; proximal antecedents, 364–65, 368–69; training model, 363–65, 364f
Agent, defined, 237
Age of learners, 33–34
Aggregation methods, 278–79
Agile system development, 116
Agreeableness, 367
Ahmad, A., 178
Alderton, D. L., 261
Alignment, assessing, 285–91
Alliger, G. M., 371
Alluisi, E. A., 252
Alternative models, 57
Ampère, André-Marie, 82
Amusement Park Theoretical (APT) model, 215–17
Analog computational programs, xiv
Anderson, L. W., 304
Anterior cingulate cortex (ACC) based circuit, 10–11, 12–13, 13f, 16–17, 21–22, 25–27
Anterior ventral nucleus of the thalamus, 11
Anthropology, 137
Anthropometric models, 141
Anxiety, 366, 367, 370–71. See also Stress
Apathy, 11
APGO (avoid performance goal orientation), 369
APT (Amusement Park Theoretical) model, 215–17
Arab countries, 380, 384
Aristotle, 54
Armarego, J., 211
Arousal and performance, 91, 93
Artificial intelligence research, 90
Ashby, W. R., 82–83
Ashwood, A., 293
Asia, 380, 384
Asoko, H. M., 58
ASR (automatic speech recognition) systems, 319–20, 320
Assessing risk, 140–41
Assessments. See Performance assessments
Assimilation, 214
Asynchronous technology, 382–83
Attack Center Trainer, xiv
Attention management, 54
Augmented cognition, xv
Austin, J. R., 272
Authentic domain experience, 77. See also Fidelity
Automated performance assessment of teams, 314–29; applications of the communication analysis toolkit, 320–28, 322f; Communication Analysis Pipeline, 317f; communication as an indicator of performance, 315–16; competence assessment and alarms for teams, 323–28, 325f, 326f; correlation between SME Ratings and DARCAAT Predictions for Overall Team Performance, 327t; semiautomated measurement, 244; team performance measurement, 315; Visualization of Team Performance Scores from the AAR Tool, 327f
Automatically augmented after action reviews (AARs), 316, 318, 327–28, 327f
Automatic speech recognition (ASR) systems, 319–20, 320
Automatized tasks, 9, 25, 26. See also Habitual responses
Avatar (self-representation), 34
Avoid performance goal orientation (APGO), 369
Baer, J., 215–17
Bailey, J. H., 54
Baker, E. L., 303–4, 304f
Balanced Scorecard, 293
Bareket, T., 261
Barrett, L. F., 54
Barrick, M. R., 365, 366, 367
Barrier analysis, 140–41
Barrows, H. S., 61
Basins, 83
Baxter, H. C., 94, 260, 261
Bayes nets, 308
Beaubien, J. M., 368, 369
Beginners. See Advanced beginners; Novice level
Behavioral learning, 50–51, 53, 54, 62, 87–88. See also Cognitive transformation theory
Behavior level of metrics, 197
Bejar, I. I., 301
Bell, B. S., 267, 273, 279
Benner, P., 69, 161
Bennett, W., 371
Bewley, W. L., 306
Bias of observers, 233, 307
Big Five factor model, 365–69
Bjork, R. A., 263
Blackburn, R. S., 381
Bloom, B. C., 52
Bloom, B. S., 90, 98–99, 362–63
Blue Box (Link Trainers), xiv, 132
Bowers, C. A., 108
Brainstorming, 196
Brain system dynamics. See Neurophysiology of learning and memory
Brannick, M. T., 351
Bransford, J. D., 260
Brazil, 380
Breadth of TA development, 134
Brewer, W. F., 56, 57, 58
British Airways Flight 009, 132
Brooks, F., 54
Brown, D. E., 57, 58
Brown, K. G., 369–70
Buff, W. L., 108
Building-block approach to training, 260–61
Building virtual environment training systems, 193–206; concept generation, 195–96, 202; example, 201–2; knowledge analysis, 195; knowledge elicitation, 194–95; metrics development, 196–99, 202–4; requirements, 194–96; return on investment, 199–200, 204–5; validation, 195, 200–201, 205–6
Burke, C. S., 371
Buy-in, 292
Campbell, G. E., 108
Cannon-Bowers, J. A., 54, 108, 262, 370
Carey, S., 54
Carroll, J., 144
Case exploration, 78
Casper, W. J., 369–70
CAST (Coordinated Awareness of Situation by Teams), 356–57, 357f
CDMTS (Common Distributed Mission Training Station), 101–4, 105–7t
Ceiling effect, 240
Centromedial negativity (N350), 16, 21, 24
Chaturanga, 228
Chein, J. M., 12
Chen, C., 35
Chen, G., 368–69
Chess, 228
Chi, M. T. H., 51, 53, 338
Childbirth, 227
China (ancient), xiii–xiv
China (modern), 380. See also Asia
Chinn, C. A., 56, 57, 58
Christal, R. E., 365
Chudoba, K. M., 383
Chung, G. K. W. K., 306
City University, London, 212–13
Clark, L. A., 366
Clement, J., 57, 58
Clickstream data, 306, 309
Clore, G. L., 363
Closed-loop training, 93–94
Cockburn, A., 36
Cognition: assessment design and, 303–4, 304f; cognitive authenticity, 77; cognitive demands, 303–5, 307–8; cognitive learning, overview, 51–52; Cognitive Skill Acquisition framework, 149–63; Cognitive Skill Acquisition framework, advanced beginners, 156–57; Cognitive Skill Acquisition framework, competent performers, 157–58; Cognitive Skill Acquisition framework, experts, 162–63; Cognitive Skill Acquisition framework, implications for tactical thinking training in virtual environments, 153–54; Cognitive Skill Acquisition framework, novices, 154–56; Cognitive Skill Acquisition framework, overview, 149, 150t; Cognitive Skill Acquisition framework, principles for learning progression, 149–53; Cognitive Skill Acquisition framework, proficient performers, 158–62; Cognitive Skill Acquisition framework, General and Domain-Specific Characteristics for the Advanced Beginner Stage, 151t; Cognitive Task Analysis (CTA), 53, 77, 137, 258–60; Cognitive Transformation Theory (CTT), 50–62, 152; Cognitive Transformation Theory (CTT), implications for virtual environments, 61; Cognitive Transformation Theory (CTT), overview, 59–61; Cognitive Transformation Theory (CTT), process of unlearning, 56–59; Cognitive Transformation Theory (CTT), sensemaking requirements, 52–56; Coordinated Awareness of Situation by Teams (CAST), 356–57, 357f; domain modeling, 304–5, 308; individual differences in abilities, 37–38; macrocognition, 268–69; for performance assessments, 303–4; physiological and neurophysiological monitoring technologies, 82; scientist metaphor, 52; scoring model, 306–7; task representation, 305–6; team cognition, 347–50; UAV-STE case study, 352–59, 356t. See also Neurophysiology of learning and memory
Cognition and Technology Group at Vanderbilt (CTGV), 37 Cohen, J., 343–44 Cohn, J. V., 124 Collaborative virtual environments (CVEs), 34 Collective cognition, 348. See also Virtual teams, measures of team cognition within Colquitt, J. A., 368 Combat Hunter program, 86 n.1 Common Distributed Mission Training Station (CDMTS), 101–4, 105–7t Communication: asynchronous and synchronous, 382–83; audio, 317; within collaborative virtual environments (CVEs), 34; communication analysis, 276, 320–28; Communication Analysis Pipeline, 317f; communication analysis toolkit, 320–25, 322f; cultural diversity within virtual teams, 380–81, 384; as an indicator of performance, 315–16; language proficiency of women, 36; prerequisites to communication for evaluations, 316; within small groups, 143–44; visual support for learners with lower linguistic competences, 37 Competency/self-efficacy, 41, 91, 371 Competent performers, 72, 73t; cognitive skill acquisition framework, 157–58; feedback for, 159; indicators of proficiency level, 157–58; instructional strategies, 159; scenario design components for, 158–59. See also Cognitive Skill Acquisition framework Complex training environments, 78, 252–55, 253t, 254f Computational modeling tools, 318–19 Computer self-efficacy, 41 Computer usage, by gender, 36, 37 Concept generation, 196, 202, 247 Concept of operations (CONOPS), 126 Conceptual change, 54 Conceptual learning, 59–60 Conceptual short-term memory, 14–16 Conceptual simulation, 345
Concurrency in TA development, 134 Concurrent verbal protocols, 333. See also Verbal protocol analysis Confidentiality, 335 Connaughton, S. L., 381 CONOPS (concept of operations), 126 Conscientiousness, 367, 368 Consequential validity, 246 Constraint based optimal reasoning engine, 140 Constructivist learning, 214–17, 216f Construct validity, 246 Content validity, 246, 303 Contingencies, 293–97, 294f Controllability, 291–92 Cooke, N. J., 279 Coordinated Awareness of Situation by Teams (CAST), 356–57, 357f Coordination, defined, 355 Cortisol, 91, 93 Costa, P. T., 367 Cost of Waiting Function, 200f Costs: delay and rework costs, 384; of fidelity systems, 132–33, 169; return on investment (ROI), 199–200, 204–5 Coulson, R. L., 56, 255 Cramton, C. D., 382 Creative systems analysts, 208–23; a case of learning creative RE using a simulated learning environment (FAB ATM), 218–23, 219f; creativity in RE, 212–14; learning creative requirements analysis, 208–11, 214–23; Learning Styles Supported by the Simulated Learning Environment, 222f; RE Activities Supported by the Simulated Learning Environment, 220f Creativity Problem Based Learning framework, 211 CRESST (National Center for Research on Evaluation, Standards, and Student Testing), 306, 309 Criterion validity, 246, 303 Cronbach, Lee, 89 CTA (Cognitive Task Analysis), 53, 77, 137, 258–60
CTGV (Cognition and Technology Group at Vanderbilt), 37 CTT. See Cognitive Transformation Theory Cultural diversity within virtual teams, 380–81, 384 Curran, T., 14 Curtis, M. T., 315 Cybernetics, 81–94; basins, 83; closed-loop training, 93–94; described, 82–83; feedback, 82, 83; guiding systems towards goals, 82; human development: learning, 87–89; human development: maturation, 85–87; human development: variability, 89–90; individual differences in VE based training, 90–91, 92t; kinematics, 83, 84–85f; modern scientific cybernetics, 83–85; neurocybernetics, 90; physiology of performance and emotion, 91–93 Cybernetics: or Control and Communication in the Animal and the Machine (Wiener), 82 Czerwinski, M., 35 Dalgarno, B., 215 Dallman, S., 211 DARCAAT program, 323–27, 327t DARWARS Ambush!, 324–27, 325f, 326f Data collection: within assessment process, 237; collecting task requirements data, 135–40; communication data, 317; data-driven techniques assessment design, 307–8; elicited from operators/users, 136; example types of data from task analysis, 175t; to identify user-centered design specification, 174–75; lessons learned, 177 Decision making, 56 Declarative knowledge, 241–42 Defects, 115 De Florez, Luis, xiv Deindividuation theory, 379–80 DeKeyser, V., 56
Delay and rework costs, 384 Dense-array electroencephalography (EEG), 11–13, 17, 23–24, 93, 278 Dependability (personality trait), 367 DeShon, R. P., 368 Design. See Requirements engineering Design lifecycle, 116, 182–87 Developing concepts. See Concept generation Dewey, J., 53–54 Diagnostic measurement systems. See Performance assessments Dialectic constructivism, 215, 216f DiBello, L., 57–58 Differential psychology, 89 Digital computers, history of development, xiv Disconfirmation, 57 Disequilibrium events, 260 Display size, 37 Distance learning guidelines, 99 Distributed interactive simulation, described, xiv Doane, S. M., 261 Documentation, 174–75, 177. See also Data collection; specific types of documents Domain modeling, 304–5, 308 Dorsal limbic circuit, 10f, 11 Dorsolateral prefrontal cortex, 12 Doyle, J. K., 58 Dreyfus, A., 60 Dreyfus, H. L., & Dreyfus, S. E., 67, 69, 72–74 Driskell, J. E., 267, 367, 371 Driver, R. H., 58 Dunnette, M. D., 367 Dwyer, D. J., 255, 256 Dynamic Descriptions, 139–40 Eaton, N. K., 367 Eddy, E. R., 371 Education, xiii, 8, 108–9. See also Learning Educational objectives taxonomy, 99 EEG (dense-array electroencephalography), 11–13, 17, 23–24, 93, 278
Effectiveness of simulation programs: overview of, xv; return on investment (ROI), 199–200, 204–5, 204t; training effectiveness ratio (TER), 205. See also Validity Egypt (ancient), xiii Elderly learners, 33–34 Electroencephalogram. See EEG (dense-array electroencephalography) Electromyogram, 93 Elicitation: comparative summary of requirements elicitation techniques, 120–22t; described, 117–18; indexing and aggregating elicited information, 278–79; learning analysis process, 123f; modeling and analysis, 119, 122–24, 126; requirements engineering, 194–95; virtual teams performance assessments, 275–79 Eliovitch, R., 60 Ely, K., 369–70 Embedded measures in team simulations, 351, 359 Emotions, 72, 91, 93. See also Affective measurement of performance Emphasis-change training, 24–25, 261 Endogenous constructivism, 215, 216f Engle, R. W., 54 Entity theories of learning situations, 40 Environment, 89, 242, 243, 351–52 EPIC (executive process-interactive control), 140 Epinephrine (adrenaline), 91 Ericsson, K. A., 333 Errors: apathy towards, 11; diagnosis of, 53, 61; error-related negativity, 13, 14f, 24; product defects, 115; unlearning process, 56, 59–60. See also Failures Europe, 380, 384 Evaluations. See Performance assessments Event trees, 140–41 Executive process-interactive control (EPIC), 140 Exogenous constructivism, 215, 216f Expectations for training, 40–41 Experience: assessing prior to training,
38–39; learning through, 54; as necessary, but not sufficient, component, 152–53. See also Training Experimental psychology, 88, 89 Expert performers, 66–78; cognitive skill acquisition framework, 67–74, 68t, 70–73t, 75t–76t, 162–63; expert based methods within assessment design, 307; expert versus novice performance, 13–23, 15–19f; exploration of operational situations, 78; ill-structured knowledge domain, 67, 78; indicators of proficiency level, 162; scenario design components for, 74–78, 162; verbal protocol analysis from, 332–33 Extraversion, 366 Eye trackers, 242, 244, 277–78, 306 Eylon, B.-S., 57 FAB ATM (case of learning creative RE), 218–23, 219f Face-to-face (FTF) interactions within teams, 377, 381–83, 385–86 Face validity, 246, 303 Failures, 40, 72, 140–41. See also Errors Fault trees, 139 FBM Facility, xiv Feedback: within advanced training, 261–62; assessments as content for, 239; closed-loop training, 93; for competent performers, 159; cybernetics, 82, 83; delayed feedback, 54–55; design feedback, 134; disadvantages, 263; in early stage of learning, 24; emphasis-change training, 261; extrinsic feedback from trainers, 54, 61; intrinsic feedback from students, 54, 61; as knowledge of results (KR), 88; outcome, 53; scenario based training (SBT) method, 97, 98, 262; within sensemaking, 52–53, 61; subsequent performance guided by, 88; time lags between actions and consequences, 54–55; TIMx development, 108; training management component
development and, 181. See also Performance assessments Feelings. See Emotions Feltovich, P. J., 56, 61, 252, 253t, 255, 256, 258, 260–61 Females, 36–37 Feyerabend, P., 54 Fidelity: assessment-driven simulation, 247; blended fidelity training solution, 109; costs, 132–33, 169; defined, 77; disproportionate to the training value, 169; functional fidelity, 176; levels, 133; ORA requirements, 168–69, 176–77; physical fidelity, 176, 228, 243; psychological fidelity, 77 Fight or flight phenomenon, 91 Fire Support Teams. See U.S. Marine Corps Fire Support Team (FiST) Fiske, D. W., 365 Fleishman, E. A., 90, 99 Flexible expertise, 25–26 Flight simulation, history of, xiv FMRI (functional magnetic resonance imagery), 11–12, 19–20, 90, 93 Ford, D. N., 58 Ford, J. K., 362 Forterra Systems Inc., 243 Fort Irwin, 325 Fort Lewis Mission Support Training Facility, 324–27, 325f, 326f Fowlkes, J. E., 255, 256 Franks, J. J., 260 Free play, 78 FTF (face-to-face) interactions within teams, 377, 381–83, 385–86 Fuchs, H., 54 Functional fidelity, 176. See also Fidelity Functional magnetic resonance imagery (fMRI), 11–12, 19–20, 90, 93 Functional requirements, 118–19 Furst, S. A., 381 Future directions, 308–9 Gabriel, M., 11 Gagne, R. M., 362 Galileo, 54
Galvanic skin response (skin conductance), 93 Game based training: free play, 78; gaming environments, 247; usage by gender, 37. See also specific training programs by name Game hunting, 86 n.1 Garden Path scenarios, 162 Garg, A., 34–35 Gender, 36–37 Generic hypothesis, 201 Genetics, 89 Gertzog, W. A., 52, 54, 61 Gestalt psychology theory, 212 Gillespie, J. Z., 368 Glaser, R., 51, 53 Global positioning system (GPS) navigation systems, 349 Global virtual teams. See Temporal measurement guidelines for the study of virtual teams Goal orientation (training): cybernetics, 82; defined by training gaps, 172, 173–74t; within design lifecycle, 170–72, 182; as distal variable that influences training, 365, 368–69; goal-directed learning, 8, 11; GOMS (goals, operators, methods, and selection rules), 139; identifying, 118; individual differences in, 39–40 Goldberg, L. R., 368 GOMS (goals, operators, methods, and selection rules), 139 Goodwin, G. F., 371 Gopher, D., 261 Goschke, T., 25 GPS (global positioning system) navigation systems, 349 GUARD FIST II, 290–91 Guilbaud, G. D., 82 Gully, S. M., 367, 368–69 Habitual responses, 11, 381. See also Rote training Hazard and operability analyses, 141 HCIP (human-centered information processing) model, 99–101
HCIP-R (human centered information processing-revised model), 100–101, 100f, 102–4t Heart rate, 93 Hedberg, B., 57 Hewson, P. W., 52, 54, 61 Hierarchical task analysis (HTA), 136, 138, 139 High level architecture (HLA), 244 Hippocampus, 11, 14, 19–22, 20f Hoffman, R. R., 58, 252, 253t Hogan, J., 366, 367 Hogan, R., 366 Hogan Personality Inventory, 367 Holistic cognition, 73, 348. See also Virtual teams, measures of team cognition within Holladay, C. L., 41 Hoskin, B., 367 Hough, L. M., 367 HPRA. See Human performance requirements analysis HTA (hierarchical task analysis), 136, 138, 139 Human-centered information processing (HCIP) model, 99–101 Human centered information processing-revised model (HCIP-R), 100–101, 100f, 102–4t Human development: learning, 87–89; maturation, 85–87; variability, 89–90 Human error classification, 99 Human failures, 140–41 Human performance requirements analysis (HPRA), 169–70; described, 165–66, 169, 187–88; need for, 169–70; stage 1: training needs/goals identification, 170–72, 182; stage 2: user-centered design specification, 172–77; stage 3: metrics development, 177–79; stage 4: training management component development, 179–82; training system design lifecycle, 167–68, 167f; U.S. Marine Corps Fire Support Teams examples, 171t, 173–74t Human performance taxonomy, 99 Humrichouse, J., 371
Hunt, E., 54 Hunting, 86 n.1 Hypothesis, 196, 200–201, 359 Identify (ID) friend or foe contingency, 295 Identifying stakeholders, 118, 126 Ilgen, D. R., 293 Imagery technology. See fMRI (functional magnetic resonance imagery) Indexing/aggregation methods, 278–79, 288 India (modern), 384 India (seventh century), 228 Individual differences in VE based training, 31–43; age, 33–34; alterable states, 38; cognitive ability, 37–38; communication skills, 34; cybernetics, 89–91, 92t; expectations for training, 40–41; goal orientation, 39–40; immutable characteristics, 32–37; implications for research and design, 41, 43; Individual Differences in VE Based Learning Systems, 42–43t; Model of Individual Characteristics on Learning in Virtual Environments, 33f; prior knowledge and experience, 38–39; self-efficacy, 41, 91; spatial abilities and gender, 36–37; spatial ability and memory, 34–36; types of individual differences, 32 Influence diagrams, 141 Informed consent, 335 Instructional strategies. See Training Intelligent tutoring systems, 90 Interactions among different users of the system, 142–43 Interpersonal intelligence, 37 JACK (anthropometric models), 141 Jackson, D. N., 368 Japan, 380 Jenkins, J., 1–2 Johnston, J. H., 262, 370 Joint Application Development (JAD) workshops, 116
Jungwirth, E., 60 Kaber, D. B., 31–32 Kamp, J. D., 367 Kanawattanachai, P., 386 Katz, N., 37–38 Kaufman, J. C., 215–17 Kiekel, P. A., 279 Kilcullen, R. N., 368–69 Kinematics, 83, 84–85f Kirkpatrick, D. L., 197, 242, 285, 369–70 Kizony, R., 37–38 Klein, G., 56, 58, 94, 260, 261 Klein, K., 278–79 Kline, J. A., 109 Knapp, D., 54 Knerr, B. W., 54 Knowledge: assessing prior to training, 38–39; declarative, 241–42; described, 169; knowledge of results (KR), 88, 115; knowledge, skills, and abilities (KSAs), 53, 115, 169, 179; procedural, 242 Knowledge Post, 320–23, 322f Kolb, D. A., 53–54 Koles, K., 367 Koschmann, T. D., 61 Kovitz, B. L., 125 Kozlowski, S. W. J., 267, 273, 278–79 Kraiger, K., 362, 364, 370 Krathwohl, D. R., 98–99, 304, 362, 363 KR (knowledge of results), 88. See also Feedback KSAs, 53, 115, 169, 179 Kuhn, T. S., 54, 57 Lakatos, I., 57 Lane, N. E., 252 Latent Semantic Analysis (LSA), 318–19 Lateral inferior anterior negativity (LIAN), 17, 18f, 19f, 20f Lathan, C. E., 35 Latin America, 380 Laurie, A., 99 Learning: action regulation, 10–11; definitions of, 87; early stage, 9, 22, 24;
goal-directed learning, 8, 11; An Idealized, Group-Average Learning Curve, 88f–89f; late stage, 9, 11–12, 14, 22; Learning goal orientation (LGO), 369; learning level of metrics, 197; Learning Objective Statement (LOS), 123f, 124; Learning outcomes (LOs) taxonomies, 98–101, 98–99, 104, 108–9; neuroscience on influences, 9; sensemaking objectives, 52–53; Theorist’s Pyramid, 2–3; Theorist’s Tetrahedron, 1–2; training goals versus, 8; unlearning process, 56–60, 61–62, 152 Leather birthing models, 227 Lee, J. J., 306 Level of analysis of TA development, 134 Levine, R. V., 380 LGO (learning goal orientation), 369 LIAN (lateral inferior anterior negativity), 17, 18f, 19f, 20f Lifecycle approaches, 116, 167–68, 167f, 182–87 Linguistic competencies. See Communication Link analysis, 138, 139 Link Trainer (“Blue Box”), xiv, 132 Linn, M. C., 57 Longer-term virtual teams, 381–82 LOS (Learning Objective Statement), 123f, 124 LOs (learning outcomes) taxonomies, 98–101, 98–99, 104, 108–9 LSA (Latent Semantic Analysis), 318–19 Lussier, J. W., 153 Machine learning analyses, 307–8 Macredie, R., 35 Macrocognition, 268–69 Macromedia Flash, 221 Maiden, N., 212–13 Male-Female Mating Pattern for the Three-Spined Stickleback, 85f Management oversight risk tree technique, 141
Markov model technologies, 307–8 Masia, B. B., 98–99 Mastery-oriented individuals, 39–40 Maturation, 85–86 Maxims, 74 Mayer, R. E., 35, 304 Maznevski, M. L., 383 McAdams, D. P., 368 McBride, D. K., 87 McCloy, R. A., 367 McCrae, R. R., 367 McGrath, J. E., 385 McKenzie, B., 36 McMillan, L., 54 Measurement of the environment, 242, 243 Measurements. See Performance assessments Mechanism, 54, 83 Medial frontal negativities (MFNs), 13, 16, 17, 19f, 20–23, 22f, 23f Medical simulation, described, xv Memories: of aging adults, 33–34; conceptual short-term memory, 14–16; expectancy and outcome representations, 9; human-centered information processing (HCIP) model, 99–101; long-term, 11, 14; self-regulatory mechanisms, 9; spatial ability and memory, 34–36; Theorist’s Tetrahedron, 1–2; Transactive memory system (TMS), 269, 272–73, 276. See also Cognition; Neurophysiology of learning and memory Men, 36–37 Mental models, 51–58; alternative models, 57; cognitive transformation theory postulates, 59–60; conceptual simulation, 345; described, 51, 58, 144; improvement of, 51, 152; as incomplete, 144; of learning material, 38–39; process of learning better models, 59–60; shared mental models (SMM), 269, 272, 276; unlearning, 56–60, 61–62, 152; of virtual environments, 38–39 Merrill, M. D., 98
Methods, defined, 144 Metrics, 196–99; attributes, 199; development of, 177–79; example, 202; measures of effectiveness (MOE), 198, 202–3, 204t; measures of performance (MOP), 198, 202–3, 204t; MOT2IVE system, 184–85; objective measures, 318; outcome metrics, 197, 202–4; process, 197, 198–99, 202–4; subjective, 318; types of, 197–98, 198f MFNs (medial frontal negativities), 13, 16, 17, 19f, 20–23, 22f, 23f MIL-STD 1472, 142 Mislevy, R. J., 301 Mission characteristics, 170–71. See also Goal orientation (training) Mission Essential Competency technique, 245 Mission outcome metrics, 178 MNMC (multinational and multicultural) virtual teams, 381, 384 Models, described, 139–40 Mode of task elements, 355 MOE (measures of effectiveness), 198, 202–3, 204t, 241–42 Moon, H., 367 MOP (measures of performance), 198, 202–3, 204t, 241–42, 261 Morris, N. M., 58 MOT2IVE system development effort, 182–87; FiST fidelity requirements example, 186t; FiST knowledge and skills decomposition examples, 184t; FiST knowledge and skill training objective learning trends, 188f; FiST multimodal cue and capability requirements example, 185t; FiST performance metric examples, 187t; FiST scenario variation example, 188t; FiST task decomposition examples, 183t Mount, M. K., 365, 366, 367 Muller, P., 109 Multimodal cues, 175–76, 177 Multinational and multicultural (MNMC) virtual teams, 381, 384 Multiplatform operational team training
immersive virtual environment system. See MOT2IVE system development effort Multiplayer virtual environments. See Virtual teams Multiple representations, 78 Multitrait-multimethod approach, 351 Munro, A., 306 Myelinization, 86–87 Myers, A. C., 61 N170 (visual cortex), 14–16, 15f, 20–22, 20f, 21f, 24, 25 N250 (posterior negativity), 14–15, 15f, 16f, 23, 25 N350 (centromedial negativity), 16, 21, 24 National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 306, 309 National Cyber Security Division, 82 National Training Center (NTC), 325 Nature-nurture variability, 89. See also Environment Naylor, J. C., 293 Near-infrared spectroscopy, 11, 93 Negative affect, 366 Nervous system, 86–87 Neurocybernetics, defined, 90 Neurophysiological measurement technologies, 11–24; EEG (dense-array electroencephalography), 11–13, 17, 23–24, 93, 278; eye trackers, 242, 244, 277–78, 306; fMRI (functional magnetic resonance imagery), 11–12, 19–20, 90, 93; near-infrared spectroscopy, 11, 93 Neurophysiology of learning and memory, 7–27; education and training protocols, 23–27; error-related negativity, 12–13, 14f, 24; learning and dual-action control systems, 9–11, 10f; learning versus performance, 8; memories as learning outcomes, 9; neural signatures of learning and skill development, 12–13; neurophysiological model on goal-directed learning,
11; parts of the brain used for spatial tasks, 36; tracking the development of expertise, 13–23. See also Memories Neuroticism, 366, 367 Neverwinter Nights, 247 Nguyen, L., 212, 213 Noble, C. E., 88–89 Noncognitive processes, 82 Nonfunctional requirements, 118–19 Nonlinearities, identifying, 289, 295 Norman, G., 34–35 Norman, W. T., 365 Novelty, 213 Novice level: cognitive skill acquisition framework, 67–69, 68t, 70t, 154–56; expert versus novice performance, 13–23, 15–19f; indicators of proficiency level, 154; instructional strategies for, 155; scenario design components for, 154–55; verbal protocol analysis from, 332–33. See also Cognitive Skill Acquisition framework NTC (National Training Center), 325 Nuclear industry, 132, 141 NUREG-0700, 141 Objective measurement, 243 Objectives (training). See Goal orientation (training) Observational bias, 233, 307 Observational data, 137–38 Observational protocols, 233 Observation of operators, 137 Office of Naval Research: Combat Hunter program, 86 n.1; Virtual Technologies and Environments (VIRTE) Program, xiv Office of the Inspector General, 236 Olson, J. R., 144 O’Neil, H. F., 99 Online Interactive Virtual Environment, 243 “On the fly” events, 260, 262–63 Openness as personality trait, 367 Operational requirements analysis (ORA), 168–69; described, 165–66, 168, 187–88; need for, 168–69; stage
1: training needs/goals identification, 170–72, 182; stage 2: user-centered design specification, 172–77; stage 3: metrics development, 177–79; stage 4: training management component development, 179–82; training system design, 167f; training system design lifecycle, 167–68 Operational sequence diagram (OSD), 136, 139 ORA. See Operational requirements analysis Ordering, defined, 355 Organizational charts, 139 Organizational objectives, 285–91 Orsanu, J., 315 OSD (operational sequence diagram), 136, 139 Oser, R. L., 255, 256 O’Shea, P. G., 371 Outcome metrics, 179, 197, 202–4 Paas, F., 33–34 Pacific Islanders, 380 Pals, J. L., 368 Parietal cortex, 19, 20f Paunonen, S. V., 368 Payne, S. C., 367, 368, 369 PCC (posterior cingulate cortex), 11–12, 17, 19–22, 20f, 25–27 PC gaming environments. See Game based training Pellegrino, J. W., 261 Peluso, D. A., 56 Perceptual learning, 260, 261 Performance assessments, 227–34; absolute judgment, 142; advanced training level, 261–62; agreement across evaluators, 289–90, 297; arousal and, 91, 93; Characteristics of Measures Involved in Creating a Performance Profile, 230f; closed-loop training, 93; context, 237; coordination, defined, 355; diagnosis supported by, 239; for feedback, 239; guiding principles for, 231–34; hypothesis testing facilitated by, 359;
identifying relative importance, 288–89, 295–97; improvement priorities, 290, 295; index of performance, 278–79, 288; limitations of, 8; measurement system, defined, 285; measures of effectiveness (MOE), 198, 202–3, 204t, 241–42; measures of performance (MOP), 198, 202–3, 204t, 241–42, 261; metric data, 178, 179, 245–46, 317–18; neuroadaptive training, 8; observation protocols, 233; overview of, 284–85; Performance Prediction with the Communication Analysis Toolkit, 319; process, 229–31; reward system, defined, 285; scenario based training (SBT) method, 262; within sensemaking, 52–53, 61; sensitivity to changing objectives or mission, 290–91, 297; team performance measurement systems, 270–79, 271f, 350–51; of trainees prior to training, 38–39; validity (see Validity) Performance assessments: design, 300–310; cognitive demand, 303–4, 304f; data-driven techniques, 307–8; domain modeling, 308; domain representation, 304–5; expert based methods, 307; future directions of design, 308–9; overview, xv; Performance assessment Design and Delivery System (ADDS) (CRESST), 309; purpose of performance assessments, 301–2; scoring model, 306–7; task representation, 305–6; validity, 302–3, 310 Performance assessments: for training, 227–29, 236–48; agent, defined, 237; assessment, described, 236–38; assessment-driven simulation, 246–47; automated measures, 244; capabilities or attributes of the trainee, 241–42; diagnosis supported by, 239; feasibility of, 244–45; future of assessment for VE training, 246–47; measurements within, 237–38; objective measurement (observational and automated measurement), 243; objects of assessment in VE training, 241–43;
performance context within, 237; physiological measurement, 244; process, 243–44; prognosis of performance success and failure, 240; reliability of, 238; second order measures and assessments, 242; simulation-driven assessment, 246–47; subjective measurement (trainee response), 243, 244; for training management, 240–41; of training objectives, 236, 238; utility of, 236, 245 Performance assessments: organizational level considerations, 284–98; advantages of contingencies, 295–97; buy-in, 292; controllability, 291–92; defined, 284; evaluating training aligning with, 285–91; identifying and communicating value: specific techniques, 293; identifying nonlinearities, 289, 295; implementing organization level factors, 292–97; Organizational Levels Performance Measurement Guidelines, 296–97t; organizational reward system, 291; ProMES contingencies, 293–95 Performance levels. See Individual differences in VE based training Performance-oriented individuals, 39–40 Personality traits, 365–69 PFC (prefrontal cortex), 13f Phillips, J. K., 56 Physical fidelity, 176, 228, 243. See also Fidelity Physiological responses: to emotions, 91–93; measurement of, 244, 306, 309; taxonomy of environment interactions versus, 94; Yerkes-Dodson law, 91 Piagetian process, 52, 54, 214 Plato, 82 Pliocene/Pleistocene periods, 86 n.1 Polanyi, L., 57 Posner, G. J., 52, 54, 61 Posterior cingulate cortex (PCC), 11–12, 17, 19–22, 20f, 25–27 Posterior negativity (N250), 14–15, 15f, 16f, 23, 25
PPGO (prove performance goal orientation), 369 Predictive validity, 303 Prefrontal cortex (PFC), 13f Prescriptions, 240 Pretraining, 38, 259–60 Prevou, M. I., 153 Prince, A., 351 Prince, C., 351 Prior knowledge and experience, 38–39 Pritchard, R. D., 289, 292, 293, 295 Problem solving, 333 Procedural knowledge, 242 Procedures data, 135–36 Process data, 334 Process metrics, 179, 197, 202–4 Product defects, 115 Productivity measurement and enhancement system (ProMES) intervention, 293–95 Proficiency levels. See Advanced beginners; Competent performers; Expert performers; Novice level; Proficient performers Proficient performers, 72–74, 75t; cognitive skill acquisition framework, 158–62; indicators of proficiency level, 159–60; instructional strategies, 161–62; scenario design components for, 160. See also Cognitive Skill Acquisition framework Prognosis of performance success and failure, 240 Progression, principles for learning, 149–53 ProMES contingencies, 293–95 Protective measures, 140–41, 141–42 Protocol analysis, 137 Prototyping, 196 Prove performance goal orientation (PPGO), 369 Psychological fidelity, 77 Psychophysiological measurement, 11–24; EEG (dense-array electroencephalography), 11–13, 17, 23–24, 93, 278; eye trackers, 242, 244, 277–78, 306; fMRI (functional magnetic
resonance imagery), 11–12, 19–20, 90, 93; near-infrared spectroscopy, 11, 93 Purposeful behavior, 132 Quaintance, M. K., 99 Questionnaires, 142. See also Documentation Quiñones, M. A., 41 Rada, R., 35 Radtke, P. H., 267 Rall, E., 56 Rantanen, E., 99 Rapid Applications Development (RAD) approach, 116 RE. See Requirements engineering Reaction criteria, 369–70 Reaction level of metrics, 197 Realism. See Fidelity Rees, E., 53 Reeves, M., 381 Reflection, 54 Reis, H. T., 380 Reliability of measurements, 245, 359 Remediation, 233 Requirements engineering (RE), 115–29, 131–45; assessing system performance, 141–42; collecting task requirements data, 135–40; as a constructivist process, 212; cost versus realism trade-off, 132–33; deficiencies in KSA overcome by training systems, 115; described, 194, 208; elicitation techniques, 119–26; Four Central Activities Involved in Training Systems Requirements Engineering, 117f; functional requirements, 118–19; Future Needs in Training Systems Requirements Engineering, 127–28t; identifying stakeholders, 118, 126; identifying training system goals, 118; key elements to consider in developing the TA, 135; knowledge analysis, 195; knowledge elicitation, 194–95; knowledge validation, 195; learning RE, at the workplace, 210–11, 217; learning RE, Characteristics of Learning
Approaches, 211t; learning RE, Creativity Problem Based Learning framework, 211; learning RE, overview of, 209; learning RE, through industry-intensive courses, 209, 217; learning RE, through subject(s) included in a tertiary course, 210; lifecycle approaches, 116; link between physical and cognitive TA, 139; methods for assessing interactions across levels of the organization, 142–43; methods for assessing potential sources of risk, 140–41; methods for describing data collected, 138–39, 287–88; modeling and analysis, 126, 128; nonfunctional requirements, 118–19; protective measures, 140–41; requirements traceability, 125; specification process, 124–25, 128; state of the art in requirements specification, 126–29; Task Analysis Methods for Requirements Gathering, 133t; verification and validation (V&V), 125–26, 128–29 Resources, 134. See also Costs Respiration, 93 Restructuring, 54 Results level of metrics, 197 Retrospective verbal protocols, 333. See also Verbal protocol analysis Return on investment (ROI), 199–200, 204–5, 204t Reward systems, 285, 291 Risk assessments, 140–41 Robertson, J., 212 Robinson, M. D., 363 Robots, xiii–xiv Roesler, A., 252, 253t ROI. See Return on investment Role playing, 196 Root-Bernstein, R., & Root-Bernstein, M., 215 Roscoe, S. N., 206 Rose, F. D., 292 Rosen, B., 381 Ross, K. G., 149 Rote training, 9, 25, 26. See also Habitual responses
Rouse, W. B., 58 Rule-space methodology, 308 Russia, 380 Rylander, G., 11 Safety. See Protective measures SAGAT (Situation Awareness Global Assessment Technique), 276 Salas, E.: on Big Five components of teamwork, 371; on characteristics of training teams, 315, 351; classification of learning outcomes, 362; on cross-training, 108; mission-oriented constraints to help decompose complex skills, 256; on personality variables, 367; on role specialization increases within a team, 279; on team dimensional training (TDT), 255; on team situation awareness (TSA), 270; virtual teams definition, 267 Salvendy, G., 99–101 SARS (Situational Awareness Rating Scale), 276 SART (situational awareness rating technique), 276 Saucier, G., 368 Saudi Arabia, 380 Saunders, C. S., 380–81 Scaffolding approach to training, 38 Scenario based training (SBT) method: for advanced beginners, 156–57; for competent performers, 158–59; described, 97–98, 101, 252–56, 254f; disadvantages, 263; for expert performers, 162; for feedback, 97, 98, 262–63; FiST example, 259; instructional strategies, 162–63; for novices, 154–55; for proficient performers, 160; scenario generation, 196, 247; for TIMx development, 108 Schmidt, R. A., 263 Schmitt, J. F., 58 Schneider, W., 12 Science education, 52, 53, 54, 57 Scoring model, 306–7 Scott, P. H., 58 Second Life, 58, 247
Self-concept, 34 Self-efficacy, 34, 41, 91, 371 Sensemaking requirements for learning cognitive skills, 52–56, 61–62 Sensor based measures, 306 Sensory task analysis, 175–76, 177 SET (stress exposure training), 370–71 Shadrick, S. B., 153 Shanks, G., 212, 213 Shared mental models. See SMM Sherwood, R. D., 260 Shuell, T. J., 53 Shuffler, M., 381 Simmering, M. J., 368 Simon, H. A., 333 Simple sequences, 144 Simplification bias, 256 Sims, D. E., 371 Simulated team environments for measuring team cognition and performance, 347–59; assessing teams, 347–50; Coordinated Awareness of Situation by Teams (CAST), 356–57, 357f; coordination with real team members, 349; issues and challenges of measurement, 350–52; performance measurement, 350–51; Space of Possible Training Technologies Involving Teams, 349t; team cognition, described, 347, 348; Uninhabited Aerial Vehicle Synthetic Task Environment (UAV-STE) case study, 352–59, 356t Simulation-driven assessment, 247 Simulation programs, overview: analog computational programs, xiv; augmented cognition, xv; described, 139; digital computers, history of development, xiv; distributed interactive simulation, xiv; flight simulation, history of, xiv; fundamental components of, xiv; medical simulation, xv Singapore, 322–23 Situational awareness: in demanding environments, 26; Situational Awareness Rating Scale (SARS), 276;
Situational awareness rating technique (SART), 276; Situational Training Exercise lane training data, 324–25; Situation Awareness Global Assessment Technique (SAGAT), 276; team cognition and, 269–70, 273 Situation-Present Assessment Method (SPAM), 276 Sitzmann, T., 369–70 Skills, described, 169 Skin conductance (galvanic skin response), 93 Small group issue, 143–44 SMEs (subject matter experts), 77, 171–72, 175–76, 177, 194–95 Smith-Jentsch, K. A., 262 SMM (shared mental models), 269, 272, 276 Soar model, 140 Social identity/deindividuation theory, 379–80 Sohn, Y. W., 261 Spain, 384 SPAM (Situation-Present Assessment Method), 276 Spatial abilities, 34–37 Special Devices Desk at the Bureau of Aeronautics, xiv Specification process, 124–25 Spence, D. J., 41 Sperotable, L., 35 Spinrad, Robert, 115 Spiro, R. J., 56, 70–71, 255 S-shaped learning curve, 60 Stage model of cognitive performance, 67–74, 68t, 70t, 71t, 73t, 75t–76t Stagl, K. C., 371 Stakeholders: elicitation, 117; identifying, 118, 126; validation with, 117 Starbuck, W. H., 57 Static Descriptions, 139 Statistical learning analyses, 307–8 Stevens, Ron, 307–8 Storehouse metaphor, 51, 53, 54, 62 Stress, 26, 91, 93. See also Anxiety Stress exposure training (SET), 370–71
Strike, K. A., 52, 54, 61 Subjective measurement (trainee response), 243, 244 Subject matter experts (SMEs), 77, 171–72, 175–76, 177, 194–95 Surprisingness, 213 Surrogate experiences and fidelity, 74–77 Sutherland, L., 367 Synchronous communication, 382–83 Synthesis, described, 52 Systems analysts. See Creative systems analysts Systems requirements engineering process. See Requirements engineering Tabbers, H. K., 33–34 Tactical Decision Games, 58 Tactical expertise and the cognitive skill acquisition framework, 150–51t; advanced beginners, 151t, 156–57; competent performers, 157–58; experts, 162–63; implications for tactical thinking training in VEs, 153–54; novices, 154–56; overview, 149, 150t; principles for learning progression, 149–53; proficient performers, 158–62 Tactical Training Exercise Control Group at 29 Palms, California, 183 Tanaka, J. W., 14 Tannenbaum, S. I., 371 Task analysis (TA): assessment design, 305–6; described, 131–32, 138–39, 195; informed by neurophysiology, 7; link between physical and cognitive TA, 139; sensory task analysis, 175–76, 177; task decomposition method, 139 Tatsuoka’s rule-space methodology, 308 Taxonomies, 94, 98–99, 136 TDT method, 262 Teams. See Virtual teams TeamViz application, 321–23 Tellegen, A., 366 Temporal measurement guidelines for the study of virtual teams, 376–87;
assessments of workload intensity, 385; determining appropriate time intervals for measurement, 386; external deadlines, 385; frequency of measurements, 385–86; member diversity on time based individual differences, 377–81, 384; Summary of Temporal Measurement Guidelines for the Study of Virtual Teams, 378–79t; team temporal mind-set concerning past and present interaction, 381–82; temporal context, 384–85; temporal implications of technology use and task type, 382–83; temporal process mechanisms employed in virtual teams, 383–84; time perspective, described, 377 TER (training effectiveness ratio), 205 Testable hypotheses, 196, 200–201 Thalamus, 11 Theorist’s Pyramid, 2–3 Theorist’s Tetrahedron, 1–2 Theory, performance measures based on, 231 Three Mile Island, 132 Three-Spined Stickleback, 85f Time and accuracy metrics, 179 Time based features of distributed teams. See Temporal measurement guidelines for the study of virtual teams Time boxing, 116 Time-zone differences, 384 Timing, defined, 355 Timing of TA development, 134 TIM (team interaction model), 272 TIMx. See Training Intervention Matrix Tinbergen, N., 83 TIs (training interventions) taxonomies, 98–99, 101–4 TMM (team mental models), 272 TMS (team memory systems), 272–73 TMS (transactive memory system), 269, 272–73, 276 TNAs (training needs analyses), 118, 170–72 Toulmin, S., 54 Tracey, M. R., 35
Training: advanced beginners, 157; advanced training: training objectives master set, 256–59; assessments of objectives, 236, 238; to become a creative systems analyst (see Creative systems analysts); competent performers, 159; content driven by assessments, 240; described, xiii, 236; education goals versus, 8; evaluation, described, 364, 369–70; gaps, 172, 173–74t, 194; novices, 155; proficient performers, 161–62; scenario design components, 162–63; taxonomy of environment interactions, 94; training analysis (TA), 118; training effectiveness ratio (TER), 205; training interventions (TIs) taxonomies, 98–99, 101–4; training management component development, 179–82; training needs analyses (TNAs), 118, 170–72. See also Goal orientation (training) Training advanced skills in simulation based training, 74–78, 251–64; advanced training, described, 255; Cognitive Task Analysis (CTA), 77; complex training environments, 78, 252–55, 253t; developing/identifying scenarios and events, 259; developing performance measures, 261; fidelity, 77; five-stage model of complex cognitive skill acquisition, 67–74; performance diagnosis and feedback, 261–62; pretraining interventions, 259–60; principles for advanced training, 260–61; Scenario Based Training Framework, 254f; Skills and Knowledge Associated with Expertise, 258t; training advanced skills in simulation based training, 257t; training objectives, 256–59 Training Intervention Matrix (TIMx), 97–109; blended fidelity training solution, 109; caveats, 108–9; Common Distributed Mission Training Station (CDMTS), 101–4, 105–7t; described, 98–99; education versus training
debate, 109; future research needs for TIMx development, 104, 108–9; modifications to the HCIP Model, 100–101; Training Intervention Matrix (TIMx), 108f; Wei and Salvendy’s HCIP Model, 99–101; X-Axis: Learning Outcomes Taxonomy, 99; Y-Axis: Training Interventions Taxonomy, 101 Training systems requirements analysis, 165–90; case study: MOT2IVE system development effort, 182–87; design lifecycle, 166f, 167–68; design lifecycle: stage 1: training needs/goals identification, 170–72, 182; design lifecycle: stage 2: user-centered design specification, 172–77, 182–83; design lifecycle: stage 3: metrics development, 177–79, 184–85; design lifecycle: stage 4: training management component development, 179–82, 185–87; Example Types of Data from Task Analysis, 175t; human performance requirements analysis (HPRA), 167f, 169–70; human performance requirements analysis (HPRA), described, 165–66, 169, 187–88; human performance requirements analysis (HPRA), stage 1: training needs/goals identification, 170–72; human performance requirements analysis (HPRA), stage 2: user-centered design specification, 172–77; human performance requirements analysis (HPRA), stage 3: metrics development, 177–79; human performance requirements analysis (HPRA), stage 4: training management component development, 179–82; operational requirements analysis (ORA), 165–66, 168–69, 187–88; Performance Assessment and Diagnostic Tool Root Error Diagnostic Tree, 189f; U.S. Marine Corps Fire Support Teams examples, 171t, 173–74t
Triangulation for performance assessments, 232–33 TSA (team situation awareness), 270, 276 Tugade, M. M., 54 Tupes, E. C., 365 Typed communication, 317 Uninhabited Aerial Vehicle Synthetic Task Environment (UAV-STE), 352– 59, 356t University of South Australia, 213 Unlearning process, 56–60, 61–62, 152 U.S. Air Force, 245, 320–23, 322f U.S. Army War College, 320–23, 322f Uscinski, R., 87 U.S. Department of Defense, xiv, 236 U.S. Department of Homeland Security, 82 User-centered design specification, 172, 174–75, 182–83 User data elicitation techniques, 136 Usher, E. L., 41 U.S. Marine Corps (USMC): Fire Support Teams (FiST), 183t, 184t, 185t, 186t, 187t, 188f, 188t; importance of simulation, 251; MOT2IVE system development effort, 182–87; Tactical Decision Games, 58 Utility of an assessment, 245 Validity, 205–6; Baker’s model, 303–4, 304f; consequential, 246; construct, 246; content, 246, 303; criterion, 246, 303; described, xv; face, 246, 303; knowledge validation, 195; of measurements, 246; methods, 200– 201; predictive, 303; in simulated team environments performance measures, 351–52, 359; with stakeholders, 117; supporting ideas, 310; V&V (verification and validation), 125–26, 128–29 Value, as characteristic of RE, 213 Van Gerven, P. W. M., 33–34 Van Merrie¨nboer, J. J. G., 304 Variability during practice, 25–26 Ventral limbic circuit, 10–11, 10f, 24
Verbal protocol analysis, 332–46; caveats, 333–34; Certain Episode but Uncertain Utterances (Linguistic Markers of Uncertainty in Italics), 341t; Cohen’s kappa, 343–44; concurrent verbal protocols, 333; data collection preparation, 335–36; defined, 332; Example of Segmenting by Complete Thought, 338t; expert versus novice performers, 332–33; inter-rater reliability (IRR), 342–44; limiting disruptions, 337; Mismatch between Linguistic and Global Coding Schemes: Uncertain Episode but Certain Utterances, 340t, 341t; number of participants, 335–36; presenting research results, 344–45; processing data, 338–44; resources required for, 333–34; retrospective verbal protocols, 333; standard training procedure for participants, 337; tools for collecting data, 336–37, 339; Uninhabited Aerial Vehicle Synthetic Task Environment (UAV-STE) example, 357–58 Verbal skills. See Communication Verification and validation (V&V), 125–26, 128–29. See also Validity VEs. See Virtual environments Video games. See Game based training VIRTE (Virtual Technologies and Environments) Program, xiv–xv Virtual environments (VEs), overview: benefits, 131, 301; historical tradition, 132, 227–28; overview of, 300–301 Virtual teams: boundaries, 267; collective mind of, 386; conventional teams versus, 267–68; cultural diversity within, 380–81, 384; described, 267–68, 315, 347, 376–77; distributed nature, 273; face-to-face (FTF) interactions within, 377, 381–83, 385–86; habits and abilities of members, 381; lifecycles, 267, 274; organizational objectives, 285–91; performance measures of, 232 (see also Automated performance assessment of teams); role conflict, 267; roles occupied within,
273; round-the-clock operations, 384; selection, training and workplace design, 143; staff reductions, 143; task interdependency, 274–75; taskwork, 315; team interaction model (TIM), 272; team learning, Big Five components, 371; team memory systems (TMS), 272–73; team mental models (TMM), 272; team situation awareness (TSA), 270, 276; temporal distribution, 268; temporary versus ongoing nature of, 381–82; time-zone differences, 384; unlearning process, 57 Virtual teams, measures of team cognition within, 266–80; assessing teams, 347–50; Coordinated Awareness of Situation by Teams (CAST), 356–57, 357f; coordination with real team members, 349; described, 268–69, 347, 348; elicitation methods, 275–79; issues and challenges of, 350–52; performance measurement, 350–51; properties of team performance measurement systems, 270–71, 271f; shared mental models (SMM), 269, 272, 276; simulated environments for measuring team cognition and performance, 347–59; source measures, 275; Space of Possible Training Technologies Involving Teams, 349t; team mental models (TMM), 272; team situation awareness (TSA), 269–70, 273, 276; transactive memory system (TMS), 269, 272–73, 276; UAV-STE case study, 352–59, 356t Virtual Technologies and Environments (VIRTE) Program, xiv–xv Visual cortex (N170), 14–16, 15f, 20–22, 20f, 21f, 24, 25 Visual support, 37 VRISE (virtual reality induced symptoms and effects), 243 Vye, N. J., 260 Vygotsky, L. S., 214–15, 240 Waller, D., 35, 54
Waterfall systems development lifecycle, 116 Watkins, J., 367 Watson, D., 366 Watt, James, 82 Wegner, D. M., 269 Wei, J., 99–101 Weigman, D. A., 99 Weil, M., 261 Weiss, P. L., 37–38 West, L. J., 380 “What-iffing” strategy, 260 Whiteman, J., 367, 368–69 Whitton, M., 54
Wiener, Norbert, 82 Williamson, D. M., 301, 305 Witmer, B. G., 54 Women, 36–37 Woods, D. D., 56, 252, 253t Work safety analysis, 140–41 Yeh, Y. C., 37 Yerkes-Dodson law, 91 Yoo, Y., 386 Youngcourt, S. S., 368, 369 Zimmerman, R. D., 369–70 Zone of proximal development, 240
ABOUT THE EDITORS AND CONTRIBUTORS

THE EDITORS

DYLAN SCHMORROW, Ph.D., is an international leader in advancing virtual environment science and technology for training and education applications. He has received both the Human Factors and Ergonomics Society Leland S. Kollmorgen Spirit of Innovation Award for his contributions to the field of Augmented Cognition, and the Society of United States Naval Flight Surgeons Sonny Carter Memorial Award in recognition of his career improving the health, safety, and welfare of military operational forces. Schmorrow is a Commander in the U.S. Navy and has served at the Office of the Secretary of Defense, the Office of Naval Research, the Defense Advanced Research Projects Agency, the Naval Research Laboratory, the Naval Air Systems Command, and the Naval Postgraduate School. He is the only naval officer to have received the Navy’s Top Scientist and Engineers Award.

JOSEPH COHN, Ph.D., is a Lieutenant Commander in the U.S. Navy, a full member of the Human Factors and Ergonomics Society, the American Psychological Association, and the Aerospace Medical Association. Selected as the Potomac Institute for Policy Studies’ 2006 Lewis and Clark Fellow, Cohn has more than 60 publications in scientific journals, edited books, and conference proceedings and has given numerous invited lectures and presentations.

DENISE NICHOLSON, Ph.D., is Director of Applied Cognition and Training in the Immersive Virtual Environments Laboratory at the University of Central Florida’s Institute for Simulation and Training. She holds joint appointments in UCF’s Modeling and Simulation Graduate Program, Industrial Engineering and Management Department, and the College of Optics and Photonics. In recognition of her contributions to the field of Virtual Environments, Nicholson received the Innovation Award in Science and Technology from the Naval Air Warfare Center and has served as an appointed member of the international NATO Panel on “Advances of Virtual Environments for Human Systems Interaction.” She joined UCF in 2005, with more than 18 years of government experience ranging
from bench level research at the Air Force Research Lab to leadership as Deputy Director for Science and Technology at NAVAIR Training Systems Division.

THE CONTRIBUTORS

G. VINCENT AMICO, Ph.D., is one of the pioneers of simulation—with over 50 years of involvement in the industry. He is one of the principal agents behind the growth of the simulation industry, both in Central Florida and nationwide. He began his simulation career in 1948 as a project engineer in the flight trainers branch of the Special Devices Center, a facility now known as NAVAIR Orlando. During this time, he made significant contributions to simulation science. He was one of the first to use commercial digital computers for simulation, and in 1966, he chaired the first I/ITSEC Conference, the now well-established annual simulation, training, and education meeting. By the time he retired in 1981, he had held both the Director of Engineering and the Director of Research positions within NAVAIR Orlando. Amico has been the recipient of many professional honors, including the I/ITSEC Lifetime Achievement Award, the Society for Computer Simulation Presidential Award, and an honorary Ph.D. in Modeling and Simulation from the University of Central Florida. The NCS created “The Vince Amico Scholarship” for deserving high school seniors interested in pursuing study in simulation, and in 2001, in recognition of his unselfish commitment to simulation technology and training, Orlando mayor Glenda Hood designated December 12, 2001, as “Vince Amico Day.”

RANDOLPH ASTWOOD is a Research Psychologist at NAWCTSD Orlando and is currently a doctoral candidate in the Industrial/Organizational Psychology program at the University of Central Florida. In addition, he holds an M.S. degree in Industrial/Organizational Psychology from the University of Central Florida. His primary research interests include training and teams.

EVA L. BAKER is Director at CRESST and a UCLA Distinguished Professor. Her research has focused on the design and validity of assessment models that integrate research from learning and psychometrics, exploring effectiveness of technology for assessment and instruction. Her interests traverse subject matter fields, learner ages, and education and training purposes.

HOLLY C. BAXTER, Ph.D., Co-Founder and Chief Scientist of Strategic Knowledge Solutions, has spent the past decade specializing in cognitively based instructional design, assessment metrics, and training in both military and commercial environments. She has published numerous articles in the field and has been an invited speaker at multiple conferences and events.

WILLIAM BECKER, Ph.D., is a member of the research faculty in the MOVES Institute at the Naval Postgraduate School. His specialty is the development of hardware and software to support advanced training for military personnel. He is currently working with the U.S. Marine Corps.
WENDY L. BEDWELL is a doctoral student in the University of Central Florida Industrial/Organizational Psychology program. Her research interests include motivation, technology, training, and distributed teams. She earned a B.A. in Psychology from James Madison University and a master’s degree in Distance Education from the University of Maryland, University College.

WILLIAM L. BEWLEY, Ph.D., is Assistant Director at CRESST. His research focuses on applications of advanced technology to instruction and assessment of performance on complex tasks. He is an experimental psychologist with a background in education and training, computer science, software development, program management, and product management.

ELIZABETH BIDDLE, Ph.D., is the Instructional Systems Site Lead at Boeing Training Systems & Services in Orlando, Florida. She has led human performance and training research and development activities in the area of human performance, adaptive learning, and training simulation. She is currently leading new business initiatives for advanced training capabilities.

DEBORAH BOEHM-DAVIS, Ph.D., is currently Professor and Chair of Psychology at George Mason University and has worked previously at General Electric, NASA Ames, and Bell Laboratories. She has been president and secretary-treasurer of the Human Factors and Ergonomics Society and president of the Applied Experimental and Engineering Psychology Division of the American Psychological Association.

CLINT BOWERS is a Professor of Psychology and Digital Media at the University of Central Florida. His research interests include the use of technology for individual and team learning.

C. SHAWN BURKE is a Research Scientist at the Institute for Simulation and Training, University of Central Florida. She is currently investigating team adaptability, multicultural team performance, multiteam systems, leadership, measurement, and training of such teams. Dr. Burke received her doctorate in Industrial/Organizational Psychology from George Mason University in 2000.

GWENDOLYN CAMPBELL is a Senior Research Psychologist at NAWCTSD. She holds an M.S. and a Ph.D. in Experimental Psychology from the University of South Florida and a B.A. in Mathematics from Youngstown State University. Her research interests include human performance modeling and a cognitively based science of instruction.

JAN CANNON-BOWERS is a Senior Research Associate at UCF’s Institute for Simulation and Training and Director for Simulation Initiatives at the College of Medicine. Her research interests include the application of technology to the learning process. In particular, she has been active in developing synthetic learning environments for a variety of task environments.
MEREDITH BELL CARROLL is a senior research associate at Design Interactive, Inc. She is currently a doctoral candidate in Human Factors and Applied Experimental Psychology at the University of Central Florida. Her research interests include human/team performance and training in complex systems with focuses on performance measurement and virtual training technology.

GREGORY K. W. K. CHUNG, Ph.D., is a senior research associate at CRESST. He has experience embedding advanced computational tools in computer based assessments to measure problem solving and content knowledge in K-16 and military domains. Dr. Chung earned a Ph.D. in Educational Psychology, an M.S. degree in Educational Computing, and a B.S. degree in Electrical Engineering.

JOSEPH COHN received his Ph.D. in Neuroscience from Brandeis University’s Ashton Graybiel Spatial Orientation Laboratory and continued his postdoctoral studies with Dr. J. A. Scott Kelso. His research interests focus on maintaining human performance/human effectiveness in real world environments by optimizing the symbiosis of humans and machines.

NANCY COOKE received her Ph.D. in Cognitive Psychology from New Mexico State University in 1987. She is a Professor of Applied Psychology at Arizona State University Polytechnic and Science Director of the Cognitive Engineering Research Institute in Mesa, Arizona. She is currently Editor-in-Chief of Human Factors. Her research focuses on team cognition.

JESSICA CORNEJO received her Ph.D. in Industrial & Organizational Psychology from the University of Central Florida in 2007. She is currently an Organizational Development Project Leader at CVS Caremark. Her professional interests lie in the areas of selection, organizational development, and diversity.

JACOB CYBULSKI is Associate Professor in the School of Information Systems at Deakin University. His research includes IS theory, methodology and strategy, with a focus on business/IT alignment. His projects range from engineering and telecommunications to business applications and recently also e-commerce and Web systems, educational video, and e-simulation.

JOSEPH DALTON is currently a graduate student in the Industrial Engineering and Management Systems program at the University of Central Florida. He received his B.S. in Liberal Studies focusing in the areas of Mathematics, Computer Science, and Psychology. His research interests include human performance and applied usability.

SUSAN EITELMAN DEAN is a Senior Scientist at Applied Research Associates. Her work focuses on the development of intelligent agents, including instructor support and role-player simulations for military training. Susan holds an M.S. in Industrial Engineering from the University of Central Florida.
GIRLIE C. DELACRUZ is a Research Associate at CRESST. Her current work involves researching the use of technology, simulations, and games to improve assessment and learning in both military and educational contexts. Ms. Delacruz is currently a doctoral student in Psychological Studies in Education at UCLA.

DEBORAH DIAZGRANADOS is a doctoral candidate in the Industrial/Organizational Psychology program at the University of Central Florida. Ms. DiazGranados received a B.S. in Psychology and Management from the University of Houston, and her M.S. is in Industrial/Organizational Psychology from the University of Central Florida.

DAVID DORSEY is employed by the National Security Agency. Dr. Dorsey holds a Ph.D. in Industrial-Organizational Psychology and a graduate minor in Computer Science from the University of South Florida. His professional interests include performance measurement, testing and assessment, training and training technologies, and computational modeling.

JAMES DRISKELL is president of Florida Maxima Corporation and adjunct professor of psychology at Rollins College. At Florida Maxima, he has conducted research on training, selection, and performance under stress for the U.S. Army, the U.S. Navy, the U.S. Air Force, NASA, the FAA, the National Science Foundation, the Department of Homeland Security, and others.

JASMINE DURAN is a first year Applied Psychology graduate student at Arizona State University Polytechnic and is employed by the Cognitive Engineering Research Institute as a research assistant. She received a B.S. in Psychology from Arizona State University in 2005. Miss Duran’s research interests include team performance related to coordination, communication, and decision making.

PETER FOLTZ is founder and Vice President for Research at Pearson Knowledge Technologies and Senior Research Associate at the University of Colorado, Institute of Cognitive Science. His work focuses on cognitive science approaches to measuring individual and team knowledge. Peter has served as principal investigator for research for the U.S. Army, the U.S. Air Force, the U.S. Navy, DARPA, and NSF.

JENNIFER FOWLKES, Ph.D., is a Managing Cognitive Engineer at CHI Systems, Inc. She enjoys working with other scientists to imbue scenario based training systems with sound training infrastructures that link training to real world needs, learning science, and emerging training technologies.

JARED FREEMAN, Ph.D., is Sr. Vice President for Research at Aptima. His research and development efforts focus on performance and communications measurement, training, and the design of organizational structures and processes. Dr. Freeman is the author of more than 80 articles and book chapters concerning these and related topics.
JAMIE GORMAN received his Ph.D. in Psychology from New Mexico State University in 2006. Dr. Gorman is currently a postdoctoral researcher at the Cognitive Engineering Research Institute and Arizona State University Polytechnic in Mesa, Arizona. His research includes dynamical systems theory of team coordination, team collaboration, and communications research.

MELISSA M. HARRELL is a doctoral student in Industrial/Organizational Psychology at the University of Central Florida. Her primary research interest is performance measurement and improvement. She earned a B.S. in Psychology from the University of Florida and her M.S. in Industrial/Organizational Psychology from the University of Central Florida.

GARY KLEIN, Ph.D., helped found the field of naturalistic decision making. He has written Sources of Power: How People Make Decisions (1998, MIT Press), The Power of Intuition (2004, A Currency Book/Doubleday), and Working Minds: A Practitioner's Guide to Cognitive Task Analysis (Crandall, Klein, and Hoffman, 2006, MIT Press).

NOELLE LAVOIE is a founder of Parallel Consulting, LLC, where she is the lead Cognitive Psychologist, and a former Senior Member of Technical Staff at Pearson Knowledge Technologies. Her work includes studying online collaborative learning, visualization tools to support multinational collaboration, tacit knowledge based assessment, and development of military leadership.

HEATHER LUM is a doctoral student pursuing a degree in Applied Experimental & Human Factors Psychology from the University of Central Florida. She is also a graduate research fellow at the UCF Institute for Simulation & Training. Her research currently includes psychophysiological assessment of team processes and cognition.

PHAN LUU, Ph.D., is Chief Technology Officer and Scientist at EGI. His research interests include learning and memory, affect, personality, and neural mechanisms of self-regulation.

LINDA MALONE, Ph.D., is a Professor in Industrial Engineering. She is the coauthor of a statistics text and has authored or coauthored over 75 refereed papers. She has been an associate editor of several journals. She is a Fellow of the American Statistical Association.

TOM MAYFIELD has been a Human Factors engineer for 35 years with experience in nuclear, military, and marine ergonomics. At Rolls-Royce plc, he was responsible for introducing virtual reality as an HF tool in the design for operability of nuclear submarine control systems. He is a FErgS and full member of the HFES, a past President of the Potomac Chapter of the HFES, and was an Adjunct Associate Professor at GMU.
DENNIS MCBRIDE, Ph.D., MPA, is Interim Director of the Center for Neurotechnology Studies and President of the Potomac Institute for Policy Studies. His Ph.D. in experimental psychology from the University of Georgia and his postdoctoral master's degree in systems from the University of Southern California focused on mathematical learning theory and cybernetics.

LAURA MILHAM received her doctorate from the Applied Experimental and Human Factors Psychology program at the University of Central Florida. At Design Interactive, she is the Training Systems Director and Principal Investigator of numerous projects in support of the development and assessment of the effectiveness of training systems and training management systems.

SUSAN MOHAMMED is an associate professor of Industrial and Organizational Psychology at The Pennsylvania State University. She received her Ph.D. from The Ohio State University. Her research focuses on teams and decision making, with a special emphasis on team mental models, team composition/diversity, and the role of time in team research.

RAZIA NAYEEM, Ph.D., is a Research Psychologist with SA Technologies. Her research interests include development of training requirements using knowledge elicitation techniques and the study of stress effects. Razia's recent work includes examining the effects of stress on the performance of combat medic tasks.

KELLY NEVILLE, Ph.D., is an Associate Professor of Human Factors and Systems at Embry-Riddle Aeronautical University and a cognitive engineer with CHI Systems, Inc. She is interested in the ways humans interact with each other and with technology in work and training contexts, and in influences that shape these interactions.

LEMAI NGUYEN, Ph.D., is a Senior Lecturer at the School of Information Systems, Deakin University. Her primary contributions are to the study of creativity and problem solving activities in Requirements Engineering. Other areas of her research contributions include health care informatics, sociotechnical aspects of virtual communities, online learning, and information systems in general.

ROB OBERBRECKLING has worked with Pearson as a Senior Member of Technical Staff and currently performs research and engineering in natural language processing, cognitive science, and machine learning systems.

ORLANDO OLIVARES, Ph.D., is a Senior Industrial-Organizational Psychologist at Aptima. He provides a full range of services related to enhancing organizational effectiveness, namely, organizational assessment, problem identification and resolution, personnel selection, and the development of performance and business metrics and incentive systems. Dr. Olivares holds a Ph.D. in Industrial/Organizational Psychology from Texas A&M.
JENNIFER PHILLIPS is the President and Senior Scientist of Cognitive Training Solutions, LLC. She has over 12 years of experience conducting research and developing applications in the area of human cognition and naturalistic decision making. Her research interests include skill acquisition, cognitive performance improvement, and the nature of expertise.

CATHERINE POULSEN, Ph.D., is a Scientist at EGI, where she conducts research using cognitive experimental and dense-array EEG methods. Her primary line of research examines the neural dynamics underlying the adaptive control of learning and performance, and its modulation by goals, incentives, and feedback.

ROBERT D. PRITCHARD is Professor of Psychology and Management at the University of Central Florida. He received the Distinguished Scientific Contribution Award from the Society for Industrial and Organizational Psychology and is a Fellow in the Society for Industrial and Organizational Psychology, the American Psychological Association, and the American Psychological Society.

MICHAEL A. ROSEN is a doctoral candidate at the University of Central Florida and a graduate researcher at the Institute for Simulation and Training. He has co-authored over 60 peer reviewed journal articles, book chapters, and conference papers related to teams, decision making and problem solving, performance measurement, and simulation.

MARK ROSENSTEIN is a Senior Member of Technical Staff at Pearson applying machine learning and natural language processing techniques to problems involving understanding and assessing language and the activities connected with the use of language.

KAROL ROSS, Ph.D., is a research psychologist at the Institute for Simulation & Training at the University of Central Florida. She is also the Chief Scientist for the Cognitive Performance Group, a small business in Orlando, Florida. She has over 20 years of experience in military training research.

STEVEN RUSSELL is a Research Scientist at Personnel Decisions Research Institutes in Arlington, Virginia. He holds a Ph.D. in Industrial-Organizational Psychology from Bowling Green State University. His professional interests include the design and evaluation of training programs, criterion measurement, and test development and validation, including item response theory techniques.

EDUARDO SALAS is Trustee Chair, Pegasus Professor, and Professor of Psychology at the University of Central Florida and Program Director for the Human Systems Integration Research Department at the Institute for Simulation and Training. His expertise includes teamwork, designing/implementing team training strategies, training effectiveness, and developing performance measurement tools.
SHANNON SCIELZO is an Assistant Professor at the University of Texas–Arlington. She received her Ph.D. in Industrial/Organizational Psychology from the University of Central Florida in 2008, where she worked as a graduate fellow for the Multidisciplinary University Research Initiative and a graduate research assistant in the Team Training and Workforce Development Laboratory.

KIMBERLY SMITH-JENTSCH, Ph.D., has been a member of the psychology department at the University of Central Florida since 2003. Her research on teams, training, performance assessment, and mentoring is published in journals such as the Journal of Applied Psychology, Personnel Psychology, Journal of Organizational Behavior, and Journal of Vocational Behavior.

WEBB STACY, Ph.D., is Vice President for Technology at Aptima. Dr. Stacy is responsible for Aptima's current and future strategic technology portfolio and is involved in creating intelligent systems for modeling and assessing human performance. Dr. Stacy received a Ph.D. in Cognitive Science from SUNY/Buffalo and a B.A. in Psychology from the University of Michigan.

KAY STANNEY is President of Design Interactive, Inc. She received her Ph.D. in Industrial Engineering from Purdue University, after which she spent 15 years as a professor at the University of Central Florida. She has over 15 years of experience in the design, development, and evaluation of human-interactive systems.

J. GREGORY TRAFTON, Ph.D., is a cognitive scientist at the Naval Research Laboratory. He is interested in building and applying theories in complex, real world situations, including meteorology, scientific visualization, interruptions and resumptions, and robotics. He has collected data in both naturalistic settings and laboratory environments to build computational and mathematical theories.

SUSAN TRICKETT is a cognitive scientist in Denver, Colorado. She is interested in spatial cognition, the use of simple and complex visualizations, learning, and the acquisition of expertise, particularly in complex, real world domains.

DON TUCKER, Ph.D., is CEO and Chief Scientist at EGI, Professor of Psychology and Associate Director of the NeuroInformatics Institute at the University of Oregon. His research examines self-regulatory mechanisms of the human brain. These include motivational and emotional control of cognition, as well as neurophysiological control of arousal, sleep, and seizures.

WENDI VAN BUSKIRK is a Research Psychologist at NAWCTSD. She is currently a doctoral candidate in the Applied Experimental and Human Factors Psychology Ph.D. program at the University of Central Florida. Her research interests include human cognition and performance, instructional strategies, human performance modeling, and human-computer interaction.
JENNIFER VOGEL-WALCUTT, Ph.D., is a researcher at UCF's Institute for Simulation and Training. She leads the Learning Initiatives team within the ACTIVE Lab, focusing on learning efficiency in complex environments. She is currently applying this research to military domains by creating strategies and methodologies for more efficient training.

SALLIE J. WEAVER is a doctoral student in the Industrial and Organizational Psychology program at the University of Central Florida. She earned her M.S. in Industrial/Organizational Psychology from UCF. Her research interests include individual and team training, performance measurement, motivation, and simulation, with an emphasis in health care.

YANG ZHANG obtained a Bachelor of Arts degree from Grinnell College and her Master of Science degree at Penn State. She is currently pursuing her Ph.D. in Industrial/Organizational Psychology at Penn State. Her research interests include teams and cross-cultural issues in organizations. Her dissertation focuses on temporal issues in teams.
The PSI Handbook of Virtual Environments for Training and Education
DEVELOPMENTS FOR THE MILITARY AND BEYOND
Volume 2: VE Components and Training Technologies
Edited by Denise Nicholson, Dylan Schmorrow, and Joseph Cohn
Technology, Psychology, and Health
PRAEGER SECURITY INTERNATIONAL
Westport, Connecticut • London
To our families, and to the men and women who have dedicated their lives to educate, train, and defend to keep them safe
CONTENTS
Series Foreword xi
Preface by G. Vincent Amico xiii
Acknowledgments xvii

SECTION 1: VIRTUAL ENVIRONMENT COMPONENT TECHNOLOGIES
Section Perspective (Mary Whitton and R. Bowen Loftin) 1
Appendix A: Modeling and Rendering (Mary Whitton and Jeremy Wendt) 15
Appendix B: Speech and Language Systems: Recognition, Understanding, and Synthesis (Ramy Sadek) 21

Part I: Subsystem Components 23
Chapter 1: Tracking for Training in Virtual Environments: Estimating the Pose of People and Devices for Simulation and Assessment (Greg Welch and Larry Davis) 23
Chapter 2: Visual Displays: Head-Mounted Displays (Mark Bolas and Ian McDowall) 48
Chapter 3: Projector Based Displays (Herman Towles, Tyler Johnson, and Henry Fuchs) 63
Chapter 4: Audio (Ramy Sadek) 90
Chapter 5: Multimodal Display Systems: Haptic, Olfactory, Gustatory, and Vestibular (Çağatay Başdoğan and R. Bowen Loftin) 116
Chapter 6: Mixed and Augmented Reality for Training (Steven Henderson and Steven Feiner) 135

Part II: Topics for Component Integration 157
Chapter 7: Designing User Interfaces for Training Dismounted Infantry (James Templeman, Linda Sibert, Robert Page, and Patricia Denbrook) 157
Chapter 8: Rendering and Computing Requirements (Perry McDowell, Michael Guerrero, Danny McCue, and Brad Hollister) 173
Chapter 9: Behavior Generation in Semi-Automated Forces (Mikel Petty) 189
Chapter 10: Games and Gaming Technology for Training (Perry McDowell) 205
Chapter 11: Virtual Environment Sickness and Implications for Training (Julie Drexler, Robert Kennedy, and Linda Malone) 219
Chapter 12: Evaluating Virtual Environment Component Technologies (Mary Whitton and Fred Brooks) 240

SECTION 2: TRAINING SUPPORT TECHNOLOGIES
Section Perspective (Jan Cannon-Bowers and Clint Bowers) 263
Chapter 13: Guidelines for Using Simulations to Train Higher Level Cognitive and Teamwork Skills (Renée Stout, Clint Bowers, and Denise Nicholson) 270

Part III: Training Management 297
Chapter 14: After Action Review in Simulation Based Training (Don Lampton, Glenn Martin, Larry Meliza, and Stephen Goldberg) 297
Chapter 15: Interfacing Interactive 3-D Simulations with Learning Systems (Curtis Conkey and Brent Smith) 311
Chapter 16: Enhancing Situation Awareness Training in Virtual Reality through Measurement and Feedback (Jennifer Riley, David Kaber, Mohamed Sheik-Nainar, and Mica Endsley) 326
Chapter 17: Assessing Cognitive Workload in Virtual Environments (Brad Cain and Joe Armstrong) 348

Part IV: Training Paradigms 363
Chapter 18: Knowledge Elicitation: The FLEX Approach (Scott Shadrick and James Lussier) 363
Chapter 19: Story Based Learning Environments (Andrew Gordon) 378
Chapter 20: Intelligent Tutoring and Pedagogical Experience Manipulation in Virtual Learning Environments (H. Chad Lane and Lewis Johnson) 393
Chapter 21: Enhancing Virtual Environments to Support Training (Mike Singer and Amanda Howey) 407

Acronyms 423
Index 427
About the Editors and Contributors 451
SERIES FOREWORD
LAUNCHING THE TECHNOLOGY, PSYCHOLOGY, AND HEALTH DEVELOPMENT SERIES

The escalating complexity and operational tempo of the twenty-first century require that people in all walks of life acquire ever-increasing knowledge, skills, and abilities. Training and education strategies are dynamically changing toward delivery of more effective instruction and practice, wherever and whenever needed. In the last decade, the Department of Defense has made significant investments to advance the science and technology of virtual environments to meet this need. Throughout this time we have been privileged to collaborate with some of the brightest minds in science and technology. The intention of this three-volume handbook is to provide comprehensive coverage of the emerging theories, technologies, and integrated demonstrations of the state of the art in virtual environments for training and education.

As Dr. G. Vincent Amico states in the Preface, an important lesson to draw from the history of modeling and simulation is the importance of process. The human systems engineering process requires highly multidisciplinary teams to integrate diverse disciplines from psychology, education, engineering, and computer science (see Nicholson and Lackey, Volume 3, Section 1, Chapter 1). This process drives the organization of the handbook.

While other texts on virtual environments (VEs) focus heavily on technology, we have dedicated the first volume to a thorough investigation of learning theories, requirements definition, and performance measurement. The second volume provides the latest information on a range of virtual environment component technologies and a distinctive section on training support technologies. In the third volume, an extensive collection of integrated systems is discussed as virtual environment use-cases, along with a section of training effectiveness evaluation methods and results. Volume 3, Section 3 highlights future applications of this evolving technology that span cognitive rehabilitation to the next generation of museum exhibitions. Finally, a glimpse into the potential future of VEs is provided as an original short story entitled "Into the Uncanny Valley" from Judith Singer and Hollywood director Alex Singer.
Through our research we have experienced rapid technological and scientific advancements, coinciding with a dramatic convergence of research achievements representing contributions from numerous fields, including neuroscience, cognitive psychology and engineering, biomedical engineering, computer science, and systems engineering. Historically, psychology and technology development were independent research areas practiced by scientists and engineers primarily trained in one of these disciplines. In recent years, however, individuals in these disciplines, such as the close to 200 authors of this handbook, have found themselves increasingly working within a unified framework that completely blurs the lines of these discrete research areas, creating an almost "metadisciplinary" (as opposed to multidisciplinary) form of science and technology. The strength of the confluence of these two disciplines lies in the complementary research and development approaches being employed and the interdependence that is required to achieve useful technological applications.

Consequently, with this handbook we begin a new Praeger Security International Book Series entitled Technology, Psychology, and Health, intended to capture the remarkable advances that will be achieved through the continued seamless integration of these disciplines, where unified and simultaneously executed approaches of psychology, engineering, and practice will result in more effective science and technology applications. Therefore, the esteemed contributors to the Technology, Psychology, and Health Development Series strive to capture such advancements and effectively convey both the practical and theoretical elements of the technological innovations they describe.

The Technology, Psychology, and Health Development Series will continue to address the general themes of requisite foundational knowledge, emergent scientific discoveries, and practical lessons learned, as well as cross-discipline standards, methodologies, metrics, techniques, practices, and visionary perspectives and developments. The series plans to showcase substantial advances in research and development methods and their resulting technologies and applications. Cross-disciplinary teams will provide detailed reports of their experiences applying technologies in diverse areas—from basic academic research to industrial and military fielded operational and training systems to everyday computing and entertainment devices.

A thorough and comprehensive consolidation and dissemination of psychology and technology development efforts is no longer a noble academic goal—it is a twenty-first century necessity dictated by the desire to ensure that our global economy and society realize their full scientific and technological potentials. Accordingly, this ongoing book series is intended to be an essential resource for a large international audience of professionals in industry, government, and academia. We encourage future authors to contact us for more information or to submit a prospectus idea.

Dylan Schmorrow and Denise Nicholson
Technology, Psychology, and Health Development Series Editors
[email protected]
PREFACE

G. Vincent Amico

It is indeed an honor and pleasure to write the preface to this valuable collection of articles on simulation for education and training. The fields of modeling and simulation are playing an increasingly important role in society. You will note that the collection is titled virtual environments for training and education. I believe it is important to recognize the distinction between those two terms. Education is oriented to providing fundamental scientific and technical skills; these skills lay the groundwork for training. Simulations for training are designed to help operators of systems effectively learn how to operate those systems under a variety of conditions, both normal and emergency situations. Cognitive, psychomotor, and affective behaviors must all be addressed. Hence, psychologists play a dominant role within multidisciplinary teams of engineers and computer scientists for determining the effective use of simulation for training. Of course, the U.S. Department of Defense's Human Systems Research Agencies, that is, Office of the Secretary of Defense, Office of Naval Research, Air Force Research Lab, Army Research Laboratory, and Army Research Institute, also play a primary role—their budgets support many of the research activities in this important field.

Volume 1, Section 1 in this set addresses many of the foundational learning issues associated with the use of simulation for education and training. These chapters will certainly interest psychologists, but are also written so that technologists and other practitioners can glean some insight into the important science surrounding learning. Throughout the set, training technologies are explored in more detail. In particular, Volume 2, Sections 1 and 2 include several diverse chapters demonstrating how learning theory can be effectively applied to simulation for training.

The use of simulation for training goes back to the beginning of time. As early as 2500 B.C., ancient Egyptians used figurines to simulate warring factions. The precursors of modern robotic simulations can be traced back to ancient China, from which we have documented reports (circa 200 B.C.) of artisans constructing mechanical automata, elaborate mechanical simulations of people or animals. These ancient "robots" included life-size mechanical humanoids, reportedly capable of movement and speech (Kurzweil, 1990; Needham, 1986). In those
early days, these mechanical devices were used to train soldiers in various phases of combat, and military tacticians used war games to develop strategies.

Simulation technology as we know it today became viable only in the early twentieth century. Probably the most significant event was Ed Link's development of the Link Trainer (aka the "Blue Box") for pilot training. He applied for its patent in 1929. Yet simulation did not play a major role in training until the start of World War II (in 1941), when Navy captain Luis de Florez established the Special Devices Desk at the Bureau of Aeronautics. His organization expanded significantly in the next few years as the value of simulation for training became recognized. Captain de Florez is also credited with the development of the first flight simulation that was driven by an analog computer. Developed in 1943, his simulator, called the operational flight trainer, modeled the PBM-3 aircraft.

In the period after World War II, simulators and simulation science grew exponentially based upon the very successful programs initiated during the war. There are two fundamental components of any modern simulation system. One is a sound mathematical understanding of the object to be simulated. The other is the real time implementation of those models in computational systems. In the late 1940s the primary computational systems were analog. Digital computers were very expensive, very slow, and could not solve equations in real time. It was not until the late 1950s and early 1960s that digital computation became viable. For instance, the first navy simulator to use a commercial digital computer was the Attack Center Trainer at the FBM Facility (New London, Connecticut) in 1959. Thus, it has been only for the past 50 years that simulation has made major advancements. Even today, it is typical that user requirements for capability exceed the ability of available technology. There are many areas where this is particularly true, including rapid creation of visual simulation from actual terrain environment databases and human behavior representations spanning cognition to social networks. The dramatic increases in digital computer speed and capacity have significantly closed the gap. But there are still requirements that cannot be met; these gaps define the next generation of science and technology research questions.

In the past decade or so, a number of major simulation initiatives have developed, including distributed interactive simulation, advanced medical simulation, and augmented cognition supported simulation. Distributed simulation enables many different units to participate in a joint exercise, regardless of where the units are located. The requirements for individual simulations to engage in such exercises are mandated by Department of Defense standards, that is, high level architecture and distributed interactive simulation. An excellent example of the capabilities that have resulted is the unprecedented number of virtual environment simulations that have transitioned from the Office of Naval Research's Virtual Technologies and Environments (VIRTE) Program to actual military training applications discussed throughout this handbook. The second area of major growth is the field of medical simulation. The development of the human
patient simulator clearly heralded this next phase of medical simulation based training, and the field of medical simulation will certainly expand during the next decade. Finally, the other exciting development in recent years is the exploration of augmented cognition, which may eventually enable system users to completely forgo standard computer interfaces and work seamlessly with their equipment through the utilization of neurophysiological sensing.

Now let us address some of the issues that occur during the development process of a simulator. The need for simulation usually begins when a customer experiences problems training operators in the use of certain equipment or procedures; this is particularly true in the military. The need must then be formalized into a requirements document, and naturally, the search for associated funding and development of a budget ensues. The requirements document must then be converted into a specification or a work statement. That then leads to an acquisition process, resulting in a contract. The contractor must then convert that specification into a hardware and software design. This process takes time and is subject to numerous changes in interpretation and direction. The proof of the pudding comes when the final product is evaluated to determine if the simulation meets the customer's needs.

One of the most critical aspects of any modeling and simulation project is to determine its effectiveness and whether it meets the original objectives. This may appear to be a rather straightforward task, but it is actually very complex. First, it is extremely important that checks are conducted at various stages of the development process. During the conceptual stages of a project, formal reviews are normally conducted to ensure that the requirements are properly stated; those same reviews are also conducted at the completion of the work statement or specification. During the actual development process, periodic reviews should be conducted at key stages. When the project is completed, tests should be conducted to determine if the simulation meets the design objectives and stated requirements. The final phase of testing is validation. The purpose of validation is to determine if the simulation meets the customer's needs.

Why is this process of testing so important? The entire development process is lengthy, and during that process there is a very high probability that changes will be induced. The only way to manage the overall process is by performing careful inspections at each major phase of the project. As the organization and content of this handbook make evident, this process has been the fundamental framework for conducting most of today's leading research and development initiatives. Following section to section, the reader is guided through the requirements, development, and evaluation cycle. The reader is then challenged to imagine the state of the possible in the final, Future Directions, section.

In summary, one can see that the future of simulation to support education and training is beyond our comprehension. That does not mean that care must not be taken in the development process. The key issues that must be addressed were cited earlier. There is one fact that one must keep in mind: No simulation is perfect. But through care, keeping the simulation objectives in line with the
capabilities of modeling and implementation, success can be achieved. This is demonstrated by the number of simulations that are being used today in innovative settings to improve training for a wide range of applications.

REFERENCES

Kurzweil, R. (1990). The age of intelligent machines. Cambridge, MA: MIT Press.

Needham, J. (1986). Science and civilization in China: Volume 2. Cambridge, United Kingdom: Cambridge University Press.
ACKNOWLEDGMENTS
These volumes are the product of many contributors working together. Leading the coordination activities were a few key individuals whose efforts made this project a reality:

Associate Editor: Julie Drexler
Technical Writer: Kathleen Bartlett
Editing Assistants: Kimberly Sprouse and Sherry Ogreten

We would also like to thank our Editorial Board and Review Board members, as follows:

Editorial Board

John Anderson, Carnegie Mellon University; Kathleen Bartlett, Florida Institute of Technology; Clint Bowers, University of Central Florida, Institute for Simulation and Training; Gwendolyn Campbell, Naval Air Warfare Center, Training Systems Division; Janis Cannon-Bowers, University of Central Florida, Institute for Simulation and Training; Rudolph Darken, Naval Postgraduate School, The MOVES Institute; Julie Drexler, University of Central Florida, Institute for Simulation and Training; Neal Finkelstein, U.S. Army Research Development & Engineering Command; Bowen Loftin, Texas A&M University at Galveston; Eric Muth, Clemson University, Department of Psychology; Sherry Ogreten, University of Central Florida, Institute for Simulation and Training; Eduardo Salas, University of Central Florida, Institute for Simulation and Training and Department of Psychology; Kimberly Sprouse, University of Central Florida, Institute for Simulation and Training; Kay Stanney, Design Interactive, Inc.; Mary Whitton, University of North Carolina at Chapel Hill, Department of Computer Science
Review Board (by affiliation)

Advanced Brain Monitoring, Inc.: Chris Berka; Alion Science and Tech.: Jeffery Moss; Arizona State University: Nancy Cooke; AuSIM, Inc.: William Chapin; Carlow International, Inc.: Tomas Malone; CHI Systems, Inc.: Wayne Zachary; Clemson University: Pat Raymark, Patrick Rosopa, Fred Switzer, Mary Anne Taylor; Creative Labs, Inc.: Edward Stein; Deakin University: Lemai Nguyen; Defense Acquisition University: Alicia Sanchez; Design Interactive, Inc.: David Jones; Embry-Riddle Aeronautical University: Elizabeth Blickensderfer, Jason Kring; Human Performance Architects: Richard Arnold; Iowa State University: Chris Harding; Lockheed Martin: Raegan Hoeft; Max Planck Institute: Betty Mohler; Michigan State University: J. Kevin Ford; NASA Langley Research Center: Danette Allen; Naval Air Warfare Center, Training Systems Division: Maureen Bergondy-Wilhelm, Curtis Conkey, Joan Johnston, Phillip Mangos, Carol Paris, James Pharmer, Ronald Wolff; Naval Postgraduate School: Barry Peterson, Perry McDowell, William Becker, Curtis Blais, Anthony Ciavarelli, Amela Sadagic, Mathias Kolsch; Occidental College: Brian Kim; Office of Naval Research: Harold Hawkins, Roy Stripling; Old Dominion University: James Bliss; Pearson Knowledge Tech.: Peter Foltz; PhaseSpace, Inc.: Tracy McSherry; Potomac Institute for Policy Studies: Paul Chatelier; Renee Stout, Inc.: Renee Stout; SA Technologies, Inc.: Haydee Cuevas, Jennifer Riley; Sensics, Inc.: Yuval Boger; Texas A&M University: Claudia McDonald; The Boeing Company: Elizabeth Biddle; The University of Iowa: Kenneth Brown; U.S. Air Force Academy: David Wells; U.S. Air Force Research Laboratory: Dee Andrews; U.S. Army Program Executive Office for Simulation, Training, & Instrumentation: Roger Smith; U.S. Army Research Development & Engineering Command: Neal Finkelstein, Timothy Roberts, Robert Sottilare; U.S. Army Research Institute: Steve Goldberg; U.S. Army Research Laboratory: Laurel Allender, Michael Barnes, Troy Kelley; U.S. Army TRADOC Analysis Center–Monterey: Michael Martin; U.S. MARCORSYSCOM Program Manager for Training Systems: Sherrie Jones, William W. Yates; University of Alabama in Huntsville: Mikel Petty; University of Central Florida: Glenda Gunter, Robert Kenny, Rudy McDaniel, Tim Kotnour, Barbara Fritzsche, Florian Jentsch, Kimberly Smith-Jentsch, Aldrin Sweeney, Karol Ross, Daniel Barber, Shawn Burke, Cali Fidopiastis, Brian Goldiez, Glenn Martin, Lee Sciarini, Peter Smith, Jennifer Vogel-Walcutt, Steve Fiore, Charles Hughes; University of Illinois: Tomas Coffin; University of North Carolina: Sharif Razzaque, Andrei State, Jason Coposky, Ray Idaszak; Virginia Tech.: Joseph Gabbard; Xavier University: Morrie Mullins
SECTION 1
VIRTUAL ENVIRONMENT COMPONENT TECHNOLOGIES

SECTION PERSPECTIVE

Mary Whitton and R. Bowen Loftin

Anyone desiring to build an effective virtual environment based training system must make knowledgeable selections of hardware and software for the system that delivers the training content and allows the trainee to interact with that content. The goal of this section is to introduce the hardware and software—the component technologies—that make up the virtual environment (VE) system so that readers can make informed decisions when trade-offs are necessary during system design. The component technologies are as disparate as a laser system that displays images directly on the retina, software that controls semi-autonomous agents, and intersimulator data communication standards. The chapters in this section provide some tutorial information; they are intended to update, rather than duplicate, information available in Stanney's (2002) Handbook of Virtual Environments.

Figure SP1.1 shows how VE systems are related to other parts of this book. System and fidelity requirements are inputs to the VE system design process. The application requirements drive component selection, as well as provide such ancillary constraint data as cost targets and portability goals. When VE systems are in use, they can receive session specific data from and send data to training support technologies, systems that track trainee progress and define training sequences. Any data needed for after action review, to evaluate trainees, or to document training effectiveness is output from the VE training system, typically in the form of logs of events from the individual VE stations and logs of network traffic. Volume 3, Section 1 (not called out in Figure SP1.1) includes descriptions of several training systems.
Figure SP1.1. Relationship of Volume 2, Section 1 to Other Sections of the Book
The authors of the chapters were given almost total freedom to present their topics as they chose. This was so that they could emphasize what, from their own experiences, they know to be the important considerations when using the technology in an integrated system. The consequence of this freedom is that the chapters do not share a common structure, and a subtopic area covered in one chapter may not appear in another. So be it.
INTRODUCTION TO VE COMPONENT TECHNOLOGIES

In popular use, the term virtual environment has been applied to systems as dissimilar as a fully immersive simulator for a multiengine airliner and an interactive game played on a smart phone.[1] Despite the differences in complexity and cost, the components of the two systems have much the same functionality: both accept user input, both use that input in an application program, and both provide feedback to the user.

[1] VE researchers do not agree on a definition of a VE (or whether to use the term virtual environment or virtual reality), and they define the difference between a VE application and an interactive three-dimensional computer graphics application in a variety of ways, based on factors such as field of view of the display, whether the scene changes when the user moves his or her head, and the level of the user's engagement with the activity in the virtual scene.
Figure SP1.2. VE System Components. The exploded block shows the functional components of a system for a single trainee. Such stations are replicated and connected by the network for team training. Also connected via the network are training support systems, logging/after action review systems, and storage for models and programs. See Table SP1.1 for additional detail. Figure courtesy of Computer Science, UNC–Chapel Hill.
The block on the right in Figure SP1.2 is a high level functional and data-flow diagram of a stand-alone VE system. The four elements in that diagram form a continuous interaction-feedback loop: the user performs some action, devices or sensors translate user action into input data, computing elements determine a response to the input, and display devices provide feedback to the user of changes caused by the input. Table SP1.1 expands on Figure SP1.2 with a list of components of virtual environment systems and includes pointers to chapters in which the topics are discussed in more detail.
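As a concrete illustration of that interaction-feedback loop, one frame of a VE application might look like the sketch below. This is a hypothetical outline added for this discussion, not code from any system described in this section; the tracker, input_devices, simulation, renderer, and display objects are placeholder components.

```python
import time

def run_interaction_feedback_loop(tracker, input_devices, simulation, renderer, display):
    """One pass per frame around the loop of Figure SP1.2: sense the user's
    action, compute a response, and display the result back to the user.
    All five arguments are assumed placeholders for real components."""
    while simulation.running:
        start = time.perf_counter()
        head_pose = tracker.read_pose()      # user action -> input data
        events = input_devices.poll()        # buttons, joystick, microphone, ...
        simulation.step(head_pose, events)   # computing elements determine a response
        frame = renderer.render(simulation.scene, viewpoint=head_pose)
        display.present(frame)               # feedback to the user closes the loop
        # Everything between these two timestamps contributes to the
        # end-to-end latency discussed under "Interactivity" below.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
```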
Table SP1.1. Examples of component technologies in virtual environment systems. More information is available in the chapters cited in parentheses.

Hardware
- Input: trackers (Welch and Davis, Volume 2, Section 1, Chapter 1); microphones (Sadek, Volume 2, Section 1, Chapter 4); wands, gamepads, and so forth; physiological monitors
- Computing: CPU, memory; graphics; networking (McDowell, Guerrero, McCue, and Hollister, Volume 2, Section 1, Chapter 8)
- Output: visual displays (Bolas and McDowall, Volume 2, Section 1, Chapter 2; Towles, Johnson, and Fuchs, Volume 2, Section 1, Chapter 3); headphones and speakers (Sadek, Volume 2, Section 1, Chapter 4); displays for other senses (Başdoğan and Loftin, Volume 2, Section 1, Chapter 5)

Software
- Operating system (McDowell, Guerrero, McCue, and Hollister, Volume 2, Section 1, Chapter 8)
- Networking: communication standards (McDowell, Guerrero, McCue, and Hollister, Volume 2, Section 1, Chapter 8)
- Application simulators (McDowell, Guerrero, McCue, and Hollister, Volume 2, Section 1, Chapter 8): entity behaviors (Petty, Volume 2, Section 1, Chapter 9)
- Rendering: visual (Whitton and Loftin, Volume 2, Section 1, Section Perspective; McDowell, Guerrero, McCue, and Hollister, Volume 2, Section 1, Chapter 8); audio (Sadek, Volume 2, Section 1, Chapter 4); other senses (Başdoğan and Loftin, Volume 2, Section 1, Chapter 5)
- Speech (Whitton and Loftin, Volume 2, Section 1, Section Perspective): voice recognition; speech understanding; speech synthesis
- SW tools (McDowell, Guerrero, McCue, and Hollister, Volume 2, Section 1, Chapter 8; McDowell, Volume 2, Section 1, Chapter 10): middleware; game engines; physics engines

Content/Data
- Models (Whitton and Loftin, Volume 2, Section 1, Section Perspective): shape; appearance; behavior (Whitton and Loftin, Volume 2, Section 1, Section Perspective; Petty, Volume 2, Section 1, Chapter 9)
- Scenarios (Shadrick and Lussier, Volume 2, Section 2, Chapter 18)
- Model creation tools (Whitton and Loftin, Volume 2, Section 1, Section Perspective): shape; appearance

Systems & System Issues
- Augmented and mixed reality (Henderson and Feiner, Volume 2, Section 1, Chapter 6)
- Dismounted infantry (Templeman, Sibert, Page, and Denbrook, Volume 2, Section 1, Chapter 7)
- Simulation sickness (Drexler, Kennedy, and Malone, Volume 2, Section 1, Chapter 11)
- Evaluation (Whitton and Brooks, Volume 2, Section 1, Chapter 12)

Immersion and Interactivity

Virtual environments have two characteristics that, along with the content, are the three major factors determining the quality of the user's VE experience. Virtual environments are immersive, that is, they substitute synthetically generated sensory input for sensory data from the real world, and virtual environments are interactive, that is, when a user does something that generates an input to the system, the results of that input are almost immediately apparent. According to some strict definitions of a virtual environment system, minimally a VE system must immerse the user in synthetic visual imagery and must change that imagery at interactive rates when the user turns his or her head.

Immersion

Users depend on data they gather with their senses to understand the situation in which they are placed and to inform their decisions and actions. VE system designers must know if it is possible for the system to generate the sensory cues the user needs for accurate decision making and action.

Users are immersed in synthetic sensory input by various displays, most often visual and audio. More rarely, VE users are presented with haptic, vestibular, olfactory, or taste stimuli. The quality of the immersion is determined by the quality of the models used for objects, sounds, and so forth, the quality of the software used to simulate or render the synthetic stimuli, and the characteristics of the devices that deliver the stimuli. Good immersion requires that the system have good models—sufficient object detail, sufficiently subtle behaviors of entities, and so forth. Rendering, particularly for visual and audio stimuli, requires highly accurate simulation of the physics of light and sound waves. After the data are rendered, they must be displayed or delivered to the user in a manner such that the user can perceive them. For instance, it may not be possible to hear subtle noises if the system uses cheap speakers. Likewise, if the critical visual cue for detecting a threat occupies only a single pixel on a handheld device's 320 × 240 pixel screen, then that combination of object size and display resolution is inappropriate for training that particular threat-detection task.

The quality of the entire virtual experience depends on the quality of both the immersion and the content. The content and the sensory stimuli together can cause changes in the user's psychological and physiological state: the user may feel as if he or she is present in the virtual scene (and not in a laboratory), may feel emotions, and may exhibit changes in his or her physiological state, such as a rise in heart rate if encountering a stressful situation. (The use of the word immersion to refer exclusively to the synthetic stimuli delivered to the user, and the use of the word presence to refer to a component of the user's psychological state, have been, and remain, controversial among VE researchers; Witmer and Singer (1998) and Slater (1999, 2003) offer differing opinions.)

Interactivity

Both moving around and making things happen in a virtual environment are examples of interactions between the user and the environment. Ideally, the user sees the effect of his or her input almost immediately after making an action. If the user senses no delay between input and feedback, the system is called
interactive. For example, if a user pushes a joystick to indicate forward motion, and he or she moves through the scene in no more time than it would take to make a real step forward, the system is interactive.

The time between user action/input and a change in the display is the end-to-end latency of the system; this latency is the time it takes data to travel around the interaction-feedback loop in Figure SP1.2. It is generally accepted that an application is interactive if this latency is around 50 milliseconds (about 3 frames at a 60 frame per second frame update rate). System latency as low as 40 milliseconds has been shown to decrease task performance in visual search tasks (Wickens, 1986).
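The frame arithmetic behind those figures is easy to check. The snippet below is a back-of-the-envelope sketch added here for illustration; the function name and constants are not from the chapter.

```python
FRAME_RATE_HZ = 60
FRAME_TIME_MS = 1000.0 / FRAME_RATE_HZ   # about 16.7 ms per displayed frame

def latency_in_frames(latency_ms: float) -> float:
    """Express an end-to-end latency as a number of display frames."""
    return latency_ms / FRAME_TIME_MS

# 50 ms of end-to-end latency is about 3 frame times at 60 frames per second;
# 40 ms (about 2.4 frames) has already been shown to affect visual search.
print(latency_in_frames(50.0))   # ~3.0
print(latency_in_frames(40.0))   # ~2.4
```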
Components of VE Systems

Input Devices and User Interfaces

The target application dictates the choice of input devices for a particular VE system. For instance, a helicopter simulator will have devices that function like the helicopter control stick, pedals, and collective and provide input to the vehicle dynamics simulator. As another example, an infantry member might hold a riflelike device on which a tracker is mounted. The tracker readings control where the rifle appears in the visual scene, and, when the rifle is fired, the readings are used by the simulation to determine what the round hit.

Input devices for training systems range from the familiar controllers for computer games (keyboards, mice, buttons, joysticks, dials, sliders, pedals, steering wheels, the Wii remote, and so forth) to sophisticated reproductions of the actual control panels of aircraft, ships, and tanks. Trackers are a special category of input device that measure and report the position and/or orientation (together called pose) of objects in the scene. Other sensors that might be in the (real) training environment include microphones for voice input, video cameras recording trainee behavior, and physiological monitors worn by the trainee to record such data as heart rate.

Good usability engineering practice for user interfaces is to build, test, revise, test again, and so on. Many factors affect how successful a device and interface will be. Questions that should be asked during development include the following: Is the interface natural and intuitive? Is using it fatiguing? How well does it reproduce or approximate the way the target user does the task in the real world? How hard is it to learn the interface? After someone has learned the interface, is there any residual cognitive effort required to use it, and does that effort detract from a user's ability to focus on and perform his or her main task? Does the interface interfere with another interface in the system?

Computing and Networking

Today the computing components of VE systems are almost always standard, off-the-shelf computers. When configured with high performance graphics boards or chips, high end laptops and office PCs have sufficient CPU and
graphics processing power to generate scenes with the level of realism adequate for many training tasks. Custom-configured clusters of computers may be required to drive multiscreen displays or perform calculations for complex physical simulations.

At run time, the computer is loaded with model data and with application software, including rendering software. Models of objects and entire virtual scenes can be specific not only to a single training application, but also to a particular training scenario. In addition to the main application control program, the software may include programs for graphics, physics simulation, simulators for specific vehicles, and programs to control semi-automated forces. Once the session begins, the main application program manages input from the user, the model data, and the various software components in order to generate displays for the user.

Networking VEs for Team Training

While a user can learn individual level skills on a stand-alone VE system and may be able to learn the rudiments of a team task on a stand-alone system, teams need to train as teams. The left side of Figure SP1.2 shows multiple VE systems networked together for team training. Trainee specific portions of the training application are executed on each system. In order for each user's application software to maintain a correct view of the state of the entire scenario, local state data have to be distributed across the network to the other trainees' stations. The more trainees and the more moving objects, such as vehicles, that are in a scenario, the more state data have to move across the network, and the higher the potential for network delays. Network delays can adversely affect system interactivity, which may reduce trainees' abilities to perform their tasks, which, in turn, may reduce the efficacy of the system for training.

The standard reference on networked virtual environments is Singhal and Zyda (1999), now supplemented by those authors' book on networked games (Singhal & Zyda, 2008).
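The state-distribution idea just described can be sketched in a few lines. The fragment below is illustrative only: the message format, field names, and use of UDP broadcast are assumptions made for this example, not the Department of Defense standards (distributed interactive simulation, high level architecture) that fielded systems actually use.

```python
import json
import socket
import time

def make_state_update(entity_id, position, orientation, velocity):
    """Package one locally simulated entity's state for the other stations.
    Velocity is included so receivers can extrapolate (dead reckon) between
    updates, one common way to tolerate network delay."""
    return json.dumps({
        "entity": entity_id,
        "pos": position,        # (x, y, z) in shared scenario coordinates
        "orient": orientation,  # (roll, pitch, yaw) in degrees
        "vel": velocity,        # (vx, vy, vz), for dead reckoning
        "t": time.time(),       # timestamp, for ordering and latency handling
    }).encode("utf-8")

# Broadcast this station's local state each frame; every added entity or
# vehicle adds traffic, which is why large scenarios stress the network.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
sock.sendto(make_state_update("vehicle-7", (10.0, 2.5, 0.0), (0.0, 0.0, 90.0),
                              (5.0, 0.0, 0.0)),
            ("255.255.255.255", 5005))
```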
Outputs: Displays and Logs

Display devices and data logs are the primary outputs of virtual environment systems. Displays provide instantaneous and ephemeral feedback to the users; logs are a permanent record of what occurred.

Display is the generic term used for the presentation of any type of sensory data to the user. Visual displays are essential; audio is common; and touch (haptics), smell, and taste are more difficult to implement and are used only where cues from those senses are essential for task performance or to increase the realism of a virtual setting to a very high level.

In a training context, logs may include user inputs (keystrokes, button pushes, or pose data from trackers), the state of all the simulators and entities in the session, and voice communications. The data can be collected at each time step of the simulation or at predetermined milestones in the training episode. Logs are the inputs to systems that can replay training sessions. Used as part of an after action review, a replay can enhance the value of training sessions by enabling the trainers and trainees to discuss the actual sequence of events that occurred.

Some existing training systems log only data that are passed between stations over the network. While a useful level of after action review can be performed with these data, because the logs contain no data about individual behavior, there is little or no ability to evaluate an individual's performance and diagnose his or her weaknesses.

INTRODUCTION TO THE CHAPTERS IN THIS SECTION

This section is organized with chapters on individual component technologies coming before chapters on topics concerning the whole VE system. Two chapters, inserted where they seem to make the best sense, illustrate the additional issues that arise when components are integrated in a system to meet specific user needs.

Trackers and Visual Displays

Many practitioners differentiate VEs from other interactive three-dimensional graphics applications by requiring that, to be a VE system, the user's view of the scene must change when the user turns his or her head. To accomplish change in the view, the user's head pose must be measured and reported to the application. Trackers (Welch and Davis, Chapter 1) are input devices that constantly, and at a minimum of 60 times per second, report the position and/or orientation of the object to which they are attached. In the case of a head tracker, the tracker data are used by the computing components to determine the point of view from which the virtual scene is rendered.

Both head-mounted displays (Bolas and McDowall, Chapter 2) and projection displays (Towles, Johnson, and Fuchs, Chapter 3) more fully immerse the user in a virtual scene than do desktop LCD flat-panel displays or the small screens of handheld devices. The user experience and the level of infrastructure required by head-mounted displays and projectors differ radically, and these two chapters will help the reader understand the benefits and drawbacks of each.

Mixed and augmented reality systems, covered in "Mixed and Augmented Reality for Training" (Henderson and Feiner, Chapter 6), overlay synthetically generated information on a view of the real world, or combine real and virtual elements into a single scene. The chapter discusses the special benefits of augmented reality (AR) and mixed reality for a number of specific training applications and expands on the previously presented information on head-mounted displays and tracking to address the peculiar requirements of AR and mixed reality systems. The authors of this chapter use the logically more appropriate term head-worn displays rather than the term head-mounted displays.

Displays for the Other Senses

In VE systems designed to provide high quality immersion, providing high quality audio (Sadek, Chapter 4) (that is, spatial audio of high fidelity) is arguably
second in importance only to supplying high quality visual stimuli. The audio stimuli may even be more important than the quality of the visuals in some specific training or rehabilitation applications. This chapter comprehensively covers the subject—the fundamental science, the computations required, how to select between headphones and speakers, and how to prevent damage to users’ hearing. Displays for the other senses are discussed in “Multimodal Display Systems: Haptic, Olfactory, Gustatory, and Vestibular” (Başdoğan and Loftin, Chapter 5). These four senses—also known as touch, smell, taste, and orientation/acceleration—are much less frequently stimulated in VE systems than the visual and auditory systems, and this chapter explains why by pointing out the difficulties of doing so. The lack of something to feel, the lack of haptic stimulation, is considered by many to be the most frequent cause of breaks in the VE user’s illusion of being present in the synthetic scene.

User Input: Meeting Complex User Needs

“Designing User Interfaces for Training Dismounted Infantry” (Templeman, Sibert, Page, and Denbrook, Chapter 7) illustrates both how important it is that the system designer has a complete understanding of the interaction needs of the customer and how complex it can be to integrate multiple input devices to meet those needs. A locomotion interface becomes complex when it has to support the ability to look and aim a weapon in one direction while moving in another direction (including backwards and sideways), at any speed.

Computing: Behavior Models, Computing Requirements, and Game Engines

Populating a large battlefield with a real human operator for each vehicle is prohibitively expensive. A solution to that problem is an important class of software that generates semi-automated forces, with the characteristic that one human (or an artificially intelligent system) can control many synthetic entities (individuals or collectives). “Behavior Generation in Semi-Automated Forces” (Petty, Chapter 9) describes how behaviors are programmed for these entities. The chapter on computing requirements (McDowell, Guerrero, McCue, and Hollister, Chapter 8) looks in detail not only at hardware and software requirements driven by the visual displays, but also at the computing requirements of physics simulators (collision detection and response), rendering for senses other than visual, supporting sensor networks that may be in the (real) training environment, and supporting communications. Computer games have many of the same characteristics as training systems, and games are increasingly used for training. “Games and Gaming Technology for Training” (McDowell, Chapter 10) discusses how games and game engines can be used as training tools. The author concludes with a thoughtful analysis of what it will take for game technology to be better accepted in the training community.
Evaluating the VE System and Components

The final chapter of the section is devoted to evaluation of component technologies (Whitton and Brooks, Chapter 12). Using a format of short case studies and lessons learned, the chapter looks at methods and metrics for evaluating both individual component technologies and entire systems. The chapter concludes with a short discussion of the role of usability engineering in the development of VE components and systems.
LOOKING TO THE FUTURE

VE Components Become Commodity Products

This section of this volume examines VE component technologies as they relate to the design, development, and deployment of VE systems intended for training. If one looks into the intermediate future (say, 5 to 10 years), can one envision these components moving from specialized laboratory artifacts to commodity products? Within the VE component technologies described in this section, only computing has been “commoditized” to the point that a sophisticated VE system can be built around a computing platform that is readily available for $10,000 or less. Contrast this to VE development in the early 1990s, when the necessary computing platforms could cost $250,000 to $1,000,000. Sadly, such a reduction in cost (or increase in capability) has not occurred in most other VE component technologies. Nonetheless, there is hope. The recent success of the Wii from Nintendo, for example, demonstrates that an input device, if manufactured in quantities of 100,000 or more, can be both capable and relatively inexpensive. It is a virtual certainty that at least some other VE component technologies will follow suit, especially if they become part of a popular game system (like the Wii). On the display side, stereo-ready, consumer-grade, large-screen, rear-projection televisions based on Texas Instruments’ DLP technology are now available from two manufacturers (http://www.dlp.com/hdtv/). We can expect that technologies such as tracking, spatial audio, and head-worn displays will be commoditized and, thus, provide a significant impetus to the development and deployment of VE systems for training.

Moving toward the Holodeck

Star Trek’s holodeck, eerily like the ultimate display described by Sutherland (1965), is often cited as a model of the ideal virtual environment system. One way to define a research and development agenda is to compare the capabilities of current VE technologies to the capabilities of the holodeck. Defining characteristics of the holodeck make it a good model for a team training environment: the (real) people participating are together in a shared space, there are perfect
instances of real objects (vehicles, weapons, control panels, tools, and so forth) available, and the virtual people are indistinguishable from the real. Where is our technology inadequate today? Table SP1.2 is a partial list of areas in need of improvement.

Table SP1.2. A Partial List of Improvements Needed in VE Component Technologies

Visual Displays
• Displays that offer proper stereo views for multiple trainees working in the same space.
• Computing and graphics hardware to support such displays.
• High resolution, bright, wireless head-worn displays.

Haptic Displays
• New methods of delivering tactile and haptic feedback. What can we do to approximate the ability to sit on virtual chairs? This is perhaps the most challenging of all the technologies of the holodeck. Sutherland (1965) described it thus: “The computer can control the existence of matter.”

Sensors
• Unencumbering, fast, accurate full-body tracking.
– Tracking data to control movement of avatars (virtual bodies of self and other humans) and autonomous virtual humans.
– Data accurate enough that the movement of virtual humans looks natural and the movement of avatars of real people retains the idiosyncratic movement patterns of each individual.
– Tracking data accurate enough to control models of fingers so that participants/trainees have full use of their hands for interacting with the VE.

Speech
• Speech recognition and understanding systems sufficiently fast to allow a natural flow of verbal interaction.

Mixed Reality
• Systems of displays, trackers, and sensors to support augmented and mixed reality so that real objects can be used while participating in a scenario.

Tools
• Content development tools easy and fast enough that a user can train tonight against a tactic first seen in battle today.
– Model builders: real things and places; behaviors; sound.
– Scenario builders, including automated narrative/storyline generation.
– Automatic generation of scenario variants.
• Evaluation and error diagnosis tools that accept event logs as input.

Major questions that remain are whether we will be able to make virtual training as realistic as live training, and thus provide safe and possibly more readily accessible ways to start trainees up the learning curve, increasing their confidence and the probability that they will perform successfully, and whether we have to
have full realism to train all skills. Fully immersive systems, today’s approximations of the holodeck, require considerable supporting infrastructure in terms of trackers, computers, and displays. The low infrastructure solution has always been envisioned as nothing more encumbering than a pair of sunglasses.

Beyond the Holodeck: Radically Different Technology

Looking into the future more than 25 years, one can speculate about the ultimate virtual environment. First, let go of Sutherland’s idea of a room with computer-controlled matter and focus on what the human experience with the ultimate display should be—an experience just as real as being there. People have such experiences now—dreams. Dreams can be vivid, multisensory experiences that are identical in fidelity to the real world. Dreams provide evidence that the human brain has all the processing power needed for the ultimate VE. The ultimate VE system would use a biological computer (the human brain), at least for graphical and other sensory display production. The system would rely on direct coupling to the human nervous system for both input and output. The challenge is discovering how to stimulate the brain, especially the channels associated with the sense organs, so that external sensory stimulation is not required and so that the content of the VE dream can be controlled. Already, devices delivering direct stimulation to the vestibular system are available, though still crude (see Başdoğan and Loftin, Chapter 5). Obviously there are major barriers to overcome—both of a technical and of a cultural nature—before this VE system based on direct coupling of a computer to the user’s brain and nervous system can be realized. Yet the potential is clearly there, if content can be created, implanted, and experienced safely, efficiently, and effectively.

ACKNOWLEDGMENTS

The authors would like to thank Jeremy D. Wendt and Chris VanderKnyff for their help in the preparation of Appendix A and Ramy Sadek for providing Appendix B. Preparation of this chapter was supported in part by funding from the Office of Naval Research.

REFERENCES

Cassell, J., Sullivan, J., Prevost, S., & Churchill, E. (Eds.). (2000). Embodied conversational agents. Cambridge, MA: The MIT Press.
Johnsen, K., Dickerson, R., Raij, A., Harrison, C., Lok, B., Stevens, A., & Lind, D. (2006). Evolving an immersive medical communication skills trainer. Presence: Teleoperators and Virtual Environments, 15(1), 33–46.
Jurafsky, D., & Martin, J. H. (2008). Speech and language processing (2nd ed.). Upper Saddle River, NJ: Prentice Hall.
Luebke, D., Reddy, M., Cohen, J., Varshney, A., Watson, B., & Huebner, R. (2002). Level of detail for 3D graphics. San Francisco: Morgan Kaufmann.
Moller, T., & Haines, E. (2002). Real-time rendering. Natick, MA: A. K. Peters.
Mori, M. (2005). Bukimi no tani [The uncanny valley] (K. F. MacDorman & T. Minato, Trans.). (Originally published in 1970; Energy, 7(4), 33–35.) Retrieved April 22, 2008, from http://graphics.cs.ucdavis.edu/~staadt/ECS280/Mori1970OTU.pdf
Pharr, M., & Humphreys, G. (2004). Physically based rendering: From theory to implementation. San Francisco: Morgan Kaufmann.
Shirley, P., Ashikhmin, M., Gleicher, M., Marschner, S., Reinhard, E., Sung, K., et al. (2005). Fundamentals of computer graphics. Wellesley, MA: A. K. Peters.
Singhal, S., & Zyda, M. (1999). Networked virtual environments: Design and implementation. Reading, MA: Addison-Wesley.
Singhal, S., & Zyda, M. (2008). Networked games: Design and implementation. Reading, MA: Addison-Wesley.
Slater, M. (1999). Measuring presence: A response to the Witmer and Singer questionnaire. Presence: Teleoperators and Virtual Environments, 8(5), 560–566.
Slater, M. (2003). A note on presence terminology. Presence-Connect, 3(1). Retrieved April 21, 2008, from http://www.presence-connect.com
Stanney, K. (Ed.). (2002). Handbook of virtual environments: Design, implementation, and applications. Mahwah, NJ: Erlbaum Associates.
Sutherland, I. (1965). The ultimate display. Information Processing 1965: Proceedings of IFIP Congress, 65(2), 506–508.
Traum, D., Swartout, W., Gratch, J., Marsella, S., Kenny, P., Hovy, E., et al. (2005). Dealing with doctors: A virtual human for non-team interaction. Proceedings of the 6th SIGdial Workshop on Discourse and Dialogue (pp. 232–236). East Stroudsburg, PA: Association for Computational Linguistics.
Wickens, C. D. (1986). The effects of control dynamics on performance. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance (pp. 39-1–39-60). New York: Wiley.
Witmer, B., & Singer, M. (1998). Measuring presence in virtual environments: A presence questionnaire. Presence: Teleoperators and Virtual Environments, 7(3), 225–240.
APPENDIX A: MODELING AND RENDERING

Mary Whitton and Jeremy Wendt

This appendix is a basic introduction to two of the fundamental elements of computer graphics: three-dimensional (3-D) modeling and rendering. Given its length, the appendix is little more than a framework, some definitions, a few targeted references, and keywords for searching the World Wide Web, digital libraries, or library catalogs. For those who want to learn more about computer graphics, a good basic textbook is Fundamentals of Computer Graphics by Shirley and colleagues (2005). Rendering programs start with models and generate images. Inputs to rendering programs are object shape, object appearance, and scene-lighting models; models of dynamics define how objects move and change from one rendered frame to the next. Pharr and Humphreys (2004) and Moller and Haines (2002) are texts on the general topics of rendering and real-time rendering, respectively. The large number of popular press books on topics such as computer graphics, 3-D modeling, rendering, and animation speaks to the wide use of these techniques today.

Art meets technology. Modeling for computer graphics is a combination of art and technology: the art is in defining the shape and appearance; the technology is in the programs and hardware that render the objects and scenes. For the art and technology to serve the application well, the artists and the technologists must collaborate among themselves and, for training applications, with training experts and subject matter experts. The team goal is to produce a set of models that instantiates the training experience that the trainers imagined. Developing high-quality art assets—the models of objects in the scenes, the artwork used to color them, and the lighting of the scene—can require many months of work by many artists and programmers. Although highly trained and experienced professionals produce better-looking models than amateurs or less highly trained professionals, the level of artistic quality available in top-of-the-line computer games requires more resources (time, money, and people) than most training projects can afford. Digital media and computer animation services
are available from small businesses and independent contractors, as well as from large production houses.

MODELING OBJECTS—SHAPE AND SURFACE APPEARANCE

Object Shape

In most VE and graphics systems, object shape is defined by a triangle mesh that approximates the shape of the object’s outer surface. Triangles are used because they are the most efficient data format for modern graphics processors. Models can be created in geometric modeling programs, by software procedures (procedural modeling), or from 3-D range images. Collections of ready-to-use 3-D models are available for purchase. Reference books on shape modeling are almost all specific to particular software packages; research references are generally specific to a method of modeling and its associated mathematics.

Geometric Modeling

Geometric modeling software is available with a wide variety of capabilities and costs. Some packages, for example, Google SketchUp, are free and easy to learn, but have limited features. Autodesk’s professional products 3ds Max and Maya integrate tools for sophisticated modeling, animation, and rendering, but the learning curve is steep. There are well over two dozen 3-D modeling software packages; many of them are available at no cost.

Procedural Modeling

Procedural modeling techniques generate objects by following a set of rules. Procedural methods can be used to create enough buildings for a city or enough trees for a forest. L-systems are rules that generate plants, and fractal procedures produce realistic-looking synthetic terrain, clouds, mountains, and coastlines. Procedural modeling programs are available for purchase, as are ready-to-use models of plants.
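To make the procedural modeling idea concrete, here is a minimal L-system sketch; the rewrite rule is a common textbook plant rule, and the code is illustrative rather than taken from any modeling package.

# A minimal sketch of procedural modeling with an L-system: rewrite rules
# repeatedly expand a start string into a description of a branching plant.
# "F" = draw forward, "+"/"-" = turn, "["/"]" = push/pop drawing state.
RULES = {"F": "F[+F]F[-F]F"}   # a classic plant-like rule (illustrative)

def expand(axiom: str, iterations: int) -> str:
    """Apply the rewrite rules to the axiom the given number of times."""
    s = axiom
    for _ in range(iterations):
        s = "".join(RULES.get(ch, ch) for ch in s)
    return s

# Two iterations already yield a branching structure that a turtle-graphics
# interpreter could draw as a small plant.
print(expand("F", 2))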
Terrain

Models of the earth’s surface are called terrain models or digital elevation models, and they are often available at no cost.

Range Data—Modeling Real Places

LIDAR (light detection and ranging) and scanning-laser range finder technologies measure the distance from the scanner to the surface of objects in the space being scanned. The 3-D points located by the scanner are converted into a triangle mesh model of the space. Some systems achieve very high realism by using color photographs of the scene as texture maps.
Object Surface Appearance

As of this writing, texture mapping is the most widely used method of adding appearance detail to objects. Textures are images that are applied like decals to the surface of objects during rendering. Texture maps add such visual detail as color, patterns, dirt, and rust. Photographs and other 2-D media can be used as textures, as can the output of 2-D painting programs (paint programs). As hardware and software support has increased, the use of variants of texture mapping such as bump mapping, normal mapping, and displacement mapping has grown. These maps modify the rendering calculations in ways that increase visual realism with minimal increase in rendering time.

Issues and Considerations

Object Numbers and Complexity

The more objects and the more detail in each object (that is, the more triangles in the overall model), the higher the computational load and the more likely that latency will increase and reduce interactivity. Luebke and his colleagues (2002) describe techniques that reduce the computational load of large datasets.

Dynamic Shape Model Modification

Some applications modify object models during program execution in order to provide visual feedback of shape changes caused by the collision of objects, for example, human tissue deforming when pushed by a surgical tool. Both the physical simulation and the model modification require significant computational resources.

SCENE LIGHTING

Lighting can be used to create cinematic effects ranging from joy to terror, as well as to generate such familiar effects as shadows and reflections. Lighting, like texture mapping, is part of the process of rendering a virtual scene. The rendering process simulates the physics of the interaction of lights with surfaces and computes the final color of each pixel.

Fixed Function and Programmable Lighting Pipelines

OpenGL (open graphics library) and Direct3D are application programming interfaces (APIs) for writing both 2-D and 3-D graphics applications. These APIs support a fixed set of lighting effects, including both the diffuse and specular components of direct lighting—light that comes from light sources defined by their position, color, and shape. Both OpenGL and Direct3D support application-specific lighting effects through programmable shaders. Shaders are
programs that run on graphics processing units (GPUs). NVIDIA and ATI, major suppliers of GPUs, both provide extensive resources for developers. See Chapter 8—McDowell, Guerrero, McCue, and Hollister—for a further discussion of shaders.
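As a rough illustration of the direct-lighting calculation described above, here is a minimal sketch of Lambertian diffuse plus Phong specular shading; it is a simplified stand-in for what fixed-function pipelines compute, not the code of any particular API.

# A minimal sketch of per-surface-point direct lighting: Lambertian diffuse
# plus Phong specular, the style of fixed-function model described above.
# Vectors are 3-tuples; all values are illustrative.
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def direct_light(normal, to_light, to_eye, light_color, shininess=32):
    """Return the contribution of one light at one surface point."""
    n, l, e = normalize(normal), normalize(to_light), normalize(to_eye)
    diffuse = max(dot(n, l), 0.0)
    # Reflect the light direction about the normal: r = 2(n.l)n - l.
    r = tuple(2 * dot(n, l) * nc - lc for nc, lc in zip(n, l))
    specular = max(dot(r, e), 0.0) ** shininess
    return tuple(c * (diffuse + specular) for c in light_color)

print(direct_light((0, 1, 0), (1, 1, 0), (0, 1, 1), (1.0, 0.9, 0.8)))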
Global Illumination

Real objects are lit not only by light coming directly from a light source, but also by light reflected from other objects in the scene. Global illumination rendering techniques include light from interreflections when calculating the color of a surface. Radiosity and ray tracing, the two best-known global illumination techniques, are computationally expensive and are generally not used in interactive applications. Radiosity can be precomputed for static (nonmoving) scene configurations and the lighting stored in special textures called light maps that are applied to objects during rendering.

Issues and Considerations

Build versus Buy

Most application developers choose to buy rendering software because it is not cost-effective to have a dedicated in-house rendering software team. There are many 2-D and 3-D graphics API, graphics middleware, and game engine products available. They range in price from free and very low cost with minimal features to high cost with concomitant capability. Middleware and game engine software offer functionality at a higher level than APIs, but at a sufficiently general level that they can be used in a wide variety of applications.

Lighting Simulation Quality and Interactivity

Many applications strive to render a new frame about every 16 milliseconds, that is, at roughly 60 frames per second. (The 16 millisecond figure is related to the refresh rate specification of displays. See Chapter 3—Towles, Johnson, and Fuchs.) Many applications set the fidelity of their lighting simulation at a level that uses all of the computing resources available for the time left after other essential per-frame operations (for example, managing user interface devices and computing collisions and collision responses) are completed.

MODELING OBJECT MOVEMENT DYNAMICS

Besides looking right, objects must behave correctly when they interact with other objects or entities in the scene. Simulators define the motion of virtual objects and entities that can move under their own power. Unpowered objects move (and possibly change shape) in response to external forces (for example, as a result of colliding with another object or entity).
Simulators

Vehicle Simulators

Vehicle simulators are programs that compute an approximation of the behaviors and state of a real vehicle. The behavior is based on internal state (for example, remaining fuel), user input to the vehicle’s control system (for example, braking or turning), and physical constraints of the environment (for example, slope of the road, or collisions). Data from the simulator are used to compute feedback for the user. For instance, if the oil overheats, the virtual oil gauge light would come on.
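A toy sketch of the vehicle-simulator update step just described; the dynamics and all parameter values are invented for illustration, not taken from any real simulator.

# A toy vehicle-simulator update step: behavior follows from internal state,
# user control inputs, and environmental constraints. All numbers invented.
from dataclasses import dataclass

@dataclass
class VehicleState:
    speed: float = 0.0   # meters/second
    fuel: float = 1.0    # fraction of tank remaining

def step(state: VehicleState, throttle: float, brake: float,
         road_slope: float, dt: float = 0.1) -> VehicleState:
    """Advance the vehicle state by dt seconds."""
    accel = 4.0 * throttle - 8.0 * brake - 9.8 * road_slope  # crude model
    speed = max(state.speed + accel * dt, 0.0)
    fuel = max(state.fuel - 0.001 * throttle * dt, 0.0)
    if fuel == 0.0:
        speed = max(speed - 2.0 * dt, 0.0)  # coast to a stop when out of fuel
    return VehicleState(speed=speed, fuel=fuel)

s = VehicleState()
for _ in range(100):                 # 10 seconds of full throttle, uphill
    s = step(s, throttle=1.0, brake=0.0, road_slope=0.05)
print(round(s.speed, 2), round(s.fuel, 4))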
Semiautomated Forces

Semiautomated forces are the topic of Chapter 9—Petty. Often the purpose of semiautomated forces is to have large numbers of entities in a training scenario without large numbers of actual participants. Semiautomated forces respond in preprogrammed ways to high-level commands issued by a human in response to changing conditions.

Virtual Humans

A convincing virtual human must look real and move realistically and must, without human input, respond appropriately to its environment and situation. Motion capture or mocap is one way to define the movements of virtual humans. (See Chapter 1—Welch and Davis for more on motion capture.) If the virtual human is performing a predefined movement, the desired movement can be performed by an actor and recorded using the motion capture system. To generate unscripted movements, models of common movements (for example, start, stop, run, walk, or jump) can be recorded individually and then linked in various sequences to form new movements. Models of body positions characteristic of various emotional states can be recorded and used to control virtual humans so that the virtual humans are able to communicate emotions through body language. Advances in facial animation are resulting in more believable motion of the mouth and the surrounding facial tissue when virtual humans speak.

The Uncanny Valley

Human judgment of the realism of virtual humans is not monotonic with model quality. When model quality gets high, humans surprisingly become more intolerant of small flaws in appearance and behavior that they would have accepted in a lower-quality model. Only when the model is close to perfect do ratings of realism again rise. Mori (1970) observed this phenomenon with respect to the realism of robot faces, and it has since been observed with respect to the appearance and motion of virtual humans.
Collision Detection and Response

The two fundamental operations associated with collisions are collision detection and collision response. Detecting collisions is a matter of testing whether two models touch or overlap in virtual space and determining the moment of collision. Collision response algorithms determine the correct response to a collision (changes in object models, changes of direction, and so forth) by simulating the physics of the collision event—the velocity of the objects, their material properties, friction, and so forth. Collisions between rigid bodies can be detected and responses computed at interactive rates; collisions involving one or more deformable bodies, such as the medical instrument–human tissue example used earlier, are much more computationally expensive and rarely interactive. Physics engines are software packages that simulate collision response based on Newtonian physics. Physics engines vary in computational precision, speed, and price. Simulations of any kind are expensive. They require considerable programmer time to develop and considerable computer speed to execute. Even when using commercial packages, developers must be careful not to overload the system so that the latency grows to the point that interactivity is compromised and the system becomes unusable.
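For the simplest case, two rigid spheres, the two operations can be sketched in a few lines; the response formula is the standard one-dimensional restitution model for a head-on hit, and all values are illustrative.

# A minimal sketch of the two collision operations for two rigid spheres:
# detect overlap, then compute an impulse-style response along the line of
# centers, conserving momentum with some energy loss (restitution).
import math

def detect(p1, r1, p2, r2):
    """Collision detection: do two spheres touch or overlap?"""
    return math.dist(p1, p2) <= r1 + r2

def respond(v1, v2, m1, m2, restitution=0.9):
    """Collision response for a head-on hit: new speeds along the line
    of centers."""
    j = -(1 + restitution) * (v1 - v2) * (m1 * m2) / (m1 + m2)  # impulse
    return v1 + j / m1, v2 - j / m2

if detect((0, 0, 0), 1.0, (1.5, 0, 0), 1.0):
    print(respond(v1=2.0, v2=0.0, m1=1.0, m2=1.0))  # -> (0.1, 1.9)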
APPENDIX B: SPEECH AND LANGUAGE SYSTEMS: RECOGNITION, UNDERSTANDING, AND SYNTHESIS

Ramy Sadek

In many, if not most, domains, human interactions occur via speech. Thus, speech recognition and understanding, together with speech generation, may be essential components of a VE based (or, for that matter, any) training system. This appendix provides a brief introduction to speech and language systems that could be used in a VE developed for training. Speech and language systems are much sought after in VEs and are a highly active area of research. Broadly, there are four components to a complete language system: speech recognition, understanding, reasoning, and speech synthesis. Of these components, understanding and reasoning are the least accessible to VEs. These two areas mark cutting-edge research, so developer and end-user packages do not yet exist. The few groups working in these areas rely on in-house software built for the specific needs of their research. Therefore, projects seeking this functionality will need to collaborate with a specialized research group. Speech recognition and synthesis are more accessible technologies. Commercial, academic, and open-source recognition packages are available and include the Hidden Markov Model Toolkit (http://htk.eng.cam.ac.uk/), Institute for Signal and Information Processing Toolkit (www.ece.msstate.edu/research/isip/projects/speech/), Dragon NaturallySpeaking (www.dragontalk.com), and ViaVoice (www.nuance.com/). The recognition problem is a complex one, and to date no drop-in solutions allow the addition of speech recognition to a VE system. Integrating any of these packages with simulation software generally requires a dedicated engineer. The packages were designed with different aims. For example, one package may provide moderate accuracy for a large range of speakers, while another package offers high accuracy for a specific speaker for whom the system is tuned or “trained.” Still other packages may aim to offer domain-specific recognition, for example, recognizing specific technical terms at the expense of recognizing a more general vocabulary. The best product choice depends on the specific needs and goals of each VE, and, fortunately, most of these packages offer clear descriptions of their functionality and features.
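Whatever package is chosen, the VE application ultimately consumes a stream of recognized text. The following minimal sketch shows one way that stream might drive the matching-to-predetermined-responses strategy discussed later in this appendix; the phrases and responses are invented for illustration.

# A hypothetical sketch of consuming recognizer output: match recognized
# utterances against keyword sets and return predetermined responses,
# sidestepping full language understanding. All phrases are invented.
import string

RESPONSES = [
    ({"pain", "hurt"}, "The pain started two days ago, in my lower back."),
    ({"medication", "medicine"}, "I take something for blood pressure."),
]
FALLBACK = "I'm sorry, could you rephrase the question?"

def respond(recognized_text: str) -> str:
    words = {w.strip(string.punctuation) for w in recognized_text.lower().split()}
    for keywords, response in RESPONSES:
        if words & keywords:          # any keyword present in the utterance
            return response
    return FALLBACK

print(respond("Where does it hurt?"))         # matched response
print(respond("Tell me about your family."))  # fallback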
Speech synthesis or text-to-speech systems are somewhat easier to integrate since, at the most basic level, they take as input a piece of text. Some recognition packages (for example, ViaVoice) also offer text-to-speech capabilities. While text-to-speech capability is an open area of research, some packages offer drop-in solutions that require little custom programming effort. VEs requiring nuanced or emotional speech responses may require an engineer to fine-tune a commercial system, annotating and massaging the data to achieve life-like speech.

USING SPEECH RECOGNITION, UNDERSTANDING, AND SYNTHESIS SYSTEMS

For basic input-response interaction, some VE projects have matched verbal inputs to a set of predetermined responses to skirt the language understanding problem. This technique can work well if dialogue is restricted to a small domain. Examples of this include the Virtual Patient simulator that helps train (real) medical students to interact with (real) patients by having them practice interviewing virtual patients (Johnsen et al., 2006). A system using more sophisticated speech understanding techniques has been developed at the Institute for Creative Technologies at the University of Southern California. Traum and his colleagues at the institute developed a speech system that is part of an immersive virtual reality system used to train military personnel in interactions with local, noncombatant populations. The training scenario described in Traum et al. (2005) requires the trainee, a real officer, to negotiate with a local computer-generated doctor with the goal of getting the doctor to relocate his clinic out of a danger zone. The trainee interacts verbally with the doctor, and the computer-generated visuals convey, via the doctor’s posture and gestures, additional cues to the doctor’s state of mind during the exchange.

MORE INFORMATION

The first application for future commercial speech understanding products is likely to be in intelligent virtual agents. Advances can be tracked in the proceedings of the Intelligent Virtual Agents conference. Another source of information is the proceedings of meetings of the Association for Computational Linguistics (ACL), its special interest group on discourse and dialogue (SIGdial), and its special interest group SIGDAT for linguistic data and corpus-based approaches to natural language processing. Two recommended references are Jurafsky and Martin (2008) and Cassell, Sullivan, Prevost, and Churchill (2000).

REFERENCES

Cassell, J., Sullivan, J., Prevost, S., & Churchill, E. (Eds.). (2000). Embodied conversational agents. Cambridge, MA: The MIT Press.
Jurafsky, D., & Martin, J. H. (2008). Speech and language processing (2nd ed.). Upper Saddle River, NJ: Prentice Hall.
Part I: Subsystem Components
Chapter 1
TRACKING FOR TRAINING IN VIRTUAL ENVIRONMENTS: ESTIMATING THE POSE OF PEOPLE AND DEVICES FOR SIMULATION AND ASSESSMENT

Greg Welch and Larry Davis

Estimating or tracking human and device motion over time is a central requirement for most virtual environment (VE) based training systems. In some cases it is sufficient to know a trainee’s head or torso location [two-dimensional (2-D) or 3-D position only] within the training environment. Other cases require the full body pose—the position and orientation. Still other cases require complete body posture—the positions, orientations, and/or configurations of the trainee’s arms, hands, legs, and feet, as well as handheld devices (for example, surgical instruments or weapons). Sometimes this information must be known with precision and accuracy to better than a millimeter; sometimes less spatial and temporal resolution is needed. Primary uses for motion tracking for training include real time and online simulation associated with a “live” training activity, and on- or offline assessment of performance or behavior. Live training is most often associated with VE tracking where, for example, a military trainee performing a room-clearing exercise might wear a head-mounted display (HMD) while moving around a virtual room looking for virtual enemies. In this case, at a minimum the trainees’ heads would need to be tracked for the purpose of rendering the proper HMD imagery as they are moving around. Most likely their weapons would also need to be tracked to render them in the HMD imagery, and additionally perhaps their hands or other limbs would be tracked, so that they, too, could be properly rendered in the HMD imagery. Given the room-clearing scenario, one might want to know how efficiently the trainees moved during the exercise, where they were looking, where their weapons were pointing, and so forth. In this chapter we look at tracking scenarios, technologies, and issues related to tracking for training in VE systems. We explore the fundamental aspects of tracking only to the degree it is useful for considering, choosing, and using tracking systems for training. For further information about the fundamental technologies
and methods used in tracking systems, we encourage the reader to refer to the many excellent existing survey articles (Ferrin, 1991; Meyer, Applewhite, & Biocca, 1992; Bhatnagar, 1993; Durlach & Mavor, 1994; Allen, Bishop, & Welch, 2001; Welch & Foxlin, 2002). In addition, Foxlin’s (2002) chapter in Kay M. Stanney’s Handbook of Virtual Environments is an excellent source of information about the requirements for tracking and the underlying fundamental technologies (Stanney, 2002, pp. 163–210). Beyond discussing the fundamental technologies, Allen et al. (2001, pp. 52–56) discuss the most common source/sensor configurations, and both Allen et al.’s and Foxlin’s chapters discuss the most common approaches and algorithms for estimating pose from the source/sensor measurements.

The remainder of this chapter is organized as follows. In the first section we describe the tracking considerations relevant to the most common scenarios related to training. While we do not intend to provide a complete tracking survey in this chapter, in the second section we describe some of the tracking technologies available today for training. In the third section we discuss what we feel are the most important fundamental issues that one should consider when purchasing, installing, and using commercial tracking systems. Finally, in the fourth section we speculate just a little about where research and development are heading. Note that portions of the third section “Fundamental Usage Issues” were reproduced or adapted (with ACM copyright permission) from the course notes for the ACM SIGGRAPH 2001 course “Tracking: Beyond 15 Minutes of Thought” (Allen et al., 2001).

TRACKING SCENARIOS

When choosing or using a motion tracking system for training purposes, one needs to consider many factors. Here we explore three particular issues: what, where, and when to track. Considering these issues in advance of choosing a tracking system should narrow the choices. It should also help to calibrate expectations for what might be possible in terms of precision, accuracy, robustness, and overall suitability, while also allowing the developers to focus on the relevant issues described in the fourth section “Looking Ahead.”

What to Track

The primary consideration is what needs to be tracked, and for what purpose. While this might at first seem obvious, one tracking solution will rarely fit all circumstances. Review of the tracking scenario can reveal unrealistic expectations given the practical limitations of the available tracking technology, or liberating opportunities to relax the required degrees of freedom, accuracy, and precision to accommodate those limitations. In his excellent Taxonomy of Usability Characteristics in Virtual Environments, Gabbard distinguishes between “VE User Interface Input Mechanisms” and “Tracking User Location and Orientation” (Gabbard & Hix, 1997, pp. 24–25).
Although user interface mechanisms could potentially be included in a training application, here we will concentrate primarily on trainee pose and posture. In VE, one needs to consider the weight and bulk of tracking components placed on a user. However, for training, these potential distractions and biases are even more of a concern, as ideally the desired response is evoked in exactly the same situation as the real event. If the training environment relies on user-worn components that the trainees have to contend with, it might affect their performance and hence their training. The impact depends on how noticeable the component is and/or how it forces them to adjust their behavior. For example, for a marine doing live-fire training exercises, a full Lycra bodysuit instrumented with retroreflective spheres or inertial sensors could provide a wealth of information about the marine’s dynamic posture, but the marine is likely to be very conscious of the suit, even if it is somehow integrated into his or her normal camouflage clothing. A global positioning system (GPS) receiver and other devices can add weight, which corresponds to unusual forces on the body, which might be noticed. How noticeable a body-worn component is depends on the relative forces—something that is very noticeable on the hand might be less so on the back/torso. Knowing what parts of the body need to be tracked will help determine the necessary degrees of freedom, accuracy, resolution, and so forth. For example, head tracking for head-mounted display based virtual reality (VR) can be very demanding, particularly in terms of delay-induced error (“Temporal Issues” in the third section). Because our heads are relatively heavy, and attached to the mass of the torso, people cannot translate their heads very fast. However, as pointed out by Ron T. Azuma, “At a moderate head or object rotation rate of 50° per second, 100 milliseconds (ms) of latency causes 5° of angular error. At a rapid rate of 300° per second, keeping angular errors below 0.5° requires a combined latency of under 2 ms!” (Azuma, 1993, p. 50). A more typical 50 milliseconds of delay corresponds to about 15° of error for such rotation.
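Azuma's latency arithmetic is easy to reproduce; the following small sketch uses the figures quoted above.

# Delay-induced angular error: a head rotating at `rate` degrees/second,
# rendered with `latency` seconds of end-to-end delay, appears displaced
# by rate * latency degrees.
def angular_error_deg(rate_deg_per_s: float, latency_s: float) -> float:
    return rate_deg_per_s * latency_s

print(angular_error_deg(50, 0.100))   # 5.0 deg: moderate rotation, 100 ms
print(angular_error_deg(300, 0.002))  # 0.6 deg: even 2 ms barely holds 0.5
print(angular_error_deg(300, 0.050))  # 15.0 deg: the more typical 50 ms case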
However, if one intends to use a room-mounted display system (projected imagery or flat panels), concerns about latency-induced rotational error are typically reduced dramatically. This is because such head rotation typically causes relatively small eye translation with respect to the fixed displays. People can rotate their wrists at angular rates that are roughly comparable to head rotations, but they can translate their hands much faster than their heads, by rotating rapidly about the wrist or elbow. (Hands have much less mass than heads!) Typical arm and wrist motion can occur in as little as ½ second, with typical “fast” wrist tangential motion occurring at three meters per second (Atkeson & Hollerbach, 1985). Such motion corresponds to approximately 1 to 10 centimeters of translation throughout the sequence of 100 measurements used for a single estimate. For systems that attempt submillimeter accuracies, even slow motion occurring during a sequence of sequential measurements impacts the accuracy of the estimates. For example, in a multiple-measurement system with a 30 millisecond total measurement time, motion of only three centimeters per second corresponds to approximately one millimeter of target translation throughout the sequence of sensor measurements acquired for one estimate.
Finally, it is critical to consider how many individuals one needs to track simultaneously for real time graphics or training/behavioral analysis. The primary concern is one of the sociability of the tracking system, as defined in Meyer et al. (1992). That is, how well does the approach/system support multiple simultaneous users? For example, an optical system could be more prone to occlusions from other trainees than magnetic or inertial systems. Even if the medium itself is relatively unaffected by multiple nearby users, as is the case, for example, with inertial sensors, one has to look at how one gets the data off the devices and processed simultaneously in real time. It is often hard enough to do this with one user, much less two or more. Sociability is something to investigate if you intend to track multiple trainees.

Where to Track

Beyond what one is tracking (“What to Track”), one needs to consider where the tracking needs to be done. Some applications might require only small-scale tracking where the user does not walk around. For example, if one is looking at training a task involving manual dexterity of the hand or fingers, such as surgical suturing, one does not need wide-area tracking, but instead might get by with a couple of “glove” devices. If the pose of the hand (back or palm) is of interest over a small area, one might be able to use conventional single-sensor magnetic systems, such as those made by Ascension Technology Corporation or Polhemus, or inertial hybrids, such as those made by InterSense Inc. For 6 DOF head tracking, the training task might involve only what some call “fish tank” VR, whereby the user stands or sits in front of a cathode ray tube monitor or flat panel display, viewing some imagery that requires head-motion parallax, and so forth. In such cases concerns about optical or acoustic occlusions are likely to be lessened, allowing consideration of vision (camera) based tracking, and so forth. If the allowable volume of motion is restricted, one might be able to use mechanical tracking such as the Shooting Star Technology ADL-1. If one needs 6 DOF head tracking over a room or lab-sized space, one needs to be looking at such wide-area systems as the HiBall by 3rdTech, the IS-900 by InterSense, or a cellular magnetic system, such as the Ascension Flock of Birds. With an increase in working volume comes a likely increase in the number of trainees. If multiple trainees need to be supported in a large space, one needs to consider the sociability of the candidate systems as mentioned in “What to Track.” Perhaps the most difficult tracking challenges are related to unusually large spaces, in particular, outdoors. Outdoor environments can present exceptional challenges in terms of the sheer scale of the working volume (and corresponding difficulties with sensor signals); difficulties in dealing with such uncontrollable environmental factors as too little or too much light, sound, and so forth; and even something as seemingly mundane as getting signals off the body-worn sensors of the individual trainees. If the trainees are to be distributed over a large outdoor area, the problem becomes not just one of signal strength and data bandwidth, but
potentially one of timing—latency, synchronization, and so on. Over very large areas, for example, a live-fire desert training environment, certain technologies become impractical or impossible. For example, an active magnetic system could not be made to work over several kilometers. In such cases self-tracking approaches, such as those that make use of inertial devices, become more attractive as their accuracy and sensitivity do not necessarily depend on external infrastructure. On the other hand, there is no inertial tracking system that can function with reasonable bounded error in an unaided fashion over a large area (see “Inertial” in the second section for more detail). As such, the approach would need to combine inertial sensing with other approaches, such as GPS (which itself is limited to meters of accuracy), or even vision/camera based approaches.

When to Track

Finally, beyond what (or why) and where to track, one needs to consider when to track. By “when” we really mean when the sensor data (whatever they are) will be processed to generate pose or posture estimates. For example, if one needs to create computer-generated imagery for the trainee(s), one will need some form of online (active during the training) and real time (fast enough to keep up) estimation. In such cases, temporal issues such as latency (see “Temporal Issues” in the third section) might be of primary concern. If one is interested only in postexercise pose or posture analysis, then as long as the sensor data can be collected in real time (and accurately timestamped if synchronization is needed), the data can be processed offline after the exercise to estimate the pose/posture that is then analyzed. While the latter case (post-training analysis) might sound easier, in fact, waiting to process sensor data until later (offline) can mean that one needs to transmit and/or store tremendous amounts of data during the exercise. The pose/posture estimation from the raw sensor data, in effect, provides a form of compression of the sensor data, as many readings are (typically) combined into single estimates at a reduced rate. On the other hand, analyzing sensor data in an offline fashion (after the exercise) means that one can effectively “look into the future” when filtering the data. The benefits of such noncausal filtering can be tremendous if, for example, the data have structured errors in them, gaps, and so on. Systems that operate in an online fashion cannot (by definition) look ahead in time and, therefore, can generate pose/posture estimates based only on past measurements. This can make such systems susceptible to data dropout, as well as unexpected changes in target (trainee) dynamics, such as transitions from still to rapid motion. Note that it should be possible to take a hybrid approach, where some processing is done online, in real time (perhaps to both compress the sensor data and provide some online feedback or analysis to the trainers), and further pose/posture refinement and behavioral analysis (for example) are done later, offline.
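A toy example of the causal versus noncausal distinction; real systems would use more sophisticated (for example, Kalman-style) filters, and the numbers here are invented.

# Causal (online) vs. noncausal (offline) filtering of a tracker signal.
# An online filter can average only past samples; an offline filter can
# center its window on each sample, "looking into the future."
def online_smooth(samples, window=3):
    out = []
    for i in range(len(samples)):
        past = samples[max(0, i - window + 1):i + 1]   # past samples only
        out.append(sum(past) / len(past))
    return out

def offline_smooth(samples, half_window=1):
    out = []
    for i in range(len(samples)):
        w = samples[max(0, i - half_window):i + half_window + 1]  # centered
        out.append(sum(w) / len(w))
    return out

noisy = [0.0, 0.1, 5.0, 0.2, 0.1]   # a spurious spike in a position track
print(online_smooth(noisy))   # the spike contaminates estimates to the end
print(offline_smooth(noisy))  # the centered window recovers once it passes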
TODAY’S TECHNOLOGIES

Once one has determined what is to be tracked and the reasons for tracking it, the remaining decision is the type of tracking technology to use. Many factors influence this decision, including the operating principle of the tracker, the required performance, and the cost. In this section, we provide guidance in selecting commercial off-the-shelf (COTS) tracking systems. We first discuss the types of trackers, categorized by principle of operation, and give examples of COTS systems within each category. We then provide a list of tracker characteristics to consider when deciding upon a particular tracking system. Finally, we discuss choices that must be made when considering how one plans to interface with the desired tracking system. The tracking system taxonomy found in Welch and Foxlin (2002) is used to classify the systems according to different operating principles. In addition, the specific tracking systems highlighted are limited to systems available for retail purchase at the time of writing. Some of the more popular trackers that have been discontinued by their manufacturers, such as the Boom (Binocular Omni-Orientation Monitor) 3C from Fakespace Labs or the ADL-1 from Shooting Star Technology, are often available through online auction sites. For more detailed discussion regarding operating principles and tracking in general, the reader may refer to Welch and Foxlin (2002) and Foxlin (2002).

Mechanical

Mechanical tracking systems use physical links to determine the pose of a tracked object. Mechanical tracking systems have the advantages of high update rates (number of pose reports per second) and high accuracy. However, the object being tracked is tethered to the tracker, limiting the range of motion. Mechanical tracking technology is used for digitizers, as well as for tracking hand motion and for motion capture/pose determination. The MicroScribe G2LX from Immersion Corporation and the FaroArm from FARO Technologies are examples of digitizers, which are tracking systems used to create digital representations of real objects. The CyberGlove, also from Immersion, is a popular device for measuring hand movements. It uses changes in electrical resistance to indicate the amount of bending of the fingers. Another popular option is the X-IST DataGlove from noDNA that includes conductive bend sensors and piezoelectric pressure sensors. Two mechanical tracking systems used for motion capture are the Gypsy-6 from Animazoo (a mechanical system that uses inertial sensors) and the ShapeTape and ShapeWrapIII systems from Measurand that use fiber optic bend sensing.

Inertial

Inertial trackers use accelerometers and gyroscopes (with the earth’s gravitational field as a reference) to determine the pose of tracked objects. Inertial trackers can be very small, have low latency, and consume small amounts of power. Their drawback is that they suffer from drift (a gradual loss of measurement accuracy). The errors in accuracy accumulate because of the numerical integrations performed to convert accelerations and velocities into positions or angles. The tendency to drift is often mitigated by using inertial systems as part of hybrid tracking systems (see “Hybrid” in the second section).
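A small sketch of why the integration causes drift; the bias value is illustrative, not taken from any particular sensor.

# Why inertial trackers drift: double-integrating acceleration turns even a
# tiny constant sensor bias into position error that grows with time squared.
def drift_after(bias_m_per_s2: float, seconds: float) -> float:
    """Position error from a constant accelerometer bias: 0.5 * b * t**2."""
    return 0.5 * bias_m_per_s2 * seconds ** 2

for t in (1, 10, 60):
    # A 0.01 m/s^2 bias (about 1 milli-g) left uncorrected:
    print(f"after {t:3d} s: {drift_after(0.01, t):8.2f} m of position error")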
A popular inertial tracker is the InertiaCube series from InterSense. The InertiaCube3 combines prediction algorithms with accelerometers and gyros to provide 360° of rotational measurement. Other examples of inertial tracking systems include the 3D-Bird from Ascension (180° in elevation and 360° in azimuth and attitude) and the MTx from Xsens (360° in all directions).

Acoustic

Acoustic trackers use ultrasonic sound (near 40 kilohertz) to determine the pose of objects. The time-of-flight differences from multiple sources are measured, and position is determined based upon the characteristics of sound traveling in air. However, multipath reflection of the emitted sound can severely degrade tracking performance, as can occlusions between the emitter and receiver. Acoustic tracking tends to suffer in outdoor conditions, as well as near walls, where air currents and noise can cause interference. Acoustic tracking is also used as part of hybrid tracking systems. Resolutions tend to be several millimeters, and accuracies can be difficult to maintain unless conditions are carefully controlled. One example of an acoustic tracker is the Hexamite HX11. The HX11 tracks the location of pulsed, ultrasonic emitters with chains of ultrasonic receivers. In theory, the system has no limit to the tracking coverage area.

Magnetic

Magnetic tracking systems use electromagnetic field differences to determine position and orientation. They offer good update rates and rugged performance. The two categories of magnetic tracking systems are those that produce magnetic fields using alternating current (AC) and those that use direct current (DC). An AC tracker is generally more accurate than a DC tracker. However, AC trackers produce eddy currents in surrounding metals that in turn produce small magnetic fields that degrade performance. In addition, the performance of both types of magnetic systems is degraded by the presence of external magnetic fields and ferromagnetic materials. Two classic VE tracking systems are the Polhemus Fastrak and the Ascension Flock of Birds. The Fastrak uses AC magnetic fields to determine position and orientation, while the Flock of Birds uses a pulsed DC magnetic field. Recent demonstrations of the Polhemus PATRIOT wireless tracking system have shown considerable improvement regarding robustness to metallic interference. The PATRIOT is expandable and can track up to four objects simultaneously.
Optical

Optical tracking systems use the properties of light to determine the pose of tracked objects. Markers that emit light (active trackers) or reflect light (passive trackers) are used to determine pose. The arrangement of the light sources and sensors provides a subclassification for optical trackers. If the sensors are mounted on the object to be tracked, the approach is sometimes called “inside out.” If the sensors are fixed, and the markers are attached to the object to be tracked, the approach is sometimes called “outside in.” Optical trackers provide the highest accuracy of any tracking type, typically submillimeter, and they provide data at rates from 120 to over 1,000 frames per second, reducing motion-blur artifacts. They suffer from line-of-sight issues, meaning the sensors or emitters can be blocked. Optical trackers that use infrared light often have difficulties in bright lights or direct sunlight. An important feature of an optical tracker is its field of view, meaning the angle through which its sensors can detect targets. The HiBall tracking system from 3rdTech is an inside-out tracker capable, in theory, of unlimited tracking coverage. Another inside-out tracker is the LaserBIRD from Ascension, which has an infrared light source that sweeps through the tracked area. Outside-in optical trackers include the DynaSight from Origin Instruments and the Certus from Northern Digital Inc. Both of these systems use active infrared emitters to determine the pose of tracked objects. The DMAS (Digital Motion Analysis Suite) from Motion Imaging Corporation is another outside-in system, but it is a markerless tracking system, tracking objects through analysis of video frames. Many optical tracking systems are used for motion capture in addition to pose tracking. Many are passive trackers that detect reflections from special markers in the tracked area. Examples of these systems include the OptiTrack from NaturalPoint, the PPT from WorldViz, the MX from Vicon, the Impulse from PhaseSpace Inc., the Eagle from Motion Analysis Corporation, and the ProReflex from Qualisys.
Radio Frequency and Ultrawide Band

Radio frequency trackers use differences in radio signals to determine position in an environment at meter or submeter resolution. These differences include signal strength, signal content, and time of flight. Ultrawide band (UWB) communications use similar principles, but operate over a broader frequency range (for example, 2 GHz–7.5 GHz [gigahertz]). The Ubisense system uses a pulsed UWB signal to determine the 3-D location of tags within the tracked area. Up to 1,000 tags can be located simultaneously by the system, and simultaneous radio frequency (RF) communication is used to change update rates dynamically. In this area, there have also been recent announcements from Thales on an indoor/outdoor RF tracking system and from
AeroScout regarding a Wi-Fi based active radio-frequency-identification tracking system.

Hybrid

As the name suggests, hybrid trackers combine multiple types of operating principles to provide increased robustness of pose measurements. Two examples of hybrid systems from InterSense are the IS-900 and the IS-1200. The IS-900 combines acoustic and inertial measurements, and the IS-1200 combines optical and inertial measurements. Another hybrid system is the Hy-BIRD from Ascension. It combines inertial and optical tracking methods for use in cockpit applications.

Factors to Consider when Evaluating Tracking Systems

After considering the factors intrinsic to the technology used by the tracker, a tracking system should be evaluated based upon its performance characteristics. These characteristics include the following:

1. Degrees of Freedom: Three (orientation or position tracking only) or six (position and orientation tracking).
2. Accuracy: The absolute difference between the real position of the tracked object and the position reported by the tracker.
3. Resolution: The minimum change in the position of the tracked object that can be detected by the tracker.
4. Jitter: The instantaneous change in position from frame to frame, as reported by the tracker when the tracked object is stationary.
5. Drift: The gradual change in position (or bias) reported by the tracker over time.
6. Latency: The amount of delay between when the tracked object moves and when the data corresponding to the movement are transmitted.
7. Update Rate: The number of measurements that the tracker makes each second. This number may decrease as additional objects are tracked.
8. Range: In principle, the maximum distance over which a single object can be tracked. To increase the range, additional sensors may be required.
9. Maximum Tracked Objects: The maximum number of objects that can be tracked simultaneously.
10. Operating Principle: Does the application environment enable the choice of tracking technology? For example, a magnetic tracker should not be used near a generator.
11. Untethered Operation: Can the tracked objects move freely, or are they constrained by a wire, connector, and so forth?
12. Price.
Figure 1.1 presents a table comparing many of these factors for the preceding tracking systems. While this table is a starting point for investigations or discussions in choosing a tracking system, the reader is cautioned to
Figure 1.1. Comparison of Commercial Off-the-Shelf Tracking Systems
consider the issues presented in the “Fundamental Usage Issues” section of this chapter.
Interfacing with the Tracker

A final aspect to consider is the challenge of interfacing hardware and software for communication with the tracking system. Many tracking systems ship with DB-9 or DB-25 connectors for RS-232 serial data communications. The advantages of serial communication are fewer wires, longer cables, and a well-known standard. The main disadvantage is that all data must be converted into a serial format, transmitted, and then converted back from the serial format (which may differ between operating systems). If the tracker uses a serial connection, fewer interface issues will arise if it is connected to a computer with a serial port on the motherboard. If a computer with a serial port is unavailable (as is the case with most laptops), a serial port expansion card can be added or a USB-to-serial converter can be used (USB = universal serial bus). USB-to-serial converters are inexpensive and do not require the addition of internal electronics, but they may pose problems communicating with the tracker.

The environment where the tracker will be used must also be considered when choosing the software interface. A sample application is included with the tracking system to allow measurement out of the box, but integration with existing applications requires more consideration. Typically, tracking systems ship with a C/C++ application programming interface for customized software integration. In addition, many tracking products also have software interfaces available in virtual world-building applications and in preexisting device interface libraries. Vizard, from WorldViz LLC (http://www.worldviz.com/), is a commercially available world-building application that is easy to learn. VR Juggler (http://www.vrjuggler.org/) is an open source software platform for virtual environment application development. At a more basic level, the Virtual Reality Peripheral Network (http://www.cs.unc.edu/Research/vrpn/) is a public domain set of libraries for distributed tracking/interface device connections that can be easily incorporated in custom applications. All three software packages contain ready-to-use interfaces for many of the tracking systems mentioned in this section and the capability to customize interfaces to accommodate other types of trackers.
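For trackers that stream poses over a serial line, the connection itself is often only a few lines of code. The sketch below uses the pyserial library; the port name, baud rate, and ASCII record layout are all assumptions for illustration, since each device's real protocol is defined in its interface manual (and libraries such as VRPN hide this layer entirely).

```python
import serial  # pyserial

def read_poses(port="COM1", baud=115200):
    """Yield pose tuples from a serial tracker.

    Hypothetical example: the record format "x y z yaw pitch roll" is an
    assumption for illustration, not any vendor's actual protocol.
    """
    with serial.Serial(port, baudrate=baud, timeout=1.0) as link:
        while True:
            line = link.readline().decode("ascii", errors="replace").strip()
            if not line:
                continue  # read timed out with no data
            fields = line.split()
            if len(fields) == 6:
                x, y, z, yaw, pitch, roll = map(float, fields)
                yield (x, y, z, yaw, pitch, roll)
```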
FUNDAMENTAL USAGE ISSUES

For VE displays, one's overall goal is perfect, continuous registration and/or rigidity. For motion capture, one's overall goal is accuracy sufficient for training-related behavioral analysis. In either case, tracking is never perfect, but it can often be made "good enough" if one carefully addresses the fundamental sources of error as much as possible.
There are several sources of error in estimates from tracking and motion capture systems. Whether looking at head tracking or hand tracking, the basic principles are the same. There are, of course, many causes of visual error in interactive computer graphics systems, but many would argue that errors originating in the tracking system dominate all other sources. In his 1995 Ph.D. dissertation analyzing the sources of error in an augmented reality (AR) system for computer-aided surgery, Rich Holloway stated,

Clearly, the head tracker is the major cause of registration error in AR systems. The errors come as a result of errors in aligning the tracker origin with respect to the World CS [coordinate system] (which may be avoidable), measurement errors in both calibrated and multibranched trackers, and delay in propagating the information reported by the tracker through the system in a timely fashion. (Holloway, 1995, p. 135)
Holloway's dissertation offers a very thorough look at the sources of error in the entire VR pipeline, including the stages associated with tracking. It is a valuable resource for those interested in a rigorous mathematical analysis. Chapter 8 of the dissertation discusses some methods for combating the problems introduced by tracker error, in particular, delay.

For a person designing, calibrating, or using a tracking or motion capture system, it is useful to have some insight into where errors come from. As Michael Deering notes in his 1992 SIGGRAPH paper, "The visual effect of many of the errors is frustratingly similar" (Deering, 1992). This is especially true for tracking errors. We have seen people build VR applications with obvious head tracker transformation errors and yet have great difficulty figuring out which part of the long sequence of transforms was wrong, whether a static calibration error or a simple sign error. Yet even when all of the transforms are of the correct form, the units of translation and orientation match, and all the signs are correct, there are still unavoidable errors in motion tracking, errors that confound even the most experienced practitioners of interactive computer graphics.

No matter what the approach, the process of pose estimation can be thought of as a sequence of events and operations. The sequence begins with the user motion and typically ends with a pose estimate arriving at the host computer, ready to be consumed by the application. Clearly, by the time a pose estimate arrives at the host computer, it is already "late"—and you still have to render an image and wait for it to be displayed! "Motion Prediction" in the third section offers some hope for addressing the long delays and in some sense "catching up" with the user motion, but that does not mean we do not want to minimize the delay and to understand how the various errors affect the outcome.

The sources of error in tracking and motion capture can generally be divided into two primary classes: spatial and temporal errors. We refer to issues and errors that arise when estimating the pose of an immobile target as spatial issues. (Note that spatial issues include measurement noise, which is generally a function
of time, but statistically stationary.) We refer to issues and errors that arise when tracking a moving object as temporal issues. These issues include errors that arise from the inevitable sources of delay in the tracking pipeline (delay-induced error).
Spatial Issues

For an immobile sensor (static motion), we can further divide the measurement errors into two types: repeatable and nonrepeatable. Some trackers (for example, magnetic ones) have systematic, repeatable distortions of their measurement volume, which cause them to give erroneous data; we will call this effect static field distortion. The fact that these measurement errors are repeatable means that they can be measured and corrected as long as they remain unchanged between this calibration procedure and run time. See Livingston and State (1997) for an example of how this can be done.

One also needs to consider the nonrepeatable errors made by the tracker for an immobile sensor. Some amount of noise in the sensor inputs is inevitable with any measurement system, and this measurement noise typically leads to random noise or jitter in the pose estimates. By our definition, this type of error is not repeatable and therefore not correctable a priori via calibration. Moreover, jitter in the tracker's outputs limits the degree to which the tracker can be calibrated. The amount of jitter is often proportional to the distance between the sensor(s) and the source(s) and may become relatively large near the edge of the tracker's working volume.

While these effects are true for many source/sensor combinations, let us consider the effects related to image-forming digital cameras. There are two reasons we choose to look at cameras. For one, camera geometry should be relatively easy for most readers to understand. In addition, cameras are increasingly being used in tracking and motion capture systems, probably partly because of decreasing costs, increasing resolutions, and increasing image processing capabilities in computers in general (device and central processing unit bandwidth, and computation power).

Cameras effectively measure the number of photons arriving at each photo cell over the period that the shutter is open. Those photons might have originated from an active tracking source such as a light-emitting diode, or they might originate from an ambient light source and be reflected by a passive tracking target or marker. In either case, there are two related issues that are useful to consider: the size, or cross section, of the target in the camera (the resolution of the target) and the amount of light reaching the camera (the brightness and/or contrast).

Most readers will be familiar with the notion that as a target being imaged by a camera gets farther away, its image gets smaller in the camera. Specifically, as shown in Figure 1.2, as the distance d to a target increases, the angle θ that the object spans in the camera's field of view decreases proportionally. The effect is twofold. First, the smaller the angle θ, the fewer camera pixels cover the target. Correspondingly, as the distance at which one is attempting to image a target
Figure 1.2. Relationship between Distance and Size of a Target as Imaged in the Camera
increases, so too does the projected size of each pixel at that distance. In other words, as distance increases, one's ability to resolve fixed-size objects in the world decreases.

While the purely geometric relationship between distance and size (or resolution) is important, in the case of cameras (and similarly for magnetic devices) one also needs to consider the decrease in light that reaches the camera with increasing distance. This affects the brightness of any light emanating from or reflected by the target, as measured by the camera. As before, as the distance d to a target increases, the angle θ that the object spans in the camera's field of view decreases proportionally. However, as indicated in Figure 1.3, the brightness decreases at a rate proportional to the square of the distance. This quadratic reduction occurs because the photons propagate away from the light source (or reflective patch) in a particular direction, covering some solid angle. For the sake of illustration, consider an omnidirectional light source, where the light propagates equally in all directions in a spherical manner. In this case, because the area of the surface of the wave front increases proportionally to the square of the distance, the density of photons on the surface of the wave front (density per surface area) decreases proportionally to the square of the distance. For a fixed-size reflective patch on a tracking target, for example, this means that the number of photons hitting the target decreases proportionally to the square of the distance.

Beyond distance to the target, it is usually the case that the angle between the normal direction (for example, the surface normal) of the target and the camera and/or light source plays a role in the number of photons reaching the surface or camera. Figure 1.4 provides a simple illustration of this. In some cases, for example, the brightness is proportional to cos α. In that case, if the surface is seen "straight on," then there is no attenuation of the light because cos(0) = 1. At the other extreme, if the camera is seeing the surface from an extreme angle, for
Figure 1.3. Relationship between Distance and the Amount of Light (Brightness) Reaching a Target
example, 90°, then there would be extreme attenuation of the light because cos(90°) = 0.

Finally, one needs to consider that in a passive system, for example, vision based tracking systems with passive markers, the photons have to travel from the light source to the target as in Figure 1.3 and then from the target back to the camera (each photo cell) as in Figure 1.2. The effect is that the density of photons can, in some cases, decrease proportionally to the fourth power of the distance (the square of the square). In other words, the brightness is proportional to 1/d⁴.
Figure 1.4. Relationship between the Normal-Camera Angle and the Amount of Light (Brightness) Reaching a Target
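To put rough numbers on these falloff effects, the sketch below compares relative brightness for an active target (one-way path, 1/d²) and a passive reflective target (round trip, 1/d⁴), including the cos α attenuation term discussed above; the distances and angle are arbitrary illustration values.

```python
import math

def relative_brightness(d, alpha_deg=0.0, passive=False):
    """Relative received brightness versus distance d (arbitrary units).

    Active emitters fall off as 1/d^2; passive (reflective) targets make a
    round trip, so brightness falls off as 1/d^4. The cos(alpha) term models
    a surface viewed off its normal direction.
    """
    falloff = d ** (-4 if passive else -2)
    return falloff * math.cos(math.radians(alpha_deg))

for d in (1.0, 2.0, 4.0):
    print(f"d={d}: active={relative_brightness(d):.4f}, "
          f"passive={relative_brightness(d, passive=True):.6f}")
# Doubling the distance costs 4x brightness for an active target
# but 16x for a passive one.
```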
For image based systems, any reduction in brightness corresponds to a reduction in contrast (the ratio of brightest to darkest signal in the image), which corresponds to a reduction in the effective resolution at that distance. This effect can be illustrated or measured by the modulation transfer function. The shape of the modulation transfer function indicates the magnitudes of various spatial frequencies measured, compared to the spatial frequencies inherent in the scene. In general, the less light there is, the more difficult it is to resolve something. Thus the amount of light impacts the resolution of the system.

In some cases, one can precisely control the light; in others, one cannot. For example, some optical motion capture systems attempt to address modulation transfer function issues by using infrared lights that are located very near the cameras—oftentimes in a ring around each camera. If the timing of the lighting can be controlled precisely, one can "pump" a lot of photons into the scene just when the camera shutter is open, thus increasing the signal without unduly flooding the scene with infrared light. With optical and other systems, one might also use differential signaling to improve the contrast and hence the signal-to-noise ratio. The idea is to take one image with the lights on (bright) and one with the lights off (dim). When one subtracts the two, the reflective objects should dominate the result, while other bright sources should be eliminated (subtracted out). There are practical limits on how much one can achieve with this approach, and there are temporal concerns as one must capture two sequential images to do the differencing. (The target might well be moving during the capture time!)

Temporal Issues

Beyond the spatial concerns covered in the previous section, there are several temporal concerns related to tracking. The problems include the rate at which discrete measurements are made of a moving target (any medium), the duration of each low level sample of a device (for example, how long a camera shutter is open), and the delay or latency from the time the measurement is made to the time the effect is "seen" by the remainder of the system (graphics subsystem, display, and so forth).

Delay-Induced Error

Any measurement of a nonrepeating, time-varying phenomenon is valid (at best) at the instant the sample occurs, or over the brief interval it occurs, and then becomes "stale" with the passage of time until the next measurement. The age of the data is thus one factor in its accuracy. Any delay between the time the measurement is made and the time that measurement is manifested by the system in a pose estimate contributes to the age and therefore the inaccuracy of that measurement. The older the tracker data are, the more likely that the displayed image will be misaligned with the real world.

We feel that concerns related to dynamic error (including dynamic tracker error and delay-induced error from above) deserve distinct discussion. This class
of error is often less obvious when it occurs, and when one does recognize it, it is difficult to know where to look to minimize the effects. Further, it is impossible to reduce the delay to zero. One typically has to contend with overall system delays on the order of 10–100 milliseconds. See Meehan, Razzaque, Whitton, and Brooks (2003) for one example of the effects such delays can have.
First-Order Dynamic Error

Probably the most significant effect here is the overall dynamic error caused by continued user motion after a tracker cycle (sample, estimate, and produce) has started. If the user's head is rotating with an angular velocity of dθ/dt and translating with a linear velocity of dx/dt, then simple first-order models for the delay-induced orientation and translation error are given by

Δθ = (dθ/dt)Δt   (1)
Δx = (dx/dt)Δt   (2)

where Δt is the sum of the total motion delay Δtm for the tracking system as described below, as well as Δtg, the delay through the remainder of the graphics pipeline—including rendering and image generation, video synchronization delay, frame synchronization delay, and internal display delay. The video synchronization delay is the amount of time spent waiting for a frame buffer to swap—on average ½ the frame time. (Synchronization delay in general is described more later in the chapter.) The internal display delay is any delay added by the display device beyond the normal frame delay. For example, some liquid crystal display and digital light projector devices buffer images internally in a nonintuitive manner as they convert a variable input resolution to a fixed pattern of pixels on the screen, sometimes introducing several video frames of latency. The delay must be measured on a per-device basis if it is important.
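A short worked example of this first-order model, using invented but plausible numbers, shows how quickly delay-induced error grows:

```python
def delay_induced_error(angular_velocity_dps, linear_velocity_mps, total_delay_s):
    """First-order delay-induced error: velocity multiplied by total delay.

    total_delay_s is the sum of the tracker motion delay and the remaining
    graphics pipeline delay (rendering, synchronization, display).
    """
    orientation_error_deg = angular_velocity_dps * total_delay_s
    translation_error_m = linear_velocity_mps * total_delay_s
    return orientation_error_deg, translation_error_m

# Example: a moderate 100 deg/s head rotation and a 1 m/s walk with 50 ms
# of total delay yield 5 degrees and 5 cm of error -- easily noticeable.
print(delay_induced_error(100.0, 1.0, 0.050))  # (5.0, 0.05)
```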
Motion-Induced Measurement Noise

Clearly the placement of sources and sensors can affect the signal quality as described earlier. But there are often other internal (intrinsic) parameters that need to be specified. For example, for cameras one needs to specify the focus and aperture settings, gains, frame rates, and shutter/exposure times. In particular, here we want to point out the potential for motion-induced noise during a camera exposure, a magnetic current measurement, an acoustic phase measurement, and so forth. In a nutshell, just as target motion is an issue across multiple measurements, it is often an issue even within a single measurement.

Without loss of generality, let us assume a regular camera update rate of 1/dt. Each cycle can be divided into sampling (exposure) time τs, processing time τp, and idle time τi. The three times sum to the overall update period, that is,
dt = τs + τp + τi.

Because cameras integrate light over the nonzero shutter time τs, estimating camera motion or dynamic scene structure using feature or color matching always involves a trade-off between maximizing the signal and minimizing any motion-induced noise. If the shutter time is too short, the dynamic range or contrast in the image will be too low, reducing the effective resolution, increasing the measurement uncertainty, and negatively impacting the final motion or structure estimates. Conversely, if the shutter time is too long, the measurements will be corrupted by scene or camera motion (blur), again reducing the effective resolution, increasing the measurement uncertainty, and negatively impacting the final estimates. See, for example, Figure 1.5, which illustrates the amount of motion in the image planes of various cameras under a changing scene. This issue is discussed more in Welch, Allen, Ilie, and Bishop (2007).

Sensor Sample Rate

Per Shannon's sampling theorem (Jacobs, 1993), the measurement or sampling rate rss should be at least twice the true target motion bandwidth, or an estimator may track an alias of the true motion. Given that common arm and head motion bandwidth specifications range from 2 to 20 Hz (hertz) (Fischer, Daniel, & Siva, 1990; Foxlin, 1993; Neilson, 1972), the sampling rate should ideally be greater than 40 Hz. Furthermore, the estimation rate re should be as high as possible so that slight (expected and acceptable) estimation error can be discriminated
Figure 1.5. An Example of Motion-Induced Measurement Error in a Camera while Imaging a Dynamic Environment (Moving Target or Camera)
from the unusual error that might be observed during times of significant target dynamics.

Synchronization Delay

While other latencies certainly do exist in the typical VE system (Mine, 1993; Durlach & Mavor, 1994; Wloka, 1995), tracker latency is unique in that it determines how much time elapses before the first possible opportunity to respond to user motion. When the user moves, we want to know as soon as possible. Within the tracking system pipeline of events (and throughout the rendering pipeline) there are both fixed latencies associated with well-defined tasks, such as executing functions to compute the pose, and variable latencies associated with the synchronization between well-defined asynchronous tasks. The latter is often called synchronization delay, although sometimes also phase delay or rendezvous delay. See, for example, Figure 1.6.

In the example of Figure 1.6, measurements and pose estimates occur at regular but different rates. Inevitably, any measurement will sit for some time before being used to compute a pose estimate. At best, the measurement will be read immediately after it is made. At worst, the measurement will be read just before it is replaced with a newer measurement. On average, the delay will be ½ the measurement period.

Figure 1.7 presents a more involved example: a sequence of tracker-related events and the corresponding delays. Consider an instantaneous, step-like user motion as depicted in Figure 1.7. The sequence of events begins at tm, the instant the user begins to move. In this example the sensors are sampled at a regular rate rss = 1/τss, such as would typically be the case with video or a high speed analog-to-digital conversion. On average, there will be Δτss = τss/2 seconds of sample synchronization delay before any sample is used for pose estimation. Because the pose estimate computations are repeated asynchronously at the regular rate of re = 1/τe, there will be an average of Δτe = τe/2 seconds of estimation synchronization delay, after which time the estimation will take τe seconds. Assuming a client-server architecture, such as Taylor (2006), the final estimate will be written
Figure 1.6. When a measurement is taken at one time but not used to estimate the pose until a later time, the intervening interval is called synchronization delay.
Figure 1.7. An Example Sequence of Total Tracker-Related Events and Delays
to a server communications buffer, where it is read at a rate of rsrb = 1/τsrb and will therefore wait an average of Δτsrb = τsrb/2 seconds before being read and transmitted over the network to the client. The network transmission itself will take τnet, and the final client read-buffer synchronization delay will take Δτcrb = τcrb/2 seconds, where τcrb = 1/rcrb (the client read-buffer rate). The total (average) motion delay in this example is then

Δtm = 1/(2rss) + 1/(2re) + τe + 1/(2rsrb) + τnet + 1/(2rcrb)   (3)

where rss is the sensor sample rate, re is the estimate rate, τe = 1/re, rsrb is the server read-buffer rate, τnet is the network transmission time, and rcrb is the client read-buffer rate. Note that this bound does not include any latency inherently added by pose estimate computations that also implement some form of filtering.

Total Tracker Error

Summing the static measurement error and the dynamic error, we get a total error of

θerror = θstatic + (dθ/dt)Δt
xerror = xstatic + (dx/dt)Δt
where θstatic and xstatic are the static (spatial) measurement errors, and Δt = Δtm + Δtg, with Δtm from Equation (3) and Δtg the remainder of the graphics pipeline delay as described in "First-Order Dynamic Error" in the third section. Clearly, the final rotation and translation error is sensitive to both the user motion velocity and the total delay of the tracker and graphics pipeline.

Motion Prediction

When trackers are used to implement VE or AR systems, end-to-end delays of the total system will result in a perceived "swimming" of the virtual world whenever the user's head moves. The delay causes the virtual objects to appear to follow the user's head motion with a velocity-dependent error. The sequence of events in a head-mounted display system goes something like that shown in Figure 1.8. The interval from t0 to t5 is on the order of 30 milliseconds in the fastest systems and upward of 200 milliseconds in the slowest. If the user is moving during this interval, the image finally displayed at t5 will not be appropriate for the user's new position. We are displaying images appropriate for where the user was rather than for where he or she is.
Figure 1.8. Time Series of Events in a Head-Mounted Display System
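To make such an end-to-end budget concrete, the following sketch sums a set of invented, mid-range stage delays of the kind enumerated above; none of the millisecond values describe a real system.

```python
# Hypothetical end-to-end delay budget for the t0-to-t5 pipeline.
# Stage names follow the discussion above; all values are invented
# mid-range illustrations, not measurements of any actual system.
stages_ms = {
    "tracker measurement + estimation": 12.0,
    "host transmission + read buffer": 4.0,
    "application + rendering": 16.0,
    "video synchronization (avg 1/2 frame @ 60 Hz)": 8.3,
    "display internal delay": 10.0,
}

total_ms = sum(stages_ms.values())
print(f"end-to-end delay: {total_ms:.1f} ms")  # ~50 ms, within the 30-200 ms range

# At 100 deg/s head rotation, this budget alone misplaces imagery by:
print(f"angular error: {100.0 * total_ms / 1000.0:.1f} deg")
```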
The most important step in combating this swimming is to reduce the end-to-end delay. This process can be taken only so far, though. Each of the steps takes some time, and this time is not likely to be reduced to a negligible amount simply by accelerating the hardware. After the avoidable delays have been eliminated, one can attempt to mitigate the effect of the unavoidable delays by using motion prediction. The goal is to extrapolate the user's past motion to predict where he or she will be looking at the time the new image is ready. As Azuma and Bishop (1995) point out, this is akin to driving a car by looking only at the rearview mirror. To keep the car on the road, the driver must predict where the road will go based solely on the view of the past and knowledge of roads in general. The difficulty of this task depends on how fast the car is going and on the shape of the road. If the road is straight and remains so, then the task is easy. If the road twists and turns unpredictably, the task will be impossible.

Motion predictors attempt to extract information from past measurements to predict future measurements. Most methods, at their core, attempt to estimate the local derivatives so that a Taylor series can be evaluated to estimate the future value. Several available commercial systems offer or support motion prediction. The differences among methods are mostly in the type and amount of smoothing applied to the data in estimating the derivatives. The simplest approach extends a line through the previous two measurements to the time of the prediction. This approach will be very sensitive to noise in the measurements. More sophisticated approaches take weighted combinations of several previous measurements, which reduces sensitivity to noise but incurs a delay in responding to rapid changes. All methods based solely on past measurements of position and orientation face a trade-off between noise and responsiveness.

Performance of the predictor can be improved considerably if direct measurements of the derivatives of motion are available from inertial sensors. As described earlier, linear accelerometers and rate gyros provide estimates of the derivatives of motion with high bandwidth and good accuracy. Direct measurements are superior to differentiating the position and orientation estimates because they are less noisy and are not delayed. Azuma and Bishop demonstrated prediction using inertial sensors that reduced swimming in an augmented reality system by a factor of 5 to 10 with end-to-end delay of 80 ms (Azuma & Bishop, 1994). Further, Azuma and Bishop (1995) show that the error in predictions based on derivatives and simple models of motion is related to the square of the product of the prediction interval and the bandwidth of the motion sequence. Doubling the prediction interval for the same sort of input will quadruple the error.
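The following sketch implements the simplest predictor described above, a line extended through the two most recent measurements; as the text notes, a practical system would prefer smoothed derivatives or direct inertial measurements.

```python
def predict_linear(t_prev, x_prev, t_curr, x_curr, t_future):
    """Extrapolate a line through the two most recent measurements.

    This is the simplest motion predictor mentioned in the text; it is
    very sensitive to measurement noise because it differentiates only
    two samples.
    """
    velocity = (x_curr - x_prev) / (t_curr - t_prev)  # estimated derivative
    return x_curr + velocity * (t_future - t_curr)

# Head yaw measured at 100 Hz; predict 50 ms ahead to cover pipeline delay.
yaw = predict_linear(0.00, 10.0, 0.01, 11.0, 0.06)
print(yaw)  # 16.0 degrees: the 100 deg/s trend extrapolated 50 ms forward
```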
LOOKING AHEAD

Given the rapid pace of technological advances and active work in these fields, we expect that by the time this book is in print, some of the information will be
outdated. This aging process will continue, as is the case with almost any technology-related book. This is one reason we have attempted to wrap discussion of today's technologies in the context of the fundamental circumstances and issues that are likely to continue to be relevant for the foreseeable future.

Any attempt to look ahead into the future faces even more difficult challenges. And yet we want to attempt to share with the reader what appear to be some emerging trends and potential opportunities. Our hope is that the previous material and this brief speculation will combine to help make the reader a better consumer of the available technologies, and perhaps a better tracking systems engineer when needed.

We would claim that many of the fundamental challenges related to head and hand tracking indoors for one or two users have been addressed to a point where very interesting VR work is being done without major issues related to tracking. It is arguably not the dominant problem it once was for circumstances involving only a few people. The dominant research challenges are largely related to the competing desires for increased performance and reduced infrastructure. These challenges continue to be tackled in corporate and university labs. However, there remains the significant challenge of real time, online head, hand, and full-body tracking for teams of individuals, as might arise in team training applications. The major issue is that of the sociability of the current approaches, as defined in Meyer et al. (1992). As far as we know, all current commercial and research systems will begin to run into problems with more than a few colocated collaborating trainees. The issues include source or sensor bandwidth (cannot flash or image fast enough), processing speeds, signal synchronization, and signal interference between nearby users. In fact, we think there exists an interesting conflict between team training desires and the sociability shortcomings of today's centralized tracking systems: as trainees get closer to each other, the tracking information will likely become more critical, and yet that is precisely when the likelihood of interference from each other increases. It seems that to accommodate teams of colocated collaborating trainees, researchers might have to rethink the entire single-user centralized approach.

Perhaps the most exciting (to us) area of ongoing research related to tracking for training in virtual environments is tracking outdoors, as would be needed for military training exercises, for example. The excitement in this area comes in two forms. First is the growing crossover between the computer graphics and computer vision communities. Some examples include work by Tobias Höllerer et al. at the University of California at Santa Barbara, Ulrich Neumann et al. at the University of Southern California, and Didier Stricker et al. at Fraunhofer-Gesellschaft. With simultaneous ongoing advances in computer vision algorithms and cameras, this synergy promises to provide some exciting capabilities in the not-so-distant future. On the topic of cameras, the second form of excitement related to tracking outdoors is in the continually improving technologies that can be used outdoors. This includes rapid improvements to cameras, shrinking and more stable inertial sensors, and improved GPS, including differential signaling and pseudolites.
Happily, these improvements continue to be spurred on by other commercial demands.

On a final note, we think that perhaps acoustic/audio sensors are currently undervalued and might find new favor in addressing both team-training needs and outdoor tracking. (The InterSense acoustic hybrids are one shining counterexample.) With respect to team training, small acoustic devices could provide a complementary absolute reference for inertial sensors in a body-relative tracking scheme. The work by Vlasic et al. (2007) is a good example of this. With respect to tracking outdoors, we find it interesting that blind people use internalized models of environmental noise as an absolute reference for estimating their location. This includes sounds from traffic indicating the road and sounds from air conditioning units indicating a building. Just as researchers have realized that the human combination of vision and inertial (vestibular) sensing is valuable, we might also recognize the added value of environmental sounds as yet another source of absolute geospatial references.

REFERENCES

Allen, B. D., Bishop, G., & Welch, G. (2001). Tracking: Beyond 15 minutes of thought [SIGGRAPH 2001 Course 11]. In Computer Graphics, Annual Conference on Computer Graphics & Interactive Techniques (SIGGRAPH 2001 course pack ed.). Los Angeles: ACM Press.
Atkeson, C. G., & Hollerbach, J. M. (1985). Kinematic features of unrestrained vertical arm movements. Journal of Neuroscience, 5(9), 2318–2330.
Azuma, R. T. (1993). Tracking requirements for augmented reality. Communications of the ACM, 36(7), 50–51.
Azuma, R. T., & Bishop, G. (1994). Improving static and dynamic registration in an optical see-through HMD. In Computer Graphics, Annual Conference on Computer Graphics & Interactive Techniques (pp. 197–204). Los Angeles: ACM Press.
Azuma, R. T., & Bishop, G. (1995). A frequency-domain analysis of head-motion prediction. In Computer Graphics, Annual Conference on Computer Graphics & Interactive Techniques (pp. 401–408). Los Angeles: ACM Press.
Bhatnagar, D. K. (1993). Position trackers for head mounted display systems: A survey (Tech. Rep. No. TR93-010). Chapel Hill: University of North Carolina at Chapel Hill.
Deering, M. (1992). High resolution virtual reality. In SIGGRAPH '92: Proceedings of the 19th Annual Conference on Computer Graphics and Interactive Techniques (pp. 195–202). New York: ACM.
Durlach, N. I., & Mavor, A. S. (Eds.). (1994). National Research Council report on virtual reality: Scientific and technological challenges. Washington, DC: National Academy Press.
Ferrin, F. J. (1991). Survey of helmet tracking technologies. Proceedings of SPIE, 1456, 86–94.
Fischer, P., Daniel, R., & Siva, K. (1990). Specification and design of input devices for teleoperation. Proceedings of the IEEE Conference on Robotics and Automation (pp. 540–545). Cincinnati, OH: IEEE Computer Society Press.
Foxlin, E. (1993). Inertial head-tracking. Unpublished master's thesis, Massachusetts Institute of Technology, Cambridge.
Foxlin, E. (2002). Motion tracking requirements and technologies. In K. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 163–210). Mahwah, NJ: Lawrence Erlbaum.
Gabbard, J. L., & Hix, D. (1997). A taxonomy of usability characteristics in virtual environments (Office of Naval Research Tech. Rep., Grant No. N00014-96-1-0385). Blacksburg: Virginia Polytechnic Institute and State University. Retrieved April 22, 2008, from http://people.cs.vt.edu/~jgabbard/publications/index.html
Holloway, R. L. (1995). Registration errors in augmented reality systems. Unpublished doctoral dissertation, University of North Carolina at Chapel Hill.
Jacobs, O. (1993). Introduction to control theory (2nd ed.). Oxford, England: Oxford University Press.
Livingston, M. A., & State, A. (1997). Magnetic tracker calibration for improved augmented reality registration. Presence: Teleoperators and Virtual Environments, 6(5), 532–546.
Meehan, M., Razzaque, S., Whitton, M. C., & Brooks, F. P. (2003). Effect of latency on presence in stressful virtual environments. Proceedings of the IEEE Virtual Reality (p. 141). Washington, DC: IEEE Computer Society.
Meyer, K., Applewhite, H., & Biocca, F. (1992). A survey of position trackers. Presence: Teleoperators and Virtual Environments, 1(2), 173–200.
Mine, M. R. (1993). Characterization of end-to-end delays in head-mounted display systems (Tech. Rep. No. TR93-001). Chapel Hill: University of North Carolina at Chapel Hill.
Neilson, P. (1972). Speed of response or bandwidth of voluntary system controlling elbow position in intact man. Medical and Biological Engineering, 10(4), 450–459.
Stanney, K. M. (Ed.). (2002). Handbook of virtual environments: Design, implementation, and applications. Mahwah, NJ: Lawrence Erlbaum.
Taylor, R. (2006). Virtual reality peripheral network. Retrieved April 22, 2008, from http://www.cs.unc.edu/Research/vrpn
Vlasic, D., Adelsberger, R., Vannucci, G., Barnwell, J., Gross, M., Matusik, W., & Popović, J. (2007). Practical motion capture in everyday surroundings. In SIGGRAPH '07: ACM SIGGRAPH 2007 (p. 35). New York: ACM.
Welch, G. (1995). A survey of power management techniques in mobile computing operating systems. ACM Operating Systems Review (SIGOPS-OSR), 29(4), 47–56.
Welch, G. (1996). SCAAT: Incremental tracking with incomplete information. Unpublished doctoral dissertation, University of North Carolina at Chapel Hill.
Welch, G., Allen, B. D., Ilie, A., & Bishop, G. (2007). Measurement sample time optimization for human motion tracking/capture systems. In G. Zachmann (Ed.), Proceedings of the IEEE VR 2007 Workshop on Trends and Issues in Tracking for Virtual Environments. Aachen, Germany: Shaker Verlag.
Welch, G., & Foxlin, E. (2002). Motion tracking: No silver bullet, but a respectable arsenal. IEEE Computer Graphics and Applications, 22(6), 24–38.
Wloka, M. M. (1995). Lag in multiprocessor virtual reality. Presence: Teleoperators and Virtual Environments, 4(1), 50–63.
Chapter 2
VISUAL DISPLAYS: HEAD-MOUNTED DISPLAYS

Mark Bolas and Ian McDowall

Head-mounted displays (HMDs) came from the future—devices to envelop our eyes and ears and cloister us from the real world while immersing us in computer-generated fantasies limited only by our ability to algorithmically create them. Looking back 40 years to Ivan Sutherland's (1963) "Sketchpad," it is easy to see that HMDs quickly progressed from science fiction to delivering grounded results. Systems are now used to visualize the placement of instruments in automobile interiors, to decide where to sink the next oil well, and to train personnel in virtual scenarios too dangerous for actual practice.

It is sometimes difficult, however, to separate the promise of these devices from the reality of their performance—especially when considering training applications that must accurately represent specific and well-defined environments. What makes this particularly vexing is that, while technical specifications can easily be compiled, it is difficult to understand the usefulness of these specifications with respect to a training system's effectiveness. Some specifications just do not matter, while others create artifacts that, while hard to predict, can make or break a system.

This chapter attempts to increase the decision maker's understanding by interpreting the user's experience of HMD technologies. It gives an overview of the physical, cognitive, and perceptual ramifications of common HMD choices and describes current design and technology examples. Readers interested in more detailed information are encouraged to look at Head Mounted Displays: Designing for the User by James E. Melzer and Kirk Moffitt (1997).

WHY CHOOSE AN HMD?

When creating display systems for immersive training applications, it is useful to think of an HMD as mapping pixels from its microdisplays out into a hypothetical three-dimensional virtual environment. For example, if the goal is to train a user how to fix an engine, then the pixels would best be placed in a manner
representing an engine—a few feet away from a user and concentrated in a small area. If, however, the goal is to familiarize a user with a dense urban area, then the pixels would need to panoramically span a virtual area the size of a few city blocks. For a screen based display, the engine application might use a single large stereoscopic screen placed in front of the user, while the city would require a large multiscreen panoramic configuration. This model of mapping pixels to a virtual environment's objects can be used to consider characteristics unique to HMDs when evaluating display systems for specific training tasks:

Flexible—As the above two examples highlight, the physical topology and required distribution of mapped pixels in a virtual space can vary greatly between different applications. HMDs can easily accommodate a range of training scenarios because they are available with a wide range of resolutions and fields of view. These choices allow the designer to tailor the display for the application at hand and to modify such choices to follow the demands of evolving system requirements.

Efficient—Because HMDs use head tracking to display imagery from the user's current point of view, they make the most out of every pixel they are fed. In contrast, a screen based system must render and display pixels everywhere, even if the user is not looking in that direction. Since the displays travel with the user's head, HMDs carry pixels to where they are needed, effectively multiplying resolution. For example, a single 1,280 × 1,024 HMD with a 60 degree field of view will fill a 360 degree virtual sphere six times over, thus effectively providing 7,680 × 6,144 accessible pixels over the sphere (a short worked example follows this list). Not only does this make efficient use of the displays, but it allows a single rendered viewpoint to provide imagery that normally would require six viewpoints.

Deployable—By decreasing the rendering requirements, HMDs can often be driven by a single laptop computer. This means an HMD, rendering computer, and head tracking system can fit in a single briefcase. In addition to simplifying maintenance and spares, many HMDs are designed to operate with no need for alignment and calibration—they can be easily carried to a location, turned on, and made ready to go.

Potent—By occluding the user from the real world and substituting a virtual world, HMDs exhibit a type of "perceptual potency" that is hard to duplicate with any other technology. In many virtual environment configurations, users can see their own bodies, physical details of the display system, and portions of the surrounding environment. While useful for some applications, such real world cues can detract from fully transporting a user to a virtual world. HMDs are often configured to completely cloister a user—to algorithmically control everything the user sees and hears. This potency comes with a corresponding demand—the complete system must accurately portray the virtual environment and keep pace with a user's motion and expectations. For example, poor tracking and slow update rates cannot be tolerated as they degrade cues important for maintaining a sense of balance.
Such potency can be useful in scenarios developed to elicit strong responses. A classic example is found in experiments incorporating a virtual "pit room" that appears to the user as a ledge above a 20 foot virtual drop to a room below. Most users experience a strong sense of physical danger when observing the virtual pit from their apparently precarious standpoint on the ledge (Meehan, Insko, Whitton, & Brooks, 2002).

Accurate—User-specific imagery needs to be generated for every head position in a virtual environment because each user must see the scene rendered exactly from his or her perspective, or the synthetic images will not match the user's movements. Such a mismatch can result in inaccurate and possibly misleading imagery. Screen based displays can easily accommodate a single head-tracked user, but multiple users pose problems because participants in these systems are all looking at the same physical screens (Agrawala et al., 1997). HMDs have the luxury of providing each user with a personal display, each tracked to account for an individual's head position and orientation. As such, HMDs allow all participants in a training environment to be surrounded by accurate, perspective-correct imagery.

Observable—Because HMDs are tracked, they can easily be used to observe where the user is facing in an environment. For example, DaimlerChrysler Motors Company LLC employed a mechanically tracked HMD to enable ergonomic studies of proposed designs for automotive interiors. By being able to observe the view as seen by different-sized drivers, engineers were able to determine optimal sight lines and component placement (Brooks, 1999). This is particularly interesting in training applications that require a correlation between trainees' actions, their orientation, and what they actually see.

Available—The rising demand for consumer-grade digital entertainment technologies has led to the development of components that HMD based training systems have used to move beyond fiction into useful tools. Graphics technologies for video games have led to easier and lower cost modeling and rendering of synthetic worlds. The digital production of movies has led to the development of high performance motion-capture and tracking systems. The home-theater display market has created high resolution microdisplays that can be repurposed for HMDs. As such, tracking, rendering, and displays have reached a critical price-performance ratio that now enables HMDs to be cost-effectively applied to a number of new applications (Brooks, 1999).
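To make the pixel arithmetic in the "Efficient" item above concrete, the sketch below reproduces the chapter's figures for a 1,280 × 1,024, 60 degree display and also reports the resulting angular resolution.

```python
def hmd_pixel_math(h_pixels, v_pixels, fov_deg):
    """Rough pixel-efficiency figures for a head-tracked HMD.

    A tracked HMD carries its pixels wherever the user looks, so a
    60-degree display can cover a 360-degree circle 6 times over.
    Angular resolution is arc minutes of visual angle per pixel.
    """
    tiles = 360.0 / fov_deg
    arcmin_per_pixel = fov_deg * 60.0 / h_pixels
    effective = (int(h_pixels * tiles), int(v_pixels * tiles))
    return tiles, arcmin_per_pixel, effective

# The display quoted in the text:
print(hmd_pixel_math(1280, 1024, 60.0))
# (6.0, 2.8125, (7680, 6144)) -- matching the chapter's figures
```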
HMD DESIGN CHARACTERISTICS

When matching HMDs to specific training tasks, it is instructive to recognize that HMDs are intimate interface devices—almost like pieces of clothing—and that there is a wide variety of design choices affecting function and comfort that can best be judged by simply trying the HMD on and looking around. Just as soldiers should "train as they fight," HMDs should be evaluated "as they will train." If this firsthand experimentation is done in a perceptually engaged and critical manner, decision makers can avoid prejudgments. It is easy to allow data sheets
and specifications to cloud one's judgment and accept defects that are the result of poor design trade-offs or a limited range of adjustment. It cannot be overstated that while there are some stunningly good HMDs available, there are also stunningly bad ones. Toward this end, it is useful to consider the interplay between design choices and their optical, physical, and cognitive effects.

Overview of Visual Issues and Terms

Functionally, HMDs are similar to looking at a small display with a magnifying glass and then holding the display and magnifier up to the eye. The magnifier enables the user to focus on the display as it is brought closer to the eye. The closer to the eye it gets, the larger the image appears because it replaces more of the real world with the image from the display. As such, the optical goal of an HMD is to make a small display appear large (optical magnification) and to subtend a large portion of a user's view (field of view). This is typically accomplished by employing optics similar to either a magnifying glass (simple magnifier) or a microscope (compound optics).

A quick feeling for the visual issues associated with HMDs can be had by considering a pair of binoculars. Binoculars must be held a certain distance away from the user's eyes (eye relief) and adjusted to align with the distance between the eyes (interpupillary distance or IPD). This alignment places the user's pupils within the small region in front of the lens, which provides a clear view of the magnified image (eye box or exit pupil). The lenses are then focused to place the magnified image at a virtual distance in front of the user (focal plane) that both eyes can focus upon (accommodate) and allow the eyes to triangulate (converge) on objects to form a stereoscopic view. Some binoculars, especially lower cost ones, will exhibit visual artifacts or optical aberrations. In HMDs, these artifacts include a rainbow effect (chromatic aberration), many types of blur (spherical aberration, coma, astigmatism, and field curvature), and warped appearance (geometric distortion). Readers interested in a classic text on optics design are encouraged to look at Warren J. Smith's (2000) Modern Optical Engineering. An excellent overview of HMD designs is presented in Head-Worn Displays: A Review by Cakmakci and Rolland (2006).

Exit Pupil (Eye Box)

Compound optics form a relatively small region called the exit pupil, which can be thought of as a small hole located slightly in front of the eyepiece. When the eye's pupil is aligned with the exit pupil, a clear image is seen. If the eye's pupil moves out of this region, light from the display becomes occluded: a portion of the image goes dark and often exhibits a characteristic kidney bean shape. The size of the exit pupil is constrained by the physics of the optical system. Small exit pupils generally allow for more aggressive optical designs, but are undesirable as they require careful alignment of the HMD with respect to a user's eyes.
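Returning to the magnifying-glass model above, a thin-lens approximation gives a quick feel for how display width and lens focal length trade against apparent field of view. The sketch assumes an idealized simple magnifier with the display at the focal plane (image at infinity); the panel width and focal length are invented illustration values, not any product's specifications.

```python
import math

def magnifier_fov_deg(display_width_mm, focal_length_mm):
    """Apparent horizontal FOV of an idealized simple-magnifier eyepiece.

    Assumes the microdisplay sits at the lens's focal plane so the virtual
    image appears at infinity; each display edge then subtends
    atan((w/2)/f) from the optical axis. Real HMD optics add eye relief
    and aberration considerations that this thin-lens sketch ignores.
    """
    half_angle = math.atan((display_width_mm / 2.0) / focal_length_mm)
    return 2.0 * math.degrees(half_angle)

# Hypothetical 40 mm wide panel behind a 35 mm focal-length lens:
print(f"{magnifier_fov_deg(40.0, 35.0):.1f} deg")  # ~59.5 deg
```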
Alternatively, simple magnifying optics are classified as non-pupil-forming and deliver a comparatively large region where the eye will see a sharp magnified image from the display—similar to looking through a magnifying glass. Although simple magnifier designs have an ideal position for the user's eye (it lies along the optical axis of the eyepiece), less ideal positions result in a slight blurring of the image that does not go dark and tends to degrade more gracefully than in pupil-forming systems. Simple magnifiers, however, often require the use of larger optical components and displays.

On some HMDs with small exit pupils, the user will see a good image while looking forward, but a glance to the side will make the image go dark. The eye's pupil is located toward the surface of the eyeball, so a rotation of the eye causes the pupil to translate away from the exit pupil of the optics because the eye's center of rotation is located behind the pupil. It is important to pay attention to this on wide field of view (FOV) HMDs that claim a field of view that is mathematically correct but impossible for some users to achieve when they actually look toward the edges of a scene.

Interpupillary Distance Adjustment

Narrow exit pupils often require the left and right display optics to be closely aligned with the user's left and right eyes. Such an adjustment feature is often desired for compound optical systems. Typically, the IPD range for HMDs is specified as the total IPD, and HMDs may either adjust each display or the total IPD. IPDs vary across the population and generally range from 53 millimeters (mm) to 73 mm, with an average of 63 mm (Kalawsky, 1993). Should IPD adjustment be possible, care must be taken to reset it for each participant; otherwise the situation can be made worse due to the wide range of IPDs. A test pattern displayed on the HMD can be used to set the IPD; however, some users find it confusing, so it may be advantageous to numerically set the IPD before the user puts on the HMD. In this case, an interpupillometer (a common piece of ophthalmic equipment) may be used to accurately measure the user's IPD without requiring the user's judgment. Most software used to render virtual imagery assumes an average IPD of 65 mm; ideally, however, it should incorporate specific users' IPDs as well. Simple magnifier systems can be designed to enable a wide range of IPDs with minimal image degradation, and some can be used without the need for IPD adjustments.

Accommodation and Convergence

When the eyes fixate on an object, a number of physiological actions occur. The two primary actions are the physical focusing of the eyes' lenses to accommodate the object and the action of differentially rotating each eye to converge on the object. The rendering of the virtual environment considers the slight viewpoint differences between a user's two eyes and draws near-field imagery with an offset between the left and right eyes. When viewing this pair of images, the
user's eyes must rotate by different amounts based on how close each virtual object is—an object at infinity requires no convergence, while an extremely close object requires "crossed eyes." The stereoscopic nature of an HMD is derived from this effect.

HMDs magnify a microdisplay, thus fixing the focal depth of the pixels. As such, the distance at which the user's eyes accommodate when looking at a virtual object cannot be adjusted by the computer to correspond to its virtual distance. This creates a mismatch between the convergence cues that are rendered correctly and the accommodation (focus) cues that are fixed by the HMD optics. This is a current limitation of commercially available HMDs and an area of active research (Akeley, Watt, Girshick, & Banks, 2004; Rolland, Krueger, & Goon, 2000). Screen based displays share this characteristic—the user accommodates on the surface of the screen while trying to converge at the distance of the virtual objects. Some HMDs may be focused at different fixed depths and thereby be optimized for near-field or far-field training tasks.

Field of View

A user's natural FOV is constrained by the shape of the skull and the eye socket, with the nose blocking the central portion. This can be seen by closing one eye and looking around the periphery. For most people, the FOV of each eye is 120° vertical and 150° horizontal. The combined field from both eyes is 200° with a 100° binocular overlap region that provides stereoscopic cues (Velger, 1998). These metrics vary significantly based on face geometry, age, and eye characteristics.

For a given lens diameter, the closer the lens is to the eye, the larger the potential field of view. This can be observed by moving one's palm nearer to and away from the face. Assuming one could focus on the palm, it is clear that it needs to be touching the face to come close to subtending the full field of view of one's eye. Very wide FOV HMDs are constrained practically by the diameter of the optics and how close a user's eyes can be to the eyepiece.

The left and right eye images of many narrow FOV displays are presented to exactly the same region of a user's FOV. This arrangement is said to be 100 percent overlapped, and, except for differences due to stereo parallax, the images for each eye appear to be superimposed on each other. One approach to achieving a larger total field of view is to not fully overlap the images. This provides a central region with stereoscopic imagery and peripheral regions without stereoscopic imagery. Wide FOV designs tend toward this arrangement, which is appropriate given its match with the human visual system and facial geometry.

Wide Fields of View

Narrow FOVs appear to force unnatural head and body movements while also limiting the natural motion of the eye. This need to move the head and body to explore a scene can be demonstrated by curling the fingers and touching the index
and ring fingers to the thumb on both hands to create two cylinders. Holding the hands up to the face like a pair of virtual binoculars creates a resulting FOV approximating 45° per eye. Tasks such as walking are possible with narrow displays; however, performance is greatly improved with a wider field of view display (Arthur, 2000). Melzer and Moffitt (1997) present a summary of papers that generally indicates that wider FOVs result in better performance for tasks requiring ego orientation, locomotion, and reaching, including orientation and navigation within an environment. A wide FOV also appears to be instrumental in establishing situational awareness. Melzer and Moffitt found that it helps "the user to establish visual position constancy and to understand events that occur over a panoramic visual field" (p. 224, per Wallach and Bacon, 1976).
Optical Artifacts

The art of optical design involves balancing such issues as cost and exit-pupil size with such visual artifacts as aberration and distortion. The question is not whether such artifacts are present, but whether the magnitude is great enough to detract from the goals of the training application. It is often the case that slight yet noticeable optical artifacts are unimportant, while features such as a light system weight or a wide FOV are mandatory.

Chromatic aberrations are caused by the dispersion of light through optical materials. They are particularly noticeable with thick or plastic lenses and are usually seen as a rainbow effect around the edges of the image. While these could be reduced with software techniques, they are not of primary concern with most modern HMDs.

Geometric distortions are a warping of the image and can take many forms, including an outward warp called pincushion distortion or an inward warp called barrel distortion. These can be reduced through computation (Robinett & Rolland, 1992). Such correction is now being integrated directly into some HMD electronics or may be implemented as part of the software application.

It is not easy to correct for the many types of optical blur that are inherently linked with the quality of an optical system's design. It is a multidimensional issue that can take many forms and is difficult to understand intuitively. As such it is best qualified through personal observation rather than solely through numerical specification.

Of particular concern are artifacts that cause visual discomfort. Most important among these are artifacts that cause incorrect imagery in the region of binocular overlap. These cause the eyes to strain as they attempt to correlate imagery seen by the left and right eyes. While such effects can be the result of poor optics, they are often caused by a physical misalignment that occurs over time and must be monitored by the system operator. Additionally, rendering software must be tested for accuracy in this regard. Misaligned imagery—including swapping eyes—is a common source of discomfort that lies with the software and complete system, not the HMD.
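As a sketch of the kind of computational correction mentioned above, the following one-parameter radial model warps image coordinates outward or inward. Practical HMD correction (for example, Robinett & Rolland, 1992) uses more elaborate models, so this is illustration only, with an arbitrary coefficient.

```python
def undistort_point(x, y, k1):
    """Simple radial model for pincushion/barrel distortion correction.

    Normalized image coordinates (x, y) are pushed outward or pulled
    inward by a cubic term in radius: a positive k1 compensates barrel
    distortion, a negative k1 compensates pincushion distortion.
    """
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2
    return x * scale, y * scale

print(undistort_point(0.5, 0.0, k1=0.2))   # (0.525, 0.0): pushed outward
print(undistort_point(0.5, 0.0, k1=-0.2))  # (0.475, 0.0): pulled inward
```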
Resolution

Although the resolution of HMDs is most often described in the same terms as a standard computer monitor (the total number of horizontal and vertical pixels), the resolution of an HMD is best considered as the angle subtended by a given pixel, typically measured in minutes of arc and called the angular resolution. Values below three or four arc minutes per pixel begin to appear relatively crisp, with human vision capable of better than one arc minute (National Research Council, 1997). For a given display resolution, a narrow FOV HMD will create better angular resolution, as it concentrates pixels into a smaller angle; conversely, a wide FOV will degrade the angular resolution. This can be mitigated with optical designs that create variable angular resolution across the field of view, with the central region having better resolution than the periphery. Optical artifacts, such as blur, can further decrease the effective resolution. As such, a single-number specification for resolution should be but one of the metrics used when considering HMDs.
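A minimal sketch of the angular resolution arithmetic (Python); the FOV and pixel counts are illustrative round numbers rather than the specifications of any particular display:

```python
def arcmin_per_pixel(fov_deg, pixels):
    """Average angular resolution across a field of view,
    in minutes of arc per pixel (smaller is sharper)."""
    return fov_deg * 60.0 / pixels

# The same 1,280-pixel-wide imager spread over different FOVs:
print(arcmin_per_pixel(48, 1280))   # ~2.3 arcmin: relatively crisp
print(arcmin_per_pixel(150, 1280))  # ~7.0 arcmin: wide FOV dilutes pixels
```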
Weight

Specifying weight is similar to specifying resolution: while lighter is obviously better, it is only one measure of how useful an HMD will be in practice. It must be balanced against the often competing physical characteristics of balance, rotational inertia, fit, and form.

Balance affects the downward rotational force placed on the neck. A 500 gram HMD with all the weight in the front will be more uncomfortable than a well-balanced 1,000 gram design. There are many approaches to counterbalancing an HMD, ranging from simply adding weight at the rear to complex optical configurations that fold the optical path around the head to locate mass away from the face, moving it backward and toward the sides of the head. Unfortunately, moving weight in this manner can increase rotational inertia—the resistance a user must overcome when quickly looking around. This is an important consideration for training applications that require rapid head motions approaching 1,000° per second squared (Bolas & Fisher, 1990).

Form

As discussed, some optical designs require that the HMD optics be positioned relative to the wearer’s eyes to within millimeters. The HMD needs to accurately hold electronics, optics, and displays, and it must adjust to fit a wide range of human heads with a firm grip that does not allow the system to slip during rapid head motions. There are a variety of mounting techniques, which are best judged by having a variety of users wear the system.

The physical form of the display needs to be considered with the target application in mind.
For example, driving simulation and rifle training applications often use physical props that must be held close to the user’s head. As such, the HMD cannot extend far away from the user’s face or it will interfere with a real steering wheel or gun sight. Additional form considerations include the time it takes to fit and don an HMD, making sure the HMD is compatible with required gear such as helmets and jackets, and ensuring that it will not snag on cables or cloth. Ventilation, heat, and compatibility with the user’s eyewear are additional concerns.

EXAMPLES

Optical Approaches

Wide Field of View

In 1985, NASA (National Aeronautics and Space Administration) Ames Research Center created a wide FOV HMD by integrating optics for viewing film based stereoscopic pairs (originally designed for the large expanse extra perspective [LEEP] wide-angle camera system) with liquid crystal display (LCD) panels of roughly the same size. This configuration was commercialized in the VPL Research, Inc. EyePhone, the Fakespace Labs BOOM, and the Howlett LEEP Video System I. Consumer-priced narrow FOV displays in the late 1990s (for example, the Sony Glasstron) turned attention away from immersive wide FOV displays toward narrow FOV designs more suitable as monitor replacements. As a result, wide FOV HMD designs remained largely static until the mid-2000s.

The recently introduced Fakespace Labs Wide5 provides a FOV exceeding 150°. A large pupil is created by incorporating modern LCD panel technologies and a single lens design. It was originally designed to provide a robust and portable virtual training system that could easily be deployed in the field for close-quarters battle training applications as part of the U.S. Navy’s virtual training and environments program. To meet those requirements, it can be mounted to helmets with a standard night vision mount, and it incorporates a custom interface that derives stereoscopic pairs from a single digital visual interface signal available from a laptop. To increase the perceived resolution, the Wide5 has higher pixel density toward the central region.

Tiled Designs

Wide FOV HMDs can be created by tiling a number of smaller displays in a concave arrangement in front of each eye. Tiles are composed of a display module and eyepiece optics that butt together and cover a portion of the perceived field of view. These displays generally require precise adjustment when the HMD is placed on the head to reduce visual tiling artifacts—the eyes must be aligned with each microdisplay and lens. Each microdisplay typically requires its own video source; six displays per eye thus require 12 rendered viewpoints, making computing demands a significant system consideration, but also increasing resolution.

Kaiser Electro-Optics (now part of Rockwell Collins) created a display under contract to the Defense Advanced Research Projects Agency Electronic Technology Office in the 1990s incorporating a three by two matrix of displays for each eye. This display used small LCD panels driven by 12 separate graphics inputs; the resulting field of view was over 153° horizontal by 48° vertical (Arthur, 2000).
Sensics Inc. has created a tiled display that uses a matrix of displays to create a wide FOV virtual image. Organic light-emitting displays (OLEDs) are used in a modular approach so the display may be configured in a variety of ways, including a seven by three per eye arrangement of displays that provides a total of 4,200 × 2,400 pixels. In this tiled configuration, all 21 optical and display assemblies have to align with each of the user’s eyes. This family of HMDs can be configured with fields of view ranging from 72° to 179° horizontal by 30° to 60° vertical.

High Resolution Inserts

The human eye has superior visual acuity in a small region known as the fovea. A few HMDs have been created that couple a wide field of view display with a second display tracked to the fovea. This results in a wide FOV immersive experience enhanced with the precision and clarity typical of narrow FOV displays. The implementation is complex because it requires tracking of the eye, and very few systems using this approach have been fielded. CAE created such a system for helicopter simulation in 1981 (Velger, 1998).

Medium Field of View Displays

One of the earliest examples of an HMD used for virtual environment visualization was built in 1968 as part of Ivan Sutherland’s groundbreaking work. This system used half-inch monochrome cathode ray tube (CRT) displays that provided a 40 degree field of view per eye. The image was reflected from partially silvered mirrors, creating an augmented reality display system.

The Virtual Research V8 design and the NVIS, Inc. SX display are both popular displays in this range of FOV. The NVIS SX offers a horizontal field of view of 48.5° and a vertical field of 39.6° and uses 1,280 × 1,024 field sequential color ferroelectric liquid crystal on silicon (FLCOS) panels. The V8 uses lower resolution transmissive LCDs and has a horizontal field of view of 49° and a vertical field of 33°. Both weigh over two pounds. Typically, adjustment of the IPD is needed to achieve the best quality image and to reduce artifacts as the eye moves.

Rockwell Collins makes several HMDs in this FOV range. Its SIM EYE product employs a see-through optical design with the displays located at the sides of the head; relay optics deliver the images to semitransparent eyepieces. Independent IPD adjustment is provided for each eye.

Narrow Fields of View—Personal Display Monitors

HMDs with fields of view of around 25° are available at fairly low cost. Most employ OLED or small LCD displays coupled with magnifying optics. Examples include OLED displays by eMagin Corporation and LCD based Vuzix (Icuiti) designs. Daeyang and IODisplays use liquid crystal on silicon (LCoS) displays and reflective magnifying designs.
While many of these come bundled with head-tracking technologies, they are of limited utility for fully immersive training applications due to the restricted field of view. By way of example, a 25 degree FOV is equivalent to viewing a 21 inch monitor at 4 feet.

Alternative Optical Approaches

An HMD that incorporates a head-mounted projector uses a very different optical path. The projector directs an image from the user’s head toward a retroreflective material that reflects the image back toward the user’s eyes. This approach has shown promise for cockpit displays and other applications in which a type of virtual overlay can be implemented with cut sheets of retroreflective material (Fergason, 1997).

Displays that provide the user with multiple planes of focus are not yet practically deployed, but encouraging research by Akeley et al. (2004) shows that a subset of planes can be used to create a display that could alleviate some of the accommodation and convergence issues discussed previously.

HMDs that mix real world and virtual imagery provide many unique advantages. These are described in Henderson and Feiner, Volume 2, Section 1, Chapter 6—“Mixed and Augmented Reality for Training.”

Display Technologies

The display requirements for HMDs are demanding because, ideally, they would be light and exhibit very high resolution, color depth, brightness, and contrast while using little power and creating few temporal artifacts. Achieving all these requirements simultaneously is naturally challenging, and designers resort to the best solutions available at the time in the context of their overall system goals.

Field Sequential Liquid Crystal on Silicon Color Displays (FLCOS)

FLCOS displays reflect light, with the polarization of each pixel controlling its brightness. These displays employ front illumination with a separate field sequential color light source. Field sequential displays such as FLCOS show a sequence of color primaries for each pixel; the eye’s persistence of vision integrates this sequential presentation so that the image is perceived in full color. Visual artifacts from field sequential color displays are generally not objectionable in narrow field of view conditions. In wide FOV HMDs, color flicker is an issue for some users, as peripheral vision is more sensitive to motion and flicker.

Active-Matrix Liquid Crystal Displays (AMLCDs)

These displays are like those used in laptop screens and are typically thin-film transistor transmissive displays illuminated by a backlight. Pixels are composed of three subpixels (red, green, and blue). AMLCD displays are made by a variety of companies; Kopin Corporation makes very high resolution displays suitable for HMDs. A fast response time is needed to reduce the smearing of moving objects.
Organic Light-Emitting Displays (OLEDs)

OLEDs emit light from their surfaces, which enables both truer blacks and more compact optical designs because there is no backlight assembly. eMagin Corporation is currently the primary source for small form factor displays of this type. These displays are new, and improvements in lifetime, resolution, and size are expected.

Alternative Display Technologies

Several companies have made use of fiber-optic image pipes that decouple the display from the HMD. Laser based systems that project directly onto the retina will no doubt be part of the future HMD landscape; Microvision, Inc. continues to innovate with such laser based projectors. HMD designs have largely moved away from cathode ray tubes.

Mounting Approaches

Generally, a person can comfortably carry an additional 10 percent of his or her head weight for indefinite periods. As the typical head weighs approximately 10 kg, it is desirable to have an HMD that weighs less than 1 kg (kilogram). A brief overview of the numerous techniques that have been developed to mount HMDs follows.

On-Head Mounts

Spectacles—These designs are suitable for narrow field of view displays where the weight is minimal and the narrow field of view is achieved with small plastic lenses. FOV is under 30°, and the weight is typically under a few ounces. The Vuzix (Icuiti) products are good examples.

Forehead Rest—HMDs like those from IODisplays and eMagin weigh around six to eight ounces and use more complex optics. The displays have a pad resting on the frontal bone and a strap around the back of the head, sometimes with another strap over the crown. The strap needs to be tight to create friction on the forehead mount, but this provides purchase for the HMD.

Scuba Mask—Older systems, such as that from VPL, used heavier displays held on the face by a large contact area and a tight head strap grabbing the back of the head. Such designs add a minimal amount of additional weight; however, the strap-type adjustment is inconvenient and has to be tight to hold the HMD in place. These masks often provide poor ventilation and can become humid.

Helmet Based—The Kaiser Electro-Optics SIM EYE and the L3 advanced helmet mounted display (Sisodia et al., 2007) mount directly to a training or flight helmet; these designs permit the use of the pilot’s own helmet. Typically, several helmet sizes and designs need to be supported.
Head Strap—The Fakespace Labs Wide5 and the Virtual Research V8 incorporate a ratchet-style head strap that can be tightened to hold the HMD on securely and to fit most people. This strap design does not grab the occipital bone, so such designs work best with a counterweight. This design is relatively easy for the person wearing the HMD to adjust.

Exoskeletal—Disney, the Kaiser Electro-Optics ProView, and other designs have employed a rigid exterior and a supple interior strap. The advantage is that the rigid exterior frame helps transfer the load of the HMD to the head strap and head in multiple places.

Webbing—SEOS Limited and others hang the HMD around the head and aim to balance the load among the straps holding the HMD. These designs get heavy rather quickly, are cumbersome to put on and remove, and are hard to keep accurately aligned with the eyes.

Over the Head with Rigid Frame—Sensics and Keio University Shonan Fujisawa Campus use designs that capture the occipital bone and leave the area near the ears unencumbered. These and other systems that are not counterweighted need to be tight in the back and, consequently, are most easily adjusted by another person.

Counterbalanced Approaches

A number of display environments (for example, those designed for a seated user) do not require free movement while the user wears the HMD. In these situations, there may be advantages to counterbalancing the mass of the HMD with an external mechanical structure. This reduces the weight of the HMD on the user and may also afford precise tracking. Examples include the Fakespace BOOM and Disney’s Aladdin ride, which used a cable system to counterbalance the displays.

CONCLUSION

To select an appropriate HMD, the decision maker will find it informative to physically try the display to gain a full appreciation of its functionality and to spend time in the HMD, looking all around the image and critically examining the effects. As discussed in this chapter, many of these effects and artifacts are quite subtle and require an engaged and observant evaluation of the display, as described below.

Step One: Don the HMD and move in a manner similar to the training application, paying attention to the effect of rapid or unusual exploratory motions. In addition to feeling for any uncomfortable physical sensations, such as looseness or an offset center of mass, pay particular attention to the virtual images, looking for a bright and sharp environment across the entire field of view. An important step is to refit the HMD as often as required to optimize the experience.

Step Two: With the HMD properly fit, close and relax the eyes for 20 seconds, then look straight ahead for 10 seconds and roll the eyes around to explore the edges of the environment. Rotate the head and explore the environment in a manner consistent with the training application.
Pay attention to optical artifacts that cause visual discomfort or that create a misleading virtual environment. It is often useful to alternately close the left and then the right eye to look for differences, both while fixating on specific objects and while independently exploring the field of view of each eye.

When properly selected and integrated, HMDs leverage emerging technologies to create efficient and flexible training applications that are easily deployed. With the ability to completely cloister a user in a synthetic environment, HMDs can enable the development of virtual training scenarios that are impractical to duplicate in the real world.

REFERENCES

Agrawala, M., Beers, A. C., Fröhlich, B., Hanrahan, P., McDowall, I., & Bolas, M. T. (1997). The two-user responsive workbench: Support for collaboration through individual views of a shared space. In SIGGRAPH ’97: Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques (pp. 19–26). New York: ACM Press/Addison-Wesley.

Akeley, K., Watt, S. J., Girshick, A. R., & Banks, M. S. (2004). A stereo display prototype with multiple focal distances. In SIGGRAPH ’04: Proceedings of the 31st Annual Conference on Computer Graphics and Interactive Techniques (pp. 804–813). New York: ACM Press.

Arthur, K. (2000). Effects of field of view on performance with head-mounted displays. Unpublished doctoral dissertation, University of North Carolina, Chapel Hill.

Bolas, M. T., & Fisher, S. S. (1990). Head-coupled remote stereoscopic camera system for telepresence applications. In S. S. Fisher & J. Merritt (Eds.), SPIE: Stereoscopic displays and applications (Vol. 1256, pp. 113–123). Bellingham, WA: SPIE.

Brooks, F. P., Jr. (1999). What’s real about virtual reality? IEEE Computer Graphics and Applications, 19(6), 16–27.

Cakmakci, O., & Rolland, J. (2006). Head-worn displays: A review. IEEE/OSA Journal of Display Technology, 2(3), 199–216.

Fergason, J. L. (1997). Retro-reflector based private viewing system. U.S. Patent No. 5,629,806.

Kalawsky, R. (1993). The science of virtual reality and virtual environments. Wokingham, England: Addison-Wesley.

Meehan, M., Insko, B., Whitton, M., & Brooks, F. P., Jr. (2002). Physiological measures of presence in stressful virtual environments. In SIGGRAPH ’02: Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques (pp. 645–652). New York: ACM Press.

Melzer, J. E., & Moffitt, K. (1997). Head-mounted displays: Designing for the user. New York: McGraw-Hill.

National Research Council. (1997). Tactical display for soldiers: Human factors considerations. Washington, DC: National Academy Press.

Robinett, W., & Rolland, J. P. (1992). A computational model for the stereoscopic optics of a head-mounted display. Presence: Teleoperators and Virtual Environments, 1(1), 45–62.

Rolland, J. P., Krueger, M., & Goon, A. (2000). Multi-focal planes in head-mounted displays. Applied Optics, 39(19), 3209–3215.
Sisodia, A., Bayer, M., Townley-Smith, P., Nash, B., Little, J., Cassarly, W., & Gupta, A. (2007). Advanced helmet mounted display (AHMD). In R. W. Brown, C. E. Reese, P. L. Marasco, & T. H. Harding (Eds.), SPIE: Head and helmet-mounted displays XII: Design and applications (Vol. 6557, p. 65570N). Bellingham, WA: SPIE.

Smith, W. J. (2000). Modern optical engineering: The design of optical systems (3rd ed.). New York: SPIE Press/McGraw-Hill.

Sutherland, I. E. (1963). Sketchpad: A man-machine graphical communication system. In AFIPS Spring Joint Computer Conference (pp. 329–346). Montvale, NJ: AFIPS Press.

Velger, M. (1998). Helmet mounted displays and sights. Norwood, MA: Artech House.

Wallach, H., & Bacon, J. (1976). The constancy of the orientation of the visual field. Perception and Psychophysics, 19, 492–498.
Chapter 3
PROJECTOR BASED DISPLAYS

Herman Towles, Tyler Johnson, and Henry Fuchs

Many in the computer graphics community refer to the 1990s as the decade of virtual reality (VR), but the stage was set by the research and technologies developed during the 1980s. By the mid-1980s, Silicon Graphics Inc. had introduced its second-generation three-dimensional (3-D) workstation, the DataGlove was available from VPL Research, Inc., and Polhemus and later Ascension were delivering trackers. When it came to display, LEEP Optical was co-developing a head-mounted display (HMD) with NASA (National Aeronautics and Space Administration) Ames Research Center, and cathode ray tube (CRT) projectors were being marketed by Electrohome, Barco, Sony, and others.

While projectors have been used in large-format, vehicle based simulation displays since the 1970s (CAORF, 1975), by the early 1990s many VR researchers were focused on mobile VR, where HMDs proved to be the most cost-effective stereoscopic display solution. But at SIGGRAPH 1993, the world also experienced a new vision for projective virtual environments with two landmark demonstrations: the CAVE, created by Carolina Cruz-Neira, Daniel J. Sandin, and Thomas A. DeFanti of the Electronic Visualization Lab–University of Illinois at Chicago, and the Virtual Portal, created by Michael F. Deering of Sun Microsystems Computer Corporation. These environments provided an almost unlimited field of view, reduced rotational mismatches between vestibular and visual cues, and allowed users to walk around and observe everything and everyone in the shared space.

Since that introduction, the advantages of a projective virtual environment have not changed, but the expense and the complexity of rendering pixels everywhere have. Over the last decade the cost of rendering per projective display channel has dropped precipitously, from $50K 3-D workstations and $100K projectors to $5K personal computers (PCs) with superior graphics and $5K digital projectors that are smaller and more stable. Today graphics processing units (GPUs) have enough computational power and programmability to execute advanced warping and blending algorithms in real time, eliminating the need for special purpose hardware. Camera based calibration techniques, first demonstrated by Raskar et al. (1998) and Surati (1999), are being adopted into new products and are turning the arduous task of display setup and maintenance into the mundane.
Projective display research over the last decade has accelerated tremendously, with new and better calibration methods and rendering techniques introduced at technical conferences annually. Researchers are now rendering onto ordinary walls, eliminating the need for expensive, space-consuming screens. New passive stereoscopic solutions, such as Infitec, now exist that require no special display surface. In addition, software frameworks, including VR Juggler and Chromium, are evolving to greatly simplify the building of applications that run on distributed rendering clusters. These developments collectively forecast a bright future for projective virtual environments. Figures 3.1 through 3.6 illustrate the many projective virtual displays being utilized today. The remainder of this chapter discusses many of the issues to be considered in buying or building a projective display system, while highlighting recent research advances that will impact future products.
DESIGNING A CUSTOM PROJECTIVE VIRTUAL ENVIRONMENT

The design or purchase of any multiprojector display environment should begin with consideration of the application requirements and such fundamental questions as: What visual field of view (FOV) and acuity are required? Is stereoscopic display needed? Will the users be stationary or mobile? Should rear projection be used to avoid shadowing? What requirements, if any, should be placed on the display surface?
Figure 3.1. Six-projector spherical display with a 220° by 60° field of view; the tripod-mounted calibration camera is visible in the foreground. Image courtesy of Dr. Chris Jaynes, Mersive Technologies.
Figure 3.2. A thirteen-projector joint terminal attack controller (JTAC) virtual dome trainer with visuals by MetaVR, Inc., and display calibration by Mersive Technologies. Image courtesy of the Air Force Research Lab (AFRL), Mesa, Arizona.
The answers will vary greatly depending on the application. For example, if the application is scientific visualization, the design may focus on resolution and on giving the viewer the ability to move to get a closer or different view of the data. In this situation, a display wall built with a rectilinear array of rear-mounted projectors may be ideal. If the application is a flight simulator, then the users are generally stationary and shadowing is not an issue, so a front-projection, forward hemispherical display may fulfill the out-the-window view requirements. Depending on the mission, high display resolution may be needed so the user can accurately identify ground detail or spot incoming aircraft. If the display is to be used for a variety of full-immersion virtual experiences, then a front-projection, hemispherical dome theater or a six-wall, rear-projected, user-tracked, stereoscopic CAVE may be an excellent choice.

Defining the underlying display requirements of a virtual environment application begins to narrow the design choices, but in practice this is just the beginning of the system trade-offs that must be made. The following sections will highlight practical design issues that must be considered to build and calibrate the display system, but will also provide some understanding of the software architecture and rendering issues involved in developing applications.

A very useful reference that can be considered a companion to this chapter is the textbook by Majumder and Brown (2007). The book touches on many of the topics discussed below, in many cases in more detail.
Figure 3.3. Re-creation of a virtual space with spatial and visual realism by (top) using polystyrene blocks assembled to approximate real world models and (bottom) texturing the surfaces with six projectors. Being There project at The University of North Carolina at Chapel Hill.
Figure 3.4. Two digital flats simulating plaster walls used in conjunction with window and door props create a virtual room. A FlatWorld user is seen viewing a rearprojected stereoscopic virtual world through a physical door. Image courtesy of the University of Southern California Institute for Creative Technologies.
Figure 3.5. A classroom of second graders immersed inside a 24 foot Elumenati GeoDome displaying The Molecularium: Riding Snowflakes using a single, wide field of view OmniFocus projector.
Figure 3.6. An example of seamless warp-and-blend rendering with two projectors displaying onto three room walls with two corner columns. Department of Computer Science at the University of North Carolina at Chapel Hill.
The goal of this chapter is to complement that work—highlighting additional issues to consider in building projective virtual environments and reemphasizing others based on the experience of the authors.
Display Configuration

The most fundamental display configuration decision is whether to use front projection or rear projection. Rear projection has several advantages, including no viewer shadowing and nearly constant pixel density with orthogonal projection, but it is largely limited to display onto planar screens. The biggest advantage of front projection is space savings—no additional room space behind the screen is needed. Another advantage of front projection is the ease of display onto smoothly curved surfaces, thus avoiding the difficult challenge of making display into corners photometrically seamless. For front projection this comes at a price, as the ideal location of the projectors is in the space of the users. Moving the projectors sufficiently out of the user space usually creates severe keystoning, which may be compensated for by the projector’s lens-shift adjustments. If not, any remaining geometric distortion must be corrected in the rendering process, as discussed in the section “Seamless Rendering.”
Some of the placement difficulties in both front and rear projection can also be solved using mirrors to fold the projector’s optical path.

Determining the number of projectors and their locations for a given display surface shape can therefore be a complex task. For simple configurations (planar or cylindrical display shapes involving only a few projectors with nearly orthogonal projection), one can do a simple geometric analysis and reasonably expect to compute approximate pixel density, lens requirements, and projector locations. For more complex designs involving a large number of projectors and more complex display surface shapes, a computer-aided design (CAD) tool is useful. Manufacturers and researchers actively engaged in the development of projective display systems have developed CAD tools to aid in the display design process. State, Welch, and Ilie (2006) describe a tool for interactive camera placement and visibility analysis that has also been utilized for the dual task of display system layout. 3D-Perception sells CompactDesigner, a sophisticated theater-design tool that provides 2-D/3-D view analysis, display-coverage diagrams, and resolution (pixel density) plots. This tool also warns of impossible projector positioning (for example, physical conflicts with other projectors), unwanted screen shadows, and projected regions that fall outside a projector’s depth of field (focus).

Focus Issues

To maximize output light efficiency, projectors are designed with a large lens aperture, which can equate to a rather shallow depth of field. Furthermore, the lenses in commodity projectors are often optimized to focus on a planar screen parallel to the lens/imager plane. Therefore, the issue of focus cannot be summarily ignored in nonorthogonal display configurations. The focus issue is moot with laser projectors, as lasers have small apertures and effectively infinite depth of focus; unfortunately, their cost remains prohibitively high for most applications. With standard optical designs, keep in mind that increasing the distance from the projector to the screen will improve the depth of field, as will selecting a projector with a shorter focal length lens. For example, the fisheye lens used in the wide FOV OmniFocus projection system from The Elumenati has such a short focal length that the depth of field is nearly infinite. In addition, Bimber, Wetzstein, Emmerling, and Nitschke (2005) have demonstrated the use of multiple overlapping projectors to improve overall display focus, and Brown, Song, and Cham (2006) developed an algorithm that preconditions the imagery (adaptive sharpening) before projection to help ameliorate image defocus due to projector depth-of-field limitations.

Shadowing Issues

In front-projection theaters, shadowing caused by the viewer can be a design issue that may be mitigated with careful projector placement, but the use of shear projection geometries to minimize shadowing has practical limits set by depth-of-field and pixel-sampling considerations.
While typically used to position an image and eliminate keystone distortions, projectors with lens shift can also be helpful in creating larger shadow-free viewing zones. Lensless projectors that use aspheric mirrors to create large-screen projection with an ultrashort throw distance can also be used effectively to minimize shadowing issues, as demonstrated in Figure 3.7. Several researchers have also demonstrated solutions for shadow elimination that use cameras to actively sense occlusions and dynamically modify the blending attenuation masks of two or more projectors that are illuminating the same surface region (Cham, Sukthankar, Rehg, & Sukthankar, 2003; Sukthankar, Cham, & Sukthankar, 2001; Jaynes, Webb, Steele, Brown, & Seales, 2001; Rehg, Flagg, Cham, Sukthankar, & Sukthankar, 2002).

Abutted versus Overlapped Display

For the purpose of keeping the rendering task as simple as possible, it is tempting to consider projector and screen layouts based on geometrically (horizontally and/or vertically) tiled images that only abut and have no optical overlap. Many rear-projected visualization walls and piecewise-cylindrical designs have been created this way. However, the practical difficulties of controlling lens (pincushion and barrel) distortion and setting up a tiled array with perfectly matched edges have fostered a great deal of research and some commercial solutions that achieve seamless display based on casually overlapped imagery and the use of more sophisticated camera based setup and rendering techniques. Until these advanced techniques are more universally supported, it may be wise to weigh the effectiveness versus cost of the two approaches.
Figure 3.7. A Social Computing Room utilizing 12 lensless, short-throw projectors that allow users to walk up within one foot of the front projected image without casting a shadow. Image courtesy of the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill.
Hardware Considerations

Projector Selection

A useful resource in selecting projectors is the Web site www.projectorcentral.com, but simply comparing specifications is often inadequate. It is important to evaluate units firsthand and not to assume that a new model from a company will exhibit quality and features similar to those of previous models from the same vendor.

Projection Technologies

Nearly all projectors today are built with digital imaging devices, which enable lower cost and more stable solutions than CRT based projectors. Three digital imaging technologies are commonly used in today’s projectors: LCD (liquid crystal display), LCoS (liquid crystal on silicon), and DMD (digital micromirror device). The first DMD projector was developed by Texas Instruments and is marketed under the DLP (digital light processing) label. LCoS, like DMD, is a reflective imager, while LCD is a transmissive imager. Sony’s SXRD (Silicon X-tal Reflective Display) and JVC’s D-ILA technologies are LCoS devices. LCD and LCoS projectors use three-imager designs with complex optics for color separation and recombination. Three-chip DMD systems are available, but the time-modulated characteristic of DLP technology makes possible a single-chip design that utilizes a spinning optical-filter wheel to colorize the white-light source and sequentially present red, green, and blue (RGB) images. Today’s DLP designs, especially those sold for the presentation market, often include a fourth clear filter segment that improves the contrast in gray-scale imagery at the expense of slightly desaturating color imagery and raising the black level. A clear color filter complicates the blending of overlapping images in projector overlap regions (Stone, 2001). Therefore, developers may wish to select a DLP projector with only RGB filters or an operational mode that disables the use of the clear filter segment. Manufacturers marketing to the visualization and simulation markets are aware of this photometric issue.

Projector Specifications

The three parameters most commonly used to compare projectors are brightness, resolution, and contrast (ratio). Brightness of 2,000 to 4,000 lumens and native resolutions of one to two megapixels are commonplace, with up to eight megapixels of resolution available in some LCoS units. Contrast is arguably the most critical display parameter, but the industry practice of specifying full-screen on-off ratios does not adequately characterize intraframe contrast, which is much lower due to internal reflections in the optical system.
Brightness and the lens system largely drive the size and weight of projectors. In 2008, the median weight of 2,000 lumen projectors was approximately 5.7 pounds, increasing to 13.1 pounds for 4,000 lumens and 31.8 pounds for 6,000 lumens (data courtesy of www.projectorcentral.com).

A projector’s horizontal FOV is commonly disguised in a specification known as the throw ratio (D/W), defined as the distance (D) from the lens to the screen divided by the width (W) of the planar projected image (the relationship between throw ratio and FOV is sketched at the end of this section). Many projectors provide a zoom lens for modifying the image size, and some high end projectors may offer interchangeable lens options. Zoom lenses typically exhibit radial distortion at each end of the zoom range (pincushion at wide FOV and barrel at narrow FOV) that may need to be modeled in order to achieve seamless geometry in multiprojector displays.

Optical vignetting, the gradual darkening of the image toward the periphery caused by shadowing in large aperture, multielement lens designs, can create photometric challenges in multiprojector displays. Few vendors quote specifications for flat-field luminance variation, but it is not uncommon to find the luminance in image corners to be 80 to 90 percent of that in the central optical field. Projectors with more than a 20 percent luminance variation should be avoided.

Sources of system latency or lag are always a concern in creating virtual environments. CRT based projectors typically have zero lag, but today’s digital projectors may exhibit some image-processing latency. Few vendors quote a latency specification, but the authors have not seen specifications or measured delays exceeding one frame time. The operating noise level of projectors is also unspecified by most vendors, but it should be duly considered, especially in front-projection designs. While overall environment acoustics can be complex, it may be necessary to add baffling for projectors with more than 30 dBA (decibels) of operating noise. (Note: to avoid mirage-like optical distortion, it is also important to make sure the hot exhaust air from one projector does not vent into the optical path of a neighboring projector.)

The final two features to consider in selecting a projector are a serial control interface and a digital video interface (DVI). As the number of projectors in the display system increases, a serial interface can be critical for projector initialization and for turning devices on and off. DVI provides an absolute pixel mapping from the graphics card to the projector’s imager, avoiding the clock phasing and jitter issues common to analog (video graphics array) video. It is also important to drive the projector at its native resolution to avoid resampling issues.
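As promised above, a minimal sketch of the throw ratio conversion (Python); the 1.5:1 throw ratio and 3 m distance are illustrative values, not a specific product’s specifications:

```python
import math

def fov_from_throw_ratio(throw_ratio):
    """Horizontal projection FOV (degrees) implied by a throw
    ratio D/W: half the image width W/2 subtends atan(W/(2D))."""
    return math.degrees(2 * math.atan(1.0 / (2.0 * throw_ratio)))

def image_width(throw_ratio, distance_m):
    """Projected image width at a given lens-to-screen distance."""
    return distance_m / throw_ratio

print(fov_from_throw_ratio(1.5))   # ~36.9 deg for a 1.5:1 lens
print(image_width(1.5, 3.0))       # a 2.0 m wide image at 3 m
```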
Stereoscopy Options

The stereoscopic presentation challenge is to deliver a unique image to each eye. While the solution for HMDs is to include a separate imager for each eye, in a projective environment it is necessary to display both left- and right-eye images onto the same screen and to require the user to wear either active or passive stereoscopic glasses to discriminate between the two images.

Active stereoscopy is based on liquid-crystal shutter glasses that alternately open and close, synchronized to the time-sequential presentation of the left- and right-eye images. Projector-shutter glass synchronization is achieved with an infrared emitter connected to the image generator or the projector. The time-sequential nature of active stereoscopy requires a high display-refresh rate (>100 hertz) to avoid perceived flicker. A single CRT projector can easily achieve this, but this image update rate is difficult to achieve with all but a few digital projectors (the InFocus DepthQ and some three-chip DMD models). Two-projector (100 percent overlap) active stereoscopic solutions also exist.

In passive stereoscopy, the viewer’s glasses use passive filters to discriminate between the left- and right-eye images that are simultaneously displayed by two projectors. Passive glasses are less expensive than active glasses and for that reason are popular for large-group stereoscopic theaters. Two types of stereoscopic image encoding are typically used—polarization and anaglyphic.

Both linear (vertical and horizontal) and circular (clockwise and counterclockwise) polarization are used to encode the stereoscopic image pair. Linear polarization is very susceptible to left-right image cross talk with even a small amount of head tilt, while stereoscopic separation based on circular polarization is invariant to head tilt. At the same time, the quality of image separation with circular polarization has a wavelength dependency (for example, the separation of green imagery may be better than that of red and blue). As a result, linear polarization can provide better image separation than circular polarization, but because of the invariance to head tilt, most users consider circularly polarized stereoscopy superior to linearly polarized stereoscopy.

Two additional factors must be considered with polarization based stereoscopy. First, the display surface (or screen in the case of rear projection) must be polarization preserving. Second, converting a pair of DMD projectors into a linear- or circular-polarized stereoscopic system through the addition of external optical filters is straightforward, as the output light is not inherently prepolarized. The light from LCD and LCoS projectors, however, is, by nature of the device, linearly polarized, and for optical design reasons the polarization of the green image is commonly rotated 90° relative to the red and blue images. This means one cannot add a simple quarter-lambda retarding filter and create light that is circularly polarized with the same orientation (clockwise or counterclockwise) for all three RGB images. More sophisticated frequency-selective retarders are required, or one can cleverly swap which projector displays the left- and right-eye green images.

In anaglyphic stereoscopy, the image encoding is based on wavelength-dependent multiplexing. Simple red-blue anaglyphic stereoscopy is useful for demonstrating basic stereoscopic principles, but multiplexing the two images into just two colors does not produce a practical full-spectrum solution for virtual environments. Another anaglyphic option, developed by DaimlerChrysler, is Infitec—an interference filter technology that divides the visible spectrum into two parts with eye-interleaved, three-band notch filters. Infitec is head-rotation invariant and does not require a polarization preserving screen or surface, but because of the spectral selectivity, the two eyes can perceive different colors for the same displayed RGB value.
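To make wavelength-division multiplexing concrete, here is a minimal sketch of the simple red-blue anaglyphic encoding mentioned above (Python with NumPy). As the text notes, this is a demonstration device rather than a practical full-spectrum solution, and the luminance weights below are conventional video values, not part of any particular product:

```python
import numpy as np

def red_blue_anaglyph(left_rgb, right_rgb):
    """Multiplex a stereo pair into one image: the left view is
    carried on the red channel, the right view on the blue channel,
    so matching colored filters route one view to each eye."""
    out = np.zeros_like(left_rgb)
    # Luminance (rough ITU-R 601 weights) of each source view.
    lum = lambda im: (0.299 * im[..., 0] + 0.587 * im[..., 1]
                      + 0.114 * im[..., 2])
    out[..., 0] = lum(left_rgb)   # red channel  <- left eye
    out[..., 2] = lum(right_rgb)  # blue channel <- right eye
    return out

# Toy 2 x 2 "images" standing in for rendered left/right views.
left = np.random.rand(2, 2, 3)
right = np.random.rand(2, 2, 3)
print(red_blue_anaglyph(left, right).shape)
```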
Screen Considerations

A full discussion of screen issues is outside the expertise of the authors, but we would be remiss if we did not at least stress the importance of screen materials and surface shape in creating a photometrically seamless and high contrast display environment. Developers should actively research or seek out the advice of experts in the projection screen field to answer questions related to screen gain, view position dependencies, interreflection issues, ambient lighting and its impact on contrast, polarization preserving surfaces for stereoscopy, and new high contrast screen options. Such companies as Da-Lite, Draper Inc., and Stewart Filmscreen Corporation can be valuable sources of information on screen and surface options.

An alternative trend to the use of screens is to project onto existing room surfaces and compensate for the geometric and photometric irregularities using camera based, closed-loop calibration. This approach promises rapid setup of multiprojector display systems in new locations.
Image Generators

The most fundamental issue facing display system developers is whether the display and application are designed to run on a single PC with multiple display outputs or on a multiple-PC rendering cluster. If the number of projectors exceeds what can be configured on a single PC, or the performance of a single PC cannot deliver the desired application frame rate, then a cluster is the only choice. Both NVIDIA and ATI Technologies have GPUs with two output channels, and high end workstations are available with sufficient cooling and power to support two to four GPUs. If more output channels are needed, one can consider an external expansion chassis, such as NVIDIA’s Quadro Plex, which supports up to eight output channels, or a graphics expansion module (for example, Matrox Graphics Inc.’s TripleHead2Go) that can digitally split a single wide-screen channel into multiple nonoverlapping outputs.

To achieve temporally seamless display in a multiprojector system, one must synchronize the outputs of all graphics cards. If developing an active stereoscopic configuration, video synchronization is an absolute necessity. NVIDIA’s Quadro G-Sync solution supports multicard frame lock (vertical interval synchronization), external genlock, and swap lock (synchronized buffer swaps). RPA Electronics Design, LLC also markets a synchronization kit for some NVIDIA cards. Software based synchronization solutions have been demonstrated on Linux and Windows systems by Allard, Gouranton, Lamarque, Melin, and Raffin (2003) and by Waschbuesch, Cotting, Duller, and Gross (2006), respectively.
Tracking Requirements

Rendering a geometrically accurate scene for a user in a projective virtual environment requires knowledge of the user’s eye position in the display space. Changes in the desired display surface color caused by a change in view perspective are, of course, defined by the rays between the eye and the virtual objects and by how the intersections of those rays with the display surface move as the viewer moves. In many geometric situations, the displayed scene may look reasonable and acceptable when viewed from a point other than the rendered viewpoint. This is particularly true when the display surface is second-order continuous, as evidenced by our ability to accept perspective distortions on flat- or curved-screen presentations from a large variety of locations in a theater. Display surface discontinuities, such as the corners in a four-wall display environment, will produce more detectable geometric breaks in visual presence when the rendered and actual user viewpoints differ. The need for tracking must therefore be analyzed on a display-geometry and application-specific basis. If tracking is needed, only the position of the user’s eyes is utilized in the rendering, so it may be acceptable to approximate the eye position from a tracked position near the eyes without concern for tracker orientation.

It is very important for tracking latency in HMD based VR systems to be small in order to minimize the sensed mismatch between vestibular and visual cues that can occur when displayed imagery lags actual head rotation. In projective virtual environments, the virtual scene is rendered everywhere, so this mismatch between rotational cues is largely nonexistent. As a result, the authors theorize that the need for very low latency tracking in projective virtual environments is greatly diminished compared to operation with HMDs.

Seamless Rendering

When multiple projectors are combined to form a single display, the fundamental goal is to create a geometrically and photometrically seamless visual for the users. Geometrically, this means that overlapping projector images are properly coregistered with no perceptible position or slope discontinuities, and that there are no apparent perspective distortions due to off-axis projection or projection onto nonplanar display surfaces. Photometrically, the areas where projectors overlap should be undetectable (not brighter or darker), and there should be no perceptible brightness or color differences that make the number of individual projectors discernible. Ideally, photometric correction should not compromise display contrast and should be independent of scene content.

These rendering challenges can seem daunting, and it is therefore understandable why the earliest projective virtual environments were designed to utilize the 3-D graphics pipeline unchanged—render a 3-D perspective scene based on the standard pinhole-camera model and a flat image plane, and then design the projector-screen configuration to match this model in an attempt to avoid any distortion. Geometric coregistration is then achieved by physically positioning the projectors as precisely as possible to tile the projective images while also avoiding any photometric overlap.
Beginning in the late 1990s, and coinciding with the introduction of more affordable projectors, researchers began developing camera based calibration techniques and new rendering methods that would both relax the requirement for such a precise system setup and allow for more flexible display configurations. The key to these advancements was the realization that a projector is simply the dual of a camera and that it was possible to apply computer vision methods and projected structured light to calibrate projectors while also reconstructing the shape of the display surface. Given this information and the modern 3-D graphics processor, it was a relatively simple task to render predistorted images for an array of casually aligned projectors and produce a geometrically seamless visual. Parallel research in photometrics was simultaneously yielding new mathematical models and camera based calibration techniques for improved photometric uniformity. The next two sections provide insights into these new geometric- and photometric-rendering advances and the camera based calibration techniques utilized.

Geometric Rendering

Remapping (Warping) Basics

An understanding of the geometric distortion created when projecting onto an arbitrarily shaped surface is the key to a successful rendering strategy. That understanding begins by considering the question: If standing at this location in the display space, what color should each (projector) pixel on the screen be to create a perspectively correct visual of the virtual scene? Figure 3.8 illustrates this geometric problem and includes five objects from a virtual scene, a complex-shaped display surface, a projector pose, and a viewer position. In determining the color of a given projector pixel (circled), we consider the location on the display surface where the projector pixel falls (point A) and determine what color in the virtual scene the viewer should observe at this point. In the case of the projector pixel illuminating point A, the viewer should observe point B in the virtual scene. Similarly, for a projector pixel located at point C on the display surface, the viewer should observe point D in the scene. Points A and C are determined by the geometry of the display surface and the location, orientation, and optical properties of the projector—all of which can be obtained by a calibration process. Points B and D in the virtual scene are obtained by forming rays from the known viewer location through points A and C and intersecting these rays with the objects in the virtual scene. This ray-tracing formulation of the problem effectively defines a mapping of 2-D points in the viewer’s image to 2-D points in the projector’s image. Figure 3.9 illustrates these two images. The amount of warping or predistortion in the projected image is a function of the display surface shape, the user viewpoint, and the calibration (position and orientation) of the projector.
Figure 3.8. Ray-Tracing Model Illustrating the Remapping from View Image Space to Projector Image Space
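A minimal sketch of this remapping (Python with NumPy) for the simplest nontrivial case, a planar display surface: for one projector pixel it computes the surface point (A) and the viewer ray whose scene intersection (B) supplies that pixel’s color. All geometry here is invented for illustration; a real system would substitute calibrated projector and surface models:

```python
import numpy as np

def pixel_to_surface(proj_origin, pixel_dir, plane_point, plane_n):
    """Point A: intersect the projector-pixel ray with the display
    surface (here a plane; a mesh would be intersected instead)."""
    t = np.dot(plane_point - proj_origin, plane_n) / np.dot(pixel_dir, plane_n)
    return proj_origin + t * pixel_dir

def viewer_ray(eye, surface_point):
    """Ray from the tracked eye through A; intersecting it with the
    virtual scene yields the color (point B) to assign the pixel."""
    d = surface_point - eye
    return d / np.linalg.norm(d)

# Illustrative setup: wall at z = 0, projector and viewer in front of it.
wall_pt, wall_n = np.array([0., 0., 0.]), np.array([0., 0., 1.])
projector = np.array([0.5, 1.2, 2.0])
eye = np.array([0.0, 1.6, 1.5])

A = pixel_to_surface(projector, np.array([0.1, -0.2, -1.0]), wall_pt, wall_n)
print(A, viewer_ray(eye, A))
```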
If this remapping is accurately established for each projector, there will be geometric continuity of the virtual scene between projectors.

Rendering for Planar Surfaces

In the special case where the projection is onto a planar surface, the remapping from the viewer image to the projector image can be represented by a single 3 × 3 matrix. This mapping between two images is called a planar homography and is completely determined by four pixel correspondences between the two images; in practice, the use of additional points provides greater accuracy. Many developers (for example, Raskar, 2000; Chen, Sukthankar, Wallace, & Li, 2002; Raij, Gill, Majumder, Towles, & Fuchs, 2003; Ashdown, Flagg, Sukthankar, & Rehg, 2004) have calibrated large tiled visualization walls by computing two homographies—one representing the common mapping from the display wall to a camera image, and a second from the camera image to each projector image. Concatenating these two homographies defines a linear transform remapping the display wall to the projector image that, when preconcatenated with the application’s projection matrix, allows the standard graphics pipeline to directly compute the corrected projector image at zero additional computational expense.
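A minimal sketch of the homography machinery just described (Python with NumPy): estimating a 3 × 3 homography from four point correspondences by the direct linear transformation, then concatenating a wall-to-camera mapping with a camera-to-projector mapping. The correspondences are fabricated for illustration:

```python
import numpy as np

def homography(src, dst):
    """Estimate H (3x3, up to scale) with dst ~ H @ src from four
    point correspondences via the direct linear transformation."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # Null space of the 8x9 system = smallest right singular vector.
    _, _, vt = np.linalg.svd(np.array(rows))
    return vt[-1].reshape(3, 3)

def apply_h(H, pt):
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]   # homogeneous divide

# Illustrative correspondences: wall corners seen by the camera, and
# the same camera points as seen in one projector's image.
wall_to_cam = homography([(0, 0), (4, 0), (4, 3), (0, 3)],
                         [(102, 88), (610, 95), (598, 470), (95, 455)])
cam_to_proj = homography([(102, 88), (610, 95), (598, 470), (95, 455)],
                         [(0, 0), (1024, 0), (1024, 768), (0, 768)])
wall_to_proj = cam_to_proj @ wall_to_cam   # concatenated remapping
print(apply_h(wall_to_proj, (2.0, 1.5)))   # wall center in projector pixels
```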
Figure 3.9. FlightGear simulation showing both (top) the undistorted viewer’s image and (bottom) the projector’s image with the predistortion required to compensate for a complex room corner.
Rendering for Arbitrary 3-D Surfaces

Planar homographies cannot be used to perform the remapping from the viewer image to the projected image when the display surface is more complex than a single plane. To address this limitation, Raskar et al. (1998) describe a generic two-pass rendering method that can compute the remapping for arbitrary surfaces using the GPU. The algorithm works as follows. In the first pass, the ideal image to be observed by the viewer is rendered into texture memory. Then, in the second pass, the ideal image is warped in a way that compensates for the geometry of the display surface when displayed by the projector. The remapping is accomplished using projective texturing, in which the polygonal model representing the display surface is textured with the ideal image using texture coordinates computed by projecting the vertices of the model into the first-pass image. The textured polygonal model of the display surface is then rendered from the projector’s perspective to obtain the image to be projected. Since the algorithm recomputes the ideal view and remapping on every frame, it can easily accommodate a tracked viewer. If the display geometry and viewer position are static, then the pixel-to-pixel remapping (warping) operation is fixed.

Another rendering strategy is to precompute the per-pixel remapping coordinates between the viewer and projector images and then, at run time, look up the mapping (stored on the GPU as a 2-D floating point texture) and use it to index the ideal viewer image for each pixel in the output projector image. This technique for performing geometric correction was first proposed by Bimber et al. (2005). A major advantage of the per-pixel mapping approach is that both linear and nonlinear effects, such as projector lens distortion (Kannala & Brandt, 2006), can be combined into a single remapping lookup table. In the two-pass projective texturing approach, any lens distortion must be modeled independently of the perspective remapping operation; Johnson, Gyarfas, Skarbez, Towles, and Fuchs (2007) describe such a solution. Otherwise, the two representations are equivalent, as either can be used to generate the other.

Camera Based Calibration

Cameras are typically modeled by a 3 × 4 camera matrix that describes the mapping of 3-D world points to 2-D pixels in the camera’s image space. The projection matrix is defined up to scale with 11 degrees of freedom: six extrinsic parameters describing the 3-D position and orientation of the device, and five intrinsic parameters comprising the focal length in x and y, the 2-D location of the principal point in the image, and a pixel skew factor. Given six or more 3-D world points and corresponding 2-D camera image points, it is possible to solve for the camera matrix (Hartley & Zisserman, 2000).
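A minimal sketch of that solution step (Python with NumPy), recovering the 3 × 4 matrix from six world-to-image correspondences by the direct linear transformation. The ground-truth matrix and points are synthetic, used only to exercise the code:

```python
import numpy as np

def camera_matrix(world_pts, image_pts):
    """Solve P (3x4, up to scale) with image ~ P @ [X Y Z 1]^T by
    DLT; needs >= 6 non-degenerate correspondences."""
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        w = [X, Y, Z, 1.0]
        rows.append([*w, 0, 0, 0, 0, *(-u * np.array(w))])
        rows.append([0, 0, 0, 0, *w, *(-v * np.array(w))])
    _, _, vt = np.linalg.svd(np.array(rows))
    return vt[-1].reshape(3, 4)

def project(P, X):
    p = P @ np.append(X, 1.0)
    return p[:2] / p[2]

# Synthetic ground truth used only to fabricate test correspondences.
P_true = np.array([[800., 0., 320., 100.],
                   [0., 800., 240., 50.],
                   [0., 0., 1., 2.]])
world = np.array([[0,0,0],[1,0,0],[0,1,0],[0,0,1],[1,1,0],[1,0,1.]])
image = [project(P_true, X) for X in world]
P_est = camera_matrix(world, image)
print(project(P_est, np.array([0.5, 0.5, 0.5])))  # matches P_true's result
```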
another projector. Such a mapping is equivalent to knowing the 3-D position of each projector pixel on the display surface (Brown, Majumder, & Yang, 2005; Quirk et al., 2006). Display calibration is normally done as part of the system setup, but researchers have also demonstrated camera based calibration methods that run concurrently with the application to continuously refine the geometric and photometric calibration (Yang & Welch, 2001; Cotting, Naef, Gross, & Fuchs, 2004; Johnson & Fuchs, 2007; Zollmann & Bimber, 2007).

Photometric Correction

In a multiprojector display environment, geometric correction alone is not enough to give the user the impression of a single uniform display. There may be a number of photometric inconsistencies between projectors and also within individual projectors themselves. The goal of photometric correction is to eliminate these differences.

Blending Basics

In regions of the display surface where the imagery of multiple projectors overlaps without compensation, a higher intensity will be observed. For example, if the images of two identical projectors overlap, the luminance in the overlap region will be approximately twice that of the neighboring nonoverlapped region. Two blending techniques are commonly used to compensate for this luminance gain—electronic attenuation of the input signal or the placement of a physical aperture mask in the optical path. The naive approach to electronic compensation is to reduce the intensity of each projector's imagery equally at all points in the overlap region by an amount proportional to the number of overlapping projectors. There are two issues with this solution. First, the amount of attenuation needed cannot be correctly computed without knowledge of the projector's transfer response function (luminance output as a function of the input value), or "gamma." In practice, most projectors have an S-shaped response function similar to Figure 3.10, so the attenuation required to reduce the on-screen luminance by a desired percentage is a nonlinear function of the input intensity. Second, slight geometric registration errors or even small lamp differences between projectors will likely still leave an observable boundary if all overlapping pixels are attenuated equally. The human eye is very sensitive to intensity steps and slope discontinuities (Mach bands), so a better approach is to weight the contributions of the projectors with a function that smoothly transitions, or blends, between projectors in the overlap region. Raskar et al. (1999) describe such a method for generating attenuation masks for each projector that also takes the projector's response into account. In practice, a slope-continuous parametric function, such as a cosine curve, rather than a linear ramp, should be used in this computation. Physical aperture masks placed in the optical path (external to the projector, as shown in Figure 3.11) can also be used to blend the projected imagery in overlap
Figure 3.10. Typical Nonlinear Luminance Response of Projectors as a Function of Input
regions, but their application can also be very complex. Since these external masks are not in the focal plane of the projector, the resulting shadow has penumbra and umbra regions. The challenge is physically aligning the aperture masks so the penumbra regions of neighboring projectors overlap precisely within the projector overlap region, while also achieving a combined unity optical gain. Changing the width of the penumbra can be accomplished by moving the mask relative to the projector's optical axis. Achieving unity gain may require developing a mask that optically varies from transparent to opaque in a prescribed manner. These complex optical design considerations and physical placement challenges are easier to overcome in simple projector and screen configurations, such as planar visualization walls or cylindrical arrays, but vendors routinely create masks with curved edges that are optically nonlinear for application in multiprojector dome displays. In summary, while electronic compensation is by far the more flexible, reconfigurable blending solution, aperture masks can provide an absolute black in overlapping regions that electronic attenuation cannot. As black levels are reduced in future generations of projectors, electronic attenuation should come to dominate.

Black Is Not Black

One major advantage of CRT based projectors over today's digital-imager technologies is the ability to adjust black to just human-discernible levels, something today's digital-imaging projectors cannot match. Black,
Figure 3.11. Behind-the-screen view showing the aperture masks placed in the optical path for edge blending of the 24 projector Scalable Display Wall, November 2000. Image courtesy of Dr. Kai Li, Princeton University.
in regions of multiple projector overlap, is brighter than the black in nonoverlapping regions of the display. Blending using electronic input attenuation cannot solve this black-overlap problem, but physical aperture masks by their very nature can provide 100 percent optical attenuation. To achieve absolute black uniformity electronically, Majumder and Stevens (2005) and others have computed a black-offset mask, applied in combination with an alpha-blending mask, to raise the level of blacks in nonoverlapped regions to match the black level in projector overlap regions. This can result in effective black blending, but at the expense of reduced display contrast. Many users or applications may find this reduction in contrast an unacceptable trade-off. With today's commodity projectors, the best blacks are produced by LCoS projectors, followed by DMD and LCD designs. The good news is that better black response is a commonly held design goal and is improving with each generation of imaging device, so this issue may fade away. For example, future high-dynamic-range projectors, built with two image modulators in series, promise drastic improvements in contrast and black levels, and affordable laser projectors will simply make the black issue disappear.
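The sketch below illustrates, under simplifying assumptions (a linearized projector response and a purely horizontal two-projector overlap), how a slope-continuous cosine blend ramp and a black-offset term of the kind described above might be combined into a single per-pixel correction; all parameter values are hypothetical.

```python
import numpy as np

def cosine_ramp(width):
    """Slope-continuous attenuation ramp from 1 to 0 over `width` pixels."""
    t = np.linspace(0.0, 1.0, width)
    return 0.5 * (1.0 + np.cos(np.pi * t))

def build_masks(proj_width, overlap, black_offset):
    """Alpha mask for the left projector of a two-projector pair, plus a
    black-offset mask that raises nonoverlapped blacks toward the doubled
    black level found in the overlap region."""
    alpha = np.ones(proj_width)
    alpha[-overlap:] = cosine_ramp(overlap)      # fade out toward the seam
    offset = np.full(proj_width, black_offset)   # raise blacks where only
    offset[-overlap:] = 0.0                      # one projector contributes
    return alpha, offset

def correct(image_row, alpha, offset):
    """Apply per-pixel attenuation and black offset to linear-light input."""
    return np.clip(alpha * image_row + offset, 0.0, 1.0)

alpha, offset = build_masks(proj_width=1024, overlap=128, black_offset=0.02)
row = np.full(1024, 0.5)            # hypothetical constant mid-gray input
out = correct(row, alpha, offset)
print(out[:3], out[-3:])            # full level on the left, faded at seam
```

The right projector would use the mirrored ramp so that the two contributions sum to unity across the seam; with a real (nonlinear) projector response, the ramp values would first be pushed through the inverse of the measured transfer function.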
Advanced Photometrics

In addition to projector overlap, other sources of luminance variation in a multiprojector design are optical vignetting in each projector, lamp brightness differences from projector to projector, and inverse-square-law luminance variations created by the distance relationships between projectors and the screen surface. To address all these issues, Majumder (2003) developed a unifying color model and new algorithms for computing more sophisticated alpha-blending and black-offset masks based on achieving perceptual display uniformity rather than global intensity uniformity. Ashdown, Okabe, Sato, and Sato (2006) present a content-dependent framework for creating photometric compensation, which, like Majumder's research, balances strict compensation against dynamic range. Color gamut is another property of projectors that may vary across makes and models. A display with color inconsistencies across its extent can be undesirable for the user. It is possible to partially correct for color inconsistencies across projectors by remapping the input values provided to each projector in such a way that the response of the projector to the new input matches the desired color as closely as possible (Wallace, Chen, & Li, 2003). Kresse, Reiners, and Knöpfle (2003) detail a color calibration method and algorithm that corrects color gamut differences between multiple projectors while also addressing the left-right eye color differences that arise with Infitec stereoscopy. Wetzstein and Bimber (2007) have also developed a generalized framework that utilizes a full light-transport model to perform image based radiometric compensation of many advanced photometric issues, including interreflections, refraction, and light scattering. Using a clever approximation of the inverse light-transport matrix, real time results have been demonstrated running on a GPU (shader program) for a single-projector configuration.

Warp-and-Blend Hardware

The warp-and-blend rendering techniques discussed in this section can all be implemented on GPUs from NVIDIA and ATI to support OpenGL and DirectX applications. Many of these advanced algorithms are implemented as custom pixel-shader programs on the GPU. The rendering cost of these operations is negligible compared to the cost of scene rendering in most applications, so no significant additional latency is incurred. Several external warp-and-blend solutions exist, including products from 3D Perception and SEOS Ltd., that are installed in the video stream between the GPU and the projectors. In addition, high end projectors from Barco, Christie Digital Systems, Digital Projection, Inc., 3D Perception, and others have built-in warp-and-blend engines. The setup of external warp-and-blend engines has traditionally been done using screen overlays and a human in the loop making visual decisions about the quality of geometric and photometric continuity. The new trend among display systems vendors is automatic alignment algorithms that utilize one or more cameras for visual feedback. Barco, Mersive Technologies, SEOS, Scalable Display Technologies, and others are currently marketing such systems.
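Returning to the gamut-matching idea discussed above, the sketch below assumes each projector's linearized RGB-to-XYZ behavior has been measured as a 3 × 3 matrix (the matrix values here are hypothetical, as if obtained with a colorimeter); input values for one projector are then remapped so its output matches what the other projector would produce.

```python
import numpy as np

# Hypothetical measured 3x3 matrices mapping linear RGB drive values to
# CIE XYZ output for two projectors.
M1 = np.array([[0.41, 0.36, 0.18],
               [0.21, 0.72, 0.07],
               [0.02, 0.12, 0.95]])
M2 = np.array([[0.44, 0.33, 0.17],
               [0.24, 0.69, 0.07],
               [0.02, 0.10, 0.90]])

# To make projector 2 reproduce the color projector 1 would produce for a
# given input, solve M2 @ rgb2 = M1 @ rgb1 for rgb2.
T = np.linalg.inv(M2) @ M1

rgb1 = np.array([0.8, 0.5, 0.2])       # drive values sent to projector 1
rgb2 = np.clip(T @ rgb1, 0.0, 1.0)     # remapped values for projector 2
print(rgb2, np.allclose(M2 @ rgb2, M1 @ rgb1))
```

Clipping occurs whenever the target color lies outside the second projector's gamut; practical methods such as that of Wallace, Chen, and Li (2003) therefore restrict the display to the gamut common to all projectors.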
Sampling and Latency

Any image resampling operation is prone to sampling artifacts, so output quality differences can be a significant differentiator between warping engines. Warping done on the GPU typically utilizes the texturing hardware to do bilinear interpolation between four input pixels for each output pixel, but even more sophisticated reconstruction filters are possible. Minimizing overall system latency can be an important goal in many high performance simulation environments. One source of additional latency can be the geometric warp stage. Warping done on the GPU with a shader program adds minimal additional rendering cost or delay, but external warp-and-blend hardware inserted between the GPU and the projector can add processing latencies of up to one display frame time, depending on the geometric remapping. For example, if the first output pixel from the warp engine is remapped from the 15th line of the input image, there is a small additional display latency of 15 lines compared to the direct image generator output.
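A minimal sketch of the lookup-table warp with bilinear reconstruction described above (the image and remapping coordinates are synthetic; a GPU shader would perform the same arithmetic per output pixel):

```python
import numpy as np

def warp_bilinear(src, map_xy):
    """Resample `src` (H, W) through a per-pixel lookup table.

    map_xy: (H_out, W_out, 2) array; map_xy[i, j] gives the (x, y)
    source coordinate for output pixel (i, j). Bilinear interpolation
    blends the four surrounding input pixels, as GPU texturing does.
    """
    h, w = src.shape
    x = np.clip(map_xy[..., 0], 0, w - 1.001)
    y = np.clip(map_xy[..., 1], 0, h - 1.001)
    x0, y0 = x.astype(int), y.astype(int)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * src[y0, x0] + fx * src[y0, x0 + 1]
    bot = (1 - fx) * src[y0 + 1, x0] + fx * src[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bot

# Synthetic example: shift a horizontal gradient half a pixel to the right.
src = np.tile(np.arange(8, dtype=float), (8, 1))
ys, xs = np.mgrid[0:8, 0:8].astype(float)
lut = np.dstack([xs - 0.5, ys])    # output pixel (i, j) samples (j - 0.5, i)
print(warp_bilinear(src, lut)[0])  # gradient values shifted by half a step
```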
Application Development

Cluster Rendering Support

There are significant architectural differences between a simulation designed to run on a single PC and one designed for a rendering cluster. New strategies for distributing data (model data and run-time user input), as well as for frame synchronization, must be considered. When developing a new application for cluster rendering, consider VR Juggler, an open source suite of application programming interfaces designed for VR application development with embedded support for distributed cluster rendering. For existing single-PC OpenGL applications, one should consider Chromium for multinode projective display support, as it requires no modification of the application code. Chromium is an OpenGL implementation that does not render the OpenGL stream to a frame buffer but transparently forwards the commands (and data) to render PCs. These render nodes can be customized to do warp-and-blend rendering on the GPU or with external engines. Majumder and Brown (2007) provide a good overview of Chromium for multiprojector display. Other interesting architectures for distributed display include the SAGE (Jeong et al., 2005) and VIRPI (Germans, Spoelder, Renambot, & Bal, 2001) projects.

Warp-and-Blend Software Support

Using an external warp-and-blend engine has the advantage of not requiring any application code changes, but it adds extra expense, incurs a small increase in system latency, and cannot handle view position changes in real time. Warp and blend implemented on the GPU has none of these disadvantages, but it does require an initial investment to develop the warp-and-blend software (no open source implementations yet exist). Warp-and-blend functionality can be distilled
Figure 3.12. Futuristic virtual team training environment. (Top) The room with a few real objects and two trainees in standby mode and (bottom) then in operation. Sketches by Andrei State, University of North Carolina at Chapel Hill.
into two basic functions—a preDraw method that initializes rendering to GPU (texture) memory and a postDraw method that calls the warp-and-blend operator. With this strategy, the application code changes required to add warp and blend are rather simple.
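A minimal structural sketch of this pattern, using NumPy arrays to stand in for GPU texture memory (the class and method names follow the preDraw/postDraw description above but are otherwise hypothetical; a real implementation would render into a texture-backed framebuffer object and warp in a pixel shader):

```python
import numpy as np

class WarpBlend:
    """Wraps an application's per-frame rendering with warp and blend.

    `lut` is a precomputed (H, W, 2) viewer-to-projector remapping table
    and `alpha` an (H, W) attenuation mask, as discussed earlier.
    """
    def __init__(self, lut, alpha):
        self.lut, self.alpha = lut, alpha
        self.offscreen = None

    def pre_draw(self, width, height):
        # On a GPU this would bind an offscreen render target; here we
        # simply allocate a buffer to hold the ideal viewer image.
        self.offscreen = np.zeros((height, width))
        return self.offscreen

    def post_draw(self):
        # Warp the ideal image through the lookup table (nearest neighbor
        # for brevity) and apply the blend mask to get the projector image.
        xi = self.lut[..., 0].round().astype(int)
        yi = self.lut[..., 1].round().astype(int)
        return self.alpha * self.offscreen[yi, xi]

# Hypothetical usage: identity warp, uniform half-strength blend, 4x4 frame.
ys, xs = np.mgrid[0:4, 0:4]
wb = WarpBlend(np.dstack([xs, ys]).astype(float), np.full((4, 4), 0.5))
target = wb.pre_draw(4, 4)
target[:] = 1.0                 # the application renders its scene here
print(wb.post_draw())           # corrected image sent to the projector
```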
CONCLUSIONS AND FUTURE OPPORTUNITIES

Great strides have been made since 2000 in reducing the cost and complexity of building large-scale projective environments. Projector size and costs continue to shrink, and LED light sources will soon replace all hot-filament light sources. New imaging technologies, such as the grating light valve laser projector, hold great promise. Camera based automatic setup and calibration methods demonstrated in the research community are beginning to be incorporated into real world products. Warp-and-blend functionality is becoming more sophisticated as it migrates from external black boxes to programmable pixel-shader algorithms running on GPUs, and application software developers are exploring new frameworks, such as VR Juggler and Chromium, for distributed cluster rendering. However, technology challenges still exist. Projectors with more resolution, better blacks and contrast, and a wider color gamut are needed. Prototype solutions for multiprojector shadow removal and improved depth of field must be taken to the next level of practice, and the search for multiuser stereoscopic solutions will continue. Now imagine a large environment designed for immersive team training that combines some real objects with mostly virtual objects, as shown in Figure 3.12. Autostereoscopic display, full-body tracking, real time physical simulation, spatialized audio, and tools for generating training scenarios are a few of the technologies that must be developed and integrated. Can such a projective environment be our next reality?

REFERENCES

Allard, J., Gouranton, V., Lamarque, G., Melin, E., & Raffin, B. (2003). SoftGenLock: Active stereo and genlock for PC cluster. Proceedings of the Workshop on Virtual Environments 2003—EGVE '03 (pp. 255–260). New York: ACM.

Ashdown, M., Flagg, M., Sukthankar, R., & Rehg, J. (2004). A flexible projector-camera system for multi-planar display. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition—CVPR '04 (Vol. 2, pp. 165–172). Washington, DC: IEEE Computer Society.

Ashdown, M., Okabe, T., Sato, I., & Sato, Y. (2006, June). Robust content-dependent photometric projector compensation. Paper presented at the Third International Workshop on Projector-Camera Systems—PROCAMS '06, New York, NY.

Bimber, O., Wetzstein, G., Emmerling, A., & Nitschke, C. (2005). Enabling view-dependent stereoscopic projection in real environments. Proceedings of the 4th IEEE/ACM International Symposium on Mixed and Augmented Reality—ISMAR '05 (pp. 14–23). Washington, DC: IEEE Computer Society.

Bouguet, J. Y. (2008). Camera calibration toolbox for Matlab. Retrieved April 18, 2008, from http://www.vision.caltech.edu/bouguetj/calib_doc/index.html

Brown, M. S., Majumder, A., & Yang, R. (2005). Camera-based calibration techniques for seamless multi-projector displays. IEEE Transactions on Visualization and Computer Graphics, 11, 193–206.
Brown, M. S., Song, P., & Cham, T. J. (2006). Image pre-conditioning for out-of-focus projector blur. Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition—CVPR '06 (pp. 1956–1963). Washington, DC: IEEE Computer Society.

CAORF. (1975). Simulation at U.S. Merchant Marine Academy. Retrieved April 18, 2008, from http://www.usmma.edu/admin/it/simulator.htm

Cham, T. J., Sukthankar, R., Rehg, J. M., & Sukthankar, G. (2003). Shadow elimination and occluder light suppression for multi-projector display. Proceedings of the International Conference on Computer Vision and Pattern Recognition—CVPR '03 (Vol. 2, pp. 513–520). Washington, DC: IEEE Computer Society.

Chen, H., Sukthankar, R., Wallace, G., & Li, K. (2002, October). Scalable alignment of large-format multi-projector displays using camera homography trees. Paper presented at the Thirteenth IEEE Conference on Visualization—VIS '02, Boston, MA.

Cotting, D., Naef, M., Gross, M., & Fuchs, H. (2004, November). Embedding imperceptible patterns into projected imagery for simultaneous acquisition and display. Proceedings of the Third International Symposium on Mixed and Augmented Reality—ISMAR '04 (pp. 100–109). Washington, DC: IEEE Computer Society.

Cruz-Neira, C., Sandin, D. J., & DeFanti, T. A. (1993). Surround-screen projection-based virtual reality: The design and implementation of the CAVE. Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques—SIGGRAPH '93 (pp. 135–142). New York: ACM.

Germans, D., Spoelder, H., Renambot, L., & Bal, H. (2001, May). VIRPI: A high-level toolkit for interactive scientific visualization in virtual reality. Paper presented at the Immersive Projection Technology/Eurographics Virtual Environments Workshop, Stuttgart, Germany.

Hartley, R., & Zisserman, A. (2000). Multiple view geometry in computer vision. Cambridge, United Kingdom: Cambridge University Press.

Jaynes, C., Webb, S., Steele, R. M., Brown, M., & Seales, W. B. (2001). Dynamic shadow removal from front projection displays. Proceedings of the Conference on Visualization—VIS '01 (pp. 175–182). Washington, DC: IEEE Computer Society.

Jeong, B., Jagodic, R., Renambot, L., Singh, R., Johnson, A., & Leigh, J. (2005, October). Scalable graphics architecture for high-resolution displays. Paper presented at the IEEE Information Visualization Workshop, Minneapolis, MN.

Johnson, T., Gyarfas, F., Skarbez, R., Towles, H., & Fuchs, H. (2007). A personal surround environment: Projective display with correction for display surface geometry and extreme lens distortion. Proceedings of the Annual IEEE Conference on Virtual Reality—VR '07 (pp. 147–154). Washington, DC: IEEE Computer Society.

Johnson, T., & Fuchs, H. (2007, June). Real-time projector tracking on complex geometry using ordinary imagery. Paper presented at the IEEE International Workshop on Projector-Camera Systems—PROCAMS 2007, Minneapolis, MN.

Kannala, J., & Brandt, S. (2006). A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(8), 1335–1340.

Kresse, W., Reiners, D., & Knöpfle, C. (2003). Color consistency for digital multiprojector stereo display systems: The HEyeWall and the digital CAVE. Proceedings of the Workshop on Virtual Environments (pp. 271–279). New York: ACM.
Majumder, A. (2003). A practical framework to achieve perceptually seamless multi-projector displays. Unpublished doctoral dissertation, University of North Carolina, Chapel Hill.

Majumder, A., & Brown, M. S. (2007). Practical multi-projector display design. Wellesley, MA: A. K. Peters.

Majumder, A., & Stevens, R. (2005). Perceptual photometric seamlessness in tiled projection-based displays. ACM Transactions on Graphics, 24(1), 118–139.

Quirk, P., Johnson, T., Skarbez, R., Towles, H., Gyarfas, F., & Fuchs, H. (2006, October). RANSAC-assisted display model reconstruction for projective display. Paper presented at the IEEE VR 2006 Workshop on Emerging Display Technologies, Nice, France.

Raij, A., Gill, G., Majumder, A., Towles, H., & Fuchs, H. (2003, October). PixelFlex2: A comprehensive, automatic, casually-aligned multi-projector display. Paper presented at the IEEE International Workshop on Projector-Camera Systems—PROCAMS '03, Nice, France.

Raskar, R. (2000, March). Immersive planar displays using roughly aligned projectors. Paper presented at the Annual IEEE International Conference on Virtual Reality—VR 2000, New Brunswick, NJ.

Raskar, R., Brown, M., Yang, R., Chen, W. C., Welch, G., Towles, H., et al. (1999). Multi-projector displays using camera-based registration. Proceedings of the Conference on Visualization—VIS '99 (pp. 161–168). Washington, DC: IEEE Computer Society.

Raskar, R., Welch, G., Cutts, M., Lake, A., Stesin, L., & Fuchs, H. (1998). The office of the future: A unified approach to image-based modeling and spatially immersive displays. Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques—SIGGRAPH '98 (pp. 179–188). New York: ACM.

Rehg, J., Flagg, M., Cham, T., Sukthankar, R., & Sukthankar, G. (2002). Projected light displays using visual feedback. Proceedings of the International Conference on Control, Automation, Robotics, and Vision—ICARCV '02 (Vol. 2, pp. 926–932). Washington, DC: IEEE Computer Society.

State, A., Welch, G., & Ilie, A. (2006). An interactive camera placement and visibility simulator for image-based VR applications. Proceedings of the Eighteenth Annual Symposium on Electronic Imaging Science and Technology—IS&T/SPIE '06 (pp. 640–651). Bellingham, WA: SPIE.

Stone, M. (2001). Color and brightness appearance issues in tiled displays. IEEE Computer Graphics and Applications, 21(5), 58–66.

Sukthankar, R., Cham, T. J., & Sukthankar, G. (2001). Dynamic shadow elimination for multi-projector displays. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition—CVPR '01 (Vol. 2, pp. 151–157). Washington, DC: IEEE Computer Society.

Surati, R. (1999). Scalable self-calibrating display technology for seamless large-scale displays. Unpublished doctoral dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Wetzstein, G., & Bimber, O. (2007). Radiometric compensation through inverse light transport. Proceedings of the 15th Pacific Conference on Computer Graphics and Applications—PG '07 (pp. 391–399). Washington, DC: IEEE Computer Society.

Yang, R., & Welch, G. (2001, February). Automatic and continuous projector display surface calibration using every-day imagery. Paper presented at the 9th International Conference in Central Europe on Computer Graphics, Visualization, and Computer Vision—WSCG '01, Plzen, Czech Republic.

Zollmann, S., & Bimber, O. (2007). Imperceptible calibration for radiometric compensation. Short Paper Proceedings of the Twenty-Eighth Annual Conference of the European Association for Computer Graphics (pp. 61–64). Aire-la-Ville, Switzerland: Eurographics Association.
Chapter 4
AUDIO

Ramy Sadek

Sound is a deceptively simple yet fundamental experience in daily life. Listeners derive much information about their surroundings through hearing. Characteristics of a surrounding space, as well as the locations, velocities, and sizes of scene elements, are a few examples of information gathered by listening. In this sense, the ears lead the eyes, telling them where to look. Auditory stimuli—if reproduced correctly—form a powerful link between participants' virtual worlds and the physical space in which training occurs. Conversely, poor audio reproduction creates an incongruity between the two spaces that marks the virtual space as clearly unreal, shattering the illusion of the virtual environment. When virtual environment (VE) training does not get audio right, the "virtual" aspect may be meaningless. The familiar nature of auditory experiences leads to an intuitive understanding of sound and its behavior, leading many to oversimplify the delivery of audio. However, there are many challenges in reproducing the complex interactions of sound with the environment and the human auditory system. This chapter provides a general introduction to a variety of topics. Most of these topics are complex, making a complete treatment beyond the scope of the present discussion. The compromise is to include the information relevant to practitioners setting up an audio system for use in a virtual environment while offering suggested reading for thorough detail and advanced topics. Discussion follows the "99 percent rule," meaning definitions and explanations are true in essence, avoiding rigorous detail in favor of clarity and practicality. Beginning with fundamentals, the chapter covers the design issues, implementation details, and trade-offs involved in such a setup. The first two sections, "What Is Sound?" and "Psychoacoustics," cover the physical properties of sound and basic psychoacoustics. The next two sections discuss basic signal processing ideas and considerations for virtual environments, such as visual displays and rendering techniques. Loudspeakers and headphones are discussed in detail, and basic safety procedures are outlined, which should be employed in all VEs. Finally, hardware and software, environmental effects, and the trade-offs between them are covered.
WHAT IS SOUND? PHYSICAL QUANTITIES, WAVES, AND DECIBELS

In order to understand the issues involved in the design and setup of an audio system, it is important to first understand sound. In broad terms, the word sound refers to vibrations in the air within the audible range. Specifically, these vibrations are air-pressure fluctuations varying with time and space called pressure waves. There are two characteristic types of wave: transverse and longitudinal. Transverse waves propagate perpendicular to the motion defining the wave. For example, fluffing a sheet when making a bed creates a vertical displacement that travels horizontally along the length of the sheet, so the motion of this transverse wave is perpendicular to its direction of travel. Conversely, longitudinal waves propagate in the same direction as the wave motion. Consider a tube open at one end, with a plunger at the other end. Moving the plunger forward into the tube increases the air pressure near the plunger in the direction of the plunger's motion. The pressure moves along the length of the tube toward the opening. Longitudinal pressure waves are the type of waves that comprise sound. Sinusoids are a simple way to examine waves since a complete description of a sinusoid requires only three parameters: amplitude, frequency, and phase. Each of these variables plays a key part in the setup of an audio system and so merits a brief review. Amplitude refers to the vertical extent of the wave about its center line; put another way, amplitude represents the magnitude of oscillation. For example, a sine wave centered about the origin that ranges between −1 and +1 has amplitude 1, since each oscillation has a displacement magnitude of 1. Frequency, measured in hertz (Hz), refers to the rate of oscillation; 1 Hz equals one oscillation per second. It follows that fast oscillations have high frequency values, yielding a high pitched tone, while slow oscillations are of low frequency, creating low pitches. Frequency and wave period are inversely related. Period refers to the time required for the wave shape to repeat itself, while wavelength refers to the distance required for repetition. Therefore, wavelength refers to spatial quantities only, and frequency refers to temporal quantities. Phase refers to the time displacement of the sinusoid. For example, a sine wave (with zero phase) equals zero at the origin. Moving the sine wave along the x direction until its value at the origin equals 1 yields a wave identical to a cosine; recall from trigonometry that a sine wave shifted 90° equals a cosine. So cosine equals a phase-shifted sine wave. Intensity refers to the average power per square meter at a given distance from the source. Energy in this case grows proportionally to the squared amplitude of the wave. Since sound waves emanate radially from a source, they form spherical wavefronts whose surface area grows with distance. The surface area of a sphere grows proportionally to its radius squared; therefore, the energy per unit area decays with the inverse square of the distance. Intensity varies directly with squared amplitude and inversely with squared distance.
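The short sketch below generates a sinusoid from the three parameters just described and checks the inverse-square intensity relationship; the parameter values are illustrative only.

```python
import numpy as np

def sinusoid(amplitude, frequency, phase, duration=1.0, rate=48_000):
    """Sample A * sin(2*pi*f*t + phase) at `rate` samples per second."""
    t = np.arange(int(duration * rate)) / rate
    return amplitude * np.sin(2 * np.pi * frequency * t + phase)

# A 440 Hz tone with amplitude 1; shifting the phase by 90 degrees (pi/2)
# turns the sine into a cosine, as described above.
sine = sinusoid(1.0, 440.0, 0.0)
cosine = sinusoid(1.0, 440.0, np.pi / 2)

def intensity(power_watts, distance_m):
    """Power spread over a sphere of the given radius (W per square meter)."""
    return power_watts / (4 * np.pi * distance_m ** 2)

# Inverse-square law: doubling the distance quarters the intensity.
print(intensity(1.0, 1.0) / intensity(1.0, 2.0))  # -> 4.0
```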
Often it is necessary to compare sound intensities. Because the range of audible intensities is very large, it is helpful to use logarithmic units to describe quantities such as intensity and amplitude. The decibel (dB) is defined to be 10 · log10(I1/I2). That is, a decibel is 10 times the logarithm of the ratio of two intensities. Since decibels are defined in terms of a ratio between two quantities, the measure is a relative one. Therefore, expressing a given quantity in dB requires an implicit comparison against a standard reference. The standard reference level for sound intensity, I0, equals 10⁻¹² watts per square meter, which is (roughly) the lowest sound intensity audible by humans. Often other quantities, such as power and pressure, are measured as ratios of squares, in which case the decibel is 20 · log10(A1/A2), since log(x^y) = y · log(x). To avoid confusion when reading specifications in decibels, keep in mind whether measured quantities are direct ratios or ratios of squares. For additional detail on waves and physical quantities, see Halliday, Resnick, and Walker (2007).

PSYCHOACOUSTICS

Psychoacoustics: Loudness, Frequency, and Delay

Loudness is the impression of intensity as interpreted by the auditory system; however, loudness is not a function of intensity alone, nor does it vary proportionally to intensity. Rather, loudness is a function of several factors, frequency foremost among them. The humanly audible frequencies range between approximately 20 and 20,000 Hz (20 kHz [kilohertz]); see Figure 4.1. Fletcher-Munson curves are contours relating perceived equal loudness to sound-pressure level (vertical axis) and frequency (horizontal axis). The contours denote the sound-pressure levels (SPLs) at which frequencies are perceived to be equally loud. Another way to read the graph in Figure 4.1 is as a map of sensitivity to frequencies. Humans are most sensitive to frequencies in the middle range, as indicated by the lowest parts of the equal loudness contours. The vertical axis (in dB) is logarithmic, so values along the contours span a great range. Consider, for example, the bottom contour, roughly 8 dB above reference level for frequencies near 2 kHz. A 50 Hz tone of equal loudness would require approximately 50 dB of amplification to sound as loud as the 2 kHz tone: a factor of over 300 in amplitude, or 100,000 in power! This places demanding requirements on an audio system that (ideally) should reproduce the entire frequency range smoothly, without audible noise in the softest sounds or distortion in the loudest sounds. There is a complex relationship between loudness and delay as well. For example, different combinations of delay and intensity can yield the same perceived source location. Delay refers to the time separating instances of similar sounds, for example, an echo. In general, delays up to 50 ms sound as though they are a single sound rather than a sound and an echo. The exact time at which this separation occurs depends highly on the nature of the sound. For example, clicks and other sharp sounds separate even when the delay is comparatively brief, while other sounds may support longer delays without separation.
Figure 4.1. The human audible frequencies range between approximately 20 and 20,000 Hz (20 kHz [kilohertz]).
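A small sketch of the decibel arithmetic defined above, distinguishing intensity (power-like) ratios from amplitude (pressure-like) ratios:

```python
import math

I0 = 1e-12  # reference intensity in watts per square meter

def intensity_db(intensity):
    """Sound intensity level: 10 * log10 of a power-like ratio."""
    return 10 * math.log10(intensity / I0)

def amplitude_db(a1, a2):
    """Level difference for amplitude-like quantities (ratio of squares)."""
    return 20 * math.log10(a1 / a2)

print(intensity_db(1e-12))   # 0 dB: the threshold of hearing
print(intensity_db(1e-6))    # 60 dB: roughly conversational-speech level
print(amplitude_db(316, 1))  # ~50 dB: the amplification factor in the text
```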
Delay is an important consideration in loudspeaker setup and calibration, as it can greatly affect source localization, as will be described later.

Spatial Hearing

The auditory system localizes sound sources in several ways, each of which has important implications for audio setups in virtual environments. Two quantities are of primary importance: interaural intensity difference (IID) and interaural time difference (ITD). IID is the difference in acoustic energy received at the two ears, while ITD is the difference in arrival time between the two ears. Consider a source directly left of the listener. Sound waves from this source reach the left ear before the right ear (ITD); the right ear also receives less acoustic energy from the source (IID) due to the listener's head shadow. IID and ITD are two components of a more general set of auditory cues described as a head-related transfer function (HRTF). HRTFs are based on anatomy: the shape of the ears (pinnae), head, chest, and shoulders define the
HRTF. Each anatomical interaction alters sound waves differently, affecting the final waveform that reaches the eardrum. Resultant waveforms may differ greatly from the original source; hence the brain must decipher these differences to determine the spatial location of the source. The following sections discuss IID and ITD independently, but bear in mind that they are closely related components of an HRTF. Considering the HRTF as a function with phase, amplitude, and spectral components, IID is the amplitude component, ITD is the phase component, and reflections and absorptions by the body comprise the spectral component. Multiple spatial positions can produce the same IID or ITD. Specifically, a torus of points equidistant from both ears, called a torus of confusion, yields identical values. Because of this ambiguity, the IID and ITD components alone may not lead to accurate localization. Only the full HRTF, with phase, amplitude, and spectral components, provides enough information for accurate localization. Additionally, both IID and ITD localization are frequency dependent, with ITD dominating localization below roughly 700 Hz and IID dominating above roughly 1.5 kHz. For these reasons, a good HRTF implementation ought to outperform an amplitude or delay panner alone. Finally, note that the auditory system localizes frontal sources most accurately, while accuracy diminishes toward the sides and rear. Similarly, accuracy decays with vertical angle, yielding relatively poor localization at moderate angles above and below the listener. See Blauert (1997) for details on spatial hearing.

Source Localization, IID, ITD, and Precedence

For a source directly in front of a listener (centered along the medial axis), ITD and IID equal zero because the paths from the source to each ear are the same length and experience no head shadow. As the source moves to the left or right, ITD and IID vary accordingly. The converse holds as well; given two identical waveforms sent to each ear, the auditory system will perceive a single central source since ITD and IID equal zero. Adjusting the loudness or delay of the waveforms to create a nonzero IID or ITD changes the angular displacement of the perceived source. The following subsections explain these phenomena in greater detail.

Amplitude: IID Localization

When there is no ITD, intensity plays a significant part in source localization. In this situation, the auditory system localizes based on loudness. For example, consider two loudspeakers in front of and facing a listener positioned between them. ITD equals zero. If the two loudspeakers emit the same signal, IID equals zero as well. The listener perceives a single source image centrally positioned between the loudspeakers. Increasing the amplitude of one loudspeaker signal moves the perceived source toward that loudspeaker; in a sense, that loudspeaker has more weight in the localization.

Precedence: ITD Localization

The auditory system follows "the law of the first wavefront," interpreting source locations based on the direction from which acoustic energy first arrives.
This is commonly referred to as precedence. Whereas ITD measures the difference in arrival times of a single wavefront for a given incidence angle, the precedence effect refers to localization based on the ITD of the first wavefront of a sound arriving from multiple directions. For example, consider a set of loudspeakers and a listener who is close to a particular loudspeaker. If the loudspeakers emit identical sounds, the listener will localize based on the ITD from the direction of the nearest loudspeaker because its wavefront is the first arrival. This effect holds even when the nearest loudspeaker is significantly less loud than the farther loudspeakers. Therefore, it is possible—within certain limits—to achieve the same perceived angle for a variety of intensity and delay combinations, which can be useful when calibrating an audio setup (see Rumsey, 2001).

BASIC DIGITAL SIGNAL CONCEPTS

Quantization

Digital signals are discrete representations of functions that are continuous in both scale and time; that is, the original functions span an infinite number of values over an infinite number of points in time. Computers are able to process only discrete values, so continuous signals are discretized in both scale and time. There is inherent error in discretization. The magnitude of the error is related to the size of the discrete steps used to represent the continuous signal. Greater precision in amplitude requires increased bit depth per sample. Similarly, representing higher frequencies requires a higher sampling rate. Both increase the amount of data required to represent the signal in digital format.

Nyquist Frequency

Representing a signal with maximum frequency F requires a sampling rate greater than 2F samples per second. Put another way, a digital signal generated by sampling at 2F samples per second can faithfully represent only frequencies below F Hz. This minimum rate, 2F, is called the Nyquist rate, after the physicist Harry Nyquist; conversely, the highest frequency faithfully representable at a given sampling rate (half that rate) is called the Nyquist frequency. This relationship requires that, for faithful reproduction, the input signal include no frequencies above F. A signal whose frequency content is confined to such a range, or band, is said to be band limited. Because the auditory system is unable to detect frequencies above 20 kHz, all humanly audible signal components are band limited. So a sampling rate at or above 40 kHz can reconstruct any humanly audible sound. Sampling rates below 40 kHz lose a portion of the audible range.
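A small sketch demonstrating the sampling-rate requirement: a tone above the Nyquist frequency aliases onto a lower frequency, so its samples are indistinguishable from those of an in-band tone.

```python
import numpy as np

def sampled_tone(freq_hz, rate_hz, n_samples=64):
    """Sample a unit-amplitude sinusoid at the given rate."""
    t = np.arange(n_samples) / rate_hz
    return np.sin(2 * np.pi * freq_hz * t)

# A 30 kHz tone sampled at 48 kHz (Nyquist frequency 24 kHz) aliases:
# its samples fold down to 18 kHz (48 - 30 = 18), with inverted phase.
alias = sampled_tone(30_000, 48_000)
folded = sampled_tone(18_000, 48_000)
print(np.allclose(alias, -folded))  # True: the two are indistinguishable
```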
Certain assumptions of the mathematical model used to derive the Nyquist rate are physically unrealizable as circuits, so it is necessary to sample at a rate slightly faster than 40 kHz to allow reproduction of the entire audible frequency range when converting from digital signals to continuous (analog) signals (Watkinson, 2001).

CONSIDERATIONS FOR VIRTUAL ENVIRONMENTS

In a virtual environment, poor sound quality stands out. When audio fails to match visual cues in a film, audiences notice immediately, breaking their immersion. In virtual environments, a well designed audio system guides listeners to perceive audio cues as though emanating from the virtual world, aiding immersion and suspension of disbelief. Conversely, poor setups lead listeners to perceive sound as emanating from loudspeakers in specific locations, reinforcing the fact that the training experience is not "real." It is a common view that when audio is "done right," only experts can tell, but when "done wrong," all will notice. In other words, audio goes unnoticed unless it is malfunctioning. This view is misleading. A proper audio setup calls little attention to the system itself (for example, through noise artifacts) and yields high sound quality appreciable by all audiences, not only experts. Experts listen actively for specific technical problems common to such systems, but the immersive, engaging effects of good sound are accessible to all listeners.

Implications for VE Design

Careful setup is crucial to achieve high quality audio. Decisions made during the design stage of the VE can greatly affect the audio quality. From the choice of visual display devices to the geometry of the room housing the VE setup, considering audio early in the design stages avoids difficult work-arounds, saving time and improving results. Often the audio setup comes up late in the design process, reducing options and complicating design, which leads to a difficult implementation. Early focus on the audio setup during design saves time, money, and headaches. Visual displays are common acoustic impediments, often interfering with placement of front loudspeakers. Unfortunately, there is no universal solution to this problem; all work-arounds are compromises with varying effectiveness. The visual display problem is one to tackle at the beginning of design to allow successful compromises.

Integration with Visual Displays

Currently, front projection display systems offer the best compromise, allowing high quality visual and audio performance. Recent advancements in screen manufacturing yield nearly acoustically transparent screens with excellent visual performance characteristics. These screens slightly attenuate high frequencies, but this is correctable with an equalizer. Some screen manufacturers offer equalizers pretuned to correct for their screens. Such an arrangement may save time and money; however, most virtual environment setups require significant equalization and calibration, eliminating the benefits of a preset device.
Front projection systems with acoustically transparent screens are the best choice because they allow high performance of both audio and video. From an audio perspective, front projection avoids many problems, leading to an easier setup and better results than are possible with other visual displays. When front projection is not feasible, the alternative options are hard to rank. When evaluating alternatives, there are a few things to consider. Loudspeakers must be unoccluded, pointing directly at the center of the listening space. Also, the auditory system is more sensitive to horizontal angles than vertical ones. Therefore, it is possible to place the front speakers above or below the display if the displacement is only a few degrees from planar. Large vertical displacement angles in front are ineffective, creating more problems than they solve, so it is unhelpful to raise or lower the loudspeaker more than a few degrees. To the sides and rear, the auditory system is less sensitive to angular displacement, allowing vertical deviations of up to 45°. Note that 45° is the extreme maximum, and certain material may sound objectionable with such large displacements. The greater the vertical angle of the rear loudspeakers, the more their spatial efficacy diminishes (Holman, 1999). In general, the main loudspeaker positions should be as planar as possible, with deviations the exception rather than the rule. In some virtual environments, participants may view the virtual world from any direction (for example, head-mounted displays, curved screen enclosures, and so forth), so there is no sense of "front" and "rear" loudspeakers. These configurations require a larger number of loudspeakers for accurate localization. Frontal imaging is most effective using three channels (right, left, and center) with a 30° angle between loudspeakers. The center channel stabilizes the audio image, which, in a two-channel configuration, is highly sensitive to precedence. This stability is very important in virtual environments as it helps to avoid the "snapping" effect wherein sound positions "snap" to the nearest loudspeaker when the participant moves or rotates his or her head. The 30° criterion demands 12 loudspeakers to cover 360° (for example, with an HMD setup), yet few commercial audio systems offer spatialization over 12 loudspeakers. Some systems do offer this capability, but have other trade-offs.
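To make the loudspeaker-ring discussion concrete, the sketch below computes constant-power (sine/cosine) gains for the two loudspeakers adjacent to a desired source angle on an equally spaced ring such as the 12-channel layout just mentioned. This is one simple instance of the IID-based amplitude panning discussed earlier, not a prescription from the text; the layout values are illustrative.

```python
import numpy as np

def ring_pan(source_deg, n_speakers=12):
    """Constant-power pan across the two nearest loudspeakers on a ring.

    Returns per-loudspeaker gains whose squares sum to 1, so perceived
    loudness stays constant as the source angle changes.
    """
    spacing = 360.0 / n_speakers
    lower = int(np.floor(source_deg / spacing)) % n_speakers
    upper = (lower + 1) % n_speakers
    frac = (source_deg % spacing) / spacing      # 0 at lower, 1 at upper
    gains = np.zeros(n_speakers)
    gains[lower] = np.cos(frac * np.pi / 2)
    gains[upper] = np.sin(frac * np.pi / 2)
    return gains

g = ring_pan(45.0)                      # halfway between the 30° and 60°
print(np.round(g, 3), np.sum(g ** 2))   # speakers: gains 0.707 each,
                                        # total power 1.0
```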
Room Acoustics

Room acoustics greatly affect audio system performance. The topic has been widely covered; however, most discussion has focused on sound in public spaces or on studio sound for post-production. While the same principles hold, the special needs of VEs emphasize and prioritize these factors differently. In VEs, there are four primary concerns regarding room acoustics: ambient noise, standing waves, reverberance, and uneven room response. The goal of calibrating the audio system (see "Setup and Calibration") is to negate these effects. Ambient noise must be reduced as much as possible since it severely detracts from the quality of reproduction. Some experts have drawn an analogy to shining a
light on a video screen or jittering the picture: these effects are irritating and destroy immersion. Similarly, white noise, hums, hisses, clicks, and so forth detract immediately from the performance of the audio system. For example, it would be counterproductive to spend money and effort on a well designed and calibrated setup and then to leave noisy computers in the listening area. When selecting a room for the VE setup, avoiding such loud building elements as large electrical transformers, elevators, or boiler rooms is essential, since counteracting the sound of such massive objects is exceedingly difficult. When building a space specifically for a VE setup, use double walls and raised floors to isolate the room from outside noises. Ducts for heating and ventilation systems (even those designed for silence) should be fitted with acoustic vents. Consult an architect familiar with studio construction to ensure good design. Computers and other noisy equipment are best placed in a different room whenever possible, or at least in acoustic cabinets. When a room dimension exactly matches a multiple of half a wavelength, a standing wave occurs: if the distance between two walls is an exact multiple of half the wavelength, the room will have a standing wave of the corresponding frequency. Standing waves also occur at all integer multiples of that frequency, since their half wavelengths also fit the room dimension. They are called "standing" waves because their pattern does not travel; plotted over time, they appear to stand in place, varying in amplitude but not in position. In particular, there are points, called nodes, that undergo zero displacement as the wave oscillates. So a frequency may be lost completely at a node's position, yet elsewhere in the room it may be prominent. These waves are detrimental to sound quality since they create large differences in frequency response throughout the listening space. It is a common misconception that nonparallel walls do not create standing waves. In fact, such a room will have standing waves at all wavelengths between the minimum and maximum separation distances. Rectangular rooms have the most predictable, controllable behavior, while exotic room shapes are often problematic. Room modes, the resonant frequencies of a room's geometry, are closely related to standing waves. Room modes occur when sound waves of a particular frequency reflect between two or more walls along a path that is an integer multiple of the wavelength. In other words, the reflected wave aligns perfectly with the source, creating resonances that overemphasize certain frequencies, coloring the sound. Careful selection of room dimensions and acoustic treatments are the best ways to minimize the effects of room modes. See Holman (1999) for details on selecting room dimensions. Reverberance is due to the reflections of sound waves encountering surfaces. The character of the reverberance is a function of room dimensions, as well as the material properties of the reflecting surfaces. Because of this variability, each space has its own acoustic signature. Early reflections are the most significant component of this signature. To allow the training space to take on the acoustic signature of the virtual environment, the audio system must counteract the reverberance of the listening environment as much as possible. While
complete elimination of room effects is generally not feasible, acoustic treatments and equalization can suppress the room signature sufficiently to allow a neutral listening environment and successful application of virtual reverberance (see "Environmental Effects"). Uneven room responses complicate suppression of room effects. Because the frequency response and reverberance vary spatially, different points in the listening space affect sound very differently. Therefore, room equalization is essential for successful audio (see "Room Equalization"). Rooms with little reverberance and a flat frequency response, known as "dead" rooms, are preferable for VEs since they do not interfere with the virtual scene. Dead rooms do have a trade-off: they require significantly more sound input to sound natural, which tends to yield a "brighter" sound from loudspeakers.

Acoustic Treatments

There are two types of acoustic treatments: absorptive and diffusive. Absorptive treatments diminish reverberance by absorbing acoustic energy over a range of frequencies. No single absorber type functions well over the entire frequency range. Therefore, when selecting absorbers, it is important to select a set that ensures coverage of the entire audible frequency range. Diffusors, on the other hand, absorb very little sound, spreading incoming energy in all directions and thereby weakening reflections. Effective room treatments utilize both diffusion and absorption. Current best practices place them asymmetrically, with diffusors spatially opposing absorbers and vice versa. In other words, diffusors and absorbers alternate and are spaced such that diffusors face absorbers on opposing walls. This arrangement ensures that each wave front is diffused and absorbed consecutively, allowing absorbers to work together while preventing intense early reflections. Placing absorbers opposite one another creates the possibility of waves fluttering back and forth between them, with slow decay; asymmetric installation works most effectively. It is not generally necessary to cover the walls from floor to ceiling; rather, treatment panels can be vertically centered about the average listening position. Some manufacturers (for example, StudioPanel and Auralex) offer installation advice or design software that can help plan the layout pattern in oddly shaped rooms. Note that the principles of asymmetric layout and absorption over the entire audio range will be highly effective for most VEs. Floors and ceilings present some special problems. Ceilings should be treated in a manner similar to the walls if possible. In spaces with tiled ceilings, replacing a portion of the tiles with Nubby acoustic tiles is fairly effective. Usually heavy floor treatment is not feasible in VEs; however, carpeting with thick rubber underlayment is effective and is certainly a great improvement over such hard surface materials as concrete or hardwood. Any large, hard surfaces, such as doors, pillars, and cabinetry, should also undergo acoustic treatment.
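A small sketch computing the axial standing-wave (room mode) frequencies described earlier from room dimensions, using f = n · c / (2L) and an assumed speed of sound of 343 m/s; the room dimensions are hypothetical.

```python
# Axial room modes: frequencies whose half wavelengths fit an integer
# number of times between two parallel walls (f = n * c / (2 * L)).
SPEED_OF_SOUND = 343.0  # meters per second at room temperature

def axial_modes(length_m, count=5):
    return [n * SPEED_OF_SOUND / (2 * length_m) for n in range(1, count + 1)]

# Hypothetical 6.0 x 4.5 x 3.0 m listening room.
for dim in (6.0, 4.5, 3.0):
    print(f"{dim} m axis:", [round(f, 1) for f in axial_modes(dim)], "Hz")
```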
Loudspeaker Delivery Algorithms

Rendering Techniques

When designing an audio setup and weighing the pros and cons of each trade-off, it is helpful to have a basic understanding of rendering algorithms, their assumptions, and their requirements.

Amplitude

Amplitude based schemes are the simplest, most common audio imaging methods. These algorithms leverage the principle of localization based on IID (see "Amplitude: IID Localization"). By increasing the loudness of particular loudspeakers while diminishing that of others, the algorithm alters the perceived angle of the source image. By maintaining a constant power output over all angles, the algorithm moves the source image with no changes in loudness. That is, although the amplitudes of particular loudspeakers change, the total power incident at the ear is constant. Amplitude schemes assume that all loudspeakers are effectively equidistant from the listening position, so they yield equal intensity at the listening position and their wave fronts arrive simultaneously. If the loudspeakers cannot be placed equidistantly from the listening position, amplitude and delay adjustments can compensate for small differences (see Rumsey, 2001). Amplitude based schemes create virtual sources strictly on the loudspeaker boundary; they cannot produce a perceived image closer to or farther from the listener than the loudspeakers. Amplitude schemes are sensitive to listener location. If the listener is too close to a loudspeaker, precedence effects dominate localization. Amplitude techniques suffer from "sweet spot" problems, meaning the effect falters outside a small, central area. Room equalization techniques can help widen the sweet spot to an acceptable size. Finally, amplitude techniques also lack an elevation model, though naive attempts offer reasonable results by using a large number of closely spaced loudspeakers.

Multichannel, 5.1, 7.1, and 10.2 Systems

Multichannel formats are conventionally referred to by two numbers separated by a period or "point" (for example, 5.1, 7.1, and 10.2). The number before the point refers to the number of loudspeakers, while the second number refers to the number of subwoofers in the system. This nomenclature does not specify the locations of the loudspeakers, though common usage has affiliated some names with particular layouts. For example, "5.1" usually refers to the setup with three frontal loudspeakers and two rear (surround) loudspeakers. The 5.1 format is often misunderstood due to the name "surround sound." This term leads many to incorrectly assume that the 5.1 format allows for 360° virtual source placement. Rather, source placement suffers from large "holes" to the sides and rear, where the loudspeakers are too far apart for stable imaging. The three frontal loudspeakers provide fairly precise and stable source placement,
while the rear two loudspeakers are intended for ambient effects, like reverberance or background sounds, to give a sense of the spatial environment. As such, this setup is suitable for environments where the participants will face only forward and surround imaging is not a priority. The 10.2 format aims to address these shortcomings by adding loudspeakers to the sides and rear, as well as two dipole loudspeakers (for diffuse field) and two height channels, which emulate early ceiling reflections in the virtual space (the most important cue affecting perception of the virtual acoustic space). As a result, 10.2 is ideally suited to theater environments, with the added benefits of true surround imaging and excellent spatial effects. Interactive rendering for 10.2 is an ongoing area of research.

Delay

Delay based spatialization algorithms, sometimes referred to as delay imaging, leverage precedence (see "Precedence: ITD Localization"). By adjusting amplitudes and delay times for an array of loudspeakers, these algorithms alter perceived source locations. Delay methods are less common than the amplitude schemes, although they are often used in sound reinforcement applications. For large venues, delay imaging may be preferable because of its reduced susceptibility to precedence artifacts, such as audience members on the left or right of the venue perceiving sound from only the nearest loudspeaker rather than a spatialized image. Some hardware, such as front-of-house mixers, offers delay imaging. At the time of this writing, software implementations are not widely available, though a few research labs have experimented with this technique. As computational audio becomes more widespread, delay imaging may become more common, expanding the palette available to virtual environments.

Ambisonics

Ambisonics is a popular technique known for its mathematical elegance, flexibility, and extensibility, supporting arbitrary loudspeaker setups and an elevation model. However, critics complain that it yields a "phasey" sound and that the mathematical model is invalid since it assumes a point-source listener, ignoring the fact that the head has two ears. On the other hand, proponents of Ambisonics argue one must take a few minutes to learn how to listen to it for the maximal effect, at which point the "phasey" sound disappears. Some have drawn an analogy to stereoscopic visual images, which require some practice to view, but once learned, the effect is very convincing. Ambisonic spatialization can be applied as a post-process, despite the common misconception that Ambisonics works only with special recording techniques. Higher order Ambisonics offers solutions for elevation, though at the cost of increased complexity in the audio reproduction system. The Web site www.ambisonic.net is an excellent source for further reading on Ambisonics techniques as well as for specific implementation details, such as loudspeaker arrangements and software packages.

Wave Field Synthesis (WFS)

WFS aims to reproduce a sonic wave field by using numerous loudspeakers (tens to hundreds). The technique is elegant and effective, though difficult to
implement. The chief advantage of WFS is that it passes the "pointing test." That is, listeners will perceive the same location of a virtual sound irrespective of their location in the listening area. Unlike other systems, where precedence can lead to incorrect localization, WFS lets listeners standing to the left of a virtual source hear it on their right, and vice versa. For example, given a virtual source placed in the middle of a theater, the entire audience would point toward the center location when asked to localize the source, rather than each pointing toward the nearest loudspeaker. WFS carries moderate to large hardware cost and requires many computers and a great deal of calibration. The company IOSONO offers prepackaged systems and installations, ideal for theater environments. Critics of WFS complain about a phasey sound and note that while the virtual sources pass the pointing test, they always seem to emanate from the loudspeaker boundary rather than from a location in free space. Finally, the latency in WFS systems may be too high for certain applications.

EQUIPMENT CONSIDERATIONS

Loudspeakers

The market offers an expansive range of loudspeakers, and sorting through the numerous variables and trade-offs can be daunting. Unfortunately, the wide variances in listening spaces, VE setups, and budgets prevent a silver-bullet solution. Nonetheless, a few basic principles, covered in the following sections, serve as a good starting point for the selection of loudspeakers. There are three loudspeaker types of primary interest for virtual environments: direct radiators, dipoles, and loudspeaker arrays. Direct radiators are meant to face the listener, with optimal performance on axis, much like a spotlight. These loudspeakers are readily localized since their position is audibly clear. Dipoles, which radiate in a figure-eight pattern, are ideal for enveloping, nondirectional sound. They are normally arranged to radiate their energy in the directions perpendicular to the listener location so that all sound reaching the listener is reflected and diffuse. Loudspeaker arrays are composed of a set of radiators driven from a common channel. The individual radiators are often decorrelated, which creates a vague spatial impression over a large area. These arrays are effective for sounds that are not precisely located, such as rear background ambience. Conversely, correlated loudspeaker arrays used for "beam forming" allow control over the array's spatial radiation pattern. Each of these types has advantages and disadvantages for immersive audio. Audio setups may incorporate more than one type of loudspeaker, depending on goals and requirements (see Holman, 1999). Each loudspeaker type uses a driver to produce pressure waves. In the ideal case, a single-driver loudspeaker would yield optimal imaging, since all frequencies would emanate from the same point. However, the broad bandwidth and high dynamic range of sound cannot be reproduced by a single driver. Therefore, drivers of different sizes handle segments of the frequency range. The varying precision of localization with respect to frequency is at odds with multiple
drivers at different locations. Subwoofers offer a means to distribute low frequency energy about the room, which allows satellite loudspeakers to employ smaller drivers in a compact enclosure, acting more like the ideal single-driver device. These are called multiway loudspeakers, often written as 2-way, 3-way, and so forth, where the number refers to the number of drivers used. Finally, coaxial loudspeakers align their drivers about a central axis to create a point-source unit. Often these drivers are unhoused, requiring in-wall installation, though several manufacturers (for example, Tannoy, Bag End, and EMES) market coaxial studio monitors.

Frequency Response

Frequency response refers to the magnitude of output for each frequency in the input range. The ideal frequency response would be a flat curve from 20 Hz to 20 kHz, meaning that input frequencies with equal energy have equal energy in the output. No existing loudspeaker has the ideal frequency response. When evaluating response curves, seek those as close to the ideal as possible, with smooth variations throughout the loudspeaker's intended listening range. In full-range loudspeakers, the listening range is 20 Hz to 20 kHz. In loudspeakers intended for use with a subwoofer, the response decays steeply toward the low frequencies. Therefore, the evaluation range for such loudspeakers extends down only to the crossover point at which the subwoofer predominates. In other words, a steep bass roll-off is not an indication of poor performance; instead, evaluation of these curves must consider the subwoofer response as well (see "Crossover/Bass Management"). Manufacturers of high quality loudspeakers (for example, Genelec, Mackie, and JBL) often offer frequency response curves upon request, as well as in brochures and on their Web sites, in order to demonstrate the quality of their products. In products for which these data are unavailable upon request (for example, most computer and low end home stereo loudspeakers), performance is generally too poor for use in virtual environments.

Directivity

Loudspeakers' frequency responses vary with listening angle. Their low frequencies tend to radiate more broadly than their higher frequencies, which can be highly directional. Flatness and smooth transitions over the range of listening angles are the key criteria in assessing a loudspeaker's directional performance. In a direct radiator, high directivity is desirable for accurate localization (for example, in the frontal direction). Directivity should vary as little as possible with respect to frequency to avoid coloration effects (Holman, 1999). Some manufacturers publish a directivity index (DI), which measures in dB the spatial radiation pattern over the frequency range. Increasing DI indicates higher directivity, where every 3 dB halves the radiation angle. For example, 0 dB implies an omnidirectional (spherical) radiation pattern, while 3 dB indicates a
hemispheric pattern, 6 dB spans a quarter sphere, and so on (see Holman, 1999, for more details). Ideally, the curve of DI versus frequency should be as flat as possible. Any variations should be smooth, since sharp changes can cause detrimental coloration in the output. Research suggests that a DI of roughly 8 dB in the mid frequencies is ideal (Holman, 1999; Rumsey, 2001).

Headroom and Dynamic Range

Headroom, given in dB, refers to the amplitude available between the operating level of a device and the level at which clipping distortion occurs. The dynamic range of a loudspeaker indicates the SPL range the loudspeaker can produce without distortion. Specifically, dynamic range is a value given in dB that indicates the ratio of the loudspeaker's maximum SPL to its minimum output, or noise floor. In loudspeakers of sufficient quality for VEs, the noise floor must be below audibility at the listening position. Dynamic range deserves close attention when selecting loudspeakers. Loudspeakers should support a peak SPL of at least 103 dB at the listening position to allow sufficient headroom for equalization (Holman, 1999).

Crossover/Bass Management

Full-range loudspeakers have an even frequency response over the audible range of 20 Hz to 20 kHz, but they are very expensive, large, and heavy. Because the auditory system does not localize very low frequency sounds precisely, subwoofers offer a practical approach to full-range reproduction: limiting the satellites' spectral range allows them to be relatively small and inexpensive. In virtual environments, where there may be several loudspeakers (even hundreds in some cases), it makes little sense to replicate low frequency capability in each loudspeaker. Instead, bass management hardware filters low frequency signal content, routing it to the subwoofer(s). This arrangement effectively extends the spectral range of the satellites, making feasible the use of numerous loudspeakers for spatial audio. Successful implementation of a subwoofer-satellite system requires a crossover matching the capabilities of the transducers. The crossover routes frequencies below a specified cutoff point to the subwoofer; as the satellite loudspeakers' response decays in the low end, the subwoofer takes over. This transition must be smooth to avoid audible artifacts. Several manufacturers offer subwoofer/satellite systems with integrated bass management. Since bass energy can quickly consume the headroom of a subwoofer, overloading it, systems with a large number of satellite loudspeakers should distribute bass energy over multiple subwoofers. Although the auditory system does not locate very low frequencies with precision, it can determine which side of the body the subwoofer is on. Therefore, for applications in which spatial sources with significant low frequency content (for example, military vehicle sounds) play a significant role, place four subwoofers around the training area for spatial reproduction.
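As a concrete illustration of the crossover just described, the sketch below (Python with NumPy and SciPy; the 80 Hz crossover frequency and the five-channel example are illustrative assumptions, not recommendations from the text) splits high-passed satellite feeds from a summed subwoofer feed using a fourth-order Linkwitz-Riley filter pair, a common choice because the two branches sum flat at the crossover point.

```python
import numpy as np
from scipy.signal import butter, sosfilt

FS = 48_000           # sample rate in Hz
CROSSOVER_HZ = 80.0   # illustrative cutoff; match it to the satellites' roll-off

# A 4th-order Linkwitz-Riley filter is two cascaded 2nd-order
# Butterworth filters with the same cutoff frequency.
LOW = butter(2, CROSSOVER_HZ, btype="lowpass", fs=FS, output="sos")
HIGH = butter(2, CROSSOVER_HZ, btype="highpass", fs=FS, output="sos")

def bass_manage(channels):
    """High-pass each satellite feed; route the summed lows to the sub."""
    satellites = [sosfilt(HIGH, sosfilt(HIGH, ch)) for ch in channels]
    lows = [sosfilt(LOW, sosfilt(LOW, ch)) for ch in channels]
    subwoofer = np.sum(lows, axis=0)  # bass energy pooled from all channels
    return satellites, subwoofer

# Example: bass-manage five channels of test noise.
sats, sub = bass_manage(np.random.randn(5, FS))
```

Summing all channels' lows into one feed is what lets the satellites stay small; as noted above, large systems would split this summed bass across several subwoofers to preserve headroom.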
Near-Field Monitors

When listening to loudspeakers in a closed environment, there are two sources of sound: the direct sound from the loudspeakers and the reverberant sound in the room. A near-field monitor is a loudspeaker designed for close listening distances, where the direct sound predominates, reducing the perception of room acoustics. In practice, counteracting room acoustics is complex, requiring more than short loudspeaker distances; however, the near-field approach can be very effective when combined with acoustic treatment (see "Room Acoustics") and is commonly used in VE setups.

Active and Passive Monitoring

Active loudspeakers contain an integrated amplifier, whereas passive loudspeakers are driven by an external amplifier. In either case, amplifier and transducer must be matched to one another. In the case of active studio monitors, the designers take on the matching, tuning, and optimizing of the amplifiers for the specific characteristics of the transducers and crossovers. With passive loudspeakers, the burden of matching amplifiers falls on the buyer. Studio monitors often have other advantages as well. Many are meant to be used with a specific subwoofer and are matched accordingly. Features such as adjustable bass roll-off and volume knobs allow per-channel adjustments that are very convenient and flexible. Additionally, studio monitors are meant for near-field use, with high directionality, making them ideal for VEs. Generally, studio monitors with matched subwoofers are the best option for VEs. In some cases studio monitors are not appropriate. For example, in very large setups where the loudspeakers are distant from the listening position, much more powerful systems are in order. Other factors, such as mounting weight restrictions or other logistical concerns, may make decoupled amplifiers preferable. In situations such as live fire exercises, where there is increased risk of repeated equipment destruction, decoupled amplifiers may relieve financial stress, as only the transducers would require frequent replacement. Pairing loudspeakers and amplifiers is a complex topic beyond the scope of the current discussion, but there are numerous online resources devoted to it. Perhaps the best resource is a knowledgeable representative at a large professional audio vendor (for example, B&H, GC Pro, Sweetwater Sound Inc., and so forth) or from the manufacturers themselves. Finally, these two loudspeaker types have different wiring considerations. Because active loudspeakers have integrated amplifiers, they require separate power and signal cables. Passive loudspeakers instead require amplified power to be sent along the signal lines, which can be problematic over long distances.

Setup and Calibration

The chosen rendering method will dictate the precise loudspeaker layout, but a few rules hold in general. Avoid wall cancellations and overly strong reflections by placing the loudspeakers far from walls. Dipoles, in particular, must be far
from walls to create a diffuse field. Keep loudspeakers away from ceilings and floors, which can cause acoustic loading that is detrimental to sound quality and can damage equipment. Direct radiators offer optimal performance on axis (see "Directivity"). Since high frequencies are highly directional and readily localizable, loudspeakers are best placed with their tweeters at ear height, pointed toward the listening position. The loudspeakers should be placed equidistantly from the center of the listening area. In cases where equidistant placement is not possible for certain loudspeakers, hardware devices can compensate by delaying the signal of the closer loudspeakers such that the wave fronts from all loudspeakers arrive simultaneously at the listening position. Some automated equalization hardware will compensate for uneven placement (see "Room Equalization"), and many professional audio devices provide adjustable delay for this purpose. Once placed, the loudspeakers require equalization and level alignment.

Room Equalization

Automating the equalization process is an active area of academic research and product development that has brought potential hardware solutions to the market, with more likely to follow. Some devices aimed at the high end consumer market are suitable for virtual environments with compatible setups (for example, 5.1, 7.1, and 10.2). Such manufacturers as Audyssey, Denon, Crestron, Marantz, NAD, Onkyo, and Phase Technology offer products in this category. Such professional audio manufacturers as Genelec and JBL offer self-calibrating systems integrated with active studio monitors. For some VEs, such devices lack the necessary flexibility. For example, setups with a large number of loudspeakers are better served by manual equalization. Invaluable descriptions of the equalization process are available in Holman (1999) and Rumsey (2001). Manual equalization is challenging, perhaps less a science than an art that relies on the tuner's ear and experience. Professional audio consulting services are available in most locales; though few specialize in multichannel audio, a competent consultant will be able to help novices learn the fundamentals and achieve good quality equalization.

Monitor Level Alignment

Once equalized, monitor levels must be set uniformly using an SPL meter and a calibrated test signal. See Holman (1999) for a detailed description of this process. For safety, ensure the level on the playback device is never raised beyond the calibrated setting (see "Safety"). Which level to select depends on the goals of the virtual environment; however, the SPL at the listening position should always be kept within safe limits (see "Safety") to avoid hearing damage. The combined SPL from all loudspeakers should never exceed 120 dB, and exposure to levels above 100 dB should be brief. By way of comparison, the standard for film is 83 dB SPL for pink noise at −20 dBFS rms (20 dB below full scale), yielding
a maximum of 103 dB SPL. Dolby's surround mixing guidelines suggest 79 dB to 85 dB as target SPL values. Avoid the temptation to calibrate for a very high level unless it is necessary for the simulation. Most of the time, a level between 80 dB SPL and 95 dB SPL will suffice in VEs, which leaves headroom for equalization and keeps the level within safe limits.

Subwoofer Placement

Room acoustics are especially sensitive to subwoofer placement. The best way to find optimal placement is experimentation: with the equalization measurement setup in place, feed a pink-noise signal to the subwoofer and look for the best response curve (see "Room Equalization"). If the subwoofer lacks bass management circuitry, the pink-noise signal must be bandlimited to the subwoofer's maximum frequency. To avoid equipment and hearing damage, ensure that the subwoofer filters out frequencies below its lower threshold. Small adjustments in subwoofer location can have a dramatic effect on room response, so patient, careful measurement is necessary to ensure optimal placement. A good starting point is slightly off-center, with the driver between 6 and 20 inches from the front wall (consult manufacturer guidelines for specific distance requirements; placement too close to a wall can damage the driver). Then move the subwoofer a little to the left or right until it produces an optimal response at the listening position. Avoid corner placement, which can exaggerate bass response. Because the loudspeakers and subwoofer can be at different distances from the listening position, phase correction is vital. Mismatched phase will create a dip in the frequency response around the crossover point. Since each subwoofer affords different controls, the instruction manual is the best reference for how to correct for phase differences. Generally, this adjustment occurs after level adjustment (see "Subwoofer Level Alignment").

Subwoofer Level Alignment

Many subwoofers do not have a volume control, offering instead a sensitivity adjustment defined in terms of total SPL at a specified distance. With equalization measurement equipment in place, adjust the sensitivity or volume control so that, at the listening position, frequencies below the crossover point are at the same level as those above. Then proceed with phase adjustment. Note that subwoofer use requires bass management. If the chosen subwoofer does not have integrated bass management, install bass-management hardware in the signal path.

Headphones

With a dedicated amplifier, many headphones offer performance rivaling that of the best loudspeakers, yet at a significantly lower cost. Headphones offer the additional advantages that they avoid problems with room acoustics and are significantly easier to calibrate than a multichannel system. Open-ear headphones achieve excellent performance and are comfortable for long sessions, but do not isolate the listener from outside sounds. Closed headphones offer
isolation, but can become hot and uncomfortable quickly. They also tend toward excessive bass due to pressure buildup in the closed space. Closed headphones, therefore, achieve high quality frequency response at higher financial cost than open-ear headphones. Finally, high end in-ear sets, such as those from Etymotic Research, Inc., offer a compromise, yielding excellent performance and isolation with less discomfort than closed headphones. Most in-ear phones suffer from the "microphone effect" when the cable rubs against clothing or strikes objects, making them unsuitable for scenarios with much participant motion; however, the market has recently produced decent wireless headphones and earphones with reasonable performance. Be sure to audition such systems to ensure they meet the application demands.

Head-Related Transfer Functions

Spatial reproduction can be very effective over headphone systems employing head-related transfer function (HRTF) processing. Processing the source signal against a given HRTF for each ear yields a pair of waveforms that, when played directly into the ears, produce a spatialized source image. Because each person has a unique anatomy, each has a unique HRTF. The ideal application of HRTF processing involves measuring each user's unique HRTF for use during playback. The measuring process is sensitive, time consuming, and generally requires the use of an anechoic chamber. While listener-specific HRTF renderings remain the most effective, they are generally inaccessible. However, much research has focused on generic HRTFs effective for all listeners. Most generalized HRTFs suffer from difficulties with frontal imaging, while rear and side imaging can be effective. The results can be convincing, but since each ear has its own HRTF, efficacy varies per listener. With a good generalized HRTF, users can quickly adapt to the system. Because both ITD and IID cues are included for each frequency, HRTFs can provide highly accurate localization. HRTFs also avoid many of the pitfalls of other panning schemes. For example, there is no problem with precedence as the user walks around using an HRTF system. High quality headphones are ideal for HRTF processing. There are two obstacles to using headphones in VEs. The first is that in many scenarios encumbering participants with additional equipment is unacceptable. The second is that head rotation can break the spatial effect, since a static source will appear to rotate along with the listener's head, which is contrary to normal experience. To leverage HRTF processing, VEs require a head-tracking system to correct for head rotations. Several game-oriented sound cards offer headphone spatialization via HRTF processing with varied success. The analog output on even the best cards is generally too noisy (see "Hardware Selection") for VE use, so it is best to avoid game cards or to select cards that offer HRTF processed audio via digital output. Though Creative Labs has been the dominant manufacturer of game audio cards in recent years, such companies as Dolby and Nvidia are bringing new products to market that may offer new possibilities for VEs.
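At its core, HRTF playback is a pair of convolutions, one per ear, with the head-related impulse responses (HRIRs, the time-domain form of the HRTFs) measured for the source direction. The sketch below (Python; the placeholder impulse responses are fabricated stand-ins so the example runs, whereas real use requires a measured HRTF set) shows the basic operation.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Spatialize a mono signal for headphones by convolving it with
    the left- and right-ear impulse responses for one direction."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)  # stereo output buffer

# Placeholder HRIRs: unit impulses with a small interaural delay and
# level difference, mimicking the ITD and IID cues a measured set encodes.
fs = 48_000
hrir_l = np.zeros(256); hrir_l[0] = 1.0    # near ear: earlier, louder
hrir_r = np.zeros(256); hrir_r[30] = 0.7   # far ear: ~0.6 ms later, quieter
stereo = render_binaural(np.random.randn(fs), hrir_l, hrir_r)
```

In a head-tracked system, a new HRIR pair would be selected (or interpolated) whenever the listener's head moves, which is the correction for head rotation described above.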
Some software packages with HRTF processing support several types of audio hardware. For example, FMOD runs on a variety of consoles and PC cards, as well as professional audio devices that ensure low noise output. Recently, "surround" and 5.1 headphones have come to market. These fall into two classes: those targeted toward gamers and more high end earphones. The cheaper segment can be effective for gaming, but lacks the performance needed for a convincing virtual image. Even the higher end headphones suffer from the same problems as generic HRTFs and 5.1 systems, specifically difficulties with side and rear imaging. If possible, try out headphones in this category to aid evaluation. AuSIM's GoldSeries products offer high quality spatial audio solutions for multiple listeners and a large number of virtual sources. They are available with integrated head-tracking hardware. Such high end solutions are prized for their nuanced sound and highly accurate localization. The AuSIM products and their ancestors (for example, the Convolvotron and Acoustetron) hold a highly regarded position in headphone based virtual audio. Individualized HRTFs through custom-fitted in-ear phones and a reference amplifier offer the highest quality sound. While such highly detailed reproduction is rarely necessary in a VE, where sound is one of several stimuli, it is worth noting here as a point for comparison.

Headphone Amplification

Headphones of sufficient quality for use in VEs generally require a dedicated amplifier. Headphone amplifiers on the market range in price from around $50 to several thousand dollars. For VEs, a high quality, low noise amplifier with reasonably flat frequency response (around $200 to $600) is the base requirement and will suffice for most applications. VEs using high end headphones will benefit from a higher quality amplifier (in the range from $600 to $1,000). Scenarios in which reproduction with nuanced detail is crucial will need an even more sophisticated amplifier and excellent headphones (see "Vendors and Manufacturers"). Finally, for applications using HRTF processing via digital output, high quality headphone amplifiers with integrated digital-to-analog converters are a cost-effective option. Such manufacturers as Benchmark, HeadRoom, Grace Design, Grado, and STAX offer products in this area. Ironically, most professional audio (pro audio) headphone amps have inferior performance, since they are generally studio task oriented, geared toward such functions as signal distribution and talk-back rather than toward critical listening, making them unsuitable for VEs. To calibrate headphones, simply apply the method described for loudspeakers (see "Monitor Level Alignment"), but with the microphones placed where the ears would be. Note that headphone amplification must remain fixed after calibration. Because of the possibility of hearing damage due to erroneously loud output, employ safety measures that ensure the output level never exceeds a prescribed maximum (for example, a brick-wall limiter) (see "Safety"). Additionally, a policy of setting the volume to zero before each use, then gradually raising the volume to the calibrated level, guarantees that no accidental audio bursts harm participants
during scenario setup. If possible, listeners should not don their earphones until all devices and simulation software have been booted and initialized.

SAFETY

Powerful audio equipment is extremely dangerous. To protect the hearing of trainees and workers, engineer safety procedures and enforce them rigorously. A few simple precautions will protect people and equipment. The threshold of pain is around 120 dB SPL, with immediate hearing damage or hearing loss around 145 dB SPL. Calibrate all loudspeakers to a maximum combined level below 116 dB SPL at the listening position. Require hearing protection during setup and calibration, and when testing new audio equipment or software drivers. Install a brick-wall limiter in the signal path to each channel. These devices attenuate signals above a specified maximum level, ensuring inappropriate signal levels do not reach the loudspeakers. Some devices (for example, dbx ZonePRO processors) provide limiters and can be networked to a universal volume/mute control, an excellent precaution. Always mute the system and turn the volume down before and after each run. Increase the volume gradually at the start of each session. The most common and most dangerous problem in a multiloudspeaker system is the propagation of a single small error through all loudspeakers. A small pop, replicated 10 or more times, becomes extremely loud. To protect against this common hazard, install a sound-pressure monitor in the listening space that cuts the power and signal to all loudspeakers if the measured sound reaches an unsafe instantaneous threshold (for example, a loud click) or exceeds a value integrated over time (for example, feedback). Operators should consider custom earplugs with flat frequency response. This inexpensive precaution affords attenuated listening without undesirable coloration or filtering.
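Conceptually, a brick-wall limiter guarantees that no sample above a ceiling ever reaches the output. The toy sketch below (Python; a hard-clipping stand-in for illustration only, since real limiters use look-ahead gain reduction and, as noted above, belong in dedicated hardware in the signal path) shows the guarantee.

```python
import numpy as np

def brick_wall(buffer, ceiling=0.25):
    """Clamp every sample to +/- ceiling so a stray pop or driver
    glitch can never reach the loudspeakers at full amplitude.

    A crude software illustration only; production systems should
    use dedicated hardware limiters with look-ahead gain reduction
    to avoid the audible distortion hard clipping introduces."""
    return np.clip(buffer, -ceiling, ceiling)

# A simulated full-scale 'pop' is reduced to the ceiling value.
glitch = np.zeros(512); glitch[100] = 1.0
assert brick_wall(glitch).max() == 0.25
```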
COMPUTING AUDIO

Platforms and Application Programming Interfaces

There are many platforms and application programming interfaces (APIs) for computing audio. In the context of VEs, CRE_TRON from AuSIM is perhaps the best known. The CRE_TRON API controls AuSIM's HRTF based audio engine, AuSIM3D, which offers high quality audio via headphones. AuSIM also offers direct integration with head-tracking hardware, useful for VEs intending to use headphones and HRTFs. Some APIs, such as OpenAL, FMOD, and DirectSound, offer high level controls (for example, spatialized audio, filters, and environmental reverberance). These implementations generally assume the standard loudspeaker arrangements common in video game setups. Yet VEs commonly utilize different setups, with larger numbers of output channels and nonstandard loudspeaker layouts, to achieve high quality reproduction. These VEs will require specialized audio platforms that offer spatialization over arbitrary setups. Such systems support arbitrary loudspeaker setups through algorithms such as VBAP (vector base amplitude panning; Pulkki, 1997), Ambisonics, or SPCAP (speaker-placement correction amplitude panning; Sadek & Kyriakakis, 2004). Some of these systems offer an API based on a standardized interface, such as OpenAL (for example, ARIA [Sadek, 2004]), allowing them to be "dropped in" to existing systems. VEs with very demanding requirements or complex software may need to develop an in-house audio engine on top of low level platforms such as ASIO, CoreAudio, or PortAudio (Greenebaum, 2004). Finally, companies such as VRSonic that specialize in audio for VEs offer tools for content production, as well as services for setup and design.

Hardware Selection

The market offers a wide range of audio hardware devices with rich feature sets. A few key aspects of hardware selection are critical for VEs, namely, bit depth, sampling rate, and latency, in addition to such general performance considerations as signal-to-noise ratio (SNR) and the quality of the digital-to-analog converters (DACs). For example, in the common case of an installation with many loudspeakers at high amplification, the signal-to-noise ratio is critical. A bad SNR limits the dynamic range of the system. Consider a training scenario with occasional gunfire; this scene requires a huge dynamic range. A high quality, well-isolated audio system may produce 80 dB of dynamic range, while a medium-grade system yields about 50 dB SNR. The 30 dB of additional dynamic range yields reproduction that is far superior and, in fact, necessary for the training scenario. Consumer or video game hardware cards generally include the DACs directly on the card, which yields poor noise performance. For example, electrical noise from hard disks, graphics cards, and so forth, contaminates the analog output in a clearly audible manner. These manufacturers list impressive SNR performance for their DACs, but these measurements are often conducted in isolation, ignoring the contaminating effects of an operating computer system. Small-scale testing before investment is the best way to ensure good noise performance. Users can test noise performance with free software tools, such as the RightMark Audio Analyzer. Due to the large amount of electrical system noise in a running computer, choose hardware with DACs on a separate interface or breakout box rather than attached directly to the system bus. Many of these interfaces connect to the bus with a PCI (peripheral component interconnect) or PCIe (peripheral component interconnect express) card, which is perfectly acceptable because the DACs' location in the external interface can isolate them from bus noise. Others connect via universal serial bus (USB) or FireWire rather than through a PCI card. Note, however, that a poorly designed breakout box may not protect against system noise; a well designed PCI card may shield against system noise more effectively. Because FireWire has higher bus priority than USB, FireWire breakout boxes generally suffer less noise and fewer dropouts than USB hardware.
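The kind of noise-floor check that tools such as RightMark automate reduces to a ratio of RMS levels. The sketch below (Python; the synthetic test signals are illustrative stand-ins, whereas in practice both recordings would come from a loopback capture of the device under test) estimates dynamic range from a full-scale test tone and a "silence" recording.

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return np.sqrt(np.mean(np.square(x)))

def dynamic_range_db(full_scale_tone, noise_floor):
    """Dynamic range as the dB ratio of test-tone RMS to noise RMS."""
    return 20.0 * np.log10(rms(full_scale_tone) / rms(noise_floor))

# Synthetic stand-ins: a 1 kHz full-scale sine against a noise floor
# with an RMS level of about -80 dBFS.
t = np.arange(48_000) / 48_000.0
tone = np.sin(2 * np.pi * 1_000.0 * t)
noise = 1e-4 * np.random.randn(48_000)
print(f"{dynamic_range_db(tone, noise):.1f} dB")  # roughly 77 dB here
```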
Many manufacturers (for example, MOTU, RME, Digidesign, and PreSonus) offer high quality external interfaces with isolation from system noise. Hardware with optical connections avoids electrical noise transmission by sending a digital signal over a nonconductive medium to an external digital-to-analog converter. This is the most effective way to combat bus noise.

Bit Depth

Capturing the dynamic range of human hearing requires 24 bits, so 24 bit DACs are necessary; avoid 16 bit DACs. Internal processing (filtering, mixing, and so forth) requires higher bit depth for accurate computations. Most manufacturers of high quality hardware utilize a high bit depth processing chain. Be certain to look for this if using the device for any internal computations (for example, mixing, filtering, limiting, and gain). Finally, note that the quality of the DACs is important. Low quality 24 bit DACs may have as few as 18 effective bits, with the remaining bits essentially noise. Seek as much information as possible about the DACs used in hardware under consideration. Such analysis tools as RightMark can help in this evaluation. Reputable manufacturers (for example, RME, MOTU, and so forth) generally have reasonable DAC performance.

Sample Rate

Hardware interfaces often have an adjustable sampling rate. In general, 48 kHz is ideal for most VEs. Some hardware offers sampling rates of 96 kHz or even 192 kHz, and debate over the use of these high resolution formats continues. While 48 kHz encompasses the entire audible frequency range in theory, practical complications suggest that high resolution formats will become prevalent in the future. For example, clock jitter and bit rounding create audible artifacts in frequencies near the Nyquist limit of the sampling rate. Some have also argued that higher sampling rates can offer lower latency, as well as improved internal processing during filtering. Finally, some DACs that operate at higher clock rates (for example, 192 kHz) offer improved SNR performance when run at lower clock rates (for example, 48 kHz). Most VEs will not need the higher clock rates, though there is no harm in using hardware that supports them. However, if the higher sampling rate is not needed, set the hardware to 48 kHz to avoid the performance cost of processing the larger number of samples.

Buffer Length and Latency

The buffer length on the audio hardware affects both the audio latency and system performance. Latency refers to the time between when a sound sample enters a buffer and when it is heard. The sample must wait until the previous buffer is processed, creating a direct correlation between buffer length and latency. Because the buffer processing speed is bound to the clock rate of the DAC, buffer length is directly related to real time. Each buffer has a certain processing overhead: shorter buffers yield lower latency, but require more CPU time, while longer buffers consume less CPU, but cause greater latency. Tactile applications invoking a sampled sound (such as firearm triggers) require low latency sounds to avoid feeling sluggish or "gummy." High quality hardware devices offer latencies ranging from under 1 ms (millisecond) to 100 ms or more, allowing room for adjustment when trading latency against CPU performance.
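The buffer-length/latency relation is simple arithmetic: one buffer of N samples at sample rate f contributes N/f seconds of delay before a sample can be heard. A minimal sketch (Python; the buffer sizes shown are typical values, not figures from the text):

```python
def buffer_latency_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Minimum latency contributed by one hardware buffer, in ms."""
    return 1000.0 * buffer_samples / sample_rate_hz

# At 48 kHz: 64 samples ~ 1.3 ms; 256 ~ 5.3 ms; 2,048 ~ 42.7 ms.
for n in (64, 256, 2048):
    print(f"{n:5d} samples -> {buffer_latency_ms(n, 48_000):5.1f} ms")
```

Halving the buffer size halves this delay but doubles the number of buffers (and their per-buffer overhead) the CPU must service per second, which is the trade-off described above.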
ENVIRONMENTAL EFFECTS

Since each space has a unique acoustic character, emulating the acoustic characteristics of the virtual world solidifies the effect of the VE by blurring the line between the physical and the virtual. Game systems such as Creative Labs' Environmental Audio Extensions or FMOD offer filters and reverbs that model acoustic spaces and occlusions. These effects are generally not physically accurate, but are tuned to sound plausible and are quite effective in a video game setting. However, as mentioned earlier (see "Platforms and Application Programming Interfaces"), the spatialization algorithms in these systems require the type of setup used in a gaming environment and cannot be extended to the high end setups employed in VEs. When physically accurate reverberance is required, VEs can leverage sophisticated acoustic modeling software to generate reverbs or use commercially available reverbs built from measured data. Alternatively, VEs can use a system such as FMOD, whose digital signal processing functionality provides direct access to the output channels, to implement their own spatialization using an algorithm such as VBAP, Ambisonics, or SPCAP (see "Rendering Techniques" and "Platforms and Application Programming Interfaces"). First-order reflections (early reverberance) are the most important psychoacoustic cues regarding the surrounding space. Accurate reproduction of early reverberance materializes the virtual space, improving localization and immersion. When evaluating environmental effects solutions, give the highest priority to accurate reproduction of early reflections.

TRADE-OFFS

Audio computation platforms and hardware offer many trade-offs among performance, implementation complexity, equipment cost, and feature sets. Trade-offs include certain factors that are not represented in technical specifications, but nonetheless greatly affect development. For example, poor-quality hardware drivers are detrimental to system stability. Given a software malfunction, a good driver will exit gracefully, while a poor driver will crash the computer. Check the rate at which manufacturers offer updates for their drivers as an indicator of such bugs. Latency and CPU performance are in direct opposition due to the increased overhead of processing a larger number of buffers. The situation is complicated further by driver implementations; lesser hardware interfaces tend toward worse performance at a given latency. At times, this sacrifice is worth the cost savings,
whereas at other times it only makes sense to select faster, more expensive hardware. Similarly, game cards offer desirable features such as environmental effects and occlusion. A VE may choose these features at the expense of sound quality, setup flexibility, and system stability. Conversely, VEs can utilize reverbs based on sophisticated models or measured data at the cost of higher implementation complexity and higher computational expense. These trade-offs extend into setup and calibration as well. Self-calibrating systems can save much time, at greater equipment cost, while manual calibration can achieve optimal quality with significant time investment. The possible trade-offs are too many to list and vary per VE. Each design must account for the specific needs and goals of the VE when determining which trades to make. Careful attention to these details during the design phase goes a long way toward a smooth, successful implementation.
VENDORS AND MANUFACTURERS

Following is a list of Web sites and vendors specifically mentioned in this text. Holman (1999, pp. 255–266) offers a more inclusive appendix of manufacturers and resources, from measurement equipment to outboard equipment and multichannel meters.

Acoustic Treatments
StudioPanel: www.studio-panel.com
Auralex: www.auralex.com

Automated Equalization
Audyssey: www.audyssey.com
Denon: www.denon.com

Headphones and Headphone Systems
HeadRoom: www.headphone.com
AKG: www.akg.com
Etymotic: www.etymotic.com
Grado: www.gradolabs.com
Sennheiser: www.sennheiserusa.com
Shure: www.shure.com
AuSIM: www.ausim3d.com

Loudspeaker Manufacturers
Genelec: www.genelec.com
JBL: www.jbl.com
Mackie: www.mackie.com
ADAM: www.adam-audio.com
Tannoy: www.tannoy-speakers.com

Professional Audio Vendors
B&H: www.bhphotovideo.com
GC Pro: www.gcpro.com
Sweetwater: www.sweetwater.com
ACKNOWLEDGMENT OF SPONSORSHIP

The project or effort described here has been sponsored by the U.S. Army Research, Development, and Engineering Command (RDECOM). Statements and opinions expressed do not necessarily reflect the position or the policy of the United States Government, and no official endorsement should be inferred.

REFERENCES

Begault, D. (1994). 3D sound for virtual reality and multimedia. Moffett Field, CA: Ames Research Center.
Blauert, J. (1997). Spatial hearing: The psychophysics of human sound localization. Cambridge, MA: MIT Press.
Cook, P. (Ed.). (2001). Music, cognition, and computerized sound: An introduction to psychoacoustics. Cambridge, MA: MIT Press.
Cook, P. (2002). Real sound synthesis for interactive applications. Natick, MA: A K Peters.
Crocker, M. (Ed.). (1998). Handbook of acoustics. New York: Wiley.
Everest, F. (Ed.). (2001). The master handbook of acoustics. New York: McGraw-Hill.
Greenebaum, K. (2004). Audio anecdotes: Tools, tips, and techniques for digital audio. Natick, MA: A K Peters.
Halliday, D., Resnick, R., & Walker, J. (2007). Fundamentals of physics extended (4th ed.). New York: John Wiley & Sons, Inc.
Holman, T. (1999). 5.1 channel surround sound: Up and running. Oxford: Focal Press.
Lyons, R. (2004). Understanding digital signal processing. Upper Saddle River, NJ: Prentice Hall PTR.
Pulkki, V. (1997). Virtual sound source positioning using vector base amplitude panning. Journal of the Audio Engineering Society, 45(6), 456–466.
Rumsey, F. (2001). Spatial audio. Boston: Focal Press.
Sadek, R. (2004). A host-based real-time multichannel immersive sound playback and processing system. Proceedings of the Audio Engineering Society 117th Convention.
Sadek, R., & Kyriakakis, C. (2004). A novel multichannel panning method for standard and arbitrary loudspeaker configurations. Proceedings of the Audio Engineering Society 117th Convention.
Sherman, W., & Craig, A. (2002). Understanding virtual reality. San Francisco: Morgan Kaufmann.
Watkinson, J. (2001). The art of digital audio. Oxford: Focal Press.
Chapter 5
MULTIMODAL DISPLAY SYSTEMS: HAPTIC, OLFACTORY, GUSTATORY, AND VESTIBULAR

Çağatay Başdoğan and R. Bowen Loftin

The Sensorama is the prototypical embodiment of a virtual environment (VE), conceived and implemented years before "VR" (or "virtual reality") became a common term. Whereas VR has often been characterized as "goggles and gloves," Heilig (1962) developed a truly multimodal display system that provided for stereoscopic vision, binaural audition, haptics (wind in the face; vibration or "jolts"), and olfaction. Figure 5.1 shows the patent drawing for the containers that served as the Sensorama's olfactory sources. Those containers, coupled with the system's fan, comprise what is likely the first example of an olfactory display integrated with other display modalities. In many respects his invention still stands alone in terms of the degree of integration of multimodal displays in a practical device that provided a compelling virtual experience for the user. In this chapter we examine nonvisual and nonauditory displays that have been used or have the potential to be used in virtual environments for training.

INTRODUCTION

Motivation and Scope

In most virtual environments the visual display is dominant, usually followed in importance by the auditory display. These display modalities are described elsewhere (see Welch and Davis, Volume 2, Section 1, Chapter 1; Henderson and Feiner, Volume 2, Section 1, Chapter 6; and Whitton and Brooks, Volume 2, Section 1, Chapter 12). There are circumstances, however, where other display "dimensions" are critical to the effective training of a user. In fact, there is evidence that, in some cases, vision may not be the dominant sense (Shams, Kamitani, & Shimojo, 2000). In this chapter we provide access to the dimensions of haptics, olfaction, gustation, and acceleration/orientation. The inclusion of these display modalities is motivated by the need to create a virtual environment that
maps more fully onto the real world and to exploit some uniquely human reliance on senses other than vision and audition.

Figure 5.1. The patent drawing for the containers that served as the Sensorama's olfactory sources is shown.

Loftin (2003) has considered the potential of multimodal displays to expand the human "bandwidth" for perceiving complex, multivariate data. In the training domain this may be even more important, since we may wish to replicate the real world to the greatest extent possible. After all, humans routinely employ all of their senses simultaneously as they go about their normal tasks. Certainly, we can compensate when one or more senses are impaired (for example, the common cold can compromise the sense of smell), but such compensation may not be adequate for some training purposes. Also, the use of one sensory modality as a substitute for another could lead to negative training or poor transfer from the training environment to the real environment. Consider some of the common circumstances in which the nonvisual and nonauditory senses play important roles. A surgeon may depend on olfaction (the sense of smell) to detect that the bowel has been perforated (Krueger, 1995). Smells can be very important in producing the crucial contexts for some training environments, including the smell of fire, blood, cooking food, an animal presence, or vegetation. In some cases the presence of the correct smell could "make or break" the sense of realism (fidelity) that training requires. In the technical descriptions, an effort has been made to build on what was provided in the Handbook for Virtual Environments (Stanney, 2002). Thus, we have typically included only updates on developments since 2001 for those
technologies that were fully described in the Handbook. This approach applies primarily to the sections on haptic and vestibular displays. The Handbook contains a very short section on olfactory displays and nothing on gustatory displays. A recent and relatively comprehensive treatment of multimodal interfaces has been produced by Kortum (2008).
HAPTICS

Haptics Technology

Haptics is a highly interdisciplinary research area that aims to understand how humans and machines touch, explore, and manipulate objects in real, virtual, or teleoperated worlds. One of the features most distinguishing touch from the other sensory modalities is that it is a bilateral process. We can look at and observe objects using our eyes, but cannot change their state. However, when we explore an unknown object in our hand, we instinctually rotate it to change its state (Lederman & Klatzky, 1987). Haptic exploration not only gives an idea about the shape and surface properties of an object, but also provides information on its material properties, such as softness. Perception and manipulation through touch are both accomplished via tactile and kinesthetic channels. Various types of receptors located under the skin are responsible for tactile perception. These receptors can sense even very small variations in pressure, texture, temperature, surface details, and so on. Kinesthetic perception in the brain occurs through information supplied by the muscles, tendons, and receptors located in the joints. For example, while perception of textures or surface roughness is more of a tactile activity, feeling reaction forces when pushing an object involves the kinesthetic system. Unfortunately, we know very little about how tactile and kinesthetic information is transmitted and processed by the brain. Developing haptic devices that enable tactile and kinesthetic interactions with real and virtual objects has been challenging. Our haptic interactions with physical objects around us mainly involve the use of the hands. The human hand has a complex anatomy and function, and developing interfaces that fully imitate its sensing and actuation capabilities is beyond our reach today. This challenge can be better appreciated if we consider that the human hand has 27 degrees of freedom, that each finger can be actuated by more than one muscle group, and that there are approximately 100 receptors in one square centimeter of a finger pad. Several different haptic devices have been developed to enable touch interactions with objects in real, virtual, or teleoperated worlds (see the review of devices in Burdea, 1996). In general, the performance of a haptic device is highly coupled to its design. A high quality haptic device has a low apparent inertia, high stiffness, low friction, and minimal backlash. Actuator selection affects the range of dynamic forces that can be displayed using the interface. Moreover, high sensor resolution and high force update rates are desired to achieve stable touch interactions. For example, the haptic loop must be updated at a rate close to one kilohertz to render rigid surfaces.
One way to categorize haptic devices is by whether they are passive or active. For example, a keyboard, a mouse, and a trackball can be considered passive devices, since they supply input only (unidirectional). On the other hand, a force-reflecting robotic arm can be programmed to display forces to the user based on his or her inputs (bidirectional). A second way haptic devices can be categorized is by whether they are grounded or ungrounded. For example, a joystick has a fixed base and is considered a grounded haptic device. On the other hand, exoskeleton-type haptic devices are attached to the user's arm (and move around with it) and are considered ungrounded. Today, most of the devices in this category are bulky, heavy, and not very user friendly. A third way to categorize haptic devices is by whether they are net-force or tactile displays. The idea behind net-force displays, such as the PHANTOM device (Massie & Salisbury, 1994), is to reduce the complex haptic interactions of a human hand with its environment to a single point. A net-force/torque display therefore provides limited information about the complex distribution of forces that are perceived when, for example, a textured surface is stroked with one's fingertip. Tactile devices are developed to display distributed forces to a user. For example, an array of individually actuated pins (a tactile pin array) has been used to perturb the skin at the user's fingertip. Finally, one could also distinguish between impedance control, where the user's input motion (acceleration, velocity, and position) is measured and an output force is returned (as in a PHANTOM haptic device), and admittance control, where the input forces exerted by the user are measured and motion is fed back to the user, as in the Haptic Master sold by Fokker Control Systems (http://www.fcs-cs.com/robotics). Impedance devices are simpler to design and are most common, while admittance devices are generally used for applications requiring high forces in a large workspace (Salisbury, Conti, & Barbagli, 2004).
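As a concrete picture of impedance control, the sketch below (Python; the virtual wall, the stiffness constant, and the device read/write calls are hypothetical, since the real API is vendor specific) computes one tick of the roughly one kilohertz loop mentioned above: read the probe position, detect penetration into a virtual wall, and return a spring-like penalty force.

```python
import numpy as np

STIFFNESS = 500.0  # N/m; virtual wall stiffness, limited by device stability
WALL_Y = 0.0       # the virtual wall is the plane y = 0

def render_force(probe_pos):
    """One tick of an impedance-control loop: position in, force out."""
    penetration = WALL_Y - probe_pos[1]  # how far the probe is inside the wall
    if penetration <= 0.0:
        return np.zeros(3)               # no contact, no force
    # Penalty-based response: push back along the surface normal,
    # proportional to penetration depth (a virtual spring).
    return np.array([0.0, STIFFNESS * penetration, 0.0])

# The real loop would run near 1 kHz against the device driver, e.g.:
#   while running:
#       pos = device.read_position()     # hypothetical vendor API
#       device.write_force(render_force(pos))
```

This position-in/force-out structure is also the skeleton of the haptic rendering algorithms discussed in the following section, where the single plane is replaced by collision detection against arbitrary 3D objects.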
Training Applications

In this chapter, we focus only on applications of active haptic devices in the training of human operators, since covering all applications of haptic devices would be an exhaustive task. The early application of active haptic devices dates back to the 1950s, when force-reflecting devices were used to convey contact forces to a human operator during remote manipulation of radioactive substances at Argonne National Laboratory. The number of applications increased drastically in the 1990s with the appearance of commercial devices that enable touch interactions with virtual objects. In the same period, the concept of haptic rendering emerged (Salisbury, Brock, Massie, Swarup, & Zilles, 1995; Srinivasan & Başdoğan, 1997). Displaying forces to a user through a haptic device such that he or she can touch, feel, and manipulate objects in virtual environments is known as haptic rendering (see the recent review in Başdoğan, Laycock, Day, Patoglu, & Gillespie, 2008). Analogous to graphical rendering, haptic rendering is concerned with the techniques and processes associated with generating and displaying
haptic stimuli to the human user. A haptic rendering algorithm is typically made of two parts: (a) collision detection and (b) collision response. As the user holds and manipulates the end effector of the haptic device, the new position and orientation of the haptic probe are acquired, and collisions between the virtual model of the probe and the virtual objects in the scene are detected. If a collision is detected, the interaction forces are computed using preprogrammed rules for collision response. The forces are then conveyed to the user through the haptic device to provide him or her with the haptic representation of the 3D object and its surface details. With the development of desktop haptic devices and commercial rendering libraries, the field has shown significant expansion during the last decade. New applications have emerged in fields including medicine (surgical simulation, telemedicine, haptic user interfaces for blind persons, and rehabilitation for patients with neurological disorders), dental medicine, art and entertainment (3D painting, character animation, digital sculpting, and virtual museums), computer-aided product design (free-form modeling, assembly and disassembly, including insertion and removal of parts), scientific visualization (geophysical data analysis, molecular simulation, and flow visualization), and robotics (path planning and telemanipulation). In the following discussion of applications, we focus on the use of haptics in training of the human operator. For example, there is a need to improve the ability of humans to direct remote manipulation tasks; the improved performance can be achieved through simulation based training in virtual environments. One application of this concept is in space exploration. Today, the interaction between a human operator located on earth and a rover located on a remote planet is provided through a set of edited text commands only. This approach restricts the complexity of transmitted commands and likewise reduces the quantity and quality of data return. In addition, these commands do not always make the best set, since they are not extensively tested before being transmitted to the rover. For example, if the task involves handling and manipulation of objects (for example, collecting rock samples), a rover faces several uncertainties when executing it (for example, whether the sample is at a reachable distance, how to hold the sample, and so forth). Planning, scheduling, and synchronization of rover tasks that involve autonomous manipulation of objects will be even more challenging in the future, when multiple rovers are used concurrently for planetary exploration and have to work cooperatively. The National Aeronautics and Space Administration's (NASA's) Jet Propulsion Laboratory (JPL) has developed a multimodal virtual reality system for training a rover operator to plan robotic manipulation tasks effectively (Başdoğan & Bergman, 2001). This system utilizes dual haptic arms and a semi-immersive visualization system and is designed to train and prepare a rover operator for executing complex haptic manipulation tasks (see Figure 5.2). The training simulations involve a scenario where the operator commands a planetary rover while it collects rock samples. The observations and experiences gained from these simulations are used to help identify situations and issues the rover is likely to
encounter when it performs the same tasks autonomously on the surface of Mars. In this regard, mapping the activities of a human operator to the activities of a robotic system (transforming inputs from the haptic arms into control signals for the robotic system) is a challenging research problem (Griffin, 2003). Similar to the efforts at JPL, the Lyndon B. Johnson Space Center (JSC) in Houston trains spacecraft crew members for extravehicular activities (EVAs). EVA tasks, such as setting up an instrument, assembly, maintenance, or carrying out repairs, are inherently risky. One approach to minimizing risk is to train crew members in a multimodal virtual environment on earth before they perform the tasks in space (Loftin & Kenney, 1995). A virtual model of the Hubble Space Telescope (HST) was constructed to train members of the NASA HST flight team on maintenance and repair procedures. Another approach is to use humanoid robots commanded through a telepresence interface to perform these tasks. Robonaut, developed at JSC, is an anthropomorphic, astronaut-sized robot configured with two arms, two five-fingered hands, a head, and a torso (Ambrose et al., 2000). The earlier telepresence interface of the Robonaut system utilized the CyberGlove haptic system (Immersion Corporation) to guide the articulated movements of the Robonaut arm without force feedback to the human operator. Later, the CyberGlove was replaced with dual force feedback joysticks to improve the grasping abilities of the operator (O'Malley & Ambrose, 2003). Another popular application of haptics is in surgical training. From the beginnings of medicine to the modern standardized surgical training programs, the training paradigm for surgeons has not changed substantially. Surgical training has been based traditionally on the "apprenticeship" model, in which the novice surgeon is trained with small groups of peers and superiors, over time, in the course of patient care. However, this training model has come under scrutiny, and its efficiency is being questioned by experts, physicians, and the public. According to the report "To Err Is Human," prepared by the Institute of Medicine of the National Academy of Sciences in 1999, the human cost of medical errors is
high, and more people die from medical mistakes each year than from highway accidents, breast cancer, or AIDS combined.

Figure 5.2. This is a view of JPL's multimodal virtual reality system used for training with a fleet of rovers.

Minimally invasive surgery (MIS) is a revolutionary surgical technique in immediate need of improved training methods. MIS has been used in a range of procedures since the early 1960s (when the surgery is performed in the abdominal area, for example, it is called laparoscopic surgery). This technology uses a small video camera and a few customized instruments to perform surgery. The camera and instruments are inserted into the surgical area through small skin incisions or natural orifices, enabling the surgeon to explore the internal cavity without the need to make large openings. The major advantages of this type of surgery for the patient are a short hospital stay, a timely return to work, and less pain and scarring after the surgery. Although MIS has several advantages over traditional open surgery, surgeons are handicapped by the limitations of the technology. For example, haptic cues are substantially reduced, since the surgeon has to interact with internal organs by means of surgical instruments attached to long thin tubes. While the importance of training in MIS has been well acknowledged, there is no consensus on the best or most effective method for this training. Box trainers, for instance, are inanimate models equipped with real surgical instruments, endoscopic cameras, and plastic tissue models. These trainers provide an environment similar to that of real surgery settings; however, simulated surgical procedures are usually poor imitations of the actual ones. Currently, animal training is considered the most realistic training model available. This model is dynamic and approaches real operative conditions. Animal tissues, although not always of the same consistency as human tissues, do respond in a similar way to the forces applied to them. The use of animals for training purposes, however, is expensive and controversial. Moreover, the trainee's performance cannot be measured quantitatively. Simulation based training using virtual reality techniques (see Figure 5.3) has been suggested as an alternative to traditional training in MIS. Surgical simulators developed for this purpose enable the trainee to touch, feel, and manipulate virtual tissues and organs through haptic devices, while displaying high quality images of tool-tissue interactions on a computer monitor as in real surgery (see the review in Başdoğan, Sedef, Harders, & Wesarg, 2007). In addition to displaying forces arising from tool-tissue interaction during the simulation of surgical procedures, haptic devices can also be used to play back prerecorded haptic stimuli. For example, a physician relies heavily on haptic cues when guiding a needle into the epidural space. The appreciation of forces at each layer is important for the proper guidance of the needle. Dang, Annaswamy, and Srinivasan (2001) experimented with two modes of haptic guidance. In the first, the simulator displays a virtual guiding needle on the screen that moves along the same path and with the same speed as an expert in a prerecorded trial. If the user's needle position exactly matches that of the guiding virtual needle, the user feels the same forces that the expert felt. In the event of a mismatch, the virtual instructor applies a force to pull the trainee back to the prerecorded trajectory. In the second mode, tunnel guidance, the simulator disregards the
Figure 5.3. This is a view of simulation based training using virtual reality techniques.
time dependency of the recorded data so that users perform the task at their own speed. The needle's movement is limited to the prerecorded trajectory, allowing users to concentrate solely on the forces encountered at each layer along the needle's insertion path.

Another application of haptics in medicine is in the area of rehabilitation. Since the nervous system is highly adaptive and open to reprogramming, a haptic arm can be used to retrain movement control. For example, a force feedback robotic arm and artificial force fields have been used to train and improve the motor performance of patients with chronic impairment after stroke (Krebs & Hogan, 2006). Patients were asked to perform goal-directed, planar reaching tasks that emphasized shoulder and elbow movements under the force guidance of the robotic arm. Clinical results with well over 300 stroke patients, both inpatients and outpatients, showed that movement therapy has a measurable and significant impact on recovery following brain injury.

There are also applications of haptic technology in military training. At the Massachusetts Institute of Technology (MIT), under a large interdisciplinary program called Virtual Environments Technology for Training (VETT), funded by the Office of Naval Research, software and hardware technologies were developed to augment the perceptual and cognitive skills of U.S. Navy students in training. For example, a virtual model of an electronics test console was developed to teach students basic electricity and electronics; haptic interactions with toggle buttons, multimeter probes, and switches on the console were simulated (Davidson, 1996). In another study, experiments were designed to investigate whether haptic feedback improves trainees' ability to control the heading of a surface ship
while they navigate the ship in a complex virtual environment containing other ships and harbor hazards such as bridges. The main goal of this study was to teach U.S. Navy students basic concepts of vector algebra and dynamical systems. The results showed that subjects learned the influence of ship inertia and water currents on heading better under the guidance of force feedback (see Durlach et al., 1999).
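The concepts that scenario targets are easy to illustrate in a few lines. The toy integration below is not the VETT model, and all coefficients are invented; it simply shows how heading inertia makes the ship respond sluggishly to commands while a water current adds a velocity component that must be compensated by vector addition:

```python
import math

DT = 0.1               # integration step [s]
INERTIA = 20.0         # s; large value = sluggish heading response (illustrative)
CURRENT = (0.3, 0.0)   # water current, m/s (east, north); illustrative
SPEED = 2.0            # ship speed through the water, m/s

def step(x, y, heading, commanded_heading):
    """One step of a toy ship model: first-order heading lag plus drift
    from the current (velocity over ground = water velocity + current)."""
    heading += DT / INERTIA * (commanded_heading - heading)   # inertia lag
    vx = SPEED * math.sin(heading) + CURRENT[0]               # east component
    vy = SPEED * math.cos(heading) + CURRENT[1]               # north component
    return x + DT * vx, y + DT * vy, heading

x = y = heading = 0.0
for _ in range(600):                     # command due north for 60 s
    x, y, heading = step(x, y, heading, commanded_heading=0.0)
print(round(x, 1), round(y, 1))          # ship drifts east despite heading north
```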
Artificial Force Fields for Training and Task Guidance

One of the benefits of active haptic devices in training for telemanipulation tasks is that they can be programmed to guide or restrict the movements of the user by introducing artificial force fields. Artificial force fields, also known as virtual fixtures, have been shown to improve user performance and learning both in real world telemanipulation tasks and in training tasks simulated in virtual environments (Rosenberg, 1993; Payandeh & Stanisic, 2002; Bettini, Lang, Okamura, & Hager, 2002; Bukusoglu, Başdoğan, Kiraz, & Kurt, 2006). The term virtual fixture refers to a software implemented haptic guidance tool that helps the user perform a task by limiting his or her movements to restricted regions and/or guiding movement along a desired path (Rosenberg, 1993). A virtual fixture can be thought of as a ruler or a stencil (Abbott & Okamura, 2003): with the help of a ruler or stencil, a person can draw lines and shapes faster and more precisely than by freehand. Like a passive stencil, an active haptic device can be programmed to apply forces to the user in a virtual environment, training him or her to execute a task more efficiently and precisely. This concept is useful not only for training but also for actual execution of the task in the real world, and, in comparison to real physical constraints, the type and number of virtual constraints that can be programmed are unlimited.

Artificial force fields offer an excellent balance between automated operation and direct human control. They can be programmed to help the operator carry out a structured task faster. For example, studies on telemanipulation systems show that user performance on a given task can increase by as much as 70 percent with the introduction of virtual fixtures (Rosenberg, 1993). Other applications of virtual fixtures include robotic-assisted surgery and optical manipulation (Abbott & Okamura, 2003; Başdoğan, Kiraz, Bukusoglu, Varol, & Doganay, 2007). For example, Başdoğan and his colleagues showed that displaying guidance forces through a haptic device significantly improves the operator's task learning and performance in the telemanipulation of microparticles via optical tweezers. The task was to construct a coupled microsphere resonator made of four microspheres by individually steering and binding three spheres to an anchor sphere. One group was trained with and used a system giving only visual feedback; the other group trained with and used a system providing both visual and haptic feedback. An artificial force field was used to help subjects position the particles precisely and to make the binding process easier. The summation of the guidance forces (an artificial force field) and the estimated drag force
was displayed to the subjects in the second group through a haptic interface. After the training, the performance of both groups was tested in the physical setup. Experiments showed that guidance under haptic feedback resulted in almost twofold improvements in the average path error and average speed.
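As a concrete illustration of how such guidance forces can be computed, the following minimal Python sketch projects the current probe position onto a prerecorded trajectory and returns a spring-damper force that pulls the user back toward the path. It is illustrative only; none of the cited systems is implemented exactly this way, and the gains are hypothetical:

```python
import numpy as np

def closest_point_on_path(p, path):
    """Project point p onto a polyline path (N x 3 array of waypoints)."""
    best_q, best_d = path[0], np.inf
    for a, b in zip(path[:-1], path[1:]):
        ab = b - a
        t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
        q = a + t * ab                      # candidate projection on segment
        d = np.linalg.norm(p - q)
        if d < best_d:
            best_q, best_d = q, d
    return best_q

def guidance_force(p, v, path, k=300.0, b=2.0):
    """Spring-damper virtual fixture: pull the probe toward the path.
    k [N/m] and b [N*s/m] are illustrative gains, not values taken from
    the studies cited in the text."""
    q = closest_point_on_path(p, path)
    return k * (q - p) - b * v              # restoring force minus damping

# Example: probe slightly off a straight-line path, drifting sideways.
path = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.2, 0.0, 0.0]])
p = np.array([0.05, 0.004, 0.0])            # probe position [m]
v = np.array([0.0, 0.01, 0.0])              # probe velocity [m/s]
print(guidance_force(p, v, path))           # force pushing back toward the path
```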
Challenges

Aerospace, maritime, military, nuclear energy, and other high risk professions have been using simulators to train difficult and demanding tasks for the last 50 years. By integrating force feedback devices into simulators, some of these industries have augmented the perceptual, cognitive, and motor control skills of human operators and reduced errors significantly. Flight simulators equipped with force feedback joysticks provide a convincing example of the importance of simulation technology and the significant role that haptics plays in training. Just as flight simulators are used to train pilots today, it is anticipated that surgical simulators will be used to train physicians in the near future, and the role of haptic feedback in that application is equally clear. Moreover, several past studies have shown the significance of haptics in teleoperation tasks in real and virtual worlds. For example, artificial force fields not only enable us to train the human operator in virtual environments, but also help him or her execute the teleoperated task better and faster in the real world.

Significant progress has been made in academia and industry in haptics, but many research questions remain open. While it is difficult, and outside the scope of this chapter, to address them all, we highlight some of the outstanding research challenges that require further attention.

One constant challenge in integrating haptics into virtual environments is the need for a variety of haptic devices with the requisite degrees of freedom, range, resolution, and frequency bandwidth, in terms of both forces and displacements. The price of next generation haptic devices must also drop significantly before they can reach ordinary computer users. In this regard, it is worth mentioning the Falcon haptic device, recently introduced by Novint Technologies, which costs less than $200. It is hard to imagine a single universal device serving all applications, since the requirements of each application differ. For example, the motions and forces involved in laparoscopic surgical operations are small; ideally, a haptic device used for laparoscopic training should have fine resolution and 6 to 7 degrees of freedom. A haptic device designed for rehabilitation applications, on the other hand, may tolerate lower resolution but require a larger workspace.

Another area of hardware design that requires further investigation is multifingered haptic devices and tactile displays. It has been demonstrated that when we gather information about the shape and size of an object through touch, our fingers and hand move in an optimal manner. Moreover, robotics studies show that at least three fingers are necessary for a stable grasp. Yet only a few multifingered haptic devices are commercially available today. For example, CyberGrasp from Immersion Corporation is an exoskeleton having
individual wires pulling each finger to prevent its penetration into a virtual object during the simulation of grasping. Designing and building multifingered haptic devices becomes increasingly difficult as the degrees of freedom of the device increase. Hardware for displaying distributed forces on the skin also remains a challenging problem. The very crude tactile displays for VEs now available in the market are mostly vibrotactile displays, while the tactile devices developed in research laboratories are mostly arrays of individually actuated pins. Packaging an array of actuators so that it neither breaks nor hinders an active user is highly challenging, and new technologies must be explored to make significant progress in this area (see the review in Biggs & Srinivasan, 2002).

Several challenges also remain to be solved in the area of haptic rendering. The computational cost of rendering virtual objects grows drastically with the geometric complexity of the scene, the type of haptic interactions, and the material properties of the objects (for example, soft versus rigid). The simulation of haptic interactions between a point probe and a rigid virtual object has been achieved (that is, 3 degrees of freedom haptic rendering), and many point based rendering algorithms have already been incorporated into commercial software products, but the simulation of object-object interactions is still an active area of research (see the review of 6 degrees of freedom haptic rendering techniques in Otaduy & Lin, 2008). While point based interaction approaches are sufficient for the exploration of object surfaces, more advanced rendering techniques are necessary for simulating tool-object interactions. For example, in medical simulation, side collisions occur frequently between simulated surgical instruments and deformable organs, and 3 degrees of freedom haptic rendering techniques cannot accurately handle this situation (see the details in Başdoğan, De, Kim, Muniyandi, & Srinivasan, 2004). In fact, simulating the nonlinear dynamics of physical contact between an organ and a surgical instrument, as well as the surrounding tissues, is very challenging, and there will be a continued demand for efficient algorithms, especially when the haptic display must be synchronized with the display of visual, auditory, and other modalities.

In this regard, one missing component is detailed human-factors studies. Even if the hardware and software of visual, haptic, and auditory displays one day improve to provide richer stimulation for our sensory channels, the information must still be perceived by the user. Hence, a better understanding and measurement of human perceptual and cognitive abilities is important for more effective training and better training transfer.
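To make the point based rendering discussed above concrete, the sketch below implements the simplest possible 3 degrees of freedom scheme: a penalty force between a point probe and a rigid virtual sphere, evaluated once per cycle of a roughly 1 kilohertz servo loop. It is a deliberately reduced illustration with invented constants; practical god-object and proxy algorithms in the literature are considerably more elaborate:

```python
import numpy as np

# Hypothetical rigid virtual object: a sphere at CENTER with RADIUS.
CENTER = np.array([0.0, 0.0, 0.0])
RADIUS = 0.05          # m
STIFFNESS = 1000.0     # N/m, illustrative contact stiffness

def contact_force(probe_pos):
    """Penalty-based 3-DOF rendering for a point probe vs. a rigid sphere.
    Force is proportional to penetration depth, directed along the outward
    surface normal; zero when the probe is outside the object."""
    offset = probe_pos - CENTER
    dist = np.linalg.norm(offset)
    penetration = RADIUS - dist
    if penetration <= 0.0 or dist == 0.0:
        return np.zeros(3)                 # no contact (or degenerate center)
    normal = offset / dist                 # outward surface normal
    return STIFFNESS * penetration * normal

# One iteration of the ~1 kHz haptic servo loop (device I/O omitted):
probe = np.array([0.0, 0.0, 0.048])        # probe 2 mm inside the surface
print(contact_force(probe))                # ~ [0, 0, 2.0] N pushing outward
```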
OLFACTORY DISPLAYS

Today few virtual environments employ olfactory displays. Nonetheless, olfaction is an important sense and has been shown to stimulate both emotional (Corbin, 1982) and recall (Chu & Downes, 2000; Degel, Piper, & Koester, 2001) responses. Most of us have had the experience of detecting a specific smell that "took us back" to a place or an event. In addition, olfaction can have both directional and nondirectional capabilities. From a training perspective there are certainly
virtual environment application areas (medical, combat, electronic fault detection, and so forth) that have a demonstrable need for an olfactory "dimension."

Technology

Work on olfactory displays is fairly recent, although a good body of literature on olfaction exists (see, for example, Ohloff, 1994). Barfield and Danas (1996) established the "baseline" for this display technology in their paper. Another excellent compilation is Joseph Kaye's (2001) MIT master's thesis, "Symbolic Olfactory Display." These two resources gather what was known prior to 2000 about olfaction and about technologies that, in principle, can support olfactory displays. A more recent review is that of Gutierrez-Osuna (2004). Myron W. Krueger (1995) specifically worked on the issue of mixing odorants to achieve specific scents in the context of medical simulation.

The Sensorama (Heilig, 1962) included a fan, a container that enclosed an odor-producing chemical, and a device that opened the container at a time corresponding to visual images congruent with the odor produced by the container's contents. This system is likely the earliest example of an olfactory display integrated with other display devices. Heilig's (1962) approach to producing smells on demand has not been significantly improved upon in the intervening years. At issue is the ability to (1) produce a specific smell when needed, (2) deliver the smell to the nose(s) of the user(s), and (3) dissipate the smell when it is no longer required. Each of these three elements presents serious technical challenges.

Producing a specific smell is, in many cases, beyond current technical capabilities. Some smells are associated with chemicals or chemical reactions and may be producible on demand (see, for example, Krueger, 1995). Rakow and Suslick (2000) have developed technology to detect odors and have proposed the creation of a "scent camera," but their company, ChemSensing, Inc., has not yet marketed such a device. In 2000 a Korean company, E-One, proposed developing such a device as well (see http://transcripts.cnn.com/TRANSCRIPTS/0005/04/nr.00.html), but, again, nothing has reached the market.

Delivery mechanisms are another area of concern. Just as in Heilig's (1962) approach, most devices depend on a fan to deliver the scent to the user or users. The Institute for Creative Technologies at the University of Southern California has developed a neck-worn system that places the source close to the user's nose (see http://ict.usc.edu/projects/sensory_environments_evaluation/). A good summary of past and current commercially available olfactory displays has been compiled by Washburn and Jones (2004); it is noteworthy that many commercial enterprises established to develop and market olfactory displays have not survived. An additional review is included in Davide, Holmberg, and Lundström (2001).

Training Applications

Although their potential has been recognized, few have attempted to incorporate olfactory displays in training applications. One of the first was developed at
the Southwest Research Institute (Cater, 1994)—a virtual environment for training firefighters that provided olfactory stimulation as well as thermal, visual, and auditory displays. Researchers in the U.S. Army have investigated the use of olfactory displays to provide the smell of blood, cordite, and other scents of the battlefield (Washburn, Jones, Satya, Bowers, & Cortes, 2003). In spite of these efforts, no virtual environment training application that incorporates an olfactory display has been deployed as of this writing.

Challenges

Challenges abound, as noted earlier. The most difficult is probably producing, on demand, a specific smell that fits the context of the training application; the solution will be a mixture of both science and art. The challenge of delivering the smell to the user(s) and then dissipating it, while complex, is fairly straightforward and will likely have a variety of solutions depending on the training application's objectives and its physical relationship to the user(s). In spite of these issues, it must be recognized that some human variables will remain beyond the control of the application and its users/operators. These include both transitory (for example, the common cold) and permanent inability, on the part of the user, to actually detect a delivered scent.

GUSTATORY DISPLAYS

Technology

While gustatory (taste) displays have been discussed in the literature, no system has yet emerged that can be evaluated. Beidler (1971) provides a compendium of knowledge on the basis of the sense of taste in his handbook. Much more recently, Maynes-Aminzade (2005) offered a light-hearted suggestion for "edible user interfaces" at CHI 2005. Food science does provide a basis for the development of gustatory displays: handbooks (see, for example, Deibler & Delwiche, 2003) provide access to the literature of "taste" from a variety of perspectives. Thus, we know a great deal about how the sense of taste "works" and how to produce, chemically, some specific tastes. Just as in olfaction, however, translating this knowledge into a practical application will be quite difficult.

Challenges

Again, just as in the case of olfaction, there are the usual challenges of producing a specific taste on demand, delivering the taste sensation to the user(s), and then eliminating the taste as required. Beyond these problems are the additional issues of the strong relationship between taste and smell (Ohloff & Thomas, 1971) and of large human variability in the ability to discern a specific taste and, collectively, agree on a characterization of that taste.
VESTIBULAR DISPLAYS

Vestibular displays provide the senses of acceleration and orientation to the user. As the illustration of an early flight simulator, circa 1910, in Figure 5.4 suggests, these "displays" are often mechanical in nature and provide a straightforward means of subjecting the user to the movements (accelerations) and orientations necessary for effective training. This recognition, over a hundred years ago, led to a robust industry dedicated to motion based platforms on which users are placed. Such platforms are found in many high performance flight simulators that routinely train pilots to operate aircraft and spacecraft (see, for example, Rolfe & Staples, 1988). Flight simulators certainly represent one "class" of virtual environments used for training (Brooks, 1999), but others should be mentioned. Submarine simulators (for example, those operated by the U.S. Navy at Pearl Harbor, Hawaii) typically incorporate motion bases for orientation. During a steep dive or an emergency surfacing operation, these simulators give users the direct experience of trying to stay at, or get to, their stations in spite of a steeply sloping deck.

Technology

The technology of vestibular displays as represented by motion based platforms was reviewed thoroughly in Stanney (2002), and that source remains current as of this writing. Two specific points will be made here, however. The first is that low cost, "personal" motion based platforms are available and may be integrated with a virtual environment designed for training (see, for example,
Figure 5.4. This is a picture of an early (1910) flight simulator.
Sterling, Magee, & Wallace, 2000). This integration is straightforward, assuming that the necessary control software is available for the specific motion base. A second and rather interesting technique is to stimulate the human vestibular system directly: Cress and his colleagues demonstrated that electrodes can be used to create the sensation of acceleration in a subject (Cress et al., 1997). This technology, however, may not find widespread acceptance and has not yet been studied in a large population to determine its degree of safety.

Training Applications

Given the literature references made above, we will not try to address vestibular displays as a part of virtual environments for training exhaustively. Rather, we consider here only the typical virtual environments (goggles and gloves) that have been developed for training purposes. The system described by Sterling et al. (2000) is perhaps the best example. The authors compared a "low cost" helicopter landing simulator using a small motion base, minimal controls, and a head-mounted display with a high cost, large-scale helicopter simulator. Their results suggest that the low cost system's training effectiveness was comparable to that of the high end system.

Challenges

With the availability of low cost motion based platforms, the ability to integrate them into virtual environments designed for training is at hand. What is lacking is the requirement to do so. One possible explanation for this lack of applications is the dominance of the visual sense. For example, ship bridge simulators almost never incorporate motion based platforms. Yet, as anyone who has used such a simulator can attest, it is easy to get seasick if the visual displays provide scenes that incorporate only visual motion.
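The control software mentioned under Technology above typically centers on a motion-cueing, or "washout," algorithm: transient accelerations are reproduced within the platform's limited travel, while sustained accelerations are simulated by slowly tilting the platform so that gravity supplies the missing specific force. The following single-axis Python sketch illustrates the classical idea; the filter constants are invented for illustration and are not drawn from any fielded simulator:

```python
import math

DT = 0.01            # control period [s]
TAU_HP = 1.0         # high-pass time constant [s], illustrative
TAU_LP = 2.0         # low-pass time constant for tilt coordination [s]
G = 9.81

class WashoutFilter:
    """Classical one-axis motion cueing: high-pass the vehicle acceleration
    to command platform translation, low-pass it to command a tilt angle
    whose gravity component mimics the sustained part."""
    def __init__(self):
        self.hp_prev_in = 0.0
        self.hp_prev_out = 0.0
        self.lp_state = 0.0

    def step(self, accel_cmd):
        # First-order high-pass: keeps onset cues, washes out steady input.
        a = TAU_HP / (TAU_HP + DT)
        hp = a * (self.hp_prev_out + accel_cmd - self.hp_prev_in)
        self.hp_prev_in, self.hp_prev_out = accel_cmd, hp
        # First-order low-pass drives tilt coordination.
        self.lp_state += (DT / TAU_LP) * (accel_cmd - self.lp_state)
        tilt = math.asin(max(-1.0, min(1.0, self.lp_state / G)))
        return hp, tilt   # platform translational accel [m/s^2], tilt [rad]

w = WashoutFilter()
for _ in range(300):                   # 3 s of a sustained 2 m/s^2 push
    trans_accel, tilt_angle = w.step(2.0)
print(round(trans_accel, 3), round(math.degrees(tilt_angle), 1))
# translation command decays toward 0; tilt climbs toward asin(2/9.81)
```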
SUMMARY

Multimodal displays are an essential component of virtual environments used for training. Such displays offer the potential to provide sensory channels that are essential for some training applications. If virtual environments were limited strictly to visual and auditory displays, a significant fraction of the human sensory spectrum would be ignored; in some cases this could lead to less effective training or even to negative training.

This chapter has addressed displays for the haptic, olfactory, gustatory, and vestibular sensory channels. Haptic displays, while limited, offer significant technical maturity in some applications and have been demonstrated to add effectiveness to some training applications. Olfactory and gustatory displays are largely unavailable and have not yet been incorporated into fielded virtual environments for training. Vestibular displays are widely used in many virtual
environments designed for training in aircraft and spacecraft piloting. The use of lower cost versions of these displays is now possible, and it is anticipated that they will find their way into more widely deployed applications. In all cases, significant challenges remain before these display modalities become as common as visual and auditory displays, yet the need to deliver highly effective training demands that these technologies be available.

REFERENCES

Abbott, J. J., & Okamura, A. M. (2003). Virtual fixture architectures for telemanipulation. Proceedings of the 2003 IEEE International Conference on Robotics & Automation (Vol. 2, pp. 2798–2805). New York: Institute of Electrical and Electronics Engineers.
Aleotti, J., Caselli, S., & Reggiani, M. (2005). Evaluation of virtual fixtures for a robot programming by demonstration interface. IEEE Transactions on Systems, Man, and Cybernetics—Part A: Systems and Humans, 35(4), 536–545.
Ambrose, R. O., Aldridge, H., Askew, R. S., Burridge, R. R., Bluethmann, W., Diftler, M., Lovchik, C., Magruder, D., & Rehnmark, F. (2000). Robonaut: NASA's space humanoid. IEEE Intelligent Systems & Their Applications, 15(4), 57–62.
Barfield, W., & Danas, E. (1996). Comments on the use of olfactory displays for virtual environments. Presence, 5(1), 109–121.
Başdoğan, C., & Bergman, L. (2001, February). Multi-modal shared virtual environments for robust remote manipulation with collaborative rovers. Paper presented at the USC Workshop on Touch in Virtual Environments, Los Angeles, CA.
Başdoğan, C., De, S., Kim, J., Muniyandi, M., & Srinivasan, M. A. (2004). Haptics in minimally invasive surgical simulation and training. IEEE Computer Graphics and Applications, 24(2), 56–64.
Başdoğan, C., Kiraz, A., Bukusoglu, I., Varol, A., & Doganay, S. (2007). Haptic guidance for improved task performance in steering microparticles with optical tweezers. Optics Express, 15(18), 11616–11621.
Başdoğan, C., Laycock, S. D., Day, A. M., Patoglu, V., & Gillespie, R. B. (2008). 3-DoF haptic rendering. In M. C. Lin & M. Otaduy (Eds.), Haptic rendering (pp. 311–331). Wellesley, MA: A K Peters.
Başdoğan, C., Sedef, M., Harders, M., & Wesarg, S. (2007). Virtual reality supported simulators for training in minimally invasive surgery. IEEE Computer Graphics and Applications, 27(2), 54–66.
Beidler, L. M. (Ed.). (1971). Handbook of sensory physiology. Volume IV: Chemical senses. Part 1: Olfaction. Berlin: Springer-Verlag.
Bettini, A., Lang, S., Okamura, A., & Hager, G. (2002). Vision assisted control for manipulation using virtual fixtures: Experiments at macro and micro scales. Proceedings of the IEEE International Conference on Robotics and Automation (Vol. 2, pp. 3354–3361). Piscataway, NJ: Institute of Electrical and Electronics Engineers.
Biggs, S. J., & Srinivasan, M. (2002). Haptic interfaces. In K. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 93–116). Mahwah, NJ: Lawrence Erlbaum.
Brooks, F. P., Jr. (1999). What's real about virtual reality? IEEE Computer Graphics and Applications, 19(6), 16–27.
Bukusoglu, I., Başdoğan, C., Kiraz, A., & Kurt, A. (2006). Haptic manipulation of microspheres with optical tweezers. Proceedings of the 14th IEEE Symposium on Haptic Interfaces for Virtual Environments and Teleoperator Systems (pp. 361–365). Washington, DC: IEEE Computer Society.
Burdea, G. (1996). Force and touch feedback for virtual reality. New York: John Wiley & Sons.
Cater, J. P. (1994). Approximating the senses. Smell/taste: Odors in virtual reality. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (Vol. 2, p. 1781). New York: IEEE Computer Society.
Chu, S., & Downes, J. J. (2000). Odor-evoked autobiographical memories: Psychological investigations of Proustian phenomena. Chemical Senses, 25, 111–116.
Corbin, A. (1982). Le miasme et la jonquille: L'odorat et l'imaginaire social, XVIIIe–XIXe siècles. Paris: Librairie Chapitre.
Cress, J. D., Hettinger, L. J., Cunningham, J. A., Riccio, G. E., McMillan, G. R., & Haas, M. W. (1997). An introduction of a direct vestibular display into a virtual environment. Proceedings of the 1997 Virtual Reality Annual International Symposium (pp. 80–86). Washington, DC: IEEE Computer Society.
Dang, T., Annaswamy, T. M., & Srinivasan, M. A. (2001). Development and evaluation of an epidural injection simulator with force feedback for medical training. In J. D. Westwood (Ed.), Proceedings of Medicine Meets Virtual Reality (pp. 97–102). Washington, DC: IOS Press.
Davide, F., Holmberg, M., & Lundström, I. (2001). Virtual olfactory interfaces: Electronic noses and olfactory displays. In G. Riva & F. Davide (Eds.), Communications through virtual technology: Identity, community and technology in the internet age (pp. 193–220). Amsterdam: IOS Press.
Davidson, S. W. (1996). A haptic process architecture using the PHANToM as an I/O device in a virtual electronics trainer. In J. K. Salisbury & M. A. Srinivasan (Eds.), Proceedings of the First PHANToM Users Group Workshop (Tech. Rep. No. AI-TR-1596; pp. 35–38). Cambridge, MA: Massachusetts Institute of Technology. Available from http://www.sensabledental.com/documents/documents/PUG1996.pdf
Degel, J., Piper, D., & Koester, E. P. (2001). Implicit learning and implicit memory for odors: The influence of odor identification and retention time. Chemical Senses, 26, 267–280.
Deibler, K. D., & Delwiche, J. (Eds.). (2003). Handbook of flavor characterization: Sensory, chemical, and physiological techniques (Food Science and Technology). Paris: Lavoisier Publishing.
Durlach, N. I., Srinivasan, M. A., van Wiegand, T. E., Delhorne, L., Sachtler, W. L., Başdoğan, C., et al. (1999). Virtual environment technology for training (VETT). Cambridge, MA: Massachusetts Institute of Technology. Available from http://www.rle.mit.edu/media/pr142/23_VETT.pdf
Griffin, W. B. (2003). Shared control for dexterous telemanipulation with haptic feedback. Unpublished doctoral dissertation, Stanford University, Stanford, CA.
Gutierrez-Osuna, R. (2004). Olfactory interaction. In W. S. Bainbridge (Ed.), Berkshire encyclopedia of human-computer interaction (pp. 507–511). Great Barrington, MA: Berkshire Publishing.
Heilig, M. L. (1962). United States Patent No. 3,050,870.
Kaye, J. (2001). Symbolic olfactory display. Unpublished master's thesis, Massachusetts Institute of Technology, Cambridge. Available from http://alumni.media.mit.edu/~jofish/thesis/symbolic_olfactory_display.html
Kortum, P. (Ed.). (2008). HCI beyond the GUI: Design for haptic, speech, olfactory, and other nontraditional interfaces. Burlington, MA: Morgan Kaufmann (Elsevier).
Krebs, H. I., & Hogan, N. (2006). Therapeutic robotics: A technology push. Proceedings of the IEEE, 94(9), 1727–1738.
Krueger, M. W. (1995). Olfactory stimuli in virtual reality for medical applications. In K. Morgan, R. M. Satava, H. B. Sieburg, et al. (Eds.), Interactive technology and the new paradigm for healthcare (pp. 180–181). Amsterdam: IOS Press.
Lederman, S. J., & Klatzky, R. L. (1987). Hand movements: A window into haptic object recognition. Cognitive Psychology, 19, 342–368.
Loftin, R. B. (2003). Multisensory perception: Beyond the visual in visualization. Computing in Science and Engineering, 5(4), 565–568.
Loftin, R. B., & Kenney, P. J. (1995). Training the Hubble Space Telescope flight team. IEEE Computer Graphics and Applications, 15(5), 31–37.
Massie, T. H., & Salisbury, J. K. (1994). The PHANToM haptic interface: A device for probing virtual objects. Proceedings of the ASME Winter Annual Meeting, Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, 55(1), 295–300.
Maynes-Aminzade, D. (2005). Edible bits: Seamless interfaces between people, data and food. CHI 2005 Extended Abstracts (pp. 2207–2210). New York: ACM Press.
Ohloff, G. (1994). Scent and fragrances. Berlin: Springer-Verlag.
Ohloff, G., & Thomas, A. (Eds.). (1971). Gustation and olfaction. New York: Academic Press.
O'Malley, M., & Ambrose, R. (2003). Haptic feedback applications for Robonaut. Industrial Robot, 30(6), 531–542.
Otaduy, M. A., & Lin, M. C. (2008). Introduction to haptic rendering algorithms. In M. C. Lin & M. Otaduy (Eds.), Haptic rendering (pp. 159–176). Wellesley, MA: A K Peters.
Payandeh, S., & Stanisic, Z. (2002). On application of virtual fixtures as an aid for telemanipulation and training. Proceedings of the 10th IEEE International Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems (pp. 18–23). Washington, DC: IEEE Computer Society.
Rakow, N. A., & Suslick, K. S. (2000). A colorimetric sensor array for odour visualization. Nature, 406, 710–713.
Rolfe, J. M., & Staples, K. J. (Eds.). (1988). Flight simulation. Cambridge, England: Cambridge University Press.
Rosenberg, L. B. (1993). Virtual fixtures: Perceptual tools for telerobotic manipulation. Proceedings of the IEEE Annual Virtual Reality International Symposium (pp. 76–82). Piscataway, NJ: IEEE Computer Society.
Salisbury, K., Brock, D., Massie, T., Swarup, N., & Zilles, C. (1995). Haptic rendering: Programming touch interaction with virtual objects. Proceedings of the Symposium on Interactive 3D Graphics (pp. 123–130). New York: ACM.
Salisbury, K., Conti, F., & Barbagli, F. (2004). Haptic rendering: Introductory concepts. IEEE Computer Graphics and Applications, 24(2), 24–32.
Shams, L., Kamitani, Y., & Shimojo, S. (2000). What you see is what you hear. Nature, 408, 788.
Srinivasan, M. A., & Başdoğan, C. (1997). Haptics in virtual environments: Taxonomy, research status, and challenges. Computers and Graphics, 21(4), 393–404.
Stanney, K. M. (2002). Handbook of virtual environments: Design, implementation, and applications. Mahwah, NJ: Lawrence Erlbaum.
Sterling, G. C., Magee, L. E., & Wallace, P. (2000, March). Virtual reality training—A consideration for Australian helicopter training needs? Paper presented at the SimTecT 2000 Conference, Sydney, Australia.
Washburn, D. A., & Jones, L. M. (2004). Could olfactory displays improve data visualization? Computing in Science and Engineering, 6(6), 80–83.
Washburn, D. A., Jones, L. M., Satya, R. V., Bowers, C. A., & Cortes, A. (2003). Olfactory use in virtual environment training. Modeling and Simulation, 2(3), 19–25.
Chapter 6
MIXED AND AUGMENTED REALITY FOR TRAINING

Steven Henderson and Steven Feiner

Augmented reality (AR) extends the capabilities and training benefits of virtual reality (VR) by integrating virtual content with a user's natural view of the environment, combining real and virtual objects interactively and aligning them with each other (Azuma et al., 2001). This is accomplished by using displays that can overlay virtual objects on the real world and by registering the virtual and real worlds through tracking. While some of these technologies are the same as or similar to those used in VR (Welch and Davis, Volume 2, Section 1, Chapter 1; Bolas and McDowall, Volume 2, Section 1, Chapter 2; and Towles, Johnson, and Fuchs, Volume 2, Section 1, Chapter 3), there are important differences, which we review later in this chapter. Figure 6.1 shows an experimental AR maintenance training system that uses a stereo head-worn display with a pair of attached cameras whose imagery is digitally combined with computer graphics.

AR can support training by complementing essential physical world characteristics with powerful virtual training constructs. AR thus preserves the natural context, realism, and multisensory interaction of a task, while adding such virtual enablers as overlaid instructions, feedback, and cuing, as well as representations of additional physical objects.

It is useful to position AR within the larger VR context of this text. Milgram, Takemura, Utsumi, and Kishino (1994) present a reality-virtuality continuum for classifying AR and VR applications. The reality-virtuality continuum (Figure 6.2) spans fully real environments at one end and fully virtual environments at the other. Mixed reality (MR) applications exist along this continuum, with AR systems deriving a majority of their content from the real environment and augmented virtuality systems deriving a majority of their content from the virtual environment. Following common usage, we will use the term AR for the entire range of MR systems.
Figure 6.1. Experimental AR Maintenance Training System
Figure 6.2. The Reality-Virtuality Continuum (Milgram et al., 1994)
CAPABILITIES AND BENEFITS OF AR FOR TRAINING

Preservation of Task Context and Environment

By displaying virtual content directly on the real world environment, AR allows users to focus on the particular task or procedure being trained. This preserves the task's context, which is desirable for several reasons (Tang, Owen, Biocca, & Mou, 2003). First, maintaining context helps the trainee synthesize supporting information and make decisions within a constant, spatially accurate mental model. This reduces cognitive load and facilitates training transfer. Second, overlaying virtual versions of content normally presented in separate documentation can greatly reduce head and eye movement. This decreases workload, potentially improving health, efficiency, and endurance. Third, because the trainee operates in the task's natural context, AR can provide helpful cuing information, such as labels, arrows, and other artifacts that aid in physically negotiating a training scenario. This can reduce the transition time required to move among spatially extended subtasks, potentially shortening overall training time. Finally, feedback is provided in real time and within the natural view of the particular task; according to Vreuls and Obermayer (1985), this is a desirable characteristic for training feedback and performance diagnosis.

In addition to preserving context, AR also preserves the physical environment of a training scenario. This might involve using the complete native environment (for example, using AR at the actual job site) or a subset of the environment (for example, using AR to train aircraft engine repair in a classroom). Preserving the natural environment provides important internal and external stimuli that might otherwise be difficult or expensive to replicate. This promotes realism, which can increase trainees' confidence and knowledge transfer.

Dual Use Systems for Task Training and Execution

AR can operate across a wide continuum of virtual to real content while preserving a common software and hardware architecture. This makes possible "dual-use" systems that support both training and execution of a task, depending on when, where, and what virtual content is displayed, allowing a single AR system to support training and serve as a job aid for task execution. For
example, a system might initially present training material using mostly virtual content. This could be advantageous at early stages of learning, when it is important not to overwhelm a student and impractical to expose him or her to the complete physical task environment. As the student gains knowledge and experience, more of the physical world might be phased into the training. The system could ultimately transition to become a full-time job aid, while still supporting continuation or remedial training in the future. Systems of this type have been shown to significantly increase competency in certain kinds of tasks (Boud, Haniff, Baber, & Steiner, 1999). Moreover, such systems can naturally promote user acceptance: if users are trained with an AR system, they will come to view the system as a supplement to the particular task or procedure (Young, Stedman, & Cook, 1999).

Hidden Objects and Information

AR can display objects that are normally hidden from a user's view. These include objects hidden by design (for example, internal structures and mechanical subcomponents), objects hidden by visual occlusion (for example, landmarks obstructed by other buildings in a dense city), and objects hidden by the limits of human perception (for example, molecular structures). Since hidden objects might be vital, if not central, to a trainee's understanding of a particular task or procedure, AR can employ virtual objects to display or otherwise accentuate hidden real ones. This can involve using cutaway views (Figure 6.3) that fully replace portions of the real world with three-dimensional (3-D) models, semitransparent overlays, pointers, or other techniques.
Figure 6.3. Example Cutaway View Using Augmented Reality (Courtesy of Andrei State, University of North Carolina)
In addition to displaying hidden physical objects, AR can also display otherwise invisible information, such as parts labels, superimposed instructions, visual cues, and training feedback.

Mobility

Because AR training systems augment a user's natural view of the environment, the majority of these systems are potentially mobile. This can increase application portability and accessibility and reduce costs by minimizing the need for specialized or fixed facilities. However, mobility brings a unique set of challenges, such as wide-area tracking, wearability, and power constraints.

AR RESEARCH AND APPLICATIONS FOR TRAINING

Foundations

The origins of AR date back to the early days of computer graphics research. From the late 1960s to the early 1970s, Ivan Sutherland and his students, working at Harvard University and the University of Utah, developed the first position- and orientation-tracked see-through head-worn display (HWD) for viewing computer-generated objects (Sutherland, 1968), along with the first AR interaction techniques (Vickers, 1972). In the 1970s and 1980s, a small number of researchers studied AR at such institutions as the United States Air Force Armstrong Research Lab, the National Aeronautics and Space Administration Ames Research Center, and The University of North Carolina at Chapel Hill.

In the early 1990s, Caudell and Mizell (1992) coined the term "augmented reality," introducing the idea of using AR to replace the large boards, paper templates, and manual documentation used in constructing wire harnesses for aircraft. This work resulted in several fielded experimental systems (Mizell, 2001). Bajura, Fuchs, and Ohbuchi (1992) developed a system that allowed a user equipped with an HWD to view live ultrasound imagery directly on a patient. Feiner, MacIntyre, and Seligmann (1993) demonstrated how AR could be used to aid in servicing a laser printer. Their system, shown in Figure 6.4, interactively generated 3-D maintenance instructions using a rule based component that tracked the position and orientation of the user and selected objects in the environment; it featured a tracked, optical see-through HWD that overlaid the instructions on the user's natural view of the task. Over the past decade, researchers have explored the benefits of using AR for training in a number of application areas, which we review in the remainder of this section.

Industrial Training Applications

Augmented Reality for Development, Production, and Servicing (ARVIKA) was one of the largest AR research projects targeting the industrial domain. This collaborative effort, funded by the German Ministry of Education and Research
Figure 6.4. Servicing a Laser Printer Using Augmented Reality (Feiner et al., 1993)
from 1999 to 2003, developed AR applications in the automotive, aerospace, power processing, and machine tool production sectors (Friedrich, 2002). Advanced Augmented Reality Technologies for Industrial Service Applications (ARTESAS) was a descendant of ARVIKA, focusing on automotive and aerospace maintenance. ARTESAS produced several prototype applications (Figure 6.5), which are featured in a compelling video demonstration (ARTESAS, 2007).

The AMIRE (authoring mixed reality) project explored using an AR training system to acquaint workers with an oil refinery (Hartmann, Zauner, & Haller, 2004). A tablet personal computer (PC) presented repair instructions, navigation information, and component labels on video captured from the computer's camera. Tracking was provided by an optical tracking system that used markers to identify checkpoints throughout the refinery.

Boulanger (2004) demonstrated an AR application for training repair of a telecommunication switch, in which trainees are guided by a remote tutor. The tutor watches real time video of the trainee's field of view and guides the trainee using
Figure 6.5. Conducting Maintenance Using the ARTESAS Prototype (ARTESAS, 2007)
virtual arrows overlaid on an HWD and two-way voice communication. Optically tracked markers are associated with 3-D models of the telecommunication switch and its main subcomponents (Figure 6.6). This allows the trainee and the tutor to manipulate the models to query and demonstrate repair procedures.

Academic Education

Researchers have also demonstrated AR training systems for academic subjects. Shelton and Hedley (2002) designed a system to help teach undergraduates about Earth-Sun relationships. Students manipulate a 3-D model of the Earth with an optically tracked marker that allows them to change their viewing perspective and visualize learning objectives. Kaufmann and Schmalstieg (2002) addressed mathematics education in a system that allows students to collaboratively create and modify 3-D geometric models. A powerful layering concept gives control over each individual's view and enables the instructor to tailor the learning experiences and diagnose the progress of individual students.
Figure 6.6. Telecommunications Equipment Servicing Using Augmented Reality (Courtesy of Pierre Boulanger, The Department of Computing Science at the University of Alberta, The School of Information Technology and Engineering, University of Ottawa, Ontario, Canada)
Medicine

Because of the dangers involved in some medical procedures, training is often performed with simulators. For example, Sielhorst, Obst, Burgkart, Riener, and Navab (2004) describe an obstetric training system that overlays graphics on anatomically correct mannequins. AR can also assist patients. For example, Luo, Kenyon, Kline, Waldinger, and Kamper (2005) demonstrated a system that teaches stroke victims to perform grasp-and-release exercises as part of finger extension rehabilitation. An HWD presents the patient with 3-D virtual objects to grasp, and an assistive orthosis, controlled by a therapist, provides dynamic assistance and tangible feedback as the patient grasps the target object.
Military and Aerospace

The military and aerospace domains are especially amenable to AR training applications. Livingston et al. (2002) introduced the Battlefield Augmented Reality System (BARS), a wearable situational awareness aid for warfighters that also doubled as a training aid. Brown, Stripling, and Coyne (2006) extended this work, conducting a user study that explored the use of BARS for training tactics. FlatWorld (Pair, Neumann, Piepol, & Swartout, 2003) used AR to augment Hollywood-inspired modular "flats" (movie set props) to create compelling environments for military training (Towles, Johnson, and Fuchs, Volume 2, Section 1, Chapter 3). Macchiarella and Vincenzi (2004) explored how AR can be used to
teach aircraft design principles to mechanics and pilots. Further examples are described by Regenbrecht, Baratoff, and Wilke (2005).

Teleoperation

Milgram, Zhai, Drascic, and Grodski (1993) investigated AR for teleoperation. Their Augmented Reality through Graphic Overlays on Stereovideo project helps human operators visualize and control a remote robot's view of the environment. The same AR cues and information used in visualization are applied to communicate spatially accurate commands to the robot. Lawson, Pretlove, Wheeler, and Parker (2002) created a telerobotic AR system to aid in surveying and measuring remote environments, such as sewer pipes.

GENERAL STRUCTURE OF AR TRAINING SYSTEMS

Like VR in general, AR uses a wide range of different technologies, although with certain key differences. To help characterize the structure of AR training systems, we rely on a generalized design hierarchy, proposed by Bimber and Raskar (2005), shown in Figure 6.7. The base level of this model includes hardware and software for tracking real world objects, displaying information to the user, and rendering computer-generated content. Most AR research efforts to date address this level. The intermediate level, implemented mostly in software, interacts with the user, presents and arranges content, and provides authoring tools. This level has not received enough emphasis in the past and contains many open research issues (Rekimoto, 1995). The application level, implemented entirely in software, consists of the overarching AR application and serves as the primary interface to the user. The user level represents the end user and is included to emphasize the human role.
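For illustration, the hierarchy might be mapped onto software roughly as follows; the class names and interfaces here are invented for the sketch, not taken from Bimber and Raskar (2005) or any cited system:

```python
from dataclasses import dataclass

@dataclass
class Pose:
    position: tuple      # (x, y, z) in meters
    orientation: tuple   # quaternion (w, x, y, z)

class TrackerBase:
    """Base level: estimates the pose of the user or tracked objects."""
    def read_pose(self) -> Pose:
        return Pose((0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0))

class RendererBase:
    """Base level: draws registered virtual content for the display."""
    def draw(self, pose: Pose, content: list) -> None:
        print(f"rendering {len(content)} augmentation(s) at {pose.position}")

class AuthoringLayer:
    """Intermediate level: arranges content and handles interaction."""
    def content_for(self, step: int) -> list:
        return [f"instruction overlay for step {step}"]

class TrainingApplication:
    """Application level: the overarching training scenario logic."""
    def __init__(self):
        self.tracker, self.renderer = TrackerBase(), RendererBase()
        self.authoring = AuthoringLayer()

    def update(self, step: int) -> None:
        pose = self.tracker.read_pose()
        self.renderer.draw(pose, self.authoring.content_for(step))

TrainingApplication().update(step=1)   # one frame of the user-facing loop
```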
Figure 6.7. Generalized AR Design Hierarchy; Adapted from Bimber & Raskar (2005)
Display

AR display technologies perform the task of merging the virtual and real world environments. They fall into four general categories: head worn, handheld, stationary, and projective. Other chapters provide more general coverage of head-worn displays (Bolas and McDowall, Volume 2, Section 1, Chapter 2) and projective displays (Towles, Johnson, and Fuchs, Volume 2, Section 1, Chapter 3); in this section, we briefly examine displays in the context of AR.

Head-worn displays are worn on the user's head and present imagery to one eye (monocular) or both eyes (biocular if the images seen by both eyes are the same, binocular if the images form a stereo pair). HWDs (and other AR display technologies) are further categorized according to how they combine views of the real and virtual worlds. Optical see-through displays provide a direct view of the real world (mediated only by optical elements) and overlay virtual content on top of this view; the real and virtual worlds are merged using optical combiners, such as half-silvered mirrors or prisms. Video see-through displays use cameras to capture real world imagery, combine the real and virtual content digitally, and present both on the same displays.

Optical see-through displays have the advantage of presenting the real world at its full spatial resolution, with no temporal lag, full stereoscopy, and no mismatch between vergence (the angle between the lines of sight from each eye to a given real world object) and accommodation (the distance at which the eyes must focus to perceive that object). However, luminance is lost because of the reflectivity of the combiner, and many designs include filtration to avoid overwhelming the relatively dim displays used for the virtual world. The lag-free view of the real world also emphasizes the lag that occurs in presenting the virtual world. Commercially available optical see-through displays cannot selectively suppress the view of any part of the real world, so bright real world objects can be seen through virtual objects in front of them, even when the virtual objects are supposed to be opaque. One experimental system has overcome this obstacle by introducing a liquid-crystal array and additional optics in the optical path to the real world, allowing selectable areas of the real world to be blocked (Kiyokawa, Billinghurst, Campbell, & Woods, 2003).

In contrast, video see-through displays have the advantage of allowing essentially arbitrary processing of both the real and virtual worlds, making it possible to render virtual objects that fully obscure real ones. However, the real world is rendered at the resolution of the camera and display. Furthermore, because all imagery is typically presented on one display at a fixed perceived distance, both the real and virtual imagery typically suffer from vergence-accommodation mismatch, although this can be minimized if the system is used for content viewed at a preselected distance. When the camera is not effectively coincident with the user's eye, parallax error results, causing the image of the real world to differ geometrically from what the user would see directly. This can be addressed by careful optical design, for example, using mirrors to fold the optical path to the camera (State, Keller, & Fuchs, 2005). Finally, limited field of view and
resolution in video see-through systems can make navigating large environments difficult.

Two sets of factors dominate the selection of HWDs. The first is a function of electronic and optical properties, including display resolution, color capability, field of view, transmissivity, and stereoscopy. The second is a function of the size, weight, and appearance of these devices. Unfortunately, optimizing one set of factors typically comes at the expense of the other. Current commercial stereoscopic HWDs are significantly larger and heavier than a pair of standard eyeglasses, the proverbial gold standard for mainstream acceptance. However, the potential market created by consumer handheld entertainment devices is making lightweight HWDs with resolution comparable to desktop displays commercially feasible.

Handheld video see-through displays (Rekimoto, 1997) couple a display screen with an integrated camera; examples include mobile phones, media players, portable game machines, tablet PCs, and Ultra-Mobile PCs. While the small physical field of view of many of these devices (often mismatched with a wider camera field of view) and the need for handheld operation make them poorly suited for many AR training applications, they can play auxiliary roles as input devices or special viewing devices (Goose, Güven, Zhang, Sudarsky, & Navab, 2004).

Stationary displays mounted in the user's environment can be larger and heavier than head-worn or handheld displays, making them well suited for through-the-window applications in vehicles or other situations in which the display can be placed between the user and the augmented environment. For example, Olwal, Lindfors, Gustafsson, Kjellberg, and Mattsson (2005) describe a system that overlays operational data on the user's view of a milling machine.

Projective displays project virtual content directly onto the real world (Bimber & Raskar, 2005). The advantages of this approach include the ability to view an augmented environment without wearing a display or computer. Bright projectors combined with relatively reflective task surfaces can make this a good approach for some indoor domains, especially when multiple users need to experience the same augmented environment. However, many of these systems assume that all virtual material is intended to lie on the projected surface, limiting the kind of geometry that can be presented. Stereo projection is possible in conjunction with special eyewear or with optical combiners in the environment, often combined with head tracking. While many projective systems use stationary projectors, head-worn projective displays (Hua, Gao, Brown, Biocca, & Rolland, 2002) use lightweight head-worn projectors whose stereo imagery is reflected back toward the user from specially treated retroreflective surfaces in the environment. This allows multiple users to view individually tracked imagery on the same surfaces.
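The digital merge performed by the video see-through displays described above reduces, per pixel, to compositing the rendered virtual layer over the camera frame. A minimal sketch, using NumPy arrays as stand-ins for the camera image and the renderer's color and coverage output (all names are illustrative):

```python
import numpy as np

def composite(camera_frame, virtual_rgb, virtual_alpha):
    """Alpha-blend the rendered virtual layer over the camera image.
    camera_frame:  H x W x 3 float array in [0, 1] from the video camera.
    virtual_rgb:   H x W x 3 float array rendered by the graphics engine.
    virtual_alpha: H x W x 1 coverage mask (1 = fully opaque virtual pixel).
    Because the real world arrives as pixels, opaque virtual objects can
    fully obscure real ones -- unlike commodity optical see-through optics."""
    return virtual_alpha * virtual_rgb + (1.0 - virtual_alpha) * camera_frame

# Toy 2 x 2 frame: the virtual layer covers only the top-left pixel.
cam = np.full((2, 2, 3), 0.6)
virt = np.zeros((2, 2, 3)); virt[0, 0] = [1.0, 0.0, 0.0]
alpha = np.zeros((2, 2, 1)); alpha[0, 0] = 1.0
print(composite(cam, virt, alpha)[0, 0])   # [1. 0. 0.] -- virtual pixel wins
```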
Tracking and Registration

Three considerations dominate the selection and integration of AR tracking technologies: registration, mobility, and frame of reference. Registration refers to the need to properly locate and align virtual objects with their real world
counterparts and is one of the most important concerns in creating effective AR systems (Bimber & Raskar, 2005). Registration errors typically result from five sources: distortion in the display, imperfections in the virtual 3-D model, mechanical misalignments, incorrect display settings, and tracking errors. Because AR juxtaposes real and virtual material, inaccurate tracking is much easier to detect than in VR.

The second tracking consideration is mobility. Training often involves large operating areas with varied lighting, magnetic, and structural conditions. This can rule out electromagnetic systems and approaches that tether users to their surroundings. An application's frame of reference is the third important AR tracking consideration: some applications require tracking the user relative to the earth or some other large fixed coordinate system, while others entail tracking the user relative to specific, and often movable, objects.

Based on these considerations, AR applications often use a subset of the tracking technologies covered in Welch and Davis, Volume 2, Section 1, Chapter 1, which we briefly review here with an emphasis on AR: optical, inertial, global navigation satellite systems, and hybrid.

Optical tracking systems detect light emitted directly from light-emitting diodes or other sources, or reflected from passive targets. They include marker based tracking systems, such as ARToolKit (Kato & Billinghurst, 1999) and ARTag (Fiala, 2005), which use video cameras to detect fiducial markers (predetermined black and white or colored patterns, as depicted in Figure 6.8) positioned at known locations in the environment.
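Once a fiducial tracker has estimated a marker's pose relative to the camera, registering virtual content reduces to transforming its points into the camera frame and projecting them through the camera intrinsics. The sketch below shows only that projection step, with hypothetical intrinsics; toolkits such as ARToolKit additionally perform the pose estimation itself and correct for lens distortion:

```python
import numpy as np

# Hypothetical camera intrinsics (focal lengths and principal point, pixels).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(point_marker, R, t):
    """Register a 3-D point defined in marker coordinates onto the image.
    R (3x3) and t (3,) give the marker pose in the camera frame, as
    estimated by a fiducial tracker; returns pixel coordinates (u, v)."""
    p_cam = R @ point_marker + t           # marker frame -> camera frame
    u, v, w = K @ p_cam                    # pinhole projection
    return u / w, v / w

# Marker 0.5 m straight ahead of the camera, axes aligned (illustrative).
R = np.eye(3)
t = np.array([0.0, 0.0, 0.5])
print(project(np.array([0.02, 0.0, 0.0]), R, t))   # ~ (352.0, 240.0)
```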
Figure 6.8. Example Optical Tracking Marker (Fiala, 2005)

Because cameras, hardware, and
software needed to process digital video streams have become commodity items, these systems are inexpensive, provide a high level of accuracy, and work well in mobile environments. When the user views the real world through the same cameras used for tracking, this can also lessen the effects of lens distortion on registration. Since marker based tracking requires that markers be placed in the environment in advance and remain sufficiently visible, researchers have been trying, with encouraging results, to replace markers with natural environmental features, making possible markerless tracking (Bleser, Pastarmov, & Stricker, 2005; Comport, Marchand, Pressigout, & Chaumette, 2006).

While many tracking technologies used for both VR and AR establish a global coordinate system within which the user's head is tracked, optical marker tracking is often used to track a small number of specific targets relative to the head or display. This is rarely satisfactory for HWD based VR, in which the position and orientation of the head must always be known for the virtual world to look right; in AR, however, the surrounding real world always appears correct without tracking, and, in many applications, virtual material may need to be registered only with specific tracked objects. Consequently, AR applications of this sort can be implemented with optical marker tracking alone, using the same cameras with which the user views the world in a video see-through display.

Inertial tracking systems use accelerometers or gyroscopes, are compact and relatively inexpensive, and are not susceptible to variations in lighting. These characteristics support mobile orientation tracking configurations that work equally well indoors and outdoors. However, since these systems drift significantly over relatively short periods of time, they are typically configured as part of hybrid systems that obtain ground truth through some other technology. For example, the earth's magnetic and gravitational fields may be used for orientation (noting inaccuracies caused by magnetically reactive environments), and global navigation satellite systems may be used for position.

Global navigation satellite system (GNSS) receivers determine their positions by computing their distances to satellites at known locations, based on signals that the satellites broadcast. The best known GNSS is the U.S. Global Positioning System. While the accuracy of regular GNSS receivers is measured in meters, some rely on additional differential error correction signals, broadcast from local base stations or satellites, which can provide as good as centimeter-level accuracy in the case of real time kinematic systems. Feiner, MacIntyre, Höllerer, and Webster (1997) used GNSS in their Touring Machine, the first outdoor mobile AR system, which displayed information about buildings on their campus. Other techniques for estimating position include triangulation based on signal strength from known short-range terrestrial sources, such as mobile phone towers or wireless network access points.
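The receiver's core computation can be illustrated with an idealized toy problem: given measured distances to satellites at known positions, find the point most consistent with them. The sketch below solves this with a Gauss-Newton iteration; a real receiver must additionally estimate its own clock bias and cope with noisy pseudoranges:

```python
import numpy as np

def trilaterate(sat_positions, ranges, x0=None, iters=10):
    """Least-squares position fix from distances to known satellite
    positions (N x 3) -- an idealized GNSS solution ignoring clock bias."""
    x = np.zeros(3) if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(iters):
        diffs = x - sat_positions              # N x 3
        dists = np.linalg.norm(diffs, axis=1)  # predicted ranges
        residuals = dists - ranges
        J = diffs / dists[:, None]             # Jacobian of range wrt x
        # Gauss-Newton update: solve J dx = -residuals in least squares.
        dx, *_ = np.linalg.lstsq(J, -residuals, rcond=None)
        x += dx
    return x

# Four hypothetical satellites and the exact ranges to a known point.
sats = np.array([[20e6, 0, 0], [0, 20e6, 0], [0, 0, 20e6], [12e6, 12e6, 12e6]])
truth = np.array([1000.0, 2000.0, 3000.0])
ranges = np.linalg.norm(sats - truth, axis=1)
print(np.round(trilaterate(sats, ranges)))     # ~ [1000. 2000. 3000.]
```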
Graphics Rendering

Graphics rendering refers to the low level computation required to generate virtual content (McDowell, Guerrero, McCue, and Hollister, Volume 2, Section 1, Chapter 8). This process is typically handled by commodity computer graphics cards and is normally transparent to a particular application. Since the real world is either not rendered (for optical see-through displays) or is captured by cameras (for video see-through displays), rendering requirements can be significantly lower for AR than for VR. However, rendering requirements can still present significant challenges for applications with demanding virtual content, especially for wearable systems.

Interaction Devices and Techniques

While many of the interaction devices and techniques used for VR are also used for AR, we focus here on those designed specifically to address the coexistence of real and virtual objects. As in VR and 3-D user interfaces (UIs) in general, a dominant set has yet to emerge, as it has for 2-D UIs. However, some interesting approaches include the following:

Wand based selection. Vickers (1972) introduced the use of a handheld wand whose position-tracked tip was used in conjunction with a head-tracked HWD to define a vector from the user's eye into the environment to select real or virtual objects.
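A minimal sketch of this kind of eye-through-wand-tip ray selection follows. The object representation (a center plus bounding-sphere radius) and the tracker-supplied positions are illustrative assumptions, not details of Vickers's system.

```python
import numpy as np

# Sketch of wand-based selection: cast a ray from the tracked eye point
# through the wand tip and return the nearest object whose bounding
# sphere the ray intersects.

def select_object(eye_pos, wand_tip, objects):
    """eye_pos, wand_tip: 3-vectors; objects have .center (vec3) and .radius."""
    direction = wand_tip - eye_pos
    direction = direction / np.linalg.norm(direction)
    best, best_t = None, float("inf")
    for obj in objects:
        to_center = obj.center - eye_pos
        t = np.dot(to_center, direction)          # closest approach along the ray
        if t < 0:
            continue                              # object is behind the user
        miss_sq = np.dot(to_center, to_center) - t * t
        if miss_sq <= obj.radius ** 2 and t < best_t:
            best, best_t = obj, t                 # hit, and nearer than any so far
    return best
```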
Magic lenses. Bier, Stone, Pier, Buxton, and DeRose (1993) developed "magic lens" filters that perform operations, such as magnification, selection, and information filtering, on screen regions in 2-D UIs. Looser, Billinghurst, and Cockburn (2004) adapted this metaphor, using an optically tracked paddle to display and control a 3-D magic lens through which objects can be viewed in AR (Figure 6.9).

Figure 6.9. Magic Lens Interaction Technique (Looser et al., 2004)

Tangible AR. Building on the notion of tangible UIs (Ishii & Ullmer, 1997) that use everyday objects, tracked objects (for example, markers) can be associated with augmented content. For example, tracked tiles (Poupyrev et al., 2002) can manipulate linked content, such as video clips. The tiles take on one of two roles: operation tiles trigger actions based on proximity to other tiles, and data tiles are used as data containers. Slay, Thomas, and Vernik (2002) explored having a user hold multiple markers in his or her hand and use them as switches to "toggle" virtual content. The markers could also be used together to form basic gestures for selecting and tracking virtual objects.

Visible interaction volumes. Visualizing the volumes within which users can interact may help them better accomplish their tasks. For example, Olwal, Benko, and Feiner (2002) developed statistical geometric tools for identifying objects for multimodal selection by speech and gesture (Figure 6.10). Statistics, such as distance to center and dwell time, are sampled relative to virtual volumes attached to the user's body (for example, a cone that follows a user's tracked hand when pointing) and combined with speech commands to determine the user's intentions.

Figure 6.10. Using Visible Interaction Volumes for Augmented Reality User Interaction (Olwal et al., 2002)
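The sketch below conveys the general flavor of such statistical selection: each frame, record which objects fall inside a cone attached to the pointing hand, accumulate dwell time, and prefer the longest-dwelling object when a speech command arrives. This is a loose illustration of the idea, not the published SenseShapes implementation; all data structures and names are assumed.

```python
import math
from collections import defaultdict

dwell = defaultdict(float)   # seconds each object has spent inside the cone

def sample_cone(apex, axis, half_angle, objects, dt):
    """apex: 3-seq; axis: unit-length 3-seq; objects have .center and .name."""
    for obj in objects:
        to_obj = [c - a for c, a in zip(obj.center, apex)]
        norm = math.sqrt(sum(x * x for x in to_obj)) or 1e-9
        cos_angle = sum(x * y for x, y in zip(to_obj, axis)) / norm
        angle = math.acos(max(-1.0, min(1.0, cos_angle)))
        if angle <= half_angle:          # object lies inside the volume this frame
            dwell[obj.name] += dt

def best_candidate():
    # When a speech command such as "that one" arrives, favor the object
    # that has dwelt longest inside the pointing cone.
    return max(dwell, key=dwell.get, default=None)
```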
Bare hand tracking. Lee and Höllerer (2007) use markerless optical tracking of a user's hand to allow selection and manipulation of three-dimensional models with simple hand gestures. This supports tangible interaction without the need for specialized equipment or preparation of the tracking environment.

Presentation

The presentation layer uses primitives from the graphics rendering process to create, arrange, and animate the higher level objects that form virtual content. Scene graph application programming interfaces provide abstractions for creating, organizing, and moving virtual content and can serve as useful tools in the presentation layer. They are typically found in AR authoring and application development tools and game engines (discussed in the "Authoring" and "Application Design" sections of this chapter).

One of the continuing challenges facing AR applications is the integration of virtual labels (and other annotations) into a presentation scheme. This seemingly trivial task becomes extremely complicated given the large number of labels potentially required in some AR training applications. For each label, the application must dynamically determine label position, size, transparency, and priority vis-à-vis occlusions of other objects in the 3-D scene (Bell, Feiner, & Höllerer, 2001).
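As a rough illustration of the kind of computation involved, the sketch below places labels greedily in priority order, giving each label a list of candidate screen rectangles and skipping positions that overlap labels already placed. It is a simplified stand-in for view management in the spirit of Bell, Feiner, and Höllerer (2001); the data layout is assumed.

```python
# Greedy label layout sketch. Rectangles are (x, y, w, h) in screen pixels.
# Each label is a dict with "name", "priority", and "candidates" (an ordered
# list of rectangles, e.g., right/left/above/below its anchor point).

def overlaps(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def place_labels(labels):
    """Place high-priority labels first; lower-priority labels yield or drop out."""
    placed = []     # rectangles already claimed this frame
    layout = {}     # label name -> chosen rectangle
    for label in sorted(labels, key=lambda l: -l["priority"]):
        for rect in label["candidates"]:
            if not any(overlaps(rect, p) for p in placed):
                placed.append(rect)
                layout[label["name"]] = rect
                break                   # this label is placed; move on
    return layout                       # labels absent here are omitted this frame
```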
Authoring

Authoring refers to the need for robust, generalized software tools for designing AR applications. Examples include work by Grimm et al. (2002), Güven and Feiner (2003), and Zauner, Haller, Brandl, and Hartman (2003). Haringer and Regenbrecht (2002) and Knopfle, Weidenhausen, Chauvigne, and Stock (2005) have created tools applicable to designing training applications. These tools offer several advantages, the most compelling being their support for users to modify applications with minimal need for programmer assistance. This allows aftermarket modification of an application and is particularly useful in training domains where scenarios continually evolve.

Application Design

Application design concerns development of the overall software that integrates all major AR functions. Game engines (Hart, Wansbury, and Pike, Volume 3, Section 1, Chapter 14) offer an appealing foundation for the development of high quality, visually appealing AR applications by providing software libraries that support the functionality found in sophisticated computer games, including 3-D graphics and audio rendering, geometric modeling, physical simulation (including collision detection), and overlaid 2-D UIs. Game engines used in recent AR applications include Valve Source (2006) and Delta3D (Darken, McDowell, & Johnson, 2005). However, it is important to note that commercial game engines often have limited documentation and may make assumptions about the game genre (for example, that the camera is the player's eye in a "first-person shooter" and moves with the simulated physics of the player's head).
Goblin XNA (Oda, Lister, White, & Feiner, 2008) extends the XNA game development environment to address AR applications by adding support for live video, tracking (including the association of tracking markers with portions of the scene graph), and rendering of representations of real objects into the depth buffer to occlude virtual objects that pass behind them.

User Level Design Concepts

One set of considerations for the user level of the design hierarchy follows classic principles and theories from human-computer interaction research. These include guidelines for design (Nielsen & Molich, 1992; Norman, 1990), theories (Bødker, 1991; Card, Moran, & Newell, 1983; Rosson & Carroll, 2001; Shneiderman & Plaisant, 2005), and 3-D UI design principles, such as those framed by Bowman, Kruijff, LaViola, and Poupyrev (2005).

A second set of considerations focuses on wearability. Because many AR systems are mobile, their design must support operation in a variety of settings without encumbering the user. Designers must package supporting equipment, such as displays, trackers, computers, and other devices, in a way that does not endanger the user and is also easy to carry.

A third set of considerations deals with the acceptability of AR technologies to the user. In addition to the considerations mentioned above, acceptability is also driven by visual appeal and complex social factors, such as the tendency to resist change. One approach to achieving acceptance incorporates the end user into the early stages of the design process (Mackay, Ratzer, & Janecek, 2000).

CONCLUSIONS

Combining virtual content with the user's natural view of the physical environment supports the creation of effective training environments that promote natural context, realism, and multisensory interaction. Moreover, AR can make possible useful systems that, after supporting training, gradually transition into ubiquitous tools that aid in day-to-day task completion.

However, for AR training to become a commercial reality, several significant challenges must be addressed. The technologies that perform the core function of mixing real and virtual objects must become more powerful and less obtrusive. Large, heavy, high resolution displays and cameras must evolve into small, lightweight devices that are as ubiquitous and unassuming as eyeglasses. Computer systems must also shrink in size while increasing in power. Finally, tracking technologies must become more accurate and robust and less dependent on specialized preparation of the environment.

Several recent technological advances are converging to hasten the realization of AR's potential for training. The cameras built into portable devices are becoming capable of increasingly accurate tracking. New displays, particularly handheld and head-worn displays driven by the consumer entertainment industry, will make AR systems smaller, less intrusive, and more mobile. Powerful game engines and commodity graphics devices are allowing the creation and delivery of compelling virtual content for stationary and mobile users. These factors will continue to increase the capability and applicability of AR for training.
REFERENCES

ARTESAS. (2007). ARTESAS—Advanced augmented reality technologies for industrial service applications. Retrieved June 2007, from http://www.artesas.de

Azuma, R., Baillot, Y., Behringer, R., Feiner, S., Julier, S., & MacIntyre, B. (2001). Recent advances in augmented reality. IEEE Computer Graphics and Applications, 21(6), 34–47.

Bajura, M., Fuchs, H., & Ohbuchi, R. (1992). Merging virtual objects with the real world: Seeing ultrasound imagery within the patient. SIGGRAPH Computer Graphics, 26(2), 203–210.

Bell, B., Feiner, S., & Höllerer, T. (2001). View management for virtual and augmented reality. Proceedings of the ACM Symposium on User Interface Software and Technology (pp. 101–110). New York: ACM.

Bier, E., Stone, M., Pier, K., Buxton, W., & DeRose, T. (1993). Toolglass and magic lenses: The see-through interface. Proceedings of the International Conference on Computer Graphics and Interactive Techniques (pp. 73–80). New York: ACM.

Bimber, O., & Raskar, R. (2005). Spatial augmented reality: Merging real and virtual worlds. Wellesley, MA: A K Peters.

Bleser, G., Pastarmov, Y., & Stricker, D. (2005). Real-time 3D camera tracking for industrial augmented reality applications. Proceedings of the 13th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (pp. 47–54). Pilsen, Czech Republic: University of West Bohemia.

Bødker, S. (1991). Through the interface: A human activity approach to user interface design. Mahwah, NJ: Lawrence Erlbaum Associates.

Boud, A. C., Haniff, D. J., Baber, C., & Steiner, S. J. (1999). Virtual reality and augmented reality as a training tool for assembly tasks. Proceedings of the IEEE International Conference on Information Visualization (pp. 32–36). Los Alamitos, CA: IEEE Computer Society.

Boulanger, P. (2004). Application of augmented reality to industrial tele-training. Proceedings of the Canadian Conference on Computer and Robot Vision (pp. 320–328). Los Alamitos, CA: IEEE Computer Society.

Bowman, D., Kruijff, E., LaViola, J., & Poupyrev, I. (2005). 3D user interfaces: Theory and practice. Boston: Addison-Wesley.

Brown, D. G., Stripling, R., & Coyne, J. T. (2006). Augmented reality for urban skills training. Proceedings of IEEE Virtual Reality (pp. 249–252). Washington, DC: IEEE Computer Society.

Card, S., Moran, T., & Newell, A. (1983). The psychology of human-computer interaction. Hillsdale, NJ: Lawrence Erlbaum.

Caudell, T., & Mizell, D. (1992). Augmented reality: An application of heads-up display technology to manual manufacturing processes. Proceedings of the 25th Hawaii International Conference on System Sciences, 2, 659–669.

Comport, A., Marchand, E., Pressigout, M., & Chaumette, F. (2006). Real-time markerless tracking for augmented reality: The virtual visual servoing framework. IEEE Transactions on Visualization and Computer Graphics, 12(4), 615–628.

Darken, R., McDowell, P., & Johnson, E. (2005). The Delta3D open source game engine. IEEE Computer Graphics and Applications, 25(3), 10–12.

Feiner, S., MacIntyre, B., Höllerer, T., & Webster, T. (1997). A touring machine: Prototyping 3D mobile augmented reality systems for exploring the urban environment. Proceedings of the 1st International Symposium on Wearable Computers (pp. 74–81). Washington, DC: IEEE Computer Society.

Feiner, S., MacIntyre, B., & Seligmann, D. (1993). Knowledge-based augmented reality. Communications of the ACM, 36(7), 53–62.

Fiala, M. L. (2005). ARTag, a fiducial marker system using digital techniques. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2, 590–596.

Foxlin, E., & Naimark, L. (2003). VIS-Tracker: A wearable vision-inertial self-tracker. Proceedings of IEEE Virtual Reality (pp. 199–206). Los Alamitos, CA: IEEE Computer Society.

Friedrich, W. (2002). ARVIKA—Augmented reality for development, production and service. Proceedings of the 1st International Symposium on Mixed and Augmented Reality (pp. 3–4). Washington, DC: IEEE Computer Society.

Goose, S., Güven, S., Zhang, X., Sudarsky, S., & Navab, N. (2004). PARIS: Fusing vision-based location tracking with standards-based 3D visualization and speech interaction on a PDA. Proceedings of the 10th International Conference on Distributed Multimedia Systems (pp. 75–80). Skokie, IL: Knowledge Systems Institute.

Grimm, P., Haller, M., Paelke, V., Reinhold, S., Reimann, C., & Zauner, R. (2002). AMIRE—Authoring mixed reality. Proceedings of the 1st IEEE International Workshop on Augmented Reality Toolkit. Piscataway, NJ: IEEE Computer Society.

Güven, S., & Feiner, S. (2003). Authoring 3D hypermedia for wearable augmented and virtual reality. Proceedings of the IEEE International Symposium on Wearable Computers (pp. 118–126). Los Alamitos, CA: IEEE Computer Society.

Haringer, M., & Regenbrecht, H. T. (2002). A pragmatic approach to augmented reality authoring. Proceedings of the International Symposium on Mixed and Augmented Reality (pp. 237–245). Washington, DC: IEEE Computer Society.

Hartmann, W., Zauner, J., & Haller, M. (2004). A mixed reality based training application for an oil refinery. Proceedings of the 2nd International Conference on Pervasive Computing (pp. 324–327). New York: ACM.

Hua, H., Gao, C., Brown, L., Biocca, F., & Rolland, J. P. (2002). Design of an ultralight head-mounted projective display (HMPD) and its applications in augmented collaborative environments. Proceedings of SPIE, 4660, 492–497.

Ishii, H., & Ullmer, B. (1997). Tangible bits: Towards seamless interfaces between people, bits and atoms. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 234–241). New York: ACM.

Kato, H., & Billinghurst, M. (1999). Marker tracking and HMD calibration for a video-based augmented reality conferencing system. Proceedings of the IEEE and ACM International Workshop on Augmented Reality (pp. 85–94). Washington, DC: IEEE Computer Society.

Kaufmann, H., & Schmalstieg, D. (2002). Mathematics and geometry education with collaborative augmented reality. Computers & Graphics, 27(3), 339–345.

Kiyokawa, K., Billinghurst, M., Campbell, B., & Woods, E. (2003). An occlusion-capable optical see-through head mounted display for supporting co-located collaboration. Proceedings of the IEEE and ACM International Symposium on Mixed and Augmented Reality (pp. 133–141). Washington, DC: IEEE Computer Society.

Knopfle, C., Weidenhausen, J., Chauvigne, L., & Stock, I. (2005). Template based authoring for AR based service scenarios. Proceedings of IEEE Virtual Reality (pp. 249–252). Washington, DC: IEEE Computer Society.

Lawson, S. W., Pretlove, J. R. G., Wheeler, A. C., & Parker, G. A. (2002). Augmented reality as a tool to aid the telerobotic exploration and characterization of remote environments. Presence: Teleoperators and Virtual Environments, 11(4), 352–367.

Lee, T., & Höllerer, T. (2007). Handy AR: Markerless inspection of augmented reality objects using fingertip tracking. Proceedings of the IEEE International Symposium on Wearable Computers (pp. 1–8). Los Alamitos, CA: IEEE Computer Society.

Livingston, M. A., Rosenblum, L., Julier, S., Brown, D., Baillot, Y., Swan, J., Gabbard, J. L., & Hix, D. (2002). An augmented reality system for military operations in urban terrain. Proceedings of the Interservice/Industry Training, Simulation and Education Conference (pp. 868–875). Arlington, VA: National Training Systems Association.

Looser, J., Billinghurst, M., & Cockburn, A. (2004). Through the looking glass: The use of lenses as an interface tool for augmented reality interfaces. Proceedings of the International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia (pp. 204–211). New York: ACM.

Luo, X., Kenyon, R. V., Kline, T., Waldinger, H. C., & Kamper, D. G. (2005). An augmented reality training environment for post-stroke finger extension rehabilitation. Proceedings of the International Conference on Rehabilitation Robotics (pp. 329–332). Los Alamitos, CA: IEEE Computer Society.

Macchiarella, N. D., & Vincenzi, D. A. (2004). Augmented reality in a learning paradigm for flight aerospace maintenance training. Proceedings of the Digital Avionics Systems Conference, 1, 5.1.1–5.1.9.

Mackay, W., Ratzer, A., & Janecek, P. (2000). Video artifacts for design: Bridging the gap between abstraction and detail. Proceedings of the Conference on Designing Interactive Systems (pp. 72–82). New York: ACM.

Milgram, P., Takemura, H., Utsumi, A., & Kishino, F. (1994). Augmented reality: A class of displays on the reality-virtuality continuum. Proceedings of Telemanipulator and Telepresence Technologies (pp. 282–292). New York: ACM.

Milgram, P., Zhai, S., Drascic, D., & Grodski, J. (1993). Applications of augmented reality for human-robot communication. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 3, 1467–1472.

Mizell, D. (2001). Boeing's wire bundle assembly project. In W. Barfield & T. Caudell (Eds.), Fundamentals of wearable computers and augmented reality (pp. 447–467). Mahwah, NJ: Lawrence Erlbaum.

Nielsen, J., & Molich, R. (1992). Heuristic evaluation of user interfaces. Proceedings of the ACM Conference on Human Factors in Computing Systems (pp. 249–256). New York: ACM.

Norman, D. (1990). The design of everyday things. New York: Doubleday.

Oda, O., Lister, L., White, S., & Feiner, S. (2008). Developing an augmented reality racing game. Proceedings of the International Conference on Intelligent Technologies for Interactive Entertainment. Brussels: ICST.

Olwal, A., Benko, H., & Feiner, S. (2002). SenseShapes: Using statistical geometry for object selection in a multimodal augmented reality system. Proceedings of the International Symposium on Mixed and Augmented Reality (pp. 300–301). Washington, DC: IEEE Computer Society.

Olwal, A., Lindfors, C., Gustafsson, J., Kjellberg, T., & Mattsson, L. (2005). ASTOR: An autostereoscopic optical see-through augmented reality system. Proceedings of the IEEE and ACM International Symposium on Mixed and Augmented Reality (pp. 24–27). Washington, DC: IEEE Computer Society.

Pair, J., Neumann, U., Piepol, D., & Swartout, B. (2003). FlatWorld: Combining Hollywood set-design techniques with VR. IEEE Computer Graphics and Applications, 23(1), 12–15.

Poupyrev, I., Tan, D. S., Billinghurst, M., Kato, H., Regenbrecht, H., & Tetsutani, N. (2002). Developing a generic augmented-reality interface. Computer, 35(3), 44–50.

Regenbrecht, H., Baratoff, G., & Wilke, W. (2005). Augmented reality projects in the automotive and aerospace industries. IEEE Computer Graphics and Applications, 25(6), 48–56.

Rekimoto, J. (1997). NaviCam—A magnifying glass approach to augmented reality. Presence: Teleoperators and Virtual Environments, 6(4), 399–412.

Rosson, M., & Carroll, J. (2001). Usability engineering: Scenario-based development of human computer interaction. Redwood City, CA: Morgan Kaufmann Publishers.

Sheldon, B. E., & Hedley, N. R. (2002). Using augmented reality for teaching earth-sun relationships to undergraduate geography students. Proceedings of the IEEE International Workshop on Augmented Reality Toolkit. Piscataway, NJ: IEEE Computer Society.

Shneiderman, B., & Plaisant, C. (2005). Designing the user interface. Reading, MA: Addison-Wesley.

Sielhorst, T., Obst, T., Burgkart, R., Riener, R., & Navab, N. (2004). An augmented reality delivery simulator for medical training. International Workshop on Augmented Environments for Medical Imaging—MICCAI Satellite Workshop. Available from http://ami2004.loria.fr/

Slay, H., Thomas, B., & Vernik, R. (2002). Tangible user interaction using augmented reality. Proceedings of the Australasian Conference on User Interfaces (pp. 13–20). Los Alamitos, CA: IEEE Computer Society.

State, A., Hirota, G., Chen, D. T., Garrett, W. F., & Livingston, M. A. (1996). Superior augmented reality registration by integrating landmark tracking and magnetic tracking. Proceedings of the Annual Conference on Computer Graphics and Interactive Techniques (pp. 429–438). New York: ACM.

State, A., Keller, K., & Fuchs, H. (2005). Simulation-based design and rapid prototyping of a parallax-free, orthoscopic video see-through head-mounted display. Proceedings of the 4th International Symposium on Mixed and Augmented Reality (pp. 28–31). Los Alamitos, CA: IEEE Computer Society.

Sutherland, I. E. (1968). A head-mounted three dimensional display. Proceedings of the AFIPS Fall Joint Computer Conference, 33, 757–764. Washington, DC: Thompson Books.

Tang, A., Owen, C., Biocca, F., & Mou, W. (2003). Comparative effectiveness of augmented reality in object assembly. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 73–80). New York: ACM.

Valve Source. (2006). Valve source engine software development kit. Retrieved July 2006, from http://developer.valvesoftware.com

Vickers, D. L. (1972). Sorcerer's apprentice: Head-mounted display and wand. Unpublished doctoral dissertation, University of Utah, Salt Lake City.

Vreuls, D., & Obermayer, R. W. (1985). Human-system performance measurement in training simulators. Human Factors, 27(3), 241–250.

XNA. (2008). Retrieved February 2008, from http://www.xna.com/

You, S., Neumann, U., & Azuma, R. (1999). Hybrid inertial and vision tracking for augmented reality registration. Proceedings of IEEE Virtual Reality (pp. 260–267). Washington, DC: IEEE Computer Society.

Young, A. L., Stedman, A. W., & Cook, C. A. (1999). The potential of augmented reality technology for training support systems. Proceedings of the International Conference on Human Interfaces in Control Rooms, Cockpits and Command Centres (pp. 242–246). London: IEEE Computer Society.

Zauner, J., Haller, M., Brandl, A., & Hartman, W. (2003). Authoring of a mixed reality assembly instructor for hierarchical structures. Proceedings of the IEEE and ACM International Symposium on Mixed and Augmented Reality (pp. 237–246). Washington, DC: IEEE Computer Society.
Part II: Topics for Component Integration
Chapter 7
DESIGNING USER INTERFACES FOR TRAINING DISMOUNTED INFANTRY

James Templeman, Linda Sibert, Robert Page, and Patricia Denbrook

It is challenging to create a three-dimensional (3-D) user interface for a simulation system to train dismounted infantry tactics, techniques, and procedures for close quarters battle (CQB). CQB is a complex skill that fully involves a person in the 3-D environment. Trainees must be able to move and act in a coordinated manner in order to practice the skills they will use in the field. Vehicle simulators need to provide only out-the-window viewing and a physical mock-up of the vehicle's actual controls (for example, steering wheel and gearshift). For a dismounted infantry simulator, however, the trainee's body is the "vehicle" operating directly in the virtual world. Developing such a complex user interface requires a detailed understanding of the task, knowledge of input and output device characteristics, and a design strategy that takes into account the fundamentals of human perception and action.

A user interface is the medium through which communication between the user and the computer takes place (Bowman, Kruijff, LaViola, & Poupyrev, 2005). It provides the user's experience with the simulation. Hinckley, Jacob, and Ware (2004) conclude that interaction design must consider both the motor control (input) and feedback (output) and how they interact with one another as an integrated whole. They cite Gibson's (1986) ecological approach to human perception, which says that the organism, the environment, and the tasks the organism performs are inseparable and should not be studied in isolation. Gibson coined the term active perception, in which perception and action are tightly linked.

This fundamental coupling of perception and action in a motor/sensory feedback loop is what makes the design of a user interface for training dismounted infantry so hard. The design space is extremely large because of the vast number of possible interactions between different motor control techniques and sensory feedback methods. A 3-D user interface for training dismounted infantry should allow the trainee to execute tasks similar to how they are performed in the real world, with close to the same feedback. Having the 3-D virtual world change naturally in response to the trainee's actions gives the trainee the impression of dealing directly with the virtual world.
It provides the correct motor/sensory feedback loop that not only enables skill development, but also makes natural decision making possible. According to Hutchins (1996), human cognition is always affected by the complex world in which it is situated. Realistic interaction provides the cues needed to elicit and train tactical decision making.

This chapter presents a list of the properties of natural human action for creating 3-D user interfaces for training. The list can also serve as criteria for usability analyses and more formal evaluations. The chapter also discusses how we applied the properties in the design of two user interfaces developed under the Virtual Technologies and Environments program, funded by the Office of Naval Research, and with support from the Naval Research Laboratory.

BACKGROUND

Close Quarters Battle

The U.S. Marine Corps manual Military Operations on Urban Terrain (Marine Corps Institute, 1997) describes the duties of a four- to six-person search party conducting building clearing operations. The team is split into a search team and a cover team: the search team methodically clears each room, hallway, and stairway in the building while the cover team maintains local security. The team operations diagrammed in the manual are highly formalized and require skilled, coordinated movements executed in a complex environment in which threats can come from any direction (the 360°/180° battle space: 360° around, 90° above, and 90° below). The goal is to successfully engage all enemy threats, while avoiding injury and death.

Each team member moves in concert with other team members, walking smoothly to create a stable shooting platform while searching for target indicators that must be engaged the moment they are encountered. As team members move, they primarily keep their rifles pointed just below their line of sight so that when a target appears, they can immediately snap up their rifles to target the threat. By maintaining the alignment of the head, eyes, and rifle, they essentially move the entire upper body as a single rigid unit (similar to how a gun turret moves on a tank).

Essential Elements of Close Quarters Battle

The actions of CQB can be summarized as the coordination of looking, moving, and shooting. Looking to direct the person's view and weapons handling must be seamlessly integrated with the movement required to uncover and respond to threats. Kelly McCann (personal communication, 2001), a former U.S. Marine and subject matter expert in CQB, outlined the basic requirements: (a) It is vital to coordinate looking with moving. Since it can be deadly to neglect the corner of a room, riflemen should never move faster than their abilities to incrementally cover danger areas. (b) It is important to coordinate shooting (moving from a ready into an aim stance, aiming, and pulling the trigger) with looking and moving. Team members must be ready to shoot at all times, but must never shoot faster than their abilities to hit the target.
(c) Team members must coordinate their movements. Team members must pace themselves to move as a unit as they clear the building and provide cover for one another. Team members orient themselves to cover all sectors of responsibility, with some providing vertical and rear security.

DESIGN STRATEGIES

A user interface is composed of interaction techniques, which link what the user does (input) to what is displayed (output/feedback). As such, interaction techniques provide the motor/sensory feedback loop necessary for active perception in the 3-D simulation. Realistic simulation systems, such as those used to train CQB, have user interfaces that differ from other 3-D user interfaces in that the user relates to the virtual world through the user's avatar, the representation of the user's body that moves and interacts in the virtual world. Therefore, an additional aspect of the input mapping must be taken into account. Input design must consider not only the physical actions that comprise the interaction techniques, but also how those actions translate into the behavior of the user's avatar (the effect of those actions). In the case of a system to train CQB, the physical actions are what the user does to look, move, and shoot, and the effect is how the avatar reflects those actions.

Feedback (output) is the second half of the interaction technique equation. Feedback stimulates the senses and includes visual, auditory, and haptic displays. All channels are important, but visual feedback is dominant in most simulation systems, including those for dismounted infantry training, and will be the focus here.

Avatars

The user's avatar is driven by the user's actions in real time. Avatars differ in how much of the avatar's body is articulated and in the level of control the user has over the avatar's behavior. Unlike avatars in many first-person shooter games that use canned animation sequences to portray the user's actions (for example, walking and running), our avatars are highly articulated (including head, torso, and legs), and the user drives the motion of the avatar directly. For example, the user's avatar turns to look by the same amount and in the same direction as the user's head turns. Avatars can be viewed in first or third person. In first person, the user "sees" out of the avatar's eyes, and the avatar is fully drawn so users can see their arms, legs, and feet. In third person, the user views the full avatar from a distance. In our system, the user interacts with the virtual world in first person; third person is used only for demonstration purposes.
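In code, this kind of direct drive reduces to copying tracked poses onto the corresponding avatar joints each frame. The fragment below is a minimal sketch with hypothetical tracker and avatar APIs; it is not the interface of any particular system described in this chapter.

```python
# Sketch of a directly driven avatar head: the avatar's head joint simply
# takes on the tracked orientation (and, when available, position) of the
# user's head each frame. "head_tracker" and "avatar" are assumed objects.

def update_avatar_head(avatar, head_tracker):
    pose = head_tracker.read()                    # ideally a 6 DOF pose
    avatar.head.orientation = pose.orientation    # same amount, same direction
    if pose.position is not None:                 # a 3 DOF tracker loses translation,
        avatar.head.position = pose.position      # and with it motion parallax
```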
Input: Properties of Natural Action

Designing interaction techniques to support dismounted infantry training is challenging. To guide the development process, we studied how experts perform real world tasks and developed a framework that lists the salient properties of natural physical action (Templeman & Sibert, 2006).
In the real world, people perform actions by moving their limb segments to accomplish tasks. The set of actions that can be achieved is the result of constraints inherent in the structure of the human body that result from "[the] bony arrangement, net muscle activity, segmental organization of the body, scale or size, [and] motor integration (such as the need to provide postural support)" (Enoka, 2002, p. xix). Actions can be simple (for example, relaxing the grip on an object) or complex, involving several body segments to produce a coordinated effect (for example, running to strike a tennis ball). Likewise, users perform interaction techniques to accomplish tasks in the virtual environment. The properties of natural action guided our selection of physical actions for the interaction techniques that affect the behavior of the user's avatar. The list of properties applies to realistic 3-D user interfaces as well as to more abstract ones; the difference is in the degree of similarity to real world actions. The properties of natural action, each with an example of where it applies in the real world, follow:

Body segments: For walking or running, the body leans forward and the legs move to propel the body over the ground.

Effort: For a given gait of locomotion, effort increases with speed due to "the increases in heart rate, ventilation rate, and the rate of oxygen consumption" (Enoka, 2002, p. 192).

Coordination: People have the ability to coordinate several actions at once. A tennis player can run up to and strike a ball in one unified motion; the effect can be achieved only by running and swinging the racket in concert. There are limits to coordination insofar as performing one action constrains other actions. Some limitations derive from the body's physical construction; it is difficult for a person to look backward while walking straight ahead. The most obvious form of constraint is that actions requiring the dedicated use of a body segment cannot be performed at the same time as another action involving that same segment. In other instances, the way an action is performed may be altered to afford concurrent operation. For example, two hands are used to hold a rifle; turning a doorknob requires a free hand and so disrupts the way the rifle can be held.

Degrees of freedom and range of motion: The head can be turned and tilted forward, back, and side to side over a limited range by exerting the muscles of the neck. To rotate the head farther, the body must turn, either by twisting the spine or by rotating the pelvis while twisting the legs. Turning even farther requires turning the entire body, easily accomplished by stepping in place.

Rate and accuracy: It takes longer to aim and shoot with precision than to aim coarsely. Rate and accuracy are analyzed in terms of the speed/accuracy trade-off.

Open- or closed-loop control: If an action can be performed using only internal sensory feedback and without any external cues (for example, visual or auditory), the action is said to be open loop. People can look at a nearby object, close their eyes, and walk to it with fairly good accuracy under open-loop control (Philbeck, Loomis, & Beall, 1997). The opposite is closed-loop control, in which a person relies on external feedback to adjust the action as it is performed, as when catching a ball.
All six properties defined above apply to the input actions that comprise the interaction techniques; the last three also apply to the effect of those actions on the avatar. For the avatar to move correctly, the input must convey sufficient degrees of freedom (DOF) and range of motion to specify realistic behavior. A classic example is using a 3 DOF inertial tracker, which tracks only the orientation of the user's head. The 3 DOF of orientation are captured, but the head's 3 DOF of translation are lost. As a result, the avatar's head motion can be displayed only approximately; more importantly, no visual feedback is available to convey motion parallax.

The rate and accuracy with which the avatar walks and runs through the virtual world affect the user's sense of timing and scale. Many games allow unrealistically fast movement at sustained speeds; in such a game it would be impossible to obtain the correct timing estimates needed to plan a mission.

Open- and closed-loop control are important because they relate to how the interaction technique's input actions guide the avatar's behavior. Open-loop control is the most demanding, but also the most important, property. If an action in the real world can be achieved with open-loop control using only internal sensory feedback (without external cues), the interaction technique should also permit it.
Feedback: Visual Display Characteristics

In the real world, people live in a full-surround, high resolution, full-contrast world. No current, cost-effective display provides this level of visual fidelity. Therefore, it is important to know the limitations of display technologies to understand their effect on performance and training. An excellent discussion of visual display characteristics and depth cues for 3-D applications is found in Hinckley et al. (2004).

Both field of regard (FOR) and field of view (FOV) are critical for understanding realistic simulation systems. FOR is the amount of visual display space surrounding the user that is accessible by turning the head and body. FOV is the maximum visual angle that can be seen without turning the head. Dismounted infantry urban combat takes place in a 360°/180° battle space; the warfighter must rapidly respond to threats coming from any direction. The display must provide the full 360°/180° FOR, or the interaction techniques must compensate for the lack of one (a simple example is providing a button to rotate the virtual world about the user). In terms of FOV, anything less than 120° horizontal by 120° vertical per eye affects how realistically the user can perform a visual search task: a narrow horizontal FOV is like wearing blinders, forcing the user to turn the head to take in a full view; a narrow vertical FOV makes it difficult to follow a path or pick up threats from above or below (Allen, 1989).

The display's spatial resolution, contrast, and update rate are also important design considerations. They will determine whether users are able to see target indicators and distinguish friend from foe at the same range and with the same precision as in the real world.
USER INTERFACES TO TRAIN DISMOUNTED INFANTRY

To teach CQB, the interaction techniques comprising the user interface should give trainees close to the same ability to look, move, and shoot as they have in the real world. Skills and actions, such as walking through a door and aiming a rifle, should demand approximately the same timing and exposure to threats. Likewise, the avatar's behavior should closely resemble real world performance, moving at a similar pace and precision. In terms of sensory feedback, the trainee should be able to access the full environment. We present, as examples, two interfaces we developed that seek to match the properties of natural action.

Gaiter is an immersive, body-driven interface in which the user's body, rifle, and head-mounted display are fully tracked using a 6 DOF optical motion capture system. With Gaiter, the user walks in place to move through the virtual world and is able to turn freely in place. A head-mounted display provides access to a full FOR to "immerse" the user in the virtual environment, although the FOV is limited to 36° vertical by 48° horizontal per eye. A harness is used to center the user, because people tend to drift forward when they walk in place. An instrumented rifle prop registers shots fired. Body driven refers to the fact that the user's major body segments are tracked, and the 6 DOF position of each segment directly controls the user's avatar at the same level of articulation. Gaiter is a general purpose interface because its interaction techniques allow close to the full range of actions people have in the real world, and it is considered high end because it employs expensive, large footprint hardware.

Pointman, on the other hand, is a partially device-driven, low cost interface specialized for CQB training. (A good general discussion of system cost is found in Knerr, 2006.) Although head and foot motion are captured using head tracking and sliding foot pedals (rudder pedals for helicopter simulation games), making it partially body driven, locomotion is directed using a conventional dual joystick gamepad. The mappings of the joysticks are uniquely tailored to provide control over tactical infantry movement, unlike the mappings of the controllers used for conventional first-person shooter games. With Pointman, users are able to specify the direction of movement independent of the heading of the upper body. This independence allows users to control how their avatars look, move, and shoot, with capabilities and constraints close to those people have in the real world.
Gaiter

The Gaiter user interface is a high end, fully immersive system in which the user's full body motion drives a fully articulated avatar (Templeman, Denbrook, & Sibert, 1999). The interaction techniques were designed to closely match the properties of natural action. A head-mounted display was chosen because it gives full access to the environment (an unlimited FOR), albeit at a lower resolution and with a reduced FOV compared with the real world. An early prototype is shown in Figure 7.1.
Figure 7.1. Gaiter: Note the Centering Harness, Tracking Cameras, and Tracking Markers on the Head-Mounted Display, Rifle Prop, and Body Segments
Whenever possible, the interaction techniques were designed to match the corresponding natural action one-to-one. A one-to-one mapping of body motion requires 6 DOF tracking. The orientation and translational components of the tracked motion are passed directly to the virtual simulation to specify the orientation and translation of the avatar's corresponding body segments. We use a one-to-one mapping to convey the motion of the head, arms, upper body posture, and the rifle prop.

Weapons handling (moving between stances, reloading, and so forth) and shooting are a good example of a one-to-one interaction technique. Because we fully track the rifle prop, the user is able to see a virtual representation of the rifle as it would be seen directly. The user can aim and fire using a proper shooting posture, visual sight alignment, and body indexing (consistently holding the rifle stock in the same place against the shoulder and along the side of the cheek, known as a cheek weld). The user shoulders the rifle prop and establishes a cheek weld, looks "through" the virtual depiction of the rifle sights to acquire sight alignment and a sight picture, and pulls the trigger. The location of the hit is determined from the position of the rifle in the virtual world. In this way, the correct perceptual/action skills are used to locate the target, put the rifle on the target, and keep the rifle on the target while the shot is fired. Because the head-mounted display shows the imagery on a single focal plane, the skill of focusing the eyes on the front sight is not trained. We blur the image of the rear sight to add realism, but the degree of blurring is fixed and does not depend on the user's depth of focus.
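A sketch of how such shot registration might look in code follows: the tracked prop's 6 DOF pose places the virtual rifle, and on trigger pull a ray cast from the muzzle along the barrel determines the hit. The pose format, the choice of local barrel axis, and the scene's raycast routine are assumptions for illustration, not details of the Gaiter implementation.

```python
import numpy as np

# Sketch of one-to-one shot registration from a fully tracked rifle prop.

def fire(rifle_pose, scene):
    """rifle_pose: 4x4 world transform of the tracked prop (assumed layout)."""
    muzzle = rifle_pose[:3, 3]                                    # prop origin at muzzle
    barrel_dir = rifle_pose[:3, :3] @ np.array([0.0, 0.0, -1.0])  # local -Z as barrel axis
    hit = scene.raycast(origin=muzzle, direction=barrel_dir)      # assumed scene API
    if hit is not None:
        scene.register_hit(hit)          # score the shot where the ray lands
    return hit
```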
Flaws remain in the latency of the view presentation and in the precise visual alignment of the user's eyes, rifle sights, and target, which further refinement would improve.

One-to-one interaction techniques, such as handling a rifle, are a direct match with the properties of natural action. The rifle handling technique uses the same body segments as real world shooting. Since the rifle prop is fully tracked in 6 DOF, there are sufficient degrees of freedom to accurately depict shooting in the virtual world. Effort is comparable because the rifle prop weighs about the same as an actual rifle and has similar inertia. Rate of firing and accuracy are maintained if the visual aspects involved with shooting are correct. Finally, with practice, marksmen can obtain close to the open-loop control they achieve in the real world.

Not all interaction techniques can be designed as one-to-one mappings. Locomotion is a case in point. Users need a way to move over long distances in the virtual world while remaining within the tracked area of the motion capture system. The Gaiter locomotion technique is a gesture based technique designed to convey the essence of natural walking. We based our design on motor substitution, which suggests that by substituting a closely matched gesture for a natural action, the user can more easily employ familiar skills and strategies and is more likely to accomplish the task in a realistic manner (Templeman & Sibert, 2006). An analysis of the properties of natural action for walking and running in the real world helped us match in-place stepping to natural locomotion. Additional refinement could improve the technique, but most people who have tried the system indicate that Gaiter gives them the sense of actually walking and running through the virtual world.

Body segments: Motor substitution recommends that the interaction techniques use the same, or close to the same, body segments, and thus the same motor control subsystems, to allow similar affordances and constraints. A leg based approach that uses stepping in place and actual turning of the body gives a good approximation of real world locomotion. Actual turning provides kinesthetic and vestibular feedback, which Chance, Gaunet, Beall, and Loomis (1998) showed are needed for people to orient themselves accurately in the environment.

Effort: In-place walking and running require significant effort, albeit less than natural walking and running. Through informal use, we have found that leaning into the harness while running in place makes it feel more like actual running.

Coordination: It is important that the locomotion control action interact with other actions (such as weapons handling) in a realistic manner to give users close to the same ability, and the same level of constraint, to coordinate actions as they have in the real world. With Gaiter, body segments are employed as they are in the real world (legs for locomotion and hands for manipulating the rifle) and, as with real walking, stepping in place with natural turning operates in a body-centric coordinate system that allows reflexive action.

Degrees of freedom and range of motion: Gaiter applies 6 DOF tracking to the major body segments, the head-mounted display, and the rifle to provide full control over aiming.
Rate and accuracy: Gaiter is tuned to match real world walking and running speeds and to preserve the metrics between physical and virtual space. The horizontal extension of the user's knee and the rate of in-place steps are mapped into the stride length and cadence of the avatar's virtual steps. The system is tuned to match the velocity associated with the transition between walking and running: with walking, one support foot is always on the ground; with running, both feet are momentarily off the ground at the same time. The resulting virtual motion ties optic flow (Gibson, 1986) to leg movement to make the interaction technique feel more like a simulation of walking than like indirect control over locomotion.

Open- or closed-loop control: With practice, a user can calibrate to Gaiter and achieve open-loop control, which is consistent with the findings of Richardson and Waller (2007), who showed that practice corrects the underestimation of distance in virtual environments. Because the user can easily access the full environment with the head-mounted display, closed-loop control with vision is always available.

The distinguishing feature of the Gaiter locomotion technique is how it combines control actions based on motor substitution with a body-driven avatar that reflects the user's posture and movement. The user can naturally align other parts of the body, such as the head, shoulders, arms, and torso, with the movement of the knees and feet. Users can intermix a wide range of one-to-one movements, such as turning, crouching, and bending, to look around objects in the virtual world. The ability to turn naturally (one-to-one) is central to Gaiter. A user can reflexively turn toward or away from a sight or sound.
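The stepping-to-stride mapping just described can be sketched in a few lines: knee extension sets virtual stride length, step rate sets cadence, and their product gives speed, with the gait switching at a transition velocity. The gain and threshold below are placeholders, not the tuned values used in Gaiter.

```python
# Sketch of an in-place-step locomotion mapping in the style described above.

STRIDE_GAIN = 2.0        # meters of virtual stride per meter of knee extension (assumed)
RUN_TRANSITION = 2.2     # m/s; above this speed the avatar's gait becomes a run (assumed)

def virtual_step(knee_extension_m, steps_per_second):
    stride = STRIDE_GAIN * knee_extension_m     # longer reach -> longer virtual stride
    speed = stride * steps_per_second           # ties optic flow rate to leg movement
    gait = "run" if speed > RUN_TRANSITION else "walk"
    return speed, gait
```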
Pointman

Pointman is a compact, low cost interface that gives users the ability to execute realistic tactical infantry movements. The military has a growing interest in using games for training because they can support a large number of player/trainees and are portable and relatively inexpensive. However, console based gamepads, as currently used for first-person shooter games, encourage unrealistic tactics by promoting strafing motions: moving obliquely with respect to the viewing direction (Templeman, Sibert, Page, & Denbrook, 2007). Pointman encourages correct tactical movement in a desktop simulator. The Pointman user interface consists of a conventional dual joystick gamepad for locomotion and weapons handling, sliding foot pedals to specify displacement, head tracking for viewing and aiming, and a desktop or head-mounted display. Pointman is shown in Figure 7.2.

We studied the details of tactical movement for CQB to isolate its fundamental properties. When performing CQB in urban terrain, tactical infantry movement involves keeping the rifle aligned with the view while scanning for threats. In the words of the U.S. Marine Corps manual on rifle marksmanship, "[Cover] the field of view with the aiming eye and muzzle of the weapon. Wherever the eyes move, the muzzle should move (eyes, muzzle, target)" (Marine Corps Combat Development Command, 2001, p. 73). Figure 7.3, adapted from the Marine Corps manual on urban operations (Marine Corps Institute, 1997), illustrates the tactical movement for visually clearing the area around a corner or through an open doorway (called "pie-ing").
Figure 7.2. Two Configurations of Pointman, Each with Head Tracking and Foot Pedals
A person turns the upper body as a rigid unit to point the rifle just past the corner of the doorway, while moving down the hallway. The objective is to incrementally see into the area just beyond the corner, while minimizing exposure to potential threats. Tactical movement, therefore, involves maintaining a ready posture, with the head, the upper body, and the rifle aligned and turned as a unit independent of the direction of movement.

The fact of this alignment of the body in real world CQB provides a legitimate basis for reducing the degrees of freedom needed to specify the orientation of the avatar's upper body. It is useful to adopt terms used in both the vehicular and human navigation literature (Beall & Loomis, 1996) to formalize the discussion. We define three components of motion in the horizontal plane: heading, course, and displacement.
Figure 7.3. Left: Turreting the Upper Body toward the Corner while Maintaining a Straight Course, the Tactically Correct Way to Clear a Corner; Right: Pie-ing Past an Open Doorway
Figure 7.4. Illustration of basic terms: here the direction of aim indicates the heading. The angle between the course and the heading determines the kind of step being taken.
Upper body refers to the aligned head, upper body, and rifle.

Heading: angular direction the upper body faces.
Course: angular direction in which the pelvis translates.
Displacement: distance the pelvis translates.
In Figure 7.4, heading is shown as a top-down view of a person in a tactical ready posture. Course is the arrow pointing in the direction of translation. The angle between course and heading determines the kind of step taken. The following classes of motion were derived from studying how course and heading vary as people walk and run in the real world (Figure 7.5).

Steering motion: Course and heading remain coaligned as a person moves along a path. Steering can be used to move toward a target or to follow along a path. It is the most common type of pedestrian motion.

Oblique motion: Heading remains in a fixed direction as course varies. Oblique motion is used by marching bands.

Canted motion: Course and heading are maintained at a fixed angular offset. Steering motion is a subclass of canted motion. Notice that moving along a straight path with course and heading pointing in the same direction qualifies as both canted and oblique motion, so the classes partially overlap.
Figure 7.5. Examples of Steering, Canted, Oblique, and Scanning Motion
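These classes suggest a simple classifier over the course-to-heading offset observed across successive steps. The sketch below is illustrative only: a single sample cannot separate canted from oblique motion (both show a fixed nonzero offset), scanning (defined next) appears as an offset that varies over time, and the tolerance is an arbitrary placeholder.

```python
# Sketch classifying a short run of steps by course-to-heading offset.

def step_offset(course_deg, heading_deg):
    """Signed course-to-heading angle, wrapped to [-180, 180)."""
    return (heading_deg - course_deg + 180.0) % 360.0 - 180.0

def classify(offsets, tol=5.0):
    """offsets: step_offset samples from successive steps along a path."""
    if all(abs(o) <= tol for o in offsets):
        return "steering"                 # course and heading coaligned throughout
    if max(offsets) - min(offsets) <= tol:
        return "canted"                   # fixed nonzero offset (includes oblique
                                          # motion along a straight world course)
    return "scanning"                     # heading turning independently of course
```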
Scanning motion: Heading is free to turn separately from the course. Scanning motion can be used to search from side to side while moving along a path or to direct the heading toward a target. Scanning while traversing a curved path is the only case in which the heading and course vary independently.

The properties of natural action were applied more abstractly in the design of Pointman than for Gaiter. The goal was the same: users must be able to control looking, moving, and shooting (the basic actions of CQB) with close to the same capabilities and limitations people have in the real world. What is different is that the set of potential input actions was not constrained to closely match their real world counterparts, which greatly expands the design space. The guidelines for direct manipulation interfaces (such as the desktop interface) are helpful in evaluating possible designs. The fundamental goal of direct manipulation is to tie the control actions into the user's preexisting skills, abilities, and expectations (Jacob, Leggett, Myers, & Pausch, 1993). A "good" abstract interface, therefore, should exploit the user's intuition, take advantage of people's ability to coordinate actions, reduce the cognitive burden by limiting the number of artificial commands the user must remember, and provide sufficient information to support realistic avatar behavior in the virtual world.

The user interface for Pointman includes a locomotion technique to specify course and heading, a method to input the amount of displacement, weapons handling functions, and view and aim control. The user controls course and heading through the joysticks on a conventional dual-joystick gamepad. Joysticks are commonly programmed to provide rate control over turning and displacement, but with Pointman, both joysticks are directional controls. The preferred technique for specifying direction is to push the joystick against the circular outer rim in the initial direction and slide it along the rim to make adjustments. This action is easier than turning the joystick without support, and it results in a smoother, more precise motion. The left joystick is used to direct the course, and the right the heading. When both joysticks are used together, course and heading are specified independently, enabling the user to execute scanning motion to direct the avatar's actions in a tactically correct manner. Two classes of motion use just one joystick: steering motion (when the course and heading are coaligned, as in walking along a path) is controlled using just the right joystick; oblique motion (when the view is fixed forward) is controlled with the left, the same as with the conventional joystick mapping. People are good at turning a specific number of degrees under open-loop control, for example, making a 90° turn to the right, and Pointman supports that as well: the user pushes the joystick 90° from the neutral centered position in the direction of the turn. The maximum turning rate is limited to the maximum rate at which a person can actually turn the body. For displacement, length of stride and speed are controlled using sliding foot pedals, which mimic people's foot motion when they walk or run.

Weapons handling is accomplished through button presses on the gamepad. One button is the trigger, and another cycles through the tactical rifle stances: tactical, alert, and ready carry.
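The following sketch gives one plausible reading of this directional mapping: each stick's deflection is interpreted as an angle rather than a rate, the left stick sets course, the right stick sets heading, and pedal travel drives displacement along the course. The avatar API, deadzone value, and angle conventions are assumptions for illustration, not the actual Pointman code.

```python
import math

DEADZONE = 0.5  # require the stick to be pushed well toward the outer rim (assumed)

def stick_angle(x, y):
    """Read a stick as a direction in degrees (0 = straight ahead), or None if centered."""
    if math.hypot(x, y) < DEADZONE:
        return None
    return math.degrees(math.atan2(x, y))

def update_pose(left_xy, right_xy, pedal_displacement_m, avatar):
    course = stick_angle(*left_xy)            # left stick directs the course
    heading = stick_angle(*right_xy)          # right stick directs the heading
    if heading is not None:
        avatar.heading = heading              # turret the upper body
    if course is None:
        course = avatar.heading               # right stick alone: steering motion
    # Left stick alone leaves heading fixed: oblique motion; both sticks
    # together vary course and heading independently: scanning motion.
    avatar.translate_along(course, pedal_displacement_m)   # pedals drive steps
```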
Viewing and aiming control are provided by a 3 or 6 DOF tracker attached to the head. A full FOR is available using either a desktop or head-mounted display, but access is less direct than with Gaiter (or in the real world) because the user is seated and not able to turn the body. Head tracking is used to control the yaw (turning about the vertical axis) and the pitch (tilting up and down) of the user's view and aim. Turning the avatar's heading is primarily accomplished using the right joystick, but the yaw derived from head tracking is added to turn the view an additional amount (limited by how far the user's head can be turned without turning the body). If the tracker provides 6 DOF, including translation, the user can lean in, out, and side to side, further increasing the realism.

Moving the view up, down, and side to side is direct and natural with a head-mounted display. It is more complicated with a desktop monitor. As the user yaws the head, the avatar's head turns relative to the avatar's body by the same amount and in the same direction (one-to-one), the same as with the head-mounted display. For pitch, however, there is a limit to how far the user can tilt the head back while still comfortably viewing the screen. Therefore, we amplify the actual tilting of the head by a linear scale factor that depends on the vertical dimensions of the display, so that the user does not have to pitch the head straight up or down for the avatar to look directly up or down in the virtual world.

Both desktop and head-mounted displays suffer from a limited FOV compared with viewing in the real world, although it is less expensive to provide a wide FOV with a desktop display, using larger or multiple screens, than with today's head-mounted displays. Resolution is also important and varies by display; the higher the resolution, the better users can detect targets and discriminate whether they are friend or foe.
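The pitch amplification can be expressed as a single linear gain. The sketch below uses placeholder limits; the chapter ties the factor to the display's vertical dimensions, so the comfortable-pitch value would in practice be display dependent.

```python
# Sketch of desktop-monitor pitch amplification with a linear scale factor.

MAX_COMFORTABLE_PITCH = 30.0   # degrees the user can tilt while still viewing the screen (assumed)
MAX_AVATAR_PITCH = 90.0        # straight up or down in the virtual world

PITCH_GAIN = MAX_AVATAR_PITCH / MAX_COMFORTABLE_PITCH   # linear scale factor

def avatar_pitch(head_pitch_deg):
    # Clamp so the avatar's view never pitches past vertical.
    return max(-MAX_AVATAR_PITCH,
               min(MAX_AVATAR_PITCH, PITCH_GAIN * head_pitch_deg))
```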
displacement and move the avatar through the virtual world. The user controls view direction with the joysticks and by making fine adjustments with the head. Therefore, the direction of aim for shooting is linked to the direction of view, which corresponds to what experts do in the real world: the head, upper body, and rifle are rotated as a unit to face the target. Expert shooters also pitch their torso forward or back to aim up or down, which is available if Pointman is configured with 6 DOF head tracking.

Degrees of freedom and range of motion: Both head tracking and the foot pedals provide body-driven control over the avatar's movement. If the user's head is tracked in 6 DOF, the avatar's head pose is a direct one-to-one match. If the tracker provides only the 3 DOF in orientation, realism decreases; however, a direct link still exists between the user and the avatar because the avatar's head orientation appears similar to that of the user.

Rate and accuracy: Foot pedals allow the user to directly control the extent of each step and rate of movement, giving a sense of distance covered. As with real walking and running, the user is able to control the cadence of the avatar through reciprocating leg movements.

Open- or closed-loop control: The user can achieve open-loop control over movement by continuously sensing the relative positions of the joysticks and the direction of the head. Tactical movement often occurs in low or no light conditions, making it important to be able to direct one's movement without vision.
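To make the control mapping concrete, the following is a minimal sketch, in C++, of how a directional joystick reading and an amplified head pitch might be computed. The names, dead zone, and gain formula are illustrative assumptions, not details of the actual Pointman implementation.

    #include <cmath>

    // Hypothetical gamepad stick sample: x/y in [-1, 1]; the stick is
    // against the outer rim when its deflection magnitude is near 1.
    struct Stick { float x, y; };

    // Directional mapping: the stick's angle selects a direction in the
    // horizontal plane (radians; 0 = straight ahead, positive = right).
    // Returns false in the dead zone, i.e., no direction is commanded.
    bool stickDirection(const Stick& s, float& angleOut) {
        const float deflection = std::sqrt(s.x * s.x + s.y * s.y);
        if (deflection < 0.5f) return false;   // dead zone size assumed
        angleOut = std::atan2(s.x, s.y);       // left stick -> course,
        return true;                           // right stick -> heading
    }

    // Desktop-monitor pitch amplification: scale the tracked head pitch
    // so a comfortable tilt maps to the avatar looking straight up or
    // down. The chapter says only that the factor depends on the
    // display's vertical dimensions; the gain here is a placeholder.
    float avatarPitch(float headPitchRadians,
                      float comfortableMaxRadians = 0.5f) {
        const float gain = (3.14159265f / 2.0f) / comfortableMaxRadians;
        return headPitchRadians * gain;
    }

With the left stick driving the course and the right stick the heading, the two angles vary independently, which is what makes the scanning motion described above possible.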
SUMMARY

Developing a user interface to train CQB poses the challenge of providing interaction techniques that afford the essential capabilities needed to execute tactics, techniques, and procedures. We seek to transform the user's actions into the actions of the avatar in a way that allows the user to perform the requisite tasks in a tactically correct manner; otherwise, the exercise is merely a game. The properties of natural action, which were developed by analyzing how experts perform tasks in the real world, have proven useful in guiding the design and development of interaction techniques for training dismounted infantry. We studied the tactics, techniques, and procedures of CQB and determined that the fundamental actions are looking, moving, and shooting. A further analysis revealed that tactical movement relies heavily on the ability to look while moving along a path, with the head, upper body, and rifle moving as a single rigid unit. These insights led to the design of Gaiter, a body-driven interface, and Pointman, a system that combines a device-driven interface with body-driven control.

The properties of natural action are also useful in analyzing other dismounted infantry training simulators. For example, a system that uses a joystick mounted on a rifle prop for locomotion (such as the one we built for experimental purposes, similar to Atlantis Cyberspace, Inc.'s Immersive Group Simulator or Quantum3D's ExpeditionDI) overloads the users' hands, adding control over course and speed to the actions of weapons handling and shooting. If a surround screen or head-mounted display is used, which allows the user to turn naturally,
directing steering motion is easily performed by pushing the joystick forward to move in the direction the rifle is pointed and turning the upper body. However, it is tricky to turn to search for threats without disrupting the course. The user must compensate for turning the body (to redirect the heading) by counterturning with the joystick to maintain a straight course. This control structure makes scanning motion more difficult and encourages spiraling in toward the target (strafing), as occurs with conventional game controls.

The user interface is the key component in developing an effective simulator for training dismounted infantry. The goal is to develop interfaces that give trainees the ability to move and coordinate actions in the virtual world as they do in the real world. The benefit is in providing a safe, accessible, and cost-effective addition to live training.

REFERENCES

Allen, R. C. (1989). The effect of restricted field of view on locomotion tasks, head movements, and motion sickness. Unpublished doctoral dissertation, University of Central Florida, Orlando.
Beall, A. C., & Loomis, J. M. (1996). Visual control of steering without course information. Perception, 25, 481–494.
Bowman, D. A., Kruijff, E., LaViola, J. J., Jr., & Poupyrev, I. (2005). 3D user interfaces: Theory and practice. New York: Addison-Wesley.
Chance, S. S., Gaunet, F., Beall, A. C., & Loomis, J. M. (1998). Locomotion mode affects the updating of objects encountered during travel: The contribution of vestibular and proprioceptive inputs to path integration. Presence, 7, 168–178.
Enoka, R. M. (2002). Neuromechanics of human movement. Champaign, IL: Human Kinetics.
Gibson, J. J. (1986). The ecological approach to visual perception. Hillsdale, NJ: Lawrence Erlbaum.
Hinckley, K., Jacob, R. J. K., & Ware, C. (2004). Input/output devices and interaction techniques. In A. B. Tucker (Ed.), The computer science handbook (2nd ed., pp. 20.1–20.32). Boca Raton, FL: Chapman and Hall/CRC Press.
Hutchins, E. (1996). Cognition in the wild. Cambridge, MA: The MIT Press.
Jacob, R. J. K., Leggett, J. J., Myers, B. A., & Pausch, R. (1993). Interaction styles and input/output devices. Behaviour and Information Technology, 12, 69–79.
Knerr, B. W. (2006). Current issues in the use of virtual simulations for dismounted soldier training. Proceedings of the NATO Human Factors and Medicine Panel Workshop: Virtual Media for Military Applications (RTO Proceedings No. NATO RTO-MP-HFM-136, pp. 21.1–21.12). Neuilly-sur-Seine, France: Research and Technology Organization.
Marine Corps Combat Development Command. (2001). Rifle marksmanship (Marine Corps Reference Publication No. MCRP 3-01A). Albany, GA: Marine Corps Logistics Base.
Marine Corps Institute. (1997). Military operations on urban terrain (Marine Corps Institute Rep. No. MCI 03.66b). Washington, DC: Marine Barracks.
Philbeck, J. W., Loomis, J. M., & Beall, A. C. (1997). Visually perceived location is an invariant in the control of action. Perception & Psychophysics, 59, 601–612.
Richardson, A. R., & Waller, D. (2007). Interaction with an immersive virtual environment corrects users' distance estimates. Human Factors, 49, 507–517.
Templeman, J. N., Denbrook, P. S., & Sibert, L. E. (1999). Virtual locomotion: Walking in place through virtual environments. Presence, 8, 598–617.
Templeman, J. N., & Sibert, L. E. (2006). Immersive simulation of coordinated motion in virtual environments. In G. Allen (Ed.), Applied spatial cognition: From research to cognitive technology (pp. 339–372). Mahwah, NJ: Lawrence Erlbaum.
Templeman, J. N., Sibert, L. E., Page, R. C., & Denbrook, P. S. (2007). Pointman—A device-based control for realistic tactical movement. Proceedings of 3DUI (pp. 163–166). Piscataway, NJ: Institute of Electrical and Electronics Engineers, Inc.
Chapter 8
RENDERING AND COMPUTING REQUIREMENTS

Perry McDowell, Michael Guerrero, Danny McCue, and Brad Hollister

Large, distributed, multiparticipant training simulations require significant computing resources to run the appropriate simulation application at each user station, to enable interactive inputs and displays for the user, and to enable network communication among the participating computers. This chapter will briefly look at the major categories of computations for simulations and then take an in-depth look at the hardware of a typical personal computer (PC) based trainee station: central processing unit (CPU), memory and mass storage, graphics processing unit (GPU), and networking. This chapter discusses the theory and nomenclature of the current state of the art in processing for interactive three-dimensional (3-D) and virtual environment based training systems and games. The goal is to give the reader the basic knowledge and vocabulary to discuss system processing and networking requirements with the programmers and engineers who will be designing and building the system. Although we attempt to define terms throughout the chapter, it is written with the expectation that the reader is fairly well versed in computer technology. Terms in italics are good keywords for searches of article databases, digital libraries, and online search engines.

OVERVIEW OF PROCESSING REQUIREMENTS FOR VIRTUAL ENVIRONMENTS

Virtual environments (VEs) are highly complicated computing applications and as such require some of the most computationally advanced systems. While other applications, such as a simulation of fluid flow around a body or a password-cracking application, may require faster computations, VEs are more complicated because most impose computationally demanding requirements in multiple areas at once, such as rendering, simulation, physics, networking, and input and output from and to multiple devices.

Consider a simple VE for a ground-combat simulation trainer with the user in a head-mounted display, using an instrumented weapon, and leading a squad of
soldiers in similar VEs against a simulated enemy. The computational requirements for this straightforward VE are extremely complex. To be interactive the scene must be rendered more than 30 times per second for each eye's display in order to prevent jerky movements that will reduce the realism of the environment, make smooth control of interactions difficult, and increase the probability that the user will experience simulator sickness; at 30 frames per second, each new pair of images must be completed in roughly 33 milliseconds. The tracking system must calculate the position and orientation of both the weapon and the user's head, and this must be done with a minimum of latency to avoid the same problems as not rendering quickly enough. The leader's system must be networked with the systems of his squad members, again with a minimum of latency to prevent misrepresenting his team's locations in the world. The system's artificial intelligence must compute the actions of the simulated enemy in real time; unless the desired behaviors are modeled well and significant computing power is available so that the behavior state is updated rapidly, the synthetic enemy will not act realistically, causing the training to be ineffective. The system must also calculate all the physical interactions occurring in the world. These include collision detection, determining whether the user and all the other moving objects in the world have collided with any of the objects in the world, and, if objects do collide, determining the response of the colliding objects to the collision. The last step is rendering a new scene that reflects the changes caused by the collision.

The processing requirements for simulations are highly dependent on the number of real and simulated entities (people and moving objects, such as tanks), the number of participants in the simulation, the fidelity of the simulated behaviors and collisions, and the fidelity of the programs that generate data for visual displays and displays to other senses, such as audio.

The hardware available to build VE systems today owes much to the PC games market. The gamers' desire for ever-higher levels of fidelity in all elements of games has driven the capabilities of PCs, and particularly of graphics cards, in ways that enormously benefit the training system designer.

PC BASED STATIONS FOR TRAINING

From the 1960s through the early 1990s the computer based simulation industry was dominated by vendors who provided proprietary hardware and software solutions for rendering interactive 3-D imagery. Only in the past 10 to 15 years have general purpose PCs displaced the proprietary hardware and provided government agencies and the private sector with lower cost and more flexible ways of meeting their training hardware requirements. The new paradigm is building interactive 3-D and VE based training systems using commercial off-the-shelf PCs. Even if one vendor provides an entire simulator system, it is likely to be composed of custom software running on commercial PCs instead of on custom hardware. The rest of this chapter is focused on PC based systems for simulation since, although still available, single-vendor, turn-key custom solutions are no longer the norm, and no one vendor dominates the marketplace as companies such as Silicon Graphics, Inc., did in the 1980s and 1990s.
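Before turning to the hardware, it may help to keep the shape of this per-frame workload in mind. The following C++ loop is a sketch only; the stage names are placeholders rather than functions of any actual training system.

    // Hypothetical per-frame stages for a trainer like the one described
    // above; each must complete within the frame-time budget.
    void readTrackersAndInput()  { /* head and weapon pose, low latency */ }
    void exchangeNetworkState()  { /* squad members' states             */ }
    void updateEnemyAI()         { /* simulated enemy decisions         */ }
    void simulatePhysics()       { /* collision detection and response  */ }
    void renderSceneForEachEye() { /* both eyes; ~33 ms total at 30 Hz  */ }

    int main() {
        for (int frame = 0; frame < 1000; ++frame) {  // one pass per frame
            readTrackersAndInput();
            exchangeNetworkState();
            updateEnemyAI();
            simulatePhysics();
            renderSceneForEachEye();
        }
        return 0;
    }

The rest of the chapter examines the hardware on which each of these stages runs.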
Basic PC Architecture

Figure 8.1 shows a high level view of the architecture of a PC and its major parts. The central processing unit is the computing heart of the system, and it is located on the motherboard. The motherboard is the primary printed circuit board in the system and, in addition to the CPU, contains the data pathways, called busses, that connect the system components. The other major elements of the PC are the main memory, internal and external mass storage, and peripheral processors providing computational acceleration for targeted tasks, such as graphics and simulating physical reactions. Although peripheral processors can be mounted on the motherboard, the more powerful ones, for instance, high end graphics cards, plug into a fast bus on the motherboard.

Types of peripheral processors include graphics accelerator cards, which themselves contain graphics processing units; physics accelerator cards containing physics processing units; sound generation cards; and network interface cards. Of these, this chapter will cover only graphics cards. Physics cards are still rather rare; sound cards have little programmability and, in general, vary little except for those designed for very particular tasks such as Web servers or studio recording. Networking cards are essentially commodity products today, and we will discuss networking only with regard to some of the different methods a designer can choose for a large distributed simulation system.
Figure 8.1. Simplified block diagram of a PC motherboard. Graphics processing may be integrated on the motherboard or provided on a card plugged into a PCI-e bus slot.
Single and Multi-PC Systems

The hardware system running the application may consist of a stand-alone PC, such as is used for most single player games, or multiple PCs networked together. The latter case has many variations. In some systems the networked PCs all perform similar tasks, for instance, in a networked multiplayer game where each player's computer runs the same simulation (game) and the results and game state are shared over the network. In this situation, the hardware requirements are generally consistent across the networked computers. A second multi-PC configuration is when a number of PCs, called a cluster, operate in parallel on a single task, with each PC performing a portion of the computation. An example is a cluster used for image generation where each PC renders a subset of each new frame. Parallel execution of subsets of the problem reduces the time required to complete the overall task. Computational load for a large simulation can also be shared by using different computers in a cluster to perform different tasks. In this case one PC (or several) might perform only image generation, while others perform input handling, artificial intelligence, or physics calculations. In all these multi-PC configurations, however, the individual machines' configurations are very similar, if not identical, to those used by PC gaming enthusiasts. Multiprocessor systems are discussed in more detail in the "Multiprocessing" section of this chapter.

PC Busses

Data moves from one functional unit of the PC to another by way of busses, sets of parallel electrical conductors. A motherboard may have busses dedicated to communication between chips, for example, the main memory to CPU bus, as well as busses that peripheral processor cards plug into. From the CPU's perspective, all other devices in the system—whether a graphics card, main memory, or an internal disk drive—are treated as addresses written to or read from via the motherboard's system of busses. System performance is affected by the rate at which a bus can transfer data, and the data transfer rate is determined by how wide the bus is (how many bits can be transferred each clock tick) and the speed of the clock controlling data transfer; for example, a bus 64 bits (8 bytes) wide clocked at 100 megahertz can move at most 800 megabytes per second. The bus between the CPU and the graphics card is particularly important, as all data to be displayed travels across it. Most PC motherboards today use the peripheral-component-interconnect-express (PCI-e) bus. For maximum performance, particularly graphics performance, system designers must ensure a match between the bus interface on the motherboard for expansion cards and the bus interface of any such card.

CPU Architecture and Performance

The design of the CPU, called the CPU architecture, includes several functional units: cache memory, arithmetic and logic unit (ALU), and a control unit (see Figure 8.1). A cache is memory that is on the CPU chip and that the
controller can access faster than any other memory. Cache memory is, however, more expensive to manufacture than regular main memory. A CPU may contain one or more ALUs, the components that actually perform the computing operations. Program instructions and data are fetched from main memory and stored in cache. When a program executes, the control unit reads the instructions one at a time and generates the signals that make the ALU perform the action specified in the instruction. The instruction set of a processor is the list of all the operations that the CPU can perform. Examples are add, multiply, and read data. Like almost all digital circuits, components of the CPU are designed with synchronous logic, which means that the timing of the sequence of operations executed in all components is controlled by a global clock signal, one operation per clock tick. Many operations in the instruction set take more than a single clock cycle to execute, and the design of the ALU determines exactly how many clock cycles it takes to perform a particular operation on a specific CPU.

CPU Speed

One specification touted as differentiating CPUs is their clock speed. If a CPU has a specified clock speed of 1.5 gigahertz (GHz), it means that the synchronizing clock ticks 1.5 billion times per second. The basic unit for clocks is the hertz, which means cycles or oscillations per second. Because different processor models have different instruction sets, a faster clock speed does not always mean faster program execution: unless they are the same model of CPU with identical instruction sets, a processor with a clock speed of 3.4 GHz will not necessarily process faster than a 1.5 GHz CPU.

The market has seen a steady rise in the performance of CPUs as manufacturing improvements have reduced the size of transistors and increased clock rates. Experts are saying that this path to improved performance is near its theoretical limit (Mistry et al., 2007). While exotic technologies, such as molecular or quantum computing, may someday replace today's silicon based computing (Tay, 2008), these are unlikely to be widely available during the useful life of this volume. The most prominent method of increasing computation speed using traditional silicon based CPUs is multiprocessing.

Multiprocessing

Multiprocessing is simply using multiple processors to attack whatever problem is being solved. Multiprocessing can take many forms, and they are generally differentiated by where the processors are located in relation to each other: on the same chip, on the same motherboard, or in separate chassis. Multiple processors on the same chip are called multicore processors. Examples include Intel's Quad Xeon and AMD's Phenom processors, each of which has four processors on a single CPU. These processors share both the main memory on the motherboard and the cache memory on the processor. When multiple CPUs are mounted on the same motherboard (each likely to be a multicore chip), each has its own cache memory, but they share the main memory. In
computing clusters, or loosely coupled computing, the CPUs are generally located on different motherboards in different computers. The computers in the cluster are connected by a high speed network, such as gigabit Ethernet. Multiprocessor systems with the CPUs on the same chip or on the same motherboard generally perform better and use less energy than clusters of PCs. However, clusters can be more economical, since the computers in the cluster do not need to be as powerful (and thus as expensive) as a machine with the same capacity all on one motherboard. Clusters also scale better: as problem size increases, computers can be added to the cluster, increasing its computing power. A single motherboard multiprocessor generally must be replaced by another top-of-the-line machine when the problem grows beyond its ability. Alternately, more computers of similar capacity can be added to it to form a cluster.

Writing software that exploits the power of multiprocessing will be challenging for current and new programmers. As Chas. Boyd, a member of the Direct3D team at Microsoft Corporation, writes,

Customers will . . . benefit only if software becomes capable of scaling across all those new cores. . . . For the next decade, the limiting factor in software performance will be the ability of software developers to restructure code to scale at a rate that keeps up with the rate of core-count growth. (Boyd, 2008, p. 32)
Here is an example of how increasing the number of processors to four (a common quad core CPU) and modifying the code could yield performance improvements for a racing game. Just as now, the visuals would be produced using the system's GPU, but all the other computing requirements of the application would be split among the processors. The user input and the networking might be handled on one processor, while the artificial intelligence guiding the other cars would run on another. The performance of the user's car would be simulated in great detail on the third processor, and the physics of everything else in the game, which does not require as much realism, would be processed on the fourth. Of course, programmers might find that some of these calculations give better results when performed on the GPU, but finding the optimal balance will be an active area of research for the next decade or so.
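A schematic version of that partitioning appears below, written in C++ with hypothetical subsystem functions; a real engine would share game state between the threads and synchronize access to it.

    #include <thread>

    // Hypothetical per-frame subsystem updates for the racing game.
    void updateInputAndNetwork()  { /* poll devices, exchange packets */ }
    void updateOpponentAI()       { /* steer the computer-driven cars */ }
    void updatePlayerCarPhysics() { /* high fidelity vehicle model    */ }
    void updateWorldPhysics()     { /* coarse physics for the rest    */ }

    int main() {
        // One frame of simulation: run the four subsystems concurrently,
        // roughly one per core, and join before the GPU draws the frame.
        std::thread t1(updateInputAndNetwork);
        std::thread t2(updateOpponentAI);
        std::thread t3(updatePlayerCarPhysics);
        std::thread t4(updateWorldPhysics);
        t1.join(); t2.join(); t3.join(); t4.join();
        return 0;
    }

The restructuring challenge Boyd describes lies in dividing the work so that the four threads finish at about the same time and rarely wait on one another.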
Memory and Mass Storage

The processor's memory system also affects the performance of a computer. There are three major types of data storage: cache memory on the CPU chip, (main) memory on the motherboard, and mass storage (typically magnetic disk drives) that is either local to the system (internal or external to the chassis) or networked. Respectively, these three types of storage range from smallest to largest, from fastest access time to slowest, and from most expensive to least expensive per bit stored. The main elements in evaluating memory's effect upon performance are the speed and the size of a system's memory.
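The performance gap between levels of this hierarchy is easy to demonstrate. The C++ sketch below, an illustrative benchmark rather than anything from this chapter's sources, sums the same array twice: once sequentially, so cache lines are fully reused, and once with a large stride that defeats the cache; on typical hardware the first loop runs several times faster.

    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main() {
        const std::size_t n = 1 << 24;      // 16M ints, about 64 MB
        std::vector<int> data(n, 1);
        long long sum = 0;

        auto timeSum = [&](std::size_t stride) {
            const auto start = std::chrono::steady_clock::now();
            // Visit every element; the stride changes only the order.
            for (std::size_t s = 0; s < stride; ++s)
                for (std::size_t i = s; i < n; i += stride)
                    sum += data[i];
            return std::chrono::duration<double>(
                std::chrono::steady_clock::now() - start).count();
        };

        const double sequential = timeSum(1);
        const double strided = timeSum(4096);  // ~16 KB between accesses
        std::printf("sequential %.3f s, strided %.3f s (sum %lld)\n",
                    sequential, strided, sum);
        return 0;
    }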
Memory: Type and Size

Memory speed is most strongly affected by its type. Today's PCs most often use two forms of memory, dynamic random access memory (DRAM) and static random access memory (SRAM). The circuits in DRAM are simple and can be packed more densely than those of SRAM. However, the electrical charge that holds the data in DRAM must periodically be refreshed, which makes the supporting circuitry more complex. As of this writing, the highest performance systems use double-data-rate DRAM, which transfers data on both the rising and falling edges of each clock cycle. The same characteristic of SRAM circuitry that eliminates the need for refreshing also makes it faster. SRAM consumes more chip area for each bit of memory, and hence it is more expensive than DRAM. The cache memory is typically SRAM, and the main memory typically DRAM.

Memory size is also important for performance. The cache, which can be accessed by the CPU without leaving the chip, is relatively small. In early 2008, the largest cache on commonly available CPUs was 16 megabytes (MB), on the Intel Xeon 7000 processors. When the dataset to be stored is larger than the cache size, the data are stored in the main memory, which is accessed across a bus on the motherboard. Likewise, if the dataset to be stored is larger than the main memory, some of the data must be moved to the much slower to access hard drive. In early 2008, manufacturers commonly produced commercial computers with 2 to 4 gigabytes (GB) of main memory, with up to 8 GB available in advanced gaming machines.

Operating Systems and Maximum Main Memory Size

While the general rule "more is better" definitely applies to memory, one caution regarding memory size is required: the size of the address space supported by the operating system puts a hard limit on the amount of memory that can be used in the system. Operating systems today have either a 32 bit or a 64 bit address space (and are called 32 bit or 64 bit operating systems). This means that there are 2³² (about 4.3 × 10⁹) or 2⁶⁴ (about 1.8 × 10¹⁹) unique memory locations in the system, respectively. While a 32 bit operating system can address a 4 GB memory, some of the addresses are reserved for memory on the graphics card as well as certain other hardware resources; only 3 GB (±0.5 GB) of main memory are available for user programs. This limitation can be avoided by using a 64 bit operating system, which can access 2⁶⁴ memory addresses, or approximately 1.7 × 10¹⁰ gigabytes of information, which is anticipated to be sufficient for the foreseeable future. Both Windows XP and Vista, as well as Linux, come in 32 bit and 64 bit versions. The downside of 64 bit operating systems is that backward compatibility is limited because many applications written for 32 bit operating systems cannot run on a 64 bit operating system.

Mass Storage

Mass storage, most often magnetic disk drives, is currently the dominant provider of persistent information in computer systems. This dominance will soon
be challenged by flash memory storage devices that have faster access times than magnetic disks. Mass storage is needed to hold programs and data that are loaded into main memory at run time to be used by one or more of the CPU, the GPU, and the physics processor. Datasets for VEs can be large and may contain such items as the geometry of animated models, interior environments, and exterior environments, such as terrain. Storage size and data access time are the primary metrics of mass storage systems.

GRAPHICS PROCESSING

Until the late 1990s graphics processing for interactive applications was accomplished using dedicated and often custom processing and display systems called graphics accelerators. These accelerators were almost exclusively implemented as a fixed-function rendering pipeline: transformation of the vertices that define the objects in the scene, determination of which objects, or parts of objects, are visible from the current viewpoint, shading of each pixel in the visible objects according to the lights in the scene, and writing of the data to the display buffer. Early graphics cards for PCs, constrained by size in a time when the most powerful and flexible graphics accelerators were the size of a small refrigerator, had similar fixed pipelines. Acceleration for texture mapping was introduced to PCs on peripheral processor cards designed to accelerate rendering for PC based games. (See Whitton and Wendt, Appendix A, Volume 2, Section 1 Perspective for a brief introduction to texture mapping.) This advance, which made visual scenes much richer and more realistic, was the beginning of high quality, interactive graphics on PCs.

Graphics Processing Units

A new type of graphics accelerator began appearing in the late 1990s; its main feature is the GPU. Due to the high number of transistors that can be put on chips today, the single chip GPU is able to contain the many separate processors needed to efficiently implement the graphics rendering pipeline. The GPU chip generally is mounted on a graphics card that is plugged into the motherboard and is interfaced to the CPU over a high speed bus that is often, for performance reasons, dedicated to this single purpose. Graphics cards contain the GPU chip, control units, and somewhere between 256 MB and 1 GB of memory. Graphics cards also include video output circuitry that converts the data of the rendered frames into signals in video graphics array or digital visual interface format to drive monitors, head-worn displays, and projectors. Most modern GPUs use so much power that they generate enough heat to require a dedicated fan.

Alternatively, a GPU chip can be mounted on the motherboard. In this configuration the GPU, rather than having dedicated memory, shares the main system memory with the CPU. Access to the main memory is slower than access to a dedicated memory, so the performance of GPUs mounted on the motherboard is lower than that of those on plug-in cards. In addition, GPUs for laptops typically are a
generation (or two) behind desktop products, particularly with respect to the shader model they support (see the next section).

Programming the GPU

Succeeding generations of graphics chips and cards have had higher transistor counts, more processors, faster clock speeds, more memory, and higher graphics performance. The most significant change over the last 10 years has been the advent of user-programmable processing elements in the GPU. While the average training application programmer will never program the GPU, this capability gives rendering-software developers tremendous flexibility and, ultimately, control over the "look" of computer-generated scenes. The custom programs that execute on the programmable elements are called shaders, and almost all modern graphics hardware supports some level of shaders.

Shader Models

The different levels of shaders are designated with different shader model numbers; the higher the model number, the more portions of the graphics pipeline are user programmable and the more specialized rendering effects are possible. Table 8.1 shows the shader model supported by popular graphics cards.

Table 8.1. Shader Models Supported by Graphics Cards

  Graphics Card             Shader Model Supported
  ATI Radeon HD 3870 X2     4.1
  NVIDIA GeForce 8800       4.0
  ATI Radeon X1900          3.0
  NVIDIA GeForce 6800       3.0
  ATI Radeon 9800 Pro       2.0
  NVIDIA GeForce FX         2.0
  ATI Radeon 8500           1.1/1.4
  NVIDIA GeForce 4 Ti       1.1/1.3

Figure 8.2 is a block diagram of a modern GPU. In the highest performance systems today the vertex shader, geometry shader, and pixel shader stages are all user programmable.

Figure 8.2. Block Diagram of a Modern Graphics Processing Unit

Early shader models had hard programming limitations, such as a maximum of 128 instructions and no floating point arithmetic. These restrictions are largely eliminated in more recent shader models, though GPUs are still not general purpose computing resources. Examples of operations that can be performed on the different shading processors follow (a code sketch of the first appears after the list):

• Vertex shader—Create a model of terrain from a flat tessellated plane: As each vertex in the model is accessed, add to its height an offset corresponding to the height of the desired terrain at that point.
• Geometry shader—Amplify datasets: As point primitives arrive at the geometry shader, create a quad (quadrilateral primitive) centered on the original point. Similarly, to deamplify, collapse a quad into a point primitive located at the center of the quad.

• Pixel shader—Customize the appearance of any pixel on the screen: Program effects, such as night vision, depth of field, and motion blur, and apply them only to selected pixels.
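As an illustration of the vertex shader example in the first bullet, the fragment below expresses the per-vertex terrain displacement as ordinary C++. A real vertex shader would carry the same logic in a shading language such as HLSL or GLSL, and the height function here is a stand-in for a lookup into real terrain data.

    #include <cmath>
    #include <vector>

    struct Vertex { float x, y, z; };  // y is "up" (assumed convention)

    // Stand-in height field; a real shader might sample a height texture.
    float terrainHeight(float x, float z) {
        return 2.0f * std::sin(0.1f * x) * std::cos(0.1f * z);
    }

    // The "vertex shader": conceptually invoked once per vertex, it lifts
    // each point of a flat tessellated plane to the terrain surface. On a
    // GPU the invocations run in parallel; here we simply loop.
    void displaceTerrain(std::vector<Vertex>& mesh) {
        for (Vertex& v : mesh)
            v.y += terrainHeight(v.x, v.z);
    }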
As graphics processors evolve, shaders are becoming more deeply ingrained in real time rendering methodologies and the software that supports them. For instance, the DirectX 10 and OpenGL ES 2.0 application programming interfaces
have completely abandoned the fixed function pipeline in favor of the programmable one where vertex and pixel shaders are required for rendering. Shaders give rendering programmers the power to modify data as it moves through the graphics pipeline and to program any visual effect they can imagine. The inflexibility of the fixed function rendering pipeline is gone. Since programmable shaders are such a recent capability, developers are unlikely to have come close to fully exploring the possibilities enabled by them. Readers desiring a deeper understanding of shaders should see Engel (2003, 2004).
Workload Balance

In the days of the fixed pipeline, it was the responsibility of the hardware designers to ensure that the pipelined processing stages ran with approximately the same throughput rate so that overall performance was maximized—no stage sat idle waiting for the previous one to complete its computations; the processing load was balanced across the stages. While the GPU hardware is designed to be balanced when executing vendor-written software, user written programs can lead to inefficiencies if the load is not balanced across the shader stages. A simple observation makes clear the need to pay attention to what processing is done in each stage: the vertex shader runs once per frame for every vertex of every object in the scene, perhaps 300,000 vertices. The pixel shader runs once for every pixel in every frame, 1,310,720 times for a 1,280 × 1,024 frame—about 1 million more times per frame than the vertex shader! As a general rule, only operations that must be executed in the pixel shader should be performed there. Any operation that can reasonably be moved upstream into the vertex shader (and subsequently have its output values interpolated to become inputs to the pixel shader) will provide significant performance savings.

How best to distribute the application workload can be determined through performance profiling. At the most coarse-grained level, time spent on the CPU is distinguished from time spent on the GPU. A thorough discussion of performance profiling strategies is beyond the scope of this chapter, but suffice it to say that profiling is a crucial part of understanding how to optimize graphics performance. There are many tools available to aid in this process, such as Intel's VTune, AMD's CodeAnalyst, GPU PerfStudio, and NVIDIA's PerfKit.
General Purpose Computing on GPUs

Because the mathematical operations needed for graphics are similar to the operations needed for many types of scientific computing, it was natural that programmers began using the programmable shading stages for nongraphics computing. This practice is now so prevalent that it has a name: general-purpose GPU (GPGPU) computing. Fluid simulations, computer vision, and rigid body physics are examples of nongraphics algorithms that can be accelerated by programming them on the GPU.
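The programming model involved is, in essence, a kernel function invoked once per data element across a large batch. The C++ sketch below mimics that structure on the CPU with a scaled vector addition; under interfaces such as CUDA or CAL (discussed next), the per-element function would run as thousands of parallel GPU threads instead of inside a loop.

    #include <cstdio>
    #include <vector>

    // The "kernel": one invocation computes one output element.
    void saxpyKernel(int i, float a,
                     const float* x, const float* y, float* out) {
        out[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        std::vector<float> x(n, 1.0f), y(n, 2.0f), out(n);
        // A GPU would launch n parallel invocations of the kernel; this
        // CPU stand-in invokes it once per index.
        for (int i = 0; i < n; ++i)
            saxpyKernel(i, 3.0f, x.data(), y.data(), out.data());
        std::printf("out[0] = %f\n", out[0]);  // 3 * 1 + 2 = 5
        return 0;
    }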
As GPGPU programming has become more commonplace, programming tools to help the coder write his or her code (as opposed to analyzing it, as the profiling tools mentioned earlier do) have become more common. Already, the largest makers of GPUs, NVIDIA and AMD, have created interfaces to help programmers do this. NVIDIA's CUDA (Compute Unified Device Architecture) and AMD's CAL are designed to "abstract computation as large batch operations that involve many invocations of a kernel function operating in parallel" (Fatahalian & Houston, 2008). For more information about how one of these interfaces, CUDA, does this, see Nickolls, Buck, Garland, and Skadron (2008).

Practical Considerations

Choosing a Graphics Accelerator

A general rule of thumb with graphics processors is to always buy the fastest, most capable card available for development work. The cost differential is likely to be small, and by the time the finished system is deployed, what was earlier top-of-the-line functionality will be available at an affordable price. Designers must, however, be cautious when the development is to be done on a desktop machine and the deployment vehicle is to be a laptop. That situation requires careful analysis of current and expected future performance of laptop graphics before the development platform is chosen.

Not all PCs are built with a large enough power supply to support the highest-performance graphics cards. Users configuring their own systems should look carefully at the power requirements of the overall system. Adequate power should not be an issue for users purchasing a preconfigured system from a reputable dealer.

Matching Shader Models across VE System Components

The game engines (see McDowell, Volume 2, Section 1, Chapter 10) that are the core software of many simulations and scenario based training systems are typically written to support a particular shader model. Mismatches between the shader models supported by the software and hardware always result in unused potential in either the hardware or the software and in images that include only the effects available in the lower model number. Buying a card that supports a higher level model will get you all of the features available from the game engine's current software release and enable you to enjoy new features when the software is later updated to support the higher level model.

When More Than One Graphics Card Is Necessary

In some cases one graphics card is not enough, for instance, when the dataset is so large that the time to render a new frame is unacceptably long or when the application requires more than two video outputs. In the first case the output frame can be subdivided, each section assigned to a separate graphics card, and the partial images recombined for display. In the second case, for
instance, when a display wall consists of an array of projectors, a separate card may be needed for every projector. The outputs of the multiple cards must be synchronized so that updates to the final image from each card all occur at the same time. Without synchronization, the overall composite image (created from the several sources) may appear unstable. As of this writing, NVIDIA and ATI offer SLI and CrossFire products, respectively, for splitting rendering across multiple graphics cards. Cards that offer frame lock or genlock capability can be synchronized for multiprojector applications. These features are often available on cards designed for professional video applications.

Additionally, programmers are beginning to explore the possibilities multiple graphics cards provide for computations other than graphics. The architecture of programmable graphics cards lends itself to any sort of computation where a great deal of parallelism is desired, such as physics or artificial intelligence. In these systems, one graphics card is used for normal graphics processing, while another is used for whatever other calculations the application programmers decide to send it. This trend is likely to continue: NVIDIA recently purchased Ageia, a maker of physics cards, and many industry insiders believe this is part of NVIDIA's effort to expand the uses of its cards beyond graphics.
NETWORKING

Networking has emerged as a cornerstone of virtual environments today. Although new single-user games are still being written, game worlds are increasingly becoming virtual places where multiple users interact. In the commercial arena, this has led to such massively multiplayer online games (MMOGs) as World of Warcraft and EverQuest and such online environments as Second Life. People often find the sense of kinship and camaraderie they experience while playing multiplayer games on the Internet more important than the gameplay itself. In fact, the popularity of Second Life makes that point clearly, since it is not a game; it does not have points or scores; it does not have winners or losers. Similarly, even though simulators for single-user training are still developed, the focus is moving to training teams. Networked simulations allow teams to train together in the same virtual world and permit trainers or subject matter experts to participate as opposing forces in ways that computer-controlled characters cannot.

The networking requirements for games and simulations vary little. Application requirements drive the choice of networking technology. In a large virtual world such as Second Life, it is most important to support large numbers of simultaneous users in the same online environment, and it is acceptable to have longer latency—a longer time between updates reflecting the movement and actions of others in the scene. In training simulations, however, very low latency is required, since it is unacceptable for a player to, for instance, fire at an enemy who is visible on the screen in one place but whose actual location is somewhere else. This chapter will cover the advantages and limitations of three networking standards in common use: the distributed interactive simulation (DIS) architecture and the high level
architecture (HLA), both created by the U.S. military, and the client/server paradigm commonly used in the game industry (as well as in many other commercial applications).

The DIS architecture was originally created by Bolt, Beranek, and Newman in the late 1980s for the U.S. Army's simulated network (SIMNET) program. The idea was to allow hundreds of entities to take part in the same simulated exercise, communicating via the relatively limited bandwidth that was available at the time. In the early 1990s, several other groups began using the DIS networking protocol in SIMNET as the networking component of their own simulations, and it became obvious that this protocol should be formalized. DIS was officially made IEEE (Institute of Electrical and Electronics Engineers) standard 1278 in 1993, and the IEEE has authorized several modifications to it over the years.

DIS works by sending protocol data units (PDUs) over the network using either the transmission control protocol (TCP) or the user datagram protocol (UDP) as the transport protocol. There are PDUs to represent each kind of data that must be shared in order for all participants to have enough understanding of the state of the entire simulation to be able to fully participate in it. For example, one of the most common PDUs is the entity state PDU, which passes the state (location, velocity, amount of damage, fuel, and so forth) of an entity to the other entities in the simulation. As the entity changes its state (for example, changes velocity), it sends out an entity state PDU to notify others in the simulation of its new state.

One of the most important parts of the DIS architecture is the fact that each entity keeps track of only its own location and uses dead reckoning to update the locations of the other entities in the world for each frame between receipt of actual location data in entity state PDUs. Dead reckoning is a method of estimating an entity's new position by extrapolating over time from the last known position and velocity. A consequence of dead reckoning is that an entity's position can appear to jump when the dead-reckoned position differs from the true position and is replaced by it when the next entity state PDU is received.

Another networking protocol created by the U.S. military is the high level architecture. HLA emerged in the mid-1990s as a standard intended to allow interoperability between the military's many dissimilar simulations. The military simulators were built by different contractors for different services, which had different needs. It is not surprising that the simulators could not communicate with each other. HLA was designed to correct this problem.

HLA has a vernacular with which training system designers should have at least a passing acquaintance. To be an HLA compliant simulation (referred to as a federate), the simulation must use a specified interface to pass information to a run-time interface (RTI). An RTI is not designed to be part of the simulation, but rather is software that plugs into the simulation and uses a common template to pass information from one simulation to the others. Multiple simulations connected via RTIs using a common template are called a federation. For each federation, there must be a federation object model that contains all the data objects, interactions, and attributes that will be passed among simulations.
RTIs are not part of the simulation; rather, they are additional software. Several vendors sell RTIs. Ideally, any RTI should be able to serve as the connector for a particular simulation and accurately communicate with an RTI of a different vendor acting for a different simulation. In reality it is not that easy: some vendors have augmented their RTIs to perform other tasks and to pass information other than that required by the HLA specification. This can cause interoperability problems.

There is debate about which protocol, DIS or HLA, is better. Most U.S. simulations are required by a Department of Defense directive to use HLA for communication. However, many other nations' militaries, not required to use HLA, choose DIS for their simulations. In reality, each has advantages and disadvantages, and the developer needs to perform due diligence in choosing a protocol for a large-scale simulation.

The other method of networking is the one that most multiplayer computer games use, a client-server architecture. The server, generally a very powerful computer with an extremely high speed network connection, holds the game "truth." The players' computers are referred to as the clients; they connect to the server, which passes them information on the global state of the game. As a player performs actions (for example, move, shoot, or crouch), the client sends messages to the server, and the server in turn sends messages to the other players' clients to update them on the first player's actions. How these messages are sent has a considerable effect upon the number of players that can interact in the same world. For example, in MMOGs, where there can be several thousand people on the same server at one time, messages about player A's actions are quite often sent only to those players near player A. This is an acceptable solution, since players at a great distance from player A in the virtual world (who cannot see him or her) do not need to know about his or her actions. However, if too many players congregate in a small area of the virtual world, it can seriously degrade network performance and, hence, the rate at which state updates are received by each player.

CONCLUSION

Because PCs have surpassed vendor-produced custom solutions for simulations, virtual environments, and games, trainers who want to use any of these technologies need to be conversant with the PC technology that will play such a major part in their training applications. The biggest change in the last 15 years has been the advent of the GPU, especially the opening of the pipeline to application programmers, and the parallelization of processors in both CPUs and GPUs. What the final product of these two changes will be is not yet known. As Kurt Akeley, a member of the founding Silicon Graphics team who did pioneering work on OpenGL and is now at Microsoft Research, asks,

What we're talking about isn't just whether we can use graphics processors to do general-purpose computing, but in the bigger sense, how will general-purpose computing be done? How will graphics processing and other technologies that have
evolved influence the way computing is done in general? That’s a big issue that the world’s going to be working on for the next five to ten years. (Duff, 2008)
REFERENCES

Boyd, C. (2008). Data-parallel computing. ACM Queue, 6(2), 30–39.
Duff, T. (2008). A conversation with Kurt Akeley and Pat Hanrahan. ACM Queue, 6(2), 11–17.
Engel, W. F. (2003). ShaderX2—Shader programming tips and tricks with DirectX 9.0. Plano, TX: Wordware Publishing, Inc.
Engel, W. F. (2004). Programming vertex and pixel shaders. Boston: Charles River Media.
Fatahalian, K., & Houston, M. (2008). GPUs: A closer look. ACM Queue, 6(2). http://doi.acm.org/10.1145/1365490.1365498
Mistry, K., Allen, C., Auth, C., Beattie, B., Bergstrom, D., Bost, M., et al. (2007). A 45nm logic technology with high-k+metal gate transistors, strained silicon, 9 Cu interconnect layers, 193nm dry patterning, and 100% Pb-free packaging. IEEE International Electron Devices Meeting—IEDM 2007 (pp. 247–250).
Nickolls, J., Buck, I., Garland, M., & Skadron, K. (2008). Scalable parallel programming with CUDA. ACM Queue, 6(2), 40–53.
Tay, E. (2008). The death of the silicon computer chip. IT news [Electronic version]. Retrieved April 23, 2008, from http://www.itnews.com.au/News/72838,the-death-of-the-silicon-computer-chip.aspx
Chapter 9
BEHAVIOR GENERATION IN SEMI-AUTOMATED FORCES

Mikel Petty

Virtual environment training simulations often include simulated entities (such as tanks, aircraft, or individual humans) that are generated and controlled by computer software systems rather than by an individual human for each entity. Those systems, which are known as semi-automated forces (SAF) because the software is monitored and controlled by a human operator, play an important role in virtual environment simulations. This chapter describes the purposes of SAF systems, the main behavior generation approaches used in them, examples of important semi-automated forces systems and their applications, and open SAF research problems. After beginning with a motivating scenario to suggest their importance, this introductory section places SAF systems in the context of virtual environment simulations and describes some specific purposes and applications for SAF systems.

AN INFORMAL MOTIVATION

Consider the following scene. Four U.S. Army soldiers sit at the controls of a training simulator. The simulator, which is about the size of a garden shed, appears from the outside to be a connected set of computers, monitors, and large green fiberglass enclosures. From the inside, the simulator is a simplified but believably realistic recreation of the interior of an M1A1 Abrams, the U.S. Army's main battle tank. The four soldiers are the M1A1's crew. They manipulate the simulator's controls as they would in an actual tank, driving their tank through a simulated battlefield that they can see through the view ports of their tank. Computer generated images for each of the view ports show the battlefield as it would be seen from that point. A second crew is at the controls of another M1A1 simulator. In the real world that simulator may be adjacent to the first, or it may be hundreds of miles away, but the two are connected by a computer network. In the simulated battlefield the second tank is following the first, about 30 meters behind.

As the two M1A1s move slowly forward, the commander of the lead M1A1 warily surveys the terrain from his vantage point in the cupola atop the turret,
searching for the enemy vehicles that are likely to be nearby. As his tank crests a ridge, he spots a column of enemy tanks emerging from behind a tree line some 2,000 meters away. The enemy tanks are generated in the battlefield by another simulator node, attached to the M1A1 simulator via the network. However, they are not controlled by human crews; rather, computer software is generating their behavior, as well as that of many other vehicles in the simulated battlefield.

The commander of the lead tank radios the commander of the second M1A1, who cannot yet see the enemy tanks, and warns him of the threat. Then, over the simulator's intercom, he orders the driver to turn the M1A1 to face its frontal armor toward the enemy tanks and to stop so as to provide the gunner an easier firing problem. The commander's feeling of urgency is easily heard in his voice as he tells the gunner where the enemy tanks are, which one to engage first, and what ammunition to use. As quickly as his skills allow, the gunner rotates the M1A1's turret and elevates the main gun to align the aiming reticule with the first target. In quick succession he thumbs the laser range-finder button and squeezes the main gun trigger; the M1A1 simulator's sound system produces the sound of the main gun firing, and the first enemy tank bursts into flames. While the gunner executes his shot, the second M1A1 comes over the ridge, and the commander of the lead M1A1 orders the second crew to engage the second enemy tank.

The commander observes that the rest of the enemy tanks have responded to the incoming fire by reversing direction and taking cover behind the tree line. After the second M1A1 destroys another enemy tank, there are no targets visible. Several seconds pass while the commander assesses the situation. His apprehension growing, he orders both M1A1s to move back behind the ridge crest. But the decision comes a moment too late. Before either M1A1 can complete the maneuver, enemy tanks have emerged from behind both ends of the tree line. One of the enemy tanks sights the lead M1A1, turns toward it, and quickly stops. Its turret swings around, and the enemy tank fires. The sound system of the M1A1 simulator produces an unpleasantly loud crashing sound, and the screens of the simulator go black; the lead M1A1 has been destroyed by the enemy tank. Because this is a simulation, the commander of the lead tank is not dead, but he is nonetheless dismayed and pounds his controls in frustration.

The scene just described has two crucial elements. First, the simulation succeeds in creating an environment with enough intensity and urgency to draw the soldiers into the simulated world. Second, it includes autonomous entities that oppose the simulation users, attempting to thwart and destroy them. To a large extent, both the simulation's intensity and its usefulness as a training system depend on the sophistication and realism of the behavior of the autonomous entities. How that behavior is generated is the subject of this chapter.
Distributed Simulation

Virtual environments, especially those that include SAF systems of the type discussed here, are typically constructed using a simulation technology known
as distributed simulation. (SAF systems may also be used in other, nondistributed systems.) In a distributed simulation, large simulation systems are assembled from a set of independent simulation nodes communicating via a network. Crewed simulators of the type described earlier and SAF systems may be nodes linked in a distributed simulation. Distributed simulation adds implementation complexity to the nodes, but it has benefits, including scalability (larger scenarios can be accommodated by adding more nodes to the network), specialization (simulation nodes optimized for a specific purpose can be combined to produce a complete simulation), and geographic distribution (the nodes need not all be at the same location).

In a distributed simulation the networked nodes report the attributes (for example, location) and actions (for example, firing a weapon) of interest regarding their simulated entities by exchanging network messages. A network protocol defines the format of the messages, the conditions under which specific messages should be sent, and the proper processing for a received message. Several standard distributed simulation network protocols exist, including distributed interactive simulation (Institute of Electrical and Electronics Engineers, 1995), high level architecture (Dahmann, Kuhl, & Weatherly, 1998), and the Test and Training Enabling Architecture (TENA) (U.S. Department of Defense, 2002).
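To make those three roles of a protocol concrete (message format, send conditions, and receive processing), here is a schematic C++ fragment. The structure, field choices, and threshold are invented for illustration; they are not the actual DIS, HLA, or TENA formats.

    #include <cstdint>

    // Message format: what one node tells the others about an entity.
    struct EntityStateMsg {
        std::uint32_t entityId;
        float px, py, pz;   // position in shared world coordinates (m)
        float vx, vy, vz;   // velocity (m/s)
    };

    // A node's local picture of a remote entity.
    struct RemoteEntity { float px, py, pz, vx, vy, vz; };

    // Receive processing: snap the local estimate to the reported truth.
    void onEntityState(RemoteEntity& e, const EntityStateMsg& m) {
        e.px = m.px; e.py = m.py; e.pz = m.pz;
        e.vx = m.vx; e.vy = m.vy; e.vz = m.vz;
    }

    // Between messages, each frame: dead reckon, extrapolating from the
    // last known position and velocity (as described in Chapter 8).
    void deadReckon(RemoteEntity& e, float dtSeconds) {
        e.px += e.vx * dtSeconds;
        e.py += e.vy * dtSeconds;
        e.pz += e.vz * dtSeconds;
    }

    // Send condition (illustrative): report an entity's state when the
    // truth drifts more than a threshold from what others would reckon.
    bool shouldSend(const RemoteEntity& reckoned,
                    float truePx, float truePy, float truePz) {
        const float dx = truePx - reckoned.px;
        const float dy = truePy - reckoned.py;
        const float dz = truePz - reckoned.pz;
        return dx * dx + dy * dy + dz * dz > 1.0f;  // > 1 m (assumed)
    }

Emitting updates only when the dead-reckoned estimate has drifted is one way a protocol can conserve the limited bandwidth mentioned in the DIS discussion.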
SAF Purpose and Advantages in Training

In a military training application, the virtual environment system is intended to provide a simulated battlefield in which training scenarios are executed. In such a battlefield, the trainees need an opposing force against which to train. One method of providing opponents is to have two groups of trainees in simulators fight each other. This method is sometimes used, and the trainees may enjoy the competitive aspects of the arrangement, but it increases the number of simulators needed at a training site and requires that, to train any given military unit, a second unit be available to provide the opposition. It also can mean that the trainees are faced with opponents who employ the same tactical doctrine as they do, which is not likely to be the case in combat. A second method is to use human instructors who are trained to behave according to the desired enemy doctrine. This method does not reduce the need for simulators and is manpower intensive. Nevertheless, it is sometimes used, especially in live simulation (such as the dedicated opposing force at the U.S. Army's National Training Center located at Fort Irwin, California). The third method is to use a simulation node that generates and controls multiple simulation entities using software, possibly supported by a human operator. Such nodes are known as semi-automated forces or computer generated force (CGF) systems.

SAF systems can lower the cost of a virtual environment training system by reducing the number of crewed simulators and the number of humans required to operate the system for a given scenario size and by generating large numbers of computer-controlled entities. A SAF system can be programmed to behave according to the tactical doctrine of any desired opposing force and so
eliminate the need to train and retrain human operators to use tactics appropriate to different enemies. In addition to providing opponents, SAF systems can also generate friendly forces, allowing a group of trainees to practice cooperation with a larger friendly force. Because a single person can control a SAF system more easily than an opposing force made up of many human operators, a SAF system may also give the training instructor greater control over the training experience.
SAF SYSTEM CHARACTERISTICS
This section first outlines some of the common characteristics of SAF systems. It then focuses on methods for generating and validating SAF behavior.
Components and Capabilities
Certain characteristics are common to all existing SAF systems and are essentially inherent in the context in which those systems are used. Several important ones are discussed in the following:
Network connection and protocol. When SAF systems are part of a distributed simulation, they need a network connection and interface software to send and receive network messages in compliance with the network protocol standard.
Battlefield phenomenology models. The SAF-controlled entities exist in a battlefield that is a simulated subset of the real world, so the physical events and phenomena on the battlefield must be modeled within the SAF system. For example, if a SAF-controlled vehicle is moving, its acceleration, deceleration, turn rates, and maneuverability on different terrain types must be modeled. Combat interactions need to be modeled in accordance with the physics of weapon and armor performance characteristics.
Support for multiple entities. SAF systems are typically able to simulate multiple entities simultaneously. The SAF system's software architecture must provide a means to allocate processing time so that all of its controlled entities have their actions and behavior generated frequently enough to keep pace with the overall simulation.
Autonomous behavior generation. SAF systems use behavior generation algorithms to react autonomously to the simulation situation or to carry out orders given by an operator. This characteristic of SAF systems is the primary topic of this chapter and will be discussed in more detail later.
Operator interface. In addition to the autonomous behavior, most SAF systems provide an operator interface that allows a human operator to control the SAF entities. Figure 9.1 shows an example of a typical SAF system operator interface, from the One Semi-Automated Forces (OneSAF) system (to be discussed later). The operator may provide high level plans that are executed in detail by the SAF system, initiate and control behavior in situations that are beyond the SAF system's capabilities, or override autonomously generated behavior. SAF system interfaces typically provide a map display of the battlefield that shows the battlefield terrain and the simulated entities on it, together with a human command interface.
Figure 9.1. Example SAF Operator Interface (OneSAF)
Behavior Generation
The actions of SAF-controlled entities within the virtual environment have two aspects, physics and behavior. The physics aspect was mentioned earlier. The movements and interactions of SAF-controlled entities are under the control of the physical models in the SAF system, which produce a level of physical realism appropriate to SAF applications. The second aspect is behavior. Here the question is not how the SAF-controlled entities execute their actions, but what actions they execute. SAF entities often represent either humans or vehicles with human crews and, as such, must act in ways that not only comply with the laws of physics, but make sense in terms of human behavior and tactical doctrine. Generating realistic behavior has been challenging due to the relative complexity of human behavior and the long-standing difficulty of encoding it in an algorithmic form suitable for computer execution. Generating behavior by simply having humans fully control the SAF entities via an operator interface is not satisfactory, as the goal of the SAF system is to make the behavior generation as autonomous as possible. Autonomous behavior generation for SAF entities requires that the patterns and rules of behavior be encoded in the algorithms of the SAF system. The types of algorithms used in SAF systems for behavior generation can, broadly speaking, be grouped into two categories, here termed cognitive modeling and behavior emulation. Cognitive modeling approaches to behavior
generation in SAF systems begin with the assumption that generating realistic human behavior is best done by modeling human cognition, or at least those portions of it pertinent to the behavior. Cognitive modelers assert that, to varying degrees, the computation that occurs within their systems is, in fact, modeling human cognition. Several broad cognitive modeling frameworks and architectures, each based on a particular theoretical model of cognition and intended to support a wide range of human behavior generation, have been developed and have seen use in multiple applications. Noteworthy general cognitive modeling examples include ACT-R (adaptive control of thought–rational) (Anderson & Lebiere, 1998), EPIC (executive process/interactive control) (Meyer & Kieras, 1997), SOAR (Laird, Newell, & Rosenbloom, 1987), and COGNET (cognition as a network of tasks) (Zachary, Ryder, Weiland, & Ross, 1992). More specialized aspects of SAF behavior have also been implemented using cognitive models, such as tactical air combat (Nielsen, Smoot, Martinez, & Dennison, 2001) and commander decision making (Sokolowski, 2003). As with cognitive modeling, behavior emulation approaches to SAF behavior generation are also intended to produce realistic human behavior. However, in contrast to cognitive modeling, in behavior emulation there is no claim or intent that the algorithms used to produce behavior model human cognition. The goal of behavior emulation is solely to generate usefully realistic behavior for the SAF entities, without regard to whether the algorithmic processes used to generate the behavior correspond in any way to human cognitive processes. Although considerable progress has been made in cognitive modeling, behavior emulation is currently more common than cognitive modeling in production SAF systems. Reasons for this include the reuse of legacy behavior emulation approaches to reduce development costs and the comparatively high computational expense of some cognitive modeling methods in real time SAF systems. Because of its prevalence in SAF systems, behavior emulation will be the focus here. The most widely used approach to behavior generation in production SAF systems has been finite state machines (FSMs). The FSM approach falls into the behavior emulation category, as FSMs do not appear to be and are not claimed to be models of human cognition. (The FSM approach makes use of ideas from formal automata theory, but behavior generation FSMs do not have all of the mathematical properties of theoretical finite state automata.) Over the last two decades variations of FSMs have been used to generate human behavior in a number of SAF and non-SAF systems (Maruichi, Uchiki, & Tokoro, 1987; Petty, Moshell, & Hughes, 1988; Smith & Petty, 1992; Calder, Smith, Courtemanche, Mar, & Ceranowicz, 1993; Aronson, 1994; Ahmad, Cremer, Kearney, Willemsen, & Hansen, 1994; Moore, Gieb, & Reich, 1995; Ourston, Blanchard, Chandler, Loh, & Marshall, 1995). The repeated use of FSMs suggests their intuitive appeal and effectiveness. The common idea among FSM implementations is that a simulation entity’s behavior is decomposed into a finite set of behavior patterns, or states, with identifiable and discrete conditions for transitioning between the states. An entity’s
controlling FSM is always assumed to be in one of its states. Associated with each state is an implementation of that state's behavior pattern in the underlying programming language, for example, a C function or a Java method. While the FSM is in a state, the entity's behavior is generated by executing that state's associated implementation; thus the current state of the FSM determines what behavior is being generated for the entity. Conditions that depend on events or attributes in the simulation are associated with transitions from one state to another in the FSM. When a transition condition is true, the FSM changes state, thereby changing the entity's behavior. In addition to complex predicates, the transition conditions may also be null, allowing a transition to occur as soon as the first state has executed, or simple time delays, to produce realistically timed changes in behavior. Figure 9.2 is an example FSM taken from an early research SAF system (Smith & Petty, 1992), which is examined here to illustrate the technique. It controls the behavior of an infantry fireteam in the process of using an antitank guided missile. When the fireteam is given permission to fire such missiles (perhaps via the operator interface or by some other FSM), the FSM is started. The FSM's start state, di_open_file_atgm, automatically transitions after 0.25 seconds to the di_await_atgm_target state, where target acquisition and selection are performed. That state repeats every second until a suitable target is found. When this occurs, di_await_atgm_target first starts another FSM, face_target, which causes the fireteam to face the intended target.
Figure 9.2. Example SAF Finite State Machine
It then transitions to the di_stop_and_kneel state, which brings the fireteam to a halt (it may have been moving while watching for a target) and then transitions to the next state, di_fire_atgm, after a delay corresponding to the missile setup time. Assuming the target is still visible, that state launches a missile by starting another FSM, fire_missile, which generates the missile launch flash, controls the missile in flight, and handles the missile's impact at the end of the flight. The di_fire_atgm state waits for the fire_missile FSM to report that the missile flight is finished, whereupon the di_reload_atgm state is started. That state reloads the fireteam's antitank missile (if sufficient munitions are available), and after a realistic time delay transitions to the di_await_atgm_target state for another cycle. The behavior described as being performed by a state is generated by the programming language code associated with the state. The FSM mechanism serves to partition a complex multistep behavior into simpler parts, to associate programming language code with each part, and to control the execution of that code and thus generate the behavior.
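As a rough illustration of this mechanism, the sketch below renders the fireteam FSM in Python. The state names follow Figure 9.2, but everything else is assumed for illustration: the entity interface (acquire_target, halt, start_fsm, reload_atgm), the helper FSM classes FaceTargetFsm and FireMissileFsm, and the delay values are not the original system's implementation. A SAF executive would tick many such FSMs, rescheduling each one by the delay its current state returns, which is how one workstation keeps many entities behaving in step with the simulation.

MISSILE_SETUP_TIME = 6.0   # illustrative delays, in seconds
RELOAD_TIME = 10.0

class FireteamAtgmFsm:
    """Controls an infantry fireteam employing antitank guided missiles."""

    def __init__(self, entity):
        self.entity = entity
        self.target = None
        self.missile_fsm = None
        self.state = self.di_open_file_atgm   # the current state is a method

    def tick(self):
        # Execute the current state's code; every state returns the next
        # state and the delay before this FSM should execute again.
        self.state, delay = self.state()
        return delay

    def di_open_file_atgm(self):
        # Start state: no action, just a timed transition.
        return self.di_await_atgm_target, 0.25

    def di_await_atgm_target(self):
        self.target = self.entity.acquire_target()
        if self.target is None:
            return self.di_await_atgm_target, 1.0   # repeat every second
        self.entity.start_fsm(FaceTargetFsm(self.entity, self.target))
        return self.di_stop_and_kneel, 0.0

    def di_stop_and_kneel(self):
        self.entity.halt()
        return self.di_fire_atgm, MISSILE_SETUP_TIME

    def di_fire_atgm(self):
        if self.missile_fsm is None:
            # Launch by starting the helper FSM that flies the missile.
            self.missile_fsm = FireMissileFsm(self.entity, self.target)
            self.entity.start_fsm(self.missile_fsm)
        if not self.missile_fsm.finished:
            return self.di_fire_atgm, 0.5           # wait for flight to end
        self.missile_fsm = None
        return self.di_reload_atgm, 0.0

    def di_reload_atgm(self):
        self.entity.reload_atgm()   # assumed to check remaining munitions
        return self.di_await_atgm_target, RELOAD_TIME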
Behavior Realism Requirements
For the benefits of a SAF system to be realized, the SAF entities must behave in a usefully realistic manner. In general, interacting with a SAF will produce positive training only if the behavior generated for the SAF entities is physically realistic, behaviorally realistic, and doctrinally consistent. The training benefit of the virtual environment is reduced or lost if its physical realism does not adequately conform to trainees' experiences in the real world. Vehicles should operate according to their performance characteristics, and terrain must be considered when determining whether two entities have a line of sight to each other. Current SAF systems are generally sufficiently physically realistic. Behavioral realism is more difficult. SAF-controlled entities must react to a given situation in a manner similar to the (real) entities that are being simulated. Because the simulated entities are often controlled by humans, the SAF behavior must appear to be similar to, and thus as intelligent as, human behavior in each situation. Fortunately for SAF developers, the context of their use often makes the SAF system's intelligent behavior requirement less difficult than the general artificial intelligence problem. For example, in a training simulation emphasizing vehicle combat, the repertoire of behaviors of a tank crew is much smaller than that of general human behavior, so intelligent behavior by a SAF-controlled tank is easier to generate than general intelligent behavior. Even so, generating intelligent behavior is still challenging, especially in a real time simulation that precludes the use of powerful but slow-executing artificial intelligence techniques. SAF behavior must be doctrinally consistent in the sense that the actions of the SAF-controlled entities should be consistent with the doctrine of the entities the SAF is simulating. For example, SAF entities purportedly part of the armed forces of a particular nation should maneuver and perform according to that nation's tactical doctrine. This goes beyond simple believability; a goal of military training
systems is to allow trainees to practice against opponents that use the tactics of the expected adversary. With the increasing prevalence of irregular and asymmetric opponents with no fixed tactical doctrine, the meaning of doctrinal consistency is more difficult to define, but it remains important to present trainees with SAF opponents that exhibit behavior like that which they will encounter in actual operations.
Behavior Validation
Simulation validation is the process of determining whether or not the results produced by a model are consistent with the actual phenomenon or process being simulated (Balci, 1998). Validation of SAF systems must include validation of the behavior generated for the SAF-controlled entities. Validating a model of human behavior is inherently problematic, as human behavior is complex and subtle, variable in almost all situations, and generally understandable only in the context of a time sequence of actions. Nevertheless, efforts have been made to develop methods to validate SAF behavior (Petty, 1995). Models of specific aspects of SAF behavior sometimes prove amenable to quantitative or statistical validation methods; examples include operational decision making (Sokolowski, 2003) and reconnaissance route planning (Van Brackle, Petty, Gouge, & Hull, 1993). However, validating the full repertoire of SAF behavior often depends on observation of the generated behavior by subject matter experts who make qualitative assessments of the behavior based on their experience and expertise, a process termed face validation. Face validation can be unstructured, with experts simply observing the SAF behavior in a typical scenario, or highly structured, with multiple preplanned scenario vignettes designed to stimulate the SAF system in specific ways and elicit expected behaviors in response. A special form of face validation applied to SAF behavior validation is the Turing test. First proposed as a test of intelligence for computer systems (Turing, 1950), for SAF validation the Turing test is often formulated this way: Can observers of entities in a simulated battlefield reliably determine whether any given entity is controlled by humans or by a SAF system? The Turing test, in both its original and SAF forms, is purely operational in that it deliberately ignores the question of how the SAF behavior is generated; it is interested only in the quality of the generated behavior. Many SAF developers believe that the Turing test is a useful SAF validation method (Wise, Miller, & Ceranowicz, 1991; Petty, 1994). Others argue that the Turing test cannot be relied upon as the sole means of evaluating a human behavior generation model, giving examples that demonstrate that, while it can be useful, it is neither necessary nor sufficient to ensure the validity of SAF behavior on its own (Petty, 1994).
EXAMPLE SAF SYSTEMS
This section describes two existing SAF systems, chosen because of their wide usage, their importance in military training applications, and the extensive effort
and resources devoted to their development. There are many other SAF systems of various types in addition to these.
ModSAF
At one time ModSAF (Modular Semi-Automated Forces) was arguably the most important SAF system. During its period of use it was widely distributed, supported a diverse range of production and research applications, and was extensively modified and enhanced (Ceranowicz, 1994). ModSAF was the intellectual descendant of earlier SAF systems, and many ideas developed for ModSAF persist in newer SAF systems (including OneSAF, to be discussed later). ModSAF could generate many different entity types, including fixed and rotary wing aircraft, tanks, infantry fighting vehicles, other vehicles, and groups of dismounted infantry, as well as platoon-, company-, and battalion-sized units. ModSAF included an operator interface component allowing a human operator to direct the entities, and a simulation component that simulated the individual entities, military units, and environmental processes. The latter performed both physical simulation (for example, vehicle dynamics and weapons effects) and behavioral simulation (for example, route planning and mission execution). ModSAF was a real time, time-stepped simulation with a variable update rate that depended on the computational load (Ceranowicz, 1994). ModSAF was designed to have a modular software architecture. ModSAF modules were intended to constitute a repository of useful capabilities that could be used in different ways in different SAF systems and could easily be replaced by developers devising new ways to provide SAF functionality. ModSAF included a variety of entity simulation modules in several categories (dynamics models, turret models, weapons models, sensor models, and damage models), which could be combined for a new entity via parameter file specifications. If a new entity type could not be assembled from the existing modules, new ones could be developed. ModSAF relied on a human operator for two functions: to set up preplanned missions for ModSAF entities and units and to provide supervisory control of the behavior of the simulated entities during simulation runs. The operator performed those functions using a map of the virtual battlefield that showed the terrain and the entities (similar to Figure 9.1) and allowed the operator to create movement routes, military control measures, and battle positions. The operator could create preplanned missions for ModSAF entities that were divided into a number of phases; for each phase the operator defined the tasks a unit was to perform and the criteria for a transition to the next phase. The operator could also give commands for immediate execution by entities and units. Such intervention might have been necessary when ModSAF's automated behavior was not handling a situation correctly or when a scenario called for a specific event that had to be arranged by the operator (Ceranowicz, Coffin, Smith, Gonzalez, & Ladd, 1994). The basic building block of the ModSAF behavior generation mechanism was the task, which was a single nondecomposable behavior performed by an entity
or unit (Calder et al., 1993). Tasks were implemented within ModSAF as finite state machines. A ModSAF FSM represented a task as a set of states that each encoded a component action of the task, a set of transition conditions that determined and caused transitions between the states, and a set of inputs and outputs for the task. Fairly complex behaviors were implemented as ModSAF FSMs; some interesting examples included near-term movement control (Smith, 1994) and finding cover and concealment (Longtin, 1994).
OneSAF
OneSAF is the U.S. Army's newest constructive battlefield simulation and SAF system. (Here the term "OneSAF" is used for brevity; it refers to the OneSAF Objective System.) It is the result of an extensive development effort, including extended preparatory experimentation with models and implementation techniques using an enhanced version of ModSAF known as the OneSAF Testbed. OneSAF is intended to replace a number of legacy entity based simulations, to serve a range of applications including analysis of alternatives, doctrine development, system design, logistics analysis, team and individual training, and mission rehearsal, and to be interoperable in live, virtual, and constructive simulation environments (Parsons, 2007). OneSAF's capabilities incorporate the best features of previous SAF systems (Henderson & Rodriquez, 2002) and include such advanced features as aspects of the contemporary operating environment (Parsons, Surdu, & Jordan, 2005), multiresolution terrain databases with high resolution buildings, and command and control systems interoperability (Parsons, 2007). OneSAF has been developed using modern software engineering practices and has a product line architecture that allows the software components of OneSAF to be reusable in different configurations for different applications (Courtemanche & Wittman, 2002). The behavior generation mechanism in OneSAF combines tested concepts that have appeared multiple times in various forms in SAF research (primitive and composite behaviors) with the latest modeling approaches (agent based modeling) (Henderson & Rodriquez, 2002). As with ModSAF, behavior generation in OneSAF is behavior emulation, rather than cognitive modeling. The basic level of behavior representation in OneSAF is primitive behaviors, which are units of "doctrinal functionality" (Parsons, 2007) executable by OneSAF actors (in OneSAF, actor is a generic term for an entity or unit). Primitive behaviors typically consider perceptions about the simulated world and invoke actions in that world (Henderson & Rodriquez, 2002). They are denoted as primitive because they are implemented directly as programming language code and not further decomposed into sub-behaviors. They are not necessarily primitive in terms of behavioral complexity; they may be relatively simple (for example, UseWeapon) or relatively complex (for example, RequestAndCoordinateFireSupport). The OneSAF primitive behaviors constitute a repertoire of behaviors available for execution by OneSAF entities and units and for assembly into composite behaviors (defined later; Tran, Karr, & Knospe, 2004).
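As a rough sketch of this layering (illustrative only; OneSAF's actual behavior framework, names, and interfaces are considerably richer), primitive behaviors can be pictured as directly coded units, and the composite behaviors described next as code that sequences and branches among them:

def use_weapon(actor, target):
    # Primitive behavior: implemented directly as programming language code.
    actor.fire_at(target)

def request_fire_support(actor, target):
    # Primitive behavior: invokes an action in the simulated world.
    actor.radio.call_for_fire(target.position)

def engage(actor, target):
    # Composite behavior: combines primitives with the kind of branching
    # logic a OneSAF behavior flowchart would express graphically.
    if actor.in_range(target):
        use_weapon(actor, target)
    else:
        request_fire_support(actor, target)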
Composite behaviors are behaviors formed by combining other behaviors and may comprise both primitive behaviors and other composite behaviors. Composite behaviors can be assembled hierarchically, with lower level behaviors composed into higher level behaviors, and those behaviors in turn composed into still higher level behaviors. The behavior levels may correspond to military echelon levels (platoon, company, and battalion). OneSAF includes a behavior composition tool that provides a graphical editing environment in which behaviors can be composed. Using this tool, primitive behaviors and composite behaviors can be assembled into sequential and parallel execution threads controlled by branches and loops that may test predicates (conditions) in the state of the executing actor or the simulation (Tran, Karr, & Knospe, 2004). To provide familiarity, the behavior composition tool's graphical notation has both a visual appearance and a semantic content similar to the flowcharting notation widely used in program design since the 1960s (Henderson & Rodriquez, 2002). When a composite behavior is executed, the behaviors that make up the composite behavior are executed in the order defined by the connections and logic in the composite behavior's flowchart.
RESEARCH DIRECTIONS
An early examination of the state of the art of SAF systems identified 10 areas in which further research was needed (Fishwick, Petty, & Mullally, 1991). Four of those areas have been resolved, at least to the extent that they are no longer issues in SAF development. Two others have proven to be less important in SAF systems than formerly thought. Four remain open; they are the following:
Behavior representation language. A long-standing goal has been a language (textual or graphical) in which SAF behaviors could be expressed in a form usable by subject matter experts, as opposed to software developers. Although there have been several initiatives in this direction, there is a fundamental difficulty: any language powerful enough to express the full range of desired behaviors seems to become as complex as a programming language, thus making it inaccessible to nonprogrammers. The OneSAF behavior composition process and tools are attempting to address this.
Automated planning. Planning for SAF entities still depends, ultimately, on the SAF operator. Here planning means organizing the high level activities of the overall SAF force, such as an operation plan for a brigade, not controlling the low level actions of individual SAF entities. SAF systems are currently much better at the latter than the former. SAF researchers and developers continue to strive for greater autonomy of SAF entities through automated planning.
Autonomous agent modeling. Autonomous agents in general interact with the environment and behave under the control of internal plans and algorithms; SAF entities certainly share these characteristics. Ideas from autonomous agent research have seen increasing application in SAF development, including in OneSAF.
Validation. As described earlier, validation of SAF behavior is an important issue and remains an open research area. Broadly applicable quantitative methods are needed to take SAF validation beyond variations of face validation.
Two new areas of important research have emerged since the earlier list:
Scalability. Increases in the computational power of computer workstations that run SAF systems have resulted in concomitant increases in the number of SAF entities that can be generated by a typical workstation, but the increase in entities has not been proportional to the increase in computational power. This is primarily because the expectations of SAF users for increased fidelity in SAF physical models and increased sophistication in SAF behaviors have grown, consuming much of the increased computation power (Franceschini, Petty, Schricker, Franceschini, & McCulley, 1999). Generating a single SAF entity today requires considerably more computational power than it once did. Consequently, there is a continuing need to find ways to generate larger numbers of SAF entities using standard workstations. Moreover, there is a parallel desire to allow operators to control a larger number of entities; this depends both on improvements in operator interface design and on increased levels of automated planning, as mentioned earlier.
Asymmetric entities and tactics. SAF systems have traditionally been focused on generating behavior that accurately reproduced the tactical doctrine of the expected military foe. In contrast to the Cold War era, current real world adversaries increasingly do not have a fixed and formal tactical doctrine, instead behaving in ways described as asymmetric and exhibiting continuous adaptation to friendly tactics. Implementing such behavior in SAF systems will be even more challenging than implementing fixed doctrinal behavior. There are some asymmetric entities and tactics in OneSAF (Parsons, Surdu, & Jordan, 2005), but more research is needed.
CONCLUDING COMMENTS
Semi-automated forces systems are an essential component of virtual environment training systems, generating both opposing and friendly entities that populate the virtual world and provide important training stimuli. Much progress has been made in their implementation, and excellent examples exist. However, SAF systems in general, and the generation of autonomous behavior in particular, are likely to be the subject of research for some time. Increased entity generation capacity, improved behavioral realism, broader behavior repertoires, and expanded use of cognitive modeling for behavior generation are all areas where additional work would be useful. Although this chapter has focused on training applications, SAF systems can also be used for nontraining purposes; for example, they may provide both friendly and opposing forces in an analysis application requiring many runs of a scenario to support statistical analysis, an application poorly suited for human control of simulation entities via crewed simulators.
REFERENCES
Ahmad, O., Cremer, J., Kearney, J., Willemsen, P., & Hansen, S. (1994). Hierarchical, concurrent state machines for behavior modeling and scenario control. Proceedings of the Fifth Annual Conference on AI, Simulation, and Planning in High Autonomy Systems (pp. 36–42). Los Alamitos, CA: IEEE Computer Society.
Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought. Mahwah, NJ: Erlbaum.
Aronson, J. (1994). The SimCore tactics representation and simulation language. Proceedings of the Fourth Conference on Computer Generated Forces and Behavioral Representation (pp. 187–193). Orlando, FL: UCF–Institute for Simulation and Training.
Balci, O. (1998). Verification, validation, and testing. In J. Banks (Ed.), Handbook of simulation: Principles, methodology, advances, applications, and practice (pp. 335–393). New York: Wiley.
Calder, R. B., Smith, J. E., Courtemanche, A. J., Mar, J. M. F., & Ceranowicz, A. Z. (1993). ModSAF behavior simulation and control. Proceedings of the Third Conference on Computer Generated Forces and Behavioral Representation (pp. 347–356). Orlando, FL: UCF–Institute for Simulation and Training.
Ceranowicz, A. (1994, May 4–6). ModSAF capabilities. Proceedings of the Fourth Conference on Computer Generated Forces and Behavioral Representation (pp. 3–8). Orlando, FL: UCF–Institute for Simulation and Training.
Ceranowicz, A., Coffin, D., Smith, J., Gonzalez, R., & Ladd, C. (1994, May 4–6). Operator control of behavior in ModSAF. Proceedings of the Fourth Conference on Computer Generated Forces and Behavioral Representation (pp. 9–16). Orlando, FL: UCF–Institute for Simulation and Training.
Courtemanche, A. J., & Wittman, R. L. (2002). OneSAF: A product line approach for a next-generation CGF. Proceedings of the Eleventh Conference on Computer Generated Forces and Behavioral Representation (pp. 349–361). Orlando, FL: UCF–Institute for Simulation and Training.
Dahmann, J. S., Kuhl, F., & Weatherly, R. (1998). Standards for simulation: As simple as possible but not simpler: The high level architecture for simulation. Simulation, 71(6), 378–387.
Fishwick, P. A., Petty, M. D., & Mullally, D. E. (1991). Key research directions in behavioral representation for Computer Generated Forces. Proceedings of the 2nd Behavioral Representation and Computer Generated Forces Symposium (pp. E1–E14). Orlando, FL: UCF–Institute for Simulation and Training.
Franceschini, R. W., Petty, M. D., Schricker, S. A., Franceschini, D. J., & McCulley, G. (1999). Measuring and improving CGF performance. Proceedings of the Eighth Conference on Computer Generated Forces and Behavioral Representation (pp. 9–15). Orlando, FL: UCF–Institute for Simulation and Training.
Henderson, C., & Rodriquez, A. (2002). Modeling in OneSAF. Proceedings of the Eleventh Conference on Computer Generated Forces and Behavioral Representation (pp. 337–347). Orlando, FL: UCF–Institute for Simulation and Training.
Institute of Electrical and Electronics Engineers. (1995). IEEE Standard for Distributed Interactive Simulation—Application Protocols (Standard 1278.1-1995). Piscataway, NJ: Author.
Laird, J. E., Newell, A., & Rosenbloom, P. S. (1987). SOAR: An architecture for general intelligence. Artificial Intelligence, 33, 1–64.
Longtin, M. J. (1994). Cover and concealment in ModSAF. Proceedings of the Fourth Conference on Computer Generated Forces and Behavioral Representation (pp. 239–247). Orlando, FL: UCF–Institute for Simulation and Training.
Maruichi, T., Uchiki, T., & Tokoro, M. (1987). Behavioral simulation based on knowledge objects. Proceedings of the European Conference on Object Oriented Programming (pp. 213–222). London: Springer-Verlag.
Meyer, D. E., & Kieras, D. E. (1997). A computational theory of executive control processes and human multiple-task performance: Part 1. Basic mechanisms. Psychological Review, 104, 3–65.
Moore, M. B., Gieb, C., & Reich, B. D. (1995). Planning for reactive behaviors in hide and seek. Proceedings of the Fifth Conference on Computer Generated Forces and Behavioral Representation (pp. 345–352). Orlando, FL: UCF–Institute for Simulation and Training.
Nielsen, P., Smoot, D., Martinez, R., & Dennison, J. D. (2001). Participation of TacAir-Soar in roadrunner and coyote exercises at Air Force Research Lab, Mesa, AZ. Proceedings of the Ninth Conference on Computer Generated Forces and Behavioral Representation (pp. 173–180). Orlando, FL: UCF–Institute for Simulation and Training.
Ourston, D., Blanchard, D., Chandler, E., Loh, E., & Marshall, H. (1995). From CIS to software. Proceedings of the Fifth Conference on Computer Generated Forces and Behavioral Representation (pp. 275–285). Orlando, FL: UCF–Institute for Simulation and Training.
Parsons, D. (2007, May). One Semi-Automated Forces (OneSAF). Paper presented at the DoD Modeling and Simulation Conference. Retrieved December 13, 2007, from www.onesaf.net
Parsons, D., Surdu, J., & Jordan, B. (2005, June). OneSAF: A next generation simulation modeling the contemporary operating environment. Paper presented at the 2005 European Simulation Interoperability Workshop, Toulouse, France.
Petty, M. D. (1994). The Turing test as an evaluation criterion for Computer Generated Forces. Proceedings of the Fourth Conference on Computer Generated Forces and Behavioral Representation (pp. 107–116). Orlando, FL: UCF–Institute for Simulation and Training.
Petty, M. D. (1995, December). Case studies in verification, validation, and accreditation for computer generated forces. Paper presented at the ITEA Modeling & Simulation: Today and Tomorrow Workshop, Las Cruces, NM.
Petty, M. D., Moshell, J. M., & Hughes, C. E. (1988). Tactical simulation in an object-oriented animated graphics environment. Simuletter, 19(2), 31–46.
Smith, J. (1994). Near-term movement control in ModSAF. Proceedings of the Fourth Conference on Computer Generated Forces and Behavioral Representation (pp. 249–260). Orlando, FL: UCF–Institute for Simulation and Training.
Smith, S. H., & Petty, M. D. (1992). Controlling autonomous behavior in real-time simulation. Proceedings of the Southeastern Simulation Conference 1992 (pp. 27–40). Pensacola, FL: Society for Computer Simulation.
Sokolowski, J. A. (2003). Enhanced decision modeling using multiagent system simulation. Simulation, 79(4), 232–242.
Tran, O., Karr, C., & Knospe, D. (2004, August). Behavior modeling. OneSAF Users Conference, Orlando, FL. Retrieved December 13, 2007, from www.onesaf.net
Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433–460.
U.S. Department of Defense. (2002, November 4). TENA—The Test and Training Enabling Architecture: Architecture reference document. Foundation Initiative 2010 (Version 2002, Review ed.). Retrieved April 25, 2008, from https://www.tena-sda.org/display/intro/Documentation
Van Brackle, D. R., Petty, M. D., Gouge, C. D., & Hull, R. D. (1993). Terrain reasoning for reconnaissance planning in polygonal terrain. Proceedings of the Third Conference on Computer Generated Forces and Behavioral Representation (pp. 285–305). Orlando, FL: UCF–Institute for Simulation and Training.
Wise, B. P., Miller, D., & Ceranowicz, A. Z. (1991). A framework for evaluating Computer Generated Forces. Proceedings of the 2nd Behavioral Representation and Computer Generated Forces Symposium (pp. H1–H7). Orlando, FL: UCF–Institute for Simulation and Training.
Zachary, W., Ryder, J., Weiland, M., & Ross, L. (1992). Intelligent human-computer interaction in real time, multi-tasking process control and monitoring systems. In M. Helander & M. Nagamachi (Eds.), Human factors in design for manufacturability (pp. 377–402). New York: Taylor and Francis.
Chapter 10
GAMES AND GAMING TECHNOLOGY FOR TRAINING
Perry McDowell
This chapter looks at games and gaming technologies and how they are being used for training. However, why include a chapter about this in a handbook on virtual environments for training? Doing so implies that, at a minimum, games are a form of virtual environment and are being used for training, or that people are using gaming technology to build virtual environments. Both of these are true, and recent advances in gaming, virtual environments (VEs), and training technologies are blurring the line between what is a game, a VE, or a training device. This loss of differentiation is good and provides each community with new opportunities to expand, but it can make it difficult to keep up. The coming together of these various disciplines is a very recent occurrence. In Stanney (2002), published just six years before this tome, not one of the 56 chapters contained the word "game" in the title. The closest chapter title to games is "Entertainment Applications of Virtual Environments," the final chapter before the conclusion. The Serious Games Summit, a workshop held just before the Game Developers Conference for those building games whose primary goal is something other than entertainment, began only in 2004. Likewise, in 2003, the Interservice/Industry Training, Simulation & Education Conference, the largest military training conference in the world, had three papers or tutorials with "game" in the title. In 2006, there were approximately 30, including special sessions and a "serious game" contest to determine the best training game. Although recent, this cross-pollination occurs in many areas and extends in many directions besides writings and conferences. For example, Nintendo's Wii remote is undoubtedly a piece of gaming hardware, but it is a clear descendant of the wand used in the earliest CAVE created by Carolina Cruz-Neira (Cruz-Neira, Sandin, DeFanti, Kenyon, & Hart, 1992), which, equally undoubtedly, is a VE. Likewise, many virtual environments are now using game engines, originally designed to create games, as the underlying software driving many parts of the virtual environment (Hashimoto, Ishida, & Sato, 2005). Additionally, in the past, when trainees were placed in an environment similar to that of their jobs,
the vast majority of training applications were huge simulators, generally mimicking aircraft, tanks, or ships. Now, these large simulations are being replaced with, or more often augmented by, smaller, game-like systems. Also, the training community is expanding the number and types of tasks that can be trained using some combination of game and VE technology. The most important point for the reader to take away from this chapter is this: games and gaming technologies can be exceptionally valuable tools for an educator or trainer to use, but they remain only that: tools. While some may claim that they will replace trainers (Prensky, 2000), most do not see this. Instead, games will become part of the whole training experience, included in curricula as just another method for trainers to help their students learn, such as readings, quizzes, lectures, and one-on-one tutoring. As Jenkins (2007) states, "Our goals are never to displace the teacher but rather to provide teachers with new resources for doing what they do best." One key element in figuring out how and when to use these tools in a curriculum is the cognitive task analysis. A cognitive task analysis takes an in-depth look at a task, attempts to break it down into distinct subtasks, and examines what is required for each. These can become quite long; for example, even the simple task of making toast looks complicated when broken down into subtasks and capabilities. These would include knowing how to adjust the toaster's settings, choosing the correct setting for the desired crispness, being able to see the bread and the toaster, understanding what lever to push down to start the toasting, and having the physical strength/dexterity to manipulate the lever. Although this chapter will not go into cognitive task analyses in depth, to learn more, see Kirwan and Ainsworth (1992). One note: although training and education are not the same things, games have the ability to do both. Space limitations prevent this chapter from going into the significant depth required to delineate the two; therefore, throughout this chapter both training and education will be combined under the term "training."
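Returning to the cognitive task analysis above: one simple way to record such a decomposition is as a tree of subtasks and required capabilities. This is only a sketch; real analyses capture far more, such as cues, required knowledge, and likely errors.

# Hypothetical encoding of the toast example as a nested structure.
make_toast = {
    "task": "make toast",
    "subtasks": [
        {"task": "adjust toaster settings",
         "requires": ["know the settings control", "choose desired crispness"]},
        {"task": "load and start the toaster",
         "requires": ["see the bread and toaster", "know which lever starts it",
                      "strength/dexterity to press the lever"]},
    ],
}

def leaf_requirements(node):
    # Collect every low level capability the overall task depends on,
    # which is the list a trainer would check a game against.
    reqs = list(node.get("requires", []))
    for sub in node.get("subtasks", []):
        reqs.extend(leaf_requirements(sub))
    return reqs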
METHODS TO CREATE TRAINING GAMES
There are several methods to create training games, but this chapter will cover the three that are the most common in the author's opinion: using commercial games, partnering with a game company to produce a game, and building (or having a game company build) a game from scratch for a particular training need.
Commercial Games
The simplest method to use gaming technology for training is to have students play a commercial off-the-shelf (COTS) game to learn. COTS games have proven very effective at teaching a wide variety of tasks. For example, many armies use Steel Beasts, originally released as a commercial game simulating tank warfare, to practice skills required to fight their modern tanks and other
mechanized vehicles (eSim Games, 2007). Similarly, many teachers use Sid Meier's Civilization series to teach their classes history (Squire, 2004). In these cases, COTS games can be very effective as learning aids. Steel Beasts won the 2006 Serious Games Challenge Award (eSim Games, 2007), while Squire (2004) shows that Civilization improved students' knowledge of history. Generally, the most important reason someone uses a COTS game for training rather than developing a game is the cost. Building a top-of-the-line commercial game from scratch can be a significant and costly undertaking; Halo 2, completed in 2004, credited over 190 people and had a budget of over $40 million, and the budgets of the top games continue to rise exponentially (Reimer, 2005). Rather than the trainers paying that cost, the game company bears it, and the trainers simply buy copies of the game for their trainees. So long as the game, as it was built, can train, it can be used. For example, in discussing the Australian Army's use of Steel Beasts for tank training, an Australian Department of Defense spokesperson quoted in Braue (2007) says, "The key for defense is getting value for money, while meeting the training need." If an existing game can produce learning in the student faster than the other available methods, it makes sense to use the game. However, using COTS games is not without risks and problems. Games produced to sell commercially are designed to do just that: sell commercially. When the game designer must decide whether to sacrifice realism to improve gameplay, user enjoyment, and therefore sales, the decision is easy: sales wins every time. Additionally, often the game must be significantly simpler than reality to make gameplay even possible; for example, if it took Steel Beasts' players as long to learn to control an M1A1 Abrams tank effectively as it does to teach a soldier to do so in reality (12–18 months), no one would bother to play the game. Similarly, while the Civilization games have been used in many classrooms, there are many who feel that they provide students with a poor, misleading, or one-sided educational experience. Whelchel (2007) contains a laundry list of the problems, two of which are listed here. The first is that the game applies Western "ages" to non-Western cultures that did not experience them, giving a false delineation to non-European cultures. The second is that the game rewards players based upon their Civilization's type of government, and "a capitalist democratic republic garners the most benefits, leading players to adopt an Amero-centric perception of the relative worth of political systems." These problems, among others, may cause someone playing the game to get an incorrect or biased view of world history. This does not mean that COTS games cannot be a valuable asset to teachers and trainers, especially those who are constrained by budget. Even though Steel Beasts greatly simplifies the requirements for driving and fighting a tank, it can be assumed that trained soldiers can already do that, or at least will receive that training via other methods. Therefore, the fact that the game abstracts away the difficulty of those tasks is immaterial, and it can be used to train soldiers in how to employ their tank as part of a larger troop. Likewise, the inadequacies listed above for Civilization might be serious shortcomings in a graduate level course on comparative political systems. However, they are so far beyond the scope of
a high school world history class that the game can be used effectively in that setting to demonstrate political decision making. The key to effective implementation of a COTS game is to perform a thorough evaluation of the game and determine which tasks, if any, it can be used to train. Ideally, this will be a complete cognitive task analysis of the entire task, with a mapping between the required subtasks and those that the game trains (a minimal version of this mapping is sketched at the end of this section). However, at a minimum it should consist of a trainer, with subject matter expert help if necessary, examining the overall task and the game and then making a determination of which subtasks the game trains. It is crucial to do this because it is highly unlikely that a commercial game, made without any input from the trainer, will completely and perfectly train all parts of the overall task. Therefore, the training curriculum will have to be modified to remove redundant training covered by the game and, more importantly, ensure that tasks not trained in the game are addressed somewhere else in the curriculum. Returning to Steel Beasts with a somewhat fanciful example, it would be ineffective to make a soldier's training consist entirely of that game. After his training, he might have a good concept of how to maneuver a platoon of tanks against an enemy, but when he sits in the driver's seat of a tank, he would not have any idea of how to start it. One other problem with COTS games as trainers is that there has to be an existing COTS game in the first place. For example, if a math teacher wants a game to teach calculus, he or she is unlikely to find a COTS game that would be effective; there just is not a market for the game to be developed in the first place. Likewise, the U.S. Navy is unlikely to find a COTS game to teach deck seamen to handle lines during mooring because such a commercial game would be unlikely to sell. Even though calculus, line handling, and a myriad of other tasks still need to be taught and would benefit from a game based approach, there will never be COTS games educators can use to teach them.
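At its simplest, the subtask mapping mentioned above reduces to a set comparison; the subtask names below are invented for illustration:

# Subtasks the cognitive task analysis says the overall task requires,
# and the subset a candidate COTS game was judged to train.
required = {"start engine", "drive tank", "gunnery", "platoon maneuver"}
game_trains = {"gunnery", "platoon maneuver"}

# Whatever the game does not cover must be trained elsewhere in the
# curriculum; whatever it does cover may be redundant training to remove.
uncovered = required - game_trains
print("train elsewhere:", sorted(uncovered))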
Partner with a Commercial Game Company
An option somewhere between using a COTS game and building one from scratch to train a particular application is to create a symbiotic relationship with a commercial game company to build a game. This arrangement, which we will call partnering, is one in which each group brings different resources to the collaboration and each group gets to use the resulting game; it is a slightly different arrangement from merely contracting with a commercial company. In a simple contracting arrangement, which will be discussed later, a contract specifies exactly what kind of game the trainers want the game company to produce. The game company builds the game, the trainers use it to train, and the game company does nothing else with it. In a partnership, both groups want to use the final product: the trainers to teach, the game company to sell commercially. Normally the trainers provide the subject matter experts and access to equipment and locations in the game, the company provides the game production experts, and they split the cost of game production. The trainers get to use the game without any
additional licensing costs, and the game company gets the profits from selling the game commercially. When it works correctly, this can be a very beneficial relationship for both parties. The trainers get a game designed specifically for their training needs for less than the cost of contracting it out to a company. The company gets the help of subject matter experts for free, it does not have to bear the entire cost of development, and it can use the fact that the game is being used for training in marketing. However, there are two major downsides to creating a trainer this way. The first is similar to a problem with COTS games: the task must be one that has a chance of being a commercially successful game. Because ground combat games are so popular, game companies line up for the chance to work with the U.S. Army or U.S. Marine Corps to produce a trainer for ground troops that they could resell. Likewise, flight simulators often sell well, so companies are willing to work with the U.S. Air Force to produce such a game. Unfortunately, many of the tasks required by the military, although crucial for combat success, are not likely to make games that are commercially viable. For example, logistics is critical to keep the military fighting, but Logistic Technician, a game where the player acts as a supply noncommissioned officer, is unlikely to be a commercial success. Therefore, even though the military needs to train supply soldiers and might want to create a game to train logisticians, it would be unlikely to find a partner. The other problem with these relationships is that each partner requires a different outcome from the partnership. The trainer wants a training application, while the game company wants a commercial hit, and to each, everything else (including the partner's desires) is secondary. This dichotomy can lead to problems when decisions have two options, one better for training, the other better for gameplay and enjoyment. There are literally thousands of these decisions in every game development. Quite often these conflicts are aggravated by a lack of communication or understanding between the groups. Trainers think that the obvious answer is the training-friendly option, while the gamers cannot conceive of making a decision that reduces the fun of the game. These differing goals often lead to games that are significantly better in one of these respects, but rarely to a game that is both an exceptional training tool and a commercial success. An example of this is the partnership between the U.S. Marine Corps and Destineer, a video game publisher, to create the Close Combat: First to Fight game, intended to be a drug reduction tool within the Corps. In the version intended for the Corps, the player would encounter a marine under the influence of drugs and have to deal with the consequences. The Marine Corps supplied funding and U.S. Marines to act as subject matter experts during the development of the game. However, it has not been used as a training tool by the Marine Corps due to problems with the techniques in the game. For example, to speak to someone in the game, the player must point his or her weapon at that person, which U.S. Marines are trained never to do unless they intend to shoot. Although a newer version is supposed to have fixed these problems, to date it has not been used as a trainer within the Marine Corps.
Build the Training Game from Scratch
Another option, and oftentimes the best in terms of the finished product meeting the training need, is to build a game from scratch. Other chapters in this volume will address how to do this in depth, so this chapter will be limited to describing some of the pros and cons to help determine whether this approach is appropriate. Additionally, this section will discuss some of the terms a trainer needs to understand in order to make this decision. The biggest advantage of building a training game purposely for a given training need is having one, and only one, question when determining whether to include a feature: Will this feature improve the trainee's ability to learn the material? Unlike a COTS game, which likely gave no consideration to this, or a partnership, where this question had to be balanced with the commercial company's desire to turn a profit, every aspect of the design can be centered on how it makes the learner's experience better. With only one goal, designers can produce a significantly more focused product. Additionally, there is a significant reduction in the amount of discussion to determine whether a feature is bad for learning but good for commercial success, or vice versa, which leads to faster production and a happier, more productive team. The biggest disadvantage of building a training game from scratch is that the trainer must bear all expenses alone. As mentioned earlier, the top games can cost $40 million to produce, and clearly a local school board cannot approve such a budget item for a game to teach math better. Even the military, with some of the biggest training budgets, cannot afford such costs unless the game is expected to produce a larger savings elsewhere in the budget or a huge increase in performance. While training games do not need to match the high quality graphics, sound, and artificial intelligence of COTS titles in order to be effective, that high quality comes included when using a COTS game or partnering with a commercial game producer. If one or both of these other options is available, the trainer needs to decide whether the improvement derived from having a game specifically designed to meet his or her training need is worth the additional investment.
GAME ENGINES
One of the most prevalent ways that the virtual environment and training communities have borrowed from the game industry is in the field of software. The game industry has always been very competitive, and as the cost of producing a game has risen into the tens of millions of dollars, the difference between a game that makes many times the initial investment and one that loses it is incredibly small. This has driven game engine builders to continually refine their engines to make them the fastest, the best looking, and the easiest to use. This competition has meant that each commercial game engine generally has had a life span of 2 to 3 years, so in the past 15 years there have been five to seven generations. Even by the impressive standards of the computer industry, few areas have achieved such
a rapid growth. Each generation greatly surpassed the previous in performance, features, and ease of use, and this constant refinement has produced some remarkable software. Of this software, the most prevalent are game engines, the underlying software that almost every game is built upon. As each game engine is different, sometimes radically, it is difficult to give a description that precisely defines game engines without excluding many. A simple, if imprecise, definition is that a game engine does the tasks that are required to make the game work but that the game programmer would prefer not to do or think about while creating the game. In more exact terms, a game engine abstracts much of the lower level implementation away from the game programmer. For example, when a player shoots a weapon or drives a vehicle, the programmer knows this must be networked to the other players, but does not want to have to worry about making this happen at the transmission control protocol/Internet protocol layer. Rather, he or she would like to make a simple function call and let the game engine worry about turning that call into the message transferred to all the other players. This is the case for almost all aspects of the game. In addition to networking, the core functionality of game engines generally includes a scene graph for rendering, physics (including collision detection and visibility determination), audio, memory management, threading, windowing, graphical user interfaces, and refereeing player interactions and determining the results. Additionally, most engines are now designed either to provide artificial intelligence automatically or to contain a framework that makes it easy for the programmer to produce whatever behavior is desired. See http://www.devmaster.net/engines/engine_details.php?id=25 for a more detailed description of the components of the Unreal 3 game engine, one of the most popular game engines on the market. Normally, each game engine is optimized for a specific genre of game, and often even specialized for a specific subgenre. For example, engines are customized for first-person shooter (FPS) games, role-playing games, or real time strategy games, and an engine for FPS games will likely be customized even further, for an FPS game that is conducted mainly inside, in a jungle environment, or in a desert setting. This allows the engine programmers to make incredible optimizations for a specific type of game, with the result that such a game runs exceptionally well. However, it also limits the breadth of games that can be created using the engine. At the same time, game engines are no longer evaluated just by the quality of the games built atop them. Increasingly, an engine is judged by how easy it makes it for the game team to produce that product. Because of the strict time constraints in the game industry, a game that plays acceptably and is produced on time is often significantly more profitable than the perfect game that ships late and is plagued by delays and cost overruns. As mentioned earlier, the difference between success and failure can be very small, and games are canceled in mid-development for being late, as well as for being poorly done. Therefore, the engine's ability to improve the production flow (often called the pipeline) has become as important as its ability to produce impressive graphics.
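To make the abstraction point above concrete, the sketch below shows what game code might look like against a hypothetical engine facade. The class and method names are invented for illustration; real engine APIs differ widely.

class Engine:
    # Hypothetical engine facade: each high level call hides the
    # rendering, audio, physics, and networking work behind it.
    def play_sound(self, name, at):
        ...  # mix and spatialize the audio

    def spawn_projectile(self, weapon, direction):
        # Run physics and effects, then broadcast a fire event to every
        # other node; no TCP/IP details appear in the game code itself.
        ...

def on_trigger_pulled(engine, player):
    # The game programmer works only at this level.
    engine.play_sound("cannon_fire", at=player.position)
    engine.spawn_projectile(player.weapon, player.aim_direction)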
NEEDED IMPROVEMENTS

While gaming technology offers great benefits for training systems now, several additional capabilities need to be created before games reach their full potential. These include improved technological capabilities, additional research into game based training, and some modifications to current methodologies and attitudes.

Needed Technological Advances

In order to fit games into curricula, one of the most important technical advances will be the ability for training games and learning management systems to communicate. A learning management system is an application, generally Web based, that leads the student through the curriculum; it typically contains a syllabus, class assignments, links to readings, quizzes, the student's grades in the class, and other similar material. Common examples of learning management systems are Blackboard, Moodle, and Meridian. Learning management systems have been indispensable parts of distance learning classes for some time and are now being used by most on-site training classes as well. Joining games and learning management systems is key to the future of game based training; more on this subject can be found in "Interfacing Interactive 3-D Simulations with Learning Systems" by Conkey and Smith (Volume 2, Section 2, Chapter 15).

Another technological improvement needed before game based training can reach its potential is the improvement of intelligent tutoring systems. As Dr. Jeff Wilkinson, Army Program Manager for the Institute for Creative Technologies, said during a session at the 2006 Serious Games Summit, "Experiential learning is not effective; guided experiential learning is effective." While learning can occur when students play randomly with the only feedback being failing at the game, giving a player the ability to examine his or her mistakes and learn from them is a much quicker and better way to deliver information and train. Unfortunately, this means that a trainer has to review the trainee's gameplay, either in real time or via a playback mechanism, and spend time going over it with him or her. It would be more efficient if a computer based system did this first, with the trainee approaching the trainer only if the computer based remediation did not completely explain his or her deficiencies. This is what intelligent tutoring systems are designed to do. These systems are more than simple lists of every mistake the player makes; instead, they are designed to analyze the mistakes and create a mental model of the player's knowledge. The tutoring system then determines the best method of remediation to eliminate the gap between the player's current knowledge level and that required for successful completion of the training objective. The simplest method of remediation is to have the player reread texts covering his or her mistakes, but remediation can be significantly more complex. Hypertext documents, with links that allow the student to drill down further into concepts not completely understood, are slightly more advanced, while natural language systems designed to hold conversations just as a human tutor would are among the most complex. This tutoring does not have to occur after a training session; just as a human tutor often does, an intelligent tutor might coach the user during the game to help the student overcome an obstacle so he or she can reach other areas.
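As a toy illustration of the "mental model" idea, the C++ sketch below keeps a per-skill mastery estimate, lowers it each time the player makes a related mistake, and selects a remediation for the weakest skill. The skill names, decay factor, and remediation table are all invented; real intelligent tutoring systems use far richer student models.

```cpp
#include <iostream>
#include <map>
#include <string>

// A minimal student model: an estimated mastery level (0.0-1.0) per skill.
class TutorModel {
    std::map<std::string, double> mastery_;
public:
    void observeMistake(const std::string& skill) {
        auto it = mastery_.try_emplace(skill, 1.0).first;  // start optimistic
        it->second *= 0.8;  // each related mistake lowers the estimate
    }
    // Remediation targets the largest gap between current and required mastery.
    std::string recommend(double required = 0.9) const {
        std::string weakest;
        double worstGap = 0.0;
        for (const auto& [skill, level] : mastery_) {
            if (required - level > worstGap) {
                worstGap = required - level;
                weakest = skill;
            }
        }
        if (weakest.empty()) return "no remediation needed";
        // Bigger gaps get heavier interventions, per the spectrum in the text.
        return (worstGap > 0.3 ? "guided tutorial on " : "review text covering ")
               + weakest;
    }
};

int main() {
    TutorModel tutor;
    tutor.observeMistake("map reading");
    tutor.observeMistake("map reading");
    tutor.observeMistake("radio procedure");
    std::cout << tutor.recommend() << "\n";  // flags the weakest skill
}
```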
Finally, the improvement that might do the most to increase the number of games used as effective training aids is to get the tools for creating them out of the hands of engineers and programmers and into the hands of trainers. Currently this means creating more advanced systems that do not require engineering expertise to produce games. There is significant interest in creating simple tools, often called scenario generators, that would allow trainers to create training games without engineers. This would significantly reduce the cost and time to produce the training and allow scenarios to be easily modified by trainers as circumstances change. The Defense Advanced Research Projects Agency's Real World program is designed to create such tools. In the somewhat distant future, it would be ideal if a system could examine a trainee's record on the learning management system, determine what knowledge the trainee still needed in order to reach all required competencies, and then generate game scenarios that teach and evaluate him or her.
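To make the idea of a scenario generator more concrete, the sketch below parses a plain text scenario description, the kind of artifact a trainer might edit directly, into a structure a game could consume. The format and field names are purely hypothetical.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// A trainer-editable scenario: each line is "key: value". The format stands in
// for whatever authoring convention a real scenario generator would define.
struct Scenario {
    std::string terrain, objective;
    int enemies = 0;
};

Scenario parseScenario(const std::string& text) {
    Scenario s;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        auto colon = line.find(": ");
        if (colon == std::string::npos) continue;  // skip malformed lines
        std::string key = line.substr(0, colon);
        std::string value = line.substr(colon + 2);
        if (key == "terrain") s.terrain = value;
        else if (key == "objective") s.objective = value;
        else if (key == "enemies") s.enemies = std::stoi(value);
    }
    return s;
}

int main() {
    // The trainer edits this text directly; no engineer is involved.
    Scenario s = parseScenario(
        "terrain: urban\n"
        "objective: clear the checkpoint\n"
        "enemies: 4\n");
    std::cout << "Generating a " << s.terrain << " map with " << s.enemies
              << " opponents; objective: " << s.objective << "\n";
}
```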
Needed Research

While there has lately been a great deal of research on using games for training, that research has largely centered on proving that games can provide training value. Significantly more research into a wide variety of subjects is needed before games can truly reach their potential as training tools. Additionally, much of the published research has consisted of discussions or examples in which researchers used a game in training. O'Neil, Wainess, and Baker (2005) surveyed more
than 4,000 articles published in the preceding 10 years, many of which appeared in fairly well respected journals, and found that only 19 were "empirical" studies. (Empirical was defined as having a control group and random assignment.) While there is much research indicating that games make effective training tools, there is very little proving it. Additionally, the overall finding of their paper was that learning depends on the quality of instructional design and instructional strategies, not on the media. Learning games based on solid instructional methods (design and strategies) made effective learning tools, while games designed with weak or poor instructional methods normally did not. While this result should not be surprising, it raises several basic questions that need to be addressed by serious, empirical research, such as the following:

• What are the best instructional methods to employ in games?
• Do the best methods vary by the type of task being taught?
• What are the best types of games to train? Or, more properly, what is the best type of game to train a given task?
• Is this the same for all types of learners?
• If games train effectively, do they train efficiently, that is, better, faster, and cheaper than other training methods?
• If so, is this true in all cases, or for which domains is it not true?
• Games are likely to be only one part of curricula. When is the best time to use a game?
a. At the beginning, because it will give context for why the material taught is important to the big picture, even though the trainee does not yet have the knowledge and skills to do well?
b. After some instruction, because trainees can practice what they have learned and reinforce it?
c. After all the instruction, when they should be able to complete the entire task?
d. Or throughout the course?
• Is there a checklist that can be created for building training games so that any subject matter expert can build a training game for his or her field, or is this an art form that requires specialized training?
• How big a role does fun play in the effectiveness of training games? (Pedants might ask, "What is fun?" That question is too disputed to answer here; for more information, see Koster, 2004, or Salen and Zimmerman, 2003.)
• Does fun merely serve as a method to get a trainee to spend more time on a subject, or is it significant in the effectiveness of the training itself?
• Studies indicate that fun does not matter in short-term retention, but is that true for long-term retention of material?
• If using intelligent tutoring systems, is it better to point out errors during the game or after the trainee finishes?
• If so, is this the case for all tasks and learners?
• Are games better predictors of real world performance than current methods, such as tests and oral interviews?
• If so, for which tasks and game types does this hold true?
• Do differences exist in the effectiveness of game based training for people in different groups, such as gender, age, educational level, socioeconomic background, and gaming experience?
• Similarly, does this hold true if games are used as predictors of performance?
Needed Modifications to Methodologies and Attitudes

Many believe that games hold the possibility of improving almost any curriculum; however, whether they can reach that potential depends more on overcoming attitudinal and commercial barriers than technological ones. There are several such barriers, but this chapter will cover only three: aversion to games among decision makers, the differences between game designers and educators, and the current commercial models of the gaming and simulation business.

The first of these is that certain people, especially those in positions of power or in the current training establishment, dislike using games to train. Normally this is not based upon research indicating that games are ineffective at training, but rather on something more visceral. At times, these objections can be addressed fairly simply, especially through education. For example, many teachers worry that using games for training is an attempt to replace them in the classroom the way robots replaced assembly-line workers. These fears can often be overcome by explaining that games are no more likely to replace teachers than television, movies, or the Internet did; they are merely another tool teachers can use to educate their students (Jenkins, 2007). Similarly, some people believe that training is very serious business and consider games frivolous. Most of these people have no objection to using computer based simulations for training, and they definitely agree that training events should be engaging, but using the common term for an engaging simulation, that is, "game," somehow implies to them that the trainees are partaking in child's play rather than in training. In this case, while education can work, often a simple name change suffices. The U.S. Marine Corps, for example, uses tactical decision-making simulations, not games, to train marines; that most of these are largely indistinguishable from first-person shooter games is immaterial.

The second of these problems is the dichotomy between game designers and educators. While having educators more involved in the development of games has great potential to expand their use in the classroom, it also has the potential to destroy the value of games as teaching tools. One of the main reasons games are so effective in training is that they engage the learner. However, that is not always the case with games designed by educators, as was true of many early computer learning games. As Henry Jenkins, co-founder of the Massachusetts Institute of Technology (MIT) Comparative Media Studies Program, wrote, "Most existing edutainment products combine the entertainment value of a bad lecture with the educational value of a bad game" (Jenkins,
2002). Much of the blame for this must be placed upon the fact that these games were mainly designed by educators who were not trained in the art of making entertaining games. Giving educators the tools to build games without giving them the capability to build engaging games will create a multitude of expensive failures, games that become just another boring part of school. The cure is to ensure that those creating the games understand both the educational side and the game design side, or at least respect their colleagues on the other side of the argument. Mark Oehlert, a researcher on game based training at the Defense Acquisition University, says,

We need to work a lot harder on bringing instructional system design closer together with game design. Currently, they are two entirely different schools of thought, but we must do better in creating a hybrid. It's not technology that will slow serious games from reaching their potential; we've got plenty of technology. The question will be "How can we make powerful learning moments with these technologies?" (McDowell, 2007)
The third barrier is far more troublesome and may be the most serious obstacle to the widespread adoption of training games: the business models for building them. Currently, there is no good model for creating training games. The game industry is generally built around creating only a few games, in the hope that one or two will hit it big and generate millions of dollars of revenue. The licensing of most of the biggest game engines reflects this; they can cost upwards of $500,000 per game, and most of the code outside the engine, as well as the art assets, is made for one game only. All these factors make it difficult to build many different games, but that is exactly what the training community needs, since there are so many diverse training needs. Another problem is that games are generally created, released, and then not maintained. As users update their computers and operating systems, game companies do nothing to ensure that older games will work on the new equipment. For commercial games this normally is not an issue, but trainers would prefer to replace their games only when there is a change to the material, not because they upgraded computers. Similarly, if the material changes, there is no way for the training command to update the game without going back to the company, which might be unwilling to make the changes cheaply, if at all. In order to drive these costs down, there needs to be a major shift in the business of making games. Potential solutions include open source products and increased reuse of code, models, and art assets. There are many ideas on how to do this, but there is currently not much momentum to make the required changes.
CONCLUSION

The most important consideration in designing a game based trainer is remembering that it is only one of the tools the educator has to train the student and
that the game must fit into the entire blended training solution. Similarly, it is critical for the trainers and subject matter experts to perform an analysis of the tasks to be trained to determine which can be trained using a game and how effectively the game trains them. This ensures that any tasks not trained effectively in the game are covered in another part of the curriculum. There are many ways to create games for training, and no one method is best in all cases. While COTS games are generally the cheapest option, in many cases there is no COTS game available, or at least not one that trains the task well. Partnering with a games company can reduce costs, but it may be impossible to find a company willing to partner for a specific training need, and, as always, it is difficult to build a single product with two goals in mind. While building a game from scratch ensures that it meets the training need, the costs can be prohibitive. The trainer who desires to use a game based solution as part of his or her training regimen needs to evaluate the training need, the trainees, and the available resources in order to decide upon the best method of creating the game. Games have proven to be effective training tools, and they are likely to be used more in the future. It is up to the people who design and implement them to build good games, so as to ensure that the momentum games currently have in moving from the fringe to the mainstream continues.

REFERENCES

Braue, D. (2007). Behind pretend enemy lines. Retrieved December 8, 2007, from Army-Technology Web site, http://www.army-technology.com/features/feature1082/

Cruz-Neira, C., Sandin, D. J., DeFanti, T. A., Kenyon, R. V., & Hart, J. C. (1992). The CAVE: Audio visual experience automatic virtual environment. Communications of the ACM, 35(6), 65–72.

DevMaster.net. (2008). Retrieved July 30, 2008, from http://www.devmaster.net/engines/engine_details.php?id=25

eSim Games Press Release. (2007). Retrieved December 8, 2007, from eSim Games Web site, http://www.esimgames.com/press_releases.htm

Hashimoto, N., Ishida, Y., & Sato, M. (2005). Game-engine based virtual environments for immersive projection display systems. Proceedings of 2005 IPT & EGVE Workshop, R. Blach and E. Kjems (Eds.).

Jenkins, H. (2002). Game theory: How should we teach kids Newtonian physics? Simple. Play computer games. [Electronic version]. MIT's Technology Review. Retrieved December 23, 2007, from http://www.technologyreview.com/Energy/12784/

Jenkins, H. (2007, March). From serious games to serious gaming (part six): Common threads. Confessions of an aca-fan: The official weblog of Henry Jenkins. Retrieved April 18, 2008, from http://www.henryjenkins.org/2007/11/from_serious_games_to_serious_7.html

Kirwan, B., & Ainsworth, L. K. (Eds.). (1992). A guide to task analysis. London: Taylor and Francis, Ltd.

Koster, R. (2004). A theory of fun. Scottsdale, AZ: Paraglyph Press.
McDowell, P. L. (2007). Serious games: Why today? Where tomorrow? MS&T, Issue 2/2007, 26–30.

O'Neil, H. F., Wainess, R., & Baker, E. L. (2005). Classification of learning outcomes: Evidence from the computer games literature. The Curriculum Journal, 16(4), 455–474.

Prensky, M. (2000). Digital game-based learning. New York: McGraw-Hill.

Reimer, J. (2005). Cross-platform game development and the next generation of consoles. Ars Technica, November 7, 2005 [Electronic version]. Retrieved April 18, 2008, from http://arstechnica.com/articles/paedia/hardware/crossplatform.ars/2

Salen, K., & Zimmerman, E. (2003). Rules of play: Game design fundamentals. Cambridge, MA: MIT Press.

Squire, K. D. (2004). Replaying history: Learning world history through playing Civilization® III. Doctoral dissertation, Indiana University.

Stanney, K. M. (Ed.). (2002). Handbook of virtual environments: Design, implementation, and applications. Mahwah, NJ: Lawrence Erlbaum.

Whelchel, A. (2007). Using Civilization® simulation video games in the world history classroom. World History Connected, 4(2). Retrieved December 8, 2007, from http://historycooperative.press.uiuc.edu/journals/whc/4.2/whelchel.html
Chapter 11
VIRTUAL ENVIRONMENT SICKNESS AND IMPLICATIONS FOR TRAINING

Julie Drexler, Robert Kennedy, and Linda Malone

Due to the maturity and flexibility of virtual environment (VE) technology, which provides compellingly realistic visual images and allows users to be exposed to scenarios that would be dangerous or impractical in the real environment, VE systems can provide a safe and highly cost-effective alternative to real world training. Considerable evidence also suggests that VE technology can enhance task performance in a training environment (Kenyon & Afenya, 1995; Magee, 1995; Witmer, Bailey, & Knerr, 1996). However, while VEs may offer such advantages as low cost training, numerous studies on the effects of exposure to different VE systems indicate that motion sickness–like symptoms are often experienced during or after exposure to the simulated environment (Kennedy et al., 2003).

Simulators, a specific type of VE typically used to simulate a flying or driving environment, present two-dimensional, computer-generated images on a fixed-screen display (for example, a cathode ray tube [CRT] or dome). The motion sickness–like symptoms associated with exposure to simulators, known as simulator sickness, have been a problem for over 40 years (Kennedy, Drexler, & Compton, 1997). In the first published report of simulator sickness, Miller and Goodson (1960) indicated that 78 percent of the flight students and instructors experienced some degree of sickness as a result of exposure to a military helicopter simulator. Since then, reports of simulator sickness have appeared for nearly all military simulators, including those of the U.S. Navy, Marine Corps, Army, Air Force (Crowley, 1987; Gower, Lilienthal, Kennedy, & Fowlkes, 1987; Kennedy, Lilienthal, Berbaum, Baltzley, & McCauley, 1989; Warner, Serfoss, Baruch, & Hubbard, 1993), and Coast Guard (Ungs, 1988), as well as for automobile and tank simulators (Curry, Artz, Cathey, Grant, & Greenberg, 2002; Lampton, Kraemer, Kolasinski, & Knerr, 1995; Lerman et al., 1993).

Another specific type of VE system, a virtual reality (VR) device, employs a visually coupled device worn by the user to present, typically, three-dimensional, computer-generated images. Motion sickness–like symptoms have also been increasingly reported by a significant proportion of VR users, particularly those
using helmet-mounted displays (HMDs; Hettinger, 2002; Kennedy, Jones, Lilienthal, & Harm, 1994; Pausch, Crea, & Conway, 1992; Regan & Price, 1994). In order to distinguish the symptoms of exposure to a VR system from simulator-induced symptoms, some authors have referred to the side effects of VR devices as virtual reality sickness or cybersickness (McCauley & Sharkey, 1992). Simulator sickness and cybersickness involve visually induced motion stimuli, as opposed to traditional forms of motion-induced sickness, which are caused by inertial motion. The symptoms that typically occur as a result of exposure to VEs include disorientation, nausea, dizziness, sweating, drowsiness, eyestrain, headache, loss of postural stability, and, though infrequent, vomiting; symptom severity can range from mild discomfort to debilitating illness (Drexler, Kennedy, & Compton, 2004; Kennedy, Fowlkes, & Lilienthal, 1993). While simulator studies have shown that simulator sickness exhibits more oculomotor-related symptoms than conventional motion sickness, VR research indicates that cybersickness exhibits more disorientation-related symptoms (Kennedy, Dunlap, Jones, & Stanney, 1996; Kennedy, Lane, Lilienthal, Berbaum, & Hettinger, 1992). Moreover, investigations of the motion sickness–like symptoms related to HMD based systems indicate that such systems produce more severe levels of sickness and affect a greater number of users than simulators (Kennedy et al., 1996). In a survey of simulator sickness in 10 different military flight simulators, approximately 10 to 60 percent of pilots reported some degree of sickness (Kennedy, Hettinger, & Lilienthal, 1990; Kennedy, Lilienthal, et al., 1989). In contrast, Kennedy, Jones, Stanney, Ritter, and Drexler (1996) found that the average level of sickness in their VR studies was not only significantly higher than the levels found in the flight simulators, but 85 to 95 percent of the participants reported experiencing symptoms.
FACTORS INFLUENCING SICKNESS IN VIRTUAL ENVIRONMENTS

Early military flight simulators, which first called attention to the sickness problem, had equipment limitations, such as visual distortions, excessive transport delays, and flickering images, that were considered to be the source of the discomfort experienced by users (Drexler et al., 2004). Simulator sickness was therefore initially thought to be due solely to the inadequacies of the equipment, and it was expected that equipment improvements would eliminate the problem (Kennedy, Jones, & Dunlap, 1996). However, as technological advances improved equipment fidelity and the visual scenes became more realistic, the incidence and severity of sickness actually increased (Kennedy et al., 2003; Kennedy & Lilienthal, 1994; Kennedy et al., 1990). Although the fundamental causes of motion sickness have not been completely identified, researchers have identified the following factors that influence the incidence and severity of VE sickness: characteristics of the individual user (for example, age, gender, exposure history, and current physiological state), exposure duration (that is, increased incidence and sickness severity are associated with increased duration), usage schedule (that is, repeated exposures generally
reduce sickness severity), and various equipment features (Kennedy, Berbaum, Dunlap, & Smith, 1995; Kennedy et al., 1997; Kennedy & Fowlkes, 1992; Kolasinski, 1995; McCauley, 1984; Stanney, Kennedy, & Drexler, 1997). The equipment, of course, creates the simulated environment, so of the major determiners of sickness, manipulation of VE equipment features provides the most direct, practical, and economical means of controlling it. Therefore, this chapter focuses primarily on the various system features that affect VE sickness.
Equipment Features

Specification of the equipment parameters that promote effective performance and realism, while avoiding or minimizing sickness, is critical for the design and use of VE systems (Drexler, 2006; Kennedy, Berbaum, & Smith, 1993). A number of design inadequacies or equipment limitations have been reported in the scientific literature as potential factors that contribute to sickness in VEs. In the following sections, the equipment features implicated as factors influencing sickness are presented and categorized according to the type of VE system in which they can be found. The common VE system features are presented first, followed by the features specific to HMD based systems (see Bolas and McDowall, Volume 2, Section 1, Chapter 2) and the features specific to projection based systems (see Towles, Johnson, and Fuchs, Volume 2, Section 1, Chapter 3). It is important to note that although see-through HMDs (designed for augmented reality applications) and desktop displays are VE systems, they are not included in this chapter, which focuses on more immersive VE systems.

Common VE Equipment Features

Individuals largely rely on their visual senses during exposure to a VE system, and, as such, the visual display provides the most salient and detailed information about the simulated environment (Durlach & Mavor, 1995; Wilson, 1997). The visual display not only provides "input" to the user; changes in the visual scene also represent the "output" of the user (Kennedy & Smith, 1996). However, VEs are very interactive and, as a result, the visual display systems engage "numerous oculomotor systems, and hence have the potential to produce motion sickness symptoms" (Ebenholtz, 1992, p. 303). The display characteristics that have been implicated as factors influencing sickness include the field of view, display resolution, viewing region, and temporal delays (refresh rate, update rate, and system latency).

Field of View and Display Resolution

Research has shown that wider fields of view (FOVs; that is, the horizontal and vertical angular dimensions of a visual display) provide better task performance (Bowman, Datey, Ryu, Farooq, & Vasnaik, 2002; Wilson, 1997). However, research on the effects of sickness related to FOV size indicates that, in general, wider FOV displays increase the incidence and intensity of sickness, particularly
symptoms of eyestrain, headache, and dizziness (DiZio & Lackner, 1992; Lawson, Graeber, Mead, & Muth, 2002; Lin, Duh, Parker, Abi-Rached, & Furness, 2002; Padmos & Milders, 1992; Pausch et al., 1992). FOV is usually a trade-off with resolution (that is, image quality), but poor resolution can cause strain on the visual system as the user tries to focus on the simulated image, resulting in symptoms such as eyestrain and headache (Pausch et al., 1992). In wider FOV displays the available pixels are spread out over more of the stimulated retinal area, which reduces display resolution (Bowman et al., 2002; Wilson, 1997). Accordingly, in simulators with computer-generated image display systems that have a fixed pixel capacity, high spatial resolution may be limited to a small FOV (Rinalducci, 1996). Conversely, narrower FOVs (that is, 40°–60° vertical by 60°–80° horizontal) with higher resolution can cause tunnel vision or increase disorientation effects (Bowman et al., 2002). Relatedly, Kennedy, Fowlkes, and Hettinger (1989) indicated that wide FOV displays can magnify the effects of any distortions in the visual display. Durlach and Mavor (1995) also noted that greater geometric image distortions occur in HMD displays with large FOVs because a greater degree of magnification is required to project the real world size image onto the small display screens. Other research related to FOV size has suggested that the incidence of sickness is influenced by the amount of vection, the illusion of self-motion in the absence of physical movement (Hettinger, Berbaum, Kennedy, Dunlap, & Nolan, 1990; Hettinger & Riccio, 1992; Lawson et al., 2002), or flicker produced by the display. Several researchers have reported that displays with a wide FOV provide a more compelling sensation of vection as well as better orientation within the simulated environment (Hettinger et al., 1990; Kennedy, Fowlkes, et al., 1989; Padmos & Milders, 1992; Pausch et al., 1992), but are also more likely to produce sickness symptoms (Hettinger, 2002; Hettinger & Riccio, 1992). Moreover, Durlach and Mavor (1995) reported that greater levels of motion sickness are produced when users make head movements in VE displays that induce vection. Sensitivity to flicker is greater in peripheral vision than in foveal (that is, central) vision (Boff & Lincoln, 1988b); thus, a wider FOV display increases the likelihood that the user will perceive flicker because more of the peripheral vision is stimulated (Pausch et al., 1992). Flicker is not only distracting to the VE user but can also induce symptoms of motion sickness, particularly those related to the visual system (La Viola, 2000).

Viewing Region

The viewing region of a display is the area in which the system user is able to maintain an image of the simulated scene (Padmos & Milders, 1992). The design eye point, also referred to as the design eye, is the point located in the geometric center of the viewing region, the optimal position from which the user can view the display (Pausch et al., 1992). Kennedy, Fowlkes, et al. (1989) explained that graphic displays such as those used in simulators provide an accurate visual representation only when they are viewed from the design eye (see also Kennedy, Berbaum, et al., 1987). Consequently, the visual image becomes increasingly distorted as
the eccentric distance from the design eye point increases (Padmos & Milders, 1992; Pausch et al., 1992), which can increase sickness (Kennedy, Fowlkes, et al., 1989). Optical distortion can also occur in HMD based systems when there is a discrepancy between the interpupillary distance of the user (discussed in a later section) and the optical centers of the HMD display screens (Wilson, 1996; Mon-Williams, Wann, & Rushton, 1993). Moreover, optical distortions are generally likely with HMD based systems because the lenses are imperfect (Wilson, 1996). Prismatic distortions from the lenses can occur if the individual is not looking through the center of the lenses, such as when the headset is not properly adjusted or while the user looks around the visual environment (Wilson, 1996). Relatedly, a high degree of optical magnification is required to transfer the simulated scene on the small HMD display screens into a real world size image on the retina, because the display screens are positioned about an inch in front of the eyes (that is, a fixed close viewing distance). Greater geometric image distortions occur as the degree of magnification increases, which can increase sickness (Durlach & Mavor, 1995).

Temporal Delays

VE systems are controlled by computers that must perform a large number of calculations in order to (1) generate the simulated visual imagery, (2) control the inertial or position tracking system, and (3) monitor and respond to the control inputs of the system user (Frank, Casali, & Wierwille, 1988). As the number of required calculations increases due to factors such as an increase in scene complexity, the temporal delay between a user's input to the system and subsequent changes in the system output, in terms of the visual display and motion-base, can also increase (Frank et al., 1988). Other factors that can affect computational and rendering speeds include wider FOV displays, higher image resolution, and visual scene changes that accommodate head movements (Durlach & Mavor, 1995). Moreover, Frank et al. (1988) asserted that separate computers with different update rates are often used for the visual and motion systems in simulators, which can exacerbate temporal delays and thereby make the visual-inertial delays asynchronous. While temporal delays can obviously affect the user's performance, temporal lags in VE systems also have the potential to contribute to sickness (Wilson, 1997). The factors that limit temporal resolution include display refresh rate, update rate, and system latency (Durlach & Mavor, 1995).

Refresh Rate. Refresh rate, or frame rate, is defined as the frequency with which an image is generated on the display (that is, the time required to update the visual image on the screen; Blade & Padgett, 2002). The interactive nature of VEs requires high frame rates, although the specific frame rate required in any particular situation depends on the type of environment simulated (Durlach & Mavor, 1995). The refresh rate can affect the quality of the displayed images, but it is also related to the perception of flicker (Durlach & Mavor, 1995; Wilson, 1996). Specifically, the refresh rate can interact with luminance (that is, the brightness or intensity of the light coming from the display) to produce flicker, which contributes to visual fatigue and sickness (Padmos & Milders, 1992; Pausch et al., 1992). For instance, higher luminance levels and higher contrast levels are known to increase flicker sensitivity, while slower refresh rates can promote flicker in the visual display (Boff & Lincoln, 1988a). Because of the interaction of refresh rate, luminance, and contrast, in order to suppress flicker the refresh rate must increase as luminance and contrast increase, or vice versa (Pausch et al., 1992). Durlach and Mavor (1995) asserted that the typical luminance level in HMD displays was sufficient to cause flicker at frame rates of 30 hertz (Hz) or less. Relatedly, La Viola (2000) suggested that perceived flicker could be eliminated in the fovea with a 30 Hz refresh rate, but a higher refresh rate was required to eliminate flicker in the periphery for large targets. Since sensitivity to flicker increases with larger FOVs, faster refresh rates (that is, 80 to 90 Hz) may also be required in FOVs larger than 70° in order to avoid flicker (Padmos & Milders, 1992). May and Badcock (2002) therefore suggested that with current display luminances, a frame rate of at least 120 Hz was required to avoid flicker (see also Bridgeman, 1995).
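Read as design guidance, the figures above collapse into a rough rule of thumb, sketched below in C++ purely for illustration: 30 Hz suffices only for foveal viewing of a dim display, FOVs beyond about 70° call for 80 to 90 Hz, and May and Badcock's conservative figure for bright displays is 120 Hz. These are cited guidelines, not hard limits.

```cpp
#include <iostream>

// Rough minimum refresh rate (Hz) to suppress perceived flicker, combining
// the guideline figures cited in the text. Illustrative only, not hard limits.
int minRefreshHz(double horizontalFovDeg, bool highLuminance) {
    if (highLuminance) return 120;           // May & Badcock (2002)
    if (horizontalFovDeg > 70.0) return 90;  // wide FOVs stimulate the
                                             // flicker-sensitive periphery
    return 30;                               // foveal viewing of a dim display
}

int main() {
    std::cout << minRefreshHz(60.0, false) << " Hz for a narrow, dim display\n"
              << minRefreshHz(126.0, false) << " Hz for a wide FOV display\n"
              << minRefreshHz(126.0, true) << " Hz for a bright, wide display\n";
}
```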
Update Rate. Update rate, the rate or frequency with which a new image is generated and shown on the visual display, is typically measured in frames per second (fps; Padmos & Milders, 1992). The update rate is determined by the power of the computer hardware (that is, the computational speed) and is inversely related to the complexity of the visual scene (Durlach & Mavor, 1995; Pausch et al., 1992; Wilson, 1996). In other words, there is a trade-off between display update rate and visual scene complexity, whereby faster update rates limit the level of visual complexity available (Wilson, 1997). For example, Wilson noted that a 30 fps update rate is a "comfortable" rate for the eye because it is similar to watching a video, but more detailed and complex applications can support only 10 to 20 fps. A low update rate can cause the images in the visual display to shake and can create contour distortions (Padmos & Milders, 1992), which can produce disorientation and other symptoms of motion sickness (May & Badcock, 2002). For example, Durlach and Mavor (1995) noted that update rates below 12 Hz can induce sickness. The authors therefore suggested a minimum update rate of 12 fps for HMD systems in order for the display motion to be perceived as smooth and to provide some realism in the visual dynamics (see also Wilson, 1996). In computer-generated imagery (CGI) simulator displays (discussed in a later section), the maximum update frequency also depends on the complexity of the visual scene (that is, the number of polygons to be processed) as well as the total number of pixels that can be processed each second (that is, the pixel fill rate; Padmos & Milders, 1992). The authors noted that 30 Hz would be a sufficient update frequency for many simulator applications, but higher update frequencies would be required to avoid shaking images when displayed objects move at faster angular speeds. However, Wilson (1996) indicated that update rate and system latency (discussed in the next section) are independent,
so even with a fast update rate there may still be lags in the system that can cause disorientation. System Latency. VEs are computer based systems, so computational limitations of the equipment can produce a temporal delay between operator input and subsequent changes to the visual display (Kennedy, Fowlkes, et al., 1989). In the scientific literature, various terms have been used for this type of delay including system lag/latency, system update rate, image delay, or transport delay. System latency is a combination of (1) the sampling time of the input controls, (2) the time to calculate a viewpoint change, and (3) the time between position change input from the host computer to the visual display system and rendering of the corresponding image (Padmos & Milders, 1992). A large degree of system latency can affect the user’s control of the simulated environment and can increase sickness (Padmos & Milders, 1992). Previous research in flight simulators has shown that when large system delays were present, pilots were unable to accurately predict the length of the delay, which caused them to base their current actions on a guess of the vehicle’s position following their previous control input (Pausch et al., 1992). The authors reported that this technique, sometimes referred to as “guess and lead the system,” usually failed and caused the pilot to overcompensate control of the vehicle, which produced oscillations. Consequently, abnormal accelerations caused by the operator-induced oscillations increased the potential for simulator sickness because very low frequency motion or visual distortions were produced as a result of the increased load on the computer (Kennedy, Berbaum, et al., 1995). Accordingly, system delays should be no more than 40 to 80 milliseconds (ms) in driving simulators and 100 to 150 ms in flight simulators (Padmos & Milders, 1992). In HMD based systems, system lag or latency is defined as the amount of time required to send a signal from the position tracker (discussed in the next section) and subsequent presentation of the image on the display (Wilson, 1996)—in other words, the time between when an individual moves within the environment and when the movement is reflected in the visual scene. System lag in HMD based systems is composed of the position tracker delay, the delay in sending the position information to the computer, and the delay in processing the information and creating the image (Wilson, 1996). System latencies of 100 ms or greater have been shown to induce sickness symptoms (Wilson, 1996). For example, DiZio and Lackner (1997) investigated the effects of system delay (that is, delay between head movements and updates to the visual scene) on motion sickness where system update delay (67, 100, 200, and 300 ms) and FOV (wide [126° × 72°] versus halving the linear dimension) were varied. The study found that significant motion sickness symptoms, including nausea, were induced in the shortest delay condition, and the severity of sickness increased monotonically with system delay. However, the results also showed that reducing the FOV reduced the effect of the update delay on sickness (that is, the severity of motion sickness was cut in half in the decreased FOV condition with a 200 ms system delay).
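Because overall latency is simply the sum of the stage delays in this decomposition, a designer can budget it directly. The sketch below, with invented component values, totals the three stages and compares the result against the guideline ceilings cited above for driving and flight simulators.

```cpp
#include <iostream>

// End-to-end latency as the sum of the three stages identified by Padmos and
// Milders (1992): input sampling, viewpoint calculation, and rendering.
struct LatencyBudget {
    double inputSamplingMs;
    double viewpointCalcMs;
    double renderMs;
    double total() const { return inputSamplingMs + viewpointCalcMs + renderMs; }
};

int main() {
    LatencyBudget b{8.0, 22.0, 33.0};  // invented example values, in ms
    double total = b.total();
    std::cout << "total system latency: " << total << " ms\n";
    // Guideline ceilings cited in the text.
    std::cout << "driving simulator budget (40-80 ms): "
              << (total <= 80.0 ? "within" : "exceeded") << "\n";
    std::cout << "flight simulator budget (100-150 ms): "
              << (total <= 150.0 ? "within" : "exceeded") << "\n";
}
```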
Features Specific to HMD Based Systems

The equipment features specific to HMD based systems that have been implicated as factors influencing sickness include the type of visual display, interpupillary distance, helmet weight, and the position tracker.

Visual Display Type

HMDs typically contain two liquid crystal displays with magnifying optics positioned in front of each eye (Rinalducci, 1996). The displays are either binocular or biocular. Binocular displays present a slightly different image to each eye with some degree of overlap (about 60°) that provides stereoscopic depth information (that is, cues for the distance of objects) similar to viewing objects in the real world (Mon-Williams & Wann, 1998; Wann & Mon-Williams, 2002). Conversely, biocular displays present identical images to each eye, so depth cues are not available (Mon-Williams & Wann, 1998). Because humans have two eyes with some degree of spacing between them, a slightly different image is seen by each eye when viewing an object under normal viewing conditions, which provides the ability to judge relative depth (that is, to see very small differences in depth; Rinalducci, 1996). Thus, when viewing a near object, our eyes turn inward together (that is, convergence) in order to see the object as a single entity, and the curvature of the lens changes to focus the image on the retina (that is, accommodation; Ebenholtz, 2001; May & Badcock, 2002). Furthermore, accommodation and convergence are cross-linked, so the eyes normally converge and accommodate for the same distance; accommodation produces convergence and vice versa (Mon-Williams & Wann, 1998). In a stereoscopic HMD, the display is positioned only about an inch away from the eyes, but the images presented on the screens can show objects positioned at different optical distances (for example, 10 feet [ft], 100 ft, and so forth; Wilson, 1996). Accommodation is therefore fixed to the distance of the display in order to focus the displayed images, whereas the degree of convergence changes relative to the distance of the virtual objects being viewed (Rinalducci, 1996; Wann & Mon-Williams, 2002). Consequently, the normal accommodation-convergence relationship is disrupted because there is a mismatch between the amount of convergence and accommodation needed to view the display, resulting in symptoms such as eyestrain or headache (Ebenholtz, 1992; Kennedy, Berbaum, et al., 1987).
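The size of this mismatch is easy to quantify: convergence demand for an object at distance d is roughly 2 · atan(IPD / 2d), while accommodation stays locked to the focal distance of the optics. The sketch below computes both demands for virtual objects at several distances, assuming a typical 63 mm IPD and, hypothetically, optics focused at 1 m.

```cpp
#include <cmath>
#include <iostream>

// Convergence demand (in degrees) for fixating an object at distanceM meters,
// given the interpupillary distance in meters: 2 * atan(IPD / (2 * d)).
double convergenceDeg(double ipdM, double distanceM) {
    const double pi = std::acos(-1.0);
    return 2.0 * std::atan(ipdM / (2.0 * distanceM)) * 180.0 / pi;
}

int main() {
    const double ipd = 0.063;         // typical adult IPD, 63 mm
    const double displayFocalM = 1.0; // assumed focal distance of the optics

    for (double objectM : {0.5, 3.0, 10.0}) {
        // The eyes converge for the virtual object's distance...
        double vergence = convergenceDeg(ipd, objectM);
        // ...but accommodation (in diopters, 1/meters) stays fixed at the
        // optics' focal distance, so the difference is the conflict.
        double conflictD = std::abs(1.0 / objectM - 1.0 / displayFocalM);
        std::cout << "object at " << objectM << " m: convergence " << vergence
                  << " deg, accommodation conflict " << conflictD << " D\n";
    }
}
```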
Several empirical studies have evaluated the effects of binocular and biocular system use on the visual system. Mon-Williams et al. (1993) examined the effects of using a binocular HMD on the visual system and found deficits in binocular vision after a relatively brief exposure (that is, 10 minutes). Participants also reported symptoms related to disturbances of the visual system, including blurred vision, eyestrain, headache, and difficulty focusing, and several participants also reported nausea. Rushton, Mon-Williams, and Wann (1994) hypothesized that the primary cause of the visual deficits found in the Mon-Williams et al. study was the conflict between the stereoscopic depth cues, image disparity, and focal depth (that is, the information that produced a conflict in accommodation and convergence). Therefore, they replicated the study using a biocular display and a larger sample size. Biocular displays present the same image to each eye, so there is no dissociation between convergence and accommodation (Wilson, 1996). In contrast to the Mon-Williams et al. study, no significant changes in binocular visual performance were found for exposure periods of up to 30 minutes. Additionally, compared to the sickness found in the previous study, only a few participants reported mild symptoms of visual strain. Mon-Williams and Wann (1998) later demonstrated that even during relatively short exposures (that is, 10 minutes) to a binocular HMD, a continual conflict between accommodation and convergence caused stress on the visual system. Study participants reported adverse visual symptoms (for example, eyestrain and headache) and measurable changes in visual functioning. Therefore, the authors concluded that the differences in effects on the visual system between binocular and biocular displays found in their previous studies were due to accommodation-convergence conflicts rather than to the stereoscopic depth information provided in binocular displays. Based on their findings, the investigators also expressed concern that the changes in participants' visual functioning from exposure to the HMD could affect subsequent performance on visually demanding tasks such as driving. Thus, stereoscopic systems may support better task performance, but they also increase the likelihood of visual side effects compared to biocular displays because of the inherent conflict between accommodation and convergence (Wann & Mon-Williams, 2002; Wilson, 1996).

Interpupillary Distance

Some HMDs provide the ability to adjust the lateral distance between the eyepieces (that is, the display screens) in order to accommodate differences in the interpupillary distance (IPD) of users, but others provide only a fixed distance between the optical centers of the display lenses (Mon-Williams et al., 1993). As mentioned previously, a discrepancy between the IPD and the optical centers of the display screens can create optical distortions in the visual imagery, which can produce stress on the visual system and increase sickness symptoms (Mon-Williams et al., 1993, 1995; Rushton et al., 1994; Wilson, 1996).

Helmet Weight

The weight of an HMD can vary from four ounces to more than five pounds (McCauley-Bell, 2002). However, changing the weight of the head alters the inertia of the head, which can be extremely provocative (Durlach & Mavor, 1995). Most HMDs are also coupled with a position tracking device that necessitates head movements in order to change the viewpoint of the simulated visual scene. DiZio and Lackner (1992) argued that the weight of an HMD creates sensorimotor rearrangements during head movements, which can increase sickness. They also noted that an HMD weighing 2.5 pounds increases the effective weight of the head by at least 20 percent. Similarly, Durlach and Mavor (1995) pointed out that wearing an HMD that increased the weight of the head by
50 percent can, in general, increase a person's susceptibility to motion sickness during exposure to angular acceleration. For instance, DiZio and Lackner (1992) discussed the results of a study in which participants were exposed to periodic angular accelerations and decelerations in a rotating chair. Motion sickness symptoms were more severe in participants wearing a weighted helmet during exposure than in those with no load on their heads.

Position Tracker

An important component of HMD based systems is the ability to detect and track the position and orientation of the user's head in order to identify where the individual is looking within the environment and make appropriate changes to the simulated scene (Durlach & Mavor, 1995; Wilson, 1996). The majority of HMDs are directly coupled to the motion of the user's head using a position tracking system (Durlach & Mavor, 1995). A position tracker, consisting of sensors mounted on the HMD, first determines the position and orientation of the user's head and then transfers the information to the processing computer, which generates and renders an image that corresponds to a viewpoint change in the simulated scene based on the user's head movements (Biocca, 1992; Wilson, 1996). The accuracy of position information provided by a head tracker can vary and, as a result, can influence the incidence of sickness symptoms (La Viola, 2000). For instance, a study by Bolas (as cited in Wilson, 1996) indicated that nausea was a consequence of "poorly tracked systems, with slow response and noise in the tracking system" (p. 43). Additionally, the stability of the information provided by some tracking devices can produce jitter and, thus, distortion in the visual image that can induce sickness symptoms (La Viola, 2000). Another temporal constraint of many VE systems is the lag associated with position tracking systems, which has been cited as the major factor contributing to update delays in HMD images (Durlach & Mavor, 1995). Delays between a tracker system acquiring position information and the viewpoint update on the screen can range from 10 to 250 ms for commonly used electromagnetic tracking systems (Draper, Viirre, Furness, & Gawron, 2001). Moreover, DiZio and Lackner (1992) asserted that temporal distortions in the visual display occur because "the visual displays and head tracking devices do not match human capabilities and graphics systems cannot keep up with rapid human movements" (p. 322). The latency of a position tracker is based on the time required to register the user's position or movement and send the information to the processor (Wilson, 1997). Once the signal is received by the processor, there is another delay in processing the position information and rendering the update in the visual scene (Wilson, 1997). If position tracker delays are present, the user may perceive a difference between what is represented within the visual scene and what he or she is doing in the real world (that is, a mismatch between head motion and the visual display), which can affect task performance as well as induce sickness symptoms, including nausea or dizziness (Allison, Harris, Jenkin, Jasiobedzka, & Zacher, 2001; Hettinger & Riccio, 1992). Moreover, position tracker delays can be especially
nauseogenic in wide FOV displays because larger head movements are needed to acquire targets in the peripheral field (Durlach & Mavor, 1995). Durlach and Mavor maintained that tracker-to-host computer rates must be at least 30 Hz, because delays between head motion and visual feedback of as little as 60 ms may induce sickness, and they argued that position trackers should not contribute more than 10 ms to overall system latency. A study by Draper et al. (2001), however, provided an exception to the general findings reported in the literature. In their experiment, two time delays (125 ms and 250 ms) were created using a delay buffer between the head tracker and the image processing computer. Their findings revealed that sickness symptoms were induced by exposure to the HMD system, but, contrary to the investigators' hypothesis, there was no significant effect of time delay on sickness.

Features Specific to Projection Based Systems

The equipment features specific to projection based systems that have been implicated as factors influencing sickness include CGI displays, collimation, platform type, motion frequency, and temporal lag.

CGI Displays

Many simulators employ multiple CRT visual displays using computer-generated imagery (Kennedy, Fowlkes, et al., 1989). However, misalignment of the CGI optical channels can cause distortion in visual images because the design eye from which all CGI channels could be viewed simultaneously is eliminated (Kennedy, Berbaum, et al., 1987; Kennedy & Fowlkes, 1992). Therefore, the same optical distortions that occur when users move their heads outside of the design eye (see the previous discussion on viewing region) can be created, increasing sickness. Additionally, if the focus of the CGI channels differs, different accommodative distances would be required to view a scene that was imaged at infinity (Kennedy, Berbaum, et al., 1987). The authors indicated that the consequence of these repeated changes in accommodation can be symptoms such as eyestrain or headache and noted that the incidence and severity of eyestrain was higher in simulators with CGI displays than in those with dome displays. Moreover, Kennedy, Berbaum, et al. (1987) argued that the number of CGI optical channels was generally proportional to the number of symptoms reported.

Collimation

Collimation relates to the parallel alignment of the light rays emitted by the visual display, which places the image at optical infinity (Padmos & Milders, 1992). Collimated images are typically used to increase realism in the simulated environment by creating an illusion of depth in two-dimensional images. In simulators, collimated images from more than one image channel (that is, display) are often seamlessly combined using concave mirrors (Padmos & Milders, 1992). Kennedy (1996) explained that an improperly collimated system can produce negative convergence and accommodation that can contribute to simulator
sickness, especially symptoms associated with disturbances of the visual system (for example, eyestrain, headache, and so forth).

Platform Type

Simulator platforms are either fixed- or motion-base. In a fixed-base simulator, information regarding motion is provided solely by the visual display system, whereas motion-base simulators provide a subset of the inertial forces that would be present during real movement in the vehicle being simulated (DiZio & Lackner, 1992; Durlach & Mavor, 1995). Specifically, a motion-base simulator can provide motion cues compatible with initial but not sustained acceleration, using two types of inertial cues: acceleration and tilt (Kennedy, Berbaum, et al., 1987). McCauley and Sharkey (1992) indicated that the hydraulic motion-base typically used on simulators provides six axes of movement with ±35° of angular displacement and two meters of linear displacement. Although motion-base systems are extremely expensive, they are used in specific applications (for example, flight simulators) to enhance the sense of motion provided by the visual display (Durlach & Mavor, 1995). However, visual movement through a simulated environment that is not accompanied by the normal inertial cues (that is, forces and accelerations) associated with movement through the real environment can induce sickness, particularly nausea (May & Badcock, 2002; McCauley & Sharkey, 1992). Consequently, the overall incidence of sickness is typically lower in simulators with a motion-base than in those with a fixed-base (McCauley, 1984). Kennedy, Berbaum, et al. (1987) suggested that one reason for the lower incidence of sickness was differences in pilot head movements during exposure. The authors explained that in a motion-base simulator, pilots' head movements were similar to those in the actual vehicle, whereas the head movements in fixed-base simulators were often in conflict with the inertial stimulus, which increased the provocativeness of the simulation. There have, however, been a few reports that contradict the general finding of a difference in sickness incidence between fixed- and motion-base simulators (McCauley & Sharkey, 1992).

Motion Frequency

A strong relationship between sickness incidence and exposure to very low frequency whole-body vibration has been found in a variety of provocative motion environments, including ships at sea, planes, automobiles, trains, and motion-base simulators (Guignard & McCauley, 1990). Research has indicated that the most nauseogenic frequency of motion is centered around 0.2 Hz; the lower limit for nauseogenic motion is approximately 0.1 Hz, and a decline in acceleration-induced sickness also occurs at frequencies above 0.2 Hz (Guignard & McCauley, 1990). It is generally agreed that sickness incidence in motion-base simulators depends on the frequency and acceleration characteristics of the motion produced by the platform (Kennedy et al., 1990). Specifically, the incidence and severity of sickness is usually greatest when the energy spectrum from the motion-base is in the very low frequency range around 0.2 Hz (Kennedy et al.,
Temporal Lag

Inaccuracies in motion cueing created by temporal delays between the control inputs of the simulator user and subsequent changes in the visual display, motion-base, or both have been implicated as a contributing factor in the incidence of simulator sickness (Kennedy & Fowlkes, 1992; Kennedy et al., 1990; McCauley, 1984). For example, Frank et al. (1988) evaluated visual-motion coupling delays and cuing order in a driving simulator using different combinations of transport delays (0, 170, or 340 ms) in the visual system, the motion system, or both. Their results showed that zero delay in either system was the most desirable condition, whereas delays in the visual or motion system increased participants' overall severity of sickness. Visual delays, however, affected sickness incidence more than motion system delays, and when asynchronous delays occurred between the visual and motion systems, sickness was greater when the motion system led the visual system. Consistent with this, Padmos and Milders (1992) indicated that the visual imaging system should not lag the inertial system. The general recommendation for reducing the potential for sickness due to cue asynchrony is to limit the delay between any two system cues to no more than 35 ms (Lilienthal, as cited in Pausch et al., 1992). Kennedy, Berbaum, et al. (1987) also recommended that lag in the motion-base should not exceed 83 to 125 ms and that there should be no more than 40 ms of asynchrony between visual and inertial cues.
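The timing guidelines collected above (a 10 ms tracker budget, no more than 35 ms between any two cues, motion-base lag of at most 83 to 125 ms, and no more than 40 ms of visual-inertial asynchrony) are concrete enough to automate as a design-review check. The sketch below assumes per-channel latencies have already been measured; the dictionary keys and sample values are ours, not from the cited sources, and only the upper bound of the 83–125 ms range is checked.

```python
# Guideline values quoted above (Lilienthal, as cited in Pausch et al., 1992;
# Kennedy, Berbaum, et al., 1987). Component names and the sample
# measurements are illustrative assumptions.
GUIDELINES_MS = {
    "tracker_budget": 10,    # tracker share of end-to-end latency
    "any_two_cues": 35,      # max delay between any two system cues
    "visual_inertial": 40,   # max visual/inertial asynchrony
    "motion_base_lag": 125,  # upper bound of the 83-125 ms recommendation
}

def check_cue_timing(measured_ms):
    """Return human-readable violations of the guideline values."""
    problems = []
    if measured_ms["tracker"] > GUIDELINES_MS["tracker_budget"]:
        problems.append("tracker exceeds its 10 ms latency budget")
    if measured_ms["motion_base"] > GUIDELINES_MS["motion_base_lag"]:
        problems.append("motion-base lag exceeds 125 ms")
    asynchrony = abs(measured_ms["visual"] - measured_ms["inertial"])
    if asynchrony > GUIDELINES_MS["any_two_cues"]:
        problems.append(f"{asynchrony} ms cue asynchrony exceeds 35 ms")
    if asynchrony > GUIDELINES_MS["visual_inertial"]:
        problems.append(f"{asynchrony} ms visual-inertial gap exceeds 40 ms")
    return problems

print(check_cue_timing(
    {"tracker": 12, "visual": 60, "inertial": 110, "motion_base": 130}))
```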
IMPLICATIONS OF VE SICKNESS ON TRAINING

State-of-the-art, compellingly realistic VE systems exist today, but the pervasiveness of the deleterious side effects associated with exposure has the potential to limit the utilization of VE systems, particularly as training devices. Specifically, if humans are unable to function effectively in the VE, training objectives may be compromised, or the result could be negative transfer of training, which has the potential to affect subsequent performance on the real world task (Canaras, Gentner, & Schopper, 1995; Lathan, Tracey, Sebrechts, Clawson, & Higgins, 2002). McCauley (1984) pointed out that sickness symptoms can distract users and/or decrease their motivation during a simulation based training exercise and ultimately compromise the effectiveness of the training protocol (see Hettinger et al., 1990; Kennedy et al., 1990).
Users who experience symptoms during a simulation may also learn new behaviors (that is, coping mechanisms) such as minimizing head movements, relying solely on the instruments (that is, not looking at the visual displays), or avoiding aggressive maneuvers in order to avoid or reduce sickness symptoms (Baltzley, Kennedy, Berbaum, Lilienthal, & Gower, 1989; Hettinger et al., 1990; Kennedy et al., 1990; Kennedy, Lilienthal, et al., 1989). While these behaviors may be appropriate for the simulated task, however, they may not be appropriate for performing the corresponding real world tasks (Lathan et al., 2002; Pausch et al., 1992). Moreover, any negative transfer of training to the real world device could cause users to lose confidence in the training they receive from the simulator, resulting in decreased simulator usage (McCauley, 1984; Pausch et al., 1992). Similarly, once users experience sickness, they may be reluctant to return to the VE for subsequent training or, alternatively, may disengage some of the system features (for example, the motion-base) to reduce the potential for sickness (Crowley, 1987; McCauley, 1984). Individuals experiencing side effects may also be unwilling or unable to remain in the environment; consequently, a proportion of those exposed may prematurely cease their interaction with the VE system before training is complete. Furthermore, if the sickness problem is too severe and cannot be remedied, the device could be discarded, like the helicopter simulator reviewed in Miller and Goodson (1960). For the organization that owns the VE system, both of these situations have economic implications: money has been spent on equipment, either specific components or the entire system, that cannot be used.

The side effects of exposure to VE systems also have the potential to jeopardize the health and/or safety of users. One such threat is the persistence of symptoms (that is, aftereffects) for a prolonged period following termination of exposure to the system. Baltzley et al. (1989) investigated the time course of recovery from simulator sickness and found that for 75 percent of the pilots who experienced symptoms, the symptoms dissipated within one hour after simulator exposure. Of greater concern to user safety, however, was the authors' finding that 13 percent of all military pilots exposed to different simulators reported aftereffects that persisted more than four hours after exposure to the device; 8 percent experienced symptoms for six or more hours. Likewise, Stanney and Kennedy (1998) reported persistent aftereffects from exposure to an HMD based system; participants were still reporting significant levels of symptoms one hour after exposure to the device. Specifically, disorientation-type symptoms (for example, dizziness) were 95 times higher, gastrointestinal-related symptoms (for example, nausea) were 10 times higher, and visual disturbances (for example, eyestrain) were 7 times higher than pre-exposure levels. Unfortunately, the study was not designed to evaluate the time course of symptom recovery beyond the one hour post-exposure period. Extreme cases of prolonged VE aftereffects have also been reported. For example, Viirre and Ellisman (2003) reported that after a researcher used a desktop VE for 10 minutes, the user experienced postural instability for only a few minutes immediately after exposure; however, several hours later there was an onset of vertigo and nausea that persisted for four days.
Additional threats to user safety arise when the side effects of VE exposure appear after the user has left the VE facility. One potential safety hazard is delayed effects: a user is symptom-free during or immediately following exposure to a VE, but symptom onset occurs some time after stimulus exposure (Baltzley et al., 1989). For example, Miller and Goodson (1960) reported that while most of the individuals exposed to a helicopter simulator experienced sickness symptoms during the exposure, some users did not experience any symptoms until several hours after leaving the simulator. Of particular concern for users' safety was the authors' report of a flight instructor who was forced to stop his car and walk around in order to reduce the disorientation he was experiencing as a delayed effect of his earlier exposure to the simulator. Another threat to user safety is flashbacks, which occur when symptoms cease once exposure to a provocative stimulus is terminated, but symptom onset suddenly recurs later (Baltzley et al., 1989). McCauley (1984) cited a 1980 study by Kellogg et al. in which pilots reported visual flashbacks 8 to 10 hours after exposure to a fixed-base flight simulator. Similarly, Stanney and Kennedy (1998) found that approximately 31 percent of the participants in their study reported flashbacks following VR exposure.

In response to reports of prolonged and delayed aftereffects, the military instituted mandatory grounding policies in order to guard against the negative aftereffects that can occur subsequent to training in a flight simulator (Crowley, 1987; Kennedy et al., 1992). A simulator sickness field manual, developed by the U.S. Department of Defense and distributed to all military simulator sites, stated that flight personnel should be grounded (that is, flights should not be scheduled) for at least 24 hours after simulator exposure or 12 hours after simulator sickness symptoms have subsided, whichever is longer (Naval Training Systems Center, 1989). Obviously, restrictions on the post-simulator activities of flight personnel can affect operational readiness, but the military also recognized the potential risk to pilots as well as to the expensive equipment under their control (Kennedy et al., 1990). More recently, the Department of the Navy (2004) issued an update to the NATOPS (Naval Air Training and Operating Procedures Standardization) General Flight and Operating Instructions that included policy and procedural guidelines on simulator sickness. In addition to warnings about the occurrence of prolonged and delayed aftereffects, the aviation safety instructions mandated that (1) flight personnel experiencing simulator sickness abstain from flight duties on the day of simulator exposure and (2) flight personnel who have previously experienced simulator sickness not be scheduled for flight duty for at least 24 hours following exposure to a simulator.

Clearly, prolonged aftereffects, delayed effects, and flashbacks can present a significant threat to the afflicted user's activities for a considerable period following exposure. Kennedy and Stanney (1996) indicated that these types of long-term aftereffects occur in less than 10 percent of all flight simulator exposures.
An overall incidence rate for HMD based systems has not been reported, although long-term aftereffects data from one study showed that 35 percent of participants reported symptoms more than four hours after exposure and 17 percent reported symptoms the following morning (Stanney, Kennedy, & Kingdon, 2002). Kennedy and Stanney (1996) also suggested that, compared to flight simulators, the advanced technology in VR displays will produce "an even more serious level of impairment" (p. 61). Nevertheless, long-term aftereffects create the potential for legal liability on the part of VE designers, manufacturers, and system owners if an accident occurs as a result of VE exposure. It has been suggested that disorientation-type aftereffects such as dizziness have the greatest potential for causing personal injury (Baltzley et al., 1989). Disorientation, drowsiness, fatigue, and nausea, which are frequently reported following exposure to VE systems, can also affect an individual's ability to safely perform routine tasks such as walking, riding a bicycle, or operating a motorized vehicle (Kennedy, Kennedy, & Bartlett, 2002). If an accident occurs after the user is released from the VE facility and the cause can be associated with the aftereffects of VE exposure, the manufacturer or company that owns the VE device could be found legally liable and thus be required to pay compensation for damages (Kennedy et al., 2002). At a minimum, the manufacturer or company could face costly and time consuming litigation to defend a product liability claim. As VE technologies continue to develop, it is anticipated that VE systems will become less expensive and, thus, more widely accessible to diverse populations. The number of people who could experience adverse side effects will also increase, resulting in a greater risk of product liability claims. Kennedy et al. (2002) therefore emphasized the need for manufacturers and owners of VE systems to take proactive steps to minimize their legal liability and outlined a seven-step system safety approach that could be used to assess the potential risks associated with the aftereffects of VE exposure and circumvent product liability issues.

REFERENCES

Allison, R. S., Harris, L. R., Jenkin, M., Jasiobedzka, U., & Zacher, J. E. (2001). Tolerance of temporal delay in virtual environments. Proceedings of the IEEE Virtual Reality 2001 International Conference (pp. 247–254). New York: IEEE.
Baltzley, D. R., Kennedy, R. S., Berbaum, K. S., Lilienthal, M. G., & Gower, D. W. (1989). The time course of postflight simulator sickness symptoms. Aviation, Space, and Environmental Medicine, 60(11), 1043–1048.
Biocca, F. (1992). Will simulator sickness slow down the diffusion of virtual environment technology? Presence, 1(3), 334–343.
Blade, R. A., & Padgett, M. (2002). Virtual environments standards and terminology. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 15–27). Mahwah, NJ: Lawrence Erlbaum.
Boff, K. R., & Lincoln, J. E. (Eds.). (1988a). Flicker sensitivity: Effect of flicker frequency and luminance level. Engineering data compendium: Human perception and performance, Vol. 1 (pp. 170–171). Wright-Patterson Air Force Base, OH: Aerospace Medical Research Laboratory.
Boff, K. R., & Lincoln, J. E. (Eds.). (1988b). Flicker sensitivity: Effect of target size. Engineering data compendium: Human perception and performance, Vol. 1 (pp. 178–179). Wright-Patterson Air Force Base, OH: Aerospace Medical Research Laboratory.
Bowman, D. A., Datey, A., Ryu, Y. S., Farooq, U., & Vasnaik, O. (2002). Empirical comparison of human behavior and performance with different display devices for virtual environments. Proceedings of the Human Factors and Ergonomics Society 46th Annual Meeting (pp. 2134–2138). Santa Monica, CA: Human Factors and Ergonomics Society.
Bridgeman, B. (1995). Direction constancy in rapidly refreshed video displays. Journal of Vestibular Research, 5(6), 393–398.
Canaras, S. A., Gentner, F. C., & Schopper, A. W. (1995, July). Virtual reality (VR) training (Final Rep. No. CSERIAC-RA-95-009). Wright-Patterson Air Force Base, OH: Crew System Ergonomics Information Analysis Center.
Crowley, J. S. (1987). Simulator sickness: A problem for Army aviation. Aviation, Space, and Environmental Medicine, 58(4), 355–357.
Curry, R., Artz, B., Cathey, L., Grant, P., & Greenberg, J. (2002). Kennedy SSQ results: Fixed- vs. motion-base Ford simulators. Proceedings of the Driving Simulation Conference "DSC2002" (pp. 289–299).
Department of the Navy. (2004, March 1). NATOPS General Flight and Operating Instructions (OPNAVINST 3710.7T), Section 8.3.2.17: Simulator sickness (pp. 8–10). Washington, DC: Author.
DiZio, P., & Lackner, J. R. (1992). Spatial orientation, adaptation, and motion sickness in real and virtual environments. Presence, 1(3), 319–328.
DiZio, P., & Lackner, J. R. (1997). Circumventing side effects of immersive virtual environments. In M. J. Smith, G. Salvendy, & R. J. Koubek (Eds.), Design of computing systems: Social and ergonomic considerations (pp. 893–896). Amsterdam: Elsevier.
Draper, M. H., Viirre, E. S., Furness, T. A., & Gawron, V. J. (2001). Effects of image scale and system time delay on simulator sickness within head-coupled virtual environments. Human Factors, 43(1), 129–146.
Drexler, J. M. (2006). Identification of system design features that affect sickness in virtual environments. Unpublished doctoral dissertation, University of Central Florida, Orlando.
Drexler, J. M., Kennedy, R. S., & Compton, D. E. (2004, September). Comparison of sickness profiles from simulator and virtual environment devices: Implications of engineering features. Paper presented at the Driving Simulation Conference Europe "DSC 2004," Paris, France.
Durlach, N. I., & Mavor, A. S. (Eds.). (1995). Virtual reality: Scientific and technological challenges. Washington, DC: National Academy Press.
Ebenholtz, S. M. (1992). Motion sickness and oculomotor systems in virtual environments. Presence, 1(3), 302–305.
Ebenholtz, S. M. (2001). Oculomotor systems and perception. Cambridge, United Kingdom: Cambridge University Press.
Frank, L. H., Casali, J. H., & Wierwille, W. W. (1988). Effects of visual display and motion system delays on operator performance and uneasiness in a driving simulator. Human Factors, 30(2), 201–217.
Gower, D. W., Lilienthal, M. G., Kennedy, R. S., & Fowlkes, J. E. (1987, September). Simulator sickness in U.S. Army and Navy fixed- and rotary-wing flight simulators. In Conference Proceedings of the AGARD Medical Panel Symposium on Motion Cues in Flight Simulation and Simulator Induced Sickness (AGARD-CP-433; pp. 8.1–8.20). Neuilly-sur-Seine, France: Advisory Group for Aerospace Research and Development.
Guignard, J. C., & McCauley, M. E. (1990). The accelerative stimulus for motion sickness. In G. H. Crampton (Ed.), Motion and space sickness (pp. 123–152). Boca Raton, FL: CRC Press.
Hettinger, L. J. (2002). Illusory self-motion in virtual environments. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 471–491). Mahwah, NJ: Lawrence Erlbaum.
Hettinger, L. J., Berbaum, K. S., Kennedy, R. S., Dunlap, W. P., & Nolan, M. D. (1990). Vection and simulator sickness. Military Psychology, 2(3), 171–181.
Hettinger, L. J., & Riccio, G. E. (1992). Visually induced motion sickness in virtual environments. Presence, 1(3), 306–310.
Kennedy, R. S. (1996). Analysis of simulator sickness data (Technical Rep., Contract No. N61339-91-D-0004). Orlando, FL: Naval Air Warfare Center, Training Systems Division.
Kennedy, R. S., Allgood, G. O., Van Hoy, B. W., & Lilienthal, M. G. (1987, June). Motion sickness symptoms and postural changes following flights in motion-based flight trainers. Journal of Low Frequency Noise and Vibration, 6(4), 147–154.
Kennedy, R. S., Berbaum, K. S., Dunlap, W. P., & Smith, M. G. (1995, October). Correlating visual scene elements with simulator sickness incidence: Hardware and software development (Phase II Final Rep., Contract No. N00019-92-C-0157). Washington, DC: Naval Air Systems Command.
Kennedy, R. S., Berbaum, K. S., Lilienthal, M. G., Dunlap, W. P., Mulligan, B. E., & Funaro, J. F. (1987). Guidelines for alleviation of simulator sickness symptomatology (Final Rep. No. NAVTRASYSCEN TR-87-007). Orlando, FL: Naval Training Systems Center.
Kennedy, R. S., Berbaum, K. S., & Smith, M. G. (1993). Methods for correlating visual scene elements with simulator sickness incidence. Proceedings of the 37th Annual Meeting of the Human Factors Society (pp. 1252–1256). Santa Monica, CA: Human Factors and Ergonomics Society.
Kennedy, R. S., Drexler, J. M., & Compton, D. E. (1997). Simulator sickness and other aftereffects: Implications for the design of driving simulators. Proceedings of the Driving Simulation Conference (DSC'97; pp. 115–123). Paris, France: ETNA.
Kennedy, R. S., Drexler, J. M., Compton, D. E., Stanney, K. M., Lanham, D. S., & Harm, D. L. (2003). Configural scoring of simulator sickness, cybersickness and space adaptation syndrome: Similarities and differences. In L. J. Hettinger & M. W. Haas (Eds.), Virtual and adaptive environments: Applications, implications, and human performance (pp. 247–278). Mahwah, NJ: Lawrence Erlbaum.
Kennedy, R. S., Dunlap, W. P., Jones, M. B., & Stanney, K. M. (1996). Screening users of virtual reality systems for after-effects such as motion sickness and balance problems (Final Rep. No. NSF1-96-4). Arlington, VA: National Science Foundation.
Kennedy, R. S., & Fowlkes, J. E. (1992). Simulator sickness is polygenic and polysymptomatic: Implications for research. International Journal of Aviation Psychology, 2(1), 23–38.
Kennedy, R. S., Fowlkes, J. E., & Hettinger, L. J. (1989). Review of simulator sickness literature (Technical Rep. No. NTSC TR89-024). Orlando, FL: Naval Training Systems Center.
Kennedy, R. S., Fowlkes, J. E., & Lilienthal, M. G. (1993). Postural and performance changes following exposures to flight simulators. Aviation, Space, and Environmental Medicine, 64, 912–920.
Kennedy, R. S., Hettinger, L. J., & Lilienthal, M. G. (1990). Simulator sickness. In G. H. Crampton (Ed.), Motion and space sickness (pp. 317–341). Boca Raton, FL: CRC Press.
Kennedy, R. S., Jones, M. B., & Dunlap, W. P. (1996). A predictive model of simulator sickness: Applications for virtual reality [Abstract]. Aviation, Space, and Environmental Medicine, 67(7), 672.
Kennedy, R. S., Jones, M. B., Lilienthal, M. G., & Harm, D. L. (1994). Profile analysis of after-effects experienced during exposure to several virtual reality environments. In Conference Proceedings of the AGARD Medical Panel Symposium on Virtual Interface: Research & Applications (AGARD-CP-541; pp. 2.1–2.9). Neuilly-sur-Seine, France: Advisory Group for Aerospace Research and Development.
Kennedy, R. S., Jones, M. B., Stanney, K. M., Ritter, A., & Drexler, J. M. (1996). Human factors safety testing for virtual environment mission-operations training (Final Rep. No. NASA1-96-2). Houston, TX: NASA Johnson Space Center.
Kennedy, R. S., Kennedy, K. E., & Bartlett, K. M. (2002). Virtual environments and product liability. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 543–553). Mahwah, NJ: Lawrence Erlbaum.
Kennedy, R. S., Lane, N. E., Lilienthal, M. G., Berbaum, K. S., & Hettinger, L. J. (1992). Profile analysis of simulator sickness symptoms: Application to virtual environment systems. Presence, 1(3), 295–301.
Kennedy, R. S., & Lilienthal, M. G. (1994). Measurement and control of motion sickness aftereffects from immersion in virtual reality. Proceedings of Virtual Reality and Medicine, The Cutting Edge (pp. 111–119). New York: SIG-Advanced Applications, Inc.
Kennedy, R. S., Lilienthal, M. G., Berbaum, K. S., Baltzley, D. R., & McCauley, M. E. (1989). Simulator sickness in U.S. Navy flight simulators. Aviation, Space, and Environmental Medicine, 60, 10–16.
Kennedy, R. S., & Smith, M. G. (1996, November). A smart system to control stimulation for visually induced motion sickness (Phase II Final Rep. No. NAS9-19106). Houston, TX: NASA Lyndon B. Johnson Space Center.
Kennedy, R. S., & Stanney, K. M. (1996). Virtual reality systems and products liability. The Journal of Medicine and Virtual Reality, 1(2), 60–64.
Kenyon, R. V., & Afenya, M. B. (1995). Training in virtual and real environments. Annals of Biomedical Engineering, 23, 445–455.
Kolasinski, E. M. (1995, May). Simulator sickness in virtual environments (ARI Tech. Rep. No. 1027). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Lampton, D. R., Kraemer, R. E., Kolasinski, E. M., & Knerr, B. W. (1995, October). An investigation of simulator sickness in a tank driver trainer (ARI Rep. No. 1684). Orlando, FL: U.S. Army Research Institute for the Behavioral and Social Sciences.
Lathan, C. E., Tracey, M. R., Sebrechts, M. M., Clawson, D. M., & Higgins, G. A. (2002). Using virtual environments as training simulators: Measuring transfer. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 403–414). Mahwah, NJ: Lawrence Erlbaum.
LaViola, J. J., Jr. (2000). A discussion of cybersickness in virtual environments. SIGCHI Bulletin, 32(1), 47–56.
Lawson, B. D., Graeber, D. A., Mead, A. M., & Muth, E. R. (2002). Signs and symptoms of human syndromes associated with synthetic experiences. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 589–618). Mahwah, NJ: Lawrence Erlbaum.
Lerman, Y., Sadovsky, G., Goldberg, E., Kedem, R., Peritz, E., & Pines, A. (1993). Correlates of military tank simulator sickness. Aviation, Space, and Environmental Medicine, 64(7), 619–622.
Lin, J. J.-W., Duh, H. B. L., Parker, D. E., Abi-Rached, H., & Furness, T. A. (2002). Effects of field of view on presence, enjoyment, memory, and simulator sickness in a virtual environment. Proceedings of the IEEE Virtual Reality Conference 2002 (pp. 164–171). New York: IEEE.
Magee, L. E. (1995, March). Virtual Reality Simulator (VRS) for training ship handling skills. Paper presented at the NATO/OCTAN Research Study Group 16 "Advanced Technologies Applied to Training Design" Workshop: Virtual Environments Training's Future?, Portsmouth, England.
May, J. G., & Badcock, D. R. (2002). Vision and virtual environments. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 29–63). Mahwah, NJ: Lawrence Erlbaum.
McCauley, M. E. (Ed.). (1984). Simulator sickness: Proceedings of a workshop. Washington, DC: National Academy Press.
McCauley, M. E., & Sharkey, T. J. (1992). Cybersickness: Perception of self-motion in virtual environments. Presence, 1(3), 311–318.
McCauley-Bell, P. R. (2002). Ergonomics in virtual environments. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 807–826). Mahwah, NJ: Lawrence Erlbaum.
Miller, J. W., & Goodson, J. E. (1960). Motion sickness in a helicopter simulator. Aerospace Medicine, 31(3), 204–212.
Mon-Williams, M., & Wann, J. P. (1998). Binocular virtual reality displays: When problems do and don't occur. Human Factors, 40(1), 42–49.
Mon-Williams, M., Wann, J. P., & Rushton, S. (1993). Binocular vision in a virtual world: Visual deficits following the wearing of a head-mounted display. Ophthalmic and Physiological Optics, 13, 387–391.
Mon-Williams, M., Wann, J. P., & Rushton, S. (1995). Design factors in stereoscopic virtual-reality displays. Journal of the SID, 3/4, 207–210.
Naval Training Systems Center. (1989, October). Simulator sickness field manual: MOD 4. Orlando, FL: Author.
Padmos, P., & Milders, M. (1992). Quality criteria for simulator images: A literature review. Human Factors, 34(6), 727–748.
Pausch, R., Crea, T., & Conway, M. (1992). A literature survey for virtual environments: Military flight visual systems and simulator sickness. Presence, 1(3), 344–363.
Regan, E. C., & Price, K. R. (1994). The frequency of occurrence and severity of side-effects of immersion virtual reality. Aviation, Space, and Environmental Medicine, 65, 527–530.
Rinalducci, E. J. (1996). Characteristics of visual fidelity in the virtual environment. Presence, 5(3), 330–341.
Rushton, S., Mon-Williams, M., & Wann, J. P. (1994). Binocular vision in a bi-ocular world: New generation head-mounted displays avoid causing visual deficits. Displays, 15(4), 255–260.
Stanney, K. M., & Kennedy, R. S. (1998). Aftereffects from virtual environment exposure: How long do they last? Proceedings of the Human Factors and Ergonomics Society 42nd Annual Meeting (pp. 1476–1480). Santa Monica, CA: Human Factors and Ergonomics Society.
Stanney, K. M., Kennedy, R. S., & Drexler, J. M. (1997). Cybersickness is not simulator sickness. Proceedings of the Human Factors and Ergonomics Society 41st Annual Meeting (pp. 1138–1142). Santa Monica, CA: Human Factors and Ergonomics Society.
Stanney, K. M., Kennedy, R. S., & Kingdon, K. (2002). Virtual environment usage protocols. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 721–730). Mahwah, NJ: Lawrence Erlbaum.
Ungs, T. J. (1988). Simulator induced syndrome in Coast Guard aviators. Aviation, Space, and Environmental Medicine, 59(3), 267–272.
Van Hoy, B. W., Allgood, G. O., Lilienthal, M. G., Kennedy, R. S., & Hooper, J. M. (1987). Inertial and control systems measurements of two motion-based flight simulators for evaluation of the incidence of simulator sickness. Proceedings of the IMAGE IV Conference (pp. 265–273). Phoenix, AZ: Image Society Incorporated.
Viirre, E., & Ellisman, M. (2003). Vertigo in virtual reality with haptics: Case report. Cyberpsychology and Behavior, 6(4), 429–431.
Wann, J. P., & Mon-Williams, M. (2002). Measurement of visual aftereffects following virtual environment exposure. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 731–749). Mahwah, NJ: Lawrence Erlbaum.
Warner, H. D., Serfoss, G. L., Baruch, T. M., & Hubbard, D. C. (1993). Flight simulator-induced sickness and visual displays evaluation (Final Tech. Rep. No. AL/HR-TR-1993-0056). Brooks Air Force Base, TX: Armstrong Laboratory.
Wilson, J. R. (1996). Effects of participating in virtual environments: A review of current knowledge. Safety Science, 23(1), 39–51.
Wilson, J. R. (1997). Virtual environments and ergonomics: Needs and opportunities. Ergonomics, 40(10), 1057–1077.
Witmer, B. G., Bailey, J. H., & Knerr, B. W. (1996). Virtual spaces and real world places: Transfer of route knowledge. International Journal of Human-Computer Studies, 45, 413–428.
Chapter 12
EVALUATING VIRTUAL ENVIRONMENT COMPONENT TECHNOLOGIES

Mary Whitton and Fred Brooks

This chapter is about user studies designed to evaluate how the characteristics of the components of a virtual environment (VE) system affect the VE user's performance. As in this book section, component is defined broadly and encompasses hardware such as displays, software such as various rendering techniques, user interfaces that are part hardware and part software, and even complete VE systems. We illustrate the diversity of the experimental designs, metrics, and analysis techniques required to perform evaluations. We focus on participant tasks and metrics, on what works and what does not, and on what we learned about doing evaluation. Some lessons follow directly from the studies; some are more general. The references provide details of the studies and review related literature.

This is not a primer on experimental design and statistics. For those new to user studies, we recommend Martin (2007) or Field and Hole (2003). Both are introductory and easy to read. Martin focuses on designing experiments, with an appendix on statistics; Field and Hole briefly cover experiment design and treat statistics in more detail. Berg (2006) comprehensively covers research methods in the social sciences, including such ethnographic techniques as observations and critical-incident interviews.

We treat, in turn, evaluations related to sensory-input fidelity, user interfaces, and system performance. We conclude with comments about the role of evaluation in system design. The studies described here mostly draw on work done from 1998 to 2007 by graduate students in the Effective Virtual Environments research group at The University of North Carolina (UNC) at Chapel Hill.

WHY EVALUATE COMPONENT TECHNOLOGIES?

Evaluating component technologies early in the design process results in better component selection, shorter development time, lower deployment costs, and more satisfied users and sponsors. To effectively isolate the performance of the component under test and to minimize potential confounds, it is useful to test the component embedded in a system with known performance characteristics.
The test setting must, however, be able to stress the component at the performance levels required by the final system and application.

The earlier a problem is found, the less expensive it is to fix, in both time and money. Consequently, evaluation, particularly usability evaluation, should not be a one-time event but a series of evaluations performed throughout the design and development process. One does not want to find out in the field that users will not accept the selected display because it is not bright enough for the desert. Testing before final component selection also lowers overall deployment costs, since one avoids buying unneeded capability. And if the impact of a component on user performance has been evaluated beforehand, then when budget constraints dictate the selection of a less capable component, designers know exactly what performance is being given up.
WHAT SHOULD BE EVALUATED AND HOW?

The task to be trained drives the level of performance required of the components, whether the requirement is the acuity of the display or the behavioral fidelity of semi-automated entities.

One type of evaluation compares a technique's performance with that of other techniques for doing the same thing; the different techniques are the levels of the independent variable. An example is the study "Managing Avatar-Object Collisions," which compares the UNC-developed MACBETH (managing avatar conflict by a technique hybrid) method of managing avatar-object collisions to published rubber-band and constant offset methods (Burns, Razzaque, Whitton, & Brooks, 2007).

Another type of evaluation compares several levels of performance of a single component technology. An example is comparing performance in systems with different rendering frame update rates (Meehan, Insko, Whitton, & Brooks, 2002). We measured physiological response to our visual-cliff environment (hereafter called the Pit) at four frame update rates. First we had to develop a system that achieved the best update rate possible with then-available components. We then defined three additional levels of the independent variable by artificially reducing the frame update rate in the VE system. Once the independent variable levels were in place, we designed and executed a user study with dependent variables and measures appropriate to the component and task.

We have found two rounds of testing to be useful. First, we evaluate the component in a constrained environment that allows tight control of potential confounding factors. Second, we perform a more ecologically valid evaluation; that is, we design an evaluation task, setting, and scenario that approximate, to the extent possible in a university laboratory, a task, setting, and scenario in which the component will actually be used.

Component testing does not obviate testing the performance of completed systems and testing the efficacy of the final application. Evaluating the effect of end-to-end system latency on task performance is an example of the former; Volume 3, Section 2 of this handbook is devoted to training effectiveness and evaluation.
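One practical way to realize the "artificially reduced" frame update rates described above is to cap an otherwise fast render loop. The sketch below is a generic illustration under that assumption; it is not the UNC implementation, and the rate levels shown are placeholders.

```python
import time

def run_at_capped_rate(render_frame, target_hz, duration_s):
    """Render at roughly target_hz by sleeping away the remainder of each
    frame period -- one simple way to derive the slower levels of a frame
    rate independent variable from a fast baseline system."""
    period = 1.0 / target_hz
    t_end = time.perf_counter() + duration_s
    while time.perf_counter() < t_end:
        t0 = time.perf_counter()
        render_frame()                      # application-supplied draw call
        remaining = period - (time.perf_counter() - t0)
        if remaining > 0:
            time.sleep(remaining)

# Each target rate defines one level of the independent variable.
for hz in (30, 20, 15, 10):                 # placeholder levels
    run_at_capped_rate(lambda: None, target_hz=hz, duration_s=1.0)
```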
TESTBED ENVIRONMENT AND METRICS

Our work was facilitated by developing a testbed system. The Pit environment developed for Usoh et al. (1999) resulted in a VE system and a virtual scene that consistently evoked a high emotional response in participants. Slater, Usoh, and Steed (1995), inspired by the work of Gibson and Walk (1960) showing that we fear falling from a young age, first did user studies in a virtual scene that required participants to walk along a high ledge; our Pit environment is a variation of theirs. Study participants explore a low stress virtual room while the door to the high stress room is closed; then the door to the Pit room opens, and participants are asked to perform a simple task such as placing a beanbag on the chair that is visible in Figure 12.1. The task, included to increase participants' engagement with the virtual scene, always requires that the participant walk along the ledge (or walk out into space). Although participants always have a task, our goal is to measure their response to the environment.

Initially we used the Slater-Usoh-Steed questionnaire to measure participants' sense of presence in the virtual scene (Slater et al., 1995; Usoh et al., 1999). Later we adopted objective measures that are contemporaneous with the experience and correlate with presence. We, like others (Slater & Garau, 2007), no longer use post-experiment questionnaires as our primary measure of presence. Meehan's work (2001; Meehan et al., 2002) established the validity, reliability, and sensitivity of physiological measures as correlates of presence. Our most successful physiological metric has been delta heart rate: the difference between baseline heart rate in the low stress room and heart rate while the participant is standing on the ledge. The Pit environment and the physiological measures have been used to study frame update rate (Meehan, 2001; Meehan et al., 2002), passive haptics (Insko, 2001), latency (Meehan, Razzaque, Whitton, & Brooks, 2003), and lighting fidelity (Zimmons, 2004).
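Delta heart rate, as defined above, reduces to a difference of two windowed means. A minimal sketch, assuming the raw physiological stream has already been converted to beats-per-minute samples for the baseline and ledge periods:

```python
import numpy as np

def delta_heart_rate(baseline_bpm, ledge_bpm):
    """Mean heart rate while the participant stands on the ledge minus mean
    baseline heart rate in the low stress room. Windowing the raw signal
    into the two periods is assumed to happen upstream."""
    return float(np.mean(ledge_bpm) - np.mean(baseline_bpm))

# Illustrative samples (bpm): heart rate rises on the ledge.
print(delta_heart_rate([71, 72, 70, 73], [84, 88, 86, 85]))  # 14.25
```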
EVALUATING SENSORY INPUT FIDELITY

Each of the studies reported here is concerned with the quality of the sensory input delivered by the system to the user, that is, as explained in the Volume 2, Section 1 Perspective, with the quality of the immersion. These studies all contribute to the understanding of the fidelity requirements for VE systems. (The relevance of fidelity to training effectiveness evaluation is the topic of Volume 3, Section 2.)

Evaluating the Impact of Field of View in Head-Mounted Displays

VE users wearing head-mounted displays (HMDs) often complain about the loss of peripheral vision due to narrow field of view (FOV). Almost all HMDs have a horizontal FOV of less than 60°, much less than the normal human FOV of about 200°.
Figure 12.1. The Pit environment. (Top) The lab with wooden and foam passive haptics used to increase the strength of the illusion of the ledge. (Bottom) An overview of the virtual scene. Images courtesy of the Department of Computer Science, UNC–Chapel Hill.
User performance degrades when the HMD FOV is less than about 50° (Piantanida, Boman, Larimer, Gille, & Reed, 1992). Only an evaluation of performance with a wide FOV HMD could tell potential users and vendors whether such HMDs, with FOV approaching 180°, improve user performance.

Tasks and Metrics

Arthur's (2000) participants performed tasks while using an HMD capable of FOVs of 176°, 112°, and 48° horizontal × 47° vertical. For control, all participants also performed all tasks in a commercial 48° × 36° FOV HMD (Virtual Research V8) and in a restricted real condition with a simple cowl limiting the participant's FOV to that of the V8. The tasks were common ones that included both egocentric and exocentric actions: visual search, walking through a virtual environment without running into walls, distance estimation, and spatial memory. The performance metric for the search and walking tasks was time to completion; the metrics for distance estimation and spatial memory were distance error and position error, respectively. To evaluate other aspects of usability, Arthur tested postural stability before and after HMD use, and participants completed the simulator sickness questionnaire (Kennedy, Lane, Berbaum, & Lilienthal, 1993) and the Witmer and Singer presence questionnaire (1998).

Findings

Arthur found that wider FOV led to shorter search times on the visual search task and shorter walking times for travel through a maze. There were no observable trends between FOV and values on the other measures: simulator sickness, postural stability, distance estimation, spatial memory, and presence. Participants using the relatively narrow FOV V8 HMD walked and searched faster than those using the widest FOV condition. Arthur speculated that this was due to the better acuity and brightness of the V8.

Lessons Learned

1. Include an experimental condition that compares the tested component to a similar, familiar component. The wide FOV HMD is radically different from the commercial Virtual Research V8: it has six liquid-crystal display panels per eye, it weighs twice as much as the V8, and brightness and contrast vary across the display panels. The results for the V8 and restricted real conditions gave us confidence that the data we collected for wide FOV HMD users were reasonable.

2. Evaluating early prototype components may require additional specialized equipment. The FOV studies required not only access to a Defense Advanced Research Projects Agency–funded, Kaiser Electro-Optics–developed, experimental, wide FOV HMD, but also a large graphics system with 12 separate graphics pipelines and video outputs to drive the 12 display tiles in the HMD. The department's (then prototype) HiBall wide-area tracker (3rdTech, 2006) enabled Arthur to design the maze-walking task so that participants really walked in the lab.
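Arthur's usability battery included the simulator sickness questionnaire, which is scored by weighting raw subscale sums. A hedged sketch of SSQ-style scoring follows: the weights are the commonly published ones (Kennedy, Lane, Berbaum, & Lilienthal, 1993), but the item-to-subscale mapping is omitted here, so consult the instrument itself before using this in a study.

```python
# SSQ-style scoring (after Kennedy, Lane, Berbaum, & Lilienthal, 1993).
# Inputs are raw symptom-rating sums already grouped into the three
# subscales; the published item-to-subscale mapping is not reproduced here.
WEIGHTS = {"nausea": 9.54, "oculomotor": 7.58, "disorientation": 13.92}
TOTAL_WEIGHT = 3.74

def ssq_scores(raw_n, raw_o, raw_d):
    """Return weighted subscale scores and the weighted total score."""
    return {
        "nausea": raw_n * WEIGHTS["nausea"],
        "oculomotor": raw_o * WEIGHTS["oculomotor"],
        "disorientation": raw_d * WEIGHTS["disorientation"],
        "total": (raw_n + raw_o + raw_d) * TOTAL_WEIGHT,
    }

print(ssq_scores(3, 5, 2))  # illustrative raw sums
```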
Impact of Passive Haptics on Training Transfer

The most serious credibility problem we see with VEs is that one touches nothing while seeing apparently tangible objects. Insko's (2001) dissertation studied passive haptics: registering low fidelity physical mock-ups to virtual objects. In one study he examined the following question: for training done in a VE, do users learn better if passive haptics are added?

Conditions, Tasks, and Metrics

Insko's participants trained to navigate a maze by walking three times through a virtual model of the maze (see Figure 12.2 [top]). The display was a V8 HMD, the head tracker was UNC's HiBall, the locomotion technique was real walking, and virtual avatars of the participants' hands were registered to their real hands using a Polhemus tracker. Participants trained in one of two conditions: with passive haptics (see Figure 12.2 [bottom]) or without passive haptics but with visual and audio cues when their hands collided with the virtual maze walls. Participants were encouraged to use their hands to touch the maze as they walked through it. After training, the participants were blindfolded, taken to a real maze (identical to the passive haptics), and instructed to walk the maze. The metrics were task completion time and number of collisions with the walls. The experimenter logged unexpected or otherwise interesting events, including wrong turns.

Findings

Participants trained with passive haptics took significantly less time to walk the real maze and had significantly fewer collisions. Eleven of the 15 participants who trained without passive haptics made the same wrong turn toward the end of the maze; only 2 of the 15 who trained with passive haptics made that error.

Lessons Learned

1. Log experimenter observations, both qualitative and quantitative. The logs can help explain outlier data points and support their exclusion from the statistical analysis. In the passive haptics study, observations caught the consistent, but unexpected, wrong-turn behavior. Similarly, the observation that participants consistently tipped their heads to locate sound sources in three dimensions helped explain why our (unpublished) results comparing localization performance under two-dimensional (2-D) (Microsoft DirectSound) and 3-D (AuSIM Gold Series) sound-generation conditions differed from those reported in the literature. We found no significant performance differences attributable to the sound-rendering method for our participants, who could walk about and move their heads freely. In previous studies, participants who were seated with their heads held stationary performed better on the localization task when the stimuli were presented in 3-D sound (Wenzel, Wightman, & Foster, 1988).

2. Select the levels of the independent variable carefully, balancing the number of conditions and the number of research questions against the reality of study design complexity and the number of participants required.
Figure 12.2. (Top) The virtual maze. (Bottom) The passive haptics maze. Participants trained in the virtual maze either with or without the passive haptics. After training, the participants were blindfolded and then walked the real maze, set up identically to the passive haptics. Images courtesy of the Department of Computer Science, UNC–Chapel Hill.
For reasons of expediency, Insko did not include a condition exposing participants simultaneously to passive haptics and the synthetic audio and visual cues. If he had, he could have examined whether using all cues would result in even better real-maze performance than passive haptics alone, and whether training with the audio tones, clearly absent in the real world, would in fact mistrain and lead to poorer performance. This is the perennial "training wheels" question for all simulation based training.
3. Require only one session with each participant if at all possible. It is often difficult to get volunteer or minimally compensated participants to return to the lab for the multiple sessions that training retention studies require. Expect to offer larger incentives for multisession studies and withhold most payment until after the final session.
Impact of Lighting Fidelity on Physiological Response

One intuitively expects that the more realistic the rendering of a virtual scene, the more closely VE users' responses to the scene and their task performance will approach real world responses and performance. Zimmons (2004) tested this by evaluating physiological responses to two levels of lighting fidelity and two levels of texture fidelity in a 2 × 2 design. A fifth condition was an unrealistic model of the same scene.

Conditions, Tasks, and Metrics

Figure 12.3 shows three of Zimmons's conditions: low fidelity textures with ambient lighting, high fidelity textures with global illumination, and all surfaces rendered with the same white-on-black grid texture. Zimmons's primary measure was delta heart rate, measured between the room with the normal floor and the room with the Pit. He administered the same series of questionnaires as Meehan (2001) and interviewed participants.

Findings

An ANOVA (analysis of variance) over the data from the five conditions revealed no significant differences and no trends in the delta heart rate measures. Even the scene rendered with a uniform grid texture evoked high levels of stress, as indicated by a rise in heart rate. In interviews, over 60 percent of all participants mentioned feeling fearful.

Lessons Learned

1. Always run pilot studies. Besides bringing procedural problems to light, running a pilot study all the way from greeting participants through data analysis enables a statistical power analysis to determine whether the experiment is likely to differentiate among the conditions without an untenable number of subjects.

2. Null results do not mean the work is valueless, but never claim that lack of statistical significance implies that the conditions are the same. There are two ways to emphasize the practical significance of any differences in measured values.

a. Field and Hole (2003) suggest that authors always report effect size as part of their statistical results. Reporting effect size allows readers to judge for themselves whether differences matter practically.

b. Statistical techniques for equivalence testing, that is, testing the hypothesis that sample populations do not differ, are available.
Figure 12.3. The Pit environment displayed in three of Zimmons’s rendering styles: (top) low quality lighting and low resolution textures; (middle) high quality lighting and high resolution textures; (bottom) rendered with a white-on-black grid texture applied to all objects. Images courtesy of the Department of Computer Science, UNC–Chapel Hill.
An important application is in studies verifying the efficacy of generic drugs compared to brand-name drugs. Wellek (2002) is a comprehensive treatment of equivalence testing written for statisticians.
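Lesson 2a is easy to act on: effect size can be computed directly from the condition samples. A minimal sketch of Cohen's d with a pooled standard deviation follows; the data shown are invented for illustration.

```python
import numpy as np

def cohens_d(group_a, group_b):
    """Cohen's d with pooled standard deviation: an effect size to report
    alongside p-values so readers can judge practical significance."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    na, nb = a.size, b.size
    pooled_var = ((na - 1) * a.var(ddof=1) +
                  (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Invented delta heart rate samples (bpm) from two rendering conditions.
print(cohens_d([8.1, 9.4, 7.7, 10.2, 8.8], [7.9, 9.0, 8.3, 9.7, 8.5]))
```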
Impact of Lighting Fidelity on Task Performance

Zimmons (2004) also asked whether the style of lighting used in rendering abstract knot-like shapes affects performance in a visual search task. The conditions for this study were strictly controlled; even so, there were lessons for those planning training systems.

Conditions, Task, and Metrics

The task was to look at a rendering of a target knot-like figure and then, after the target disappeared, to locate the target figure among a now-visible set of 15 similar knot-like figures or to indicate that the target knot was not in the search set. Figure 12.4 shows knots rendered in ambient lighting, local illumination, and global illumination, as well as the setup for the search task. Note that all of the knots in each search set were rendered with the same lighting style. To eliminate confounding factors, all knots were the same color, had the same surface material properties, and were rendered in gray scale. All pairwise combinations of the three target-knot lightings and three search-set lightings were tested. The measures were time to select and accuracy.
Figure 12.4. (Left) Knots rendered in Zimmons’s three lighting styles: (top) ambient lighting, (middle) local lighting, and (bottom) with shadows and interreflections. The large figure shows the search set with 15 knots. The target knot is displayed on the wall (as shown), but without the search set visible; then the target object disappears and the search set appears. Selection is made with a virtual laser pointer. Images courtesy of the Department of Computer Science, UNC–Chapel Hill.
Table 12.1. Accuracy Scores for Different Conditions in the Knot Search Task

                                       Search Object Lighting Model (SOLM)
Table Object Lighting Model (TOLM)     Global      Local       Ambient
Global                                 80.3%       73.9%       54.5%
Local                                  74.4%       77.3%       66.2%
Ambient                                48.6%       62.5%       70.3%
Findings

Table 12.1 shows accuracy scores for the different combinations of target lighting and search-set lighting. The best scores are on the diagonal and are not always associated with the highest fidelity lighting! The more the lighting differed between target and search set, the poorer the performance. We speculate that this is because the task is essentially a pattern-matching problem: the more dissimilar the rendering styles of target and search set, the harder it was to recognize the underlying similarity between correctly matched knots.

Lessons Learned

1. Mitigate confounders by carefully considering all aspects of your stimuli and evaluation setup. Whereas the virtual environment for the visual search task was very simple (a room with a table and a picture frame), development of the stimulus models and images was complex and time consuming, as lighting, brightness, and colors had to be matched. We unexpectedly added complexity to the data analysis of a locomotion study because the paths the participants walked were not all the same length; consequently, the data from the different segments could not be naively combined in repeated-measures analyses.

2. Useful knowledge can come from studies that are highly constrained and have little ecological validity. The knot study does not claim or show that its lighting fidelity results generalize to other tasks, but it demonstrates that we must always ask how much realism is needed.

3. While lower quality lighting approximations may be used during training to simplify the task for beginners, the final training condition should be as close as possible to what trainees will see in the field. Zimmons's (2004) data suggest that consistency of lighting style is more important for accurate identification of complex shapes than lighting fidelity.
EVALUATING USER INTERFACES

Sometimes a simple question can lead to a series of studies, as did our asking which locomotion technique is best for VE users. That simple question led to a long research thread of developing and comparatively evaluating locomotion interfaces. We report on two locomotion studies here.
Sometimes user studies are required during the development of a technique to give it principled rather than ad hoc foundations. One study of a user-interface component involved the development of a psychometric function to establish an experimentally determined value for a critical parameter.
Locomotion Technique Effect on Presence

Locomotion Conditions

Table 12.2 shows the five conditions we used for a series of locomotion studies. There are three viewing conditions (unrestricted, FOV restricted eyes, and HMD) and three locomotion conditions (real walking, walking in place, and flying with a gamepad/joystick). The five conditions are as follows:

• REAL—real walking and unmediated eyes,
• COWL—real walking and FOV restricted eyes,
• VEWALK—real walking and head-mounted display,
• WIP—walking in place and head-mounted display, and
• JS—joystick and head-mounted display.
The REAL condition gives us a standard against which to compare. The COWL condition was included so that we could isolate the effect of reduced FOV from the effects of HMD presentation of the visual stimuli. In our first locomotion study, reported next, there was no COWL condition, and a push-button-flying interface was used in place of the joystick to better replicate a previous study.

Tasks and Metrics

A study of the impact of locomotion technique on presence used the Pit environment (Usoh et al., 1999). The study replicated the work reported in Slater et al. (1995) using a similar virtual scene, task, metrics, and data analysis. We included the VEWALK condition in addition to the push-button-flying and WIP interfaces that were compared in the earlier study. The WIP interface, implemented with a neural network, was the same as the one used in the Slater, Usoh, and Steed study.
Table 12.2. The Locomotion Conditions
Participants performed a task in the Pit environment and filled out the Slater-Usoh-Steed presence questionnaire and Kennedy's simulator sickness questionnaire (Kennedy et al., 1993).

Findings

Both VEWALK and WIP were significantly more presence inducing than push-button-flying. With no other factors in the model, VEWALK was more presence inducing than WIP. However, when oculomotor discomfort (a subscale of the simulator sickness questionnaire) was included, there was no difference between the VEWALK and WIP conditions. Oculomotor discomfort diminished presence for flying and WIP, but not for VEWALK.

Lessons Learned

1. Reuse, or minimally modify, experimental methods, measures, and analysis protocols from published work. The methods have already been vetted by publication reviewers, and reuse makes it easier to compare results across studies.

2. Pilot the task. Participants must be able to learn the interfaces and complete the task; some participants were never able to use the neural-network based WIP interface successfully. Timing the task during piloting also lets one estimate the length of experimental sessions and judge whether participant fatigue will be an issue.
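For readers unfamiliar with the signal-processing problem a WIP interface must solve, here is a deliberately simple, threshold based step detector. It is our illustrative stand-in, not the neural network interface from the study above nor the improved interface used in the next study; all parameter values are assumptions.

```python
def count_wip_steps(head_heights, rest_height, dip=0.02, refractory=3):
    """Count walking-in-place steps by detecting dips of the tracked head
    more than `dip` meters below its standing height. The refractory
    period (in samples) keeps one bounce from registering twice."""
    steps, cooldown = 0, 0
    for h in head_heights:
        cooldown = max(0, cooldown - 1)
        if cooldown == 0 and h < rest_height - dip:
            steps += 1
            cooldown = refractory
    return steps

# Head height log (m) with two dips below the 1.68 m threshold: two steps.
log = [1.70, 1.69, 1.67, 1.69, 1.70, 1.70, 1.67, 1.69, 1.70]
print(count_wip_steps(log, rest_height=1.70))  # 2
```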
Locomotion Technique Effect on Training Transfer

Task, Secondary Task, and Metrics

We compared how well participants learned a task when trained in one of the five conditions. This study and task were designed to have some ecological validity with respect to training warfighters moving on foot and to require more complex movements than our previous studies. The task, moving from one side of a virtual bombed-out building to the other while avoiding gunfire, required participants to move quickly and stop precisely behind sheltering barriers (see Figure 12.5). Participants had to maneuver around sharp corners and avoid obstacles on the floor. (The low obstacles were outside the vertical field of view of the HMD when the participant was looking straight ahead.) To increase cognitive load, participants were also asked to count the occurrences of two audio events. In response to participant comments from the previous study, we developed an easier-to-use WIP interface for this study (Feasel, Whitton, & Wendt, 2008).

The performance metric was exposure to gunfire, measured in body-percentage seconds. Data from the head tracker were logged for time-trajectory analyses. The design was pre-test (REAL condition), training (in one of the five conditions), and post-test (REAL) to enable us to evaluate training transfer and the pattern of exposure scores across the training trials.
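Body-percentage seconds is a time integral: the percentage of the body exposed, summed over the trial. The reconstruction below is our assumption based on the unit's name, not the study's actual logging code.

```python
def exposure_body_pct_seconds(exposed_pct, dt):
    """Integrate exposure over a trial: `exposed_pct` holds the percentage
    of the avatar's body (0-100) exposed to gunfire at each logging tick,
    and `dt` is the tick length in seconds. A rectangle-rule integral of
    the logged samples."""
    return sum(exposed_pct) * dt

# 0.1 s ticks; briefly 40 percent exposed while crossing a gap.
print(exposure_body_pct_seconds([0, 0, 40, 40, 10, 0], dt=0.1))  # 9.0
```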
Figure 12.5. Locomotion study environment. (Top) Virtual environment as seen in HMD. (Bottom) Physical environment with passive haptic approximations of objects in the virtual scene. The truck and shooters were purely virtual and were projected on the white wall visible in the top scene. The two visible barriers in the top image correspond to the two far barriers in the bottom image. The oval and the arrow approximate the HMD view direction. Images courtesy of the Department of Computer Science, UNC–Chapel Hill.
254
VE Components and Training Technologies
interfaces requires 5–10 minutes longer than training for the three conditions in which people really walk.
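To make the exposure metric concrete: body-percentage seconds accumulate the fraction of the body exposed to gunfire, integrated over time. The Python sketch below assumes a log of (time, fraction-exposed) samples; the log format and function names are illustrative assumptions on our part, not the study's actual instrumentation.

```python
def exposure_score(samples):
    """Integrate exposure over time to get body-percentage seconds.

    `samples` is a chronological list of (time_s, body_fraction_exposed)
    pairs, e.g., (12.4, 0.35) meaning 35 percent of the avatar's body was
    in the line of fire at t = 12.4 s.  Uses trapezoidal integration; the
    sampling scheme is hypothetical.
    """
    total = 0.0
    for (t0, f0), (t1, f1) in zip(samples, samples[1:]):
        total += 0.5 * (f0 + f1) * (t1 - t0) * 100.0  # percent-seconds
    return total

# Example: fully sheltered except a 2 s dash at 40 percent exposure.
trace = [(0.0, 0.0), (5.0, 0.0), (5.5, 0.4), (7.5, 0.4), (8.0, 0.0)]
print(exposure_score(trace))  # 100.0 body-percentage seconds
```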
Lessons Learned

1. Train to competence in a setting with complexity comparable to the test scenario. Our training scene was, unfortunately, less cluttered than the test scenes and did not force the participants to maneuver through spaces as tight as those in the test scenes.

2. Do not underestimate the space, equipment, programming, modeling, logistical, and management resources required to design, implement, and execute studies with some ecological validity. Just the paper design of the four virtual scenes for this study took well over 80 hours. The layouts were constrained by analysis requirements, available building blocks for passive haptics, cable management issues, the need to switch from one physical (passive haptics) environment to another in three to four minutes, and the need for the scenes to be of comparable difficulty.

3. Pilot test to ensure that the experiment will discriminate among conditions. Our pilot test showed our task was too easy to yield discrimination. We added a distracter cognitive task, reasoning that if people were counting explosions and jets flying overhead, fewer cognitive resources would be available for moving and hiding. If a relevant taxonomy of tasks is available, for example, Bloom's taxonomy of cognitive tasks, consult it when choosing tasks.

4. When possible, strive for within-subjects designs that expose all participants to all conditions. In this locomotion study, participants could rate and rank the five interface techniques because they had experienced all five. In a subsequent study, the length of sessions dictated that each participant experience only one of the five conditions. That limited our ability to use participant comments to make sharp distinctions among the conditions.

5. Make use of a statistics consulting service if needed and available. Advice from such a service during study development helped ensure that we were able to answer our research questions with the study design and analysis we planned.

6. Do not expect all statistical analyses to be as simple as t-tests and ANOVAs. The experimental design may dictate more sophisticated techniques than those learned in a first statistics or experimental design course. In this study, the complexity unexpectedly rose when we found that the exposure data did not meet the normality criteria required for parametric techniques.

7. Ecological validity is hard to achieve. Real users should be involved in study design if at all possible, especially for training transfer studies.

8. Training transfer studies are difficult because they require a "real" condition. The laboratory environment imposes space and other limitations on that real condition. These limits often result in low ecological validity, raising the question of how generalizable laboratory training transfer results are to real world training.

9. Using military personnel as study participants may require review by the military human subjects protection organization. This includes Reserve Officers' Training Corps students; they are considered to be on active duty.
Managing Avatar-Object Collisions

Whereas one can develop interface techniques by trial and error, varying parameters until the interaction "feels right," studies may be required to establish those parameters in a principled way. Burns (2007) used methods from psychophysics to establish the detection thresholds needed in his technique. Some VE systems prevent unnatural interpenetrations of avatars and objects by stopping the avatar at the surface of the object. When this occurs for an avatar of a hand, the participant's real and virtual hands can move out of registration. Burns developed and evaluated an interface component technique, MACBETH, that moves the virtual and real hands back together imperceptibly (Burns, 2007; Burns et al., 2007).

Establishing Detection Thresholds

MACBETH manipulates the position and velocity of the avatar hand relative to the position and velocity of the participant's real hand. For the manipulation to be imperceptible, Burns had to determine at what levels people detect differences in real and avatar hand position and hand velocity. Detection thresholds are found by developing psychometric functions. Coren, Ward, and Enns (1999) include a very readable introduction to psychophysics, psychometric functions, and detection thresholds.

The basic method used to find a psychometric function is staircase presentation of stimuli. In an up staircase, the stimulus is first presented at a low value; if the subject does not perceive the stimulus, the level is raised step-by-step until it is perceived. Down staircases work similarly. There are various methods for managing the presentation of the stimulus after a reversal (when a participant goes from nonperception to perception, or vice versa) and for adaptively decreasing the step size to yield a more precise threshold efficiently. Because thresholds are often different when approached from above and from below, well-designed studies interleave both up and down staircases. Psychophysical studies are often very time consuming; it may take an hour to reach the staircase-stopping condition, and there are typically multiple repetitions for each participant.

Comparing MACBETH to Other Techniques: Tasks, Environments, and Metrics

MACBETH was compared against two published techniques—rubber-band and incremental motion (Zachmann & Rettig, 2001). Burns designed two mazes as test cases. The task was to grab a ball from a start position and move it through a maze to an end position. The maze was only a bit wider than the ball, so collisions between the ball and maze walls were frequent. Metrics were time to completion and a series of pairwise forced-choice preference ratings. All permutations of pairs were tested to avoid order effects and enable comparison of the three techniques.
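As a rough illustration of the staircase procedure described above, the following Python sketch runs a simple one-up/one-down adaptive staircase, halving the step after each reversal and averaging the reversal levels as the threshold estimate. The simulated observer and all parameter values are our assumptions for demonstration, not values from Burns's studies; and where a real study would interleave the trials of the two staircases, the sketch runs them back to back for simplicity.

```python
import random

def adaptive_staircase(perceives, level, step, n_reversals=8):
    """Estimate a detection threshold with one adaptive staircase.

    perceives(level) -> bool asks the participant (here, simulated)
    whether the stimulus at `level` was detected. The level moves down
    after a detection and up after a miss; each reversal halves the
    step, and the mean of the reversal levels estimates the threshold.
    """
    reversals, prev_direction = [], None
    while len(reversals) < n_reversals:
        direction = -1 if perceives(level) else +1
        if prev_direction is not None and direction != prev_direction:
            reversals.append(level)       # response flipped: a reversal
            step = max(step / 2.0, 1e-3)  # finer steps near threshold
        prev_direction = direction
        level += direction * step
    return sum(reversals) / len(reversals)

# Simulated observer with a true threshold of 0.25 plus response noise.
observer = lambda x: x + random.gauss(0.0, 0.02) > 0.25

# One staircase approaching from below, one from above, as the text
# recommends; their average is the final estimate.
from_below = adaptive_staircase(observer, level=0.05, step=0.08)
from_above = adaptive_staircase(observer, level=0.60, step=0.08)
print((from_below + from_above) / 2.0)   # should land near 0.25
```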
Findings

The study found that performance using MACBETH was better than or equal to performance with rubber-band or incremental motion. Participants preferred the MACBETH technique, finding it more natural than the others.

Lesson Learned

Devising an evaluation study often requires assumptions. Burns used a single up staircase to determine the position-discrepancy detection threshold between the real hand and the avatar. Later, he used multiple, interleaved, adaptive staircases to determine the velocity-discrepancy detection threshold. Because the outputs of the two studies were not strictly comparable, Burns had to make some major, but plausible, assumptions in order to complete development of his technique. The lesson is the importance of reporting and justifying all assumptions. If the results seem implausible, revisit the assumptions.

EVALUATING SYSTEMS: PERFORMANCE AND EFFICACY

Evaluating the Effect of End-to-End System Latency

For HMD users, the fastest viewpoint-changing motion is head rotation. Hence, the most critical system response time is that between the start of a head rotation and the display of an updated view. This end-to-end latency is an often-measured, critical system parameter (Mine, 1993; Olano, Cohen, Mine, & Bishop, 1995). Task performance has been shown to degrade at 80 ms (milliseconds) of latency (So & Griffin, 1995), and participants can perceive latencies as low as 10–12 ms (Ellis, Mania, Adelstein, & Hill, 2004).

Conditions, Tasks, and Metrics

As part of an exhibition at SIGGRAPH (Special Interest Group on Graphics and Interactive Techniques) 2002, we compared delta heart rate in participants experiencing our Pit environment with either 50 ms or 90 ms of average end-to-end latency. We hypothesized that those experiencing low latency would have a stronger stress response than those in the high latency condition. We also administered the Slater-Usoh-Steed presence questionnaire.

Findings

The low latency system yielded a significantly higher delta heart rate between the two rooms. The Slater-Usoh-Steed presence questionnaire revealed no significant differences in reported presence: the physiological measure was more sensitive than the questionnaire. (A sketch of this kind of between-groups comparison follows the lessons below.)

Lessons Learned

1. Be pragmatic in the choice of experimental conditions. Although we could achieve 40 ms latency with our best hardware, we chose 50 ms as the low latency condition in case we had an equipment failure and had to continue with less-capable hardware.
A goal for the exhibition was that every participant have a very good VE experience, so we selected 90 ms as the high latency value: 10 percent below 100 ms, a generally accepted upper bound for interactive systems.

2. Develop and maintain a good working relationship with the group that oversees the ethical treatment of human subjects in research studies (called the Institutional Review Board in the United States). From earlier studies, the UNC Institutional Review Board was familiar with our work and the precautions we take to ensure participant safety. Although the locale was unusual, getting approval for this exhibition-based study was straightforward.
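For readers unfamiliar with the analysis pattern in the findings above, the sketch below computes each participant's delta heart rate (pit room minus the other room) and compares the two latency groups. All numbers and the room labels are invented for illustration; they are not data from the SIGGRAPH exhibition.

```python
from scipy import stats

# Hypothetical per-participant mean heart rates (beats per minute) as
# (entry_room, pit_room) pairs; the data and labels are invented.
low_latency  = [(74, 88), (79, 91), (70, 85), (76, 90)]   # 50 ms group
high_latency = [(75, 82), (74, 80), (77, 84), (75, 83)]   # 90 ms group

def deltas(group):
    """Delta heart rate: pit-room rate minus entry-room rate."""
    return [pit - entry for entry, pit in group]

# A larger delta in the low latency group supports the hypothesis that
# low latency produces the stronger stress response.
t, p = stats.ttest_ind(deltas(low_latency), deltas(high_latency))
print(t, p)
```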
Evaluating the Efficacy of a Collaboration System

The nanoManipulator Collaboratory is a system that extends the function of the nanoManipulator to enable distributed collaboration for visualization and analysis of scientific data (Sonnenwald, Whitton, & Maglaughlin, 2003). In a multifaceted study, we asked the following questions: Can science be done as effectively when scientists are noncollocated and using the Collaboratory system as when they are face-to-face using the nanoManipulator? What do the participants think of the system? Is the system likely to be adopted by the target users? Should development of tools for distributed scientific collaboration continue? The nanoManipulator Collaboratory was developed in UNC's National Institutes of Health Center for Computer Integrated Manipulation and Microscopy under the direction of Diane H. Sonnenwald and Mary Whitton.

Participants, Conditions, Tasks, and Metrics

Twenty pairs of upper level undergraduate science majors performed two different laboratory tasks, on two different days, in two different experimental conditions: face-to-face and noncollocated. When noncollocated, they used the Collaboratory system. Half worked face-to-face first; half worked noncollocated first. The laboratory task was to analyze data that had been gathered previously by working domain scientists. Each participant wrote a laboratory report for each session. Sessions were recorded (video and audio), and an experimenter/observer was in the room at all times to note critical incidents (behaviors or events that might have affected the outcome; Hix & Hartson, 1993). Questionnaires and interviews followed each session.

Grades on the laboratory reports provided quantitative data about the quality of the science done in each condition. Participants' opinions of the usability of the system (qualitative data) were collected in questionnaires and in semistructured interviews. The interviews were transcribed and analyzed using open and axial coding (Berg, 2006). Participants' opinions of the adoptability of the system were gathered with a purpose-designed questionnaire based on Rogers's diffusion of innovations theory (Rogers, 2003).
Findings

There were no significant differences in either the scores on the lab reports completed in the two conditions or the scores on the adoptability questionnaire. When the order of conditions for the two sessions was included in the model, the group that collaborated noncollocated first had significantly higher scores on the second task than the group that worked face-to-face first. At the time of the study, we had not yet become aware of equivalence testing (Wellek, 2002; a sketch of such a test follows the lessons below), and we had only the interview data to help us understand the results.

In the interviews the participants identified both positive and negative aspects of working face-to-face and of working noncollocated. Some expressed a strong preference for working noncollocated, as it gave them their own space and sole use of the Collaboratory tool. Participants reported devising work-arounds for the perceived disadvantages of using the Collaboratory; they did not let perceived system deficiencies keep them from doing their tasks.

Lessons Learned

1. Study designs usually demand compromises. The quality of science would, ideally, be judged by long-term measures, such as the number and quality of papers and grants that result. This study required short-term measures plausibly related to scientific quality. As conceived, study participants were to have been the system's target users—graduate research assistants, post-doctoral fellows, and working scientists. We quickly realized we were unlikely to find 40 of them willing to participate in an eight-hour study. Our decision to use undergraduate students broadened the participant pool, but constrained the sophistication of the science lab tasks.

2. A full 2 × 2 design (adding a group that did both labs face-to-face and a group that did both labs noncollocated) would have been better for this study, as we could then have eliminated any difference in difficulty of the two laboratory tasks as a factor in the difference of scores between the first and second sessions. It would, however, have required twice as many pairs of participants and another six to eight months.

3. Developing new measurement tools and designing the statistical analysis are significant portions of the study design task and may require outside expertise. The center offering consulting on statistics will often also help with measurement tool development.

4. Multifaceted studies enable data triangulation. Triangulation, common in the social sciences, is the use of multiple research methodologies to study the same phenomena. The theory is that using multiple methodologies overcomes the biases inherent in the individual methods and, consequently, enables the researcher to draw conclusions from the aggregate data more confidently than from a single measure or method. In this study, the null statistical results were plausibly explained by the interview data, which showed that participants found positive and negative elements in both conditions and developed work-arounds. We were trying to find out if there were problems with scientific collaboratories that would suggest that development stop. Looking at the whole of our data, we are comfortable saying that we found no showstoppers, so development should continue.
5. Large, multifaceted studies are resource intensive in both equipment and people. For this study, two rooms, each with two computers, a force-feedback device, four cameras, two video recorders, two audio recorders, and wireless telephones, were tied up for eight months. Seven people shared the study execution and observation duties: on the order of 400 person-hours simply to gather the data. The 40 participants were each paid $100. Including system development and the study, an average of four graduate students worked on the project each semester for four years, and three to five faculty members were involved over the life of the project.
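Equivalence testing, mentioned in the findings above, reverses the usual null hypothesis: instead of asking whether two conditions differ, it asks whether their difference is provably smaller than a margin judged negligible. A common form is the two one-sided tests (TOST) procedure. The Python sketch below, with invented lab-report grades and a 5-point margin, shows the idea; the margin choice and data are our assumptions, not values from the Collaboratory study.

```python
import numpy as np
from scipy import stats

def tost_equivalent(a, b, bound, alpha=0.05):
    """Two one-sided tests (TOST) for equivalence of two group means.

    Declares equivalence if the mean difference is significantly
    greater than -bound AND significantly less than +bound, where
    `bound` is the largest difference still considered negligible
    (a substantive judgment, not a statistical one).
    """
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * np.var(a, ddof=1) +
                  (n2 - 1) * np.var(b, ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    df = n1 + n2 - 2
    diff = np.mean(a) - np.mean(b)
    p_low = 1 - stats.t.cdf((diff + bound) / se, df)   # H0: diff <= -bound
    p_high = stats.t.cdf((diff - bound) / se, df)      # H0: diff >= +bound
    return max(p_low, p_high) < alpha

# Invented lab-report grades for the two conditions:
face_to_face = [85, 78, 92, 88, 76, 90, 83, 79]
noncollocated = [84, 80, 90, 86, 77, 91, 82, 81]
print(tost_equivalent(face_to_face, noncollocated, bound=5))  # True here
```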
THE ROLE OF EVALUATION IN SYSTEM DEVELOPMENT

Evaluation should not be an isolated event; when possible, it should be part of every stage of component and system development. As requirements are seldom fully understood a priori by either the customer or the designer, an iterative design cycle—analyze, design, implement, and evaluate—is critical to achieving a usable product that meets expectations. Gabbard, Hix, and Swan (1999) address design and evaluation of VE systems from the perspective of usability engineering. A more recent discussion of the usability process model and how it integrates with development models can be found in Helms, Arthur, Hix, and Hartson (2006).

An important step in requirements analysis (Volume 1, Section 2) is translating application requirements into system requirements. Those requirements, in turn, are the starting benchmarks for component evaluation. In the iterative development cycle, those benchmarks may change. Gabbard et al. (1999) suggest that the type of evaluation should change over the development cycle. A sequence of evaluations might include evaluation of a proposed design against expert guidelines, informal evaluation by a few target users or experts, formal usability evaluation by target users (formative studies), and, finally, formal comparison to alternate solutions (summative studies).

The investment required to evaluate iteratively can be substantial. The cost of not evaluating VE training systems—as individual components and as a system—can be not only a failed project, but also increased skepticism about the value of VE technology for training. Component evaluation is a precursor to formal experimental studies asking whether VE training systems do train and, if they do, whether they improve on current training practice by training better, faster, or cheaper. Application efficacy is the hardest evaluation of all, and the one most needed now to increase the use of VE in training systems.

ACKNOWLEDGMENTS

The authors gratefully acknowledge the major support for the work reported here from the Office of Naval Research (VIRTE Project). Additional support was provided by the NIH National Institute for Biomedical Imaging and Bioengineering, the NIH National Center for Research Resources, the Link Foundation, and SAIC, Inc. Equipment for the audio study was loaned to us by AuSIM Inc.
REFERENCES

Arthur, K. (2000). Effects of field of view on performance with head-mounted displays (Doctoral dissertation; CS Tech. Rep. No. TR00-019). Chapel Hill: The University of North Carolina at Chapel Hill, Department of Computer Science.

Berg, B. L. (2006). Qualitative research methods for the social sciences (6th ed.). Boston: Allyn and Bacon.

Burns, E. (2007). MACBETH: Management of avatar conflict by employment of a technique hybrid (Doctoral dissertation; CS Tech. Rep. No. TR07-002). Chapel Hill: The University of North Carolina at Chapel Hill, Department of Computer Science.

Burns, E., Razzaque, S., Whitton, M. C., & Brooks, F. P., Jr. (2007). MACBETH: Management of avatar conflict by employment of a technique hybrid. International Journal of Virtual Reality, 6(2), 11–20.

Coren, S., Ward, L. M., & Enns, J. T. (1999). Sensation and perception (5th ed.). Philadelphia: Harcourt Brace College Publishers.

Ellis, S. R., Mania, K., Adelstein, B. D., & Hill, M. I. (2004). Generalizability of latency detection in a variety of virtual environments. Proceedings of the 48th Annual Meeting of the Human Factors and Ergonomics Society (pp. 2083–2087). Santa Monica, CA: Human Factors and Ergonomics Society.

Feasel, J., Whitton, M. C., & Wendt, J. D. (2008). LLCM-WIP: Low-latency, continuous-motion walking-in-place. Proceedings of IEEE Symposium on 3D User Interfaces 2008 (pp. 97–104). Reno, NV: IEEE.

Field, A., & Hole, G. (2003). How to design and report experiments. London: SAGE Publications.

Gabbard, J. L., Hix, D., & Swan, J. E. (1999). User-centered design and evaluation of virtual environments. IEEE Computer Graphics and Applications, 19(6), 51–59.

Gibson, E. J., & Walk, R. D. (1960). The visual cliff. Scientific American, 202(4), 64–71.

Helms, J. W., Arthur, J. D., Hix, D., & Hartson, H. R. (2006). A field study of the wheel: A usability engineering process model. Journal of Systems and Software, 79(6), 841–858.

Hix, D., & Hartson, H. R. (1993). Developing user interfaces: Ensuring usability through product & process. New York: John Wiley & Sons.

Insko, B. (2001). Passive haptics significantly enhances virtual environments (Doctoral dissertation; CS Tech. Rep. No. TR01-017). Chapel Hill: The University of North Carolina at Chapel Hill, Department of Computer Science.

Kennedy, R. S., Lane, N. E., Berbaum, K. S., & Lilienthal, M. G. (1993). A simulator sickness questionnaire (SSQ): A new method for quantifying simulator sickness. International Journal of Aviation Psychology, 3(3), 203–220.

Martin, D. W. (2007). Doing psychology experiments (7th ed.). Belmont, CA: Wadsworth Publishing.

Meehan, M. (2001). Physiological reaction as an objective measure of presence in virtual environments (Doctoral dissertation; CS Tech. Rep. No. TR01-018). Chapel Hill: The University of North Carolina at Chapel Hill, Department of Computer Science.

Meehan, M., Insko, B., Whitton, M., & Brooks, F. P., Jr. (2002). Physiological measures of presence in stressful virtual environments. ACM Transactions on Graphics, 21(3), 645–652.

Meehan, M., Razzaque, S., Whitton, M., & Brooks, F. (2003). Effects of latency on presence in stressful virtual environments. Proceedings of IEEE Virtual Reality 2003 (pp. 141–148). Los Angeles: IEEE.
Mine, M. R. (1993). Characterization of end-to-end delays in head-mounted display systems (Tech. Rep. No. TR93-001). Chapel Hill: The University of North Carolina at Chapel Hill, Department of Computer Science.

Olano, M., Cohen, J., Mine, M., & Bishop, G. (1995). Combating rendering latency. Proceedings of the ACM Symposium on Interactive 3D Graphics 1995 (pp. 19–24). Monterey, CA: ACM.

Piantanida, T. P., Boman, D., Larimer, J., Gille, J., & Reed, C. (1992). Studies of the field-of-view/resolution tradeoff in virtual reality. Proceedings of Human Vision, Visual Processing and Digital Display III (Vol. 1666, pp. 448–456). Bellingham, WA: SPIE.

Rogers, E. M. (2003). Diffusion of innovations (5th ed.). New York: The Free Press.

Slater, M., & Garau, M. (2007). The use of questionnaire data in presence studies: Do not seriously Likert. Presence: Teleoperators & Virtual Environments, 16(4), 447–456.

Slater, M., Usoh, M., & Steed, A. (1995). Taking steps: The influence of a walking technique on presence in virtual reality. ACM Transactions on Computer-Human Interaction, 2(3), 201–219.

So, R. H. Y., & Griffin, M. J. (1995). Effects of lags on human operator transfer functions with head-coupled systems. Aviation, Space, and Environmental Medicine, 66(6), 550–556.

Sonnenwald, D. H., Whitton, M., & Maglaughlin, K. (2003). Evaluating a scientific collaboratory: Results of a controlled experiment. ACM Transactions on Computer Human Interaction, 10(2), 151–176.

3rdTech. (2006). HiBall-3100™ Wide-Area, High-Precision Tracker and 3D Digitizer. Retrieved April 16, 2008, from http://3rdtech.com/HiBall.htm

Usoh, M., Arthur, K., Whitton, M. C., Bastos, R., Steed, A., Slater, M., & Brooks, F. P. (1999). Walking > walking-in-place > flying in virtual environments. Proceedings of SIGGRAPH '99 (pp. 359–364). Los Angeles: ACM.

Wellek, S. (2002). Testing statistical hypotheses of equivalence. Boca Raton, FL: Chapman & Hall/CRC Press.

Wenzel, E. M., Wightman, F. L., & Foster, S. H. (1988). A virtual display system for conveying three-dimensional acoustic information. Proceedings of the 32nd Annual Meeting of the Human Factors Society (pp. 86–90). Santa Monica, CA: Human Factors Society.

Witmer, B. G., & Singer, M. J. (1998). Measuring presence in virtual environments: A presence questionnaire. Presence: Teleoperators & Virtual Environments, 7(3), 225–240.

Zachmann, G., & Rettig, A. (2001, July). Natural and robust interaction in virtual assembly simulation. Paper presented at the Eighth ISPE International Conference on Concurrent Engineering: Research and Applications (ISPE/CE 2001), Anaheim, CA.

Zimmons, P. (2004). The influence of lighting quality on presence and task performance in virtual environments (Doctoral dissertation; CS Tech. Rep. No. TR07-002). Chapel Hill: The University of North Carolina at Chapel Hill, Department of Computer Science.
SECTION 2
TRAINING SUPPORT TECHNOLOGIES

SECTION PERSPECTIVE

Jan Cannon-Bowers and Clint Bowers

Training support technologies, methods, and tools are those elements that can be added to virtual environments (VEs) to optimize their training value. As has been noted previously in this volume, simply creating a virtual representation of a task environment, no matter how faithful it is to the real world, provides only the context in which training can occur (Salas, Bowers, & Rhodenizer, 1998). Hence, it is imperative that a VE based training system incorporate features (derived from learning science) that ensure that learning can (and does) occur. One way to organize the discussion of such methods and tools is to introduce the scenario based training approach.

Scenario based training (SBT) refers to training that incorporates realistic scenarios as a basis to enable practice of, and feedback on, crucial competencies (Oser, Cannon-Bowers, Salas, & Dwyer, 1999). According to Cannon-Bowers, Burns, Salas, and Pruitt (1998), the primary mechanism through which learning occurs in SBT is the scenario itself. Therefore, considerable care must be taken in developing scenarios and supporting elements (for example, performance measures) that directly support the targeted learning objectives.

Based on extensive experience in training U.S. Navy combat teams, Cannon-Bowers et al. (1998) proposed an overarching framework to describe the SBT process. We have modified this process slightly; the updated version can be seen in Figure SP2.1. According to Figure SP2.1, the SBT process begins with specification of the tasks that must be performed and translation of these into targeted learning objectives (that is, the knowledge, skills, abilities, and attitudes that are necessary for effective performance of the task). Once targeted learning objectives are specified, specific events can be scripted that provide trainees with the opportunity to learn the objectives and/or demonstrate mastery of them. These events are typically "tied together" through a scenario or story that provides a compelling and convincing backdrop.
Figure SP2.1. Scenario Based Training Process
Based on scenario/story events, specific instructional strategies can be selected. For example, for a particular learning objective it may be deemed best to scaffold performance before allowing the trainee to complete the task on his or her own. Scenario/story events also provide a basis upon which to develop specific, measurable performance assessment strategies. Such strategies may be built into the system (for example, automatically collected based on trainee responses), or they may exist as an external adjunct (for example, an instructor rating). Further, observed performance must be interpreted so that the underlying causes of that performance can be diagnosed. This diagnosis is crucial to the specification of feedback and remediation strategies, and it can occur either as an automated process (in the case of an intelligent tutoring system) or via the actions of a human instructor.

Once specified, feedback can be delivered online (for example, as hints or cues to the learner during performance) or as a post-exercise debrief or after action review (AAR). At this point, any necessary remediation can be required of the trainee (for example, reviewing declarative or procedural knowledge in textbooks). The final step in the SBT cycle is to close the loop by recording the trainee's progress in a training episode so that it can drive specification of subsequent learning objectives. This step is crucial since it helps to ensure that training resources are expended efficiently by tailoring the presentation of learning objectives to the trainees' (or teams') particular needs. Modern training systems often feed into complex learning management systems for this reason.

The SBT process described here is meant to be a guide for training researchers and practitioners as they conceptualize the design of SBT systems. It also provides an organizing framework for couching a discussion of SBT. In fact, the chapters in Volume 2, Section 2 all fit fairly well into this conceptualization. The following sections provide more detail about the SBT process and describe how each chapter enriches our understanding of how to optimize it.
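To make the closed loop concrete, the sketch below models one pass through the SBT cycle as plain data structures: scripted events tied to learning objectives, automated scoring of responses, and a record that steers the next session toward the weakest objectives. All names, events, and the scoring rule are illustrative assumptions on our part; real systems would use far richer measures and diagnosis.

```python
from dataclasses import dataclass

@dataclass
class ScenarioEvent:
    """One scripted trigger tied to a learning objective (illustrative)."""
    objective: str          # targeted learning objective
    stimulus: str           # what the scenario presents
    expected_response: str  # the behavior the measure looks for

def run_sbt_cycle(events, respond, history):
    """Present each event, score the response, log it, and return the
    objectives ordered weakest-first to drive the next session."""
    for event in events:
        response = respond(event.stimulus)
        score = 1.0 if response == event.expected_response else 0.0
        history.setdefault(event.objective, []).append(score)
    return sorted(history,
                  key=lambda obj: sum(history[obj]) / len(history[obj]))

# A toy session: two objectives, one trigger each.
events = [
    ScenarioEvent("communication", "radio contact lost", "report status"),
    ScenarioEvent("situation assessment", "smoke sighted", "investigate"),
]
history = {}
weakest_first = run_sbt_cycle(events, lambda s: "report status", history)
print(weakest_first)   # ['situation assessment', 'communication']
```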
TASK ANALYSIS/LEARNING OBJECTIVES/COMPETENCIES

It has long been acknowledged that a detailed task analysis is an essential first step in developing a training system. Obviously, the targeted task must be well understood before training can be developed for it. For the most part, this would seem to be a fairly simple process, and job/task analysis methods have been in existence for many years (Annett & Stanton, 2006). However, in modern training systems, the task analysis process is complicated for several reasons.

First, recent years have seen a growing emphasis on higher order skills, especially decision making and problem solving. This has led to a desire to better understand how experts perform in realistic environments, including specification of tacit or implicit knowledge (that is, knowledge that is crucial to task performance, but not well articulated by the experts themselves; see Cianciolo, Matthew, Sternberg, & Wagner, 2006). Hence, new methods of eliciting knowledge have been developed in recent years to better describe how the task is best accomplished.

A second complicating factor in task analysis is that in many modern settings the pace of change is extreme (particularly compared to the environments previous generations faced). This means that knowledge is not static; rather, it changes and evolves relatively quickly, requiring that training systems change quickly as well. Unfortunately, traditional task analysis methods were developed for systems with relatively static knowledge and do not adapt to changing conditions very well. To address this challenge, Chapter 18 (Shadrick and Lussier) describes a knowledge elicitation strategy that is designed to accommodate changing task requirements and conditions. This approach, which builds on the strengths of several traditional methods, may be useful in rapidly building scenarios for new concepts and future conditions.

EVENTS/SCENARIO/STORY

Once the learning objectives have been established, they can be used as input to the scenarios, forming the basis of the training exercise. Past researchers have conceptualized this process in terms of embedding events into scenarios that represent the learning objectives. An event, in this context, is any stimulus condition that is purposely scripted into a scenario in order to elicit a particular response. Scenario events have also been conceptualized as triggers—specific scenario conditions that will allow the trainee to practice the targeted learning objectives (Fowlkes, Dwyer, Oser, & Salas, 1998). Hence, scenario events form the basis of trainee practice opportunities.

The scenario in scenario based training also serves another purpose: providing a context or narrative that ties events together. In this sense, the scenario or story serves a motivational purpose by engaging the trainee in a realistic context. Recently, researchers have begun to theorize that narrative elements may actually enhance learning by helping to guide trainees through the system (Ironside, 2006). Further, research into such concepts as immersion and presence seems to indicate that learning can be enhanced when trainees are
psychologically engaged in the scenario. Stories and story based learning also help to ensure that the experiences trainees gain in training (as opposed to the real world) are authentic—that is, they are rich, faithful representations of the world that will enable the trainee to transfer his or her virtual experience into the operational environment. In Chapter 19, Gordon describes the history of story based learning environments and how they have evolved over time. As technology advances, more complex, interactive media are being developed that allow for increasingly complex story based systems. Gordon describes development of a story based leadership training system and how this approach can be used to enhance learning.

PERFORMANCE MEASUREMENT/ASSESSMENT/DIAGNOSIS

Once scenario events are scripted and instructional strategies selected, the next step in the scenario based training development process is to specify the performance measures that will be implemented to assess trainee behavior. For the most part, measures of performance should flow out of the task analysis/cognitive task analysis conducted at the start of the process. This implies that as tasks are initially described, the conditions and standards of effective performance are also specified. It is often the case that measurement procedures or approaches are selected at this time as well. In the case of scenario based training, these measures are often behavioral in nature (that is, they describe the specific behavioral response that is expected of the trainee). On other occasions, measures are more cognitive, assessing the mental processes that trainees use in accomplishing tasks presented by the scenario.

For example, Chapter 16 (Riley, Kaber, Sheik-Nainar, and Endsley) describes a technique for measuring situational awareness in trainees as a means to better understand whether trainees have a sufficient mental picture of scenario events and an awareness of crucial information in the environment. Likewise, Chapter 17 (Cain and Armstrong) provides a rationale for why it is important to understand and measure the cognitive workload presented by the scenario. In this case, the goal of measurement is to determine whether trainees are overloaded by the task demands, particularly as an indication of task difficulty. When workload is determined to be excessive, measures can be taken to scale back or simplify task demands to better match the trainee's level of mastery and capability.

INSTRUCTIONAL STRATEGIES

Scenario based training environments provide a context in which learning can occur, but in and of themselves, they are not training systems without the addition of the elements displayed in Figure SP2.1. Of primary importance in this regard is the establishment of instructional strategies that optimize learning. In fact, there are many possible approaches to embedding instruction in scenario based training. For example, instructional decisions can be made regarding the difficulty of tasks presented to trainees, the form and timing of feedback (more will be said
about this in a later section), the nature of hints and cues provided to trainees, the spacing of practice opportunities, and the like. Chapter 20 (Lane and Johnson) describes how tenets from intelligent tutoring can be applied to more dynamic scenario based training situations. According to these authors, intelligent tutoring provides a framework for embedding instructional features—including measurement and feedback—into virtual learning environments. In fact, they argue that virtual environments may allow for additional opportunities, for example, by allowing pedagogical agents to become part of the story or narrative.

In another vein, Chapter 21 (Singer and Howey) describes an instructional approach to enhancing virtual environments by manipulating deviations from fidelity (that is, from the actual task situation) as a means to improve learning. By using augmenting cues and adjuncting cues, provisions can be added to simulations that better support the learner by ensuring that necessary exposure to, and practice with, the stimuli occurs. For example, enhancing the salience of a stimulus in the environment through visual manipulation (for example, increased brightness) can help to direct the learner's attention so he or she appropriately confronts the stimulus.

FEEDBACK

As has been alluded to in previous sections, feedback is an essential element in scenario based training (Cannon-Bowers et al., 1998). As in all forms of training, feedback provides trainees with a detailed understanding of their own performance and of how they need to correct behavior in order to enhance future performance. Much has been written about feedback in training, exploring such things as when and how often to give feedback, the format of feedback (for example, directive or reflective), the specificity of feedback, and who provides feedback (instructors or trainees themselves). This literature provides much useful guidance on how best to implement feedback mechanisms in training (Cannon-Bowers et al., 1998). In Chapter 14 (Lampton, Martin, Meliza, and Goldberg), a framework for AARs is provided as a mechanism for delivering feedback in scenario based training. These authors describe an approach that takes advantage of electronic data streams to enhance the delivery of feedback. They also discuss several issues in implementing feedback (AARs) in virtual training systems.

LEARNING MANAGEMENT

The final step in a fully implemented scenario based training system is to record trainee performance and use that information to inform future scenario based training episodes. In operational environments this is too often accomplished informally, so that subsequent training sessions are suboptimal. Moreover, whereas more traditional distance learning content can be (relatively) easily tracked, performance in simulations and scenario based training is more
complex. Hence, conventional learning management systems are not necessarily well suited to incorporating scenario based learning outcomes. Chapter 15 (Conkey and Smith) attempts to bridge this gap by discussing how a learning management system (LMS) can be better connected to scenario based training. This involves a rethinking of concepts and practices that are typical in e-learning situations (for example, how performance is measured and recorded) and also of current learning management standards (for example, SCORM [Sharable Content Object Reference Model]). This type of thinking is essential if scenario based training is to enter the "mainstream" as a viable training strategy. Conkey and Smith provide a good basis to begin this discussion.

SUMMARY/LESSONS LEARNED

As virtual technologies continue to develop, it is clear that they will be an increasingly popular vehicle in which to embed training. In order to optimize the transition to virtual training, it is essential that training researchers rely on the science of learning and performance as a basis for designing effective training. While scenario based training has been the subject of scientifically based investigation for only about 20 years, much has been written and learned about how to optimize its design. Chapter 13 (Stout, Bowers, and Nicholson) summarizes the extant literature in this area. In fact, these authors provide detailed guidelines for how best to design and implement scenario based training.

Based on the chapters in Volume 2, Section 2, along with other work in this series and beyond, it is fair to conclude that scenario based training has graduated from a new, untested technique to a relatively well-developed one. However, much needs to be done to fully realize the potential of this approach. Many of the chapters in Volume 2, Section 2 address future research needs. Our assessment is that further work needs to be done in several areas, most of which are related in some way to the issue of performance assessment. We say this because many of the issues that we believe are most pressing—dynamically measuring trainee performance, adapting feedback to trainee needs, implementing intelligent tutoring concepts, connecting training outcomes to LMSs, providing adaptive narratives, and establishing expert performance standards—are all related in one way or another to timely, accurate performance measurement. In modern virtual training systems, this is a multidisciplinary challenge, since it involves human performance experts, as well as measurement experts, technologists, engineers, programmers, and learning scientists. We are optimistic that such research and development will occur, especially as new fields (for example, health care and law enforcement) begin to embrace scenario based training as a viable alternative.

REFERENCES

Annett, J., & Stanton, N. (2006). Task analysis. International Review of Industrial and Organizational Psychology (Vol. 21, pp. 45–78). Hoboken, NJ: Wiley Publishing.
Cannon-Bowers, J., Burns, J., Salas, E., & Pruitt, J. (1998). Advanced technology in scenario-based training. Making decisions under stress: Implications for individual and team training (pp. 365–374). Washington, DC: American Psychological Association.

Cianciolo, A., Matthew, C., Sternberg, R., & Wagner, R. (2006). Tacit knowledge, practical intelligence, and expertise. The Cambridge handbook of expertise and expert performance (pp. 613–632). New York: Cambridge University Press.

Fowlkes, J., Dwyer, D., Oser, R., & Salas, E. (1998). Event-based approach to training (EBAT). International Journal of Aviation Psychology, 8(3), 209–221.

Ironside, P. (2006, August). Using narrative pedagogy: Learning and practicing interpretive thinking. Journal of Advanced Nursing, 55(4), 478–486.

Oser, R., Cannon-Bowers, J., Salas, E., & Dwyer, D. (1999). Enhancing human performance in technology-rich environments: Guidelines for scenario-based training. In E. Salas (Ed.), Human/technology interaction in complex systems (Vol. 9, pp. 175–202). Greenwich, CT: Elsevier Science/JAI Press.

Salas, E., Bowers, C., & Rhodenizer, L. (1998). It is not how much you have but how you use it: Toward a rational use of simulation to support aviation training. International Journal of Aviation Psychology, 8(3), 197–208.
Chapter 13
GUIDELINES FOR USING SIMULATIONS TO TRAIN HIGHER LEVEL COGNITIVE AND TEAMWORK SKILLS

Renée Stout, Clint Bowers, and Denise Nicholson

In recent years, a good deal of attention has been paid to employing virtual environments for training purposes, and the focus of this chapter is to provide guidance on doing so. The notion of virtual environments tends to conjure in the mind Hollywood examples of science fiction, such as the holodeck. It gives an impression of total immersion within the virtual environment through highly complex technology. On this view, an environment is more "virtual" to the extent that real world stimuli are filtered out and replaced by stimuli presented by the technology, such as through the use of helmets, augmented visual displays, and haptic devices. On the other hand, using anything other than real world equipment or materials to perform real world tasks in real world environments constitutes a virtual, synthetic, or simulated environment—it is a simulation of the real world task environment.

In this chapter, we consider the interface of a virtual environment to be just another medium for training and do not attempt to address how sophisticated this interface is. We instead focus on the training itself, treating the virtual or simulated environment simply as the means by which the training is accomplished. We use the term "simulation" rather than "virtual environment" because we want to include synthetic or simulated environments at various points on the spectrum of virtuality, given that we believe our guidelines apply across that spectrum. With this in mind, we now turn to an overview of the use of simulations for training before describing the nature of our guidelines in greater detail.

Using simulations to train complex, real world tasks is often perceived as a modern instructional technique, although it actually has a rich history. For example, as early as 2500 B.C., the Egyptians and Sumerians used figurines to depict different warring factions. The use of training simulations in the military also has a long history, with one of the most well-known examples being the first flight simulator, which was developed by Ed Link in 1936 and used to train U.S. Army airmail pilots.
Today, simulations are used in a variety of settings and for a variety of tasks, such as commercial and military aviation, space exploration, medicine, law enforcement, military combat operations, military and commercial equipment maintenance, military and commercial driving operations, and education. They have the ability to "replicate virtually any real world artifact" (Salas, Bowers, & Rhodenizer, 1998, p. 198), such as detailed terrain, equipment failures, adverse weather, and motion, as well as certain behaviors of virtual team members. Indeed, within some environments, such as commercial aviation, simulation plays a fundamental role. For example, across the airline industry, whereas years ago all training was done in the aircraft, it is now commonplace for training to take place in the simulator, followed by an observational jump-seat ride, with no actual hands-on training in the aircraft prior to the aviator's first revenue flight.

With the wide use of simulation in practice comes a concomitant widespread, yet erroneous, belief that simulation equals training (Salas, Milham, & Bowers, 2003). It is important to always keep in mind that simulations are just a tool that can be used for training (Salas et al., 1998). However, as such, they do have several advantages. For example, the following five advantages were listed by Hays (2006, p. 232):

1. Instructional simulations are available almost anytime when compared to using actual equipment that may be unavailable due to other commitments.

2. Simulations can be run faster than actual equipment because simulated exercises can be reset and rerun very quickly (for example, when training air traffic controllers, simulated aircraft or other simulated entities can be quickly added or removed from instructional scenarios).

3. Simulation scenarios are reproducible, so they can be used to teach lessons that require repetition.

4. Simulations can provide the learner with more trials in a given amount of time by eliminating tasks that are not central to the instructional objective. For example, if the objective is to train in-flight refueling, the simulation can omit takeoff or landing tasks.

5. Simulations can provide the learner with cause-and-effect feedback almost immediately, when it is most effective.
Hays (2006) also noted that simulations can provide trainees with a realistic preview of the operational environment and the jobs that they will perform. In addition, simulations can be used to train procedures that are too risky to practice in the operational environment.

Moreover, empirical evidence has been found for the training benefit provided by simulations. For example, Hays and his colleagues (Jacobs, Prince, Hays, & Salas, 1990; Hays, Jacobs, Prince, & Salas, 1992) conducted a meta-analysis of the use of simulations for aircraft training. "In statistics, a meta-analysis combines the results of several studies that address a set of related research hypotheses" (Wikipedia, 2007a). They concluded that simulation combined with aircraft training was favored over aircraft training alone in more than 90 percent of the experiments reviewed.
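For readers new to meta-analysis, the core computation of its simplest (fixed-effect) form is an inverse-variance-weighted average of per-study effect sizes. The Python sketch below uses invented effect sizes and variances purely to illustrate the mechanics; it is not a reconstruction of the Hays analyses.

```python
import numpy as np

# Invented per-study effect sizes (simulator-plus-aircraft training vs.
# aircraft-only training) and their sampling variances.
effects   = np.array([0.42, 0.31, 0.55, 0.18, 0.47])
variances = np.array([0.02, 0.05, 0.04, 0.03, 0.06])

# Fixed-effect meta-analysis: weight each study by 1 / variance, so more
# precise studies count more toward the pooled estimate.
weights = 1.0 / variances
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

print(pooled)                      # pooled effect size
print(pooled - 1.96 * pooled_se,
      pooled + 1.96 * pooled_se)   # 95 percent confidence interval
```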
Also, probably one of the most well-cited advantages of simulations is that they can provide cost savings. For example, a lifecycle cost analysis of a maintenance simulation revealed that training via the simulation was half as expensive as training on the real equipment (Cicchinelli, Harmon, Keller, & Kottenstette, 1980, as cited in Hays, 2006). The cost savings can be especially appreciated when low cost simulations, such as personal computer (PC) based simulations, are used; these have also been found to provide training value (Baker, Prince, Shrestha, Oser, & Salas, 1993; Brannick, Prince, & Salas, 2005; Brannick, Prince, Salas, & Stout, 1995; Jentsch & Bowers, 1998; Stout, Salas, Merket, & Bowers, 1998).

Because of the widespread acceptance and use of simulations, especially in domains characterized by stress and dynamic task conditions, research has also been conducted on how to most effectively use simulations to train complex real world tasks and higher level cognitive and teamwork skills (these skills are explained below). Unfortunately, however, the field of practice is replete with examples of training designers, developers, and implementers ignoring much of the guidance that has resulted from this research. Perhaps one of the challenges facing these training personnel is that it is difficult to find in the literature a set of clear, easy-to-use guidelines for training higher level cognitive and teamwork skills. The current chapter attempts to compile such a set of guidelines. The guidelines are not empirically based; they were instead derived from suggestions in the literature, as well as from the current authors' practical experience in the field.

It should be noted that the current authors assume that these guidelines will be used to train individuals and teams to perform in complex, dynamic situations. As described by Cannon-Bowers and Salas (1998, p. 19), these situations are characterized by the following conditions:

• Multiple information sources,
• Incomplete, conflicting information,
• Rapidly changing, evolving scenarios,
• Adverse physical conditions,
• Performance pressure,
• Time pressure,
• High work/information load,
• Auditory overload/interference, and
• Threat.
It is particularly under these types of conditions that higher level cognitive skills and teamwork skills are needed; therefore, the guidelines in this chapter focus on using simulations to develop these types of skills. Before proceeding to explain the organization of the chapter, it is worth taking a moment to explain what is meant by higher level cognitive skills versus psychomotor skills and to differentiate between teamwork skills and "task work" skills. Each will be addressed in turn.
Predominantly, three "domains" or areas of competencies have been discussed in the literature: cognitive, psychomotor, and affective. (The affective domain concerns attitudes and motivation and will not be addressed here.) Within the cognitive domain, Bloom, Engelhart, Furst, Hill, and Krathwohl (1956) derived a "taxonomy" or categorization of competencies, from lowest level to highest level, as follows: knowledge, comprehension, application, analysis, synthesis, and evaluation. An example of each of these follows:

1. Knowledge: able to list the major components of a diesel engine;

2. Comprehension: able to understand the meaning of nonliteral statements (for example, metaphor and irony);

3. Application: able to predict the angle of bank of the aircraft given a specific airspeed, tailwind, and turn ratio;

4. Analysis: able to distinguish facts from hypotheses;

5. Synthesis: able to propose ways to test a hypothesis;

6. Evaluation: able to indicate logical fallacies in arguments.
In contrast, a taxonomy for the psychomotor domain was developed by Simpson (1972), as follows, from least complex to most complex: perception, set, guided response, mechanism, complex overt response, adaptation, and origination. An example of each of these follows:

1. Perception: able to recognize the problem in a failing air conditioner based on the sound it makes while running;

2. Set: able to position hands preparatory to typing;

3. Guided response: able to make appropriate hand signals to wave off aircraft as demonstrated;

4. Mechanism: able to start a fire with sticks;

5. Complex overt response: able to operate a particular weapons system;

6. Adaptation: able to pump car brakes to stop on ice (when not specifically taught);

7. Origination: able to create a more efficient method to disassemble a complex piece of machinery.
The specifics of these taxonomies are not important. Rather, the important point is that psychomotor tasks absolutely have a cognitive component (hence the “psycho” in “psychomotor”), but the focus is on doing a motor task. Cognitive skills go beyond just being able to carry out the motor portion of the task. They involve things that trainees must think about, such as assessing a situation, problem solving, decision making, and adaptability. Regarding the difference between task work skills and teamwork skills, Cannon-Bowers, Tannenbaum, Salas, and Volpe (1995) and Smith-Jentsch, Johnston, and Payne (1998) made the following distinction. They explained that task work skills relate to the skills required to perform the job that are specific to the position (such as skills at interpreting a radar display) and are usually
technical in nature (although they may include perceptual and cognitive components, not just motor components, as in the example given here). On the other hand, teamwork skills relate to the skills required to coordinate activities with other team members to perform the mission.

Based upon the distinction between cognitive and psychomotor skills and then between task work and teamwork skills, the current chapter treats training individual technical skills as different from training teamwork skills. It considers individual technical skills to contain both cognitive and psychomotor components. Training teamwork skills, by contrast, focuses on the team coordination requirements needed for performance. When discussing guidelines on using simulations to train teamwork skills, the chapter therefore does not address training individual technical skills. Furthermore, when discussing guidelines on using simulations to train higher level cognitive skills, it focuses on developing such processes as situation assessment, problem solving, decision making, and adaptability rather than lower level psychomotor or cognitive processes.

GUIDELINES FOR USING SIMULATIONS TO TRAIN HIGHER LEVEL COGNITIVE SKILLS THAT ARE NOT SPECIFIC TO TEAMWORK SKILLS (BUT ARE ALL RELEVANT TO TEAMWORK SKILLS)

Scenario and Training Environment

Embed triggers or events/opportunities for trainees to practice and receive feedback on critical tasks and competencies associated with learning objectives (Oser, Cannon-Bowers, Salas, & Dwyer, 1999; Stout et al., 1998).

• They allow trainees to demonstrate their proficiencies and deficiencies for the purpose of performance measurement, diagnosis, and feedback (Oser et al., 1999).

• Include a number of triggers for each learning objective that vary in difficulty and occur at different points in the scenario (Oser et al., 1999; Prince, Oser, Salas, & Woodruff, 1993).
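As a toy illustration of the guideline above, the sketch below scatters several triggers per learning objective across a scenario timeline, varying their difficulty. The scheduling scheme, field names, and objectives are our own illustrative assumptions, not part of the cited guidance.

```python
import random

def script_triggers(objectives, scenario_length_s, per_objective=3):
    """Place several triggers per learning objective at varied times and
    difficulties across one scenario (purely illustrative scheme)."""
    difficulties = ("easy", "moderate", "hard")
    script = []
    for objective in objectives:
        times = sorted(random.uniform(0, scenario_length_s)
                       for _ in range(per_objective))
        for t, difficulty in zip(times, difficulties):
            script.append({"time_s": round(t, 1),
                           "objective": objective,
                           "difficulty": difficulty})
    return sorted(script, key=lambda event: event["time_s"])

# A 10 minute scenario exercising two hypothetical learning objectives.
for event in script_triggers(["situation assessment", "communication"], 600):
    print(event)
```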
Functional fidelity, or "how the simulation works or provides the necessary information to support the task" (Hays, 2006, p. 245), including relevant context cues, trumps physical fidelity (Beaubien & Baker, 2004; Hays, 2006; Johnston, Poirier, & Smith-Jentsch, 1998; Ross, Phillips, Klein, & Cohn, 2005; Yusko & Goldstein, 1997), or "how the simulation looks; the physical characteristics of the simulation" (Hays, 2006, p. 245).

• Others have used the term psychological fidelity and indicate that it is more important than physical fidelity. For example, Beaubien and Baker (2004) defined psychological fidelity as "the degree to which the trainee perceives the simulation to be a believable surrogate for the trained task. Alternatively, it could be defined as the match between the trainee's performance in the simulator and the real world. For example, a PC based flight simulator could be defined as high in psychological fidelity if the trainees temporarily suspend disbelief and interact as much as they would in the real world" (p. 52). They further noted that, without the temporary suspension of disbelief,
trainees are unlikely to behave as they would in the real world and that psychological fidelity can be maximized by developing scenarios that mimic the task demands of the real system.
A certain degree of physical or equipment fidelity is obviously needed to induce psychological fidelity, to incorporate relevant cognitive cues, and to enhance transfer of training (such as a noisy helicopter cockpit), but irrelevant physical details do not add to the training experience and may indeed detract from it (as discussed later in the chapter).
“Consider all matters that may be important to creating the illusion of reality. This includes wearing uniforms, gloves, and equipment” (Prince et al., 1993, p. 75) required in the real world. These should particularly be used if they impose constraints in the real world, such as the weight of a pilot’s helmet or survival gear (Prince et al., 1993).
• Moreover, higher physical fidelity is not necessarily better—what is important is to accurately represent cognitive cues (Ross et al., 2005).
“Although there is a tendency to believe that more fidelity is always better, the published research does not support this conclusion. Specifically, we were unable to identify any studies that found a direct correlation between the level of simulation fidelity and training related outcomes, such as learning, transfer, and safety. Like any other tool, the effectiveness of simulation technology depends on how it is used.” (Beaubien & Baker, 2004, p. 55).
Furthermore, Salas et al. (1998) cited four studies that found that simulations with greater scene detail and/or greater scene variety had little to no effect on trainee performance. They concluded that those funding the development of simulations must emphasize learning instead of technology.
“The context must be authentic in relationship to how practitioners experience and act in real-life settings. Building a context to support authentic domain experience is not the same thing as simulating physical fidelity. Reproducing billowing smoke, elegantly drawn leaves on trees, or precise shadows is artistically rewarding, but irrelevant if those elements are not used in assessments or decisions typical of that situation. Meanwhile, failing to represent a tiny pile of freshly overturned dirt indicating that the nearby entrance of a cave has been disturbed can interfere with an authentic cognitive experience.” (Ross et al., 2005, p. 22).
• The level of fidelity that is required is based upon the learning objectives.
Hays (2006) provided the following example: “If a trainee is learning to fly a plane in a simulated cockpit, and the task is to fly at a specific altitude, then the simulated altimeter must be represented in enough detail that the learner can read the altitude. On the other hand, if the training task is only to locate the display next to the altitude indicator, then the simulation does not need to include a functioning altimeter” (p. 246).
Add features to increase the user’s acceptance of the simulation and motivation to use it.
• Have instructors espouse the simulation’s usefulness (Hays, 2006).
Trainees believed that training would be more useful (Cohen, 1990, as cited in Salas, Rhodenizer, & Bowers, 2000) and exerted more effort to transfer what they
learned in training (Huczynski & Louis, 1980, as cited in Salas et al., 2000) when supervisors supported their attending the training.
Performance was enhanced when supervisors participated in goal setting prior to training (Magjuka, Baldwin, & Loher, 2000, as cited in Salas et al., 2000).
• Use maps, charts, checklists, other documentation (for example, approach plates and NOTAMs—notices to airmen), and relevant peripherals (for example, headsets or headphones; a yoke or joystick; a simulated box for changing radio frequencies) just as they would be used in the real world (Prince et al., 1993; Stout et al., 1998).
Provide trainees with all of the background information that they would have in the real world prior to the start of the simulation (Yusko & Goldstein, 1997), and allow them adequate time to review this information.

Provide multiple and varied scenarios to help trainees generalize their competencies and adapt to novel situations, because training cannot possibly cover every existing and potential future situation (Ross et al., 2005).
• “Scenarios should allow participants to undergo different courses of action” (Salas et al., 2000, p. 508).
• “Scenarios should allow participants to perform the desired behaviors on multiple occasions” (Salas et al., 2000, p. 508).
• Avoid using only exemplar, clear scenarios; instead include many exceptions and variations (Ross et al., 2005).
• Incorporate into scenarios situations that have more than one right answer (Oser et al., 1999; Prince et al., 1993), especially avoiding one obvious right answer (Prince et al., 1993; Yusko & Goldstein, 1997).
Also, incorporate scenarios “that have several specific and sensible avenues for a solution” (Swezey & Salas, 1992, p. 235) and require trainees to demonstrate the ability to perform tasks using different approaches (Swezey & Salas, 1992).
• Incorporate some scenarios that prohibit (with realistic causes) common ways of accomplishing the task to force trainees to find alternatives and elaborate their understanding of the task situation (Ross et al., 2005).
Also, incorporate some scenarios that are designed to “go wrong” at certain points to allow trainees to confront and resolve anomalies (Swezey & Salas, 1992).
• In some scenarios, incorporate conflicting goals or rules, for example, an attacker using a civilian as a shield when given the rules of engagement of (1) do not injure innocent civilians and (2) defend yourself against an attacker (Ross et al., 2005).
Incorporate into scenarios situations with information that is conflicting, ambiguous, incomplete, or incorrect (Oser et al., 1999), so that they emulate the degree of “chaos” found in the real world (Yusko & Goldstein, 1997).
Ensure that scenarios are sufficiently challenging, yet not too challenging (Hays, 2006; Prince et al., 1993).
• Scenarios should be “just beyond the trainee’s current level of competence” (Kozlowski, 1998, p. 128).
• Increase complexity/challenge as trainees advance (Hays, 2006).
• “Experience using simulation for training . . . has demonstrated the value of providing easier simulations in the beginning, so that a single skill or two may be practiced before having to integrate all the skills into a dynamic situation” (Prince et al., 1993, p. 74).
Trainees are better able to adapt to simple conditions than to complex, dynamic ones (Swezey & Salas, 1992).
• Avoid artificially simplifying the simulation of the operational environment, which can give the trainee an incorrect impression of the domain (Ross et al., 2005). Oversimplifying concepts, such as those in the medical domain, leads to the development of misconceptions and can impede further learning (Feltovich, Spiro, & Coulson, 1989, 1993).
Appropriately “chunk” or organize information so that it does not overwhelm the trainee and can be processed more effectively (Chase & Simon, 1973; Miller, 1956).
• Using a larger scenario from which multiple “mini scenarios” or vignettes are drawn can help to retain the full complexity of the real environment without oversimplifying concepts and without overwhelming the trainee (Ross et al., 2005).
• For novices and beginners, “basic knowledge should be utilized in focused problem sets within a small, but rich setting characteristic of the practice domain” (Ross et al., 2005).
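The chunking and progressive-difficulty guidelines lend themselves to simple tooling. The sketch below is our illustration only (the vignette names and competence scale are assumptions); it draws vignettes from a larger master scenario and selects the next one just beyond the trainee’s current level of competence (Kozlowski, 1998).

    from dataclasses import dataclass

    @dataclass
    class Vignette:
        name: str
        difficulty: int  # 1 (single-skill drill) through 5 (full dynamic situation)

    # Mini scenarios "chunked" out of one larger, fully complex master scenario
    master_scenario = [
        Vignette("read back amended clearance", 1),
        Vignette("divert around weather cell", 2),
        Vignette("icing plus passenger distraction", 3),
        Vignette("icing, distraction, and radio failure", 5),
    ]

    def next_vignette(competence):
        """Pick the easiest vignette just beyond the current competence level."""
        harder = [v for v in master_scenario if v.difficulty > competence]
        return min(harder, key=lambda v: v.difficulty) if harder else master_scenario[-1]

    print(next_vignette(competence=2).name)  # -> icing plus passenger distraction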
Incorporate novelty, new information, surprises, and/or turns of events (Hays, 2006; Ross et al., 2005; Yusko & Goldstein, 1997).
• “Scenarios should introduce surprises during the execution of missions to provide practice in rapidly responding to the changed situation” (Ross et al., 2005, p. 57).
“For example, friendly units could become unable to perform (e.g., because they cannot reach their intended position, or because a weapon system breaks down); the enemy could move in a nontraditional way or bring a larger force than was reported by intelligence or reconnaissance; key roads could be too muddy to traverse or blocked by refugees demanding assistance; or, higher headquarters could deliver a new frag order based on an opportunistic target or other change in the situation” (Ross et al., 2005, p. 57).
• Baffling events and unmistakable anomalies can help trainees unlearn misconceptions (Ross et al., 2005).
• Add meaningful distractions, such as introducing an emergency when trainees are busy working on a required procedure—this forces them to assign priorities to tasks and to monitor their progress to ensure completion (Prince et al., 1993).
Eliminate meaningless distractions, such as errors in underlying simulation models (Hays, Stout, & Ryan-Jones, 2005).
Design the scenario such that effective trainees will do well and avoid no-win scenarios (Yusko & Goldstein, 1997). Similarly, do not include any tricks in scenarios (Prince et al., 1993).
Instructional Strategy

Avoid free play/discovery learning (Oser et al., 1999).
• Unsupported practice can leave opportunities to train and receive feedback on critical competencies to chance (Oser et al., 1999).
• Provide instructional support; without it, the training may not only be ineffective but may be detrimental to learning (Hays, 2006).
The problems with discovery learning may center on difficulties that learners have in forming and testing hypotheses when this strategy is used (de Jong & van Joolingen, 1998, as cited in Hays, 2006).
Do not short-cut training time (Ross et al., 2005).
• “Insufficient time or inappropriate structure for exploring a domain may result in deficient or easily forgotten networks of concepts and principles that represent key phenomena and interrelationships in a domain” (Ross et al., 2005, p. 29).
Simulation exercises should be part of a larger instructional program (Hays, 2006; Prince et al., 1993).
• “Design scenarios as part of a total training program” (Prince et al., 1993, p. 74).
• Characteristics of the simulation are not as important as the design of the training program that uses the simulation (Caro, 1973, as cited in Hays, 2006).
• “High fidelity simulations can enhance the perceived realism of well designed training programmes, but cannot compensate for poorly designed ones” (Beaubien & Baker, 2004, p. 55).
• Providing information, followed by demonstration and then practice and feedback (via simulation) on required competencies, is a recommended strategy (Prince et al., 1993; Serfaty, Entin, & Johnston, 1998).
Use whole- versus part-task trainers appropriately (Beaubien & Baker, 2004).
• Part-task trainers can take many forms, but all essentially segment a complex task into its main components (Beaubien & Baker, 2004).
• Use of part-task trainers is less costly and can help trainees develop lower level technical skills such that training on the highest level skills can be reserved for whole-task, full-mission simulations (Beaubien & Baker, 2004; Kirlik, Fisk, Walker, & Rothrock, 1998).
Premature use of whole-task trainers may overwhelm trainees with environmental distractions, stress, and time pressure (Beaubien & Baker, 2004).
On the other hand, “Part-task training must be supplemented with additional full-task training to provide the trainee with an opportunity to integrate part-task skills with cognitive activities required by the full-task context” (Kirlik et al., 1998, p. 111).
• When using part-task trainers, ensure that scenarios do not go beyond their reach (for example, if using a PC based flight simulator not capable of simulating equipment
malfunctions, do not include learning objectives on equipment-specific troubleshooting; Prince et al., 1993).
Encourage mental simulation during the performance of some scenarios (Driskell & Johnston, 1998) to predict events and courses of action (Ross et al., 2005) (see also “Feedback” later in this chapter; the key is that mental simulation can be used during the performance of scenarios, during within-scenario feedback sessions, and during post-session debriefs).
• Help trainees to “stop and think” about their processes (Ross et al., 2005).
• “Trainees should be taught mental simulation as a way to improve skills and to evaluate and develop options for decisions that must be made in time-pressured environments” (Kozlowski, 1998, p. 129).
Help trainees to develop their “metacognitive” skills (that is, to be more aware of their own thinking processes and what they do and do not understand) (Salas et al., 2000).
• Junior first officers (that is, pilots) provided with metacognitive training were better at providing backup support (Jentsch, 1997, as cited in Salas et al., 2000).
Aim for “overlearning” of technical skills (Driskell & Johnston, 1998; Schendel & Hagman, 1982; Hagman & Rose, 1983) to free up attentional resources for higher level cognitive tasks.
• Overlearning is “a pedagogical concept according to which newly acquired skills should be practiced well beyond the point of initial mastery, leading to automaticity” (Wikipedia, 2007b).
“Pedagogical” simply means pertaining to a way of teaching or instructing.
“Automaticity” means that the task can be performed without conscious, effortful use of attentional resources (for example, one can chew gum and walk; one can drive a car and talk).
• Part-task trainers may be particularly suited to achieve overlearning (Beaubien & Baker, 2004; Kirlik et al., 1998).
Incorporate “scaffolding” into some scenarios (Beaubien & Baker, 2004).
• Various authors have defined this instructional practice in different ways, but most refer to decreasing instructional support as trainees advance. For this guideline, the current authors use scaffolding as described by Beaubien and Baker (2004), where instructors, facilitators, or role-players take over some of the simulated task requirements initially and, over time, gradually withdraw from the task.
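As a rough illustration of scaffolding in the Beaubien and Baker (2004) sense, the sketch below has a role-player absorb some of the simulated task requirements early and hand them back across sessions. The task list and the linear withdrawal schedule are assumptions for illustration only.

    # Task requirements a role-player can absorb early in training,
    # ordered from first-absorbed to first-returned
    scaffolded_tasks = ["radio calls", "checklist reading", "navigation log"]

    def roleplayer_tasks(session, total_sessions):
        """Tasks the role-player still performs; support tapers linearly to zero."""
        remaining = len(scaffolded_tasks) * (total_sessions - session) // total_sessions
        return scaffolded_tasks[:remaining]

    for s in range(4):
        print(s, roleplayer_tasks(s, total_sessions=4))
    # session 0 -> all three tasks; session 3 -> [] (trainee performs everything)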
Incorporate competition (Hays, 2006).
• Competition can be with a live opponent, against a computer-controlled opponent, or against a criterion score (Hays, 2006).
Performance Measurement/Assessment

Process measurement trumps outcome measurement (see “Feedback” later in the chapter for more details).
• “Develop performance diagnosis strategies and tools that will enable observers to identify deviations between observed and desired performance trends” (Oser et al., 1999, p. 199).
Have a measurement scheme delineated a priori (that is, in advance; Salas et al., 2000).
• Both processes and outcomes should be identified in advance (Salas et al., 2000).
Use multiple measures to obtain a more accurate representation of performance (Cannon-Bowers & Salas, 1997).

When possible, have a dedicated observer/rater (Smith-Jentsch, Zeisig, Acton, & McPherson, 1998).

Measure not only what trainees do, but also what they think about, and evaluate the degree of “goodness” of their thinking versus whether it is right/wrong (Ross et al., 2005).

Use tools to assist in capturing performance data (Cannon-Bowers, Burns, Salas, & Pruitt, 1998).

Feedback Delivery1

1 The reader should also see Lampton, Martin, Meliza, and Goldberg (Volume 2, Section 2, Chapter 14) and Riley, Kaber, Sheik-Nainar, and Endsley (Volume 2, Section 2, Chapter 16) of the handbook for further guidance on feedback delivery and design and use of after action review (AAR) systems, respectively.

Have a feedback presentation scheme delineated a priori.

Process feedback trumps outcome feedback (or knowledge of results)—provide process feedback linked to outcomes (Oser et al., 1999; Ross et al., 2005; Salas et al., 2000; Smith-Jentsch, Johnston, et al., 1998; Yusko & Goldstein, 1997).
• In a nutshell, outcome measures answer whether the correct decision was made or the correct action was taken (for example, did the bomb hit the target?), while process measures answer whether or not the decision was made correctly or actions were taken correctly (for example, what were the steps, communications, and so forth, that led to the bomb hitting or missing the target?). “Making the right decision does not mean the right process was used to arrive at the decision. Therefore, feedback concerning the team processes will be more diagnostic in determining where a team’s weaknesses may be and more useful in helping the team to correct performance errors” (Salas et al., 2000, p. 495). (Note this quote is included here because the principle is relevant to individual performance as well.) Furthermore, aspects of the situation may be out of the trainees’ control, so the right processes may be utilized yet success may not have been achieved (Yusko &
Goldstein, 1997). Providing negative feedback regarding the negative outcome would do little to change the situation.
• “Measurement of processes is critical for diagnosing specific deficiencies associated with how a given outcome was reached” (Oser et al., 1999, p. 186).
• Process measures are critical to providing feedback for purposes of training, and outcome measures are needed to identify which processes are more effective (Smith-Jentsch, Johnston, et al., 1998).
• Help trainees to accurately diagnose their limitations and reasons for poor performance and to develop self-assessment skills (Ross et al., 2005).
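One way to delineate a measurement scheme a priori, with process measures explicitly linked to the outcome they explain so that feedback can be diagnostic, is sketched below. The bombing-run example echoes the text above, but the specific measures are our own assumptions.

    # Defined before the scenario runs: the outcome plus the process
    # measures that explain how that outcome was reached
    measurement_scheme = {
        "outcome": {"bomb hit target": False},
        "process": {
            "target coordinates verified": False,
            "release checklist completed in order": True,
            "weapons-release call communicated": False,
        },
    }

    def diagnose(scheme):
        """Process feedback linked to outcomes: name the steps behind a hit or miss."""
        deficiencies = [p for p, ok in scheme["process"].items() if not ok]
        return {"outcome_achieved": scheme["outcome"]["bomb hit target"],
                "process_deficiencies": deficiencies}

    print(diagnose(measurement_scheme))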
Train facilitators to provide effective feedback and coaching (Smith-Jentsch, Zeisig, et al., 1998; Yusko & Goldstein, 1997). (See also other types of training to provide to instructors under “Use of Instructors and Instructor Training” later in this chapter.)
• “Coaching is a complex and difficult skill to master and also one of the most important . . . Although most people are not born great coaches, they can improve dramatically through systematic training and practice” (Yusko & Goldstein, 1997, p. 222).
• Active practice at giving feedback is necessary to impart skills at doing so (Smith-Jentsch, Zeisig, et al., 1998).
• Facilitators/coaches should do the following:
Create an open and trusting climate (Smith-Jentsch, Zeisig, et al., 1998; Yusko & Goldstein, 1997).
Show empathy, yet remain accurate and truthful (Yusko & Goldstein, 1997).
Encourage input from the trainees (Oser et al., 1999; Ross et al., 2005; SmithJentsch, Zeisig, et al., 1998).
Pause and make eye contact with trainees after asking for input (Smith-Jentsch, Zeisig, et al., 1998).
Reserve their input for times when trainees do not respond or to clarify issues or to elaborate upon them (Smith-Jentsch, Zeisig, et al., 1998).
Reinforce trainee participation in the feedback process (Oser et al., 1999; Ross et al., 2005; Smith-Jentsch, Zeisig, et al., 1998).
Focus on specific behavioral versus person-oriented feedback (with both positive and negative behavioral examples; Prince et al., 1993; Smith-Jentsch, Johnston, et al., 1998; Smith-Jentsch, Zeisig, et al., 1998; Swezey & Salas, 1992).
Follow up on behavioral examples provided by trainees by asking them how these affected or could have affected performance outcomes (Smith-Jentsch, Zeisig, et al., 1998).
Refrain from being judgmental (Yusko & Goldstein, 1997).
Emphasize that the feedback is just from their perspectives and not necessarily “objective reality” (Yusko & Goldstein, 1997).
A picture is worth a thousand words, so use video and/or audio playback (especially to capture communications) and display playback (Oser et al., 1999; Stout et al., 1998; Swezey & Salas, 1992).
• “Training technologies can show the trainees that they do not fully understand what is important by illustrating the consequences of focusing on the wrong part of the situation” (Ross et al., 2005, p. 47).
Use both during-scenario feedback and AARs (Ross et al., 2005).
• Trainees benefit from feedback that is continual versus present only at the end of a session (Ross et al., 2005). Semistructured “time-outs” during the execution of some scenarios would be beneficial, encouraging trainees to discuss their current interpretations of the situation, to mentally simulate how it will play out, and to predict the courses of action that would have the most desirable consequences. Likewise, post-session debriefs or after action reviews should encourage trainees to discuss their interpretations of the situation at various points in the scenario and how various courses of action either supported or failed to support the goals of the mission (Ross et al., 2005).
• System-initiated advice given during a computer based simulation helped individuals learn domain related concepts (Leutner, 1993).
• After observing simulated shipboard combat information center (CIC) exercises, Kirlik et al. (1998) concluded that post-scenario debriefing appeared to come too late for the trainee to benefit from the information.
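A toy version of system-initiated, during-scenario advice in the spirit of Leutner (1993) might compare a trainee’s declared situation assessment with the simulation’s ground truth at a scripted time-out and comment only on divergences. Everything here (cue names, message wording) is our own illustration, not drawn from the cited study.

    # Ground truth the simulation knows at a scripted "time-out"
    ground_truth = {"contact_hostile": True, "fuel_state": "low"}

    def advise(trainee_assessment):
        """Offer advice only where the trainee's assessment diverges from ground truth."""
        advice = []
        for cue, actual in ground_truth.items():
            if trainee_assessment.get(cue) != actual:
                advice.append(f"Re-examine {cue}: the available cues indicate {actual}.")
        return advice

    print(advise({"contact_hostile": False, "fuel_state": "low"}))
    # -> ["Re-examine contact_hostile: the available cues indicate True."]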
Help trainees to understand when their knowledge structures are incomplete (Hays, 2006).
• Preconceptions should be made clear (Feltovich et al., 1993).
• Instructors should identify and prevent misconceptions early in training and guide trainees to correct their own misconceptions later in training (Kozlowski, 1998).
• Allow time during feedback to focus on disconfirming information to help trainees unlearn misconceptions (Ross et al., 2005).
• Allow trainees to “tell the story,” helping them elaborate what is correct and unlearn what is not (Ross et al., 2005).
Use queries to explore the trainees’ thinking (Ross et al., 2005).
• Model expert thinking by walking through the facts and correcting mistaken assumptions; “paint a picture” of the situation by walking through the facts and noting how decisions evolve and what their consequences are (Ross et al., 2005).
When behaviors are modeled by the instructors/facilitators, provide trainees with specific instructions on what to watch for (Jentsch, Bowers, & Salas, 2001).
• Keep in mind that research has shown that negative (incorrect) behaviors are better recognized by trainees and that behaviors are recognized more readily when their consequences are shown (Jentsch et al., 2001).
• Also, keep in mind that research has shown that trainees are better able to generalize modeled behavior to a transfer setting if they viewed both negative and positive behaviors than if they viewed just positive behaviors (Baldwin, 1992, as cited in Salas et al., 2000).
• When instructors are not available or when otherwise appropriate, use technologies to help learners compare their understanding and handling of the situation with that of experts (Ross et al., 2005).
Encourage mental simulations to reflect on one’s performance in the scenario (Ross et al., 2005).
• Help trainees to identify what parameters of different scenarios have been changed and to identify what exceptions and variations occurred (and their impact on situation assessment and choice of courses of action) (Ross et al., 2005).
• Help trainees to reflect upon the situation and to revisit it from multiple viewpoints (Ross et al., 2005).
• Help trainees to self-examine their performances by asking and answering queries such as “what if . . . ?” or “how else can I . . . ?” (Ross et al., 2005).
Taper feedback as trainees progress (Ross et al., 2005).
• The teacher’s job is “to hold the learners in their zone of proximal development by providing just enough help and guidance, but not too much” (Perkins, 1992, as cited in Ross et al., 2005, p. 30).
• “At the novice level, instructors or coaches or mentors are necessary to guide and direct the learning process, more so than at the later stages” (Ross et al., 2005, p. 54).
Use tools to organize the feedback session (Cannon-Bowers, Burns, et al., 1998).

Use of Instructors and Instructor Training

More is needed to instruct effectively than just a thorough knowledge of the domain (Ross et al., 2005; Roth, 1998).
• “Job performance does not impute instructional competence” (Roth, 1998, p. 362).
• “At this time, the best facilitator is a person who knows the field and understands how to help people perceive situations and reflect at different stages of competency” (Ross et al., 2005, p. 26).
• A good instructor is the primary factor in creating an authentic training experience (Ross et al., 2005).
Provide training to instructors on the conduct of scenario exercises (Oser et al., 1999), in addition to training on delivering feedback during or after the exercise, as discussed earlier in this chapter under “Feedback Delivery.” This training should focus on helping them to do the following:
• Embed triggers or opportunities for the trainees to demonstrate key competencies (Oser et al., 1999).
• Observe and record key behaviors or rate performance with the performance measurement instruments (Oser et al., 1999; Prince et al., 1993).
• Monitor scenario progress relative to a plan (Oser et al., 1999).
• Adapt to unexpected situations and unexpected trainee responses in a realistic manner (Oser et al., 1999).
• Control the scenario such that these control functions are transparent to the trainees (Oser et al., 1999).
• Use the simulation equipment, including equipment that facilitates instructor control (Oser et al., 1999) (see also “scenario management plan” under “Logistical Issues” later in this chapter).
When training observers/raters, have them practice not only with extremes (that is, clearly good performance and clearly poor performance), but also with instances in between (Stout, Prince, Salas, & Brannick, 1995; Yusko & Goldstein, 1997).
• Rater training should ensure that there are acceptable levels of agreement among raters (Stout et al., 1995; Yusko & Goldstein, 1997).
• Training should also be provided to role-players before they interact with the actual trainees (Yusko & Goldstein, 1997).
Training provided to role-players should include explaining that they should stay in their roles versus entering unrealistically into deliberations of the trainees (Prince et al., 1993).
Use only facilitators who are credible to the training audience (Prince et al., 1993) (for example, if the audience is surgeons, they are not likely to accept feedback on surgical decisions from a nurse).

Use apprentice level mentors versus true experts, because the latter have forgotten what tends to confuse trainees.
• True experts also have difficulty verbalizing or articulating what they know, because their knowledge has become very deeply ingrained (Ross et al., 2005).
Logistical Issues

Try out scenarios prior to implementation to test for problems (Oser et al., 1999), to ensure that they are sufficiently challenging, yet not too challenging or unnecessarily complex (Prince et al., 1993), and to establish realism (Johnston et al., 1998; Stout et al., 1998).
• Try out the scenarios with several different trainees who represent the training audience (Prince et al., 1993).
Before the pilot test, scenario designers should themselves try out the scenarios (Prince et al., 1993).
• Do not rely just on expert opinion regarding how challenging a scenario is—look at trainee performance during the pilot test (for example, the current authors worked with experts in the development of a scenario that the experts felt would be far too challenging for their trainees, only to discover that it was in fact far too easy for the five trainees in the pilot test).
• Expect several iterations of the scenario before getting the “right” one to use with the training audience, because the construction of scenarios is not an exact science (Prince et al., 1993).
• Look for trainee actions that would take them off course of the scenario and find realistic ways to bring them back in (Stout et al., 1998).
“Controllers must be capable of modifying a scenario in real time in response to training audience decisions and performance . . . and . . . for ensuring continuity and realism” (Oser et al., 1999, p. 190).
“Use scenario control and management techniques that do not prevent the training audience from making decisions in a natural manner and that are transparent to the training audience” (Oser et al., 1999, p. 200).
• Look for common mistakes made by trainees to prepare facilitators or designated observers to capture them in their performance measurement and feedback schemes.
Allow adequate time to practice using the simulation prior to the first training session (Stout et al., 1998).

Identify all of the organizational resources that will be necessary for conducting the scenarios as early as possible (Prince et al., 1993).
• For example, if air traffic control (ATC) needs to be included in the scenario and there are no qualified individuals to play this role, some ATC messages can be prerecorded (Prince et al., 1993).
Prebrief simulation/scenario participants, including the trainees and facilitators (including role-players, controllers, and observers; Oser et al., 1999; Prince et al., 1993). • Explain each of the following:
The purpose, focus, and objectives of the scenario (Oser et al., 1999);
The scenario schedule (Oser et al., 1999);
Rules for the scenario (Oser et al., 1999);
Scenario flow (Oser et al., 1999);
Participant responsibilities (Oser et al., 1999);
Any simulation-specific limitations that may impact performance (Oser et al., 1999; Stout et al., 1998).
• When possible, incorporate simulation-specific limitations into the scenario (for example, for a flight simulator, minor simulator malfunctions can be placarded by “maintenance,” just as they would be in the real world; Prince et al., 1993).
“Develop and implement a ‘scenario management plan’ for control of the scenario” (Oser et al., 1999).
• “Identify the requirements for and roles of scenario control personnel for overall scenario management (e.g., senior controller, senior role-player, senior observer). The scenario management plan should include: a) clear procedures for beginning and ending the scenario, b) contingency plans to follow in case of unexpected events (e.g., communication problems, simulation problems), c) the flow of the scenario, and d) clear procedures for control of the scenario. During the scenario, progress needs to be monitored relative to the management plan” (Oser et al., 1999, p. 198).
• When the facilitator must time the introduction of a fault or provide a prompt, tie the timing to information that is easy for the facilitator to note, to maintain scenario consistency (Prince et al., 1993). For example, “if it is easier for the facilitator to see and keep track of the distance than the time, then his or her planned interventions should be based on distance measuring equipment rather than time” (p. 80).
An example prompt that may be needed was given by Prince et al. (1993): “For example, if it is not possible to land at the briefed airfield and an alternate must be selected, a crew may not recognize the need to make that decision in a predetermined reasonable amount of time. The facilitator, acting as controller, can then prompt them to state their intentions” (p. 80).
• Script the role of the facilitator to the extent possible to guard against casual redesign of the scenario (Prince et al., 1993).
Creation of subscenarios can keep the facilitator from having to depart from the script when trainees take an unusual action that affects the rest of the scenario in an adverse way (Sherwin, 1981, as cited in Prince et al., 1993).
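The scenario management plan and scripted facilitator role described above can be captured in data rather than prose so that interventions stay consistent across runs. In this sketch the planned interventions are keyed to distance measuring equipment (DME) readings rather than clock time, as Prince et al. (1993) suggest; the events and distances are invented for illustration.

    # Scripted facilitator interventions keyed to DME distance, not elapsed time
    script = [
        {"dme_nm": 40.0, "action": "inject fault", "detail": "alternate airfield closes"},
        {"dme_nm": 25.0, "action": "prompt", "detail": "ATC asks crew to state intentions"},
    ]

    def due_interventions(current_dme_nm, fired):
        """Return not-yet-fired interventions whose DME threshold has been crossed."""
        due = []
        for i, event in enumerate(script):
            if current_dme_nm <= event["dme_nm"] and i not in fired:
                fired.add(i)
                due.append(event)
        return due

    fired = set()
    for event in due_interventions(24.0, fired):  # inbound, 24 nm from the field
        print(event["action"], "->", event["detail"])
    # both thresholds (40 nm and 25 nm) have been crossed, so both events fire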
Allow adequate time for a thorough debrief (Smith-Jentsch et al., 1998).

GUIDELINES FOR USING SIMULATIONS THAT ARE SPECIFIC TO TRAINING TEAMWORK SKILLS (AND ARE NOT APPLICABLE TO TRAINING INDIVIDUAL HIGHER LEVEL COGNITIVE SKILLS, UNLESS OTHERWISE NOTED)

Scenario/Training Environment

Scenarios should involve all team members (Salas et al., 2000; Swezey & Salas, 1992), either by their actual participation or through the use of role-players (Prince et al., 1993).

When relevant, incorporate into scenarios situations where multiple organizations must coordinate for effective performance (Oser et al., 1999).

Allow time for team members to conduct a premission brief (such as a preflight brief) or to plan as they would in the real world (Prince et al., 1993).

Ensure that communications are conducted and channeled as they would be in the real world (Prince et al., 1993). (Note this guideline is potentially applicable to training individual higher level cognitive skills.)
• Include any background noise in the communication system that would be present in the real world (Lauber, 1981, as cited in Prince et al., 1993).
• Include realistic interruptions (for example, receipt of ATC messages; communications from other aircraft or agencies) (Prince et al., 1993).
Instructional Strategy

Teach team leaders to effectively prebrief the scenario (Tannenbaum, Smith-Jentsch, & Behson, 1998).
• “Team leaders do not necessarily possess the skills required for conducting effective briefings—for example, they tend to over-rely on one-way communications” (Tannenbaum et al., 1998, p. 264).
• “Teams do not naturally conduct effective briefings—for example, they tend to gravitate toward discussing outcomes and task work skills to the exclusion of teamwork skills” (Tannenbaum et al., 1998, p. 264).
Team leaders can be trained to conduct more effective briefings, such as how to probe more and to guide the team to consider teamwork behaviors (Tannenbaum et al., 1998).
Cross-train team members (Blickensderfer, Cannon-Bowers, & Salas, 1998; Cannon-Bowers, Salas, Blickensderfer, & Bowers, 1998; Salas et al., 2000; Swezey & Salas, 1992).
• Expose trainees to their team members’ roles, responsibilities, and information needs (Salas et al., 2000).
• Train them on the tasks of other team members/how other members operate (Salas et al., 2000).
“Interdependencies among team members should be clarified” (Swezey & Salas, 1992, p. 227).
The team task analysis should drive what aspects of team member tasks should be focused upon in cross-training (Blickensderfer et al., 1998).
• Explain to trainees how their task performance is related to the overall goals of the team (Kozlowski, 1998).
• Use positional rotation to foster an understanding of other team members’ positions (Blickensderfer et al., 1998; Cannon-Bowers, Salas, et al., 1998; Salas et al., 2000).
• Cross-training can help team members to anticipate each other’s needs (Blickensderfer et al., 1998), which can potentially allow them to better monitor each other’s performance and provide more effective backup support.
Help trainees to “stop and think” about their team processes during the execution of some scenarios (Ilgen, Hollenbeck, Johnson, & Jundt, 2005).
• Teach team leaders to articulate periodic situation updates, including problems with assessments, such as missing, unreliable, or conflicting evidence (Cohen, Freeman, & Thompson, 1998).
• “Team training should include techniques for training individuals to analyze their own errors, to sense when the team or individual team members are overloaded, and to adjust their behavior when overloads occur” (Swezey & Salas, 1992, p. 233).
• “Every team member should be able to recognize unexpected events and to describe actions which he or she would expect to take when an unexpected event interferes with, or changes, the team’s purpose, structure, or dependency situation” (Swezey & Salas, 1992, p. 235).
Incorporate competition among different trainees/trainee teams.

Performance Measurement/Assessment

Have a team performance measurement scheme delineated a priori (that is, in advance) (Salas et al., 2000).
• “Team and individual process and outcome measures should be identified in advance” (Smith-Jentsch, Johnston, et al., 1998; Salas et al., 2000, p. 508).
• For example, Fowlkes, Lane, Salas, Franz, and Oser (1994) described a team performance measurement scheme that they and their colleagues at NAVAIR Orlando developed and applied to an aircrew coordination training (ACT) research and development program. This measurement scheme was named “TARGETs” for Targeted Acceptable Responses to Generated Events and Tasks. It followed the Synthetic Battlefield Authoring Tool (SBAT) approach: working with subject matter experts, the developers identified specific desirable behaviors for each scenario that was developed. The scheme focused not on right or wrong responses, but on what would be a better response for the crew to make based upon the situation at hand. When consensus could be obtained among the experts, the behavior was added as a metric. The behaviors were placed in checklist format such that observers could simply check whether or not each behavior was demonstrated. This methodology and measurement scheme has been applied to different types and levels of aviators, ranging from undergraduate naval aviators (Stout, Salas, & Fowlkes, 1997) to multiservice distributed teams (Dwyer, Oser, Salas, & Fowlkes, 1999). Table 13.1 provides an example of using the SBAT with the TARGETs measurement scheme within the T-44 undergraduate naval aviation community. The last column provides examples of specific targeted behaviors of interest in the scenario, and the column that precedes it shows the trigger event that was embedded into the scenario to elicit the behavior of interest.
• As another example, Smith-Jentsch, Zeisig, et al. (1998) described a performance measurement and feedback scheme that they and their colleagues, also at NAVAIR Orlando, developed and applied to a research and development program for training shipboard CIC teams, called “TADMUS” for Tactical Decision Making under Stress. This measurement and feedback scheme was named “TDT” for team dimensional training. This approach focused on training raters to observe different “dimensions” of teamwork (that is, information exchange, communication, supporting behavior, and initiative/leadership) and specific instances of behaviors that occurred in the conduct of a scenario that could be linked to predefined behavioral examples of the particular dimension. For example, under the dimension of information exchange, a predefined behavioral example was “providing periodic situation updates that summarize the big picture.” During the performance of a scenario, different specific behavioral examples of team members providing these status updates, or failing to do so when they should, would be collected. When practical, the approach used different observers for each dimension, who then pooled their responses at the end of the scenario and helped the facilitator to organize the debrief around the four dimensions. This approach used a concept of “guided team self-correction” in which a facilitator or a team leader helps team members to provide their own behavioral examples first and encourages all team members to do so (see also more on teaching team self-correction under “Feedback” later in this chapter).
Use tools to assist in capturing team performance data (Cannon-Bowers, Burns, et al., 1998). • For example, Cannon-Bowers, Burns, et al. (1998) and their colleagues used various tools to capture performance in the TADMUS research project, such as the Shipboard Mobile Aid for Training and Evaluation (ShipMATE). They described ShipMATE as a handheld computer that allowed observers to “1. Make written and spoken observations of trainee performance, 2. capture team communications and graphic displays
Table 13.1. Sample Use of SBAT with TARGETs Measurement Scheme

Task Analysis (applies to both rows): Review of literature on teams and team training; attendance of a variety of ACT courses; review of T-44 training curriculum; detailed interviews with T-44 instructors.

Targeted Competency: Communication
Generic Behavioral Component: Acknowledge communication (for example, OK, roger)
Training Objective: Trainee shall acknowledge communications in a scenario event involving icing conditions
Scenario Trigger Event: Ice buildup on the wings; passenger makes note of observation
Performance Measures (TARGETs): Acknowledge passenger who observed ice buildup

Targeted Competency: Situational Awareness
Generic Behavioral Components: Identify problems or potential problems; verbalize a course of action; demonstrate awareness of task
Training Objective: Trainee shall identify problems or potential problems in a simulation event involving icing conditions (same for verbalize course of action and demonstrate task awareness)
Scenario Trigger Event: Ice buildup on the wings; passenger makes note of observation
Performance Measures (TARGETs): Discuss implications of icing; ask air traffic control about icing; make new plan; consult flight handbook

Adapted from Stout et al. (1997). Enhancing teamwork in complex environments through team training. Group Dynamics: Theory, Research, and Practice, 1(2), 169–182.
of the scenario related to those observations, 3. track and preview significant events, and 4. make specific, event based observations with cuing from the system” (Cannon-Bowers, Burns, et al., 1998, p. 371).
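Because TARGETs reduces observation to checking whether each targeted behavior occurred, the scheme in Table 13.1 maps naturally onto a simple checklist. The sketch below encodes the table’s icing event (wording abbreviated; the observer’s marks are hypothetical) and computes the proportion of targeted behaviors demonstrated.

    # TARGETs-style checklist for the icing trigger event in Table 13.1:
    # the observer simply marks each targeted behavior as demonstrated or not
    targets = {
        "acknowledge passenger who observed ice buildup": True,
        "discuss implications of icing": True,
        "ask air traffic control about icing": False,
        "make new plan": True,
        "consult flight handbook": False,
    }

    hits = sum(targets.values())
    print(f"TARGETs score for icing event: {hits}/{len(targets)} = {hits / len(targets):.0%}")
    # -> TARGETs score for icing event: 3/5 = 60%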
Feedback

Provide feedback to the team as a whole and to individual members of the team (Prince et al., 1993).

Encourage all team members to provide input during feedback sessions (Smith-Jentsch, Zeisig, et al., 1998).
• Ensure that input does not focus on one or only a few team members (Smith-Jentsch, Zeisig, et al., 1998).
Recap key events at the beginning of the feedback session (Smith-Jentsch, Zeisig, et al., 1998).

Summarize the feedback session at its conclusion (Frink, 1981, as cited in Prince et al., 1993). (Note this guideline and the preceding one are potentially applicable to training individual higher level cognitive skills.)

Teach team leaders to facilitate the feedback session (Smith-Jentsch, Zeisig, et al., 1998) for some scenarios. Leaders should do the following:
• Provide task-focused versus person-oriented feedback (Tannenbaum et al., 1998).
• Ask for examples of effective and ineffective behavior prior to stating their own observations (Salas et al., 2000; Smith-Jentsch, Zeisig, et al., 1998).
Ask team members for specific examples of effective and ineffective behaviors versus generalities (Salas et al., 2000; Smith-Jentsch, Zeisig, et al., 1998) (for example, “all team members used correct brevity codes” versus “our communications were good”).
• Give self-critiques (Tannenbaum et al., 1998). • Accept feedback from others (Tannenbaum et al., 1998; Smith-Jentsch, Zeisig, et al., 1998). • Encourage participation from all team members (Oser et al., 1999; Ross et al., 2005; Smith-Jentsch, Zeisig, et al., 1998).
Reinforce this participation (Smith-Jentsch, Zeisig, et al., 1998).
• Make eye contact with all team members after asking for input (Smith-Jentsch, Zeisig, et al., 1998). • Guide team members in providing constructive input (Smith-Jentsch, Zeisig, et al., 1998).
(Note all of these behaviors apply to facilitators as well, except for giving a self-critique.)
Teach team members self-correction skills (Blickensderfer, Cannon-Bowers, & Salas, 1994; Salas et al., 2000; Smith-Jentsch et al., 1998).
• Give the team members an opportunity to critique their own performance (Prince et al., 1993).
• Encourage all team members to participate in self-correction (Salas et al., 2000).
• When an instructor is not present, reviews should focus on the same types of questions used when instructors are present, such as adjustments that would be made to actions based on the outcomes of the scenario (Ross et al., 2005).
Encourage team members to include planning and strategizing in self-correction (Salas et al., 2000).
If trainees are failing at lower level procedural skills rather than at the skills the team training learning objectives were intended to focus upon, “punt” and remediate these lower level skills (Kirlik et al., 1998).
• Incorporating real time automated system feedback on these types of skills during the conduct of team training exercises can free the instructor up for the higher level feedback (Kirlik et al., 1998).
It can also improve standardization, timeliness, and diagnostic precision of feedback, as well as reduce distractions caused by facilitator-trainee interactions, especially when performing time critical, dynamic tasks (Kirlik et al., 1998).
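A bare-bones version of the automated, real time procedural feedback Kirlik et al. (1998) describe might validate each operator action against the expected procedure and respond immediately, leaving teamwork-level feedback to the instructor. The radar procedure shown is invented for illustration.

    # Expected lower level procedure for a radar operator, in required order
    procedure = ["select mode", "set range scale", "acquire contact", "report track"]

    def check_step(completed, attempted):
        """Immediate, standardized feedback on a single procedural step."""
        expected = procedure[len(completed)]
        if attempted == expected:
            return f"OK: '{attempted}'"
        return f"Out of sequence: expected '{expected}', got '{attempted}'"

    print(check_step([], "select mode"))                   # OK: 'select mode'
    print(check_step(["select mode"], "acquire contact"))  # Out of sequence: ...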
Use tools to organize the team feedback session (Cannon-Bowers, Burns, et al., 1998).
• For example, Cannon-Bowers, Burns, et al. (1998) indicated that one of the purposes of the ShipMATE tool, described earlier, was to aid shipboard instructors in preparing for and providing a debrief on team training scenarios.
Logistical Issues

Provide a prebrief telling the team that the focus of the training is on teamwork processes versus performance outcomes (Smith-Jentsch, Zeisig, et al., 1998).

Ensure that team members are proficient on their individual technical tasks before participating in a teamwork-oriented scenario (Kirlik et al., 1998; Kozlowski, 1998; Swezey & Salas, 1992).
• “Effective team training is founded on solid individual training . . . When individuals lack knowledge or competence in their own areas, they cannot focus their attention effectively on team processes or performance” (Kozlowski, 1998, p. 139).
• Technical skill is necessary but not sufficient when teams operate in high stress environments; stress exposure training is also needed (Driskell & Johnston, 1998).
• (Note guidance and feedback can be provided on individual technical skills during a teamwork-oriented scenario, but the focus should be on teamwork skills. The current authors observed costly training of multiple aircraft personnel conducting antisubmarine warfare exercises in which radar operators had not performed radar functions in quite a while. As a result, much of the week’s training was spent teaching them basic radar operations so that they could eventually participate meaningfully in the team exercise, wasting valuable training opportunities as well as resources.)
Allow adequate time for a thorough team debrief involving all team members (Smith-Jentsch, Zeisig, et al., 1998).

CONCLUSION

The guidelines proposed in this chapter follow the SBAT. This chapter provides many guidelines, and users can choose those that best suit their training needs. The most critical guideline to keep in mind, however, is that a thorough task or needs analysis must be conducted to develop effective learning objectives, and everything else that is done in the SBAT process must ensure that these learning objectives are met. Furthermore, if the simulations are not designed to support these learning objectives, the training delivered via these simulations will not be effective.2

2 This project was supported by the Department of Navy, Office of Naval Research through the University of Central Florida under ONR Award No. N00014-07-1-0098. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Office of Naval Research or the University of Central Florida.

REFERENCES

Baker, D. P., Prince, C., Shrestha, L., Oser, R., & Salas, E. (1993). Aviation computer games for crew resource management training. The International Journal of Aviation Psychology, 3, 143–156.

Baldwin, T. T. (1992). Effects of alternative modeling strategies on outcomes of interpersonal-skills training. Journal of Applied Psychology, 77(2), 147–154.

Beaubien, J. M., & Baker, D. P. (2004). The use of simulation for training teamwork skills in health care: How low can you go? Quality and Safety in Health Care, 13(Suppl. 1), i51–i56.

Blickensderfer, E., Cannon-Bowers, J. A., & Salas, E. (1998). Cross training and team performance. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 299–311). Washington, DC: American Psychological Association.

Blickensderfer, E. L., Cannon-Bowers, J. A., & Salas, E. (1994). Feedback and team training: Team self-correction. Proceedings of the 2nd Annual Mid-Atlantic Human Factors Conference (pp. 81–85).

Bloom, B. S., Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational objectives: Handbook I. Cognitive domain. New York: David McKay.

Brannick, M. T., Prince, C., & Salas, E. (2005). Can PC-based systems enhance teamwork in the cockpit? The International Journal of Aviation Psychology, 15(2), 173–187.

Brannick, M. T., Prince, C., Salas, E., & Stout, R. (1995, April). Assessing aircrew coordination skills in TH-57 pilots. In C. Bowers & F. Jentsch (Chairs), Empirical research using PC-based flight simulations. Symposium conducted at the 8th International Symposium on Aviation Psychology, Columbus, OH.

Cannon-Bowers, J. A., Burns, J. J., Salas, E., & Pruitt, J. S. (1998). Advanced technology in scenario-based training. In J. A. Cannon-Bowers & E. Salas (Eds.), Making
decisions under stress: Implications for individual and team training (pp. 365–374). Washington, DC: American Psychological Association.

Cannon-Bowers, J. A., & Salas, E. (1997). A framework for developing team performance measures in training. In M. T. Brannick, E. Salas, & C. Prince (Eds.), Team performance assessment and measurement: Theory, methods and applications (pp. 45–62). Mahwah, NJ: Lawrence Erlbaum.

Cannon-Bowers, J. A., & Salas, E. (1998). Individual and team decision making under stress: Theoretical underpinnings. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 17–38). Washington, DC: American Psychological Association.

Cannon-Bowers, J. A., Salas, E., Blickensderfer, E. L., & Bowers, C. A. (1998). The impact of cross-training and workload on team functioning: A replication and extension of the initial findings. Human Factors, 40, 92–101.

Cannon-Bowers, J. A., Tannenbaum, S. I., Salas, E., & Volpe, C. E. (1995). Defining team competencies and establishing team training requirements. In R. Guzzo & E. Salas (Eds.), Team effectiveness and decision making in organizations (pp. 333–380). San Francisco: Jossey-Bass.

Caro, P. W. (1973). Aircraft simulators and pilot training. Human Factors, 15(3), 502–509.

Chase, W. G., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4, 55–81.

Cicchinelli, L. F., Harmon, K. R., Keller, R. A., & Kottenstette, J. P. (1980). Relative cost and training effectiveness of the 6883 three-dimensional simulator and actual equipment (Rep. No. AFHRL-TR-80-24). Brooks Air Force Base, TX: Air Force Human Resources Laboratory.

Cohen, D. J. (1990, November). What motivates trainees? Training and Development Journal, 44(11), 91–93.

Cohen, M. S., Freeman, J. T., & Thompson, B. (1998). Critical thinking skills in tactical decision making: A model and a training strategy. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 155–189). Washington, DC: American Psychological Association.

de Jong, T., & van Joolingen, W. R. (1998). Scientific discovery learning with computer simulations of conceptual domains. Review of Educational Research, 68(2), 179–201.

Driskell, J. E., & Johnston, J. H. (1998). Stress exposure training. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 191–217). Washington, DC: American Psychological Association.

Dwyer, D. J., Oser, R. L., Salas, E., & Fowlkes, J. E. (1999). Performance measurement in distributed environments: Initial results and implications for training. Military Psychology, 11(2), 189–215.

Feltovich, P. J., Spiro, R. J., & Coulson, R. L. (1989). The nature of conceptual understanding in biomedicine: The deep structure of complex ideas and the development of misconceptions. In D. A. Evans & V. L. Patel (Eds.), Cognitive science in medicine: Biomedical modeling (pp. 113–172). Cambridge, MA: The MIT Press.

Feltovich, P. J., Spiro, R. J., & Coulson, R. L. (1993). Learning, teaching, and testing for complex conceptual understanding. In N. Frederiksen, R. J. Mislevy, & I. I. Bejar (Eds.), Test theory for a new generation of tests (pp. 181–215). Hillsdale, NJ: Lawrence Erlbaum.
Fowlkes, J. E., Lane, N. E., Salas, E., Franz, T., & Oser, R. (1994). Improving the measurement of team performance: The TARGETs methodology. Military Psychology, 6, 47–61.

Frink, A. (1981). Performance evaluation and assessment. In J. K. Lauber & H. C. Foushee (Eds.), Guidelines for the development of line-oriented flight training: Vol. 2. Proceedings of a NASA/industry workshop (NASA Conference Publication No. 2184, pp. 122–126). Moffett Field, CA: NASA Ames Research Center.

Hagman, J. D., & Rose, A. M. (1983). Retention of military tasks: A review. Human Factors, 25(2), 199–213.

Hays, R. T. (2006). The science of learning: A systems theory approach. Boca Raton, FL: Brown Walker Press.

Hays, R. T., Jacobs, J. W., Prince, C., & Salas, E. (1992). Flight simulator training effectiveness: A meta-analysis. Military Psychology, 4(2), 63–74.

Hays, R. T., Stout, R. J., & Ryan-Jones, D. L. (2005). Quality evaluation tool for computer- and web-delivered instruction (Rep. No. NAWCTSD TR-2005-2). Orlando, FL: Naval Air Warfare Center Training Systems Division. (ADA 435 294).

Huczynski, A. A., & Louis, J. W. (1980). An empirical study into the learning transfer process in management training. Journal of Management Studies, 17, 227–240.

Ilgen, D. R., Hollenbeck, J. R., Johnson, M., & Jundt, D. (2005). Teams in organizations: From input-process-output models to IMOI models. Annual Review of Psychology, 56, 517–543.

Jacobs, J. W., Prince, C., Hays, R. T., & Salas, E. (1990). A meta-analysis of flight simulator training research (Tech. Rep. No. 89-006). Orlando, FL: Naval Training Systems Center.

Jentsch, F. G. M. (1997). Metacognitive training for junior team members: Solving the copilot’s catch 22. Unpublished doctoral dissertation, University of Central Florida, Orlando.

Jentsch, F., & Bowers, C. (1998). Evidence for the validity of PC-based simulations in studying aircrew coordination. The International Journal of Aviation Psychology, 8, 243–260.

Jentsch, F., Bowers, C., & Salas, E. (2001). What determines whether observers recognize targeted behaviors in modeling displays? Human Factors, 43(3), 496–507.

Johnston, J. H., Poirier, J., & Smith-Jentsch, K. A. (1998). Decision making under stress: Creating a research methodology. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 39–59). Washington, DC: American Psychological Association.

Kirlik, A., Fisk, A. D., Walker, N., & Rothrock, L. (1998). Feedback augmentation and part-task practice in training dynamic decision-making skills. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 91–113). Washington, DC: American Psychological Association.

Kozlowski, S. W. J. (1998). Training and developing adaptive teams: Theory, principles, and research. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 115–153). Washington, DC: American Psychological Association.

Lauber, J. K., & Foushee, H. C. (Eds.). (1981). Guidelines for the development of line-oriented flight training: Vol. 2. Proceedings of a NASA/industry workshop (NASA Conference Publication No. 2184). Moffett Field, CA: NASA Ames Research Center.
Leutner, D. (1993). Guided discovery learning with computer-based simulation games: Effects of adaptive and non-adaptive instructional support. Learning and Instruction, 3, 113–132.

Magjuka, R. J., Baldwin, T. T., & Loher, B. T. (2000). The combined effects of three pretraining strategies on motivation and performance: An empirical exploration. Journal of Managerial Issues, 6, 282–296.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97.

Oser, R. L., Cannon-Bowers, J. A., Salas, E., & Dwyer, D. J. (1999). Enhancing human performance in technology-rich environments: Guidelines for scenario-based training. In E. Salas (Ed.), Human/technology interaction in complex systems (Vol. 9, pp. 175–202). Stamford, CT: JAI Press.

Perkins, D. N. (1992). Technology meets constructivism: Do they make a marriage? In T. M. Duffy & D. H. Jonassen (Eds.), Constructivism and the technology of instruction (pp. 45–55). Mahwah, NJ: Lawrence Erlbaum.

Prince, C., Oser, R., Salas, E., & Woodruff, W. (1993). Increasing hits and reducing misses in CRM/LOS scenarios: Guidelines for simulator scenario development. International Journal of Aviation Psychology, 3(1), 69–82.

Ross, K. G., Phillips, J. K., Klein, G., & Cohn, J. (2005). Creating expertise: A framework to guide technology-based training (Final Tech. Rep., Contract No. M67854-04-C8035). Orlando, FL: MARCORSYSCOM PMTRASYS.

Roth, J. T. (1998). Improving decision-making skills through on-the-job training: A roadmap for training shipboard trainers. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 345–364). Washington, DC: American Psychological Association.

Salas, E., Bowers, C. A., & Rhodenizer, L. (1998). It is not how much you have but how you use it: Toward a rational use of simulation to support aviation training. The International Journal of Aviation Psychology, 8(3), 197–208.

Salas, E., Milham, L. M., & Bowers, C. A. (2003). Training evaluation in the military: Misconceptions, opportunities, and challenges. Military Psychology, 15(1), 3–16.

Salas, E., Rhodenizer, L., & Bowers, C. A. (2000). The design and delivery of crew resource management training: Exploiting available resources. Human Factors, 42(3), 490–511.

Schendel, J. D., & Hagman, J. D. (1982). On sustaining procedural skills over a prolonged retention interval. Journal of Applied Psychology, 67(5), 605–610.

Serfaty, D., Entin, E. E., & Johnston, J. H. (1998). Team coordination training. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 221–245). Washington, DC: American Psychological Association.

Sherwin, P. (1981). Scenario design and development issues. In J. K. Lauber & H. C. Foushee (Eds.), Guidelines for the development of line-oriented flight training: Vol. 2. Proceedings of a NASA/industry workshop (NASA Conference Publication No. 2184, pp. 113–118). Moffett Field, CA: NASA Ames Research Center.

Simpson, E. J. (1972). The classification of educational objectives in the psychomotor domain. In Contributions of behavioral science to instructional technology: 3. The psychomotor domain: A resource book for media specialists (pp. 43–56). Washington, DC: Gryphon House.
296
VE Components and Training Technologies
Smith-Jentsch, K. A., Johnston, J. H., & Payne, S. C. (1998). Measuring team-related expertise in complex environments. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 61–87). Washington, DC: American Psychological Association. Smith-Jentsch, K. A., Zeisig, R. L., Acton, B., & McPherson, J. A. (1998). Team dimensional training: A strategy for guided team self-correction. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 271–297). Washington, DC: American Psychological Association. Stout, R. J., Prince, C., Salas, E., & Brannick, M. T. (1995, April). Beyond reliability: Using crew resource management (CRM) measurements for training. Proceedings of the 10th International Symposium on Aviation Psychology. Stout, R. J., Salas, E., & Fowlkes, J. (1997). Enhancing teamwork in complex environments through team training. Group Dynamics: Theory, Research, & Practice, 1, 169–182. Stout, R. J., Salas, E., Merket, D. C., & Bowers, C. A. (1998). Low-cost simulation and military aviation team training. Proceedings of the American Institute of Aeronautics and Astronautics Modeling and Simulation Conference (pp. 311–318). Swezey, R. W., & Salas, E. (1992). Guidelines for use in team-training development. In R. W. Swezey & E. Salas (Eds.), Teams: Their training and performance (pp. 219– 245). Norwood, NJ: Ablex Publishing Corporation. Tannenbaum, S. I., Smith-Jentsch, K. A., & Behson, S. J. (1998). Training team leaders to facilitate team learning and performance. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 247–270). Washington, DC: American Psychological Association. Wikipedia. (2007a). Definition of meta-analysis. Retrieved January 16, 2007, from http:// en.wikipedia.org/wiki/Meta_analysis Wikipedia. (2007b). Definition of overlearning. Retrieved January 10, 2007, from http:// en.wikipedia.org/wiki/Overlearning Yusko, K. P. & Goldstein, H. W. (1997). Selecting and developing crisis leaders using competency-based simulations. Journal of Contingencies and Crisis Management, 5 (4), 216–223.
Part III: Training Management
Chapter 14
AFTER ACTION REVIEW IN SIMULATION BASED TRAINING
Don Lampton, Glenn Martin, Larry Meliza, and Stephen Goldberg
The after action review, or AAR, is a bedrock of modern military training for teams, units, and organizations. The AAR is a method of providing feedback to units after operational missions or collective training exercises (U.S. Army Combined Arms Center, 1993). It is an interactive discussion, guided by a facilitator or trainer known as an AAR leader. During the AAR, unit members discuss what happened, why it happened, and how to improve or sustain performance in similar situations in the future. The AAR begins with a review of the unit's mission or goals. Establishing what happened should draw on input from exercise controllers, observers, and role-players, including the opposing force. However, the hallmark of the modern AAR technique is the active participation of the trainees. The AAR is used extensively by the U.S. Army and the U.S. Marine Corps. Related procedures, varying in degree of formality, are used in naval and aviation training. While much of the work involved in developing and refining the AAR process was accomplished to support the training of military units, the AAR approach should be relevant to any organization that makes use of collective training methods (for example, police SWAT [special weapons and tactics] teams). The AAR has been applied to nonmilitary government applications (Rogers, 2004) and industry (Darling & Parry, 2001). A thorough history of the AAR process is presented by Morrison and Meliza (1999). This chapter presents the potential training benefits of simulation based AAR tools, as well as some of the technical challenges that may be faced when designing and implementing AAR systems. An overview of how the army uses AARs in practice is presented, based upon army doctrine and the observations of the authors and their colleagues. The focus of the chapter is the design, implementation, and use of an AAR system developed specifically for use with immersive virtual environment simulations for training small teams. The features and the specific training benefits that this AAR system can provide are described. Issues and challenges faced when designing and implementing this AAR system, and lessons that were learned in doing so, are discussed. Finally,
guidance for designing and implementing computer based AAR systems in general, derived from those specific lessons, is provided.
OVERVIEW OF THE AFTER ACTION REVIEW
The U.S. Army has documented the AAR doctrine extensively in formal training circulars and field manuals, including Training Circular No. 25-20, A Leader's Guide to After-Action Reviews (U.S. Army Combined Arms Center, 1993) and Field Manual No. 7-1, Battle Focused Training (Department of the Army, 2003). The AAR is considered a valid and valuable technique regardless of branch, echelon, or training task. It can be applied to both live and virtual training environments. In the live environment, the training participants use operational equipment on real world terrain to perform against an opposing force composed of live personnel, targets (live fire), or a combination of both. In virtual environments, the training participants use simulated equipment and weapons. Virtual training frequently uses semi-automated forces to populate the virtual environment with computer-generated friendly forces, enemy, neutrals, and civilians.
FEEDBACK IN THE AAR PROCESS
Holding (1989) pointed out that feedback is a critical component of all skill acquisition. Collective training exercises can provide intrinsic and extrinsic performance feedback (Brown, Nordyke, Gerlock, Begley, & Meliza, 1998). Intrinsic feedback consists of the cues that exercise participants perceive about their own performance. For example, an infantry unit may call in artillery fire on a target. Intrinsic feedback would consist of the unit's observation that artillery rounds are impacting too far from the intended target. As a result of their perception of their performance, they would have the supporting artillery unit shift fires. Extrinsic feedback consists of information that the exercise participants do not ordinarily have available to them. It can provide insights into how to improve or sustain performance in the future. Participants sometimes end an exercise with a limited perspective regarding what happened, based upon the information available to them and what they saw, heard, and otherwise sensed (intrinsic feedback). This limited perspective is referred to as "perceived truth." "Ground truth" is the term used to describe the actual situation that occurred. Trainees are often not able to perceive all that is going on around them, so perceived truth may frequently differ from ground truth. Events may be happening quickly and may be open to differing interpretations. Perceptions and memories of the occurrence, sequence, and timing of events can be greatly distorted, leading to inferred causal relationships that are not based on the actual facts (Goldberg & Meliza, 1993). Sometimes exercise participants recognize the impacts of their actions via intrinsic feedback, but at other times they are not aware of these impacts until they receive extrinsic feedback. Extrinsic feedback can be used to correct misperceptions and clarify events and effects. The AAR process may provide unit
members with a view of collective (team, unit, or organizational) performance that was not apparent to, or viewable by, any one participant during an exercise (Meliza, 1999), including the trainers who were observing the exercise. The AAR uses a Socratic method in which an AAR leader poses a series of leading and open-ended questions to help those in the training audience discover what happened and why. A debrief or critique conducted by one or more observers of a training exercise is an alternative to the AAR (Scott & Fobes, 1982; Hoare, 1996). Historically, the critique preceded the AAR as trainers' primary way of providing feedback: a person who had observed a training exercise lectured the trainees. A major difference between the AAR and the critique is that the critique provided the training participants with conclusions reached by the person giving the critique rather than facilitating the participants in reaching their own conclusions. Critiques could easily be construed as criticism because the opinions expressed were based on perceptions, judgments, and possibly misinterpretations of ground truth. Morrison and Meliza (1999) note that the critique often focused on errors committed and created a defensive atmosphere among the trainees. In contrast to the critique method, in the modern AAR method the leader functions as a discussion facilitator. Training participants are expected to examine their performance through guided self-evaluation. They are encouraged to identify their problems and develop approaches to correct them. It has been suggested that use of the AAR feedback method results in units taking ownership of the diagnosis of problems and the corrective actions they identify (Scott & Fobes, 1982).
AAR FUNCTIONALITIES TO SUPPORT FEEDBACK
To be effective and efficient, the AAR leader needs one or more starting points for the discussion and at least a general idea of where the discussion will head. The job of the AAR leader is made easier to the extent that he or she is already aware of the types of problems the unit has been experiencing. If all an AAR leader knows about a mission is that a unit sustained heavy casualties, the Socratic method will take a long time to identify the root causes of the problem. If the AAR leader knows that most of the casualties occurred within a few minutes of making contact with the enemy and that few friendly vehicles returned fire upon contact, then that leader is closer to identifying and understanding what happened and why. In virtual simulations, AAR aids prepared from electronic data streams can document or illustrate aspects of performance that are close to the root causes of weaknesses and strengths. Developments in battlefield simulation technology have made it possible to provide AAR leaders with an electronic record describing unit or individual location, firing events, and communications over the course of an exercise. AAR software systems allow these data to be converted into a variety of AAR aids demonstrating critical aspects of unit performance (Meliza, 1999). These aids can be used during an AAR to describe or illustrate ground truth. For example, a graph showing the number of rounds fired by each vehicle in a platoon over time may make the point that only one of the vehicles in the
platoon fired during the first five minutes of an engagement. Without such an aid, the unit would have to gain this information by slowly reconstructing the sequence of events from the unit members' memories. AAR aids also offer the benefit of providing units with demonstrable ground truth when their recollections are at odds with what actually happened. To the extent that AAR aids illustrate the root causes of exercise events, they expedite the AAR process. AAR aid generation capabilities that examine exercise data streams to check specific aspects of performance offer a means of helping AAR leaders and units diagnose strengths and weaknesses. The most frequently used AAR aid is a sequential replay of exercise events. A replay, however, is not necessarily the most efficient or effective way of illustrating key aspects of performance. Long segments of the exercise may contain no individual events that are significant in isolation; therefore, AAR aids that summarize activity over a period of time can be more effective. A graphic showing shot lines (lines connecting shooter location to impact location) aggregated over a specific period of time can quickly show which potential targets were engaged and by whom during the period of interest.
A Recently Developed AAR Tool for Virtual Environments
In the 1999 to 2002 time frame, as part of an overall project to develop capabilities for simulation based training of dismounted combatants, the Dismounted Infantry Virtual AAR System (DIVAARS) was developed. The goal was to develop an AAR system that incorporated lessons learned from earlier AAR systems and was tailored to the unique requirements of small unit dismounted infantry training in a virtual environment. An emphasis was placed on being able to meet the special challenges of urban environments for military operations and training. The challenges are primarily visual in that buildings and other structures break up the visual field and limit the portion of the battlefield that can be observed by any one person (Lampton, Clark, & Knerr, 2003). This required an AAR system that could not only replay an exercise, but could also support the AAR goals of presenting exercise events and data in a manner that would facilitate trainee understanding of what happened, why it happened, and how to improve. DIVAARS recreates exactly what happened during the mission. During the replay the unit members can observe the location, posture, and actions of all the other members. DIVAARS can replay mission action exactly as viewed by any of the participants, providing the trainees with perspectives that would not be available with live training. These features not only support the trainees' explanation of why events happened, but also may help the unit members develop shared mental models of individual and unit tasks. Watching the replay may also strengthen group identification and cohesiveness. Finally, several DIVAARS features, such as depicting critical events in slow motion and from multiple perspectives, may enhance memory so that lessons learned are more likely to be employed in subsequent training and missions.
DIVAARS Features
Playback
A linear beginning-to-end playback is unlikely to be either the best or most efficient way to provide the trainees with an understanding of what happened during an exercise. The replay system includes such actions as pause, stop, play, step-forward, fast-forward, rewind, fast-reverse, and step-reverse. Variable playback speeds are available. In addition, the AAR leader has the capability to mark significant events during the exercise and jump directly to them during the AAR.
Viewing Modes
Viewing scenario events from different perspectives can support understanding what happened. Multiple viewing modes are available during both the exercise and the AAR. Ten preset views can be selected at any time prior to or during the exercise for immediate use. These can be used for perspectives or positions that the AAR leader thinks will be useful, such as the view from an enemy position. The variety of viewing modes provides added capabilities during the AAR process.
• Top-down view—A view of the database looking straight down from above. It can be moved left, right, up, down, and zoomed in or out. The AAR leader can also lock the view onto an entity, in which case it will stay centered directly above that entity as it moves through the database.
• Two-dimensional (2-D) view—This is the traditional plan view display. It is the same as the top-down view except that depth perspective is not shown.
• Entity view—By selecting any entity (including enemy or civilian), the AAR leader can see and display exactly what that entity sees. This includes the effects of head turning and posture changes.
• Fly mode—The AAR leader can "fly" through the database using the mouse for control.
During the course of a replay the trainees will be able to see the mission from a number of perspectives. The top-down, 2-D, and fly views, views that are never available to trainees during the mission exercise, promote seeing the big picture and learning to see the battlefield. The entity view, seeing through the eyes of others, supports a number of training functions. Did the leaders see an action or problem, but fail to respond, or were they not looking in the right direction at all? Do squad members maintain 360° security and report promptly? What was the view from likely and actual enemy positions?
Movement Tracks
Movement tracks show, in a single view, the path an entity traveled during an exercise. Markers are displayed at fixed time intervals. Every fifth marker is a different shape than the four preceding it. The display of these markers can be turned on and off. The movement tracks provide a clear display of the path and speed of movement of each member of the unit. In addition, they provide
indications of the unit formations and of the location and duration of halts in movement. Thus, the AAR leader may elect to skip or fast-forward through portions of the replay, knowing that the movement traces for those skipped segments will be observable when the replay is resumed.
Entity Identifier
A unique identifier is shown above the avatar (virtual representation) of each unit member. For example, 2SL is the identifier for the squad leader, second squad. The entity identifiers change size to be readable across all levels of zooming.
Viewing Floors of a Building
The AAR leader needs to be able to follow the action in military operations on urban terrain (MOUT) scenarios even when a unit enters a building. The AAR leader can select a building and then select a floor of that building to be displayed. Using this feature, the operator can view and display the avatars going through a building without the problem of upper floors being in the way.
Munition Visualizations
This feature helps to determine what objects are being shot by each entity and to identify patterns of unit fire. Bullet flight lines, artillery arcs, and missile paths are shown as appropriate for all weapon firings. Each visualization is the same color as the originating entity and gradually fades away after the shot.
Event Data Collection and Display
DIVAARS has the capability to track many events, including shots fired, kills by entities, movement, and posture changes. These data can be shown in a tabular format or graphical display. The AAR leader can use them as needed to make various teaching points. They can also be used to support subsequent data analysis for research and development applications. Critical incident events are automatically flagged, reducing workload on the AAR operator, and can be jumped to during playback. Examples of critical events are the first shot fired during a mission and fratricides. The ability to jump to marked events allows the AAR leader to quickly access a number of related events to support a discussion theme, such as quality of reporting, rather than dealing with incidents in the order they occur in a sequential replay. Security, use of resources, and mission tempo are examples of themes we have observed experienced AAR leaders address. Ten different tables and graphs are available:
• Shots fired, by entity and unit;
• Kills, by entity and unit;
• Killer-victim table that shows who killed whom, with the option to show the angle of the killing shot (front, flank, or back) or the posture of the victim (standing, kneeling, or prone);
• Shots as a function of time, by entity, unit, and weapon;
• Kills as a function of time, by entity, unit, and weapon;
• Kills by distance from killer to victim, by entity, unit, and weapon;
• Rate of movement of each entity, and aggregated at fire team and squad levels;
• Percentage of time friendly units were stationary;
• Percentage of time friendly units were in different postures;
• Display of user-defined events.
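Tabulations like those just listed are derived by aggregating a recorded event stream. The sketch below shows one way a killer-victim table could be built from kill events; DIVAARS's internal data structures are not published, so every name here is illustrative.

```typescript
// Hypothetical event records and a killer-victim aggregation. DIVAARS's
// actual data structures are not published; all names are invented.
interface KillEvent {
  time: number;                                     // simulation time (seconds)
  killerId: string;                                 // e.g., "2SL"
  victimId: string;
  shotAngle: "front" | "flank" | "back";            // angle of the killing shot
  victimPosture: "standing" | "kneeling" | "prone";
}

// Rows keyed by killer, columns by victim, cells counting kills.
function killerVictimTable(events: KillEvent[]): Map<string, Map<string, number>> {
  const table = new Map<string, Map<string, number>>();
  for (const e of events) {
    const row = table.get(e.killerId) ?? new Map<string, number>();
    row.set(e.victimId, (row.get(e.victimId) ?? 0) + 1);
    table.set(e.killerId, row);
  }
  return table;
}
```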
DIVAARS Evaluation and Utilization
DIVAARS was developed as part of a comprehensive program to develop capabilities for dismounted combatant virtual training. It was evaluated within the context of the exercises conducted as part of the overall research program. Overall, DIVAARS was rated very highly by soldiers. Table 14.1 contains soldier ratings of the system's capability to present information. The data represent soldiers' opinions drawn from a number of different projects. The conditions for each year differed in the composition of the teams of trainees, the individual AAR facilitators, the suites of virtual environment (VE) technologies used, and the mission scenarios. These high ratings reflected favorably not only on the DIVAARS capabilities to support the goals of the AAR process, but also on the skill of the facilitators who led the AARs. Although the terminology varied, all of the facilitators directed the AAR discussions to identify "improves" and "sustains." Improves addressed aspects of team performance that did not go well and how to do better the next time. Sustains identified and reinforced successful aspects of performance and provided positive notes for the AAR.

Table 14.1. Ratings of DIVAARS by Soldiers Participating in Dismounted Soldier Simulation Exercises

            The AAR System Made Clear:
            What happened          Why things happened the    How to do better in
            during a mission       way they did               accomplishing the mission
Ratings     SA    A     Total      SA    A     Total          SA    A     Total
2001        44%   56%   100%       44%   39%    83%           28%   56%    84%
2002        82%   12%    94%       76%   24%   100%           71%   24%    95%
2004        62%   31%    93%       46%   35%    81%           54%   38%    92%
2005        68%   32%   100%       62%   35%    97%           69%   23%    92%

Note: SA = strongly agree; A = agree.

Since its development trials in 2001 and 2002, DIVAARS was used as the AAR tool in the Virtual Integrated MOUT Training System testing at Fort Campbell, Kentucky, in 2004 (Knerr & Lampton, 2005). It was used in tests of
wearable-computer dismounted soldier training systems. DIVAARS was included in the suite of capabilities making up the U.S. Navy’s Virtual Technologies and Environments (VIRTE) program.
Issues in After Action Review System Development
Design of Virtual AAR Systems
AAR systems are typically based on their planned use (for example, live versus virtual, and/or domains such as military, medical, and so forth) and use varying technologies to fulfill their missions. Such systems range from purely analytical systems that produce only tables and graphs of performance to full reproduction virtual and live systems that use 3-D rendering or video recordings. Our focus is on the full virtual systems, with briefer discussion of live and analytical systems. Virtual AAR systems are either independent applications or dependent applications that use the simulation as a part of the AAR system. An independent AAR system records and plays back all data within it, handling all recording and all rendering. Alternatively, a dependent AAR system could record all data and retransmit them back to existing simulators for rendering (such as a stealth viewer, which renders the simulated environment). Conceptually, every AAR system must run in a minimum of two phases. The first, the recording phase, is when all data are processed and stored; the second, the playback phase, is when the stored data are rendered for review. The data may be stored exactly as received, or they may be sampled at regular time intervals. The former has the advantage of storing exactly what was actually transmitted, while the latter simplifies later use of the data (although at the cost of storing more data, as the samples would be recorded at a higher frequency than the transmitted data). In addition, an AAR system could include prerecording, preplayback, and post-playback phases to provide an AAR leader the ability to preplan the AAR and to record notes on the exercise for later use. For a system that stores actual data, data playback can bring some issues. The primary problem is that the data must be rerendered to fill in necessary gaps. For example, an application built on the distributed interactive simulation (Institute of Electrical and Electronics Engineers [IEEE] Computer Society, 1996), or DIS, protocol that stores the actual protocol data units (PDUs), or network packets, would be required to perform dead reckoning on the data in order to complete a playback rendering. Although this is initially straightforward, dead reckoning must be altered to handle varying speeds. DIS uses a "heartbeat" and a "time-out" value to handle simulators joining an exercise or leaving an exercise (including as a result of a system crash). Entity updates are typically sent every 5 seconds (the heartbeat), and entities for which no update has been received for 12 seconds are dropped (the time-out). In "slow motion" replay mode, for example, an AAR system has to adjust the DIS heartbeat and time-out because another PDU may not "arrive" until a scale factor later (replay at 0.25 speed would use a heartbeat of 20 seconds and a correspondingly scaled time-out of 48 seconds).
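A minimal sketch of that scaling, using the 5 second heartbeat and 12 second time-out defaults cited above; the function and constant names are illustrative rather than drawn from the DIS standard.

```typescript
// Scaling the DIS heartbeat and time-out for variable-speed replay. At
// 0.25x speed a 5 s heartbeat stretches to 20 s and a 12 s time-out to 48 s.
const HEARTBEAT_S = 5;  // default entity-state update interval
const TIMEOUT_S = 12;   // interval after which an entity is dropped

function replayTimings(playbackSpeed: number) {
  if (playbackSpeed <= 0) throw new Error("pause is handled separately");
  // Recorded PDUs "arrive" farther apart in wall time at slower speeds, so
  // both thresholds stretch by the inverse of the speed factor.
  return {
    heartbeat: HEARTBEAT_S / playbackSpeed, // 20 s at 0.25x
    timeout: TIMEOUT_S / playbackSpeed,     // 48 s at 0.25x
  };
}
```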
Similarly, dead reckoning must handle a "pause" function in the AAR system and stop updating entities. Whether a system stores actual data or sampled data, playback still must address some important issues. These issues largely come from manipulating the data as the user plays back segments of the exercise. As the replay simulation time is altered, changes to the virtual environment must be addressed in appropriate fashion. First, rewinding is a major issue during playback. If a system does not store samples of the position and orientation (that is, it is a nonsampling system), then velocity and acceleration parameters within the dead reckoning process must be negated to process correctly. In effect, the application must "rewind" the entities by simulating them in reverse. In addition, in all virtual AAR systems, changes made to the virtual environment during the exercise must be "undone" as necessary. This can include relatively simple things, such as closing recently opened doors and removing textures of weapons effects from the sides of buildings. However, more complex changes must also be undone. If dynamic terrain is supported (for example, a hole blown into the side of a building in a breaching operation), then it must also be undone during a rewind operation (for example, the hole must be filled back in). Second, jumping directly from one time to another, without "playing" the intervening events, must be handled. One major characteristic that makes an AAR system different from a replay system is the ability to jump through the data stream, show and discuss what is necessary, and then go on to the next exercise. To accomplish a jump through time, AAR systems must update each entity within the world without ignoring the effects of key events that occurred during that jumped time period. For example, while updating entities is a relatively straightforward process, such interactions as weapon fire and dynamic terrain changes still need to be performed and not skipped. It becomes necessary to build a file index that includes a notion of "important" data packets that get processed even during a jump operation. It is important to note here that these events depend on simulation time. Simulators are often built that either lack a time concept (that is, do not place a timestamp on data at all) or ignore it (that is, do not fill in correct times). Whereas most simulators can simply process data once received, properly timestamped data are essential for AAR system playback. Without timestamps, the AAR system cannot know when a particular event should be processed relative to the events surrounding it. In a worst case scenario, the AAR system can place a timestamp on data as they are received before storing them.
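A sketch of a jump operation built on such a file index follows. The packet interface and the important flag are invented for illustration; a reverse jump would additionally require the undo logic discussed above under rewinding.

```typescript
// Jumping over a time interval: ordinary entity updates are skipped, but
// "important" packets (weapon fire, dynamic terrain changes) are still
// applied so their persistent effects are not lost. Names are invented.
interface IndexedPacket {
  time: number;        // simulation timestamp; essential for ordering
  important: boolean;  // must be processed even during a jump
  apply(): void;       // applies this packet's effect to the replay world
}

function jumpForward(index: IndexedPacket[], from: number, to: number): void {
  for (const pkt of index) {
    if (pkt.time <= from) continue;
    if (pkt.time > to) break;       // index is assumed sorted by time
    if (pkt.important) pkt.apply(); // e.g., blow the breach hole, mark a kill
    // Entity-state packets are skipped; entities are snapped to their
    // stored state (or dead reckoned) at time `to` afterward.
  }
}
```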
Voice Communication
Voice and other fast streaming types of data warrant special attention in any AAR system that processes the data directly (whether virtual, live, or analytical). Such data are time sensitive and require the AAR system to keep pace both in recording and rendering.
During recording, the data must be retrieved from the communication mechanism (typically, the simulation network) quickly to avoid falling behind. Otherwise, the result is voice communication that does not synchronize properly with the simulation events. As the exercise continues, the discrepancy will grow and become more noticeable. However, addressing this issue alone is not enough. During rendering at playback, the AAR system can have almost the opposite problem. As the voice data are retrieved from the recording, they must not only be read from the recording quickly enough (similar to reading from the network), but also rendered quickly enough. In an AAR system that is performing multiple tasks, keeping up with rendering can be a difficult issue. In addition, such an AAR system must monitor itself and realize when it has fallen behind. To handle the recording and reading issue, a separate computational thread is often beneficial, giving the AAR system enough running time to handle all the data. For rendering, another separate thread is also beneficial so that the audio card can receive the data in time. A starved rendering system produces stuttering audio, with small gaps between syllables. It is important to note that a rendering thread also must monitor itself to make sure it does not fall behind too much. Such latency will cause the voice communication to become offset from the simulation events (much like not reading from the network quickly enough). In this case, it is often better to drop some packets (and accept the slight audio glitch) to synchronize the voice data back with the simulation data.
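A minimal sketch of the drop-to-resynchronize strategy just described, for the playback side. The packet layout, drift tolerance, and callback names are all illustrative; a production system would run this loop in its own rendering thread, as noted above.

```typescript
// A playback loop that monitors its own drift against the replay clock and
// drops late packets to resynchronize voice with the simulation events.
interface VoicePacket { time: number; samples: Float32Array; }

const MAX_DRIFT_S = 0.2; // tolerated audio lag before packets are dropped

function renderVoice(
  queue: VoicePacket[],            // packets sorted by timestamp
  replayClock: () => number,       // current replay simulation time
  play: (p: VoicePacket) => void,  // hands samples to the audio device
): void {
  while (queue.length > 0) {
    const drift = replayClock() - queue[0].time;
    if (drift < 0) break;              // packet is in the future; wait
    const pkt = queue.shift()!;
    if (drift > MAX_DRIFT_S) continue; // too late: drop (accept the glitch)
    play(pkt);                         // on time: render it
  }
}
```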
An AAR Engine
Whether for the military or another domain, there are many capabilities needed in an AAR system. These include recording and playback of scenario data with full ability to pause, rewind, and jump to specific events. In addition, support for a graphical user interface and potentially other interfaces (such as an electronic whiteboard) could be included. On the other hand, there are also some features that may be specific to each domain. For example, a military AAR system would include visualization of weapon effects, but an AAR system for working with patients in cognitive rehabilitation might visually highlight all the key elements in the environment necessary for making coffee. To address these issues, an engine for after action reviews could provide these capabilities, based on a fundamental architecture of common functionalities coupled with a plug-in architecture for specific domain features. The plug-in architecture also allows users to add their own new capabilities to their AAR system. In order to allow the core functions and the plug-in modules to interact, an event or messaging system through which all the components communicate would be required. The System of Object Based Components for Review and Assessment of Training Environment Scenarios (SOCRATES) is one such engine (University of Central Florida, 2007). SOCRATES is not a new general-purpose AAR system; rather, it is an AAR engine. As such, it consolidates the functionalities common to all potential AAR systems into a single foundation for training environment scenarios (much like a game engine provides the common needs of games). This architecture allows other modules to be loaded to essentially create a new after action review system. In addition to providing flexibility, the plug-in architecture also provides the option not to load a feature. If a particular review system is being used on a less capable machine (such as a wearable computer or an embedded simulation), some features could be disabled. DIVAARS is an example of an AAR system built upon the SOCRATES AAR engine. Each military-specific capability (such as viewing inside a building or shot visualization) is implemented as a plug-in. However, an AAR engine also allows the development of AAR systems in new domains that have not previously used the AAR process. For example, specialists in cognitive rehabilitation have recently been using modeling and simulation to increase cognitive ability in brain injury patients (Fidopiastis et al., 2005). As a part of this, they desire the ability to review each patient's performance whether in mock scenarios (such as a mock kitchen as a part of a rehabilitation clinic) or in virtual/augmented environments. An AAR system here would provide the ability to record and review a session with a patient in a cognitive rehabilitation setting.
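As a concrete illustration of the engine-plus-plug-ins idea, the sketch below shows a generic event-driven core that domain plug-ins register against. SOCRATES's actual interfaces are not reproduced here; all names are invented.

```typescript
// A generic event-driven AAR engine core with domain plug-ins.
interface AarEvent { topic: string; time: number; payload: unknown; }

interface AarPlugin {
  name: string;      // e.g., "shot-visualization"
  topics: string[];  // event topics this plug-in consumes
  onEvent(event: AarEvent): void;
}

class AarEngine {
  private subscribers = new Map<string, AarPlugin[]>();

  // Loading is optional per deployment: a wearable or embedded host can
  // simply skip expensive plug-ins.
  load(plugin: AarPlugin): void {
    for (const topic of plugin.topics) {
      const list = this.subscribers.get(topic) ?? [];
      list.push(plugin);
      this.subscribers.set(topic, list);
    }
  }

  // The core recording/playback loop publishes events; plug-ins react.
  publish(event: AarEvent): void {
    for (const p of this.subscribers.get(event.topic) ?? []) p.onEvent(event);
  }
}
```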
DISTRIBUTED AARS
AAR systems have typically been designed to support multiple trainees at a single location. However, it is also necessary to conduct training, and AARs, within a distributed training environment with trainees at multiple physical sites (the extreme being each trainee at his or her own location). A distributed AAR system presents additional issues. First, a distributed AAR system has to address interface issues. In an AAR leader based setting, there must be an auditory and visual component to the AAR. Each user must be able to communicate verbally across the distributed links. In addition, the rendering must occur at each user station. This requires the creation of a client AAR application that will receive commands and updates from the AAR leader's master station. Typically, AAR sessions would occur in a group setting with the AAR leader standing in front of the trainees facilitating discussion. In a distributed setting, this is not the case, but elements of the face-to-face review need to be introduced. Some method allowing the AAR leader to highlight items and "point" to things would be required. A "telestrator" much like those used in sports broadcasts can be very useful in this setting (Martin & Cherng, 2007). Beyond the interface, there are also communication issues between the master and client stations within the distributed AAR. Some communication protocol must be created for the master station to control the client stations (to allow the AAR leader to "play" the exercise, control the viewpoint, send out telestrator visuals, and so forth). SOCRATES uses an Extensible Markup Language (XML) based protocol to achieve this requirement. XML is a generalization of the better known HyperText Markup Language used on the World Wide Web and allows generic data to be tagged qualitatively (its text based character makes it easy to extend and act as a system-independent representation of data).
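For illustration, a hypothetical master-to-client control message in the spirit of such an XML based protocol might look like the following. The actual SOCRATES schema is not published, so every tag and attribute here is invented.

```typescript
// Hypothetical master-to-client control message: resume replay at a given
// simulation time and speed, set the viewpoint, and draw a telestration.
const resumeCommand = `
<aarCommand type="play">
  <clock jumpTo="754.2" speed="0.25"/>
  <viewpoint mode="entity" entity="2SL"/>
  <telestrate shape="circle" x="0.42" y="0.61" radius="0.05"/>
</aarCommand>`;
```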
Other systems have created their own binary protocols. Ultimately, a standardized protocol should be created that all AAR systems could implement and on which they could be based. Communication schemes can also be an issue. In the simplest case, the master station receives all data in recording mode and then transmits simulation data to each client during playback. However, in some systems the network bandwidth available may be reduced, and it may be desirable to limit the network bandwidth required for the AAR. Alternatively, depending on the simulation data, it may actually be cheaper to transmit image data from the master station to the client stations rather than the simulation data itself. Finally, depending on the network technologies used, there can also be storage issues in a distributed AAR system. If the network is severely limited, each embedded simulation may store its own data for AAR or limit transmission to a few nearby locations. In this case the "master" AAR station does not have all the data necessary to reproduce the exercise. Transmission of data from client to client would then need to be coordinated to determine who needs what data for rendering. These last two issues (communication schemes and data storage) are currently the least understood technological research questions in distributed after action review. As research in embedded simulation and limited bandwidth networks progresses, many of these questions will be answered.
SUMMARY
Although it evolved from live training, the AAR process seems to fit exceptionally well with computer based simulation training. The AAR process and AAR aids have been tailored many times to fit different live and simulated training environments, missions, and unit equipment. The AAR process can work well with computer based simulations to quickly determine what happened during a simulation exercise and to support the discussion of why events occurred and how to do better. However, many challenges remain in implementing cost-effective new simulations based on game engine technologies and identifying the most effective training strategies for their use. Different approaches can be used to build AAR systems, each with its own issues. However, there are a number of core issues common to every AAR system, such as jumping to an event or time and synchronizing voice communications. An "AAR engine" that addresses these issues for a set of AAR systems can be an advantageous approach. This becomes more evident as distributed AAR sessions are considered and the handling of data across widely distributed nodes becomes an issue.
REFERENCES
Brown, B. R., Nordyke, J. W., Gerlock, D. L., Begley, I. J., & Meliza, L. L. (1998, May). Training analysis and feedback aids (TAAF Aids) study for live training support (Study Report, Army Project Number 2O665803D730). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Darling, M. J., & Parry, C. S. (2001). After action reviews: Linking reflection and planning in a learning practice. Reflections, 3(2), 64–72.
Department of the Army. (2003). Battle-focused training (Field Manual No. 7-1). Washington, DC: Author.
Domeshek, E. A. (2004). Phase II: Final report on an intelligent tutoring system for teaching battle command reasoning skills (ARI Tech. Rep. No. 1143). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Fidopiastis, C. M., Stapleton, C. B., Whiteside, J. D., Hughes, C. E., Fiore, S. M., Martin, G. A., Rolland, J. P., & Smith, E. M. (2005, September). Human experience modeler: Context driven cognitive retraining and narrative threads. Paper presented at the International Workshop on Virtual Rehabilitation, Catalina Island, CA.
Goldberg, S. L., & Meliza, L. L. (1993). Assessing unit performance in distributive interactive simulations: The Unit Performance Assessment System (UPAS). Proceedings of NATO Defence Research Group Meeting, Panel 8: Defence Applications of Human and Bio-Medical Sciences. Training Strategies for Networked Simulation and Gaming (Technical Proceedings No. AC/243(Panel 8)TN/5, pp. 173–182).
Hoare, R. (1996). From debrief to After Action Review (AAR). Modern Simulation & Training, 6, 13–17.
Holding, D. H. (Ed.). (1989). Human skills (2nd ed.). New York: John Wiley & Sons.
IEEE Computer Society. (1996). IEEE standard for distributed interactive simulation—application protocols. New York: Institute of Electrical and Electronics Engineers.
Knerr, B. W., & Lampton, D. R. (2005). An assessment of the Virtual-Integrated MOUT Training System (V-IMTS) (ARI Research Rep. No. 1163). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Lampton, D. R., Clark, B. R., & Knerr, B. W. (2003). Urban combat: The ultimate extreme environment. Journal of Performance in Extreme Environments, 7, 57–62.
Martin, G. A., & Cherng, M. (2007). After action review in game-based training (Tech. Rep.). Orlando, FL: University of Central Florida, Institute for Simulation and Training.
Meliza, L. L. (1999). A guide to standardizing after action review (AAR) aids (ARI Research Product No. 99-01). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Moreno, R. (2004). Decreasing cognitive load for novice students: Effects of explanatory versus corrective feedback in discovery-based multimedia. Instructional Science, 32, 99–103.
Morrison, J. E., & Meliza, L. L. (1999). Foundations of the after action review process (ARI Special Rep. No. 42). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Rogers, E. (2004). Pausing for learning: Adapting the Army after action review process to the NASA project world (NASA White Paper). Retrieved August 18, 2005, from http://smo.gsfc.nasa.gov/knowman/documents/whitepapers/Pausing_for_Learning.pdf
Scott, T. D., & Fobes, J. L. (1982). After action review guidebook I: National Training Center (ARI Research Product No. 83-11). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
University of Central Florida. (2007). SOCRATES overview. Retrieved August 2, 2007, from http://www.irl.ist.ucf.edu/index.php?option=com_content&task=view&id=26&Itemid=81
U.S. Army Combined Arms Center. (1993). A leader's guide to after-action reviews (Training Circular No. 25-20). Fort Leavenworth, KS: Author.
Chapter 15
INTERFACING INTERACTIVE 3-D SIMULATIONS WITH LEARNING SYSTEMS
Curtis Conkey and Brent Smith
Training and education constitute a comprehensive process that involves more than the transmission and learning of content. The next generation of instructional technologies will need to have the ability to track and assess individuals and teams at varying levels of proficiency while providing feedback and guidance so that errors in performance can be identified and corrected. Gaming and simulation have great potential to support adaptive learning by placing the learner in a real world environment and allowing the student to learn in context. The contextual experience enables learners to create their own constructs and apply them to unfamiliar situations. The push to adopt more engaging and immersive approaches for training and evaluating human performance in the context of Sharable Content Object Reference Model (SCORM)-managed learning environments relies on a complex partnering of technologies, standards, and specifications. The use of underlying technologies may simplify the assessment process and help standardize the manner in which a student is assessed, but foreseeable challenges must still be resolved.
INTERFACING SIMULATIONS WITH LEARNING SYSTEMS
Many organizations are engaged in transforming how they train, and many of these training programs place a strong emphasis on distance learning technologies, also known as e-learning. One challenge of these efforts is to produce e-learning content that is more than a mere replica of existing text based training. The technologies that enable e-learning continue to advance at a rapid pace. Computerized data management, advanced graphical presentation technologies, networked storage, and accessibility of information continue to influence how organizations capture, manage, and disseminate knowledge; transform that knowledge into learning opportunities; and deliver new learning opportunities to those who need them.
The next generation of instructional technologies will need the ability to track and dynamically assess individuals and teams at varying levels of proficiency in order to tailor learning content to individual needs while cost-effectively providing feedback and guidance so that errors in performance can be identified and corrected. These systems will bring together intelligent tutoring capabilities, distributed subject matter experts, in-depth learning management, and a diverse array of support tools across the continuum of learning experiences that a student will encounter in order to ensure a responsive, learner-centric system. The increased interest in interactive three-dimensional (3-D) simulations and game based training applications across the spectrum of education and training suggests their potential for becoming valuable extensions to traditional educational initiatives. These technologies represent the evolution of e-learning from existing page based content through low level two-dimensional animated training to fully immersive, three-dimensional training, which places trainees virtually in the relevant environment. Simulations and games have great potential to support a range of instructional strategies by placing learners into an environment where they can learn in context. However, simulations are only part of the mix of learning strategies that learners must experience in order to create well-educated students. Blended learning environments will be a key component of future training capabilities. New technologies enable a managed learning environment to seamlessly integrate training management and delivery, to facilitate the exchange of information between disparate simulations and content, and to provide appropriate learner-centric training at the point of need. Current legacy training management systems and new emerging technology systems must be fully and seamlessly integrated to provide a single access and management point for all events and activities. Assessment strategies within an enterprise learning organization should, in the future, include opportunities to continually monitor progress against a discrete set of metrics. Apart from traditional multiple-choice assessments, they should include the ability to dynamically assess performance. Here simulations offer a unique capability to modify and insert new training resources into a learning strategy so that learners can learn from their performances. As a student's level of proficiency increases, the learning strategy should conform to the evolving skill level of the student, and different feedback formats should provide immediate knowledge of performance, thereby increasing the rate of acquisition and retention of learned behaviors. Various use cases for simulations within a managed learning environment include the following:
• Stand-alone training—Pedagogy or contextual situations may require simulations that run without concurrent access to the Web or as an asynchronous educational intervention. However, centralized record keeping for student performance and the ability to centrally manage these applications are paramount.
• Graduated student coaching—Most pedagogical uses of simulations require learning systems that provide various levels of student feedback. This feedback is characterized in use cases in terms of show me (familiarization), let me (acquire and
practice), and test me (validation), all of which require different levels of learner support.
• Performance assessment—Coaching and pedagogy for the "let me" and "test me" use cases require the capability to track, store, evaluate, and report on a student's performance. This necessitates formal methodologies and data models for how we collect information from the instructional system and roll these events up into learning objectives that can be used as evidence of competency. In particular, these requirements are defined by critical tasks needed within the simulations and by performance measures defined for these critical tasks.
• Concurrent assessment of multiple skills—Instructional strategies require simulations that provide the ability for students to simultaneously demonstrate competency in many complex tasks. A single simulation session can provide feedback on multiple learning objectives.
• Dynamic team interaction—Team training is becoming increasingly common as tasks become interdisciplinary and require the interoperation of multiple skill sets. Simulation sessions can allow for controlled interoperational training among individuals while monitoring the achievement of learning objectives both individually and on a team level.
MANAGED LEARNING
The rise of computer based training (CBT) in the 1980s and the Internet in the 1990s raised the attractive possibility of practical reuse of training materials across multiple training programs. Such reuse had long been a dream for efficiency and cost-saving reasons, and the advent of these technologies gave new urgency to the effort. These efforts focused on the emerging distance learning industry, which merged the capabilities of CBT and the Internet to produce a powerful new capability for interactive, remote training. Learning management systems (LMSs) became a key component of the emerging e-learning industry. An LMS is based on the concept of a centralized server delivering learning content to users over a network, thus enabling learning content to be widely distributed to anyone with a Web browser and access to the system. However, for this to be practical, standards efforts were required to allow communication between the various noncompatible LMSs being developed. To facilitate these efforts, the Advanced Distributed Learning (ADL) Co-Laboratories were formed in 1997. A major success of the ADL effort has been the creation of the SCORM standard. Its primary goal is to enable the interoperation of LMSs and the reusability of content in a Web based training environment. Where content design and development are concerned, conformance to SCORM 2004 promotes reusable and interoperable learning resources across multiple learning management systems. Within the SCORM context, the term LMS implies a server based environment in which the intelligence resides for controlling the delivery of learning content to students. This involves gathering student profile information, delivering content to the learner, monitoring key
interactions and performance within the content, and then determining what the student should next experience. At its simplest, SCORM 2004 is a model that references a set of interrelated technical specifications designed to meet high level requirements for Web based learning content. Within the SCORM context, content can be described, sequenced, tracked, and delivered in a standardized fashion. SCORM provides a common way to launch content, a common way for content to communicate with an LMS, and predefined data elements that are exchanged between an LMS and content during its execution. This enables a "learning objectives" approach to training by providing a means for interoperability between learning content, in the form of sharable content objects (SCOs). An SCO is a collection of assets developed to provide the instructional requirements of a learning objective. It is also the basic building block of the SCORM Content Aggregation Model (CAM). The CAM relates specifically to the assembling, labeling, and packaging of learning content into "content objects." SCORM requires the delivery of content objects that are sharable and reusable, and it provides a tagging mechanism for discovery of and access to content, but it does not specify a repository structure for that content. This drives the requirement for the Content Object Repository Discovery and Registration/Resolution Architecture (CORDRA), a model for interconnecting repositories capable of learning content management, recovery, and reuse; the requirement is reinforced by Department of Defense Instruction 1322.20, which obligates the creation of SCORM conformant content. CORDRA meets the repository need that SCORM leaves unspecified. In CORDRA, each content object consists of a self-describing archive that bundles related learning resources with one or more Extensible Markup Language based manifest files. A learning resource is any representation of information that is used in a learning experience. In the SCORM environment, content objects do not determine, by themselves, how to traverse through a unit of instruction. Instead, the LMS processes "sequencing and navigation" rules using results from the content objects to determine the order in which a student will experience learning resources. In order to use interactive models, 3-D simulations, and games as learning resources, a method of integrating them into the SCORM paradigm is necessary. SCORM 2004, as it exists, does not fully address this need.
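For concreteness, the skeleton below suggests what a CAM content package manifest might look like, held in a string for illustration. The identifiers and file names are invented, and the namespace declarations and metadata required in a real imsmanifest.xml are omitted.

```typescript
// A skeletal SCORM content package manifest (illustrative only).
const imsmanifest = `
<manifest identifier="com.example.sim.course">
  <organizations default="ORG-1">
    <organization identifier="ORG-1">
      <title>Firefighting Simulation Course</title>
      <item identifier="ITEM-1" identifierref="SCO-1">
        <title>Engage a Compartment Fire</title>
      </item>
    </organization>
  </organizations>
  <resources>
    <resource identifier="SCO-1" type="webcontent"
              adlcp:scormType="sco" href="launch.html"/>
  </resources>
</manifest>`;
```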
GAMES AND SIMULATION—DELIVERY AND DEPLOYMENT
The SCORM Run-Time Environment (RTE) specifies the launch of learning content, communications between content and an LMS, data transfer, and error handling. This RTE was developed for basic e-learning delivery methods, from rudimentary page turners up to Flash animations. However, when the focus is shifted to interactive 3-D simulations and/or games, the issues involved are significantly more complex. There are many types of simulations with varying levels of interactivity, immersion, and complexity. Games come in a multitude of genres and styles. They can be resident in a client browser or embedded into a
separate system, which is a challenging issue when SCORM assumes Web based processes. Adding to the difficulty is the fact that a simulation's resources often require auxiliary processes and systems that must be initialized, configured, and available for a simulation based application to launch. For example, a personal computer simulation or game may require direct access to a machine's graphics subsystem in order to render an environment in real time, and to the local file store (that is, hard drive) in order to store and quickly access needed resources. Web browsers and SCORM are not designed to handle this level of "system awareness." Further, SCORM also limits how an LMS is able to interact with an SCO and, consequently, a simulation. Since the LMS in the SCORM architecture is required to operate in a Web based server/browser infrastructure, it has no inherent capability to directly launch or communicate with a native application session running on the client or another machine. One common approach to enable this capability is to use a Java applet or ActiveX component embedded in a HyperText Markup Language (HTML) Web page of the learning content to create and manage a message pipeline between the simulation and the LMS. In the SCORM environment, the LMS is responsible for establishing an application programming interface (API) through which the content in the client browser exchanges data during each learning session. Communications from the content to the LMS are accomplished using ECMAScript, which can "set" and "get" data values on the LMS server using the SCORM API provided by the LMS. The API defined by SCORM is the standard Institute of Electrical and Electronics Engineers (IEEE) 1484.11.2-2003 API for content to runtime services communication. This API provides the basic capabilities needed to interface a simulation with an LMS. With appropriate development of a reciprocal API in a simulation or game, it is possible to tether a simulation to a Web based, LMS-directed training event and have active communications between the two. Figure 15.1 demonstrates this concept with a firefighter simulation that is tethered to an LMS. One key benefit of e-learning technologies is the ability to deploy, update, and maintain training resources from a centralized server. Integrating LMS technologies with simulations expands this to include managing a simulation's resources from a centralized server. While SCORM provides a specification for packaging the content objects for delivery through an LMS, a simulation's resources may necessitate a large installation footprint that must be configured and available for the simulation based application to launch. Unlike traditional training applications, simulations and games can easily involve datasets running into the tens or hundreds of megabytes. With current technology, it is impractical to download this amount of data each time a user wants to run the simulation; until recently there have been few options to download, uncompress, install, cache, run, and update large applications from within the context of a Web browser. One option is to use "smart client" technologies, such as "Java Web Start" from Sun Microsystems, Inc., or "ClickOnce" from Microsoft Corporation. Smart client applications are similar to traditional applications in that they are installed locally on a user's machine and, as such, are able to make full use of local machine resources.
machine resources. However, smart client applications are also able to take advantage of the benefits of a traditional Web based application: they can be stored on a Web server and easily deployed to users' machines on demand; they can be updated by automatically downloading new or revised components; and they can communicate seamlessly with external Web services. To the user, the experience is similar to launching a video from a Web page; however, when the simulation is launched, developers have the ability to download all of the required simulation code libraries, dependencies, and assets onto the client machine. The simulation can then be launched and run in a separate window that is "tethered" to the Web page, while communicating back to the launch page over a transmission control protocol/Internet protocol socket. On subsequent launches, the simulation launches immediately. If content developers decide to update the simulation by modifying code or changing assets, they may simply update the files on the LMS server; the smart client framework will automatically compare the files on the server with those located on the client machine and then download and install only the modified files on each user's machine. Because of increasing network security concerns, simulations will often require elevated permissions and cannot run within the standard Web interface. Additionally, network policies, such as firewalls, may not allow these types of applications to be run on the local machine/network. However, if the simulation application libraries are digitally signed, network administrators can set each machine up to allow or disallow a given application.
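As a concrete illustration of the API and tethering arrangement just described, the following TypeScript sketch shows a launch page that locates the SCORM API instance exposed by an LMS and relays events from a locally running simulation back to it. The API object name (API_1484_11) and the CMI element names come from the SCORM 2004 Run-Time Environment specification; everything else (the findApi and tetherSimulation helpers, the WebSocket endpoint standing in for the raw TCP/IP socket, and the SimEvent message shape) is a hypothetical sketch, not part of SCORM or any particular LMS.

    // Minimal sketch of a launch-page tether between a native simulation and
    // a SCORM 2004 LMS. The LMS exposes an object named API_1484_11 on an
    // ancestor frame; content must search for it before starting a session.
    interface ScormApi {
      Initialize(param: ""): string;
      Terminate(param: ""): string;
      GetValue(element: string): string;
      SetValue(element: string, value: string): string;
      Commit(param: ""): string;
    }

    // Walk up the frame hierarchy looking for the API object (hypothetical helper).
    function findApi(win: Window): ScormApi | null {
      let current: Window | null = win;
      for (let depth = 0; current !== null && depth < 7; depth++) {
        const candidate = (current as any).API_1484_11;
        if (candidate) return candidate as ScormApi;
        current = current === current.parent ? null : current.parent;
      }
      return null;
    }

    // Relay events from the tethered simulation to the LMS. A WebSocket to a
    // local port stands in for the TCP/IP socket; the endpoint address and
    // the SimEvent message shape are illustrative assumptions.
    interface SimEvent { kind: "progress" | "score" | "terminated"; value: string }

    function tetherSimulation(api: ScormApi): void {
      const socket = new WebSocket("ws://localhost:9100/sim");
      socket.onmessage = (msg: MessageEvent) => {
        const event = JSON.parse(msg.data as string) as SimEvent;
        if (event.kind === "progress") api.SetValue("cmi.progress_measure", event.value);
        if (event.kind === "score") api.SetValue("cmi.score.scaled", event.value);
        if (event.kind === "terminated") {
          api.Commit("");
          api.Terminate("");
          socket.close();
        }
      };
    }

    const api = findApi(window);
    if (api && api.Initialize("") === "true") tetherSimulation(api);

On the simulation side, a reciprocal component would serialize its internal events onto the same socket; this is the "reciprocal API" role described above.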
COMMUNICATIONS VIA SCORM

In SCORM, simulation content currently can be treated and processed (for example, described, sequenced, tracked, or delivered) like all other content. Within the SCORM paradigm, a course structure format (CSF) defines all of the course elements, the course structure, and all external references necessary to represent a course and its intended behavior. This CSF is intended to promote reuse of entire courses and encourage the reuse of course components by exposing all the details of each course element. The CSF describes a course using three groups of information. The first group, called global properties, is the data about the overall course. The second, called block, defines the structure of the course, and the third group, objectives, defines a separate structure for learning objectives with references to course elements within the assignment structure.

SCORM presently provides a rules based "learning strategy" that enables sharable content objects (SCOs) to set the state of the global objectives. These records can store the learner's degree of mastery in the form of a score or a pass/fail state, or they may store the progress of the learner in terms of completion. However, SCORM's Computer Managed Instruction (CMI) data model and restriction on SCO-to-SCO communications limit the design of simulation content. SCORM uses the CMI data model to provide data for content and to capture result and tracking data used to control content delivery. The CMI model is weak in representing many attributes necessary for a simulation or game, including tracking results, learner attributes, assessment data, and content state. Games and simulations typically have more complex requirements for data tracking and state management than what is currently possible within SCORM.

In order for student performance to be tracked inside a simulation, the data communicated to the LMS must be confined to the data models and specifications that SCORM provides. A mechanism for assessment needs to be developed to allow student performance data to be extracted from a simulation and communicated to an LMS. Generally speaking, each SCO can contain a number of "objectives," and learner progress toward each objective can be tracked according to basic data values, such as completion status, success status, and score. While a simulation scenario may track many assessment variables internally, it needs to be able to combine these variables into data values that an LMS is able to understand. One approach to solving this challenge is to aggregate the learner's actions into learning objectives and report them using the objectives data model. Sequencing rules can be created to map these to global variables so that these values can be used by sequencing rules for other SCOs. However, according to SCORM, these global variables can be written to only once; therefore, a global variable cannot be used to aggregate data over a set of lessons. Additionally, the existing sequencing rules can inspect only three aspects of a global variable: score, completion, and satisfaction. This limited set of data is generally not sufficient to support all the data types required for games or simulations. This is an area that requires further standards work, which we address later in the chapter.
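To ground the objectives based approach, the sketch below shows one way a simulation wrapper might collapse its internal assessment variables into the coarse values that SCORM's CMI objectives model can carry. The CMI element names (cmi.objectives.n.id, completion_status, success_status, score.scaled) are defined by SCORM 2004; the TrackedObjective structure, the 70 percent passing threshold, and the objective identifier are illustrative assumptions.

    // Sketch: reporting internal simulation measures through the CMI
    // objectives data model. Thresholds and identifiers are assumed.
    type ObjectiveApi = { SetValue(element: string, value: string): string };

    interface TrackedObjective {
      id: string;        // e.g., "obj-ventilate-structure" (hypothetical)
      rawScore: number;  // internal 0..100 measure kept by the simulation
      attempted: boolean;
    }

    function reportObjective(api: ObjectiveApi, index: number, o: TrackedObjective): void {
      const prefix = "cmi.objectives." + index;
      api.SetValue(prefix + ".id", o.id);
      api.SetValue(prefix + ".completion_status", o.attempted ? "completed" : "incomplete");
      api.SetValue(prefix + ".success_status", o.rawScore >= 70 ? "passed" : "failed");
      api.SetValue(prefix + ".score.scaled", (o.rawScore / 100).toFixed(2));
    }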
ASSESSMENT OF SIMULATION BASED TRAINING

Currently, there is little consistency in how students are evaluated across different training environments, whether in the classroom, in the field, at home via e-learning, or in simulated training exercises. The problem is that assessment strategies and associated development tools have been nonstandard and closely coupled with the learning environment that they support. In other words, an assessment system developed for one learning domain can rarely be used or applied in other domains. Existing assessment mechanisms can be loosely divided into two camps based on the type of learning environment in which they are used. Static assessment is generally associated with teaching methods where the content for a given unit of learning is the same; whether the content comprises lectures, text, pictures, or video, each student will experience the same material in more or less the same order. In these environments, assessment generally comes in the form of a written test with questions that the student answers in a linear fashion. The benefit of this form of learning is that this type of assessment can be highly structured and easily integrated into a curriculum.

In contrast, a dynamic learning environment generally refers to one in which the student interacts with the training material on some level, such as in games, simulations, or a live training exercise. In a dynamic learning environment, the training experience may be different each time the learner is engaged with it. The means of assessment in dynamic learning environments is varied. Learning pathways can no longer be defined in terms of highly structured, linear patterns and time frames. Rather, assessment must accommodate transitions between learning experiences and ultimately accommodate further training within other learning environments. In order to overcome these limitations, training developers typically develop customized methods for assessing performance. In a game or simulation, there may be some internal scoring mechanism or, as in a live training situation, assessment may take the form of a subjective review by an observer/controller or an after action review. In most of these environments, if there is a scoring/assessment mechanism, it is most likely specific to a particular training system, is not meant to be used by external systems, and may not map directly to the overall learning objectives of a larger training curriculum. Games and simulations generally have no mechanism for interfacing to an LMS to record the data. Manual entry of performance into a student's educational history is not uncommon.

In recent years, there has been considerable interest (Darque, Morse, Smith, & Frank, 2006) in how to tie dynamic learning environments to other managed learning environments. The majority of this research is focused on assessment within a single training system and uses an approach that tracks a learner's response to critical events within that system. These events are tracked within a training system for use as evidence of whether or not a given learning objective has been satisfied. However, there are several problems with coupling assessment logic to specific training systems. The biggest problem is the lack of standardization. Currently,
there are no standard definitions or data formats for such concepts as "completion," "progress," "evidence," "assessment logic," and "results." Therefore, these components are often developed specifically for each training system. This process of "reinventing the wheel" for each new training environment increases development times and discourages the development of mature, stable, reusable, and extensible components. It also inhibits the development of tools and utilities that could make it easier to create and modify assessment models.

In order to overcome existing assessment architecture limitations and to enable the standardized integration of simulations and games into learning environments, assessment functionality is being logically separated from learning content and learning management systems. As shown in Figure 15.2, a new software component called an "assessment engine" is tasked with listening to event data coming from the learning environment.

Figure 15.2. Learner Assessment Data Model and Authoring Tools Assessment Engine

The assessment engine processes these data according to a defined assessment model and broadcasts the results (that is, status, scores, and grades) to a training system (that is, an LMS or after action review [AAR] system). The assessment engine can be incorporated into a simulation or it may be an external module. Ideally it acts as middleware between the learning environment and the training system and, as such, frees them from having to
handle any assessment logic internally. This architecture also allows the assessment functionality to become more generic by allowing it to be used by any number of learning environments and/or training systems. In order to extract the assessment functionality into its own process, yet still work within a variety of system architectures, there needs to be a standardized set of data formats and messaging protocols to enable communication between the learning environment and the assessment engine, and the assessment engine and the training system (that is, evidence and results). These protocols, or APIs, serve as a contract among the three processes, allowing each component to run independently. This also makes it easier for an instructional component to be replaced by an entirely new instructional component if desired. For example, a game based simulation data log could be interchanged with an observer log taken from a live training exercise, as long as both logs generate similar event data as defined by the event data protocol.

BUILDING ASSESSMENTS INTO SIMULATION BASED LEARNING EXPERIENCES

For each simulation or game, the assessment process is broken into logical components. The first step in the assessment development process is to specify the constructs of each training objective that needs to be measured within each of these dynamic learning environments. Once performance measures are determined, they can be mapped to events within each environment. Events are raised as the user interacts with the learning environment. The events raised by each learning environment are designed to be relevant markers of events occurring during the training session. In the case of a simulation, these events may be triggered by simple user actions, such as "student pushed button." Alternatively, the events may be defined on a higher level and encapsulate more complex concepts that capture the interaction between objects and other system components, such as "time" and "state" variables within the learning environment. The detail and complexity of what constitutes an event should be determined by instructional designers and simulation developers according to instructional needs and simulation or game architecture. A method of capturing and naming each event also needs to be developed in order to publish all events within each environment that are necessary to satisfy each task. The assessment engine listens to these events and uses them to track a learner's progress toward completion of defined tasks. An assessment data model is used to combine events and tasks in a hierarchical manner to form more complex tasks, which ultimately culminate in one or more objectives. As a learner completes specific training objectives, the assessment component communicates this information to the training system.

The goals of the aforementioned assessment model are best met with a tiered development approach. The tiers can be conceptually divided into the learning environment tier, the data tier, and the assessment tier; a brief code sketch of this separation follows the list.

• The learning environment tier is the system and/or process where the actual training takes place. In the case of simulations, this tier will represent the student interacting
with the simulation code. In the case of live or classroom training, this tier will represent all aspects of the physical training environment, including the students, the equipment, and the observer/controllers. At this level, the learning environment tier will be responsible for generating simple messages related to relevant events and system state. For simulations, these messages will be generated by the simulation code, and for live training the messages may be generated by other training equipment and/or the training observers (via an electronic form). The messages will notify the assessment engine when an assessment object changes state or raises an event, and they will either be sent directly to the assessment engine in real time or be stored in an intermediary data store. Once in the assessment engine, the messages will be validated against the data tier and processed according to the assessment tier. • The data tier is defined as the data model and schemas that describe the assessment objects, events, tasks/objectives, and assessment logic for a given training environment. Each training objective will be defined based on the known list of messages (events) and state data that are published by the learning environment. The assessment model will define which events it is dependent on and any rules concerning the sequencing or timing of those events. It will also define progress measurement criteria, such as complete/not complete, percent complete, score, and so on. • The assessment tier is defined as the layer that “listens” to training events from the learning environment tier and processes them according to rules defined in the data tier. This component also consists of formalizing the constructs of an event publishing system that listens for any events that are associated with training objectives within the learning environment. These events will need to be processed through an assessment model in order to track the state of each objective. It is important to note this architecture is a logical model and does not mandate where this assessment processing will take place or how information is communicated to other training systems. The assessment engine may be run within the context of a client application, a server application, or a Web service. It may even run within the same physical process as a simulation or other training system. However, even if the engine is physically running in the same process, it is still logically separated from the other components and can be updated and maintained separately.
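As promised above, here is a minimal sketch of the three-tier separation, assuming nothing about any particular product: the learning environment tier emits simple event messages, the data tier declares objectives and the events they depend on, and the assessment tier listens and publishes progress. All type names, fields, and the progress policy are hypothetical.

    // Hypothetical sketch of the tiered assessment model. The engine consumes
    // a generic event stream, evaluates it against externally defined rules,
    // and publishes results through a callback so it stays independent of any
    // one training system (LMS, AAR tool, or otherwise).

    // Learning environment tier: simple event and state messages.
    interface TrainingEvent {
      source: string;                  // e.g., "student-1" or "observer-form"
      name: string;                    // e.g., "pushed-button", "entered-zone"
      timestamp: number;
      state?: Record<string, string>;  // optional state snapshot
    }

    // Data tier: declarative definition of an objective and its dependencies.
    interface ObjectiveRule {
      objectiveId: string;
      requiredEvents: string[];        // event names that must all be observed
    }

    // Assessment tier: listens to events and tracks progress per objective.
    class AssessmentEngine {
      private seen = new Set<string>();

      constructor(private rules: ObjectiveRule[],
                  private publish: (objectiveId: string, progress: number) => void) {}

      onEvent(e: TrainingEvent): void {
        this.seen.add(e.name);
        for (const rule of this.rules) {
          const met = rule.requiredEvents.filter(name => this.seen.has(name)).length;
          this.publish(rule.objectiveId, met / rule.requiredEvents.length);
        }
      }
    }

The publish callback is where an LMS adapter, an AAR recorder, or both would attach; swapping a simulation event log for an observer's electronic form changes only what feeds onEvent.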
The guiding principle here is that a training system needs to combine assessment variables into data values that other systems are able to understand. For example, SCORM 2004 provides standard specifications for assessment results and roll-up functions to aggregate assessment results. In order for a SCORM conformant LMS to track student performance during a dynamic learning experience, such as a simulation or game, the data communicated to the LMS must be confined to the data model that SCORM provides. However, as the assessment component processes events from the learning environment, there is the potential to collect other contextual information that can be used as evidence to support competency. From an instructional effectiveness point of view, the capture of this information could greatly increase the effectiveness of dynamic learning environments by allowing a more in-depth review of what a student has and has not mastered across multiple learning experiences. For instance, the ability to pass after action review data back to the LMS could greatly improve a remote teacher's ability to assess a student's performance by allowing the teacher to actually play back the student's simulation exercise. With the broad ability to track events and
student activity within simulation environments, an untapped means for assessment of performance is available for next generation training systems. The efforts behind competency based training seek to exploit these opportunities.
COMPETENCY BASED LEARNING

The Standard for Reusable Competency Definitions is motivated in part by a growing international movement, led by the human resources community, to look at competency development through the bigger picture of expressing objectives and their relationships. At issue is how to express the fact that one objective (or competency) might be composed of several subobjectives and how to express ways in which data on subobjectives can be rolled up. A competency based approach to assessment is considered by many to be the bridge between traditional measures of student achievement and the future (Jones & Voorhees, 2002). However, there are numerous challenges associated with developing and assessing competency based learning initiatives. The definition of a competency for the purpose of this discussion is the combination of skills, abilities, and knowledge needed to perform a specific task. Figure 15.3 shows that for each individual student, skills and knowledge are acquired through learning experiences. Different combinations of skills and knowledge define the competencies that an individual may possess. Finally, different combinations of competencies are required to carry out different sets of tasks.
Figure 15.3. Assessment Pyramid. Source: Department of Education
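The notion of subobjectives rolling up into competencies can be made concrete with a small sketch. The tree structure, the mean-of-children roll-up policy, and the example identifiers below are all illustrative assumptions; the Reusable Competency Definitions work discussed above standardizes how such definitions are expressed, not any particular roll-up rule.

    // Sketch of a competency composed of subobjectives with a simple roll-up.
    // Structure and policy are hypothetical, for illustration only.
    interface CompetencyNode {
      id: string;
      children: CompetencyNode[];
      mastery?: number; // 0..1, supplied as evidence on leaf objectives
    }

    // Roll up mastery as the mean of the children; a leaf reports its own evidence.
    function rollUp(node: CompetencyNode): number {
      if (node.children.length === 0) return node.mastery ?? 0;
      const total = node.children.reduce((sum, child) => sum + rollUp(child), 0);
      return total / node.children.length;
    }

    // Example: a task level competency built from two subobjectives.
    const callForFire: CompetencyNode = {
      id: "call-for-fire",
      children: [
        { id: "locate-target", children: [], mastery: 0.9 },
        { id: "transmit-request", children: [], mastery: 0.6 },
      ],
    };
    console.log(rollUp(callForFire)); // prints 0.75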
Current technology makes it possible to analyze the sequences of actions learners take as they work through a problem and compare these sequences of actions against models of knowledge and performance associated with different levels of expertise. Most assessments, however, provide snapshots of achievement at particular points in time, but do not capture the progression of a learner's conceptual understanding (Learning Federation, 2003). This is where simulation and game based training can play a significant role. The ability to roll up events into training objectives via an assessment engine allows for the delivery of competency based learning.

EVOLVING STANDARDS FOR SIMULATION AND GAME BASED LEARNING

As indicated in the preceding discussion, many of the capabilities enabled by integrating simulations into LMS environments are only partially realized at this time because of limited support in the existing SCORM standard. As this technology becomes more advanced and costs are reduced, we theorize that the training community will embrace games and simulations as simply another type of media that may be used to convey instruction. In the future, these technologies will interoperate with other learning technologies that have yet to be fully developed or implemented on a large scale. Advancement of standards will enable significant new capabilities for training environments. Integration with personnel management systems and business processes will introduce new concepts, such as people, objects, assessment profiles, or developmental progression. Global learner profiles will play a large role in tracking a learner's progress across a wide range of training solutions.

In the past, there has not been a consistent approach to the development of dynamic learning environments; none of the major courseware initiatives, such as reuse, interoperability, and sharing, have been addressed. A primary goal of integrating simulations and games into the SCORM paradigm is to provide a generic, adaptable, and standardized mechanism for simulations to perform learner assessment. The term "learner assessment," in the context of this discussion, refers to the ability of the system to track a learner's progress through an interactive training environment and generate a set of performance metrics. These metrics (that is, scores, grades, and AARs) can be used to determine if the learner has satisfactorily met the defined learning objectives and to provide structured feedback and/or appropriate remediation to the student. In order to meet these requirements, technology needs to be developed that allows in-depth assessments of student performance against a defined set of training objectives in order to identify student deficiencies and provide feedback to both students and instructors. The challenge, therefore, is to find an efficient way to do the following:

• Create a set of data models and protocols that can formally represent the complex interactions between the events raised within a simulated learning environment and the training objectives, task conditions, and standards defined within a given LMS.
These models must be able to account for the different relationships among the learner, other learners, and the environment where the training takes place.
• Provide standardized tools to create and/or modify assessment models so that trainers and instructors can design the way a student is evaluated inside each learning system. • Develop an assessment specification that uses these models to track and assess student performance across multiple training environments. • Define data warehousing structures to ensure that all of the activities that take place during a simulation are logged and available for review. From these data, patterns of routine cognitive activities can be discerned.
Recognizing the importance of these requirements, two IEEE standards committees have formed a collaborative study group to investigate the potential of formalizing a standard set of technical specifications to allow simulations and/or games to be launched and managed through SCORM-conformant content and learning management systems. The IEEE Learning Technology Standards Committee is chartered by the IEEE Computer Society Standards Activity Board to develop accredited technical standards, recommended practices, and guides for learning technology. Currently, three of the five components of SCORM (metadata, communications, and content aggregations) are based on Learning Technology Standards Committee standards, while a fourth (competency) is in progress. Figure 15.4 shows key elements and interfaces that have emerged from the study group discussions.

Figure 15.4. Overview of LMS and Simulation Interfaces

The diagram is coded by fill pattern. The stipple pattern is used to represent a dynamic learning environment that takes initial conditions and produces events during a learning experience. In the SCORM context, the learning
management system (large shaded oval on the left) is responsible for managing three forms of data: • Learner data, including learner identification and learner assessment records; • Competency information, shown here in terms of tasks, conditions, and standards, but possibly including other forms, such as learning objectives; and • Content, the traditional SCOs, typically HTML based data.
The assessment engine, in diagonal lines, takes streams of events generated by the simulation and generates assessment results by comparing these events to rule based models of assessment. While a simulation may track many assessment variables internally, it needs to be able to combine these variables into data values that an LMS is able to understand. This appears to be a common consideration across the study group and has been identified as an area that warrants further investigation for the development of potential standards. As shown in Figure 15.4, there is a mechanism to listen to events or messages from a simulation, process them, translate them into data that an LMS can understand, and communicate them to the SCORM data model either through the LMS or via Web services.

Together, these standards, technologies, and tools have the potential to make future training simulations even more accessible, adaptable, affordable, interoperable, and reusable. This objective is consistent with current trends to make training more dynamic and engaging, more accessible, more deployable, and less expensive. Although the required technologies exist and have been successfully demonstrated, the challenge is to formalize them together in an architecture that can be integrated and used across the spectrum of training strategies and media.

REFERENCES

Darque, B., Morse, K., Smith, B., & Frank, G. (2006). Interfacing simulations with training content. Paper presented at the NATO Modeling and Simulation Group Conference: Transforming Training and Experimentation through Modeling and Simulation.

Jones, E. A., & Voorhees, R. A. (2002). Defining and assessing learning: Exploring competency-based initiatives (NCES 2002-159). Washington, DC: U.S. Department of Education.

Learning Federation. (2003). Learning modeling and assessment R&D for technology-enabled learning systems. Washington, DC: The Learning Federation.
Chapter 16
ENHANCING SITUATION AWARENESS TRAINING IN VIRTUAL REALITY THROUGH MEASUREMENT AND FEEDBACK

Jennifer Riley, David Kaber, Mohamed Sheik-Nainar, and Mica Endsley

Virtual reality (VR) technologies have become popular training mediums in both civilian and military domains (Liu, Tendick, Cleary, & Kaufmann, 2003 [surgical tasks]; Kaber, Wright, & Sheik-Nainar, 2006 [teleoperation]; Lampton, Bliss, & Morris, 2002 [dismounted infantry]). VR presents a dynamic and interactive means for engaging trainees while introducing instructional concepts. Simulations facilitate repetitive practice in prototypical situations that are more easily, quickly, and economically developed as compared to live rehearsals. Military personnel, for example, can be trained in VR on tactics, weapons use, or navigating unfamiliar terrain that might be encountered during future deployment. Advances in VR technology facilitate user training in rich visual scenes in distributed and multiplayer training opportunities. Such technologies are considered to be valuable instructional mediums for supporting skill development and increased knowledge transfer and retention (Ponder et al., 2003).

Though the use of simulated environments for training is attractive for several reasons, researchers suggest that technological advances alone do not necessarily result in skill transfer to reality. Salas and Cannon-Bowers (1997) stated that training technologies (such as virtual environments) must include elements that are relevant to core competencies of the domain in order to promote skill acquisition. For example, in the military, relevant competencies might include situation assessment, team coordination, planning, weapons skill, and leadership. Endsley and Robertson (2000) say that in addition to instructional elements, it is important to provide feedback on training in order to develop or fine-tune specific skills.

To really take advantage of VR and simulation technologies, it is important to incorporate measures to accompany instructional components in order to determine if trainees are meeting established standards for readiness. It is also useful
to employ tools that can assess trainees’ awareness of events and operational elements during training in order to understand why they exhibited certain behaviors and how they communicated their knowledge with team members. This type of tool would provide trainers with the information they need to make the training experience useful for building key cognitive skills associated with situation awareness (SA) and decision making. Today there are tools available to record training events in VR simulations that can be used to review the events with trainees afterwards, but trainers are hampered in being able to take full advantage of this capability. Trainers need additional tools to more fully capture the trainees’ cognition during scenarios so that they can use this information to provide constructive feedback toward improving cognitive skill. This is particularly true for SA, which is a key component of expert cognitive performance.
SITUATION AWARENESS AND SITUATION AWARENESS SKILL DEVELOPMENT

Situation awareness is defined as "the perception of elements in the environment within a volume of time and space, the comprehension of their meaning and the projection of their status in the near future" (Endsley, 1988). The construct of SA has been demonstrated to be important to operational decision making and task performance in complex domains. For example, a soldier must develop an awareness of all mission relevant elements, including the location of teammates and their level of operational readiness, characteristics of the environment, status of devices, impact of weapons use, and potential actions of others in the environment. Soldiers, along with operators in similar dynamic environments (for example, law enforcement), need to be able to rapidly assess a situation, understand the effects on mission objectives, and precisely predict the likely outcome in their near future time and space. This information processing serves as a basis for dynamic decision making.

For teams, SA can be defined as the degree to which every team member possesses the SA required for his or her responsibilities (Endsley, 1995b). If any one person experiences an error in SA, it can potentially lead to failure for the team. Artman and Garbis (1998) said that a team can develop mental models of a situation that are partly shared and partly distributed between members. This extends the concept of team SA to include shared SA—instances in which two or more team members must share an understanding of informational elements to meet a common goal. Endsley, Bolte, and Jones (2003) point out that shared SA does not imply a need for complete sharing of all information requirements, but rather involves having the same understanding on the subset of information that is needed to meet overlapping goals for team roles.

Breakdowns in SA can be a result of failure to detect or properly interpret critical cues, memory errors, or missing information. Errors in team or shared SA are additionally impacted by problems with team processes (for example, inadequate or erroneous communications, lack of shared understanding or incompatible interpretations, diverging mental models, or failures to recognize deviations in
SA among team members). Endsley and Robertson (2000) identified a few target areas for improving the SA of pilots through training, which may also be applicable for improving SA in other domains. The authors discuss approaches, such as training for task management (for example, prioritization and dealing with interruptions), workload management and time sharing, contingency planning, training for comprehension and projection, and communications. They emphasize the need for structured feedback on SA, stating that it is critical to the learning process.

FEEDBACK IN TRAINING FOR SA ENHANCEMENT

For many years the military has used the after action review (AAR) to provide knowledge of results to training soldiers. An AAR is a structured assessment conducted after a mission or training activity that supports teams and leaders in discovering what happened during an event (Department of the Army, 1993). For virtual environment training exercises, Knerr, Lampton, Crowell, et al. (2002) have advocated the use of coaching and AAR by expert instructors. VR technology provides the capability for AARs, and one such tool used by the Army Research Institute (ARI) for the Behavioral and Social Sciences is the Dismounted Infantry Virtual After Action Review System (DIVAARS) (Knerr, Lampton, Martin, Washburn, & Cope, 2002). DIVAARS functions much like a video recorder to capture a training exercise from different visual perspectives and includes the ability to tag events for later replay.

In the present research effort, we developed a system to enhance training and augment the AAR through measurement and analysis of the trainees' SA. Specifically, SA measurement tools were developed to provide feedback on cognitive states and behaviors in conjunction with critical training event information. To support SA feedback following VR training, SA measures should provide diagnostic results on trainees' knowledge of critical SA information, their ability to seek and integrate information effectively, and the quality of their situation assessment behaviors. Such measures must produce results that provide detailed insight into how well trainees actually achieve these aspects of SA. Salas and Cannon-Bowers (1997) stated that trainee performance measures and results can contribute inputs to the feedback process that are necessary for improving performance. The results of a robust SA measure can serve to define elements of the training environment and operational domain on which SA is being challenged and can provide support for diagnosing behaviors that lead to or degrade SA.

Here we describe an SA assessment system that is designed to support SA skill development through SA-oriented training and feedback. The research was initially conducted to meet the needs of the ARI for a comprehensive computer based measure of SA that could be utilized to train dismounted infantry squads using advanced VR technology. It was extended to meet the needs of the Office of Naval Research Virtual Technologies and Environments program for training SA in Fire Support Teams (FiSTs).
DEVELOPING SA MEASURES FOR VR TRAINING

To identify candidate SA measures for VR training, we conducted a review of existing measures (Kaber, Riley, Lampton, & Endsley, 2005). We focused on the compatibility of the measures with the VR system and the degree to which the measures supported direct assessment of components in an infantry based model of SA (Endsley et al., 2000). We also considered the need for measures that would be minimally intrusive to trainee performance and the continuity of training scenarios. Other researchers have conducted detailed reviews of SA measures in the past (see Endsley & Garland, 2000, for examples) that include information on implementation and equipment required for administration of the measures. In general, the types of measures include the following:

1. Direct, objective measures of SA, such as SAGAT (Situation Awareness Global Assessment Technique);
2. Direct, subjective measures of SA (either self- or observer rated);
3. Process measures that involve inferring SA from eye movements, verbal protocols, or team communications;
4. Behavior based methods that assess SA in terms of appropriateness of trainee actions to particular scenario events; and
5. Performance measures that infer SA from training situation outcomes.
We identified the general advantages and limitations of each measurement type. For example, direct subjective measures are easy to implement and provide a measure of individual SA. However, when rating one’s perceived level of SA, a trainee often may not know what information he or she is not aware of and base SA ratings on limited knowledge of performance outcomes. Query methods are objective and less biased measures of SA, but they require a detailed analysis to construct and administer and may require assessments through freezes in the simulation. Considering the various strengths and weaknesses and the fact that different measures can provide different insights on trainee SA, we determined that a multiple measure approach would be most robust for use in VR training sessions. We also considered the need for SA measures that were not only validated, but that provided diagnostic information for training instruction. Endsley et al. (2000) developed a model of SA specific to infantry operations. The model is based upon Endsley’s (1995b) theory of SA, which presents the cognitive construct as a product of perception, comprehension, and projection of task states. The model depicts both cognitive and task factors that can impact SA. These include task factors, such as mission planning and preparation, system complexity, and operational pace; doctrinal factors, such as tactics, procedures, and rules of engagement; environmental factors, such as terrain and weather conditions; and individual factors, such as attention and memory limits, cognitive and spatial abilities, and effects of stress, fatigue, and overload. The model also includes aspects related to information sources (for example, digital systems,
team members, or direct observation). It should be noted that while this model was developed specifically to depict SA in infantry operations, many of the elements are directly applicable to aspects of SA in the much broader military and civilian arena. Using an established set of psychometric criteria (see Wickens, 1992, on criteria for measures of cognitive workload), we identified three candidate measures, which satisfied the specific requirements for unobtrusiveness and reliability in assessing SA and could be developed into an SA assessment system for training in VR. A direct query method (for example, SAGAT, Endsley, 1995a; 2000) allows for flexible and objective assessments of SA along all components of SA, as the queries can be tailored to the specific relevant aspects of SA that are incorporated in a training situation. For the present effort, we opted to develop real time SA probes, as these can be presented to trainees during task performance without requiring a freeze of the simulation, although they produce less information than SAGAT (for example, Jones & Endsley, 2000). In this case, verbal probes are developed to elicit operator responses to SA queries that are based on a detailed analysis of the SA requirements for the domain. The probes can come from an evaluator or be posed as naturally occurring questions from a commander or other team member in the training scenario, making them realistically imbedded in the training event. The trainee response to SA probes can be assessed for accuracy based upon actual situation data (ground truth), providing an objective assessment of the accuracy of trainee SA that can be used as feedback. For example, the trainee can be directly presented with evidence of any misperceptions, misinterpretations, or omitted but critical information so that he or she can make needed adjustments to cognitive models and attention patterns in the future. In addition to the probe measure, we found that techniques involving expert observer ratings of SA can be implemented easily and allow for accurate assessments of whether behaviors or communications comply with SA acquisition and dissemination. Example measures include the Situation Awareness Behaviorally Anchored Rating Scale (SABARS; Strater, Endsley, Pleban, & Matthews, 2001) and team communication measures developed by Brannick, Prince, Prince, and Salas (1993) and Wright and Kaber (2003). SABARS is a post-trial rating made by an observer to evaluate the SA of trainees based upon a predefined set of physical behaviors, relevant to situation assessment, exhibited during an exercise. By basing the evaluation on predefined, SA relevant behaviors, SABARS avoids the problem of self-rating or of an observer trying to ascertain the trainee’s state of knowledge. Rather, it measures whether relevant processes are being employed (situation assessment as opposed to SA itself ). SABARS was developed specifically for platoon level operations and allowed observers to rate platoon leader behaviors on a scale from “very poor” to “very good.” We developed a computer based implementation of SABARS that allows for multiple ratings of soldier behaviors during a single training trial. As SA in military operations is heavily dependent on team communications, we included this aspect as a third and important measure for the system. Wright
and Kaber (2003) developed a team communication and coordination measure for assessing overall team performance in complex systems control. The measurement technique involved counting and rating communications representative of critical team behaviors (dimensions of teamwork for the measure were previously identified by Brannick et al., 1993). The count and rating information was, in turn, used as the basis for a composite rating of team coordination. The measure is similar to SABARS, involving expert ratings of participant communication behaviors, but it also includes an objective measure of the frequency of specific key team communications. To implement a team communications measure focused on SA assessment, we adapted the team communications criteria used by the previous authors to reflect important aspects for SA in the specific domain and emphasized the process of achieving SA via acquisition and dissemination of situation data, within and outside of the core team (for example, asking questions and requesting situation reports). This technique requires monitoring the natural, unsolicited verbalizations of teams for specific information in communications and rating the quality of the statements. It provides useful diagnostic information associated with sharing critical SA information across the team.

THE SA ASSESSMENT SYSTEM

The Virtual Environment Situation Awareness Review System (VESARS) includes a suite of SA measures incorporating the three techniques described above and an application to review the training results. VESARS was developed to support two separate training domains—army infantry in urban environments and marine FiSTs. The initial design of VESARS for the infantry focused on SA measurement for a single trainee (for example, the squad leader) (Kaber et al., 2005; Kaber, Riley, Sheik-Nainar, et al., 2006). The modified design of VESARS, demonstrated for marine FiST, was created to facilitate SA measurement for a small team. The general underlying functionality of the applications is the same for individual and team SA measurement. However, the implementation of such measures, for individual or team assessment, is different.

SA PROBE DELIVERY

The SA probe delivery tool is used to administer SA probes to trainees during the training exercise. The tool supports an instructor in identifying relevant probes over the course of the training scenario, selecting and presenting probes, and recording trainee responses to probes. The system scores each probe as correct or incorrect by comparing the recorded trainee response with the ground truth of the simulation recorded by the administrator and stores the data for later access. SA probes were developed using goal-directed (cognitive) task analysis (Endsley, 1993; Endsley et al., 2003). The goal-directed task analysis documents the important goals and decisions associated with a task or particular role, along with the resultant SA requirements needed to perform tasks. The SA requirements are
dynamic information needs, rather than static procedural knowledge or rules that might be related to task steps. Defining the SA requirements for the target domain was a critical step in developing relevant SA probes for objective SA measurement. The probes presented to trainees map directly to the critical decisions that they must make in real situations. Goal-directed task analysis (GDTA) is conducted through multiple interviews with subject matter experts (SMEs), along with observation of operations, and a review of domain specific literature. The SMEs are generally interviewed individually. The results of interviews are pooled to develop a goal hierarchy and, later, the completed analysis includes key decisions and SA needed for each goal. Preliminary goal hierarchies from the task analysis are analyzed by the same or different SMEs for refinement and validation. An example segment of the analysis for the position of Forward Air Controller in Marine FiST is presented in Figure 16.1.
Figure 16.1. Graphical GDTA results include goals, decisions, and SA requirements.
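The goal hierarchy that a GDTA produces lends itself to a simple recursive representation. The sketch below is a hypothetical rendering of that structure (the field names and example content are ours, not drawn from the published analysis): each goal carries its subgoals, the key decisions under it, and the SA requirements attached to each decision, grouped by the three levels of SA.

    // Hypothetical data structure for goal-directed task analysis (GDTA)
    // results: goals decompose into subgoals; each decision lists the SA
    // requirements that support it, grouped by the three levels of SA.
    interface SARequirements {
      perception: string[];     // Level 1: elements to perceive
      comprehension: string[];  // Level 2: meanings to understand
      projection: string[];     // Level 3: states to project
    }

    interface Decision {
      question: string;
      requirements: SARequirements;
    }

    interface Goal {
      name: string;
      subgoals: Goal[];
      decisions: Decision[];
    }

    // Illustrative fragment for a Forward Air Controller (content invented).
    const controlCloseAirSupport: Goal = {
      name: "Control close air support",
      subgoals: [],
      decisions: [
        {
          question: "Is the aircraft cleared to engage?",
          requirements: {
            perception: ["aircraft position", "target location", "friendly positions"],
            comprehension: ["deconfliction of airspace and gun target lines"],
            projection: ["risk to friendlies at time of weapons release"],
          },
        },
      ],
    };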
The critical decisions identified in the goal-directed task analysis are used to generate a list of candidate SA probes. These probes are developed for each individual in a team and can be categorized in multiple ways: by level of SA, domain elements (for example, enemy, terrain, risk, and so forth), training objective, and phase of operation. The comprehensive list of SA probes for the domain is stored in a database that is accessed by VESARS. Table 16.1 presents example SA probes—at the three levels of SA (perception, comprehension, and projection) defined by Endsley (1995b)—developed for army infantry squads and marine FiSTs.

Table 16.1. Example SA Probes

Domain                      Level of SA   Probe
Infantry squad              1             What is the civilian activity?
Infantry squad              2             Can you take cover?
Infantry squad              3             Will the enemy be able to detect your squad?
Marine Fire Support Team    1             Where is the target located?
Marine Fire Support Team    2             Do we need suppression of enemy air defenses?
Marine Fire Support Team    3             Will civilians be in danger if we engage the enemy?

To incorporate shared SA, the results of goal-directed task analysis for all team members are cross-analyzed to identify the shared goals and shared SA needs. In many cases, similar informational elements are needed for different purposes across team members. SA probes can be developed to determine if team members have the same understanding of these elements. VESARS identifies those probes used to assess shared SA and indicates the relevance of each probe to various team members. As some information requirements are relevant to all members of a team, some SA probes can be posed to any team member. However, some information requirements are applicable only to a limited set of team members. Such SA probes would be identified in VESARS as having relevance only to specific positions.

The VESARS interface for administering probes includes multiple features: (1) menus for viewing probes, (2) options for filtering and finding probes by category, and (3) a map of the virtual training exercise area. Once a probe is selected, it is put in a queue until it is delivered to one or more team members. There are options for recording the trainee response to the SA probe and the ground truth of the simulation, and options for saving the response data and/or posing the probe to another team member. Team members can be asked the same SA probe at nearly the same point in time. The scores are recorded so that instructors can determine which members of the team have the same understanding on specific shared SA items at that point in time in order to assess the quality of team SA. (See Figure 16.2 for a graphic of the VESARS SA probe delivery tool.)

Figure 16.2. VESARS interface supports SA probe delivery and scoring.

The SA probes and categorizations in VESARS are specific to a given domain based on the task analysis. Once SA requirements for a domain have been
elicited, the tool can be easily modified to reflect the new training domain. As an advantage, the VESARS developed for marine FiSTs can be utilized with any VR based training tool designed to train Fire Support Teams; it is not tied to a specific simulation or VR technology. We can also provide direct integration of VESARS with a specific simulation system by linking the SA probe delivery tool with the simulation (Kaber, Riley, Sheik-Nainar, Hyatt, & Reynolds, 2006). When integrated with a simulation tool, VESARS automatically presents appropriate SA probes to the trainer based upon trainee locations and simulated events.
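The scoring step that the delivery tool performs (comparing a recorded response against recorded ground truth, tagged by level and by the roles a probe applies to) can be sketched as follows. The record layout, the exact-match comparison, and the shared SA helper are illustrative assumptions, not the actual VESARS schema.

    // Sketch of probe scoring against administrator-recorded ground truth.
    // Field names and the exact-match rule are assumptions for illustration.
    interface SAProbe {
      text: string;
      level: 1 | 2 | 3;        // perception, comprehension, projection
      category: string;        // e.g., "enemy", "terrain", "risk"
      relevantRoles: string[]; // team positions the probe applies to
    }

    interface ProbeResult {
      probe: SAProbe;
      role: string;
      response: string;
      groundTruth: string;
      correct: boolean;
    }

    function scoreProbe(probe: SAProbe, role: string,
                        response: string, groundTruth: string): ProbeResult {
      // A fielded tool would need tolerant matching; exact match keeps this small.
      const correct = response.trim().toLowerCase() === groundTruth.trim().toLowerCase();
      return { probe, role, response, groundTruth, correct };
    }

    // Shared SA check: did two team members give the same answer to the same
    // probe, and did that shared picture match the ground truth?
    function sharedSA(a: ProbeResult, b: ProbeResult): string {
      if (a.response !== b.response) return "diverged";
      return a.correct ? "shared and correct" : "shared but incorrect";
    }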
THE SITUATION AWARENESS BEHAVIORAL MEASURE

The SA behavioral measure implemented in VESARS is used by an external observer to rate trainees with respect to physical behaviors consistent with good situation assessment in a given domain. This includes both team level behaviors and role-specific behaviors. At the team level, we identify the situation assessment behaviors that should be exhibited by members for acquiring and maintaining SA during an operation. These behaviors are often high level actions that should be observable across team roles for the majority of training scenarios to be experienced (see Figure 16.3).

Figure 16.3. The VESARS team level SA behavior rating scale for marine FiSTs is presented (with magnified inset).

This measurement design allows a single observer to rate team level behaviors for multiple members at the same time. Evaluators can determine which team members excel at certain situation assessment behaviors and also identify those who may be struggling to acquire SA during training. The behavioral rating scale also supports ratings for single individual trainees. In cases in which multiple raters are available, or training assessment is concerned with a particular role or individual, a rater can score a trainee on behaviors that are related to his or her role on the team. Rather than higher level team behaviors, in this case VESARS presents aspects of SA behavior important to acquiring and maintaining SA for a specific job. When multiple team members are rated individually, trainers can see which team members excel in their particular roles and which team members may hinder team SA because of individual SA errors.

In either team or individual SA assessment, VESARS presents to the evaluator the critical SA behaviors to observe. This particularly supports the less experienced evaluator, pointing out what to look for in evaluating situation assessment. Each behavior can be rated on a Likert scale including ratings from 1 ("poor") to 7 ("good") or "not observed." The not observed tag results in a rating of 0 and tracks situations in which an important SA behavior was expected, but not observed. A particular utility of the VESARS behavioral rating system lies in being able to assess how the quality of a trainee's situation assessment behaviors changes over time, as the ratings can be made throughout the training session and are recorded along the timeline. This avoids the post-trial memory recall problem associated with making an after action assessment of all behaviors and the overgeneralization that is common to such ratings. The VESARS behavioral ratings
for each team member and each item on the rating tools are also averaged to provide a behavioral score for the role and for a particular situation assessment skill. The results are recorded at the end of a training trial into a data file for subsequent review.

THE SITUATION AWARENESS MEASUREMENT OF TEAM COMMUNICATIONS

To assess team communication with VESARS, training evaluators listen to the natural communications of the team members (either online or after action) and rate the quality of verbalizations for achieving SA. Various team communications can be categorized as relevant or irrelevant to expected SA communication items. Each item on the rating scale has been identified (with SMEs) as an important team communication for acquisition and dissemination of goal-related information. As trainee statements are rated, the tool tracks the frequency of "good" and "bad" SA communication for each item and each team member. At the end of a training trial, the rater has the opportunity to review the summary data for SA communications in order to provide an overall team communications rating. The overall rating ranges from 1—"hardly any skill in SA" to 5—"complete skill in SA." The SA communication data are stored for later review. VESARS provides descriptive phrases that serve as guidelines to the rater in order to assess the frequency and quality of SA-related team communication and rate them consistently from "1" to "5" (see Figure 16.4).

Figure 16.4. VESARS communications rating tool for marine FiSTs is presented (with magnified inset).

The team communications measure of SA primarily focuses on explicit communications between team members. There is a limited ability, however, to assess aspects of implicit communications using the behavioral rating tool (for example, observing team member actions or visual patterns). Even so, the behavioral tool mainly addresses nonverbal cues that are perceived as specialized (and explicit) codes (for example, signals, gestures, and postures), as is expected in the military environment.

Each of the tools that make up VESARS can be presented on a desktop, laptop, or tablet PC (personal computer). The tablet PC implementation allows for movement in immersive virtual training setups and for raters to position themselves in close proximity to trainees. Figure 16.5 shows VESARS running on a tablet PC. Although it was developed for VR, VESARS can be easily used with desktop simulations or in live training environments.

Figure 16.5. VESARS is implemented with a tablet PC for easy use in training simulations.
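The behavioral and communications roll-ups described above reduce, in essence, to simple aggregations. The sketch below shows one plausible form (the record layouts and function names are assumptions): behavioral ratings averaged per role with "not observed" entered as 0, and communications summarized as the percentage of statements tallied as "good."

    // Sketch of the two roll-ups: average behavior rating per role, and the
    // percentage of "good" communications. Record layouts are illustrative.
    interface BehaviorRating {
      role: string;
      item: string;    // e.g., "requests situation reports" (invented item)
      rating: number;  // 0 = not observed, then 1 (poor) through 7 (good)
      time: number;    // seconds into the trial when the rating was made
    }

    function averageForRole(ratings: BehaviorRating[], role: string): number {
      const forRole = ratings.filter(r => r.role === role);
      if (forRole.length === 0) return 0;
      return forRole.reduce((sum, r) => sum + r.rating, 0) / forRole.length;
    }

    interface CommTally { good: number; bad: number }

    function percentGood(tally: CommTally): number {
      const total = tally.good + tally.bad;
      return total === 0 ? 0 : Math.round((100 * tally.good) / total);
    }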
THE SITUATION AWARENESS REVIEW AND FEEDBACK

The final component of VESARS provides for review and feedback on SA in the AAR. VESARS integrates and displays the data collected from the three measurement tools. The focus is on providing meaningful feedback to trainees on their SA, in terms of mission (SA) knowledge, situation assessment behaviors, and team SA communications. VESARS presents both summary data and detailed data from each measure through a series of interface tabs (see Figure 16.6).
Figure 16.6. VESARS provides overview results for three measures.
VESARS provides key statistics on trainee SA toward improving the quality of AAR content. There are few other tools to support the AAR leader in presenting feedback on results to trainees following an exercise. Thus, the "take away" value for the trainees may hinge upon the ability of the AAR leader to identify critical behaviors and decision points during a scenario and relate trainees' actual behaviors to needed skills and required competencies. This work can be particularly challenging in that AAR leaders may not be trained in SA theory, yet they may need to relate observed levels of SA to performance of combat skills. The VESARS provides this support for the AAR leader by helping him or her identify where losses in SA occur and what improvements are needed.

The "Overview" tab displays a summary of all SA data collected (see Figure 16.6). The SA probe scores are expressed as a percentage of correct responses to the SA probes. The behavioral scores are presented as the average situation assessment rating (from 0 ["not observed"] to 7 ["very good"]) for each team role. The summary of the communications data is presented as the percentage of "good" communications that occurred across the team. VESARS indicates, at a high level, which member of a team experienced the greatest challenge in acquiring SA—a "red" circle around a score denotes the lowest score for each measure. VESARS provides access to detailed data on a particular measure for further analyses and training feedback by the AAR leader.

Beyond the overall summary of SA data, results from each SA measure can be reviewed in detail for each team member. The SA probes tab provides detailed results for each team member, including an overall score and accuracies in responding to probes targeting each level of SA (see Figure 16.7).

Figure 16.7. SA probe data can be reviewed in detail.

The AAR leader can access detailed probe performance data on each member of the team and make comparisons with others. Data on individual probes, showing a table of all probes presented to the team, their responses, and the ground truth, can be accessed by each hyperlinked team member label. In addition, VESARS can present shared SA data. The interface presents a count of probes on common knowledge, which were answered correctly by teammates. The trainer can see which team members were queried, which of the team members had the same SA, and whether the team understanding of the situation matched actual events. See Figure 16.8 for an example of shared SA results from the marine FiST domain.

Figure 16.8. Shared SA results are provided with VESARS.

Data on SA behaviors indicate how well the team or individual members performed, on average, in situation assessment. Results can be reviewed for specific behavioral items for a given team member role. Figure 16.9 presents a team view of situation assessment behaviors for the marine FiST.

Figure 16.9. VESARS behavioral data are presented in graphical form.

These data can be related to the SA probe results, providing information on which behaviors were performed and how this translated into inaccurate responses to SA probes. In addition, behavioral data presented over time can be linked to specific scenario events, providing insight into why SA may have been challenged at a particular point in time. Similar data can be viewed on team communications. The trainer can view the percentage of good communications for the team or for individual team
Figure 16.7. SA probe data can be reviewed in detail.
The trainer can view the percentage of good communications for the team or for individual team members. The overall SA rating for team communications is also presented (see Figure 16.10). Results on communications can be graphically presented to trainees so that they quickly see which communications were performed well and which communication behaviors need to be modified to support team SA. These results can be related to SA probe results; results indicating infrequent and inaccurate SA communications on certain items will also likely be associated with inaccurate responses to related SA probes.

There may be instances in which the SA results from various measures conflict. For example, individuals may receive low overall SA scores based on their SA communications, but provide accurate responses to SA probes. In this example, the results would indicate team members who are skilled in acquiring individual SA, but lack the communication skills for supporting shared and team SA.
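A trainer-side check for the conflict pattern just described might look like the following sketch, which reuses the score dictionary produced by the earlier example; the role labels and thresholds are illustrative assumptions rather than VESARS values.

```python
def sa_dissociations(scores, probe_floor=75.0, comms_ceiling=50.0):
    """Flag members with accurate probe responses but weak SA communications:
    good individual SA, insufficient support for shared and team SA.
    `scores` maps member -> {"probe_pct": ..., "comms_pct": ...}."""
    return sorted(name for name, s in scores.items()
                  if s["probe_pct"] >= probe_floor and s["comms_pct"] <= comms_ceiling)

scores = {"FAC": {"probe_pct": 85.0, "comms_pct": 40.0},   # invented numbers
          "FSO": {"probe_pct": 60.0, "comms_pct": 70.0}}
print(sa_dissociations(scores))  # ['FAC']
```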
Figure 16.8. Shared SA results are provided with VESARS.
Figure 16.9. VESARS behavioral data are presented in graphical form.
SUMMARY

As feedback is critical to learning, VESARS is expected to significantly enhance the benefit of simulation based training by assessing the SA of trainees and providing reviews of SA performance to enhance learning of critical behaviors and skills. The system couples validated objective and subjective measurement tools for comprehensive evaluation of SA. Much of the training value of VESARS lies in the method of targeting critical SA information needs for the domain being trained. The underlying SA analysis (for example, goal-directed task analysis, assessment behaviors, and SA communications) serves to provide the instructional content that supports developing core competencies in a domain. This helps to enhance the training utility of sophisticated simulation tools. The SA analysis methods also provide insight into scenario events, activities, or missions that can be included in scenarios to tax SA, train specific situation assessment skills and communications, and ultimately enhance cognition and decision making in operations. The flexible and robust approach to SA measurement and analysis also makes it easy to provide structured feedback on the level of SA achieved during training situations.

Beyond supporting structured feedback, VESARS could be used for assessing displays or training protocols. The SA scores of operators or trainees experiencing multiple display designs or training under alternate regimes can be compared to determine which design options will better support SA during operations or provide for more robust SA skill development (a simple form of such a comparison is sketched below). Multifaceted results would also indicate which aspects of SA are improved.

Although current versions of VESARS are intended for SA assessment in military domains, the system is easily adaptable to new domains after an analysis of SA requirements is conducted. A further advantage is that the individual SA measures can be administered together or as separate components, depending upon the SA skills the trainer is interested in evaluating or further developing in trainees. Because VESARS can be loaded on separate machines (laptop, tablet, or desktop), it can also be used to assess the SA of distributed training participants. Raters at different locations can view trainees in virtual environments.
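As referenced above, comparing SA scores across two display designs can be reduced to a simple two-sample comparison. This is a minimal sketch with invented scores; a real evaluation would also report degrees of freedom, a significance criterion, and sample sizes.

```python
from statistics import mean, stdev

def welch_t(a, b):
    """Welch's t statistic comparing mean SA scores of two independent groups."""
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / (va + vb) ** 0.5

display_a = [82, 75, 90, 68, 77]   # invented probe accuracies under design A
display_b = [64, 70, 59, 72, 61]   # invented probe accuracies under design B
print(f"A mean {mean(display_a):.1f}, B mean {mean(display_b):.1f}, "
      f"t = {welch_t(display_a, display_b):.2f}")
```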
Figure 16.10. VESARS provides an overview of team communication results.
Each tool can collect and store SA results, which can be uploaded to a central system for distributed after action review and discussion.

Future research should be aimed at additional validation of VESARS. In previous research with VESARS for army infantry, the authors have demonstrated some sensitivity of the tool to scenario manipulations and individual differences among trainees (Kaber et al., 2005). This work was conducted with a very small participant sample, however, and, for certain measures, there were too few data collection points to completely validate the system. More robust validation requires access to a greater number of participant teams for longer periods of time and to training systems and situations that facilitate collection of other validated measures of SA (for example, SAGAT) along with VESARS measures, so that correlations between SA results can be assessed. The key is having the necessary control over training events in order to collect repeated measures for all responses across multiple scenarios and conditions to support the statistical analyses needed for robust validation.

ACKNOWLEDGMENTS

We thank subject matter experts at 29 Palms, California, and Camp Lejeune, North Carolina, for their contribution to task analysis for the marine FiST. We thank Mr. Dan Macchiarella and Lt. Col. John Hyatt for their contributions as SMEs and in reviewing SA measures and results. Justus Reynolds, Fleet Davis, Arathi Sethumadhavan, and Matthew DeKrey are acknowledged for their hard work in task analysis and programming. This effort was supported by a Small Business Innovative Research contract from the Office of the Secretary of Defense through the Army Research Institute and was further supported by an Office of Naval Research (ONR) Virtual Technologies and Environments Program grant. We are indebted to the project technical monitors, Don Lampton and Bruce Knerr of ARI and CDR Dylan Schmorrow of ONR. The views and conclusions presented in this chapter are those of the authors and do not necessarily reflect the views of the U.S. Army or the U.S. Navy.

REFERENCES

Artman, H., & Garbis, C. (1998). Situation awareness as distributed cognition. In T. Green, L. Bannon, C. Warren, & J. Buckley (Eds.), Cognition and cooperation: Proceedings of the 9th Conference of Cognitive Ergonomics (pp. 151–156). Limerick, Ireland.
Brannick, M. T., Prince, A., Prince, C., & Salas, E. (1993). The measurement of team process. Human Factors, 37(3), 641–651.
Department of the Army. (1993). A leader’s guide to after action review (TC 25-20).
Endsley, M. R. (1988). Design and evaluation for situation awareness enhancement. Proceedings of the Human Factors Society 32nd Annual Meeting (Vol. 1, pp. 97–101). Santa Monica, CA: Human Factors Society.
Endsley, M. R. (1993). A survey of situation awareness requirements in air-to-air combat fighters. International Journal of Aviation Psychology, 3(2), 157–168.
Endsley, M. R. (1995a). Measurement of situation awareness in dynamic systems. Human Factors, 37(1), 65–84.
Endsley, M. R. (1995b). Toward a theory of situation awareness in dynamic systems. Human Factors, 37(1), 32–64.
Endsley, M. R. (2000). Direct measurement of situation awareness: Validity and use of SAGAT. In M. Endsley & D. Garland (Eds.), Situation awareness analysis and measurement (pp. 147–173). Mahwah, NJ: LEA.
Endsley, M. R., Bolte, B., & Jones, D. G. (2003). Designing for situation awareness: An approach to human-centered design. London: Taylor & Francis.
Endsley, M. R., & Garland, D. J. (2000). Situation awareness analysis and measurement. Mahwah, NJ: Lawrence Erlbaum.
Endsley, M. R., Holder, L. D., Leibrecht, B. C., Garland, D. J., Matthews, M. D., & Graham, S. E. (2000). Modeling and measuring situation awareness in the infantry operational environment (Research Rep. No. 1753). Alexandria, VA: U.S. Army Research Institute for Behavioral and Social Sciences.
Endsley, M. R., & Robertson, M. M. (2000). Training for situation awareness. In M. R. Endsley & D. J. Garland (Eds.), Situation awareness analysis and measurement (pp. 349–365). Mahwah, NJ: Lawrence Erlbaum.
Jones, D. G., & Endsley, M. R. (2000). Can real-time probes provide a valid measure of situation awareness? In D. B. Kaber & M. R. Endsley (Eds.), Human performance, situation awareness and automation: User-centered design for the new millennium (pp. 245–250). Atlanta, GA: SA Technologies.
Kaber, D. B., Riley, J. M., Lampton, D., & Endsley, M. R. (2005). Measuring situation awareness in a virtual urban environment for dismounted infantry training. Proceedings of the 11th International Conference on Human-Computer Interaction: Vol. 9. Advances in virtual environments technology: Musings on design, evaluation, and applications. Las Vegas, NV: MIRA Digital Publishing.
Kaber, D. B., Riley, J. M., Sheik-Nainar, M. A., Hyatt, J. R., & Reynolds, J. P. (2006). Assessing infantry soldier situation awareness in virtual environment-based training of urban terrain operations. Proceedings of the International Ergonomics Association 16th World Congress. Maastricht, The Netherlands: Elsevier.
Kaber, D. B., Wright, M. C., & Sheik-Nainar, M. A. (2006). Investigation of multi-modal interface features for adaptive automation of a human-robot system. International Journal of Human-Computer Studies, 64(6), 527–540.
Knerr, B. W., Lampton, D. R., Crowell, H. P., Thomas, M. A., Comer, B. D., Grosse, J., et al. (2002). Virtual environments for dismounted soldier simulation, training and mission rehearsal: Results of the FY 2001 culminating event (Tech. Rep. No. 1129). Alexandria, VA: U.S. Army Research Institute for Behavioral and Social Sciences.
Knerr, B. W., Lampton, D. R., Martin, G. A., Washburn, D. A., & Cope, D. (2002). Developing an after action review system for virtual dismounted infantry simulations. Proceedings of the 2002 Interservice/Industry Training Simulation and Education Conference. Arlington, VA: National Training Systems Association.
Lampton, D. R., Bliss, J. P., & Morris, C. S. (2002). Human performance measurement in virtual environments. In K. M. Stanney (Ed.), The handbook of virtual environments: Design, implementation and applications (pp. 701–720). Mahwah, NJ: Lawrence Erlbaum.
Liu, A., Tendick, F., Cleary, K., & Kaufmann, C. (2003). A survey of surgical simulation: Applications, technology, and education. Presence: Teleoperators and Virtual Environments, 12(6), 599–614.
Ponder, M., Herbelin, B., Molet, T., Schertenleib, S., Ulicny, B., Papagiannakis, G., et al. (2003, May). Immersive VR decision training: Telling interactive stories featuring advanced virtual human simulation technologies. Proceedings of the 9th Eurographics Workshop on Virtual Environments. Zurich, Switzerland: Eurographics Association.
Salas, E., & Cannon-Bowers, J. A. (1997). Methods, tools, and strategies for team training. In M. A. Quinones & A. Ehrenstein (Eds.), Training for a rapidly changing workplace: Applications of psychological research (pp. 249–279). Washington, DC: American Psychological Association.
Strater, L. D., Endsley, M. R., Pleban, R. J., & Matthews, M. D. (2001). Measures of platoon leader situation awareness in virtual decision-making exercises (Research Rep. No. 1770). Alexandria, VA: U.S. Army Research Institute for Behavioral and Social Sciences.
Wickens, C. D. (1992). Engineering psychology and human performance (2nd ed.). New York: HarperCollins.
Wright, M. C., & Kaber, D. B. (2003). Team coordination and strategies under automation. Proceedings of the 47th Annual Meeting of the Human Factors and Ergonomics Society (pp. 553–557). Santa Monica, CA: Human Factors and Ergonomics Society.
Chapter 17
ASSESSING COGNITIVE WORKLOAD IN VIRTUAL ENVIRONMENTS
Brad Cain and Joe Armstrong

This chapter discusses cognitive workload measurement within the context of virtual environments, describing some factors that affect workload and some reasons for measuring workload in virtual environment applications. The chapter briefly discusses workload assessment techniques, including subjective ratings and performance based measures, noting a few in-depth workload measurement literature reviews.

INTRODUCTION

Complex work environments are typically associated with operators performing multiple tasks either simultaneously or in close succession (Meyer & Kieras, 1997). Although execution of the individual tasks can often be described as a collection of procedural solutions, the interactions of concurrent tasks and the demands imposed on the operator create a chaotic system in which task outcomes are very sensitive to the timing of events, the history and anticipated events, and the nature of the tasks involved. The need to understand and facilitate task flows within these complex environments to improve system performance has resulted in empirical investigations of the factors related to the cognitive aspects of task performance, with less emphasis placed on physical workload demands. The complexity of these environments is further expanded by the inherent nonlinearity of the many interactions between the operator and a given system, as well as between multiple operators.

Research on cognitive workload focuses primarily on determining the nature of human information processing capacity (Reid & Nygren, 1988), including such issues as the number of cognitive resources that exist and the interactions that occur between them during task execution. The concept of cognitive workload is often considered an interim measure that is intended to provide insight into where increased task demands may lead to unacceptable system performance. Implicit in the measurement of cognitive workload, hereafter simply referred to as workload, is the belief that as task difficulty or task demands increase, there
is less residual capacity to deal with additional tasks: system performance usually decreases; response times and errors increase; control variability increases; fewer tasks are completed per unit time; and task performance strategies change (Huey & Wickens, 1993). In most cases, workload assessment focuses on high tempo periods, where errors of commission and omission are more prevalent. Low tempo intervals are also important, however, as a lack of operator engagement can reduce vigilance in situations where workload may be considered negligible despite the high level of effort required to maintain attention while monitoring for infrequent events. This has implications for designing operating procedures of virtual environments (VEs) that keep the operator suitably aroused and engaged with the task.

VEs were developed to provide a controlled experience for human operators performing tasks similar to those performed in the real world. These artificial environments are abstractions of the real world, with numerous compromises on fidelity of representation imposed by both technological limitations and cost. The usefulness of these devices lies in a cost-benefit trade-off: implementing the essential features of the real world relevant to the application area in sufficient detail, subject to the technological and cost constraints. VEs are used principally for engineering research and development and for training, although mission rehearsal and tactics development are also likely candidates. Each application area has different objectives and requirements, but as all include a human in the loop to drive the performance of a simulated system, there is a need for a suitable representation of the environment and for methods to assess the success of these representations.

In the context of assessing such human factors as workload during the systems design process, VEs are being used to provide input to all aspects of system design: requirements analysis, design, development, evaluation, and verification. At each phase of design, representations of a system can be assessed within a VE, often at varying levels of fidelity depending upon the design phase, to ensure the system complexity does not exceed the operator’s capacity to perform within acceptable workload limits. Testing of design concepts and procedures in VEs can identify excessive demands without formal workload measurement, but the assessment of intermediate states and estimates of residual capacity for operators to deal with emergencies or unanticipated events makes workload assessment a useful diagnostic tool. It is unclear, however, whether workload as measured in VEs necessarily translates into an equivalent demand as measured in the real world under similar circumstances.

For training, the objective is to provide a positive transfer of training that improves the operator’s ability to perform in the real world with the real system. In this case, workload may be useful to assess the simulator’s ability to convey adequately the necessary cues and information required while performing the required tasks, that is, as a metric of the degree of similarity of the essential features for training between the VE and the real world. Further, workload provides additional evidence for assessing competency that may not be adequately captured in pass/fail performance criteria.
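The degradation markers listed at the start of this introduction (slower responses, more errors, greater control variability, fewer tasks completed per unit time) are the raw material for performance based workload assessment in a VE. A minimal sketch of how they might be summarized from a logged run; the record fields are illustrative assumptions, not any particular simulator’s log format:

```python
from statistics import mean, pstdev

def performance_indicators(trials, duration_s):
    """Summarize workload-sensitive performance markers from one VE run.
    Each trial is a dict with 'rt' (response time, s), 'error' (bool), and
    'control' (e.g., stick deflection); field names are illustrative."""
    return {
        "mean_rt_s": mean(t["rt"] for t in trials),
        "error_rate": sum(t["error"] for t in trials) / len(trials),
        "control_variability": pstdev(t["control"] for t in trials),
        "tasks_per_min": 60.0 * len(trials) / duration_s,
    }

trials = [{"rt": 1.2, "error": False, "control": 0.4},
          {"rt": 2.1, "error": True, "control": 0.9},
          {"rt": 1.6, "error": False, "control": 0.2}]
print(performance_indicators(trials, duration_s=90))
```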
Seldom is it required or even desired to reproduce the real world exactly, as noted by Wickens (1992, p. 240). Workload measurement provides a quantitative measure of the subjective experience of performing some task in the VE that complements objective performance metrics, giving a more complete picture of the validity of the VE as well as of the operator’s capability to perform selected tasks.

The range of possible VEs varies from desktop abstractions used for psychophysics experiments to fully immersive environments that support the conduct of full military exercises (Durlach & Mavor, 1995; Youngblut, Johnston, Nash, Wienclaw, & Will, 1996), provided the VE evokes responses similar to experiences in the real world (Magnusson, 2002; Nählinder, 2002). The level of realism associated with a given VE interacts with different levels of human performance, and the impact on workload is not always clear. It is important to assess how the differences between the VE and the real world affect cognitive processes through both subjective and objective performance measures of workload.

FACTORS AFFECTING WORKLOAD

There are numerous factors that can affect perceived workload, but they can be broadly classified into two categories: task factors and operator factors. Differences between the real world and its VE representation have the potential to produce differences in the flow of information to the operator. While these differences may affect performance, it is not clear how they might affect workload, and they may result in changes to the operator’s approach to using the two systems. VE characteristics such as lower fidelity visual representations, the lack of stressors and motivators (for example, fatigue and survival), or the lack of proprioceptive cues may all serve to change the task demands for that system and thereby alter the operator’s perceived workload. Workload and performance metrics must be selected to be sensitive to known differences in order to confidently predict real world expectations from VE results.

Perceptual Differences

Significant research and development have occurred within the realm of modeling and simulation to improve the supporting technologies that underlie VEs. While many VE shortfalls have been identified over the past 10 years (Durlach & Mavor, 1995; Youngblut et al., 1996), some remain today, although technological advances have reduced the severity of many of these problems (Renkewitz & Alexander, 2007).

Typically, we acquire most of our information about dynamic, real world events visually. Synthetic environments have only recently been able to employ technologies that closely approximate the real world experience in both the richness of detail and the resolution of that representation. The degree to which a VE must approximate reality is highly dependent on the nature of the task, and differing VE applications will often have different visual display requirements. For example, the presentation of stereoscopic information and motion parallax to support
the judgment of depth and distance may require a visual scene with only moderate resolution, while other tasks that involve fine visual acuity may require very high resolution but only biocular imagery. Other tasks, such as target identification, may require principally foveal displays, while still others, such as target detection, may require a better presentation of peripheral information. Shortfalls in the appropriate levels of fidelity in key aspects of a VE may also affect performance, such as inadequate visual cueing that prevents helicopter hover maneuvers in closely coupled systems (for example, deck landings or ground vehicle control in complex terrain), but the effect on cognitive workload is unclear. When there are conflicting cues in the environment as a consequence of technological limitations of the VE, the operator has to reconcile the relevant cues with the false cues. While degraded visual cues may lead to poorer performance in the VE than in the real world, it is unclear whether users will perceive that they have to work any harder to achieve that level of performance.

Audition and communication have typically received less attention in the VE literature than has vision. Interactions with virtual teammates, opponents, and bystanders should be natural. Computer speech production has improved to the point where it is intelligible and reasonably natural, but problems persist when dealing with virtual operators that require speech recognition and interpretation through natural language processing (Sanchez-Vives & Slater, 2005). While simple, constrained dialogue is feasible, broad lexicons, ambiguous meanings, and real time simulation pose interpretation challenges for virtual agents. Implausible behaviors and decisions can occur as virtual agents violate underlying assumptions embedded in their rule bases. This in turn can lead to changes in performance and workload for the operator when interpreting entity behavior, or it can increase the effort required to compensate for the behavior of virtual teammates. Unnatural constraints on how information has to be conveyed (restrictive vocabularies or manual text input) and loss of audible information (adversary detection or failure noises) mean that information has to be obtained by some less convenient means (for example, text messages or keyboard entry), leading to higher than expected workload. Further, studies of three-dimensional auditory cueing as an aid to visual search found that even small errors in the spatial location of the auditory presentation delayed visual detections, indicating that shortfalls in one information processing channel may result in greater workload in another channel if they present conflicting information.

Noise and vibration levels in VEs are generally lower than in the real world, resulting in less harmful exposure, although some simulators provide reduced levels for context. Even with some degree of realism provided, operators experiencing these reduced sensations may suffer less fatigue or enjoy clearer communications, lessening perceived workload and improving performance relative to real world conditions.

Proprioceptive feedback and sensations of acceleration are typically limited in VEs due to the lack of (or uncorrelated) visual representations of the body, limits in motion base displacements, and so forth. This leads to disparities among cues about body position, grasping, control manipulation, and movement within the
VE that the operator learns to compensate for or to ignore. Simulators often introduce false acceleration cues because of physical constraints, and these can distract the user under certain conditions (for example, recentering of jacks during rapid helicopter control inputs can feel like the tail wheel striking the tail fan during a deck landing on a ship). Tactile feedback from simulator controls may lead to overcontrolling or to a lack of adequate control, such that more effort is required to achieve the same level of performance; it may also lead to a loss of user acceptance, lowering both performance and perceived workload.

Cognitive Differences

Typically the VE is less complex than its real world equivalent: there are fewer sensory details to interpret, fewer distractions or alternatives to consider, and generally fewer methods to achieve one’s goals. Scenarios are contrived, for good reasons, but this may lead to unrealistic conditions that the subject perceives as unlikely, causing lower trust in or acceptance of the VE. There are few consequences of failure in a VE, although competency testing and the desire to perform adequately can be stressful. Knowledge that a VE is only a simulation may influence operator performance.

Some maintain that it is the sense of presence, the perception of being in the synthetic environment, that differentiates VEs from other synthetic environments or simulators. The journal Presence is largely devoted to this topic and to the technological challenges facing the creation of presence, but even modest fidelity simulations can induce engagement, if not a sense of presence (Bowman & McMahan, 2007; Dekker & Delleman, 2007). Subjective workload provides a means of assessing operator acceptance to determine whether sufficient engagement has been achieved; coupled with objective measures of performance and subject matter expert opinion, it provides a method to assess acceptance of the VE.

Behavior

Presenting a visual representation of teammates provides supporting cues to task performance, such as the ability to observe a teammate’s control inputs or nonverbal signals. Tactile and haptic feedback from controls provides considerable information to the operator about the state of the local environment, particularly to subject matter experts. Conflicting feedback may cause ambiguity, leading to confusion and extra reasoning. Without accurate representation of these behaviors and physical cues, the operator has to exert additional effort to collect relevant information by other means.

Operator State

Operators bring individual knowledge, skills, and abilities to the VE, sometimes collectively called traits, that are relatively constant through an experience. Shortcomings in the VE ultimately affect the operator directly, often substantially
moderating operator state during a VE experience. Simulator sickness can result from a mismatch between visual and vestibular cues (Johnson, 2007; Kennedy, Lane, Berbaum, & Lilienthal, 1993) (see Drexler, Kennedy, and Malone, Volume 2, Section 1, Chapter 11), degrading the operator’s ability to perform and leading to poorer performance or increased effort. Other unintended effects, such as the lack of representation of the operator’s own body, can lead to disorientation and confusion.

Operator discomfort is a key factor that is seldom addressed in VE technology reviews. We know from military operations that adjusting close fitting helmets is an involved and critical process, required to ensure that the helmet does not produce undue discomfort that will affect performance. Also, neck strain can occur when using night vision goggles, even with good designs under normal conditions. Unfortunately, VE helmet-mounted displays intended for general use seldom provide adequate adjustment to allow for individual differences, so discomfort and strain arise over time, distracting the operator from the primary task and affecting the perception of workload.

WORKLOAD MEASUREMENT METHODS

There are a number of reviews of workload in the open literature that discuss theoretical and applied issues that the reader may consult (for example, Cain, 2007; Castor, 2003; de Waard, 1996; Eggemeier, Wilson, Kramer, & Damos, 1991; Farmer & Brownson, 2003; Gopher & Braune, 1984; Gopher & Donchin, 1986). The following sections provide a brief look at some of the techniques that can be usefully applied in VEs. Physiological measures are not included here because of their specialized instrumentation requirements, but readers interested in these techniques may want to consult the relevant literature1 (for example, Kramer, 1991; Kramer, Trejo, & Humphrey, 1996; Wilson et al., 2004; Wilson, 1998, 2001; Wilson & Eggemeier, 1991).

1. See also the Association of Applied Psychophysiology and Biofeedback Web page: http://www.aapb.org, April 2008.

Subjective Ratings

Subjective workload measurements are the most commonly used techniques for gathering feedback from operators about the perceived demands of a task (Rubio, Diaz, Martin, & Puente, 2004; Tsang & Velazquez, 1996; Wickens, 1989; Xie & Salvendy, 2000). Subjective measures are relatively easy to administer and often have high face validity due to the active involvement of the operators in the evaluation process. They may also have good construct validity if there is a coherent theoretical framework that fits the operator’s own perception of internal state while performing a task (Annett, 2002), but subjective assessments may not correspond to observed task performance. This is an important distinction, as the perception of a situation, particularly a situation that induces
stress or requires substantial experience to understand subtle differences, can have a dramatic impact on the operator’s perception of his or her ability to complete the assigned tasks (Annett, 2002; Hancock & Vasmatzidis, 1998).

There are several popular subjective workload metrics that attempt to be diagnostic of the task. The VACP (visual, auditory, cognitive, and psychomotor) method (Aldrich & McCracken, 1984; Aldrich, Szabo, & Bierbaum, 1989; McCracken & Aldrich, 1984) was perhaps the first multidimensional technique, and despite objections to inappropriate aggregation of ordinal data, VACP has been used extensively to reflect expert judgment in constructive simulations during system design. The subjective workload assessment technique (SWAT; Reid & Nygren, 1988; Reid, Shingledecker, & Eggemeier, 1981; Reid, Shingledecker, Nygren, & Eggemeier, 1981) is used to rate subjective experiences on three dimensions: time load, mental effort, and psychological stress. Conjoint scaling of the ordinal ratings creates interval ratings, a step not addressed by most subjective techniques. Hart and Staveland (1988) developed the National Aeronautics and Space Administration (NASA) Task Load Index (TLX) with six verbally anchored dimensions to represent individual differences in perceived contributions to workload. Castor (2003, p. 36) notes that five of the dimensions seem to cluster on a single factor that could be considered “workload,” while the “own performance” dimension reflects a different aspect of task performance that is related to, but separate from, the workload assessment. The DRAWS (Defence Research Agency Workload Scale) measurement technique (Farmer, Belyavin, et al., 1995; Farmer, Jordan, et al., 1995) asks subjects to rate task demands on three information processing dimensions: input, central, and output (a fourth dimension, time pressure, was later added). The POP (Prediction of Operator Performance) model was developed as a predictive form of DRAWS; integrating subjective DRAWS ratings of individual task demands predicts the interference among concurrent tasks in constructive simulations. POP has been validated against human data from laboratory tasks, indicating that subjective techniques can have predictive validity in addition to construct and content validity if properly implemented.

Numerous comparison studies of these techniques have been published (Byers, Bittner, Hill, Zaklad, & Christ, 1988; Colle & Reid, 1998; Hart & Staveland, 1988; Hill et al., 1992; Nygren, 1991; Whitaker, Hohne, & Birkmire-Peters, 1997). Hart and Wickens (1990, p. 267) reported that of several subjective measures, NASA TLX correlated best with performance measures while displaying low intersubject variability and good user acceptance. While the use of multiple dimensions in some techniques may provide insight into task overload, it is still beyond their capability to explain or diagnose the theoretical reason for the cognitive overload (Annett, 2002; Damos, 1991; Jordan et al., 1996; Rubio et al., 2004).
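As an illustration of how such instruments are scored, the following sketch computes both the raw (unweighted) and weighted forms of the NASA TLX. The dictionary keys are shorthand for the six published dimensions, the ratings are invented, and the scoring follows the standard published procedure (Hart & Staveland, 1988) rather than any particular software package.

```python
from itertools import combinations

DIMENSIONS = ("mental", "physical", "temporal", "performance", "effort", "frustration")

def raw_tlx(ratings):
    """Raw TLX: the unweighted mean of the six subscale ratings (0-100)."""
    return sum(ratings[d] for d in DIMENSIONS) / len(DIMENSIONS)

def weighted_tlx(ratings, pair_winners):
    """Weighted TLX: weights come from the 15 pairwise comparisons in which
    the rater picks the dimension that contributed more to workload; each
    dimension's weight (0-5) is the number of comparisons it won."""
    assert len(pair_winners) == len(list(combinations(DIMENSIONS, 2)))  # 15 pairs
    weights = {d: pair_winners.count(d) for d in DIMENSIONS}
    return sum(ratings[d] * weights[d] for d in DIMENSIONS) / 15.0

ratings = {"mental": 70, "physical": 20, "temporal": 60,
           "performance": 40, "effort": 65, "frustration": 35}
winners = (["mental"] * 5 + ["effort"] * 4 + ["temporal"] * 3 +
           ["performance"] * 2 + ["frustration"] * 1)  # "physical" won no pairs
print(raw_tlx(ratings), weighted_tlx(ratings, winners))
```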
Performance Based Measures

Since the primary reason for creating a VE is to create a convincing simulation of the real world, measurement of on-task performance is a natural approach to assessing system efficacy and operator demand. Performance measures of workload can be classified into two major types: primary and secondary task measures. In most VE investigations, the primary task will be of interest, as it is usually an approximation of an in-service task. Wickens (1992, p. 392) notes that for primary task measures to be adequate for workload assessment, a number of conditions should be met: manipulation of the task must be sufficient to cause changes in task performance from which remaining capacity can be inferred; the demand of secondary tasks should not be the limiting factor, particularly if they use other modalities; other effects (confounds such as learning or fatigue) should not creep into the study; and analysts must determine whether operator strategies affect performance and workload differentially, giving the appearance of dissociation between the two.

Secondary task measures are considered more diagnostic than primary task measures alone and are used to assess the remaining processing capacity while performing a primary task. The secondary task paradigm can be further classified into auxiliary task and loading task methodologies, but the intent of both is to increase operator load to the point where changes in effort or strategy are no longer able to compensate for changes in the demand manipulation of concurrent task execution without a corresponding change in performance. Damos (1991, p. 114) suggests a number of considerations for selecting secondary tasks, noting that practice on the individual tasks is not sufficient to ensure optimal dual task performance; the tasks must also be practiced together, an aspect related to the metacontroller’s role in performance and workload measurement (Jex, 1988). While primary task measures are often based on an operational context, the contextual relevance or importance of secondary tasks varies. This can lead to subjects disregarding the secondary task, so contextually relevant secondary tasks are preferred.
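A minimal sketch of the secondary task logic described above: performance on an embedded secondary task alone is compared with performance on the same task while the primary task runs concurrently, and the decrement is read as an index of the primary task’s demand. The scenario and numbers are invented for illustration.

```python
def secondary_task_decrement(baseline, concurrent):
    """Fractional drop in secondary task performance when the primary task is
    added; a larger drop implies less residual capacity (higher primary load)."""
    return (baseline - concurrent) / baseline

# Invented: 40 embedded radio checks answered in a baseline block,
# 28 answered while also flying the primary navigation task.
print(f"{secondary_task_decrement(40, 28):.0%}")  # prints 30%
```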
COMMENTS ON APPLICATION OF WORKLOAD ASSESSMENT

Wierwille and colleagues reported the results of a focused series of experiments stressing different aspects of mental demand on workload measurement methods (Casali & Wierwille, 1983, 1984; Wierwille & Connor, 1983; Wierwille, Rahimi, & Casali, 1985). The results indicated that various workload measurement techniques are differentially sensitive to the types of load manipulations. Less than half of the measurement techniques tested were appropriate in any given study, and fewer still had a monotonic relationship with the load manipulations. Wickens (1992), de Waard (1996), and Farmer and Brownson (2003), among others, provide critical reviews of workload measurement techniques, offering recommendations for human-in-the-loop simulation. O’Donnell and Eggemeier (1986) proposed several criteria to guide the selection of workload measurement techniques.

The dissociation between subjective and performance measures of workload is generally acknowledged, and there are numerous examples of it in the literature, although the causes are not well understood (de Waard, 1996; Farmer & Brownson, 2003; Farmer, Jordan, Belyavin, Birch, & Bunting, 1993; Vidulich &
Wickens, 1986; Yeh & Wickens, 1985). Dissociations can be observed as an insensitivity, or a reversal of expected results, by one of the measurement classes in response to task manipulations. Vidulich and Bortolussi (1988) observed that subjective measures tend to be sensitive to working memory demands and less so to response execution demands. Their hypothesis is that subjective workload is sensitive only to manipulations that are well represented consciously, so that varying demands in skill based tasks will not cause subjective ratings to change substantially. This suggests that subjective workload measures are well suited to assessing modern technologies that aid judgment or decision making, but are less suited for assessing physical or mechanical aspects of skill based tasks.

In summary, the literature covering the range of analytic techniques supporting the assessment of workload is extensive. Though there are many attributes associated with the concept of workload that require further research, a sufficient body of knowledge exists to support practical applications of workload assessment tools and techniques. The following section provides a brief discussion on the practical application of workload assessment within a VE framework.

WORKLOAD ASSESSMENT VALIDITY

Most if not all workload methods have been developed and validated in a simulated environment where cues are presented to operators in some representation or abstraction of a real world equivalent task. Thus, regardless of what workload measurement devices are actually measuring, they probably measure it best in a synthetic environment. This suggests that there should be no special consideration required for applying workload measurement techniques in a VE. Adams (1997) notes that while measuring performance and workload in the field may not suffer from problems of content validity, it does not validate the measurement method’s construct, and care must be taken to ensure a valid method is used.

Measurement of proficiency or transfer of training in a simulator calls into question the issue of content and construct validity. Simulators for training transfer and simulators for proficiency may be quite different, and a simulator that is good for transfer of training may not be good for proficiency testing and vice versa. Adams cites a 1979 study by Weitzman in which a score for instrument flying in a high fidelity flight simulator had a moderate correlation of 0.51 with the same measure in the real aircraft, raising doubts about the validity of both the measurement approach and the VE study (a simple form of such a simulator-versus-aircraft comparison is sketched after the list below). Studies such as this that incorporate workload metrics help to establish the technological requirements for valid VEs.

VEs typically provide a flexible framework from which to generate both subjective and objective measures of workload. This sets the stage, but does not guarantee that corresponding measurements can be made in operational settings to validate a VE. Some of the benefits that VEs afford include the following:

1. Flexibility of automated data collection capabilities integrated within the hardware and software infrastructure supporting the VE,

2. Ability to integrate a range of primary or secondary performance measures that would be impossible within operational equipment,
3. Ability to pause virtual scenarios at any time to obtain subjective measurements of workload on a momentary basis, and

4. Greater control over confounding nuisance variables that affect performance in an operational setting.
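As referenced above, the simulator-versus-real-world comparison can be reduced to a correlation over paired scores. A minimal sketch with invented scores; a real validation study would also report sample size, confidence intervals, and the conditions under which the scores were obtained.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between paired VE and real world scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

sim = [71, 64, 80, 58, 75, 69]   # invented instrument-flying scores, simulator
air = [68, 70, 77, 52, 66, 74]   # the same pilots, same measure, real aircraft
print(f"r = {pearson_r(sim, air):.2f}")
```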
Advances in simulator technologies are also creating environments with a greater sense of presence such that the difference between simulated and real world performance is rapidly decreasing, thereby making the VE valuable as a preliminary design assessment tool. Research with VEs is helping to identify the essential elements and the degree of fidelity that is needed to provide a valid simulation. Workload assessment techniques may assist in determining the appropriateness of a VE for design or training. Yet, there are still a number of considerations that must be addressed when using VEs as part of an assessment tool for human performance measurement, especially in the context of metacognitive concepts such as workload. These considerations include the following:

1. How can the ecological validity of the laboratory conditions be established such that empirical results may be generalized to operational environments? There is currently no clear path to defining the characteristics of VEs required to ensure that measurements in a virtual world are extensible to the real world.

2. Will participants attribute sufficient realism to the VEs to treat the tasks within the environment in a realistic manner and experience workload comparable to that experienced in practice? For example, the very fact that a critical simulated event in a VE is not truly life threatening may change participant behaviors and perceptions, diminishing the level of stress and workload experienced.

3. VE studies may employ untrained or inexperienced participants to offset either the cost or difficulty of obtaining subject matter experts. Evidence suggests that expert performance may be substantially different from untrained participant performance, showing evidence of highly skilled automatized behavior that does not contribute to perceptions of subjective workload.
Validation of VEs is difficult, but without validation there is little confidence that the results obtained with the VE are meaningful. Comparison of workload measurements from similar VE and real world tasks provides a critical component of the VE validation process.

CONCLUDING THOUGHTS

When selecting a workload measure, or a battery of measures, the analyst should consider the objective of the assessment. In the context of systems evaluation, perhaps a univariate measure is sufficient. If a more diagnostic measurement is required, say to assess training proficiency, then multidimensional measures may be more appropriate, or they may be used in addition to a univariate measure. Primary and embedded secondary task measures relevant to the operational context are also recommended when evaluating workload in VEs, particularly during the course of a complex scenario where multiple measurement intervals may be required to form an understanding of the results.
Farmer and Brownson (2003) recommend that a battery of workload measures be selected for simulation based assessments and provide guidance on such a selection. This does not mean analysts should apply any or all measures from such a list, but that several should be considered for the insight they can provide. If a shotgun selection of methods is adopted, the analyst might well end up with a bewildering, contradictory set of results. A careful assessment of the task under study and its context is necessary to select an appropriate battery of workload measurement methods. This battery should include at least one objective measure and make use of quantitative subjective assessments (rather than simply subjective pass/fail ratings).

Regardless of the measures selected, the analyst should consider the human factors discussed in the various chapters of this handbook when drawing conclusions from workload analyses conducted in VEs, especially when comparing workload results between a VE and its real world counterpart. Considerable research remains to be conducted to better isolate the various factors in VEs that may contribute to differences in performance, changes in perception, and measurement of workload when compared to the real world. Improving our understanding of these factors will serve both to focus developers of VEs in selecting the most appropriate areas of technology to improve to better approximate the real world and to allow analysts using VEs as either training or testing environments to draw better, more accurate conclusions regarding human performance in these domains.

REFERENCES

Adams, S. R. (1997). In-flight measurement of workload and situational awareness. Paper presented at the TTCP UTP7 Human Factors in Aircraft Environments Workshop on the validation of measurements, models and theories, Naval Postgraduate School, Monterey, CA.
Aldrich, T. B., & McCracken, J. H. (1984). A computer analysis to predict crew workload during LHX scout-attack missions: Vol. 1 (Technical Rep. No. MDA903-81-C-0504, ASI479-054-I-84(B)). Fort Rucker, AL: U.S. Army Research Institute Field Unit.
Aldrich, T. B., Szabo, S. M., & Bierbaum, C. R. (1989). The development and application of models to predict operator workload during system design. In G. R. McMillan, D. Beevis, E. Salas, M. H. Strub, R. Sutton, & L. van Breda (Eds.), Applications of human performance models to system design (Vol. 2, pp. 65–80). New York: Plenum Press.
Annett, J. (2002). Subjective rating scales: Science or art? Ergonomics, 45(14), 966–987.
Bowman, D. A., & McMahan, R. P. (2007). Virtual reality: How much immersion is enough? [Electronic version]. Computer, 40, 257. Retrieved February 2008.
Byers, C. J., Bittner, A. C. J., Hill, S. G., Zaklad, A. L., & Christ, R. E. (1988). Workload assessment of a remotely piloted vehicle (RPV) system. Paper presented at the Human Factors Society’s 32nd Annual Meeting, Santa Monica, California.
Cain, B. (2007). A review of the mental workload measurement literature (No. RTO-TR-HFM-121-Part-II, AC/323(HFM-121)TP/62). Neuilly-sur-Seine Cedex, France: North
Atlantic Treaty Organization, Research and Technology Organization, Human Factors and Medicine Panel.
Casali, J. G., & Wierwille, W. W. (1983). A comparison of rating scale, secondary-task, physiological and primary-task workload estimation techniques in a simulated flight task emphasizing communications load. Human Factors, 25(6), 623–641.
Casali, J. G., & Wierwille, W. W. (1984). On the measurement of pilot perceptual workload: A comparison of assessment techniques addressing sensitivity and intrusion issues. Ergonomics, 27(10), 1033–1050.
Castor, M. C. (2003). Final report for GARTEUR Flight Mechanics Action Group FM AG13: GARTEUR handbook of mental workload measurement (No. GARTEUR TP 45). Stockholm, SE: Group for Aeronautical Research and Technology in Europe.
Colle, H. A., & Reid, G. B. (1998). Context effects in subjective mental workload ratings. Human Factors, 40(4), 591–600.
Damos, D. L. (1991). Dual-task methodology: Some common problems. In D. L. Damos (Ed.), Multiple-task performance (pp. 101–119). London: Taylor & Francis.
Dekker, E. D., & Delleman, N. (2007, July). Presence. Paper presented at the Virtual Environments for Intuitive Human-System Interaction workshop.
de Waard, D. (1996). The measurement of drivers’ mental workload. Haren, The Netherlands: University of Groningen.
Durlach, N. I., & Mavor, A. S. (Eds.). (1995). Virtual reality: Scientific and technological challenges. Washington, DC: National Academy Press.
Eggemeier, F. T., Wilson, G. F., Kramer, A. F., & Damos, D. L. (1991). Workload assessment in multi-task environments. In D. L. Damos (Ed.), Multiple task performance (pp. 207–216). London: Taylor & Francis.
Farmer, E., & Brownson, A. (2003). Review of workload measurement, analysis and interpretation methods. European Organisation for the Safety of Air Navigation.
Farmer, E. W., Belyavin, A. J., Jordan, C. S., Bunting, A. J., Tattersall, A. J., & Jones, D. M. (1995). Predictive workload assessment (Final Rep. No. DRA/AS/MMI/CR95100/1). Farnborough, Hampshire, United Kingdom: Defence Research Agency.
Farmer, E. W., Jordan, C. S., Belyavin, A. J., Birch, C. L., & Bunting, A. J. (1993). Prediction of dual-task performance by mental demand ratings (No. DRA/AS/FS/CR93087/1). Farnborough, Hampshire, United Kingdom: Defence Research Agency.
Farmer, E. W., Jordan, C. S., Belyavin, A. J., Bunting, A. J., Tattersall, A. J., & Jones, D. M. (1995). Dimensions of operator workload (Final Rep. No. DRA/AS/MMI/CR95098/1). Farnborough, Hampshire, United Kingdom: Defence Research Agency.
Gopher, D., & Braune, R. (1984). On the psychophysics of workload: Why bother with subjective measures? Human Factors, 26(5), 519–532.
Gopher, D., & Donchin, E. (1986). Workload: An examination of the concept. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance: Vol. 2. Cognitive processes and performance (pp. 41-1–41-49). John Wiley & Sons.
Hancock, P. A., & Vasmatzidis, I. (1998). Human occupational and performance limits under stress: The thermal environments as a prototypical example. Ergonomics, 41(8), 1169–1191.
Hart, S., & Wickens, C. D. (1990). Workload assessment and prediction. In H. R. Booher (Ed.), MANPRINT: An approach to systems integration (pp. 257–296). New York: Van Nostrand Reinhold.
Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In P. A. Hancock & N. Meshkati (Eds.), Human mental workload (pp. 139–183). Amsterdam: North-Holland.
Hill, S. G., Iavecchia, H. P., Byers, J. C., Bittner, A. C., Zaklad, A. L., & Christ, R. E. (1992). Comparison of four subjective workload rating scales. Human Factors, 34(4), 429–439.
Huey, B. M., & Wickens, C. D. (1993). Workload transition: Implications for individual and team performance. Washington, DC: National Research Council, Commission on Behavioral and Social Sciences and Education, Committee on Human Factors, Panel on Workload Transition.
Jex, H. R. (1988). Measuring mental workload: Problems, progress, and promises. In P. A. Hancock & N. Meshkati (Eds.), Human mental workload (pp. 5–39). Amsterdam, NL: Elsevier Science Publishers B.V. (North-Holland).
Johnson, D. M. (2007, July). Simulator sickness research summary. Paper presented at the Virtual Environments for Intuitive Human-System Interaction workshop, Neuilly-sur-Seine Cedex, France.
Jordan, C. S., Farmer, E. W., Belyavin, A. J., Selcon, S. J., Bunting, A. J., Shanks, C. R., et al. (1996, September). Empirical validation of the prediction of operator performance (POP) model. Paper presented at the Human Factors and Ergonomics Society 40th Annual Meeting, Philadelphia, Pennsylvania.
Kennedy, R. S., Lane, N. E., Berbaum, K. S., & Lilienthal, M. G. (1993). Simulator sickness questionnaire: An enhanced method for quantifying simulator sickness. International Journal of Aviation Psychology, 3(3), 203–220.
Kramer, A. F. (1991). Physiological metrics of mental workload: A review of recent progress. In D. L. Damos (Ed.), Multiple task performance (pp. 279–328). London: Taylor & Francis.
Kramer, A. F., Trejo, L. J., & Humphrey, D. G. (1996). Psychophysiological measures of workload: Potential applications to adaptively automated systems. In R. Parasuraman & M. Mouloua (Eds.), Automation and human performance: Theory and applications (pp. 137–162). Mahwah, NJ: Lawrence Erlbaum.
Magnusson, S. (2002). On the similarities and differences in psychophysiological reactions between simulated and real air-to-ground missions. International Journal of Aviation Psychology, 12(1), 3–18.
McCracken, J. H., & Aldrich, T. B. (1984). Analyses of selected LHX mission functions: Implications for operator workload and system automation goals (Technical Rep. No. MDA903-81-C-0504 ASI479-024-84). Fort Rucker, AL: U.S. Army Research Institute Aircrew Performance and Training.
Meyer, D. E., & Kieras, D. E. (1997). Precis to a practical unified theory of cognition and action: Some lessons from EPIC computational models of human multiple-task performance (EPIC Tech. Rep. No. 8, TR-97/ONR-EPIC-8). Ann Arbor: University of Michigan, Psychology Department.
Nählinder, S. (2002, October). Similarities in the way we react in a simulator and a real-world environment. Paper presented at SAWMAS: First Swedish-American Workshop on Modeling and Simulation.
Nygren, T. E. (1991). Psychometric properties of subjective workload measurement techniques: Implications for their use in the assessment of perceived mental workload. Human Factors, 33(1), 17–33.
O’Donnell, R. D., & Eggemeier, F. T. (1986). Workload assessment methodology. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance: Vol. 2. Cognitive processes and performance (pp. 42-1–42-49). John Wiley & Sons.
Reid, G. B., & Nygren, T. E. (1988). The subjective workload assessment technique: A scaling procedure for measuring mental workload. In P. A. Hancock & N. Meshkati (Eds.), Human mental workload (pp. 185–218). Amsterdam: Elsevier Science Publishers B.V. (North-Holland).
Reid, G. B., Shingledecker, C. A., & Eggemeier, F. T. (1981, June). Application of conjoint measurement to workload scale development. Paper presented at the Human Factors Society 25th Annual Meeting, Seattle, Washington.
Reid, G. B., Shingledecker, C. A., Nygren, T. E., & Eggemeier, F. T. (1981, October). Development of multidimensional subjective measures of workload. Paper presented at the Conference on Cybernetics and Society sponsored by the IEEE Systems, Man and Cybernetics Society, Atlanta, Georgia.
Renkewitz, H., & Alexander, T. (2007, July). Perceptual issues of augmented and virtual environments. Paper presented at the Virtual Environments for Intuitive Human-System Interaction workshop, Neuilly-sur-Seine Cedex, France.
Rubio, S., Diaz, E., Martin, J., & Puente, J. M. (2004). Evaluation of subjective cognitive workload: A comparison of SWAT, NASA-TLX, and Workload Profile measures. Applied Psychology: An International Review, 53(1), 61–86.
Sanchez-Vives, M. V., & Slater, M. (2005). From presence to consciousness through virtual reality. Nature Reviews Neuroscience, 6, 332–339.
Tsang, P. S., & Velazquez, V. L. (1996). Diagnosticity and multidimensional subjective workload rating. Ergonomics, 39(3), 358–381.
Vidulich, M. A., & Bortolussi, M. R. (1988, October). A dissociation of objective and subjective measures in assessing the impact of speech controls in advanced helicopters. Paper presented at the Human Factors Society 32nd Annual Meeting, Santa Monica, California.
Vidulich, M. A., & Wickens, C. D. (1986). Causes of dissociation between subjective workload measures and performance: Caveats for the use of subjective assessments. Applied Ergonomics, 17(4), 291–296.
Whitaker, L. A., Hohne, J., & Birkmire-Peters, D. P. (1997, September). Assessing cognitive workload metrics for evaluating telecommunication tasks. Paper presented at the Human Factors and Ergonomics Society 41st Annual Meeting, Albuquerque, New Mexico.
Wickens, C. D. (1989). Models of multitask situations. In G. R. McMillan, D. Beevis, E. Salas, M. H. Strub, R. Sutton, & L. van Breda (Eds.), Applications of human performance models to system design (Vol. 2, pp. 259–274). New York: Plenum Press.
Wickens, C. D. (1992). Engineering psychology and human performance (2nd ed.). New York: HarperCollins Publishers.
Wierwille, W. W., & Connor, S. A. (1983). Evaluation of 20 workload measures using a psychomotor task in a moving-base aircraft simulator. Human Factors, 25(1), 1–16.
Wierwille, W. W., Rahimi, M., & Casali, J. G. (1985). Evaluation of 16 measures of mental workload using a simulated flight task emphasizing mediational activity. Human Factors, 27(5), 489–502.
Wilson, G. F., et al. (2004). Operator functional state assessment (No. RTO-TR-HFM-104, AC/323(HFM-104)TP/48). Neuilly-sur-Seine Cedex, France: North Atlantic Treaty Organisation, Research and Technology Organisation.
Wilson, G. F. (1998). The role of psychophysiology in future mental workload test and evaluation (NATO Unclassified No. AC/243(Panel-8)TP-17-SESS In. 98-02006). Dayton, OH: Armstrong Lab, Wright-Patterson AFB.
Wilson, G. F. (2001). An analysis of mental workload in pilots during flight using multiple psychophysiological measures. International Journal of Aviation Psychology, 12(1), 3–18.
Wilson, G. F., & Eggemeier, F. T. (1991). Psychophysiological assessment of workload in multi-task environments. In D. L. Damos (Ed.), Multiple task performance (pp. 329–360). London: Taylor & Francis.
Xie, B., & Salvendy, G. (2000). Review and reappraisal of modelling and predicting mental workload in single- and multi-task environments. Work & Stress, 14(1), 74–99.
Yeh, Y. Y., & Wickens, C. D. (1985, September/October). The effect of varying task difficulty on subjective workload. Paper presented at the Human Factors Society 29th Annual Meeting, Baltimore, Maryland.
Youngblut, C., Johnston, R. E., Nash, S. H., Wienclaw, R. A., & Will, C. A. (1996). Review of virtual environment interface technology. Alexandria, VA: Institute for Defense Analyses.
Part IV: Training Paradigms
Chapter 18
KNOWLEDGE ELICITATION: THE FLEX APPROACH
Scott Shadrick and James Lussier

The military training environment is transforming. No longer does the systems approach to training (for example, tasks, conditions, and standards; Branson et al., 1975) encompass all that must be trained. Traditional methods requiring lengthy systematic processes used in institutional training must give way to more rapid, but still sound, approaches that allow the military to keep pace with lessons learned in the theater of war. Training development methods rely on eliciting knowledge from experts, so the knowledge elicitation process must be reexamined. Units in theater need to rapidly transfer their knowledge to units scheduled for deployment. The desire to quickly insert new technological developments into the current battlefield environment means that tactics, techniques, and procedures (TTPs) must be developed rapidly by taking advantage of limited numbers of experts. Addressing each of these needs involves knowledge elicitation, the process of capturing the domain knowledge underlying human performance, to support tactical scenario development for various virtual environments. This chapter provides an overview of knowledge elicitation for scenario development and presents a flexible approach to knowledge elicitation using simulation based vignettes.
KNOWLEDGE ELICITATION IN THE MILITARY TRAINING ENVIRONMENT

Domain knowledge and skill have long been recognized as critical assets underlying expert performance (Klein, 1992). Knowledge elicitation describes the process whereby an analyst captures that domain knowledge through systematic interaction with experts. It often relies on direct observation of expert behaviors and other methods designed to capture internal cognitive processes. The outcomes of the elicitation effort can be decomposed to describe knowledge and cognitive processes that are linked to observable behaviors within the domain. While much research has been conducted on knowledge elicitation, the elicitation and integration of knowledge remains one of the
most time consuming, tedious, and essential tasks in designing systems (Chervinskaya & Wasserman, 2000).

Capturing expert knowledge supports three functions of training: (a) improving and supporting human performance in decision-making tasks, (b) designing instruction that trains individuals efficiently and effectively, and (c) automating task functions that consist of human logic processes. The underlying goal of each of those functions is the development of a skill base for a specific domain that will augment performance through assistance, training, and automation. A fourth function is to support the effective analysis and synthesis of future requirements and events (that is, concept development) (Shadrick, Lussier, & Hinkle, 2005).

A number of efforts are under way to improve the quality of military training and, by extension, human performance. These efforts include (a) the development of new instructional models designed to accelerate the development of expertise, (b) the development of technologies to support the rapid identification of training needs and the development of training based on current and emerging issues, (c) the development of procedures for collecting robust stories, TTPs, and expertise in the form of tacit knowledge, and (d) the identification and development of training scenarios for virtual training environments. Each of those efforts involves capturing and codifying, through knowledge elicitation, the expert knowledge needed for the development of training.

Knowledge elicitation methods are a key component of the information-gathering stage of task or cognitive task analysis. The methods attempt to provide a convenient way for experts to communicate their expertise accurately and easily. Despite the abundance of available elicitation techniques, capturing expert knowledge and understanding how that knowledge relates to problem solving capabilities remains an obstacle (Moody, Blanton, & Will, 1998/1999). Often, the elicitation of expert knowledge is vague or glossed over because it is largely ad hoc and nonscientific (Wright & Ayton, 1987). Klein, Calderwood, and MacGregor (1989) point out a tendency to emphasize explicit knowledge over implicit knowledge (also called tacit knowledge). Explicit knowledge is information that is easily accessed, expressed, stored, and applied. Implicit knowledge, on the other hand, is information that experts use repeatedly but cannot easily articulate. Experts, through years of practice, develop automatic habits, and these unconscious chunks of knowledge usually go to the heart of what makes someone an expert. Knowledge elicitation methods that overemphasize explicit knowledge lead to “the mistaken conclusion that explicit knowledge is sufficient for performing a task well” (Klein et al., 1989, p. 463). For training development, elicitation techniques must be used to determine the unique contributions of both implicit and explicit knowledge.

Knowledge elicitation techniques can be used to more effectively and efficiently design, develop, evaluate, and update training products. Methods of knowledge elicitation support three needs in the military training environment: (a) institutional, (b) operational unit, and (c) concept development. The institutional domain comprises the military’s education system. It provides soldiers, marines, airmen, and sailors with the functional knowledge,
skills, and abilities needed to perform. It primarily trains individuals to perform specified duties or tasks related to their positions. The institution provides initial military training and professional military education, and it develops, teaches, and applies military doctrine. It supports operational unit training through exportable training support packages, mobile training teams, and combat training centers. Knowledge elicitation can be a key tool in enabling the institution to develop current training and doctrine, evaluate new TTPs, and update and refine institutional training to meet current demands.

Training and training development continue in the operational unit. Here, training focuses on collective training within a unit and associated individual and leader tasks. Operational training builds on the foundation provided by institutional training and introduces additional skills needed to support the unit’s mission. During deployment, the operational unit is responsible for understanding emerging TTPs and developing materials to train the deployed force. That requires the use of knowledge elicitation techniques to understand expert performance and task requirements, and tools to support the rapid development of training products.

Concept development is used in the military to develop new systems and is concerned with future operational capabilities, training, and doctrine. It addresses how systems should be developed and how they will be employed. When the goal of knowledge elicitation is concept development, experts are required to generate best-guess estimates based on an anticipated set of new capabilities or TTPs. For example, as the army develops new unmanned aerial systems, there is a need to understand how to employ the new systems. While the participants in the knowledge elicitation exercise may be experts with existing systems, they are asked to predict how the new system will influence future operational conditions. Knowledge elicitation methods can be used to generalize current knowledge to future systems whose detailed specifications are not yet known.

KNOWLEDGE ELICITATION IN TRAINING DEVELOPMENT

Before describing the role of knowledge elicitation in training and scenario development, it is necessary to provide a foundation for scenario based training and simulation. Scenario based training evolved from research on problem based learning. Problem based learning strategies have been implemented since the 1950s (Boud & Feletti, 1997). Savery (2006) describes problem based learning as an “instructional learner-centered approach that empowers learners to conduct research, integrate theory and practice, and apply knowledge and skills to develop a viable solution to a defined problem” (p. 12). As such, problem based learning focuses on authentic problems and scenarios (Savery & Duffy, 1995) to facilitate transfer of skills to real world contexts (Bransford, Brown, & Cocking, 2000).

Scenario based training applies the problem based learning model by specifically incorporating the use of context-rich situations. In scenario based training, participants practice performing key tasks in realistic live, virtual, or constructive environments. Participants receive exposure to a variety of
situations in which they can apply their knowledge to solve complex problems. The scenarios used for training must be developed to specifically trigger the desired performance and accomplish a particular training objective (Cannon-Bowers, Burns, Salas, & Pruitt, 1998; Stretton & Johnston, 1997). Within the military training environment, scenario based training is used interchangeably with simulation based virtual environments and structured simulation based training, which typically use a training support package to facilitate training and practice.

During the 1990s, the U.S. Army Research Institute conducted considerable research into the development of structured simulation based training. According to Campbell, Quinkert, and Burnside (2000), the structured training approach

is characterized by its emphasis on deliberate purposeful building of training that takes advantage of simulation capabilities. The exercises provide for a focus on critical task in a planned sequence of performance that reinforces learning and builds on prior experience. The training is embedded in the context of tactically realistic scenarios, causing the unit [or individual] to be immersed in the tactical situation. (p. 4)
Scenario based training, simulation based or otherwise, relies on effective methods of task analysis, cognitive task analysis, team task analysis, and knowledge elicitation (Cannon-Bowers & Salas, 1997).

A Note Concerning the Use of Virtual Environments for Training

Lussier, Shadrick, and Prevou (2003) wrote that the maxim “train as you fight” has risen to such a level of familiarity in the U.S. Army that it goes almost unquestioned. The idea is reflected in the belief that the best training methods rely on performing the task in a realistic environment. Yet, Salas, Bowers, and Rhodenizer (1998) wrote that the merging of the terms “simulation” and “training” and widespread misconceptions concerning simulation have led to an overreliance on virtual environments and “the misuse of simulation to enhance learning of complex skills” (p. 197). Unfortunately, the ability to use simulations for specific training has not kept pace with the use of simulation for practice activities (compare Salas et al., 1998; Stewart, Dohme, & Nullmeyer, 2002). When new virtual simulators are acquired, they are treated as replications of the environment, and it is assumed that experience in the simulator will result in effective training. The higher the fidelity of the simulation, the stronger the assumption that training in the virtual environment will be effective and will transfer to the real environment. While simulation technology continues to evolve, the training technology has remained the same for decades (Salas et al., 1998).

To ensure quality training with simulation, the training must incorporate specific components based upon sound instructional principles (Black, 1996). The key components include the following: (a) identification of tasks, (b) presentation of enabling knowledge, (c) demonstration of how the task should
be performed, (d) the opportunity for the trainee to perform the task, (e) provision for feedback to the trainee concerning task performance, and (f) the opportunity to practice the task to mastery under increasingly difficult, but realistic, conditions (Black, 1996; Black & Quinkert, 1994; Holding, 1965). Lussier and Shadrick (2004) add further requirements: an explicit description of the elements that constitute correct performance of the task, performance measurement to assess whether the task is performed correctly, active and effective coaching, the opportunity for immediate repetition of poorly performed tasks, and a focus on tasks that are difficult, critical, or constitute areas of individual or collective weakness.

The challenge no longer lies in the capability of simulation to present a realistic virtual environment, but in the ability of trainers to ensure that the components of sound instruction previously mentioned are present (Lussier & Shadrick, 2004; Salas et al., 1998). Of the components of effective training listed above, tactical engagement simulations alone truly enable only the opportunity to perform and the opportunity to practice. Much additional exercise design work must be done to add the other components. That additional work requires training developers to understand task requirements, to identify appropriate tasks that need to be trained, to identify and/or develop TTPs, and to identify and/or develop appropriate scenarios by conducting a thorough cognitive task analysis and knowledge elicitation.
KNOWLEDGE ELICITATION TECHNIQUES

The selection of a knowledge elicitation method involves several considerations. Knowledge elicitation methods are typically divided into direct and indirect methods. Direct methods involve the researcher asking domain experts to describe how they perform their jobs. Examples include interviews, protocol analysis, simulation, and concept mapping. The effectiveness of direct methods depends on the experts’ ability to articulate the information. Experts are usually able to verbalize declarative (explicit) knowledge easily, such as descriptions of facts, things, methods, or procedures.

Indirect methods are more suitable when knowledge is not easily expressed by the expert. Tacit knowledge that has been learned implicitly through experience, as well as overlearned, automatic procedural knowledge, can be difficult for experts to describe. Indirect methods require the experts to describe their knowledge with the help of predefined structures, such as repertory grids, decision trees, card sorting, and laddering techniques. Due to the complexity of most military domain areas and the difficulties associated with forecasting future capabilities and requirements, a combination of several methods is typically employed to collect information from knowledgeable experts.

The next sections briefly highlight popular methods of knowledge elicitation. The review is not intended to be exhaustive; the intent is to provide an overview of a few widely used methods that contributed to the development of a new method focused on the elicitation of knowledge for use in scenario and concept development.
The Delphi Technique

The Delphi technique was developed as a method to elicit expert knowledge and develop group consensus (Dalkey, 1969). In Delphi approaches, experts do not interact directly with one another. The researcher gathers responses and provides them to the other expert participants for review, after which each expert submits a revised response. The process is repeated until consensus is established, while the lack of direct interaction helps avoid groupthink bias. Difficulty arises in assessing the quality of the answer and, thus, whether groupthink existed in reaching consensus (Meyer & Booker, 1991). The Delphi method also requires significant time to implement.

Interviews

The interview method requires the researcher to ask experts questions in order to recall and describe the steps necessary to perform a task. In interviews, the researcher asks open-ended questions about the expert’s reasoning in making decisions. The unstructured interview method allows the researcher to become familiar with jargon and gain an overview of the domain. The major disadvantage of unstructured interviews is that disorder can ensue if the expert speaks off topic or erroneously assumes that the researcher is knowledgeable in the domain (Hoffman, Shadbolt, Burton, & Klein, 1995). Structured interviews are preplanned to reduce the time required for the interview by focusing the expert on specific questions. Structured interviews require the researcher to have some knowledge of the domain. Semistructured interviews are often preferred since they allow the researcher to add questions to clarify points and to omit questions that become irrelevant as the interview unfolds. The technique promotes more continuity in the data than unstructured interviews, and continuity assists in the comparison and aggregation of responses from various experts. Semistructured interviews are more likely to produce data that are germane without the inefficiencies of collecting unnecessary data (Meyer & Booker, 1991).

Protocol Analysis

Protocol analysis requires the researcher to record the expert as the task is completed. That process often generates more valid knowledge than asking experts to verbally describe the steps required for the task (Wright & Ayton, 1987). The expert is required to “think aloud” while working through the problem or situation to identify knowledge elements and the steps required for problem solving in the domain. Protocol analysis allows the expert to perform the task in a real world context while describing the cognitive activities. Because protocol analysis is based on introspection, it may interfere with problem solving processes (Wright & Ayton, 1987); if the task has a substantial cognitive component, the think-aloud requirement may interfere with task performance. In retrospective protocol analysis, the expert is videotaped during task performance and interviewed afterward to reduce cognitive interference.
OVERVIEW OF THE FLEX METHOD

While the previously described methods offer a number of advantages, a new method is needed to address the unique demands of developing new concepts, future conditions, and scenarios. To address that need, the Flexible Method of Cognitive Task Analysis (FLEX) was developed (Shadrick et al., 2005). The FLEX method allows researchers to capture existing knowledge and facilitates the creation of new knowledge and concepts. The technique is similar to the information acceleration method used in the marketing domain to forecast consumers’ responses to new products by providing early models to focus groups.

The FLEX method is an interview based problem solving approach that systematically develops and explores new concepts while grounding the experts’ thinking in a concrete setting. Knowledge is captured by employing a vignette based scenario approach in which experts must solve a complex problem using the resources and capabilities provided. Participants may interact with several variations of the scenario to explore the range of considerations. As in protocol analysis, participants are asked to verbalize their responses by thinking aloud. Each vetted response is provided to subsequent participants (as in the Delphi technique) to identify weaknesses, confirm strengths, and build upon prior responses. This gives participants a way to evaluate, modify, and shape the scenario and the expert response, and it allows them to develop additional, related scenarios and solutions. During the process, a semistructured interview is used to probe expert knowledge and gain a deeper understanding of expert reasoning. Responses from subsequent participants are fed back to the original participants for additional input. Finally, an interactive group discussion is used to allow for consensus building and validation.

The following sections summarize the steps involved in developing FLEX scenarios and conducting a knowledge elicitation and concept development exercise (Shadrick et al., 2005).

Phase 1: Domain and Problem Identification

The domain and problem identification phase is used to define the domain area and identify problems of interest. The information is used to identify potential experts from a variety of relevant domains—particularly when technological advancements are expected to play a critical role in the scenario or concept development. Where future technologies are concerned, an environmental scan, backcast, and/or technological forecast should be conducted to assist in understanding the domain.

Phase 2: Initial Review and Analysis

The initial review and analysis phase is used to capture existing information about the domain and problem space. The phase is similar to what would be expected during traditional and cognitive task analysis. During this phase it is
necessary to interview experts, document processes, and gather information about expert performance. In contrast to traditional task analysis, the goal is not to develop a complete list of tasks and duties for a given domain, but to gain an understanding of domain expertise, task requirements, processes, and outcomes. During the review it is critical to identify areas where innovation may lead to improved performance and processes and to gain an understanding of current research and immediate technological innovation in the domain.

Phase 3: Refine Problem Space and Develop Initial Scenario

During the refine problem space and develop initial scenario phase, initial decisions about the knowledge elicitation scenario are established. An initial group of domain experts is used to develop one or more scenarios with appropriate “branches.” The branches should represent conditions where responses may vary depending on the context. The scenarios should be developed in an iterative fashion, allowing experts to develop a realistic situation capable of focusing the knowledge elicitation process on the areas of interest. At this stage, it is necessary only to establish the initial conditions for the knowledge elicitation process—it is not necessary to fully develop the situation.

Phase 4: Initial Knowledge Elicitation

During the initial knowledge elicitation phase, knowledge and concepts are elicited from experts in the relevant domain based on the scenario(s). The scenario serves as a starting point to focus the experts on a particular problem space. The elicitation process allows the experts to further refine the scenario by adding new information, challenging assumptions, anticipating unintended consequences, and predicting interventions. During the elicitation, experts are asked to discover new ways to solve problems associated with the scenario given hypothesized future capabilities. The elicitation focuses on both individuals and small groups (for example, dyads and triads) and combines experts from different specialty areas.

Phase 5: Data Reduction and Consensus Building

The data reduction phase allows a small group of experts, different from those in phase 4, to develop a consensus on the efficacy of the knowledge and concepts captured by aggregating data into common and meaningful responses. That information can be used to update the scenario to reflect the new knowledge or to create a new branch to highlight “what if” situations. Altering the scenario allows the experts to “leave their fingerprints” on it and supports an iterative process of continuous improvement. After revising the scenario it is necessary to reexamine it with a new set of experts and provide the revised materials to former participants. This process provides a systematic way to evaluate the realism of the new information.
The goal of the iterative process is to develop a well-tested and documented solution for the purpose of developing new theories, principles, tools, techniques, and procedures.

Phase 6: Knowledge Representation and Concept Documentation

Knowledge representation provides a mechanism for documenting and displaying information in a usable format. In this context, knowledge refers to organized concepts, theories, principles, descriptions, and mental models of descriptive, procedural, and metacognitive information. The goal is to present the results of the knowledge elicitation process in a meaningful way. Knowledge can be represented using a variety of methods, including logic, semantic networks, production rules, frame based representations, decision trees, graphs, diagrams, charts, and tables.

APPLICATION OF THE FLEX METHOD

This section provides two case studies documenting the use of the FLEX approach to elicit knowledge and develop appropriate scenarios.

Teams of Teams—The Homeland Security Example

Development of effective training for homeland security or natural disaster crises poses several unique problems. Developing an effective response to such events requires the coordinated efforts of multiple government agencies. For that reason, multiple groups of domain experts from diverse agencies must be involved in the analyses and training design processes (Landis, Fogli, & Goldberg, 1998). Another concern is that many experts may have only limited expertise. That is, experts may understand their own agencies’ procedures, but they may not have experienced them firsthand, and they may not understand how their agencies’ procedures interact with those of other agencies. As a result, although they may have the necessary factual knowledge, they have not developed the proceduralized skills that are the hallmark of a true expert. Thus, there may be relatively few true experts. A key strength of FLEX lies in its ability to capture, create, and organize the knowledge of domain experts regarding events that have yet to occur when few experts are available.

The FLEX iterative process of knowledge elicitation, domain expert review, and rapid development was applied to develop prototype crisis response scenarios and training (Shadrick, Schaefer, & Beaubien, 2007). The process began with a review of documents provided by various army, state, county, and municipal government agencies. From those materials, an initial understanding of the various types of homeland security and natural disaster crises was developed, and effective crisis responses were identified. Next, FLEX interviews were conducted to validate the initial assumptions and to collect additional information for use in scenario development. The purpose of the interviews was to identify examples of historically effective and ineffective crisis response behaviors and to identify
critical events for which the participants were ill prepared to respond. The interviewees included experts from eight military and civil-military interagency partners. From the interviews, a series of high level scenarios was developed for conducting FLEX sessions. The scenarios and training behaviors were subsequently reviewed by an additional set of experts and revised as necessary. The interviews also resulted in the development of a set of behavioral indicators that could be used to score performance during the training scenarios; the indicators serve as benchmarks of expert performance for a given scenario. During each successive round of expert interviews, the researchers were able to identify and resolve critical shortcomings in the proposed training content and format. After the third round of interviews, the experts did not identify additional changes, suggesting that the major issues had been resolved. This set of materials was then used to develop the final training program. In addition to the 10 training scenarios, the process also produced an expert mental model that could be used for crisis response.

Expert assessments of the training materials (including scenarios) were overwhelmingly positive, and the experts strongly supported the relevance of the materials (Shadrick et al., 2007). The content validity of the training was assessed with quantitative indices elicited from an independent sample of experts with crisis response experience. The training items yielded a content validity ratio of +1.0, with only 1 of 122 items receiving a negative value, and the conceptual grouping of information yielded a Cohen’s kappa of 0.78 (percent agreement = 0.85). Subsequent implementation of the training scenarios resulted in significant performance improvement (Schaefer, Shadrick, Beaubien, & Crabb, 2008).
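The two indices reported above are easy to compute but are not derived in the chapter. The following sketch shows the standard formulas, assuming Lawshe’s content validity ratio and the usual two-rater Cohen’s kappa; the panel sizes, ratings, and groupings are invented for illustration and are not the data from the studies cited above.

```python
# Illustrative computation of the two validity indices mentioned above.
# All inputs are hypothetical; only the formulas are standard.

def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR: +1.0 when every panelist rates an item essential,
    negative when fewer than half do."""
    return (n_essential - n_panelists / 2) / (n_panelists / 2)

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# A hypothetical panel of 10 experts who all rate a training item
# "essential" produces the ceiling value of +1.0 reported for the items:
print(content_validity_ratio(10, 10))  # 1.0

# Two hypothetical raters grouping eight knowledge items into concepts:
rater1 = ["A", "A", "B", "B", "C", "C", "A", "B"]
rater2 = ["A", "A", "B", "C", "C", "C", "A", "B"]
print(round(cohens_kappa(rater1, rater2), 2))  # 0.81, high agreement
```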
Future Requirements—Spinning Out Future Technologies to the Current Force

The U.S. Army is transforming to a lighter, highly mobile future force that can operate readily within joint, interagency, and multinational environments. This force is designed to be responsive, deployable, agile, versatile, lethal, survivable, and sustainable. Possession of these characteristics will enable future force units to see first, understand first, act first, and finish decisively. Future force units are designed to apply knowledge based capability to respond rapidly and decisively across the full spectrum of military operations through deployment with the Future Combat Systems. As part of the development of this family of systems, the army has planned a series of spinouts, or rapid transitions, of future force technologies to the current force. Integrating future force technologies with current force systems that have been fielded for many years and were not designed to incorporate revolutionary new technologies will be a significant challenge. To meet this challenge, TTPs must be provided with technology spinouts so that soldiers and leaders are not left to “figure out” how to integrate and employ each new technology provided. Thus, initial TTPs will need to be developed before the capabilities are actually produced.

As a result, traditional methods for developing TTPs may be inadequate. Traditional methods range from analysts and experts developing and presenting new concepts, to large-scale simulation exercises in which groups of soldiers employ the capability, to user juries in which soldiers provide feedback on the developed capability. Each approach has its own strengths and drawbacks. Many researchers have noted the difficulties that experts have in realistically assessing the impact of future capabilities. Large-scale simulations are resource- and time-intensive endeavors that are difficult to control and replicate, thus lowering the validity of any findings. User juries require that the capability be well developed before the jury can occur. Therefore, there is a need to investigate TTP development methods to augment the methods above, methods that provide structured activities to measure, assess, and guide the TTP development process yet are flexible enough to respond rapidly to a wide range of conceptual constructions.

To address that need, the FLEX method was applied with the goal of generating concepts for future operations that can be used to develop ideas about TTPs (for example, how a particular capability might best be used). Concept development sessions were used to present a depiction of a situation designed to start the participants thinking about how the army might function in the future. Each participant was shown a problem scenario relevant to an important army issue and asked to think through the specific situation. Then, a broader discussion was initiated to include considerations beyond the scenario presented in order to test the generality of the solutions devised for the specific scenario. A succession of participants confronting the same scenarios helped refine the concepts. After several participants had completed the trial, responses and process observations were used to modify the scenario to make it more realistic and more focused and to incorporate participants’ ideas for additional trials. This process of refinement and presentation was repeated until no new information was being generated. After final modifications were made, the scenario was presented to a small group in anticipation that additional information might be forthcoming from a group setting. The process was repeated until it reached the point of diminishing returns. After analysis of data from multiple participants, a composite response to the issue was produced. Further exploration may use new scenarios or a scaled world environment to systematically test the group solution.

The FLEX approach stimulated participants and elicited unique, creative, and well-reasoned responses. There were, however, considerable individual differences. Some participants had difficulty critically evaluating the future concepts. Some appeared able to transition quite readily into looking at the situation from the viewpoint of how things might be in the future (2015–2020); others required considerable encouragement. Soldiers have a tendency to look at situations the way they have been trained (that is, mission, enemy, terrain, troops, time, and civilians), and they often tend to focus on a course-of-action analysis. In a concept development situation, it may require some training or encouragement to lead their thinking toward “how else could we do things, particularly if we had access to assets that don’t even exist today?”
TTP SUMMARY FOR A RAID

Give each UAS a task and purpose. Direct two platoon UASs to recon the objective early. Fly high to make the UAS harder to engage. Use the lead platoon UAS to conduct route recon, then hand off to the next platoon in order of march to recon and confirm or deny the situation on the objective. Hold the trail platoon’s and company commander’s UASs in reserve in the event either of the others is shot down or goes down due to mechanical/technical failures. Keep the UAS as close to your ground forces as feasible without compromising mission requirements. Stay within direct fire range of your UASs so you can react to contact. Conserve UAS resources until you need them, such as on the objective. Use the UAS to support recon while moving to the objective. Observe avenues of approach, escape routes, and enemy strongpoints. Use the commander’s UAS for command and control. Alter flight patterns to improve survivability of the UAS, particularly in urban areas.

Mission
• Assign a specific task to each UAS.
• Keep the company UAS centrally located to help with command and control and serve as a reserve asset.
• Use the company UAS to observe dead space and confirm or deny enemy presence, obstacles, or other threats or impediments to movement.
• Divide the objective area into platoon sectors with each platoon responsible for specific reconnaissance and observation tasks.
• Specify launch and recovery triggers.
• Use the platoon UASs to support the cordon, being sure to cover ingress/egress routes and likely avenues of approach.
• Develop a movement and rotation plan that allows for continuous observation/coverage.
Enemy
• Fly from different directions, both near and far, to minimize the enemy’s ability to detect the UAS and determine friendly intentions.
• Deploy the UAS to conduct recon of the built-up area just before you reach the objective to reduce the enemy’s reaction time.
Terrain and Weather
• Use the UAS to scan terrain and obstacles from different angles.
• In urban terrain, use the UAS to do a detailed recon and to see inside buildings when possible.
Troops and Support Available
• Have backup observation platforms available to ensure security can be maintained if the UAS is destroyed.
• Use the UAS early, in coordination with indirect fire assets, to degrade the enemy’s ability to resist.
• Use the UAS to support soldiers who are clearing buildings; the UAS can see all exterior angles (minus subsurface) and can observe into the building interior.
Time Available
• For a short-term, mobile target, send a UAS out immediately to monitor the area and locate/track the target.
On the other hand, once they did start thinking futuristically, they were very creative and developed reasonable approaches to the problem. Topolski, Leibrecht, Kiser, Kirkley, and Crabb (2008) used the FLEX method to develop TTPs for employing Class I unmanned aircraft systems (UASs) during a variety of simulation based exercises. The researchers were attempting to develop employment TTPs even before the systems were produced. Participants were provided background information on the area of operations and on the expected Class I UAS capabilities. In one exercise, participants were asked to employ the systems during a raid operation in urban terrain against an insurgent force. The sidebar provides sample TTPs developed using the process. The researchers assessed the effectiveness of the FLEX method from the perspectives of both researchers and participants; both groups rated all aspects of the method highly.

SUMMARY

Potter, Roth, Woods, and Elm (2000, p. 321) wrote, “In performing cognitive task analysis, it is important to utilize a balanced suite of methods that enable both the demands of the domain and the knowledge and strategies of domain expertise to be captured in a way that enables a clear identification of opportunities for improved support.” There is clearly a need for a systematic approach to eliciting the information required to develop scenarios and future concepts, processes, systems, and procedures more effectively. The FLEX method described here provides a flexible means of eliciting and creating scenarios, concepts, and information for investigating, creating, testing, and understanding future issues. It also holds considerable promise for task domains that are expected to undergo rapid change, that involve distributed expertise, or that do not yet exist. Future efforts will evaluate the actual efficacy and efficiencies realized in implementing the FLEX method. Ongoing efforts, such as the U.S. Army Research Institute’s project to develop training
for large-scale interagency training and the development of TTPs for the Future Combat Systems, are using the method to further evaluate its effectiveness.

REFERENCES

Black, B. A. (1996). How will simulation enhance training? Unpublished manuscript (NATO RSG 26).

Black, B. A., & Quinkert, K. A. (1994). The current status and future trends for simulation-based training in armored forces from crew to battalion level. Proceedings of the 35th NATO-DRG Symposium on Improving Military Performance through Ergonomics. Mannheim, Germany: NATO AC/243-TP/6.

Boud, D., & Feletti, G. (1997). The challenges of problem-based learning (2nd ed.). London: Kogan Page.

Branson, R. K., Rayner, G. T., Cox, J. L., Furman, J. P., King, F. J., & Hannum, W. H. (1975). Interservice procedures for instructional systems development (Vols. 1–5, TRADOC Pam 350-30 NAVEDTRA 106A). Ft. Monroe, VA: U.S. Army Training and Doctrine Command.

Bransford, J. D., Brown, A. L., & Cocking, R. R. (Eds.). (2000). How people learn: Brain, mind, experience, and school. Washington, DC: National Academy Press.

Campbell, C. H., Quinkert, K. A., & Burnside, B. L. (2000). Training for performance: The structured training approach (ARI Special Report 45). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

Cannon-Bowers, J. A., Burns, J. J., Salas, E., & Pruitt, J. S. (1998). Advanced technology in decision-making training. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 365–374). Washington, DC: APA Press.

Cannon-Bowers, J. A., & Salas, E. (1997). A framework for developing team performance measures in training. In M. T. Brannick, E. Salas, & C. Prince (Eds.), Team performance assessment and measurement (pp. 45–77). Hillsdale, NJ: Erlbaum.

Chervinskaya, K. R., & Wasserman, E. L. (2000). Some methodological aspects of tacit knowledge elicitation. Journal of Experimental & Theoretical Artificial Intelligence, 12, 43–55.

Dalkey, N. C. (1969). The Delphi method: An experimental study of group opinion. Santa Monica, CA: Rand Corporation.

Hoffman, R. R., Shadbolt, N. R., Burton, A. M., & Klein, G. A. (1995). Eliciting knowledge from experts: A methodological analysis. Organizational Behavior and Human Decision Processes, 62, 129–158.

Holding, D. H. (1965). Principles of training. Oxford, England: Pergamon.

Klein, G. (1992). Using knowledge engineering to preserve corporate memory. In R. R. Hoffman (Ed.), The psychology of expertise: Cognitive research and empirical AI (pp. 10–190). Mahwah, NJ: Lawrence Erlbaum.

Klein, G., Calderwood, R., & MacGregor, D. (1989). Critical decision method for eliciting knowledge. IEEE Transactions on Systems, Man, & Cybernetics, 19(3), 462–472.

Landis, R. S., Fogli, L., & Goldberg, E. (1998). Future-oriented job analysis: A description of the process and its organizational implications. International Journal of Selection and Assessment, 6(3), 192–197.
Lussier, J. W., & Shadrick, S. B. (2004, December). How to train deployed soldiers: New advances in interactive multimedia instruction. Paper presented at the Interservice/Industry Training, Simulation, and Education Conference, Orlando, FL.

Lussier, J. W., Shadrick, S. B., & Prevou, M. I. (2003). Think like a commander prototype: Instructor’s guide to adaptive thinking (ARI Research Product 2003-01). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

Meyer, M. A., & Booker, J. M. (1991). Eliciting and analyzing expert judgment: A practical guide. San Diego, CA: Academic Press.

Moody, J. W., Blanton, J. E., & Will, R. P. (1998/1999, Winter). Capturing expertise from experts: The need to match knowledge elicitation techniques with expert system types. The Journal of Computer Information Systems, 89–95.

Potter, S. S., Roth, E. M., Woods, D. D., & Elm, W. (2000). Bootstrapping multiple converging cognitive task analysis techniques for system design. In J. M. Schraagen, S. F. Chipman, & V. L. Shalin (Eds.), Cognitive task analysis (pp. 317–340). Mahwah, NJ: Lawrence Erlbaum.

Salas, E., Bowers, C. A., & Rhodenizer, L. (1998). It is not how much you have but how you use it: Toward a rational use of simulation to support aviation training. International Journal of Aviation Psychology, 8(3), 197–208.

Savery, J. R. (2006). Overview of problem-based learning: Definitions and distinctions. The International Journal of Problem-Based Learning, 1(1), 9–20.

Savery, J. R., & Duffy, T. M. (1995). Problem-based learning: An instructional model and its constructivist framework. In B. Wilson (Ed.), Constructivist learning environments: Case studies in instructional design (pp. 135–148). Englewood Cliffs, NJ: Educational Technology Publications.

Schaefer, P. S., Shadrick, S. B., Beaubien, J., & Crabb, B. T. (2008). Training effectiveness assessment of Red Cape: Crisis Action Planning and Execution (Research Rep. No. 1885). Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

Shadrick, S. B., Lussier, J. W., & Hinkle, R. (2005). Concept development for future domains: A new method of knowledge elicitation (Tech. Rep. No. 1167). Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

Shadrick, S. B., Schaefer, P. S., & Beaubien, J. (2007). Development and content validation of crisis response training package Red Cape: Crisis Action Planning and Execution (Research Rep. No. 1875). Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

Stewart, J. E., Dohme, J. A., & Nullmeyer, R. T. (2002). U.S. Army initial entry rotary-wing transfer of training research. International Journal of Aviation Psychology, 12(4), 359–375.

Stretton, M. L., & Johnston, J. H. (1997). Scenario-based training: An architecture for intelligent event selection. Proceedings of the 19th Annual Meeting of the Interservice/Industry Training Systems Conference (pp. 108–117). Washington, DC: National Training Systems Association.

Topolski, R., Leibrecht, B. C., Kiser, R. D., Kirkley, J., & Crabb, B. T. (2008). Flexible method for developing tactics, techniques, and procedures for future capabilities. Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

Wright, G., & Ayton, P. (1987). Eliciting and modeling expert knowledge. Decision Support Systems, 3, 13–26.
Chapter 19
STORY BASED LEARNING ENVIRONMENTS
Andrew Gordon
STORY BASED LEARNING ENVIRONMENTS

There is no substitute for experience in the acquisition of complex skills. The goal of virtual environment training is not to reduce the need for experience, but rather to provide experiences to trainees with less cost, risk, and time than would be required if those experiences were instead acquired on the job. Accordingly, the challenge in developing effective virtual environment training is designing experiences for trainees that support learning. In story based learning environments these experiences are designed to have distinctly narrative qualities: a set of characters, a temporal sequence of causally related events, a rich but relevant amount of descriptive detail, and a point. Typically the trainee participates as a character within this environment following a learning-by-doing pedagogical strategy, and the actions taken affect the outcomes of an emerging storyline. In the ideal case, trainees in a story based learning environment walk away with a story to tell about their training experience—one that is not markedly different from the best stories that practitioners tell about their real world experiences.

The research history of story based learning environments over the last three decades has been driven by changes in technology, but defined by particular paradigms of instructional design. In the late 1980s the Learning Technology Center at Vanderbilt University was a center of research in story based learning environments, guided by the design principles of anchored instruction (Cognition and Technology Group at Vanderbilt, 1990). The university’s Jasper Woodbury project used a hypertext-controlled video laser-disc player to present students with problems grounded in a fictional scenario, where completing the problems would enable the students to write their own ending to the story (Cognition and Technology Group at Vanderbilt, 1992). In the 1990s, Northwestern University’s Institute for Learning Sciences led research on story based learning environments following a design philosophy known as goal based scenarios (Schank, Fano,
Bell, & Jona, 1993). These systems, typically constructed as software applications using desktop video to deliver story content, were developed for diverse learning domains that included corporate tax advising, wetlands management, and counseling couples about sickle cell anemia. The late 1990s saw the commercialization of outcome-driven simulations (Cleave, 1997), a cost-effective design for goal based scenarios championed by Cognitive Arts, Inc., and others, deliverable as a Web application where story content was given as text with digital photographs of fictional situations. At the beginning of the twenty-first century, story based learning environments took advantage of advances in virtual reality and gaming technologies. This is best exemplified by the research projects of the Institute for Creative Technologies at the University of Southern California (Swartout et al., 2006; Gordon, 2004; Korris, 2004; Hill, Gordon, & Kim, 2004), which have increasingly followed the educational design principles of guided experiential learning (Clark, 2004).

Throughout this research history there has been a change in the nature of the story in story based learning environments. In the early anchored instruction prototypes the central story of the experience was a completely fictional narrative designed to appeal to the demographic of the target audience. In subsequent work on goal based scenarios, the central story of the experience was fictional, but it was delivered along with a collection of nonfiction stories that illustrated particular points or lessons relevant to events in the fictional storyline. These nonfiction stories consisted of short narratives of the real world experiences of skilled domain experts, often presented as desktop video clips as part of an automated tutoring component of the learning environment. Beginning with the commercialization of outcome-driven simulations in the late 1990s, these two types of stories (fiction and nonfiction) became closely intertwined. In this work and in the story based learning environments that would follow, the collection and analysis of real world nonfiction stories became integral to the authoring of the fictional storyline and the structuring of the user interaction. The stories in contemporary story based learning environments are defined by the real world nonfiction anecdotes that training developers collect from subject matter experts.

The evolving role of nonfiction stories in the development of story based learning environments brings a new perspective to this form of training application. In the past, these applications served as the delivery method for an explicit body of training content. Today, they function more as a complex form of communication, mediating between storytellers and the people who can best benefit from hearing their stories. Story based learning environments can be viewed as a form of digital storytelling, where the fictional storylines of learning environments are media that can preserve the underlying points of stories acquired through real world experience. Seen from this perspective, the main challenges for developers of story based learning environments concern the management of real world story content through the development pipeline. This includes the following key problems: How do developers collect stories of real world experiences that will serve as the basis for the training application? How should
these stories be analyzed to identify their central points and relationship to training objectives? How can real world story content be fictionalized and utilized within the context of a virtual reality training application? In this chapter we discuss a number of best practices for each of these three development questions. As a context for this discussion, we begin by providing an example of a story based learning environment, the Institute for Creative Technologies (ICT) Leaders Project.
THE ICT LEADERS PROJECT: A STORY BASED LEARNING ENVIRONMENT

The ICT Leaders Project, a collaboration between the University of Southern California’s Institute for Creative Technologies and Paramount Pictures, was a research effort aimed at allowing junior U.S. Army officers to practice making leadership decisions in the context of complex fictional scenarios realized in a virtual reality environment (Gordon, van Lent, van Velsen, Carpenter, & Jhala, 2004; Gordon, 2004; Iuppa & Borst, 2007). The trainee played the role of a U.S. Army captain commanding a company of soldiers on a peacekeeping mission in Afghanistan. The situation, which parallels the story developed for a live-action U.S. Army training film (Hill, Douglas, Gordon, Pighin, & van Velsen, 2003), involved providing security for a food distribution operation complicated by the presence of competing warlords. Rather than relying on scripted video, however, the ICT Leaders Project presented the story and fictional scenario in a virtual reality environment based on a commercial game engine, where cinematic scenes were interwoven with conversations with animated virtual soldiers and civilians in the environment.

The user experience in the ICT Leaders application was structured around a series of scripted cinematic scenes rendered in the virtual environment of the game engine. These scenes moved the storyline forward and presented challenging leadership problems where a decision had to be made by the user. The problems were always presented to the user by storyline characters, and the primary user interaction involved text based conversations with these characters. The user had the opportunity to raise questions and make comments concerning the problem, but in order to move the storyline forward the user needed to communicate a decision to the virtual character. The choice that the user made had a direct effect on how the storyline unfolded, with different choices causing the experience to follow different paths in a branching storyline structure. The application prototype included 11 decisions that make up the branch points of the storyline, and each of these decisions was motivated by a specific leadership point as evidenced by a nonfiction narrative of a leadership experience. These leadership stories were collected through directed interviews with experienced company commanders. Each story was subsequently analyzed to identify its central point, the lesson that challenges the expectations that novices have when adopting a leadership role.
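At its core, the branching storyline just described is a small decision graph: each branch point pairs a scripted scene with a set of decisions, and each decision names the branch point that follows. The sketch below shows one plausible representation; the node names, scene summaries, and choices are invented for illustration and do not reproduce the actual 11-decision storyline.

```python
# Minimal sketch of a branching storyline structure. Everything named here
# is hypothetical; the Leaders prototype's branch points were hand authored.

from dataclasses import dataclass, field

@dataclass
class DecisionPoint:
    scene: str  # cinematic scene that sets up the leadership problem
    choices: dict = field(default_factory=dict)  # decision -> next node id

storyline = {
    "leadership_style": DecisionPoint(
        scene="The XO asks whether soldiers should act on their own initiative.",
        choices={"take_initiative": "checkpoint_incident",
                 "consult_first": "delayed_convoy"},
    ),
    "checkpoint_incident": DecisionPoint(
        scene="A platoon leader acts alone at a checkpoint; a crowd gathers.",
    ),
    "delayed_convoy": DecisionPoint(
        scene="Waiting for orders stalls the food convoy at the gate.",
    ),
}

def advance(node_id, decision):
    """Follow one branch of the storyline graph."""
    return storyline[node_id].choices[decision]

print(advance("leadership_style", "take_initiative"))  # checkpoint_incident
```

In a structure like this, every root-to-leaf path is a complete training experience, which is one reason authoring effort grows quickly with the number of branch points.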
The ICT Leaders Project used a commercial game engine, Epic Games’ Unreal Tournament 2003, to create the virtual environment for the user experience. Using the standard “mod” editor that comes with this product, a new type of interactive application was developed that was very far removed from the engine’s original first-person shooter design. Custom terrain maps, character models, animations, sound effects, and props were created to produce an immersive virtual reality environment to serve as a backdrop for the fictional storyline. The storyline itself was authored with the help of a professional Hollywood scriptwriter, and professional voice actors were used to record the dialogue of the virtual characters. Cinematic scenes were designed with the assistance of a professional director to give the production a traditional cinematic style, particularly with respect to camera movement and cuts.

In the opening sequence of the application, the trainee takes on the role of Captain Young, who is to lead an infantry company in Afghanistan as a replacement for a previous captain removed due to a medical emergency. In the morning after his arrival, Captain Young meets with the first sergeant and executive officer to go over the security plan for a food distribution operation being conducted by a nongovernment relief organization. The executive officer assures the captain that everything is in order, but has a question regarding the leadership style that the captain will set for the company: Should soldiers in the company make their own decisions when problems arise, or should they consult the captain before taking initiative? This question ends the scripted scene, and the trainee is then allowed to discuss the issue with the executive officer using a text based dialogue interface. The trainee can ask questions of the executive officer and get further clarification by typing them into the system (for example, “How experienced are the junior officers in this company?”). Responses to these questions are selected using a text classification algorithm, built from a corpus of hand-annotated questions using machine learning technologies. The selected responses are then delivered to the trainee as recorded audio clips accompanied by character animations. Ultimately, the trainee must provide a decision with regard to the original question posed by the executive officer, either to let the soldiers take initiative or to request that they consult first with the captain. When either of these choices is entered into the text based dialogue interface, a branch in the storyline is selected and a new cinematic scene moves the trainee to the next decision point.
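The chapter does not say which classification algorithm the system used, only that it was trained on hand-annotated questions. A generic stand-in for that question-to-response mapping is sketched below using scikit-learn’s TF-IDF features and logistic regression; the questions, labels, and response ids are invented, and the real system’s corpus and classifier were its own.

```python
# Hedged stand-in for the question-answering step: map a typed trainee
# question to a pre-recorded response category. Training data is invented.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

questions = [
    "How experienced are the junior officers in this company?",
    "Have the lieutenants deployed before?",
    "What is the security plan for the distribution site?",
    "Who is providing security at the food site?",
]
response_ids = ["officers_experience", "officers_experience",
                "security_plan", "security_plan"]

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(questions, response_ids)

# At run time, the predicted id would index a recorded audio clip and the
# matching character animation.
print(classifier.predict(["How seasoned are my platoon leaders?"]))
```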
From the perspective of the research history of story based simulations, the ICT Leaders Project can be viewed as a type of outcome-driven simulation (Cleave, 1997) embedded in a virtual reality environment, where branches in the space of outcomes are selected using a text based dialogue interface. Aside from the work needed to support these extensions, the authoring of the ICT Leaders Project closely followed the approach established during the commercialization of this technology in the late 1990s, albeit with substantial influence from the Hollywood entertainment community (Iuppa, Weltman, & Gordon, 2004). As such, the methods used in the ICT Leaders Project address many of the concerns in the design of contemporary story based learning environments. In the remaining sections of this chapter, we discuss the three key issues of story collection, story analysis, and simulation design, using this project to illustrate current directions in the evolution of these systems.
CAPTURING THE STORIES OF EXPERIENCE

The first-person nonfiction narratives that people share about their experiences are increasingly valued as an instrument for knowledge socialization—the sharing of knowledge through social mechanisms. Schank and Abelson (1995) argue that stories about one’s experiences and the experiences of others are the fundamental constituents of human memory, knowledge, and social communication. Sternberg et al. (2000) argue that storytelling is particularly valuable as a means of communicating tacit knowledge. This enthusiasm for storytelling is echoed in the management sciences, where organizational storytelling is seen as a tool both for organizational analysis and organizational change (Boyce, 1996; McCormack & Milne, 2003; Snowden, 1999). Organizational stories are also increasingly used in the development of effective computer based knowledge management applications. Johnson, Birnbaum, Bareiss, and Hinrichs (2000) describe how story collection can be directly linked to work flow applications in order to provide story based performance support. This rising interest in the role of stories in organizations has paralleled the increased importance of stories in the development of story based training applications, creating synergies in the theory and practice of story management.

One of the central problems in the use of stories for knowledge management and training applications concerns the scalability of the methods used to collect them from the people who have interesting stories to tell. Today, the vast majority of stories used in organizational knowledge management and training applications are manually gathered through direct interviews with subject matter experts. The methods used to collect stories through interviews vary considerably; some closely resemble cognitive task analysis techniques (Clark, Feldon, van Merrienboer, Yates, & Early, 2007), and others involve small group “story circle” meetings (Snowden, 2000).

For the ICT Leaders Project and others at the University of Southern California’s Institute for Creative Technologies, an interview methodology evolved over a number of years that was particularly effective at gathering stories from U.S. Army soldiers (Gordon, 2005). Interviews were arranged for an average of 10 soldiers, 2 at a time, over sessions that lasted one hour each. These interviews were conducted in an extremely casual manner, where two or three members of the development team would talk with the 2 soldiers around a table, recorded using unobtrusive room microphones rather than individual or lavaliere microphones. The main goal of these interviews was to maximize the number of stories told by each pair of soldiers during the course of the hour-long session. The tactics were to trigger a memory of some real experience by asking leading questions related to the topic of the eventual training application and to set a conversational tone that would encourage soldiers to tell these stories. When
Story Based Learning Environments
383
soldiers began talking in abstractions and making generalizations, the tactic was to push them to get more specific and to describe an actual experience that illustrated the point of their generalizations—or contradicted them, as was often the case. When a soldier started telling a story of a real experience, the tactic was to encourage him or her to keep talking, mostly by avoiding the natural conversational tendencies to offer some commentary on his or her story or to respond with a related story from one’s own experience. Often, a silent pause was enough to prompt him or her to continue with a story or to provide another example. A key aspect of these interviews was that they were always conducted with pairs of soldiers. When one soldier finished telling a story, the other would invariably be reminded of a story from his or her own experience. In the best cases, the interviewers could simply listen to the swapping of stories by the two soldiers, intervening only when the topics drifted away from training objectives. For the ICT Leaders Project, these interview methods were employed to collect stories related to U.S. Army leadership skills. In the summer of 2002, interviews were conducted at the U.S. Military Academy at West Point with 10 U.S. Army captains, each having just completed service as a company commander and beginning a master’s degree program in Behavioral Science in Leadership. Sixty-three stories of leadership were gathered using these story-collection interview methods, an average of just over 12 stories per hour. Although effective for targeted research and development projects, interview methods such as this are not scalable solutions to the problem of organizationwide story collection. If story collections are to be widely used in the largescale development of knowledge management and training applications, then the costs of collecting stories from subject matter experts and other members of organizations must be substantially reduced. In the past few years, the phenomenal rise of Internet weblogging has created new opportunities for computer-supported story-management applications (for example, Owsley, Hammond, & Shamma, 2006). With the estimated number of weblogs exceeding 70 million in March 2007 (Technorati, 2007), there is a reasonable expectation that substantial numbers of people in any profession or large organization are already sharing their stories with the public at large. If storytelling in weblogs is at all similar in character to face-to-face storytelling among peers (Coopman & Meidlinger, 1998), then we would further expect that a significant portion of these stories are directly relevant to the training needs of organizations. A minimal-cost solution to the problem of creating story collections is to employ automated techniques for extracting first-person nonfiction narratives of people’s experiences directly from these Internet weblogs. Gordon, Cao, and Swanson (2007) explored the use of contemporary natural language processing technologies to automatically extract stories from Internet weblogs, which they estimated accounted for 17 percent of all weblog text. They demonstrated that high precision (percentage of extracted text segments that were actually stories) was difficult to obtain using current techniques, with the best precision performance reaching 49.7 percent. Although significantly higher than the baseline of 17 percent, this level of performance is still below the level of
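The evaluation measures cited here are standard and easy to reproduce. The sketch below computes extraction precision and Cohen's kappa from two judges' story/not-story labels; the label sequences are invented for illustration, and only the formulas follow the standard definitions used in the evaluations cited above.

```python
# Illustrative computation of extraction precision and Cohen's kappa.
# The judge labels are invented; 1 = "story", 0 = "not a story".

def precision(extracted_labels):
    """Fraction of extracted segments that judges considered actual stories."""
    return sum(extracted_labels) / len(extracted_labels)

def cohens_kappa(judge_a, judge_b):
    """Agreement between two judges, corrected for chance agreement."""
    n = len(judge_a)
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    # Chance agreement from each judge's marginal label frequencies.
    p_a1 = sum(judge_a) / n
    p_b1 = sum(judge_b) / n
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (observed - expected) / (1 - expected)

judge_a = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
judge_b = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]
print(precision([1, 0, 1, 1, 0, 1, 0, 1, 0, 1]))   # 0.6 for these labels
print(round(cohens_kappa(judge_a, judge_b), 2))     # 0.58 for these labels
```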
STORIES ON THE FRINGE OF EXPECTATION

In most science and engineering pursuits, first-person nonfiction stories are disparagingly referred to as "anecdotal evidence," a term meant to discredit the story as a suitable base for generalization. The argument here is that a single random incident may not be representative of the types of incidents that one would expect to encounter; only an appropriately large random sample of the experiences of practitioners can characterize the situations that new trainees are likely to encounter. In reality, stories may be the worst possible form of evidence if one were trying to learn something about the average case. People do not tell stories about the average case. The average case is boring. People tell stories about the things they find interesting, surprising, and unexpected. When one looks at a large number of stories from some domain of expertise, they do not sample the distribution of expected situations. Instead, each lies at a point along the edge of people's normal experiences, collectively defining the fringe of expectation. Gathering and analyzing the stories of the real world experiences of practitioners informs us not about the events that take place in the world, but rather about the expectations that these practitioners have about these events—and what they find surprising.

The concept of stories on the fringe of expectation is best illustrated when considering the fascinating stories that are told by night security guards of commercial office buildings. There was the time that a fire broke out in the trash chute. There was the time that an opossum crawled into the elevator shaft from the rooftop. There was the time the CEO of the company showed up in the middle of the night wearing pajamas. If these stories were representative of the lives of night security guards at commercial office buildings, then this might be one of the most exciting jobs on the planet. Sadly, the exact opposite is true. The representative experiences of night security guards are not the things that they tell stories about, to each other or anyone else. The stories that they do tell are the exceptions to the norm, the experiences that were markedly different from what they have come to expect in their routine practice. Gathering and analyzing these stories tells us more about the expectations of these professionals than the situations that are likely to occur overnight in commercial office buildings.

The position that stories are strongly related to expectations has been advanced within the fields of social and cognitive psychology. Bruner (1991) argued that the violation of expectations, which he referred to as canonicity and breach, is a defining characteristic of narrative as used by the mind to structure its sense of reality. Schank (1982) expanded on views held by Bartlett (1932) and observed that many features of human episodic memory can be explained if we view memories as organized by mental models and schemas that define our expectations of the world. Schank argued that people remember events when they are
counter to their expectations and used these expectation violations as a basis for revising their mental models to more accurately reflect reality. Schank and Abelson (1995) later argued that natural human storytelling supported these learning processes, enabling groups of people to collectively learn from the surprising experiences of others. Although this perspective is controversial in the social sciences (Wyer, 1995), the concept of an expectation violation has proven useful in developing story based learning environments based on real world experiences.

To understand the importance of expectation violations in the development of training technology, consider the value of a good conceptual model to practitioners who must be adaptive in the execution of their skills. When they are in familiar environments and given familiar tasks, they can usually succeed by doing the same thing that has worked for them in the past. When practitioners find themselves in situations that are only abstractly related to their experiences or training, they must adapt their normal behaviors. Here, a good causal understanding of the things in their environment—the people, organizations, politics, and systems—will aid them in developing successful plans by providing accurate expectations about the effects of their actions. When things happen as expected, plans are successful and tasks are accomplished. When things do not happen as expected (an expectation violation), then the natural human tendency is to identify where one's model of the world has failed. This tendency is the impetus for the formulation of rich episodic memories, the experiences that people think about over and over again in an effort to learn a better model of the way the world really works. Stories are the natural way that people share these experiences with others; they are an effective means of using the collective experiences of others to corroborate one's own experiences and to collaboratively change the way that groups model their environment. Collectively, the stories told by practitioners help identify where the models of novices and trainees are likely to be wrong or disputable and, as such, help identify the simulated situations that make the most effective use of training time.

Developers of story based learning environments can capitalize on expectation violations to help embed pedagogically motivated decisions into their simulation. The identification of the expectation violation in a story supports the authoring of a decision situation, a fictional set of circumstances where a decision must be made and where the best choice depends on whether the expectation or the expectation violation is believed. This approach was used in the development of the ICT Leaders Project, where each of the stories collected from U.S. Army captains was analyzed to formulate the expectation violation and a fictional decision situation that hinged on the expectation. For example, one of these stories was from a captain who had commanded both combat infantry units and noncombat service support units. He remembered tasking the service support soldiers to move the trailer section of a tractor-trailer rig over the course of a day when no tractor was available. The subordinate soldiers responded with excuses about why they would not be able to do the job, sought to find someone else to do the job for them, and questioned why it needed to be done in the first place. The captain was
struck by the difference in mindset when commanding combat infantry units that, given the same task, would simply get the job done and report back when it was completed. Why were the service support soldiers not like that? Why did they not behave with the same sense of purpose and initiative that was seen with the combat units? This story informs us very little about the true difference between service support units and combat infantry units; it is merely anecdotal evidence that there might be some difference in mindset between these two groups. Instead, the utility of this story is that it identifies an expectation that is held by this captain about how subordinate soldiers should behave, one that was violated by this experience. In the ICT Leaders Project, a group of four researchers and training developers came to a consensus formulation of the expectation and expectation violation of this story, as follows:

Expectation: Both combat and noncombat units realize the importance of their roles in the accomplishment of the larger mission and will perform accordingly.

Expectation violation: A sense of pride and importance must be developed in low performing noncombat units.

The next step in the analysis process is to use this formulation of the expectation violation to create a decision situation, one where the choice of what to do would be primarily determined by whether the expectation or its violation is believed to be true. Here the aim is to engineer a hypothetical situation where a decision has to be made and where there are two options that are both viable, rational courses of action. In the ICT Leaders Project, the decision situation that was authored for this expectation violation was as follows:

Situation: A noncombat unit has been attached to the combat infantry company you command, and it is not performing well.

Choice rejected by the expectation: Wait for unit performance to improve as the soldiers realize their importance to the mission.

Choice supported by the expectation violation: Work with the soldiers in the unit to develop a sense of pride and importance.

In some cases it is possible to author a fictional decision situation that closely parallels some real decision that was made in the nonfiction story, but more often the decisions made in the real world do not have two or more well-balanced, viable, rational options from which to choose. Furthermore, authors of these situations need to guard against the presumption that one of the two options is the best choice or the right answer. Nor does the original story provide real support for one choice or another. Even if the events occurred exactly as they were described, they will rarely provide a strong justification for rejecting the expectations that are challenged. Instead, authors of these fictional decision situations should view them as ways of exploring the fringe of expectation, the fertile area that lies between the novice's mental models of the task domain and the experiences of practitioners. The right answers to these problems are not going to be determined through the analysis of a handful of stories, but rather through the varied practices of training doctrine development—a different challenge altogether.
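The product of this analysis step is small and regular enough to encode directly. The sketch below captures the tractor-trailer example in hypothetical Python data structures; the class and field names are ours, not the project's.

```python
# Hypothetical encoding of one story-analysis product: an expectation, the
# violation the story exposed, and the engineered decision situation.
from dataclasses import dataclass

@dataclass
class DecisionSituation:
    """A fictional branch point engineered from one analyzed story."""
    situation: str
    choice_rejected_by_expectation: str
    choice_supported_by_violation: str

@dataclass
class AnalyzedStory:
    """The analysis product for a single collected story."""
    expectation: str
    expectation_violation: str
    decision: DecisionSituation

TRACTOR_TRAILER = AnalyzedStory(
    expectation=("Both combat and noncombat units realize the importance of "
                 "their roles in the larger mission and will perform accordingly."),
    expectation_violation=("A sense of pride and importance must be developed "
                           "in low performing noncombat units."),
    decision=DecisionSituation(
        situation=("A noncombat unit attached to your combat infantry "
                   "company is not performing well."),
        choice_rejected_by_expectation=("Wait for performance to improve as "
                                        "the soldiers realize their importance."),
        choice_supported_by_violation=("Work with the soldiers to develop a "
                                       "sense of pride and importance."),
    ),
)
```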
THE FICTIONALIZATION OF LESSONS LEARNED

In the historical development of story based learning environments over the last three decades, the most evident changes are in the technologies used in their production. As mentioned in the first section of this chapter, early story based learning environments were produced using video laser-disc and computer hypermedia technologies in the late 1980s. This was followed by the appropriation of desktop video technologies in the early and mid-1990s, followed by Web applications in the late 1990s. Today, innovation in story based learning environments is largely connected to virtual reality and computer gaming technologies.

While the early 2000s saw enormous enthusiasm for the integration of computer gaming technologies in the development of computer based training, the pairing of this technology with design paradigms in story based learning environments was not an obvious match. The design paradigm that was commercially viable at the time was that of the outcome-driven simulation (Cleave, 1997), a story based learning environment whose branching storyline structure lent itself particularly well to the hypermedia nature of Internet Web applications. In contrast, computer gaming technology is at its best when treated as a constructive simulation environment, where the situations encountered by trainees emerge through the careful tuning of initial situations and the simulation rules that govern the effects of actions. In short, the best simulation based training looked more like an airplane flight simulator, while the best story based learning environments looked more like a choose-your-own-adventure book (for example, Packard, 1979). It was not at all obvious how the two could be successfully paired.

The ICT Leaders Project might best be viewed as an early attempt to force these two technologies together into one training application. The approach taken by the development team was to author an outcome-driven simulation using the same methods that had been used for Web based training instantiations, where each of the fictional decision situations identified through the analysis of real world stories served as a branching point in a static branching storyline. Specifically, it was constructed as a tree with 11 branch points, each with two branches (a structure sketched in code later in this section). The presentation of each decision situation and the consequences of selecting one of the two options were realized as scripted scenes, each using a consistent set of fictional characters and interrelated events. In authoring a rich fictional storyline for the ICT Leaders Project, the challenge was to instantiate each of the general descriptions of decision situations into a coherent narrative with dramatic impact (Iuppa et al., 2004).

Work on the fictional storyline for the ICT Leaders Project followed on the heels of the development of another media based training application based on the same corpus of interviews with U.S. Army captains, the Army Excellence in Leadership (AXL) project (Hill et al., 2003, 2004). In this work, the transcribed stories of leadership told in these interviews were used as source material for the development of the screenplay for a live-action training film, entitled Power Hungry. The 15-minute film depicts the fictional events occurring over the course of a day in the life of Captain Young. Captain Young is assigned to command an infantry company in Afghanistan during Operation Enduring Freedom, tasked
with providing security for a food distribution operation conducted by a nongovernment relief organization. Conditions deteriorate as Captain Young divides his time between micromanaging his subordinates and meeting with local warlords, who eventually succeed in disrupting the operation through deception about their rivalries. The AXL research project at the University of Southern California's Institute for Creative Technologies later used this film and others like it to explore the development of distance learning technologies for case-method instruction (Hill et al., 2006). Much of the training value of this film comes from the discussions of the leadership style of Captain Young.

The ICT Leaders Project based its fictional branching storyline in exactly the same scenario environment, seeking to capitalize on the richness of the fictional situation created in the Power Hungry film and to provide trainees with a means of playing the role of Captain Young in a learn-by-doing training application. To instantiate the decision situation described in the previous section of this chapter (concerning noncombat units), the writers on the ICT Leaders Project cast the decision in the context of an argument between the first sergeant and the executive officer of the company. The storyline introduces a noncombat military unit to raise the issue, a small civil affairs unit that is attached to Captain Young's company to aid in their interaction with Afghan political leaders. The unit performs poorly at an assigned task, which is to oversee and manage a band of local militia forces that are partnering with the U.S. Army to ensure security for the food distribution operation. This concerns the first sergeant of Captain Young's company, and when he and the executive officer meet with Captain Young (a role played by the trainee), he offers to give the civil affairs unit some coaching to improve its motivation. The executive officer disagrees, saying that it is likely that this motivational speech would hurt more than help and that the problems of the civil affairs unit are expected given the little time they have had to integrate with the rest of the company. The first sergeant still does not agree and turns to Captain Young (the trainee) for a decision on what to do.

The great weakness of this style of story based learning environment, that is, an outcome-driven simulation, is that users are forced to select among a very small number of options in order to ensure that the consequences of these actions both have narrative coherence and lead directly to other decision situations. The trainee who is playing the role of Captain Young in the ICT Leaders Project must decide between two options in this decision situation, regardless of whether or not he or she has a more creative solution to the problem in mind. Perhaps the civil affairs unit should not be assigned the task of managing local militia in the first place. Perhaps the first sergeant should redirect his attention to improving the motivation of the local militia instead. Perhaps the executive officer should get involved directly and leave Captain Young alone to work on the bigger problems of the day. It is possible to provide an interface to the trainees that would allow them to make these types of creative choices (for example, Gordon, 2006), but it is harder to imagine predicting the effects of these choices given the current state of simulation technology. Harder still is the problem of keeping the storyline on track so that the effects of these creative actions ultimately lead the trainee to another pedagogically motivated decision situation.
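As a concrete picture of the structure described above, the sketch below encodes a static branching storyline as a binary tree of decision situations and walks one path through it. The node fields, scene text, and traversal interface are invented for illustration; this is not the project's implementation.

```python
# Hypothetical sketch of an outcome-driven simulation's branching storyline:
# a binary tree of decision situations, each preceded by a scripted scene.
from dataclasses import dataclass
from typing import Optional

@dataclass
class BranchPoint:
    """One decision situation in a static branching storyline."""
    scene: str                      # cinematic scene shown before the decision
    prompt: str                     # question posed to the trainee
    options: tuple[str, str]        # the two admissible choices
    children: tuple[Optional["BranchPoint"], Optional["BranchPoint"]]

def leaf(scene, prompt, options):
    return BranchPoint(scene, prompt, options, (None, None))

def run_storyline(root, choose):
    """Walk the tree, asking choose(prompt, options) for 0 or 1 at each node."""
    path, node = [], root
    while node is not None:
        print(node.scene)
        decision = choose(node.prompt, node.options)
        path.append(decision)
        node = node.children[decision]
    return path  # the sequence of branches taken, useful for review

# A two-level fragment of an 11-branch-point storyline (leaves end here).
root = BranchPoint(
    "Morning briefing with the first sergeant and executive officer.",
    "Should soldiers take initiative or consult you first?",
    ("take initiative", "consult first"),
    (leaf("Scene: initiative storyline...", "Next decision...", ("a", "b")),
     leaf("Scene: consult-first storyline...", "Next decision...", ("a", "b"))),
)
print(run_storyline(root, lambda prompt, options: 0))
```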
At present, in the latter half of the first decade of the 2000s, research in story based learning environments is closely aligned with research on technologies for interactive drama. The central question within this research area is how to ensure that a well-crafted story unfolds when the user plays the active role of a creative protagonist. Much of this work attempts to ensure that particular plot elements are included in the unfolding story regardless of the user's actions (Magerko, 2007; Riedl & Stern, 2006a; Mateas & Stern, 2003). Several researchers have noted the parallels between this concern and that of the developers of story based learning environments, who seek instead to ensure that trainees are presented with particular decision situations (Riedl & Stern, 2006b; Magerko, Stensrud, & Holt, 2006). Increasingly, these efforts are incorporating artificial intelligence planning and execution models to ensure story-like paths through state spaces that are far larger than could reasonably be authored by hand. However, the richness of the possible storylines is most limited by the believability of the behavior models used to control the actions of virtual human characters, which remains an incredibly difficult artificial intelligence research challenge (Swartout et al., 2006).

SUMMARY

At the beginning of this chapter, story based learning environments were characterized as a complex form of communication, mediating between real world experiences told as stories and the experiences of learners in virtual environments. Seen from this perspective, the main challenges for developers of story based learning environments concern the management of real world story content through the development pipeline. Three key processes in this pipeline were highlighted in this chapter, each representing areas where automation and innovation should be the focus of future research and development. First, stories of real world experiences are an invaluable means of communicating tacit knowledge, but the directed interview methods used today to collect stories from practitioners have problems of scalability. Second, stories of the experiences of practitioners can be analyzed to identify the expectations that they challenge and can be transformed into decisions to be made by learners in fictional situations. However, this style of analysis and transformation capitalizes on only one aspect of nonfiction stories related to learning, tightly constraining the way that these stories are incorporated into virtual learning environments. Third, the branching storyline techniques used to develop outcome-driven simulations in the 1990s transfer well to today's virtual reality environments, but new innovations in interactive drama are needed to allow learners in these environments to tackle problems in creative ways.

REFERENCES

Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. Cambridge, England: Cambridge University Press.
Boyce, M. (1996). Organizational story and storytelling: A critical review. Journal of Organizational Change Management, 9(5), 5–26.
Bruner, J. (1991). The narrative construction of reality. Critical Inquiry, 18(1), 1–21.
Clark, R. E. (2004). Design document for a guided experiential learning course (Final Rep., Contract No. DAAD 19-99-D-0046-0004). Los Angeles: University of Southern California, Institute for Creative Technology and the Rossier School of Education.
Clark, R. E., Feldon, D., van Merrienboer, J., Yates, K., & Early, S. (2007). Cognitive task analysis. In J. Spector, M. Merrill, J. van Merrienboer, & M. Driscoll (Eds.), Handbook of research on educational communications and technology (3rd ed., pp. 1801–1856). Mahwah, NJ: Lawrence Erlbaum.
Cleave, J. (1997). A storyline-based approach to developing management roleplaying simulations. Unpublished doctoral dissertation, Northwestern University, Evanston, IL.
Cognition and Technology Group at Vanderbilt. (1990). Anchored instruction and its relationship to situated cognition. Educational Researcher, 19, 2–10.
Cognition and Technology Group at Vanderbilt. (1992). The Jasper experiment: An exploration of issues in learning and instructional design. Educational Technology Research and Development, 40(1), 65–80.
Coopman, S., & Meidlinger, K. (1998). Interpersonal stories told by a Catholic parish staff. American Communication Journal, 1(3).
Gordon, A. (2004, June). Authoring branching storylines for training applications. Paper presented at the Sixth International Conference of the Learning Sciences (ICLS-04), Santa Monica, CA.
Gordon, A. (2005). The fictionalization of lessons learned [Guest editorial for Media Impact column]. IEEE Multimedia, 12(4), 12–14.
Gordon, A. (2006, October). Fourth frame forums: Interactive comics for collaborative learning. Paper presented at the Fourteenth Annual ACM International Conference on Multimedia (MM 2006), Santa Barbara, CA.
Gordon, A., Cao, Q., & Swanson, R. (2007, October). Automated story capture from internet weblogs. Paper presented at the Fourth International Conference on Knowledge Capture (KCAP-07), Whistler, Canada.
Gordon, A., & Ganesan, K. (2005, October). Automated story capture from conversational speech. Paper presented at the Third International Conference on Knowledge Capture (KCAP-05), Banff, Canada.
Gordon, A., van Lent, M., van Velsen, M., Carpenter, M., & Jhala, A. (2004). Branching storylines in virtual reality environments for leadership development. Proceedings of the Innovative Applications of Artificial Intelligence Conference (IAAI-04; pp. 844–851). Menlo Park, CA: AAAI Press.
Hill, R., Douglas, J., Gordon, A., Pighin, F., & van Velsen, M. (2003). Guided conversations about leadership: Mentoring with movies and interactive characters. Proceedings of the Fifteenth Innovative Applications of Artificial Intelligence Conference (IAAI-03; pp. 101–108). Menlo Park, CA: AAAI Press.
Hill, R., Gordon, A., & Kim, J. (2004, December). Learning the lessons of leadership experience: Tools for interactive case method analysis. Paper presented at the 24th Army Science Conference, Orlando, FL.
Hill, R., Kim, J., Gordon, A., Traum, D., Gandhe, S., King, S., Lavis, S., Rocher, S., & Zbylut, M. (2006, November). AXL.Net: Web-enabled case method instruction for accelerating tacit knowledge acquisition in leaders. Paper presented at the 25th Army Science Conference, Orlando, FL.
Iuppa, N., & Borst, T. (2007). Stories and simulations for serious games: Tales from the trenches. Burlington, MA: Focal Press.
Iuppa, N., Weltman, G., & Gordon, A. (2004, August 10–13). Bringing Hollywood storytelling techniques to branching storylines for training applications. Paper presented at the Third International Conference for Narrative and Interactive Learning Environments, Edinburgh, Scotland.
Johnson, C., Birnbaum, L., Bareiss, R., & Hinrichs, T. (2000). War stories: Harnessing organizational memories to support task performance. Intelligence, 11(1), 16–31.
Korris, J. (2004, December). Full spectrum warrior: How the Institute for Creative Technologies built a cognitive training tool for the Xbox. Paper presented at the 24th Army Science Conference, Orlando, FL.
Magerko, B. (2007). Evaluating preemptive story direction in the interactive drama architecture. Journal of Game Development, 2(3).
Magerko, B., Stensrud, B., & Holt, L. (2006, December). Bringing the schoolhouse inside the box—A tool for engaging, individualized training. Paper presented at the 25th Army Science Conference, Orlando, FL.
Mateas, M., & Stern, A. (2003, March). Facade: An experiment in building a fully-realized interactive drama. Paper presented at the Game Developers Conference, Game Design track, San Jose, CA.
McCormack, C., & Milne, P. (2003). Stories create space for understanding organizational change. Qualitative Research Journal, 3(2), 45–59.
Owsley, S., Hammond, K., & Shamma, D. (2006, June). Computational support for compelling story telling. Paper presented at the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, Hollywood, CA.
Packard, E. (1979). The cave of time. New York: Bantam Books.
Riedl, M., & Stern, A. (2006a, December). Believable agents and intelligent story adaptation for interactive storytelling. Paper presented at the 3rd International Conference on Technologies for Interactive Digital Storytelling and Entertainment, Darmstadt, Germany.
Riedl, M., & Stern, A. (2006b, May). Believable agents and intelligent scenario direction for social and cultural leadership training. Paper presented at the 15th Conference on Behavior Representation in Modeling and Simulation, Baltimore, MD.
Schank, R. (1982). Dynamic memory: A theory of reminding and learning in computers and people. New York: Cambridge University Press.
Schank, R., & Abelson, R. (1995). Knowledge and memory: The real story. In R. Wyer (Ed.), Knowledge and memory: The real story (pp. 1–85). Mahwah, NJ: Lawrence Erlbaum.
Schank, R., Fano, A., Bell, B., & Jona, M. (1993). The design of goal-based scenarios. Journal of the Learning Sciences, 3(4), 305–345.
Snowden, D. (1999). Story telling for the capture and communication of tacit knowledge. Unpublished doctoral dissertation, Indiana University, Bloomington, IN.
Snowden, D. (2000). The art and science of story or are you sitting uncomfortably?: Part 1. Gathering and harvesting the raw material. Business Information Review, 17, 147–156.
Sternberg, R., Forsythe, G., Hedlund, J., Horvath, J., Wagner, R., Williams, W., Snook, S., & Grigorenko, E. (2000). Practical intelligence in everyday life. New York: Cambridge University Press.
Swartout, W., Gratch, J., Hill, R., Hovy, E., Marsella, S., Rickel, J., & Traum, D. (2006). Toward virtual humans. AI Magazine, 27(1), 96–108.
Technorati. (2007). State of the Blogosphere / State of the Live Web. Retrieved July 1, 2007, from http://www.sifry.com/stateoftheliveweb
Wyer, R. (Ed.). (1995). Knowledge and memory: The real story. Mahwah, NJ: Lawrence Erlbaum.
Chapter 20
INTELLIGENT TUTORING AND PEDAGOGICAL EXPERIENCE MANIPULATION IN VIRTUAL LEARNING ENVIRONMENTS

H. Chad Lane and Lewis Johnson

Modern virtual environments provide new and exciting opportunities for the learning of complex skills. Rapid progress in the commercial game industry, as well as in computer graphics, animation, and artificial intelligence research, has produced immersive environments capable of simulating experiences that can closely resemble reality. Educators and learning scientists have grasped these opportunities, motivated by the prospect of providing safe, authentic practice environments for real world skills not previously within the scope of computer-supported learning. Greater realism and more immersion seem to be in harmony with modern instructional design methodologies and theories of learning, such as situated learning (Brown, Collins, & Duguid, 1989):

We argue that approaches such as cognitive apprenticeship that embed learning in activity and make deliberate use of the social and physical context are more in line with the understanding of learning and cognition that is emerging from research. (p. 32)
A tenet of situated cognition is that knowledge should be learned in its context of use, as well as within the culture of its practice. Computer based learning environments that seek to replace traditional paper based homework assignments tend to be based on the “culture of school” rather than the more real world cultural contexts discussed in the situated learning literature and thus rarely leverage the full capabilities of a computer to simulate these contexts. Virtual learning environments (VLEs), on the other hand, hold the potential to provide learners with greater authenticity and clearer connections to real world applications of skills they are acquiring. However, there is a natural tension between the realism in VLEs and efficient, robust learning. For example, real world skills that may take months or years to
apply (such as building a home) may not require faithful representation of time in a computer simulation (such as waiting two weeks for the delivery of materials). Relying exclusively on high fidelity and immersion therefore limits a VLE's ability to actually promote learning. Numerous studies have shown that learning is suboptimal, sometimes even hindered, when pure discovery and trial and error are used as the primary means for skill acquisition (Mayer, 2004; Kirschner, Sweller, & Clark, 2006). Guidance is therefore critical to avoid these pitfalls, especially for novices. Support can come from a variety of sources, of course, such as instructors, peers, carefully designed instructional materials, or even from within the learning environment itself. Our focus here is on the latter—that is, how we might scaffold learning automatically and from within a virtual learning environment. This chapter summarizes principles that have emerged from studies of human and computer tutors, as well as how artificial intelligence (AI) and intelligent tutoring system (ITS) technologies can be applied to the problem of providing guidance in immersive and virtual learning environments.

HUMAN AND COMPUTER TUTORING

Students working one-on-one with expert human tutors often score 2.0 standard deviations—roughly two grade levels—higher than students in a conventional classroom (Bloom, 1984). In contrast, the very best intelligent tutoring systems achieve learning gains of about 1.0 standard deviation (Anderson, Corbett, Koedinger, & Pelletier, 1995; VanLehn et al., 2005). The best computer-aided instructional systems—computer tutors that do not use AI techniques—produce learning gains of about 0.4 standard deviation (Niemiec & Walberg, 1987). Unfortunately, a precise answer to the question of why tutoring is more effective than other forms of instruction has remained elusive. Most hypotheses tend to focus either on the behaviors of the tutor—that learning occurs because of expert execution of tutoring tactics—or of the student—that learning occurs when the student makes deep contributions during a tutoring session. Each of these perspectives has implications for how intelligent tutors should behave in virtual environments, so in this section, we take a brief look at both of these hypotheses and the empirical evidence supporting them.

Why Is Tutoring Effective?

A popular claim for the effectiveness of tutoring is that human tutors are able to adapt and thus individualize instruction to fit the needs of the particular student being tutored. These adaptations can be made in response to a variety of student traits, including those involving the knowledge state of the student or the affective (emotional) state. For example, some expert human tutors implement mastery loops that involve the repeated assignment of problems that test a particular skill (or set of skills) until the student has confidently demonstrated competence (Bloom, 1984). Another tactic is to select or formulate problems in ways that will appeal to and motivate the student (Lepper, Woolverton, Mumme, & Gurtner, 1993).
Assigning an easier problem when a student's confidence is low is an example of a tutoring tactic in this category.

Human tutors also implement different tactics based on student traits. For example, the policy of immediate feedback is a well-documented tactic applied by both human and computer tutors that increases learning efficiency (Merrill, Reiser, Ranney, & Trafton, 1992; Anderson et al., 1995), but may hinder students' self-assessment and self-correction skills (Schooler & Anderson, 1990). Immediate feedback is considered individualized in the sense that students' own specific sets of correct and incorrect actions determine what kind of feedback they receive—it is rare that two students will receive exactly the same tutorial interventions. Like problem selection, the content and timing of tutoring feedback can be based on the knowledge state of the student or on affective traits. Lepper et al. (1993) document a variety of lower level tutoring tactics intended to manage affect, such as maximizing success (through praise) and minimizing failure (via commiseration).

Some have argued that the best tutors balance the need for active participation of the student with the provision of guidance (Merrill et al., 1992). This means the student does as much of the work as possible, while the tutor provides just enough feedback to minimize frustration and confusion. Also, effective tutoring has been found to have less to do with didactic explanations by the tutor and more to do with the interaction between the tutor and the student. Chi, Siler, Jeong, Yamauchi, and Hausmann (2001) conclude that "students' substantive construction from interaction is important for learning, suggesting that an ITS ought to implement ways to elicit students' constructive responses" (p. 518). It is a common pattern in ITS research to first identify effective learning events and patterns in human tutoring, then attempt to emulate them in an ITS.
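Several of the tactics above reduce to simple control policies. As one illustration, a mastery loop can be sketched as a loop that keeps assigning problems until a running success rate crosses a threshold; the window size and threshold below are invented, and real tutors estimate mastery with far more sophisticated student models.

```python
# Sketch of a mastery loop: assign problems testing a skill until the
# success rate over the most recent attempts reaches a threshold. The
# window, threshold, and simulated student are invented for illustration.
import random
from collections import deque

def mastery_loop(skill, solve_attempt, window=5, threshold=0.8, max_problems=30):
    """Return the number of problems assigned before mastery was reached."""
    recent = deque(maxlen=window)
    for i in range(max_problems):
        correct = solve_attempt(skill)      # True/False for this attempt
        recent.append(correct)
        if len(recent) == window and sum(recent) / window >= threshold:
            return i + 1
    return max_problems                     # give up after a fixed budget

# Simulated student who answers correctly 70 percent of the time.
random.seed(1)
print(mastery_loop("two-step equations", lambda s: random.random() < 0.7))
```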
Intelligent Tutoring Systems

Given that research on intelligent tutoring is often inspired by empirical studies of human tutors, it is not surprising that computer tutors share many similarities with human tutors (Merrill et al., 1992). For example, when a student reaches an impasse, human and intelligent computer tutors both use similar approaches to help the student overcome the impasse: both monitor student reasoning and intervene to keep the student on a productive path. A major limitation for early generation tutoring systems was that they interacted with the learner primarily through graphical user interface gestures, such as menu selections, dragging and dropping, and so on. For example, in the Andes physics tutoring system (VanLehn et al., 2005), students draw force vectors on diagrams and enter equations into text fields. Andes provides immediate flag feedback by coloring correct actions green and incorrect actions red. Solicited help is available that allows the student to ask why an action is wrong or for advice on taking the next step. Andes implements model tracing, an algorithm originally appearing in the Cognitive Tutors from Carnegie Mellon University (Anderson et al., 1995). Model tracing tracks a learner step by step through a problem solving space, comparing the observed actions to those indicated by an expert model of the targeted skill and delivering feedback according to some pedagogical model or policy.
Immediate feedback with solicited follow-up help is one such policy. Human tutors have an advantage over computer tutors in that a much larger space of tutorial interventions is possible. For example, some important differences that distinguish human tutors arise from subtle cues from facial expressions, body language, conversational cues, or the simple use of dialogue (Fox, 1993). Given the 1 sigma "gap" between the effectiveness of expert human tutors and the best computer tutors, it is no surprise that a great deal of research in the last decade has gone into endowing computer tutors with more of the "features" of human tutors in the hope of narrowing the effect size difference. The use of interactive dialogue represents a major research focus over the last decade. Many such systems attempt to leverage the expressivity of natural language input and dialogue to remediate flawed conceptual knowledge (Graesser, VanLehn, Rosé, Jordan, & Harter, 2001), while others have used dialogue to encourage metacognitive and reflective thinking on problem solving (Core et al., 2006; Peters, Bratt, Clark, Pon-Barry, & Schultz, 2004; Katz, Allbritton, & Connelly, 2003). Just as dialogue opens up new avenues for tutorial intervention, so does research into pedagogical agents and virtual human instructors.
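To make the model tracing loop concrete, the following is a bare-bones sketch under the assumption that the expert model can be represented as a hand-coded graph of licensed next steps. Production systems such as the Cognitive Tutors use far richer rule based models; the step names and flag policy here are invented.

```python
# Bare-bones model tracer: an expert model licenses a set of next steps,
# and observed actions are flagged green (correct) or red (incorrect), in
# the spirit of Andes-style flag feedback. The solution graph is hand-coded
# and illustrative only.
EXPERT_NEXT_STEPS = {
    "start":                   {"draw_free_body_diagram"},
    "draw_free_body_diagram":  {"draw_weight_vector", "draw_normal_vector"},
    "draw_weight_vector":      {"draw_normal_vector"},
    "draw_normal_vector":      {"write_newton_second_law"},
    "write_newton_second_law": set(),            # problem solved
}

def trace(actions):
    """Compare each observed action to the expert model's licensed steps."""
    state = "start"
    for action in actions:
        if action in EXPERT_NEXT_STEPS.get(state, set()):
            print(f"GREEN: {action}")            # immediate positive flag
            state = action                       # advance through the space
        else:
            print(f"RED:   {action} (expected one of "
                  f"{sorted(EXPERT_NEXT_STEPS[state])})")
    return state

trace(["draw_free_body_diagram", "write_newton_second_law",
       "draw_weight_vector", "draw_normal_vector", "write_newton_second_law"])
```

A pedagogical policy then decides what to do with each flag: show it immediately, withhold it until the student asks, or queue it for later discussion.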
CONSIDERATIONS FOR INTELLIGENT TUTORING IN VIRTUAL ENVIRONMENTS

Rickel and Johnson (1997), who were among the first to propose the use of intelligent tutoring in virtual reality environments, point out that much stays the same: students will still reach impasses, demonstrate misconceptions, and will benefit from the guidance and help of a tutor. They highlight new methods of interaction afforded by VLEs:

• The tutor can inhabit the environment with the student, thus providing increased potential for "physical" collaboration.

• Similarly, an embodied tutor can communicate nonverbally, through gestures and facial expressions, for example.

• A virtual reality environment allows students to be tracked in new ways, such as by their visual attention and physical movements.
Thus, the scope of tutorial interactions is greatly increased in VLEs, in both directions: in performing tutorial interventions and in the bandwidth available for monitoring the learner. Researchers have explored the ways in which virtual environments differ from more traditional computer based learning environments that tend to be developed as substitutes for written homework. How well do traditional ITS approaches, such as those discussed in the previous section, map into tutoring in VLEs? What opportunities do VLEs make available that might enhance the effectiveness of an intelligent tutor? Here, we consider both directions: (1) how the advances from intelligent tutoring in traditional environments might be used to promote learning in VLEs and (2) whether more
advanced immersive technologies might contribute to closing the 1 sigma gap between human and intelligent tutoring. We limit our consideration to those VLEs specifically constructed for the learning of cognitive skills that also include an underlying simulation of some real world phenomena. We also restrict ourselves to those environments that seek a reasonably high level of fidelity and realism. Thus, included in the discussion are virtual worlds that permit exploration from a first-person perspective, simulations of complex equipment (that include an interface modeled directly on actual equipment), and simulations of natural phenomena, such as social, biological, or meteorological phenomena.

Expanding the Problem Space: Time and Movement

Many VLEs can also be classified as open learning environments. These are characterized by a greater amount of learner control and are generally considered to be more appropriate for learning in ill-structured domains (Jonassen, 1997). Because of the large problem space in many VLEs, solving the plan recognition problem (monitoring, understanding, assessing, and so forth) is often a significant challenge for ITSs. Here, we highlight two key challenges: tutoring in real time contexts and in environments that provide expanded freedom of student movement in a virtual space.

Tutoring in Real Time Environments

For problem solving tasks that are not time constrained (for example, solving algebra equations), computer based learning environments typically wait for the learner to act. This stands in contrast to many domains targeted by VLEs that require real time thinking, decision making, and acting. Ritter and Feurzeig (1988, p. 286) were among the earliest to wrestle with the problems of tutoring in a real time domain and highlight the following three major differences:

• The knowledge acquisition problem is more complicated since experts tend to "compile" their knowledge for efficient execution.

• Diagnosing errors is more complicated because time is typically not available to ask the student questions during practice.

• Assessing performance and conveying feedback is best done after task completion to avoid the risk of interrupting the learner (see Lampton, Martin, Meliza, and Goldberg, Volume 2, Section 2, Chapter 14).
The knowledge acquisition problem is not magnified only by constraints related to real time processing but also by the nature of ill-structured domains in general (Lynch, Ashley, Aleven, & Pinkwart, 2006), which are common domain targets of VLEs. Diagnosis of errors and assessment of performance are similarly not unique to real time domains, but are nonetheless more complicated because of time constraints during practice. Time-constrained problem solving often goes hand-in-hand with dynamic learning environments—that is, as time moves forward while the student deliberates, the state of the world may change in favorable
or unfavorable ways. Here, we review several approaches to dealing with these challenges in terms of how ITSs have been implemented to support learning.

Ritter and Feurzeig (1988) describe TRIO (Trainer for Radar Intercept Officers), an ITS built to train F-14 interceptor pilots and radar operators to support the real time decision-making tasks involved with air defense and collaboration. The system presents the learner with radar displays and flight instruments that provide both needed information and the ability to take actions in the simulation. TRIO provides guidance in three ways (a sketch of this intervention policy follows the list):

• Before practice: demonstrations of expert performance,

• During practice: coaching support while the learner practices, and

• After practice: post-practice debriefing (after action review).
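TRIO's during-practice policy—interrupt only for mission critical errors and defer everything else to the debriefing—amounts to a filter over a stream of detected errors. The minimal sketch below illustrates that policy under invented error categories, severity labels, and messages; it is not TRIO's implementation.

```python
# Sketch of a real time coaching policy in the spirit of TRIO: interrupt
# practice only for mission critical errors; queue everything else for the
# post-practice debriefing. Error types and messages are invented.
CRITICAL = {"about_to_violate_safety_envelope", "mission_failure_imminent"}

class RealTimeCoach:
    def __init__(self):
        self.debrief_queue = []

    def on_error(self, error_type, detail):
        if error_type in CRITICAL:
            # Worth competing for the learner's working memory right now.
            print(f"[COACH, during practice] Check {detail} now!")
        else:
            # Defer to the after action review to avoid distraction.
            self.debrief_queue.append((error_type, detail))

    def after_action_review(self):
        for error_type, detail in self.debrief_queue:
            print(f"[AAR] {error_type}: {detail}")

coach = RealTimeCoach()
coach.on_error("suboptimal_intercept_geometry", "late turn to collision course")
coach.on_error("mission_failure_imminent", "closing velocity")
coach.after_action_review()
```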
All three forms of guidance are driven by a rule based cognitive model of domain expertise (called the "TRIO articulate expert") that is capable of performing the intercept tasks the learner is acquiring. TRIO intervenes with a learner only if mission critical mistakes are being made (or are about to be made) and leaves most feedback for the post-practice reflective period. This is a typical policy for ITSs operating in real time domains given the risks of competing for the working memory of a learner. The articulate expert focuses on finding the appropriate intermediate goals throughout execution of the task and uses these to help the student learn what went wrong and what should be done. The model is flexible enough to represent multiple solutions to a given problem.

Roberts, Pioch, and Ferguson (1998) adopted a similar approach in the development of TRANSoM (Training for Remote Sensing and Manipulation), an ITS for the training of pilots of underwater remotely operated vehicles (ROVs). Just as in TRIO, demonstrations, guided practice, and reflection play key roles. Because of the real time nature of the task, TRANSoM also attempts to avoid distracting the learner while preventing session-killing errors from occurring. A key aspect of ROV operation is the maintenance of a mental model of the vehicle itself. This is a challenge given the limited inputs regarding the ROV's status (which is true in reality). To increase the chances of being nonintrusive, TRANSoM applies two techniques. First, all coaching support is delivered verbally so the visual modality is not in competition with the learner. Second, although unsolicited help is delivered in a manner similar to TRIO (when there is deviation from an expert solution path), students are also given the chance to ask for guidance when they feel they need it (that is, solicited help). Among other lessons learned, Roberts et al. (1998) suggest that the use of discourse cues, short utterances, and directive visual cues delivered along with verbal feedback would increase the chances of verbal feedback being effective in a VLE.

Tutoring in Open-Movement Environments

To promote feelings of learner control and freedom, many VLEs, especially those that are game based, tend to allow free movement within a virtual world. This is consistent with the motivation for building open learning environments.
It is typical in this category of VLEs to give the learner control of an avatar or vehicle to maneuver around in a virtual world. Usually done from a first-person perspective, this allows the learner to make such choices as what to explore, when, and for how long. The problem for an ITS in these environments is twofold. First, if the skill being practiced is directly related to the movements of the learner's avatar, it must be determined at what level of action the ITS should react. For example, does a turn in one direction represent an intention to move in that direction? Second, the extent to which physical/motor skills transfer to the real world from virtual environments is an open question. Thus, most ITSs that permit free movement do so in order to maximize the learner's feeling of freedom and independence and less because it contributes to the acquisition of some underlying cognitive or physical skill.

Very few ITSs precisely track how learners maneuver in a virtual environment. Most systems observe only gross physical movements (from area to area) and interact when issues arise related to the events of the game in those physical areas. One exception is the Collaborative Warrior Tutoring system (Livak, Heffernan, & Moyer, 2004), an ITS that tracks physical movements in a three-dimensional (3-D), first-person shooter environment for the learning of tactical skills and military operations on urban terrain. Through the use of a cognitive model of room and building clearing skills that inspects the dynamically changing environment represented in the 3-D world, the ITS is able to assess the learner's movements (including buggy knowledge) and give hints and feedback on the fly. These interventions come as text overlaid on the view of the virtual world alongside communications between characters. The model of expert performance is also used to drive the behaviors of computer-controlled characters in the environment.

Most other ITSs that permit free movement in a virtual world do not track movements at this fine-grained level. For example, in the Tactical Language and Culture Training System (TLCTS) mission environment (Johnson, Vilhjalmsson, & Marsella, 2005), the learner is given game objectives and is free to move around an Iraqi village to achieve them. This requires visiting a variety of locations in the village (for example, the café) and interacting with locals in culturally appropriate ways through Arabic speech and gestures. This is similar to the approach taken in the narrative based learning environment Crystal Island (Mott & Lester, 2006). In this system, the learner plays the role of a scientist on an island where several of the inhabitants have become ill from an infectious disease. The learner must move around the island interviewing people, collecting evidence, and running tests. As in TLCTS, actual movements in the environment are important to the extent that they represent decisions—for example, if the learner walks toward a research station with a sample, it is reasonable to conclude he or she intends to test it for contamination.
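Tracking only gross movement between areas reduces this kind of intention recognition to a lookup from location and context to a plausible goal. The toy sketch below is in the spirit of the Crystal Island example; the area names, items, and rules are all invented.

```python
# Toy intention recognizer: gross movement between named areas, plus simple
# inventory context, is mapped to a plausible learner goal. Illustrative
# only; area names, items, and rules are invented.
INTENT_RULES = [
    # (area entered, required item, inferred intention)
    ("research_station", "food_sample", "test sample for contamination"),
    ("infirmary",        None,          "interview the sick patients"),
    ("dining_hall",      None,          "collect evidence about the food"),
]

def infer_intention(area, inventory):
    for rule_area, required_item, intention in INTENT_RULES:
        if area == rule_area and (required_item is None
                                  or required_item in inventory):
            return intention
    return None  # this movement carries no pedagogically relevant signal

print(infer_intention("research_station", {"food_sample", "notebook"}))
```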
Expanding the Space of Intelligent Tutoring Interactions

As discussed, VLEs that are open tend to provide a much larger problem solving space than more traditional computer based learning environments. Not only does this provide more freedom for the learner, but also for the ITS to perform a wider array of pedagogically motivated interactions. In this section we discuss two of these opportunities: the use of pedagogical agents and the dynamic manipulation of the learning environment in ways that promote learning, sometimes called pedagogical experience manipulation.

Pedagogical Agents

Artificial intelligence research into the development of intelligent, communicative agents and virtual humans has led to interdisciplinary research on natural language processing, emotional modeling, gesture modeling, cultural modeling, and more (Cassell, Sullivan, Prevost, & Churchill, 2000; Swartout et al., 2006). Since people tend to treat human-like computer characters as they would humans (Reeves & Nass, 1996), there is potential for learners to "bond" more with intelligent tutors that express themselves through a human-like avatar. Previously in this chapter we discussed the 1 sigma gap between the best ITSs and expert human tutors and how dialogue based tutoring systems represent one attempt to bridge this gap. By endowing ITSs with features similar to those used by human tutors, the hypothesis is that this gap can be narrowed. For example, facial expressions might be used to express concern or approval, among other emotions, all of which are potentially useful as indirect feedback.

Pedagogical agents tend to serve in one of two roles. The first is the role of a coach or tutor with the goal of supporting learning through explicit guidance and feedback. The second is when the pedagogical agent assumes a role in an underlying narrative or story playing out in the virtual environment. A wide range of pedagogical agents have been developed that play the role of tutor or coach (Clarebout, Elen, Johnson, & Shaw, 2002; Person & Graesser, 2002). Most provide hints and feedback to a learner during some problem solving task, provide explanations, communicate verbally and nonverbally, and seek to provide "just-in-time" support. Soar Training Expert for Virtual Environments, one of the earliest pedagogical agents, possessed all of the traditional capabilities of ITSs (delivered feedback and explanations, gave hints, and so forth), but also had the ability to lead the learner around the virtual environment, demonstrate tasks, guide attention (through gaze and pointing), and play the role of teammate (Rickel et al., 2002; Rickel & Johnson, 1997). Using animation, sound, and dialogue techniques, pedagogical agents can also attempt to manage the learner's affective state through encouragement and motivational techniques. For example, in the Multiple Intelligent Mentors Instructing Collaboratively system, an emotional instructional agent has been implemented that will express confusion, disapproval, excitement, encouragement, pleasure, and more (for example, Baylor & Kim, 2005).

In narrative based learning environments, pedagogical agents have the opportunity to be "part of the story" by assuming some role in the underlying narrative being played out in the environment. For example, in the Mission Rehearsal Exercise (MRE) system (Swartout et al., 2006), the learner, playing the role of a young lieutenant, is placed in a situation in which one of his platoon's Humvees
has been in an accident with a civilian car. The sergeant in the scenario has the knowledge of how to resolve the crisis and will give guidance should the learner need it, such as pointing out the negative aspects of a particular order (for example, "Sir, our troops should not be split up."). A similar solution is used in TLCTS in endowing an accompanying sergeant with coaching ability, but making only solicited help available (Johnson et al., 2005). In recent versions of TLCTS, tutoring by the accompanying aide has been curtailed, as it was found that some learners got the false impression that only a limited number of choices were available, namely, those that the aide recommends. Instead, tutoring support is provided through the characters in the game, by their reactions to the learner, and at times by the leading questions that they ask of the learner. This approach is inspired by the tactics that good human role-players employ in role-playing exercises at training centers, such as the U.S. Army's National Training Center. Crystal Island also provides all of its tutoring support through the characters in the game (Mott & Lester, 2006), as well as affective support through empathetic characters (McQuiggan, Rowe, & Lester, 2008).

Empirical research on pedagogical agents is mixed in terms of how well they close the 1 sigma gap between computer and human tutors (Clarebout et al., 2002). Moreno, Mayer, and Lester (2000) found that the simple presence of an animated agent did not impact learning, but that speech (over text) led to improved retention and transfer in learning. The same study also showed that interactive dialogue was superior to more didactic utterances by the agent, which is consistent with studies of dialogue based ITSs that do not use pedagogical agents (Graesser et al., 2001). In research aimed at understanding how pedagogical agents can go beyond possessing only domain knowledge, Baylor and Kim (2005) found evidence that agents playing both a motivator and expert role simultaneously (which they refer to as a "mentor") outperformed agents in each of these roles alone in the ill-defined domain of instructional planning. Wang et al. (2007) found that a key determiner of the effectiveness of a pedagogical agent is the extent to which the agent employs socially appropriate tactics that address learner "face," consistent with the politeness theory of Brown and Levinson (1987). Learners who interacted with a pedagogical agent that employed politeness tactics achieved greater learning gains than learners who interacted with an agent that did not employ such tactics, and the effect was greatest among learners who expressed a preference for tutorial feedback delivered in a polite, indirect way. Wang has since replicated these results with TLCTS, using politeness strategies delivered via text messages. These studies suggest that (a) the manner in which the agent interacts with the learner determines its impact on learning, (b) the effect varies with the individual characteristics of the learner, and (c) socially appropriate tactics can affect learning even without an animated persona. Studies involving pedagogical agents generally show that learners prefer having a pedagogical agent to not having one, but more evidence needs to be collected to determine their actual value in promoting learning beyond what disembodied ITSs are able to do.
Pedagogical Experience Manipulation and Stealth Tutoring

A VLE's underlying simulation provides more subtle opportunities to promote learning beyond explicit guidance. In most VLEs, many forms of implicit feedback already exist that mirror feedback one can observe in real environments. For example, if a basketball is shot, implicit feedback comes from the visual evidence that the ball flies through the hoop or bounces off the rim. In a virtual environment, different events and behaviors may be more appropriate for learning at different times, and it may be pedagogically beneficial to override the simulation so that it establishes ideal conditions for learning or produces implicit feedback that meets an individual learner's needs. In the basketball example, it may be better for the simulation to have the ball go in the hoop if the goal is to give the learner practice in playing in a tight game (assuming the basket would make the score closer). In this section, we briefly describe two such approaches: experience manipulation and stealth tutoring.

There are at least two strategies available for intelligent manipulation of a learner's experience in a VLE that can promote learning. The first is the amplification or dampening of implicit feedback. In simulations with virtual humans, for example, it is possible to tweak their behaviors to achieve certain pedagogical objectives: if a learner commits a cultural error, such as mentioning a taboo subject, it may be productive to have the character overreact to that error to support the learner's recognition of the mistake. By amplifying the implicit feedback in this way, the ITS would be supporting the learner's metacognitive skill of recognizing that an error was made, which is a critical early step in the acquisition of intercultural skills (Lane, 2007). Similarly, if a learner has repeatedly demonstrated knowledge of a given cultural rule, it may make sense to minimize the time spent on that already mastered material. This could be played out by virtual humans with shorter utterances and dampened visual reactions when applicable.

A second category of experience manipulation lies in the dynamic modification of the state of the simulation in ways that establish appropriate conditions for learning. Although modification of implicit feedback can be used in this way, there are other means. For example, in the Interactive Storytelling Architecture for Training (ISAT) system, the learner is guided through plot points that are selected based on an evolving learner model (Magerko, Stensrud, & Holt, 2006). The version of ISAT that runs in the domain of combat medic skills will manipulate the environment in ways that address the needs expressed by the learner model. For example, if a learner has difficulty identifying the proper order in which to treat multiple injured soldiers, ISAT is capable of adapting the soldiers' injuries and behaviors so that they test the learner's specific weaknesses. In the combat medic domain, ISAT may adjust the damage an explosion inflicts on victims of an attack or tweak their behaviors resulting from sustained injuries (for example, rolling around on the ground or yelling). These examples of experience manipulation are intended to establish conditions for learning and allow the learner the chance to practice the right skills at the best times within a VLE.
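To make the amplification and dampening of implicit feedback concrete, consider the following minimal Python sketch, which scales a virtual human's reaction to a learner error using a learner model's mastery estimate. The 0-to-1 mastery scale, the thresholds, and the scaling factors are illustrative assumptions of ours, not the mechanics of ISAT or any other cited system.

# Illustrative sketch of pedagogical experience manipulation: implicit
# feedback (a virtual human's reaction to a cultural error) is amplified
# for novices and dampened once the rule is mastered. The mastery scale,
# thresholds, and scaling factors are hypothetical.

def reaction_intensity(base_intensity, mastery):
    """Scale a character's authored reaction (0.0-1.0) by learner mastery.

    mastery: estimated probability (0.0-1.0) that the learner knows the
    violated cultural rule, taken from a learner model.
    """
    if mastery < 0.3:        # novice: overreact so the error is noticed
        return min(1.0, base_intensity * 1.5)
    if mastery > 0.8:        # mastered: shorten and mute the reaction
        return base_intensity * 0.5
    return base_intensity    # otherwise, play the simulation as authored

# A character's authored reaction of 0.6 becomes 0.9 for a novice:
print(reaction_intensity(0.6, mastery=0.2))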
Stealth tutoring, a specific kind of experience manipulation, focuses on methods of conveying tutor-like explicit guidance from within the VLE. Given that explicit help comes with the risk of learner dependence, there may be times when covert support is preferable, so that the learner is not aware help is being given. Crystal Island, with its underlying narrative and tutorial planning system U-Director, demonstrates stealth tutoring in a particularly elegant way (Mott & Lester, 2006). If the system detects that a learner is wandering around the island and failing to make progress, the underlying planning model will direct the nurse character to share her opinion that some of the food on the island might be making people sick. This "hint" comes only after the detection of floundering and is delivered in an entirely plausible way (via a character who is concerned about the infectious disease). Narrative based learning environments make this kind of support possible. Of course, an accompanying risk of covert support is that, if the learner detects it, self-efficacy and confidence may subsequently suffer. A similar method is used by the virtual human sergeant in the MRE when his initiative is set to "high": he will more openly share his opinion regarding what needs to be done at any given time (Rickel et al., 2002; Swartout et al., 2006). Although these approaches both rely on virtual characters (and thus fit under the space of pedagogical agent interactions), other opportunities exist to give hints and guidance indirectly through the environment. Care must be taken, however, as with any pedagogical support approach, that the learner does not become dependent on this assistance.

CONCLUSIONS

In this chapter we described many of the issues facing designers of intelligent tutoring systems for virtual learning environments. Specific challenges arise from the nature of the domains that VLEs make accessible, such as tutoring for real time skills and the problem of understanding student actions in open learning environments. Expertise is generally harder to capture and encode in such domains than in less dynamic domains that involve forms of symbol manipulation. Research into automatic approaches to acquiring domain knowledge in VLEs would support the longer-term integration of ITSs. We also described the role of pedagogical agents and how they can be used to promote learning in VLEs. Although current empirical evidence for the use of pedagogical agents remains mixed, they have been found to have many appealing properties for learners and to be beneficial in ways other than promoting learning (for example, motivation). Pedagogical agents can also participate in an underlying narrative, and thus provide more opportunities for tutorial intervention. We described pedagogical experience manipulation in terms of how it can be used to adjust implicit feedback to promote a learner's recognition of success or failure and how it can be used to dynamically establish ideal conditions for learning. These new capabilities and tactics may support "closing the gap" between expert human tutors and computer tutors, but significantly more empirical research is needed to find out.
Virtual learning environments with intelligent tutoring capabilities are beginning to be adopted on a widespread basis. For example, TLCTS learning environments are being used by tens of thousands of military service members (Johnson, 2007), and additional learning environments are being developed for nonmilitary use. Because these learning environments are instrumented and log all learner actions, they are an excellent source of data for assessing the effectiveness of tutoring techniques in VLEs.

Several key questions remain unanswered in the literature regarding the use of ITSs in modern VLEs. For example, how distracting is explicit feedback? How do different modalities compare with respect to distraction? As for pedagogical experience manipulation, what is the proper balance between narrative control and explicit tutorial control? What other kinds of guidance are possible through stealth techniques, such as difficulty management and task selection? When are explicit measures required, and how do they compare with guidance delivered via stealth approaches? What are the risks of stealth guidance and experience manipulation for learners with respect to confidence, self-efficacy, and help-seeking skills? Modern VLEs make realistic practice in a computer based environment possible, and answers to these kinds of questions will have a great impact on how effective VLEs may become. There is no end in sight to the immersive potential of virtual environments, but it is important to remember, as Rickel and Johnson (1997) pointed out, that learners will continue to exhibit misconceptions and hit impasses. In order to maximize the teaching power of modern VLEs, it will be important to continue to consider these empirical questions, understand the accompanying risks, and create technological advances that adhere to the principles of effective learning.

REFERENCES

Anderson, J. R., Corbett, A. T., Koedinger, K. R., & Pelletier, R. (1995). Cognitive tutors: Lessons learned. Journal of the Learning Sciences, 4(2), 167–207.
Baylor, A. L., & Kim, Y. (2005). Simulating instructional roles through pedagogical agents. International Journal of Artificial Intelligence in Education, 15, 95–115.
Bloom, B. S. (1984). The 2 sigma problem: The search for methods of group instruction as effective as one-to-one tutoring. Educational Researcher, 13(6), 4–16.
Brown, J. S., Collins, A., & Duguid, P. (1989). Situated cognition and the culture of learning. Educational Researcher, 18(1), 32–42.
Brown, P., & Levinson, S. C. (1987). Politeness: Some universals in language usage. New York: Cambridge University Press.
Cassell, J., Sullivan, J., Prevost, S., & Churchill, E. (Eds.). (2000). Embodied conversational agents. Cambridge, MA: MIT Press.
Chi, M. T. H., Siler, S. A., Jeong, H., Yamauchi, T., & Hausmann, R. G. (2001). Learning from human tutoring. Cognitive Science, 25(4), 471–533.
Clarebout, G., Elen, J., Johnson, W. L., & Shaw, E. (2002). Animated pedagogical agents: An opportunity to be grasped? Journal of Educational Multimedia and Hypermedia, 11(3), 267–286.
Core, M. G., Traum, D., Lane, H. C., Swartout, W., Marsella, S., Gratch, J., & van Lent, M. (2006). Teaching negotiation skills through practice and reflection with virtual humans. In C. M. Overstreet & A. Martens (Eds.), SIMULATION: Transactions of the Society for Modeling and Simulation International, 82(11), 685–701.
Fox, B. A. (1993). The human tutorial dialogue project. Hillsdale, NJ: Lawrence Erlbaum.
Graesser, A. C., VanLehn, K., Rosé, C. P., Jordan, P. W., & Harter, D. (2001). Intelligent tutoring systems with conversational dialogue. AI Magazine, 22(4), 39–51.
Johnson, W. L. (2007). Serious use of a serious game for language learning. In R. Luckin et al. (Eds.), Artificial intelligence in education (pp. 67–74). Amsterdam: IOS Press.
Johnson, W. L., Vilhjalmsson, H., & Marsella, S. (2005). Serious games for language learning: How much game, how much AI? In C. K. Looi et al. (Eds.), Artificial intelligence in education (pp. 306–313). Amsterdam: IOS Press.
Jonassen, D. H. (1997). Instructional design models for well-structured and ill-structured problem solving learning. Educational Technology Research and Development, 45(1), 65–94.
Katz, S., Allbritton, D., & Connelly, J. (2003). Going beyond the problem given: How human tutors use post-solution discussions to support transfer. International Journal of Artificial Intelligence in Education, 13, 79–116.
Kirschner, P. A., Sweller, J., & Clark, R. E. (2006). Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching. Educational Psychologist, 41(2), 75–86.
Lane, H. C. (2007, July). Metacognition and the development of intercultural competence. Paper presented at the Workshop on Metacognition and Self-Regulated Learning at the 13th International Artificial Intelligence in Education Conference, Marina del Rey, CA.
Lepper, M., Woolverton, M., Mumme, D. L., & Gurtner, J. L. (1993). Motivational techniques of expert human tutors: Lessons for the design of computer-based tutors. In S. P. Lajoie & S. J. Derry (Eds.), Computers as cognitive tools (pp. 75–105). Hillsdale, NJ: Lawrence Erlbaum.
Livak, T., Heffernan, N. T., & Moyer, D. (2004, May). Using cognitive models for computer generated forces and human tutoring. Paper presented at the 13th Annual Conference on Behavior Representation in Modeling and Simulation, Simulation Interoperability Standards Organization, Arlington, VA.
Lynch, C. F., Ashley, K., Aleven, V., & Pinkwart, N. (2006, June). Defining "ill-defined" domains: A literature survey. Paper presented at the Workshop on Intelligent Tutoring Systems for Ill-Defined Domains at the 8th International Conference on Intelligent Tutoring Systems, Jhongli, Taiwan.
Magerko, B., Stensrud, B., & Holt, L. S. (2006, December). Bringing the schoolhouse inside the box—A tool for engaging, individualized training. Paper presented at the 25th Army Science Conference, Orlando, FL.
Mayer, R. (2004). Should there be a three-strikes rule against pure discovery learning? The case for guided methods of instruction. American Psychologist, 59(1), 14–19.
McQuiggan, S., Rowe, J., & Lester, J. (2008). The effects of empathetic virtual characters on presence in narrative-centered learning environments. In Proceedings of the 2008 SIGCHI Conference on Human Factors in Computing Systems (pp. 1511–1520).
Merrill, D. C., Reiser, B. J., Ranney, M., & Trafton, J. G. (1992). Effective tutoring techniques: A comparison of human tutors and intelligent tutoring systems. Journal of the Learning Sciences, 2(3), 277–305.
Moreno, R., Mayer, R. E., & Lester, J. C. (2000). Life-like pedagogical agents in constructivist multimedia environments: Cognitive consequences of their interaction. In J. Bourdeau & R. Heller (Eds.), Proceedings of the World Conference on Educational Multimedia, Hypermedia, and Telecommunications—ED-MEDIA 2000 (pp. 741–746). Charlottesville, VA: Association for the Advancement of Computing in Education.
Mott, B. W., & Lester, J. C. (2006). Narrative-centered tutorial planning for inquiry-based learning environments. In Proceedings of the 8th International Conference on Intelligent Tutoring Systems (pp. 675–684). Berlin: Springer.
Niemiec, R., & Walberg, H. J. (1987). Comparative effects of computer-assisted instruction: A synthesis of reviews. Journal of Educational Computing Research, 3, 19–37.
Person, N., & Graesser, A. C. (2002). Pedagogical agents and tutors. In J. W. Guthrie (Ed.), Encyclopedia of education (pp. 1169–1172). New York: Macmillan.
Peters, S., Bratt, E. O., Clark, B., Pon-Barry, H., & Schultz, K. (2004). Intelligent systems for training damage control assistants. In Proceedings of the Interservice/Industry Training, Simulation, and Education Conference (pages not available). Arlington, VA: National Training Systems Association.
Reeves, B., & Nass, C. (1996). The media equation. New York: Cambridge University Press.
Rickel, J., & Johnson, W. L. (1997). Intelligent tutoring in virtual reality: A preliminary report. In Proceedings of the Eighth World Conference on Artificial Intelligence in Education (pp. 294–301). Amsterdam: IOS Press.
Rickel, J., Marsella, S., Gratch, J., Hill, R., Traum, D., & Swartout, W. (2002, July/August). Toward a new generation of virtual humans for interactive experiences. IEEE Intelligent Systems, 32–38.
Ritter, F., & Feurzeig, W. (1988). Teaching real-time tactical thinking. In J. Psotka, L. D. Massey, & S. A. Mutter (Eds.), Intelligent tutoring systems: Lessons learned (pp. 285–302). Hillsdale, NJ: Lawrence Erlbaum.
Roberts, B., Pioch, N. J., & Ferguson, W. (1998). Verbal coaching during a real-time task. In Proceedings of the Fourth International Conference on Intelligent Tutoring Systems (pp. 344–353). Berlin: Springer.
Schooler, L. J., & Anderson, J. R. (1990). The disruptive potential of immediate feedback. In Proceedings of the Twelfth Annual Conference of the Cognitive Science Society (pp. 702–708). Cambridge, MA.
Swartout, W., Gratch, J., Hill, R., Hovy, E., Marsella, S., & Rickel, J. (2006). Toward virtual humans. AI Magazine, 27(2), 96–108.
VanLehn, K., Lynch, C., Schulze, K., Shapiro, J. A., Taylor, L., Treacy, D., et al. (2005). The Andes physics tutoring system: Five years of evaluations. In G. McCalla & C. K. Looi (Eds.), Artificial intelligence in education (pp. 678–685). Amsterdam: IOS Press.
Wang, N., Johnson, W. L., Mayer, R. E., Rizzo, P., Shaw, E., & Collins, H. (2007). The politeness effect: Pedagogical agents and learning outcomes. International Journal of Human-Computer Studies, 66(2), 98–112.
Chapter 21
ENHANCING VIRTUAL ENVIRONMENTS TO SUPPORT TRAINING

Mike Singer and Amanda Howey

The "enhancement" of virtual environments (VE) for training first requires a clear distinction between task fidelity and learning requirements, since it addresses alterations made to improve the efficiency and/or effectiveness of learning during discrete training episodes. This chapter reviews some of the research on the instructional enhancement of simulations for training, encompassing instructional features, dynamic graphics, and supportive instructional systems (unfortunately, size limitations preclude a comprehensive literature review). The goal is to provide an organizing overview of the breadth and depth of the issues that have been investigated, to note some indications of current applications, and to point toward profitable future research. In short, this chapter addresses what is currently known about the functions, features, and tools that work to enhance training during simulation based exercises.

In order to address important findings from simulation and virtual environments, a taxonomic framework is introduced that provides a relational structure for the information and arguments presented. The basic framework is derived from a systems approach that encompasses training and leads to our reasoning about the constraints upon the enhancement of VE based training. The systems approach has a long history in training and many citations (for example, Hays & Singer, 1989). Briefly, a system has interrelated parts such that a change in one part causes a change in one or more of the other parts. Obviously, this leads to subsystems and suprasystems that also have specific relationships. A training system has learners and information as inputs, costs and resources as limitations, and subsystem processes (which have characteristics and relationships, as discussed below) that address the transition to skilled performance, with proficient performers as output.

GOALS, STRATEGIES, TACTICS, AND FEATURES

We argue that for most directed learning approaches, there are generally four main components: goals, strategies, tactics, and features (Singer, Kring, &
Hamilton, 2006). These components refer to related subsystems in the training system: instructional features are applied through the use of tactics, which are employed as part of a strategy for reaching an instructional goal, guided by measurements that indicate the learners' past and current states and by models that project the effective training. These different system components relate to one another in structured and supportive relationships (see Figure 21.1).

Figure 21.1. The Four Main Components of Directed Learning Approaches

By explicitly defining the concepts and describing their interrelationships, we can better understand previous empirical work in similar domains, apply that information to the current technique of interest (VE based training), and generalize prior research to new training domains.

DEFINITIONS AND RELATIONSHIPS

Instructional Goals

Directed learning is the purposeful transfer of information, knowledge, skills, abilities, and/or attitudes from one source (for example, an instructor, computer software, a simulation, or another system) to an individual or group (Hays, 2001). Given this definition, the purpose of a directed learning program can be termed the instructional goal. This purpose has also been referred to as the instructional
objective, outcome, or task. When relating the instructional goal to the supporting concepts, it is often better to consider the smallest coherent unit possible, usually labeled a "task." Task is a difficult concept to define in an all-inclusive fashion, but it is generally taken to refer to a unitary set or sequence of behaviors that enables a complete and meaningful job or mission function (for example, Miller & Swain, 1987). (For a more in-depth exposition, see Fleishman & Quaintance, 1984, or applicable chapters in the Handbook of Human Factors and Ergonomics, Salvendy, 1997.)
Training Strategies

Directed learning programs must have one or more (usually multiple) explicit approaches, or instructional strategies—a "plan, method, or series of activities aimed at obtaining a specific goal" (Jonassen & Grabowski, 1993, p. 20). As noted above, for each task to be learned, at least one strategy must be selected. Obviously, when reviewing a set of tasks as the instructional goal, it is usually more efficient to address all or as many tasks as possible with the same strategy. Some examples of strategies would include auditory presentation of a performance sequence, visual presentation of an information set and resultant solution, or supporting the dynamic interaction of trainee and equipment using simulation (or interacting with the real equipment).
Training Tactics

We can then support the strategies with "specific actions which are well-rehearsed and are used to enable the strategy" (Jonassen & Grabowski, 1993, p. 20). We refer to these as training or instructional tactics (Singer et al., 2006). Tactics are often based on or limited by available enhancements. For example, if one strategy is to explicitly drive cognitive elaborations during rule learning, one enabling tactic might be to employ auditory prompts (questions pertaining to rules used in task performance and leading to the desired cognitive elaborations) at appropriate points during a training session. If an instructor is present, has been trained in this tactic, and can use his or her voice to provide the prompt, this probably will work with a certain efficacy. In a VE used for training, especially one that is geographically distributed, one must have some artificial method (an instructional feature) for providing the same auditory prompt. These instructional tactics, then, are maneuvers or manipulations that are employed within a strategy and can be used to change a learner's knowledge state, enabling the learner to reach the instructional goal. The application of these instructional tactics has to be controlled in some fashion, and the control itself might be considered an instructional feature. The control might be constant, selectable by the trainer, automated with trainer-selectable conditions, under live control of an instructor, or under the control of an artificial intelligence program.
Training Enhancements or Instructional Features

In using tactics to support strategies and ultimately reach instructional goals, we argue that tactics require tools for support and implementation. These tools are the enhancements that have been traditionally identified as instructional features (Sticha, Singer, Blacksten, Morrison, & Cross, 1990; Ricard, Crosby, & Lambert, 1982). Instructional features thus refer to a wide variety of tools and/or techniques that instructors can use to support and execute the instructional tactics and strategies. In this chapter, the term "instructional features" primarily refers to alterations of the simulation that change the operational fidelity of the simulation in some way for instructional effect (highlighting, blinking objects, intentional pauses, and so on).
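The relationships among the four components can be read as a simple containment hierarchy, which the following Python sketch records for the auditory-prompt example given under Training Tactics. The data layout and field names are our own illustrative choices and are not a schema drawn from Singer, Kring, and Hamilton (2006).

# Illustrative encoding of the four directed-learning components and their
# supportive relationships. Field names and the example chain are hypothetical.

from dataclasses import dataclass, field
from typing import List

@dataclass
class InstructionalFeature:      # tool that implements a tactic
    name: str
    control: str                 # e.g., "constant", "trainer", "automated", "trainee"

@dataclass
class Tactic:                    # maneuver employed within a strategy
    action: str
    features: List[InstructionalFeature] = field(default_factory=list)

@dataclass
class Strategy:                  # plan aimed at the instructional goal
    plan: str
    tactics: List[Tactic] = field(default_factory=list)

@dataclass
class InstructionalGoal:         # smallest coherent unit: a task
    task: str
    strategies: List[Strategy] = field(default_factory=list)

goal = InstructionalGoal(
    task="apply engagement rules during a patrol",
    strategies=[Strategy(
        plan="drive cognitive elaboration during rule learning",
        tactics=[Tactic(
            action="pose rule questions at decision points",
            features=[InstructionalFeature("auditory prompt", control="automated")],
        )],
    )],
)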
FIDELITY

A short diversion has to be inserted at this point in order to clearly separate the "environment" from the "enhanced" pieces. Task requirements for performance provide the basic definition for the "what" and "how" that need to be represented in a training simulation, the environment for task performance. This has been the focus of learning psychologists (for example, Gagne, 1954) and human factors practitioners (for example, Smode, 1971; Goldstein, 1987; Swezey & Llaneras, 1997) for many years. For example, if one is looking for a way to train soldiers to correctly identify and deactivate improvised explosive devices (IEDs), a task analysis will indicate the appropriate stimuli for identifying potential IEDs, as well as the environmental stimuli and functionality for deactivating the IED. The simulation must include several possible IEDs and associated situations in order to enable a variety of practice opportunities. The simulation must also allow the soldier to approach and deal with appropriate objects according to established protocol.

Fidelity is the essence of the similarities, or the closeness of the simulation to its real world counterpart. Hays and Singer (1989) define two types of fidelity, physical and functional, although there are other ways to discuss fidelity in simulations. Physical fidelity addresses the "what" that is developed for a simulation, for example, the image of a trash pile and an IED. Functional fidelity defines the "how" in a simulation, including the state of the weapon (for example, loaded), capability for firing, damage caused to the target (for example, physics modeling), and the sound of firing. Together, the task specific physical and functional fidelity provide the initiating, guiding, and feedback stimuli that are absolutely necessary for task performance in the learning situation or simulation. When we are interested in learning and transfer of training, high fidelity may not always be necessary or sufficient. The trainer might have to trade the sophisticated engineering and software tools that create an environment identical to the real world task for a simpler view that includes learning aids for the users (instructional features), as pointed out long ago by Smode (1971). This model of task fidelity, one of compromise and balance, is
typically used to frame the instructional approaches that can be applied to reach instructional goals, as the most important deviations from fidelity are those that enhance the learning or retention of skilled task performance.

INSTRUCTIONAL FEATURES AS ENHANCEMENTS

Conceptually, there are two ways to enhance a simulator in order to facilitate learning and transfer of training, both of which are based in deviations from fidelity. As Boldovici (1992) pointed out, the psychology of learning has always been focused on the arrangement and characteristics of stimuli in order to promote learning. In many normal task situations the relationships between the initiating, guiding, and feedback stimuli or cues can be probabilistic and difficult to learn. Even in the real environment, using a subset of the task cues or stimuli correctly may not guarantee correct performance. This can make key stimuli difficult to establish or learn as fundamental in the task performance situation. Therefore, the meaningfulness or salience of key stimuli must be manipulated during training in some fashion, until the learner provides evidence through performance that the stimulus-response link has been satisfactorily established. Following from this logic, and from the logic of using deviations from fidelity to enhance learning, Boldovici proposed that adaptive training must alter stimuli in order to support the learner in recognizing and internalizing the salience of the initiating, guiding, and terminal or feedback stimuli of the task while learning actually occurs. In other words, the learner must understand the presence and use of the added or altered stimuli in order for them to constitute a helpful learning strategy. Pure practice in the actual performance environment is the prototypical situation that simulation attempts to achieve. Anything that leads to deviations has changed the stimuli present in the simulated performance environment, through altering normal stimuli or adding stimuli not normally present in the performance environment. Boldovici (1992) addressed the former as augmenting or attenuating stimuli and the latter as supplementing (and fading) stimuli. These are most clearly defined as augmenting or adjuncting cues (from Boldovici, 1992).

Augmenting Cues

The augmentation approach centers on changing the characteristics of stimuli that are normally present in the task environment and may be used in learning to perform or improving performance on the task. One way to improve the detection and use of normal task stimuli is to enhance the potential salience of those stimuli in some fashion. One method of enhancement might be increasing some physical characteristic of the cue or stimulus—making it brighter, increasing auditory output, changing its dimensions, and so forth. Another method that can make the normal critical task stimuli more salient is to decrease the salience of surrounding or interfering cues. An example would be decreasing the brightness, reflectivity, saturation, or hue of surrounding objects to make the important stimulus "stand out" or be more perceptible to the trainee. Obviously, control
over this class of cues is driven by the selection of critical cues from the task analysis and by insight into which cues are used to greatest effect during task performance. The trainer must then determine which of the cues in the critical set to address, and when to alter them. The approach cannot stop at that point, but must close the cycle of adaptive training by determining when and how to attenuate the alterations in the stimuli.

Adjuncting Cues

An alternative method for guiding or reinforcing learning performance is to add discriminative stimuli or cues to the simulation, referred to as adjunct cues. This is most clearly an instructional intervention, and with this approach there is an additional cognitive factor: the trainee must understand the purpose and the use of the additional stimuli. In terms of visual effects, an adjunct cue can be used to direct attention to the critical stimuli by pointing at or marking the target stimuli in some way (for example, using an arrow to point at a location or placing a distinctive circle around a location). Coaching during an instructional session is also an adjunct cue, in that visual (text or symbolic) or auditory information about the situation can be inserted in the stimulus stream in an attempt to aid the learning process. In terms of auditory inputs, sounds can be used to orient the trainee to stimuli, or verbal coaching can be provided to guide the trainee. Other sensory domains can be and have been used to add cueing, such as haptics (for example, Hopp, Smith, Clegg, & Heggestad, 2005), temperature, olfactory cues (for example, Washburn & Jones, 2004), or multisensory inputs (for example, Jerome, 2006; Jerome, Witmer, & Mouloua, 2005; Albery, 2005). It should be clear that adjunct cues used in a training situation or simulation are no different from those used to aid performance of a task, as humans learn during every activity.

IMPLEMENTATION METHODS

By implementation we mean neither the technology for changing an environmental stimulus that is used in task performance within the simulation (augmentation) nor the technology for inserting extra stimuli (adjuncting) into the simulation. It should be apparent from the long history of simulation leading to virtual environments that the technology is continually changing and improving (the age of some of our references makes this plain). By implementation methods, we mean that consideration has to be given to the instructional control and use of any changes made to stimuli in the simulation. The control and use of the different enhancements are the instructional tactics. Control refers to whether the enhancement can be controlled, by whom it can be controlled, and the parameters of that control. The simulation may have been implemented with enhancements that cannot be changed, for example, information stimuli that appear whenever an operator is close enough. The feature may be restricted to the control of a trainer or may be implemented only
during a review. The feature may be under the control of the trainee. The scope of implementation methods is immense, yet it is intimately connected to instructional feature effectiveness.

Instructional Feature Control

Instructor Control Definitions and Example

Instructor control would allow the instructor to pace the course as necessary to ensure proper training. For example, the system may have automated evaluations determining when a trainee has fired a weapon at an impermissible target (for example, a nonthreatening civilian) and could notify the trainer, flag and timestamp the action for after action review (AAR) use, or immediately intervene in several ways. Depending on how the trainee is performing, the instructor would be able to decide whether to turn the features on or off. One example of an instructor-controlled feature is coaching. The instructor, when appropriate, can give the trainees instructions and guide them through their tasks. In research looking at these kinds of interventions, interrogative coaching has been shown to be superior to no-coaching interventions within short trials (for example, Singer et al., 2006).

System Control

System control would range from programmed interventions through artificial intelligence–assisted interventions. An example is the mounted tactual cockpit display created by Gilson and Ventola (as cited in Lintern & Roscoe, 1980). The system provides adjunct information only when a trainee is off course and provides no information when performance is correct. The Engagement Skills Trainer 2000 Simulator employs system-controlled cues when it determines a trainee has used incorrect fire. In that situation, the system stops the simulation and asks the trainee to "defend your action." Each time a shot is fired at an incorrect time, the system will stop the scenario and allow the trainee to explain to the trainer why the shot was fired. The trainer must then decide whether the trainee was successful in verbally defending his or her action before continuing with the scenario. At the most basic level, the system always requires a trainer intervention when an error is committed.

Trainee Control

There are times when the trainee has control over the pace of the training. One example is computer based training courses: the trainee decides when enough material has been covered on a page and he or she is ready to move on to the next. This can be dangerous, as humans are typically overconfident in their reading comprehension of material (Matlin, 2005). Other examples of trainee control are the ability to explore the simulator at the trainee's own discretion, looking for areas of interest (as opposed to being restricted to one place or path), and the ability to pause the simulator or training. One example of a simulator with
these features is being used by the U.S. Air Force to train network defensive operations using an intelligent tutoring system that allows trainees to explore the many areas and dimensions of the program and to pause the training when they need clarification or a break (Goan, 2006). This raises the question: should the trainee be able to obtain aid when desired, or to turn off training aids (for example, "Mr. Clippy" from Microsoft Word) when they become intrusive? There is some research showing that learners are not the best judges of their own learning and may not choose optimally when provided aid (Maki, Jonas, & Kallod, 1994).

Salience, Trust, and Reliability of the Instructional Features

In addition to type and control, there are three more important aspects of the cues we must consider: salience, trust, and reliability. When looking at the presence (or salience) of what we would call adjunct cues, Yeh and Wickens (2000) found detection-aiding cues to be effective, particularly with low salience targets. So when the trainee must learn to find hard-to-detect action prompts (such as a target), adding an adjunct cue (in this case, a colored reticle) to that area on the target was found to be helpful. In addition, Yeh and Wickens were interested in trust and reliability of the cues. In one condition of their experiment they presented (adjunct) cues that were 100 percent accurate, and in another condition the (adjunct) cues were only 75 percent accurate. The benefits came with some costs, which were considered to be consistent with previous findings (Merlo & Wickens, 1999; Yeh, Wickens, & Seagull, 1999; Mosier, Skitka, Heers, & Burdick, 1998; Ockerman & Pritchett, 1998): when the cues were reliable and consistent, the trainees began to look only for those cues to act and failed to act when events should have elicited a response (during testing). Once the cues were removed from the simulated environment, the trainees might not know when to act given only the real world promptings.

Crutch Effects

When changing stimuli in a representative simulation, the trainer runs the risk of the learner incorporating the wrong stimulus mix into the learned responses. This can lead to a failure to generalize or transfer to the work environment, or even to unintentional discriminative learning that fosters nonperformance on the task. These are sometimes referred to as crutch effects. For example, Wheaton, Rose, Fingerman, Korotkin, and Holding (1976) suggested that it is potentially harmful to future performance to present the cues on every trial; rather, it is better to force the trainee to complete some tasks without the cueing. An implicit characteristic of instructional tools is the time course of their application or use. The issues are when to apply and when to withdraw the enhancement. One option is to start with the adjunct or augmenting cues and then gradually remove them. Yeh and Wickens (2000) experimented with presenting cues sometimes when appropriate and at other times when the cue should not have been present. When the cues became unreliable, the participants stopped using them, which had three primary effects: (1) reduced benefits of cueing, as
participants did not rely on the adjunct cues, as shown in decreased performance on the task; (2) increased false alarms, as participants blindly followed the adjunct cues as though they were always correct; and (3) a decreased attentional cost of cueing, which helped with the problem of participants using cues as crutches. On the one hand, this approach is not effective for directly teaching actions in the environment through the use of enhanced features; on the other hand, it does require the user to assess the accuracy of the cueing within the environment, although such assessment was not always observed. St. John, Smallman, and Manes (2005) also noticed this and discussed the issue of confirmation bias, or times when trainees blindly follow the cueing, regardless of the actual need for action as dictated by the environment or situation. Lintern and Roscoe (1980, p. 232) suggest that "supplementary cues that provide more precise information can simplify the whole task considerably and allow the trainee to converge quickly on the correct control responses. Appropriate control behavior might be learned more rapidly under these conditions, and gradual withdrawal of the supplementary cues should then force the trainee to become increasingly dependent on the cues that are normally available without disrupting the newly learned control skills."
In this way, the user generalizes to the natural cues in the environment and not to the superficial cues that were added. Whether this could also support an increase in transfer is a research issue that has not, to our knowledge, been investigated. However, it seems reasonable that having "transferred" within the training session could improve transfer to using actual equipment in the real world.
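One way to operationalize gradual withdrawal is to make cue presentation probabilistic and tied to recent performance. The Python sketch below illustrates such a fading schedule; the linear fade and the five-trial window are assumptions of ours, not a procedure taken from Lintern and Roscoe (1980) or Wheaton et al. (1976).

# Illustrative fading schedule for an adjunct cue: the probability of
# presenting the cue declines as recent performance improves, forcing the
# trainee onto the task's natural stimuli and limiting crutch effects.
# The linear fade and window size are hypothetical choices.

import random

def cue_probability(recent_successes, window=5):
    """Probability of showing the adjunct cue on the next trial, given the
    number of successful trials among the last `window` trials."""
    return max(0.0, 1.0 - recent_successes / window)

def present_cue(recent_successes):
    return random.random() < cue_probability(recent_successes)

# Early training (0 of the last 5 correct): cue always shown.
# After 4 of the last 5 correct: cue shown on about 1 trial in 5.
print(cue_probability(0), cue_probability(4))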
INSTRUCTIONAL FEATURES EXAMPLE APPLICATIONS

This section briefly reviews some recent enhancements for simulation based training, encompassing material that can be labeled instructional features and supportive instructional systems. In our conceptual model, these are all tools that enable instructional tactics and support specific strategies. Graphics fit easily and obviously into the area of adjunct cues within simulations, typically presenting extra symbols or portraying motion using the simulated objects. Supportive instructional systems, encompassing (simulation-relevant) intelligent tutors, computer based instruction capabilities, and automated measurement systems, also fall easily within the concept of tools that support or enable instructional tactics as defined above. The focus is on establishing what is known about whether, when, and how to apply these widely varying tools to enhance training. Many current virtual environment simulations use instructional tactics and features without addressing these factors as such or examining their individual instructional benefits. Some new approaches, such as the application of augmented cognition technologies to intelligent tutoring systems (see Nicholson, Lackey, Arnold, & Scott, 2005), directly address the need for instructional research. The augmentation of intelligent tutoring systems (as discussed by Nicholson et al., 2005) proposes to use advanced methods of system control in the form of better student, expert,
and tutor models combined with advanced measurement to change the difficulty of tasks (through augmenting the information required), to provide within-task coaching, or to provide after-task reviews (perhaps inserting adjunct cues). Presumably, the intelligent tutoring systems would employ instructional strategies, tactics, and features in supporting optimal learning for each trainee. Other developing systems are incorporating tactics and instructional features in the belief that their application will enable effective learning and provide incentive for continued use (Ackerman, 2005). The U.S. Army's "Every Soldier a Sensor Simulation" uses within-experience feedback under automated control (visible information operation scores, as well as verbalizations during reports) and similar feedback during the automated review of the training experience. Since the simulation is built upon a game engine, increasing levels of difficulty provide advanced training and enthusiasm for continued play. Ambush! (Diller, Roberts, & Willmuth, 2005) is a game based training system that includes voice communication among participants and observer/controller stations (with enhanced AAR support capabilities). This system is being used in conjunction with hands-on training for convoy operations, originally focusing on ambushes. The system is used to train both squads and platoons, and several groups can be run at once with different tasks and missions. The instructional tactics used in the simulation require halting the simulation for specialized or focused training (for example, medical aspects, leaders conducting interviews, and car searches) and then resuming the simulated mission after the focused training episode (which occurs outside the game, using physical mock-ups and interactions). The only noted instructional feature used is the relatively standard "freeze, save, and restart" of the ongoing situation, although the functionality is not labeled as such. Another example of game based simulations is the Virtual Environment Cultural Training for Operational Readiness (VECTOR; Deaton, Sanatarelli, Barba, & McCollum, 2006). The major focus of the VECTOR effort was to investigate improvements in the intelligence and cultural validity of the nonplayer characters (NPCs) so that trainees could learn higher level adaptive, interactional skills. The immersive simulation allowed trainees to move around an urban area, interacting with a limited set of individuals from whom information could be acquired. The information acquisition required working within cultural constraints, gaining trust through interactions, and learning the social conventions of the culture in order to gain sufficient information to fulfill a mission. The most prominent instructional tactic is to deliver cultural information and guidance through the (automated) interpreter role while accompanying the trainee through the area. This is a coaching instructional feature, under the control of an automated system, providing corrective feedback for every wrong action. As the system was developed to investigate the technological aspects of intelligent NPCs, the effectiveness of the instructional tactic and supporting feature has not been evaluated. The system is being elaborated with increased authoring and trainer control over scenarios by the U.S. Army Research Institute for the Behavioral and Social Sciences (ARI) and will probably be evaluated during fielding in the near future.
Haptics can also be used in training to provide necessary task fidelity for the simulation, or haptic stimulation can be used as an instructional feature. One recent example of research in this area investigated two different aspects of haptics: the provision of normal cues and the provision of metaphoric signals for normal task stimuli (Hafich, Fowlkes, & Lenihan, 2007). The haptics were applied through vibrating "tactors" attached to a vest, a leg, and arm bands. The "normal" stimulus was provided as an approximation of the normal cues from the environment, so all tactors on the front of the vest would vibrate to simulate an explosion from the front. A "metaphoric" haptic cue was more artificial, requiring decoding by the wearer; for example, an explosion might be conveyed by a pattern of vibrations of the tactor closest to the explosion. As might be expected, the naturalistic haptic cues were more correctly recognized in the tasks, but no performance changes were found over the repeated trials (Hafich et al., 2007). The instructional tactic consisted of consistent application of the haptics throughout the learning trials. The metaphoric or symbolic haptics did not change performance or identification in comparison with the no-adjunct-information condition, and the normal haptic stimuli were significantly better than either the metaphoric or the no-information condition. Hafich et al. point out that learning even the natural cues required several trials. From this we can infer that even low fidelity naturalistic cues require learning and that simple symbolic haptic information requires about the same learning before any effect might be found. Nevertheless, it seems that haptic cues can be used in complex task performance, although more research is needed on the effective application of those systems. BiLat (Hill et al., 2006, p. 1) is "a game-based simulation that provides Soldiers a practice environment for conducting bilateral meetings and negotiations in a cultural context." In developing the game based simulation, the tasks were analyzed for learning objectives, and interrelated story based scenarios were developed. The simulation contains several different intelligent systems that support the interactions with the user and direct avatar behaviors during the meetings. One of the intelligent systems provides coaching during the meetings, as configured by the instructor, providing a mix of situationally specific hints, corrections, or confirmations. This approach falls very directly into adjunct provision (of hints or directions) and can also be tailored for reduced use based on the trainee's performance, thus potentially limiting or eliminating the crutch effect. The coaching system is also used to implement a "reflective tutor" that provides feedback, conceptual questions, and "what if" questions during the training session or AAR. Finally, while not documented in their paper, a demonstration revealed that the game environment replicates a tactical operations center where preparation for the meetings is conducted by the trainee (accounting for most of the training). In this simulated center, all usable information presentation systems are highlighted with light and higher definition graphics so that the trainee easily understands the sources to be used in preparing for the meetings.
The BiLat game therefore provides an example of using instructional features to support the instructional tactic of guiding or scaffolding the user's interactions in order to support learning objectives. Unfortunately, the simulation has not yet been evaluated
for effectiveness (Hill et al., 2006), although it is being used in training at Fort Leavenworth, Kansas. The authors are involved in an ongoing program conducted by the U.S. Army Research Development and Engineering Command referred to as Asymmetric Warfare Virtual Training Technology (Singer, Long, Stahl, & Kusumoto, 2008; Mayo, Singer, & Kusumoto, 2005). The program goal is to use distributed game technology to provide a generalized simulation for dismounted soldier training and rehearsal, enabling large numbers of soldiers and automated forces to interact. The system was first developed to provide adequate fidelity for reasonable dismounted soldier operations requiring interaction and decision making, and it has record and replay capabilities for AAR. The simulated radio channels can be used for coaching by a trainer (although the trainee must also link to that radio channel). Several new instructional features are being developed for the system, under contract to ARI, in order to conduct research into both instructional tactics and features. Foremost among these is the implementation of a laser pointer: a colored vector or pointer controlled by a trainer that can be used for indicating objects and directions in the course of a training session. The laser pointer can also be used during AARs to call out important information. Several recording and replay enhancements are also being added: trainer control over trainee location, so that distributed trainees can be brought to the appropriate virtual location and time for a replay interval; trainer control over distributed voice channels, enabling replay of the session sounds without interruption, lecture by the trainer, discussion by the trainees, and so forth; and the easy addition of timestamp controls (called bookmarks) by the trainer, enhancing the trainer's marking of key training session events. All of these features will be used in a program of research investigating learning gains from use and the effects of different employment tactics for the features.
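As a concrete illustration of one such feature, the following Python sketch records trainer bookmarks against session time so that an AAR replay can jump to key events. The class and method names are hypothetical; this is a minimal sketch of the idea, not code from the system described above.

# Illustrative sketch of a trainer bookmark feature for after action review:
# timestamped marks recorded during a session and used to seek the replay.
# Class and method names are hypothetical, not from the system described.

class SessionRecorder:
    def __init__(self):
        self.bookmarks = []          # (seconds_into_session, label)

    def add_bookmark(self, t, label):
        """Trainer marks a key event at session time t (seconds)."""
        self.bookmarks.append((t, label))

    def replay_points(self):
        """Return bookmarks in time order for the AAR replay menu."""
        return sorted(self.bookmarks)

recorder = SessionRecorder()
recorder.add_bookmark(742.0, "fired on nonthreatening civilian")
recorder.add_bookmark(1030.5, "squad split at intersection")
for t, label in recorder.replay_points():
    print(f"{t:7.1f}s  {label}")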
CONCLUSION

This conceptual structure is an attempt to organize a somewhat unfocused area in order to apply what is known from past research, identify what is currently being used and researched, improve current applications, and provide a better structure for future research. We believe that by identifying and evaluating how instructional tactics are used to support instructional strategies, the instructional tactics can be more easily applied to different task domains. Identifying and evaluating how instructional features support those instructional tactics will also aid in their application to the increasingly ubiquitous game based simulations. One problem that such a structure may help address is the proliferation of instructional features without regard to how they affect instructional tactics or strategies. Just because one can add text information to a training simulation does not mean that the extra information actually improves the efficiency or effectiveness of that training. The problem is similar to adding features to automobiles; it seems like a good idea to add phones or in-vehicle computer systems (for example, global
positioning system devices), but research shows that secondary tasks (dealing with the equipment) while driving decrease awareness and increase response times to emergencies (Ranney, Harbluk, & Noy, 2005). We think that given our conceptual structure, this kind of research could be generalized to game based training.

REFERENCES

Ackerman, R. K. (2005, April). Army teaches soldiers new intelligence-gathering role [Electronic version]. SIGNAL Magazine. Retrieved July 10, 2007, from http://www.afcea.org/signal/articles/anmviewer.asp?a=731
Albery, W. B. (2005). Multisensory cueing for enhancing orientation information during flight. Proceedings of the 1st International Conference on Augmented Cognition (CD-ROM). Las Vegas, NV: Augmented Cognition International.
Boldovici, J. A. (1992). Toward a theory of adaptive training (Tech. Rep. No. 959). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences. (ADA 254903)
Deaton, J., Sanatarelli, T. P., Barba, C. A., & McCollum, C. (2006). Virtual environment cultural training for operational readiness (Tech. Rep. No. 1175). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences. (ADA 315125)
Diller, D. E., Roberts, B., & Willmuth, T. (2005). DARWARS Ambush!—A case study in the adoption and evolution of a game-based convoy trainer by the U.S. Army. Proceedings of the 2005 Fall Simulation Interoperability Workshop, Orlando, FL, September 2005.
Fleishman, E. A., & Quaintance, M. K. (1984). Taxonomies of human performance: The description of human tasks. Orlando, FL: Academic Press, Inc.
Gagne, R. M. (1954). Training devices and simulators: Some research issues. American Psychologist, 9(7), 95–107.
Goan, T. (2006). A simulation-based, intelligent tutoring system for enhancing decision effectiveness in computer network defensive operations (AFRL Tech. Rep. No. 20). Mesa, AZ: Air Force Research Laboratory. (ADA 325997)
Goldstein, I. L. (1987). The relationship of training goals and training systems. In G. Salvendy (Ed.), Handbook of human factors (pp. 963–975). New York: John Wiley & Sons.
Hafich, A., Fowlkes, J., & Lenihan, P. (2007). Use of haptic devices to provide contextual cues in a virtual environment for training. Proceedings of the 28th Interservice/Industry Training Systems and Education Conference (CD-ROM). Arlington, VA: National Training Systems Association.
Hays, R. T. (2001). Theoretical foundation for advanced distributed learning research (Rep. No. TR-2001-006). Orlando, FL: Naval Air Warfare Center Training Systems Division.
Hays, R. T., & Singer, M. J. (1989). Simulation fidelity in training system design: Bridging the gap between reality and training. New York: Springer-Verlag.
Hill, R. W., Belanich, J., Lane, H. C., Core, M., Dixon, M., Forbell, E., Kim, J., & Hart, J. (2006). Pedagogically structured game-based training: Development of the ELECT BiLAT simulation [Electronic version]. Proceedings of the 25th Army Science Conference. Retrieved April 5, 2007, from http://people.ict.usc.edu/~core/papers/2006-09-ASC06-ELECT-BiLAT-FINAL.pdf
Hopp, P. J., Smith, C. A. P., Clegg, B. A., & Heggestad, E. D. (2005). Interruption management: The use of attention-directing tactile cues. Human Factors, 47(1), 1–11.
Jerome, C. J. (2006). Orienting of visual-spatial attention with augmented reality: Effects of spatial and non-spatial multi-modal cues (Doctoral dissertation, University of Central Florida, 2006). Dissertation Abstracts International, 67(11), 6759. (UMI No. 3242442)
Jerome, C. J., Witmer, B. G., & Mouloua, M. (2005). Spatial orienting of attention using augmented reality. Proceedings of the 1st International Conference on Augmented Cognition (CD-ROM). Las Vegas, NV: Augmented Cognition International.
Jonassen, D. H., & Grabowski, B. L. (1993). Handbook of individual differences, learning, & instruction. Hillsdale, NJ: Lawrence Erlbaum.
Lintern, G., & Roscoe, S. N. (1980). Visual cue augmentation in contact flight simulation. In S. N. Roscoe (Ed.), Aviation psychology (pp. 227–238). Ames, IA: Iowa State Press.
Maki, R. H., Jonas, D., & Kallod, M. (1994). The relationship between comprehension and metacomprehension ability. Psychonomic Bulletin & Review, 1, 126–129.
Matlin, M. W. (2005). Memory strategies and metacognition. In Cognition (6th ed., pp. 171–206). New York: John Wiley & Sons.
Mayo, M., Singer, M. J., & Kusumoto, L. (2005, December). Massively multi-player (MMP) environments for asymmetric warfare. Journal of Defense Modeling and Simulation. Arlington, VA: National Training Systems Association.
Merlo, J. L., & Wickens, C. D. (1999). Effect of reliability on cue effectiveness and display signaling (Tech. Rep. No. 19990518 078, ADA363440). Urbana-Champaign, IL: U.S. Army Laboratory.
Miller, D. P., & Swain, A. D. (1987). Human error and human reliability. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (pp. 219–250). New York: John Wiley & Sons, Inc.
Mosier, K., Skitka, L., Heers, S., & Burdick, M. (1998). Automation bias: Decision making and performance in high technology cockpits. International Journal of Aviation Psychology, 8, 47–63.
Nicholson, D., Lackey, S., Arnold, R., & Scott, K. (2005). Augmented cognition technologies applied to training: A roadmap for the future. Proceedings of the 1st International Conference on Augmented Cognition (CD-ROM). Las Vegas, NV: Augmented Cognition International.
Ockerman, J. J., & Pritchett, A. R. (1998). Preliminary investigation of wearable computers for task guidance in aircraft inspection. In G. Boy, C. Graeber, & J. M. Robert (Eds.), HCI-Aero '98: International Conference on Human-Computer Interaction in Aeronautics.
Ranney, T. A., Harbluk, J. L., & Noy, Y. I. (2005). Effects of voice technology on test track driving performance: Implications for driver distraction. Human Factors, 47, 439–454.
Ricard, G. L., Crosby, T. N., & Lambert, E. Y. (1982). Workshop on Instructional Features and Instructor/Operator Design for Training Systems (NAVTRAEQUIPCEN IH-341, ADA121770). Orlando, FL: Naval Training Equipment Center.
Salvendy, G. (Ed.). (1997). Handbook of human factors and ergonomics. New York: John Wiley & Sons.
Singer, M. J., Kring, J. P., & Hamilton, R. M. (2006, July). Instructional features for training in virtual environments (Tech. Rep. No. 1184, ADA 455301). Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Singer, M. J., Long, R., Stahl, J., & Kusumoto, L. (2008, March). Formative evaluation of a Massively Multi-Player Persistent Environment for Asymmetric Warfare Exercises (Tech. Rep. No. 1227). Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Smode, A. F. (1971). Human factors inputs to the training device design process (Rep. No. TR NAVTRAEQUIPCEN 69-C-0298-1). Orlando, FL: Naval Training Equipment Center.
St. John, M., Smallman, H. S., & Manes, D. I. (2005). Assisted focus: Heuristic automation for guiding users’ attention toward critical information. Proceedings of the 1st International Conference on Augmented Cognition (CD-ROM). Las Vegas, NV: Augmented Cognition International.
Sticha, P. J., Singer, M. J., Blacksten, H. R., Morrison, J. E., & Cross, K. D. (1990). Research and methods for simulation design: State of the art (Tech. Rep. No. 914). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences. (ADA 230076)
Swezey, R. W., & Llaneras, R. E. (1997). Models in training and instruction. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (pp. 514–577). New York: John Wiley & Sons.
Washburn, D. A., & Jones, L. M. (2004). Could olfactory displays improve data visualization? Computing in Science and Engineering, 6(6), 80–83.
Wheaton, G. R., Rose, A. M., Fingerman, P. W., Korotkin, A. L., & Holding, D. H. (1976). Evaluation of the effectiveness of training devices: Literature review and preliminary model (Research Memorandum 76-6, ADA076809). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Yeh, M., & Wickens, C. D. (2000). Attention and trust biases in the design of augmented reality displays (Tech. Rep. No. ARL-00-3/FED-LAB-00-1, ADA440368). Savoy, IL: Aviation Research Laboratory, University of Illinois at Urbana-Champaign.
Yeh, M., Wickens, C. D., & Seagull, F. J. (1999). Target cueing in visual search: The effects of conformality and display location on the allocation of visual attention. Human Factors, 41(4), 524–542.
ACRONYMS
AAR – after action review
AC – alternating current
ACM – Association of Computing Machinery
ACT – aircrew coordination training
ACT-R – adaptive control of thought–rational
ADL – Advanced Distance Learning
AHMD – advanced helmet mounted display
AI – artificial intelligence
ALU – arithmetic and logic unit
AMIRE – authoring mixed reality
AMLCD – active-matrix liquid crystal display
ANOVA – analysis of variance
API – application programming interface
AR – augmented reality
ARI – Army Research Institute
ARTESAS – Augmented Reality Technologies for Industrial Service Applications
ARVIKA – Augmented Reality for Development, Production, and Servicing
ATC – air traffic control
AV – augmented virtuality
AXL – Army Excellence in Leadership
BARS – Battlefield Augmented Reality System
CAD – computer-aided design
CAM – Content Aggregation Model
CAVE – cave automatic virtual environment
CBT – computer based training
CEO – chief executive officer
CGF – computer-generated force
CGI – computer-generated imagery
CIC – combat information center
CMI – Computer Managed Instruction
COGNET – cognition as a network of tasks
CORDRA – Content Object Repository Discovery and Registration/Resolution Architecture
COTS – commercial off-the-shelf
CPU – central processing unit
CQB – close quarters battle
CRT – cathode ray tube
CSF – course structure format
CUDA – Compute Unified Device Architecture
DAC – digital-analog converter
DC – direct current
DI – directivity index
D-ILA – direct image light amplifier
DIS – distributed interactive simulation
DIVAARS – Dismounted Infantry Virtual After Action Review System
DLP – digital light processing (trademark owned by Texas Instruments)
DMAS – Digital Motion Analysis Suite
DMD – digital micromirror device
DOF – degrees of freedom
DRAM – dynamic random access memory
DRAWS – Defence Research Agency Workload Scale
DVI – digital visual interface
EPIC – executive process/interactive control
EVAs – extravehicular activities
FLCOS – ferroelectric/field sequential liquid crystal on silicon
FLEX – Flexible Method of Cognitive Task Analysis
FiST – Fire Support Team
FOR – field of regard
FOV – field of view
FPS – first-person shooter
FSMs – finite state machines
GDTA – goal-directed task analysis
GNSS – global navigation satellite system
GPGPU – general-purpose graphics processing unit
GPS – global positioning system
GPU – graphics processing unit
GUI – graphical user interface
HLA – high level architecture
HMD – head/helmet-mounted display
HRTF – head-related transfer function
HST – Hubble Space Telescope
HTML – HyperText Markup Language
HWD – head-worn display
ICT – Institute for Creative Technologies
IED – improvised explosive device
IEEE – Institute of Electrical and Electronics Engineers
IID – interaural intensity difference
IPD – interpupillary distance
ISAT – Interactive Storytelling Architecture for Training
ITD – interaural time difference
ITS – intelligent tutoring system
JPL – Jet Propulsion Laboratory
JSC – Lyndon B. Johnson Space Center
LCD – liquid crystal display
LCoS – liquid crystal on silicon
LEEP – large expanse extra perspective
LMS – learning management system
MIS – minimally invasive surgery
MIT – Massachusetts Institute of Technology
MMOG – massively multiplayer online games
ModSAF – Modular Semi-Automated Forces
MOUT – military operations on urban terrain
MR – mixed reality
MRE – Mission Rehearsal Exercise
NASA – National Aeronautics and Space Administration
NATOPS – Naval Air Training and Operating Procedures Standardization
NOTAMs – notice to airmen
NPCs – nonplayer characters
OLED – organic light-emitting diode/display
OneSAF – One Semi-Automated Forces
ONR – Office of Naval Research
OpenGL – open graphics library
PCI – peripheral component interconnect
PCIe – peripheral component interconnect express
PC – personal computer
PDU – protocol data unit
POP – Prediction of Operator Performance
R&D – research and development
RDECOM – Research, Development and Engineering Command
RGB – red, green, and blue
ROV – remotely operated vehicle
RTE – Run-Time Environment
RTI – run-time interface
SA – situation awareness
SABARS – Situation Awareness Behaviorally Anchored Rating Scale
SAF – semi-automated forces
SAGAT – Situation Awareness Global Assessment Technique
SBAT – Synthetic Battlefield Authoring Tool
SBT – scenario based training
SCORM – Sharable Content Object Reference Model
SCOs – sharable content objects
ShipMATE – Shipboard Mobile Aid for Training and Evaluation
SIGdial – Special Interest Group on Discourse and Dialogue
SIGGRAPH – Special Interest Group on Graphics and Interactive Techniques
SIMNET – simulated network
SME – subject matter expert
SNR – signal-to-noise ratio
SOCRATES – System of Object Based Components for Review and Assessment of Training Environment Scenarios
SOLM – Search Object Lighting Model
SPL – sound-pressure level
SRAM – static random access memory
Srms – station remote manipulator system
SWAT – special weapons and tactics
SXRD – Silicon X-tal Reflective Display (trademark owned by Sony Corporation)
TADMUS – Tactical Decision Making under Stress
TARGETs – Targeted Acceptable Responses to Generated Events and Tasks
TDT – team dimensional training
TENA – Test and Training Enabling Architecture
TLCTS – Tactical Language and Culture Training System
TLX – NASA Task Load Index
TOLM – Table Object Lighting Model
TRANSoM – Training for Remote Sensing and Manipulation
TRIO – Trainer for Radar Intercept Officers
TTPs – tactics, techniques, and procedures
UAS – unmanned aircraft system
UCF – University of Central Florida
UIs – user interfaces
UNC – University of North Carolina
USB – universal serial bus
UWB – ultrawide band
VACP – visual, auditory, cognitive, and psychomotor
VBAP – Vector Base Amplitude Panning
VE – virtual environment
VECTOR – Virtual Environment Cultural Training for Operational Readiness
VESARS – Virtual Environment Situation Awareness Review System
VETT – Virtual Environments Technology for Training
VIRTE – Virtual Technologies and Environments
VR – virtual reality
WFS – wave field synthesis
XML – Extensible Markup Language
INDEX
AARs. See After action review (AAR) process Abelson, R., 382, 385 Absorptive acoustic treatments, 99. See also Audio Acceleration, 231, 351–52. See also Vestibular display systems Acceptability of augmented reality (AR), 151 Accommodation, 52–53, 144 Accuracy of VE components, 31, 32t, 50 Acoustic sensors, 46 ACT (aircrew coordination training research and development program), 288 Active loudspeakers, 105 Active-matrix liquid crystal displays (AMLCDs), 58–59 Active perception, 157 Active stereoscopy, 72–73 ActiveX components, 315 Actor, defined, 199 AC trackers, 29 ACT-R (adaptive control of thought– rational), 194 Adams, S. R., 356 Address space, 179 Adjuncting cues, 412 Advanced Augmented Reality Technologies for Industrial Service Applications (ARTESAS), 140, 141f Advanced Distance Learning (ADL) Co-Laboratories, 313
AeroScout, 31 Affective domain, 273 After action review (AAR) process, 297–308; AAR engines, 306–7; Asymmetric Warfare Virtual Training Technology project, 418; debriefs of critiques as alternative to, 299; design of virtual AAR systems, 304–5; Dismounted Infantry Virtual AAR System (DIVAARS), 300–304, 303t, 328; distributed AARS, 307–8; feedback and, 298–300; key statistics provided by VESARS, 340–41; overview, 297–98, 328; Socratic method used by, 299; system of object based components for review and assessment of training environment scenarios (SOCRATES), 306–7; voice communication, 305–6. See also Feedback Ageia, 185 AI (artificial intelligence), 174, 394–95 Aircrew coordination training (ACT) research and development program, 288 Akeley, K., 58, 187–88 Alladin ride, 60 Allard, J., 74 ALU (arithmetic and logic unit), 176–77 Ambient noise, reduction in, 97–98. See also Environment Ambisonics, 101, 111 Ambush!, 416
AMD, 177–78, 184 AMIRE (authoring mixed reality) project, 140–41 AMLCDs (active-matrix liquid crystal displays), 58–59 Amplitude, 91, 100. See also Audio Anaglyphic stereoscopy, 73–74 Analog computational programs, xiv Analysis of variance (ANOVA) analysis, 247 Andes physics tutoring system, 395 Anecdotal evidence, 384–86 Angular resolution of HMDs, 55 Animal training, 122 Animation, 15–20 Animazoo, 28 Annaswamy, T. M., 122–23 ANOVA (analysis of variance) analysis, 247 Aperture masks, 82f Application design for AR, 150–51 Application programming interface (API), 315 Apprentice level mentors, 284 AR. See Augmented reality Argonne National Laboratory, 119 Arithmetic and logic unit (ALU), 176–77 Arm motions, 25 Army Excellence in Leadership (AXL) project, 387–89 Army Research Institute (ARI) for the Behavioral and Social Sciences, 328, 366, 416–17. See also U.S. Army Art. See Three-dimensional (3-D) modeling Art assets, 15 ARTESAS (Advanced Augmented Reality Technologies for Industrial Service Applications), 140, 141f Arthur, K., 244 Artificial force fields (virtual fixtures), 124–25 Artificial intelligence (AI), 174, 394–95 Artman, H., 327 ARVIKA (Augmented Reality for Development, Production and Servicing), 140
Ascension, 29, 30, 31, 63 Ascension Technology Corporation, 26 Ashdown, M., 83 ASIO, 111 Assessment: assessment engines, 319–25, 319f; assessment pyramid, 322f; assessment tier, 321; building assessments into simulation based learning experiences, 320–22; of cognitive workload, 348–58; of cognitive workload, comments on application of workload assessment, 355–56; of cognitive workload, factors affecting workload, 350–53; of cognitive workload, primary task measures, 355; of cognitive workload, secondary task measures, 355; of cognitive workload, selecting measures, 357–58; of cognitive workload, workload assessment validity, 356–57; of cognitive workload, workload measurement methods, 353–55; competency based approach, 322–23, 322f; data tier within assessment, 321; dynamic learning environments, 318–20; learner assessment, 319f, 323; learning environment tier within assessment, 320–21; measurement of performance, 266, 354–56; measurement of proficiency, 356; overview, xv; of simulation based training, 312, 318– 20; static, 318; in tutoring environments, 397. See also Evaluating VE component technologies; Feedback Astigmatism, 51 Asymmetric entities and tactics, 201 Asymmetric Warfare Virtual Training Technology, 418 ATI Technologies, 18, 74, 83, 185 Atlantis Cyberspace, Inc., 170 Attack Center Trainer, xiv Audio, 90–115; basic digital signal concepts, 95–96; computing audio, 110–13; considerations for virtual environments, 96–102; environmental effects, 113; equipment considerations,
102–10; headphones, 93–94, 107–108, 107–10; integration with visual displays, 96–97; overview, 8, 9–10; physical quantities, waves, and decibels, 91–92; psychoacoustics, 92–95, 93f; safety considerations, 106, 110; sound, defined, 91; spatial hearing, 93–95; vendors and manufacturers, 114. See also Noise Audio sensors, 46 Augmented reality (AR), 135–52; capabilities and benefits for training, 137–39; Conducting Maintenance Using the ARTESAS Prototype, 141f; described, xv, 9, 411–12; development of, 139; display technologies, 145; Example Cutaway View Using Augmented Reality, 138f; Experimental AR Maintenance Training System, 136f; general structure of training systems, 143–51; intelligent tutoring systems, 415–16; Reality-Virtuality continuum, 135, 137f; research and applications for training, 139–43; Servicing a Laser Printer Using Augmented Reality, 140f Augmented Reality for Development, Production and Servicing (ARVIKA), 140 Augmented Reality through Graphic Overlays on Stereovideo project, 143 AuSIM, 109, 110–11 Australian Army, 207 Authoring, 150 Authoring mixed reality (AMIRE) project, 140–41 Automated planning, 200 Automaticity, 279 Autonomous agent modeling, 200 Autonomous behavior generation, 192 Avatar (self-representation), 159–61, 165, 255–56 Awareness. See Situation awareness AXL (Army Excellence in Leadership) project, 387–89 Azuma, R. T., 25, 44
Badcock, D. R., 224 Bajura, M., 139 Baker, D. P., 274, 279 Baker, E. L., 213–14 Baltzley, D. R., 232 Band limited, defined, 95 Barco, 63, 83 Bare hand tracking, 149–50 Bareiss, R., 382 BARS (Battlefield Augmented Reality System), 142 Bartlett, F. C., 384 Bas¸dog˘an, C., 124–25 Bass management hardware, 104 Battlefield Augmented Reality System (BARS), 142 Battlefield phenomenology models, 192 Baylor, A. L., 401 Beall, A. C., 164 Beaubien, J. M., 274, 279 Behavior: behavioral rating scale, 335– 37, 336f; behavior emulation, 193–94; behavior generation, 193–96, 195f; as factor affecting workload, 352; overview of models, 10; realism requirements, 196–97, 242; representation language, 200; validation, 197 Being There project, 66f Benko, H., 148–49 Berbaum, K. S., 229, 231 Berg, B. L., 240 Bier, E., 148 BiLat, 417–18 Billinghurst, M., 148 Bimber, O., 69, 79, 83, 143 Binocular overlap, 53 Binoculars, 51 Biological computers, 13 Birnbaum, L., 382 Bishop, G., 44 Bit depth, 112 Black overlap, 81–83 Blending techniques, 80–83, 81f Blindness, 46 Blogging, 383–84 Bloom, B. S., 273
Body language, 19 Bolas, M., 228 Boldovici, 411 Bolte, B., 327 Bortolussi, M. R., 356 Boulanger, P., 141 Bowers, C. A., 366 Box trainers, 122 Boyd, Chas., 178 Brannick, M. T., 330 Braue, D., 207 Brick-wall limiters, 109 Brown, D. G., 142 Brown, M. S., 69, 84 Brown, P., 401 Brownson, A., 355, 358 Bruner, J., 384 Buffer length, 112–13 Bump mapping, 17 Burgkart, R., 142 Burns, E., 255–56 Burns, J. J., 263, 288–90, 291 Burnside B. L., 366 Busses (PC), 175, 176 Buxton, W., 148 Cache memory, 176–77, 178, 179 CAE, 57 CAL (AMD), 184 Calderwood, R., 364 CAM (Content Aggregation Model), 314 Cameras, 35–38, 36-37f, 40f, 63–64, 79 Campbell, C. H., 366 Cannon-Bowers, J. A., 263, 272, 273–74, 288–90, 291, 326, 328 Cao, Q., 383–84 Carnegie Mellon University, 395 CAS (computer-aided design) tools, 69 Castor, M. C., 354 Caudell, T., 139 The CAVE, 63, 205 CBT (computer based training), 313 Central processing units. See CPUs Certus, 30 CGF (computer-generated forces) systems, 191
CGI (computer-generated imagery) simulator displays, 224 CGI displays, 229 Cham, T. J., 69 Chance, S. S., 164 Chauvigne, L., 150 ChemSensing, Inc., 127 Chen, D. T., 147 Chi, M. T. H., 395 China (ancient), xiii–xiv Christie Digital Systems, 83 Chromatic aberrations, 51, 54 Chromium, 64, 84 Civilization, 207–8 Class I unmanned aircraft systems (UASs), 375 ClickOnce, 315 Client-server architecture, 186, 187 Clock rates, 112 Close Combat: First to Fight, 209 Closed-loop training, 160–61, 165, 170 Close quarters battle (CQB). See Designing user interfaces for training dismounted infantry Cluster rendering, 84, 176, 178 CMI (Computer Managed Instruction) data model, 317 Coaxial loudspeakers, 103 Cockburn, A., 148 COGNET (cognition as a network of tasks), 194 Cognitive Arts, Inc., 379 Cognitive awareness. See Assessment, of cognitive workload; Situation awareness Cognitive skills, 157–58, 193–94, 272– 74, 352. See also Training: guidelines for using simulations Cognitive task analysis (CTA), described, 206 Cognitive Tutors, 395 Cohen’s kappa values, 372 Collaboration systems, efficiency of, 257–59 Collaborative Warrior Tutoring system, 399 Collimation, 229–30
Index Collision detection and response, 20 Color gamut, 83 Comas (blur), 51 Commercial, off-the-shelf (COTS) tracking systems: inertial trackers, 28– 29; magnetic tracking systems, 29; mechanical tracking systems, 28; optical tracking systems, 30; radio frequency trackers, 30–31; ultrawide band (UWB) communications, 30–31 Commercial off-the-shelf (COTS) games, 206–10; business models of building training games, 216; costs of building, 207, 208–9, 210, 216–17; evaluation of, 208; game designers versus educators, 215–16; limitations of, 207–8; partnering with COTS companies to build training games, 208–9, 210. See also Games and gaming technology for training; specific games by name Commodity products, VE components as, 11 Communication via storytelling. See Story based learning environments Compact designer, 69 Competencies based learning, 322–23, 322f Component technologies, overview, 1–13, 2f, 3f, 4–5t, 240 Composite behaviors, defined, 200 Compute animation services, 15–16 Computer-aided design (CAD) tools, 69 Computer based training (CBT), 313 Computer-generated forces (CGF) systems, 191 Computer-generated imagery (CGI) simulator displays, 224 Computer Managed Instruction (CMI) data model, 317 Compute Unified Device Architecture (CUDA), 184 Computing components: behavior models, 10; data logs, 8–9; display devices, 8; game engines, 10, 18, 150– 51, 210–12; interfacing with trackers, 33; networking, 8, 175, 185–87, 192, 316; overview, 7–8, 10. See also
Rendering and computing requirements Confirmation bias, 415 Content Aggregation Model (CAM), 314 Content Object Repository Discovery and Registration/Resolution Architecture (CORDRA), 314 Context within augmented reality, 137 Control units, 176–77 Convergence, 52–53 CORDRA (Content Object Repository Discovery and Registration/Resolution Architecture), 314 CoreAudio, 111 Coren, S., 255 COTS games. See Commercial off-the-shelf (COTS) games COTS tracking systems. See Commercial, off-the-shelf (COTS) tracking systems Cotting, D., 74 Course structure format (CSF), 317 Coyne, J. T., 142 CPUs (central processing units), 175, 176–77 CQB (close quarters battle). See Designing user interfaces for training dismounted infantry Crabb, B. T., 375 CRE_TRON API, 110–11 Creative Labs, 108, 113 Cress, J. D., 130 Critical incidents, defined, 257 Critiques, 299. See also Feedback CrossFire product, 185 Crossover/bass management, 104 Crowell, H. P., 328 CRT (cathode ray tube) projectors, 57, 63, 71, 81–82 Crutch effects, 414–15 Cruz-Neira, Carolina, 63, 205 Crystal Island, 399, 401, 403 CSF (course structure format), 317 CUDA (Compute Unified Device Architecture), 184 Cultural validity of nonplayer characters, 416 CyberGlove, 28, 121
432
Index
CyberGrasp, 125–26 Cybersickness. See Motion sickness–like symptoms DAC (digital-analog converters), 111–12 Daeyang, 57–58 DaimlerChrysler Motors Company LLC, 50, 73–74 Damos, D. L., 355 Dang, T., 122–23 Data: logs, 8–9; tiers for assessment, 321; tiles, 148; timestamps, 305; triangulation, 257; voice, 305–6 DataGlove, 63 DB (decibels), defined, 92. See also Audio DC trackers, 29 Dead reckoning, 186 Decibels (dB), defined, 92. See also Audio Deering, Michael, 34, 63 DeFanti, Thomas A, 63 Defence Research Agency Workload Scale (DRAWS) measurement technique, 354 Defense Advanced Research Projects Agency, 56–57, 213 De Florez, Luis, xiv Degrees of freedom (DOF), 31, 160, 161, 164, 170 Delay based spatialization algorithms, 101 Delay-induced error, 25, 34–35, 38–39 Delphi Technique, 368 Delta3D, 150 Delta heart rate, 247, 256 Deployable head-mounted displays (HMDs), 49 DeRose, T., 148 Designing user interfaces for training dismounted infantry, 157–71; background, 158–59; CQB (close quarters battle), described, 157, 158– 59; design strategies, 159–61; Gaiter, 162–65, 163f, 168, 170; intuition of users, 168; Pointman, 162, 165–70, 166f, 167f. See also User interfaces
Destineer, 209 Device-driven interface. See Pointman De Waard, R., 355 Diffusion of innovation theory, 257 Diffusive acoustic treatments, 99 Digidesign, 112 Digital-analog converters (DAC), 111–12 Digital computers, history of development, xiv Digital media, described, 15–16 Digital micromirror device (DMD) projectors, 71 Digital Projection, Inc., 83 Digital video interface (DVI), 72 D-ILA technologies, 71 Dipoles, 102, 105–106 Direct3D, 17–18 Directivity of loudspeakers, 103–4 Direct radiators, 102 DirectSound, 110 DirectX 83, 183 DIS architecture. See Distributed interactive simulation architecture Dismounted infantry tactics. See Designing user interfaces for training dismounted infantry Dismounted Infantry Virtual AAR System (DIVAARS), 300–304, 303t, 328 Disney, 60 Disorientation. See Motion sickness–like symptoms Displacement mapping, 17 Display devices: AR display technologies, 144–45; audio displays, 9–10; motion sickness–like symptoms associated with, 221–25; overview, 8; refresh rates, 18, 223–24; touch displays (see Haptics); visual displays, 8, 9, 226–27. See also Multimodal display systems Display walls, 185 Distributed interactive simulation (DIS) architecture, xiv, 185–87, 190–91, 304–5, 307 DiZio, P., 225, 227, 228
Index Dizziness. See Motion sickness–like symptoms DLP technology, 11 DMAS (digital motion analysis suite), 30 DMD (digital micromirror device) projectors, 71 Doctrinal functionality, 199 Dolby, 107, 108 Domain knowledge. See Knowledge elicitation DRAM (dynamic random access memory), 179 Draper, M. H., 229 Drascic, D., 143 DRAWS (Defence Research Agency Workload Scale) measurement technique, 354 Dreams, 13 Drexler, J. M., 220 Drift, 31, 32t Dual-use systems, 138 Duller, M., 74 Durlach, N. I., 222, 224, 227–28, 229 DVI (digital video interface), 72 D/W (throw ratio), 72 Dynamic error, 38–39 Dynamic learning environments, 318–19 Dynamic random access memory (DRAM), 179 Dynamic range of speakers, 104 DynaSight, 30 Earplugs, 110 Earth-Sun relationships, 141–42 ECMAScript, 315 Ecological approach to human perception, 157 Effectiveness of VE systems. See Assessment Effects of exposure to VE systems. See Motion sickness–like symptoms Efficiency of HMDs, 49 Eggemeier, F. T., 355 Egyptians (ancient), 270 E-learning, 311–12, 313 Electrodes, 130
433
Electrohome, 63 Electronic Visualization Lab–University of Illinois at Chicago, 63 Ellisman, M., 232 Elm, W., 375 The Elumenati, 67f, 69 eMagin Corporation, 57–58, 59 Emergency responses (FLEX example), 371–72 Emmerling, A., 69 Endsley, M. R., 326, 327, 328, 329–30, 333 End-to-end latency, 7, 256 Engagement Skills Trainer 2000 Simulator, 413 Engelhart, M. D., 273 Enhancing virtual environments to support training, 407–19; fidelity, 410–11; goals, strategies, tactics, and features, 407–10, 408f; implementation methods, 412–15; instructional features as enhancements, 411–12; instructional features: example applications, 415–18 Enns, J. T., 255 Entity state PDU, 186 Environment, 97–99, 113, 137, 157 Environmental Audio Extensions, 113 E-One, 127 EPIC (executive-process/interactive control), 194 Epic Games, 212, 381 Episodic memory, 384–85 Equivalence testing, 247–49 Errors: delay-induced, 25, 34–35, 38–39; diagnosing in tutoring environments, 397; first-order dynamic error, 39, 43; nonrepeatable errors, 35–38; prediction intervals and, 44; repeatable errors, 35– 38; total tracker error, 43 Ethical treatment of human subjects, 257 Evaluating VE component technologies, 240–59, 253f; ANOVA (analysis of variance) analysis, 247; collaboration systems, 257–59; locomotion studies, 251–54, 253f; MACBETH (managing avatar conflict by a technique hybrid),
434
Index
241; performance and efficacy, 256– 59; performance and latency, 256–57; the Pit, 50, 241, 242–47, 243f, 248f, 251–52, 256; purpose of, 240–41; role in system development, 241, 259; sensory input fidelity, 242–50; testbed environment and metrics, 242; types of, 241; user interfaces, 250–56. See also Assessment Events, 265–66, 274, 320 EverQuest, 185 “Every Soldier a Sensor Simulation” (U.S. Army), 416 Executive-process/interactive control (EPIC), 194 Expectation violation, 384–86 Experience manipulation, 402 Explicit knowledge, defined, 364 Exposure to VE systems. See Motion sickness–like symptoms Extensible Markup Language (XML), 307–8 Extravehicular activities (EVAs), 121 Extrinsic feedback, described, 298–99 Eye box (exit pupil), 51–52 Eye relief, 51 Face validation, 197 Facial animation, 19 Fakespace Labs BOOM, 60 Fakespace Labs Wide5, 56, 60 Falcon haptic device, 125 Falling. See the Pit Farmer, E., 355, 358 FaroArm, 28 FARO Technologies, 28 FBM Facility, xiv Federation object model, 187 Federations, 186 Feedback: AAR and, 298–300; augmented reality (AR), 137; “Every Soldier a Sensor Simulation” (U.S. Army), 416; extrinsic, 298–99; haptic, 352; implicit, 402; interaction-feedback loop, 3, 3f; intrinsic, 298–99; overview, 267; self-correction skills, 290–91;
self-critiques, 290; situation awareness training through, 328, 337–41, 341f, 342–43f; tactile, 352; in tutoring environments, 395–96, 397; usercentered design specification and, 159, 161; on workload (See Assessment, of cognitive workload) Feiner, S., 139, 147, 148–49 Ferguson, W., 398 Feurzeig, W., 397, 398 Fictional scenarios. See Story based learning environments Fidelity: augmented reality (AR) systems, 137; described, 410–11; as factor affecting workload, 350–52; functional, 274–75, 410; of lighting simulations, 18, 249–50, 249f, 250t; physical, 274–75, 410; psychological, 274–75; requirements, 242; sensory input, 242–50; task, 410–11 Field, A., 240, 247 Field curvature, 51 FOR (field of regard), 161 Field of regard (FOR), 161 Field of view (FOV), 53–56, 242–44; defined, 51, 161; Fakespace Labs Wide5, 56; horizontal, 161; medium fields of view displays, examples, 57; motion sickness–like symptoms associated with, 221–22, 224; narrow fields of view displays, examples, 57– 58; NASA example, 56; throw ratio (D/W), 72; user-centered design specification and, 161; vertical, 161 Field sequential liquid crystal on silicon color displays (FLCOS), 58 Fingerman, P. W., 414 Finite state machines (FSMs), 194–96, 195f Fire Support Teams (FiSTs), 328, 333–35. See also VESARS FireWire breakout boxes, 111 First-order dynamic error, 39, 43 “Fish tank” VR, 26 Flashbacks, 233 Flash memory storage devices, 180 FlatWorld, 67f, 142
Index FLCOS (liquid crystal on silicon) panels, 57, 58 Fletcher-Munson curves, 92 FLEX (Flexible Method of Cognitive Task Analysis) method, 369–71, 371–75, 375 Flexibility of HMDs, 49 Flicker, 222, 224 Flight simulation, history of, xiv, 129f, 270 Flock of Birds, 29 FMOD, 110, 113 Focal plane, 51 Fokker Control Systems, 119 Food science, 128 Form of HMDs, 55–56 Fort Irwin, California, 191 Fort Leavenworth, Kansas, 418 FOV. See Field of view Fowlkes, J. E., 222, 288 Fractal procedures, 16 Frame lock, 185 Frame of reference, 146 Frame update rates, 241 Frank, L. H., 223, 231 Franz, T., 288 “Freeze, save, and restart,” 416 Frequency, defined, 91. See also Audio Frequency response of loudspeakers, 103 FSMs (finite state machines), 194–96, 195f Fuchs, H., 79, 139 Functional fidelity, 274–75, 410 Furst, E. J., 273 Future Combat Systems, 375 Gabbard, J. L., 24, 259 Gaiter, 162–65, 163f, 168, 170 Game Developers Conference, 205 Games and gaming technology for training, 205–17; audio hardware devices, 111–12; aversion to games by decision makers, 215; building training games from scratch, 210; business models for building training games, 216; client/server paradigm within game industry, 186; costs, 208–9, 210,
435
216–17; creating commercial games, 206–8; environmental effects of audio systems, 113; evolving standards for simulation and game based learning, 323–25; game designers versus educators, 215–16; game engines, 10, 18, 150–51, 210–12; improvements and research needed, 212–16; intelligent tutoring systems, 212–13; interfacing interactive 3-D simulations with learning systems, 314–16; learning management systems, 212; massively multiplayer online games (MMOGs), 185, 187; partnering with commercial game companies, 208–9, 210; scenario generators, 213; scoring/assessment mechanisms, 318– 20. See also specific systems by name Garbis, C., 327 Garrett, W. F., 147 Gaunet, F., 164 GDTA (goal-directed task analysis), 331– 35, 332f Gears of War, 212 General-purpose GPU (GPGPU) computing, 183–84 Gen lock, 185 Geographic distribution, 191 Geometric distortions, 51, 54 Geometric modeling programs, 16 Geometry shaders, 182–83 German Ministry of Education and Research, 140 Gibson, E. J., 242 Gibson, J. J., 157 Global illumination, 18 Global learner profiles, 323 Global navigation satellite system (GNSS) receivers, 147 Global positioning system (GPS), 25, 27, 147 “Glove” devices for tracking, 26, 28, 121 GNSS (global navigation satellite system) receivers, 147 Goals: enhancing virtual environments to support training, 407–8, 408f; goal
436
Index
based scenarios, 378–79; goal-directed task analysis (GDTA), 331–35, 332f Goblin XNA, 151 Goodson, J. E., 219, 232, 233 Gordon, A., 383–84 Gouranton, V., 74 GPGPU (general-purpose GPU) computing, 183–84 GPS (global positioning system), 25, 27, 147 Graphics, 15–20; art assets, 15; graphics accelerators, 180; modeling object movement dynamics, 18–20; modeling objects—shape and surface appearance, 16–17; recommended references, 15; scene lighting, 17–18; software packages, 16 Graphics processing units (GPUs): Block Diagram of a Modern Graphics Processing Unit, 182f; described, 63, 175, 180–81; general-purpose GPU (GPGPU) computing, 183–84; for multiprojector system, 74; practical considerations, 184–85; rendering and computing requirements, 148, 180–85; shader models, 18, 83, 181–83, 181t, 184; warp-and-blend rendering techniques, 83; workload balance, 183 Grodski, J., 143 Gross, M., 74 Ground truth, 298 Group training. See Team training Gustafsson, J., 145 Gustatory display systems, 128 Gutierrez-Osuna, R., 127 Gyarfas, F., 79 Gypsy-6, 28 Hafich, A., 417 Halo 2, 207 Handheld video see-through displays, 145 Hands, 26, 28, 118, 121 Haptic Master, 119 Haptics, 118–26; as adjuncting cue, 412; artificial force fields for training and task guidance, 124–25; challenges, 125–26; described, 118; feedback, 352;
haptic technology, 118–19; metaphoric or symbolic, 417; overview, 8; passive, 245–47, 246f; training applications, 119–24, 121f, 123f. See also Multimodal display systems Haringer, M., 150 Hart, S. G., 354 Hausmann, R. G., 395 Hays, R. T., 271, 275, 410 Headaches. See Motion sickness–like symptoms Head-mounted displays (HMDs), 48–62; benefits of choosing, 48–50; delay-induced error from, 25; described, 144–45; design characteristics, 50–56; end-to-end latency, 7, 256; examples of component technologies, 56–60, 63; FOV HMD, 242–44; history of development, 139; interpupillary distance, 227; motion sickness–like symptoms associated with, 219–20, 223, 226–29; mounting approaches, 59–60; operator discomfort with, 353; overview of visual issues and terms, 51; position trackers, 228–29; testing, 60–61; tracking for training, 23; visual display types, 226–27; weight of, 25, 59, 60, 227–28. See also specific systems by name Head-mounted projectors, 58 Headphones, 93–94, 107–10 Head-related transfer functions (HRTFs), 93–94, 108–9 Headroom, 104 Head tracking, 9, 26, 34, 43f Health. See Motion sickness–like symptoms Hearing. See Audio Hedley, N. R., 141–42 Heilig, M. L., 116, 127 Hettinger, L. J., 222 HiBall tracking system, 30, 245–47 Hidden objects and information, 138–39 High level architecture (HLA), 185–87 Hill, W. H., 273 Hinckley, K., 157 Hinrichs, T., 382
Index Hirota, G., 147 Hix, D., 259 HLA (high level architecture), 185–87 Holding, D. H., 298, 414 Hole, G., 240, 247 Ho¨llerer, T., 147, 149–50 Holloway, R. L., 34 Holodeck, 11–13 Homeland security (FLEX example), 371–72 HRTFs (head-related transfer functions), 93–94, 108–9 HST (Hubble Space Telescope), 121 HTML (HyperText Markup Language), 307, 315 Hubble Space Telescope (HST), 121 Human episodic memory, 384–85 Humans (virtual), 19 Human subjects, ethical treatment of, 257 Hutchins, E., 158 Hy-BIRD, 31 Hybrid tracking systems, 29, 31, 147–48 HyperText Markup Language (HTML), 307, 315 ICT Leaders Project, 380–89; expectation violation, 385–86; interview methodology, 382–83; outcome-driven simulations, 387–89 IEEE (Institute of Electrical and Electronics Engineers), 186, 315, 324 IID (interaural intensity difference), 93–94, 100 Ilie, A., 69 Illness. See Motion sickness–like symptoms Image generators, 74 Image quality. See Resolution Immersion, 3, 6, 242, 265–66 Immersion Corporation, 28, 121, 125 Immersive Group Simulator or Quantum3D’s ExpeditionDI, 170 Impedance devices, 119 Implicit feedback, 402 Implicit knowledge (tacit knowledge), 364
437
Industrial training applications, 140–41, 141f, 142f InertiaCube series, 29 Inertial trackers, 27, 28–29, 147 Infantry movements. See Designing user interfaces for training dismounted infantry Infitec, 64, 73–74 Input devices, overview, 7 Insko, B., 245–47 Institute for Creative Technologies (ICT), 22, 127, 379, 380–89 Institute for Learning Sciences, 378–79 Institute of Electrical and Electronics Engineers (IEEE), 186, 315, 324 Institutional Review Board, 257 Instructional features, described, 410 Instructional goals and strategies, overview, 266–67, 408–9 Instructional tactics, 409 Instruction sets, 177 Instructor control, 413 Intel, 177–78 Intelligent tutoring system (ITS), 212–13, 393–404; augmentation of intelligent tutoring systems, 415–16; Collaborative Warrior Tutoring system, 399; effectiveness of, 394–95; expanding the space of intelligent tutoring interactions, 399–400; feedback, 397; human and computer tutoring, 394–96; open-movement environments, 398–99; pedagogical agents, 400–401; pedagogical experience manipulation and stealth tutoring, 402–3; real time environments, 397– 98; Tactical Language and Culture Training System (TLCTS) mission environment, 399, 401, 404; TRANSoM (Training for Remote Sensing and Manipulation), 398; TRIO (Trainer for Radar Intercept Officers), 398 Intensity, defined, 91–92. See also Audio Interaction devices and techniques for AR, 148–50, 149f Interaction-feedback loop, 3, 3f
438
Index
Interactive Storytelling Architecture for Training (ISAT) system, 402 Interactivity, 6–7. See also End-to-end latency Interaural intensity difference (IID), 93– 94, 100 Interaural time difference (ITD), 93–95 Interfacing interactive 3-D simulations with learning systems, 311–25; assessment of simulation based training, 312, 318–20; building assessments into simulation based learning experiences, 320–22; communications via SCORM, 317; competencies based learning, 322–23; evolving standards for simulation and game based learning, 323–25; games and simulation: delivery and deployment, 314–16; interfacing simulations with learning systems, 311–13; managed learning, 313–14; “use cases” for using simulations, 312–13 Interference filter technology, 73 Internal display delay, 39 Internal parameters, 39 Internet. See Blogging; Interfacing interactive 3-D simulations with learning systems Interpupillary distance (IPD), 51, 52, 227 InterSense, 31, 46 Interservice/Industry Training, Simulation & Education Conference, 205 Interview method, 368 Intrinsic feedback, 298–99 Intrinsic parameters, 39 IODisplays, 57–58, 59 IOSONO, 102 IPD (interpupillary distance), 51, 52, 227 ISAT (Interactive Storytelling Architecture for Training) system, 402 IS-900, 31 IS-1200, 31 ITD (interaural time difference), 93–95 ITS. See Intelligent tutoring system
Jacob, R. J. K., 157 Jasper Woodbury project, 378 Java Web Start, 315 Jenkins, H., 206, 215 Jeong, H., 395 Jet Propulsion Laboratory (JPL) (NASA), 120–21, 121f Jitter, 31, 32t Johnson, C., 382 Johnson, T., 79 Johnson, W. L., 396, 404 Johnston, J. H., 273–74 Jones, D. G., 327 Jones, M. B., 220 JSC (Lyndon B. Johnson Space Center), 121 JVC, 71 Kaiser Electro-Optics, 56–57, 59, 60 Kamper, D. G., 142 Kaufmann, H., 141 Kaye, Joseph, 127 Keio University Shonan Fujisawa Campus, 60 Kennedy, R. S., 220, 222, 229–30, 231, 232, 233–34, 252 Kenyon, R. V., 142 Kim, Y., 401 Kinesthetic perception, 118 Kirkley, J., 375 Kiser, R. D., 375 Kishino, F., 135 Kjellberg, T., 145 Klein, G., 364 Kline, T., 142 Knerr, B. W., 328 Kno¨pfle, C., 83, 150 Knot-like figures example, 249–50, 249f, 250t Knowledge elicitation, 363–75; challenges to, 364; Delphi Technique, 368; direct methods, 367; FLEX method, 369–71; FLEX method: application of, 371–75; indirect methods, 367; interview method, 368; in the military training environment, 363–65; protocol
Index analysis, 368; in training development, 365–67 Knowledge socialization, 382 Kopin Corporation, 58–59 Korotkin, A. L., 414 Krathwohl, D. R., 273 Kresse, W., 83 Krueger, Myron W., 127 Labels for AR, 150 Lackner, J. R., 225, 227, 228 Lamarque, G., 74 Lampton, D. R., 328 Lane, N. E., 288 Language systems, 21–22 Laparoscopic surgery, 122 Laser-BIRD, 30 Laser printers, 139, 140f Lasers, 69 Latency, 31, 32t, 72, 113 La Viola, J. J., Jr., 224 Lawson, S. W., 143 LCD (liquid crystal display) projectors, 71 LCoS (liquid crystal on silicon) projectors, 71 A Leader’s Guide to After-Action Reviews (U.S. Army), 298 Learner assessment, 323 Learning environment tier within assessment, 320–21 Learning management systems (LMSs), 212, 267–68, 313–16, 316f, 318–20, 319f. See also SCORM Learning Technology Center, 378 Lee, T., 149–50 LEEP Optical, 63 Leibrecht, B. C., 375 Lepper, M., 395 Lester, J. C., 401 Levinson, S. C., 401 LIDAR (light detection and ranging) technology, 16 Lighting, 17–18; consistency of lighting style, 250; impact of lighting fidelity on physiological responses, 247–49, 248f; impact of lighting fidelity on task performance, 249–50, 249f, 250t
439
Light maps, 18 Likert scale, 335 Lindfors, C., 145 Link, Ed, 270 Link Trainer (“Blue Box”), xiv Lintern, G., 415 Liquid crystal display (LCD) projectors, 71 Liquid crystal on silicon (FLCOS) panels, 57, 58 Liquid crystal on silicon (LCoS) projectors, 71 Live simulation, 191 Livingston, M. A., 142, 147 LMSs. See Learning management systems Locomotion techniques, 164, 165, 168, 245–47, 251–54, 253f Loftin, R. B., 117 Logistic Technician, 209 Logs (data), 8–9 Longitudinal waves, 91 Loomis, J. M., 164 Loosely coupled computing, 178 Looser, J., 148 Loudness, defined, 92, 93f. See also Audio Loudspeakers, 100–108. See also Audio L-systems, 16 Luminance levels, 224 Luo, X., 142 Lussier, J. W., 366–67 Lyndon B. Johnson Space Center (JSC), 121 M1A1 simulator example, 198–99 MACBETH (managing avatar conflict by a technique hybrid), 241, 255–56 Macchiarella, N. D., 142 Macgregor, D., 364 Mach bands, 80 MacIntyre, B., 139, 147 Magic lenses, 148, 149f Magnetic tracking systems, 29 Magnifying glasses, 51 Majumder, A., 82, 83, 84
440
Index
Managed learning. See Learning management systems Managing avatar conflict by a technique hybrid (MACBETH), 241, 255–56 Managing avatar-object collisions, 255–56 Manes, D. I., 415 Markerbased tracking systems, 146–47 Markerless tracking, 147 Martin, D. W., 240 Massachusetts Institute of Technology (MIT), 123–24 Massively multiplayer online games (MMOGs), 185, 187 Mass storage, 175, 178, 179–80 MATLAB tools, 79 Matrox Graphics Inc, 74 Mattsson, L., 145 Mavor, A. S., 222, 224, 227–28, 229 Maximum tracked objects, 31, 32t May, J. G., 224 Mayer, R. E., 401 McCann, Kelly, 158–59 McCauley, M. E., 230, 233 Measurand, 28 Measurement. See Assessment; Evaluating VE component technologies Measurement of performance, 266, 354–56 Measurement of proficiency, 356 Mechanical tracking systems, 28 Medical simulation, xiv–xv, 121–23, 123f, 124–25, 142 Meehan, M., 242, 247 Meier, Sid, 206–7 Melin, E., 74 Meliza, L. L., 299 Melzer, J. E., 54 Memory (human), 384–85 Memory (PC), 175, 176–77, 178, 179, 335–37 Mersive Technologies, 83 Metacognitive skills, 279 Metaphoric haptics, 417 MicroScribe G2LX, 28 Microsoft Corporation, 315
Microvision, Inc., 59 Middleware, 18 Milders, M., 231 Milgram, P., 135, 143 Military Operations on Urban Terrain (U.S. Marine Corps), 158 Miller, J. W., 219, 232, 233 Minimally invasive surgery (MIS), 122 Misaligned imagery, 54 Mission Rehearsal Exercise (MRE) system, 400–401 MIT (Massachusetts Institute of Technology), 123–24 Mixed reality (MR) systems, 135, 143f. See also Augmented reality (AR) Mizell, D., 139 MMOGs (massively multiplayer online games), 185, 187 Mobility, 139, 146 Model tracing, 395–96 ModSAF (Modular Semi-Automated Forces), 198–99 Modulation transfer function, 38 Moffitt, K., 54 Monitor level alignment, 106–7 Mon-Williams, M., 226–27 Moreno, R., 401 Morrison, J. E., 299 Motherboards, 175, 176, 178 Motion, range of, 160, 161, 164, 170 Motion capture/pose determination, 28 Motion frequency, 230–31 Motion Imaging Corporation, 30 Motion-induced measurement noise, 39–40 Motion prediction, 43–44, 43f Motion sickness–like symptoms, 219–34; equipment features and, 221–25; as factor affecting workload, 353; with HMD based systems, 219–20, 223, 226–29; implications of VE sickness on training, 231–34; motion frequency, 230–31; with projector based displays, 229–30; prolonged and delayed aftereffects, 232–34; temporal lag, 231 Motor substitution, 164 MOTU, 112
Index Movie set props, 142 MR. See Mixed reality systems MRE (Mission Rehearsal Exercise) system, 400–401 Multichannel formats, 100–101 Multicore processors, 177–78 Multimodal display systems, 116–31; gustatory displays, 128; haptics, 118– 26; haptics, described, 118; haptics: artificial force fields for training and task guidance, 124–25; haptics: challenges, 125–26; haptics: training applications, 119–24, 121f, 123f; motivation and scope, 116–18; olfactory displays, 116, 117f, 126–28, 412; overview, 10; vestibular display systems, 129–30, 129f Multiple Intelligent Mentors Instructing Collaboratively system, 400 Multiprocessing, 177–78 Multisensory inputs, 412 Multiway loudspeakers, 103 nanoManipulator Collaboratory, 257–59 Narratives. See Story based learning environments NASA: FOV HMD, 56, 63; Hubble Space Telescope (HST), 121; Jet Propulsion Laboratory (JPL) (NASA), 120–21, 121f; National Aeronautics and Space Administration (NASA) Task Load Index (TLX), 354 National Academy of Science, Institute of Medicine, 121 National disasters (FLEX example), 371–72 Nausea. See Motion sickness–like symptoms Navab, N., 142 NAVAIR Orlando, 288 Naval Research Laboratory. See Designing user interfaces for training dismounted infantry Near-field monitors, 105 Net-force displays, 119 Networking, 8, 175, 185–87, 192, 316 Nintendo, 11, 205
Nitschke, C., 69 Nodes, 98 Noise: motion-induced noise, 40; noise floor, 104; operating noise level of projects, 72; reduction in noise levels, 351–52. See also Audio Nonfictional scenarios. See Story based learning environments noDNA, 28 Nonrepeatable errors, 35–38 Normal mapping, 17 Northern Digital Inc., 30 Northwestern University, 378–79 Novint Technologies, 125 NVIDIA, 18, 74, 83, 108, 184, 185 NVIS SX display, 57 Nyquist frequency (2F), 95–96, 112 Obermayer, R. W., 137 Object movement dynamics, 18–20 Observability of HMDs, 50 Obst, T., 142 O’Donnell, R. D., 355 Oehlert, Mark, 216 Office of Naval Research Virtual Technologies and Environments program, xiv, 328, 333–35. See also Designing user interfaces for training dismounted infantry; VESARS Ohbuchi, R., 139 Okabe, T., 83 OLED (organic light-emitting displays) displays, 57, 59 Olfactory display systems, 116, 117f, 127, 412. See also Multimodal display systems Olwal, A., 145, 148–49 OmniFocus projector, 69 O’Neil, H. F., 213–14 OneSAF (One Semi-Automated Forces) system, 192–93, 193f, 199–200 Online simulation, 23, 27 OpenAL, 110 OpenGL (open graphics library) applications, 17–18, 83, 84, 183 Open-loop control, 160–61, 164, 165, 170
Operating principles of tracking systems, 31, 32t Operating systems and maximum main memory size, 179 Operation tiles, 148 Operator interface, 192 Optical aberrations, 51 Optical artifacts, 54 Optical magnification, 51 Optical see-through displays, 144 Optical tracking systems, 29, 38, 40f, 146–47, 146f Optical vignetting, defined, 72 Organic light-emitting displays (OLED) displays, 57, 59 Organizational stories, 382 Orientation. See also Vestibular display systems Origin Instruments, 30 Oser, R., 288 Outcome-driven simulations, 379, 381, 387–89 Outdoor environments, 26–27, 45–46 Overlearning, 279 Padmos, P., 231 Paint programs (2-D painting), 17 Parallel execution, 176 Paramount Pictures. See ICT Leaders Project Parker, G. A., 143 Passive haptics, 245–47, 246f Passive loudspeakers, 105 Passive stereoscopy, 72–73 Payne, S. C., 273–74 PC based stations for training: basic architecture, 175, 175f; CPU Architecture and Performance, 176– 77; history of development, 174; memory and mass storage, 178, 179– 80; memory: type and size, 179; multiprocessing, 177–78; PC busses, 175, 176; rendering and computing requirements, 174–80, 175f; single and multi-PC systems, 176 PCIe (peripheral component interconnect express) card, 111, 176
PCI (peripheral component interconnect), 111 PDUs (protocol data units), 186 Pedagogical agents for AI, 174, 279, 394–95, 400–403 Perceived truth, 298 Perceptual potency of HMDs, 49–50 Performance measurement/assessment. See After action review (AAR) process; Evaluating VE component technologies; Feedback Peripheral component interconnect express (PCIe) card, 111, 176 Peripheral component interconnect (PCI), 111 Peripheral processors, defined, 175 Per-pixel mapping approach, 79 Personal computers. See PC based stations for training PHANTOM device, 119 Phase, defined, 91. See also Audio Phase delay. See Synchronization delays Phenom processor, 177–78 Photometric correction, 80–84 Physical fidelity, 274–75, 410 Physics accelerator cards, 175 Physics engines, 20 Physics-processing-units, 175 Physiological responses, 247–49, 248f Pier, K., 148 Pioch, N. J., 398 The Pit, 50, 241, 242–47, 243f, 248f, 251–52, 256 Pixel shaders, 182–83 Planar homography, 77 Platforms and application programming interfaces (APIs), 110–11 Pointman, 162, 165–70, 166f, 167f Polarized stereoscopy, 73 Polhemus, 26, 29, 63, 245–47 Politeness theory, 401 POP (Prediction of Operator Performance) model, 354 PortAudio, 111 Pose of objects, 7, 23, 24–25. See also Trackers/tracking for training Position trackers, 228–29
Index Posture, 23, 24–25. See also Trackers/tracking for training Potency, 49–50 Potter, S. S., 375 Power Hungry (film), 387–88 Prediction intervals, 44 Prediction of Operator Performance (POP) model, 354 Presence, 6, 265–66 Presence (journal), 352 Presentation layer, 150 PresSonus, 112 Pressure waves, 91 Pretlove, J. R. G., 143 Prevou, M. I., 366 Primitive behaviors, 199 Prince, A., 330 Prince, C., 330 Procedural modeling programs, 16 Proficiency, measurement of, 356 Projective texturing, 79 Projector based displays, 63–86; abutted versus overlapped display, 70–71; acoustic treatments, 99–100; application development, 84–85; AR display technologies, 145; Being There project, 66f; CGI displays, 229; collimation, 229–30; costs, 63, 71; display configuration, 68–69; FlatWorld, 67f; focus issues, 69; hardware considerations, 71–72; image generators, 74; integration with audio systems, 96–97; lensless projectors, 70; loudspeaker delivery algorithms, 100; motion sickness–like symptoms associated with, 229–30; multiprojector systems, 64f, 65f, 68f, 74, 81–82, 82f; OmniFocus projector, 67f; overview of design considerations, 64–68; platform type, 230; resources for selecting, 71; room acoustics, 97– 99; screen and surface options, 74; seamless rendering, 68, 75–84, 77f, 78f; shadowing issues, 69–70, 70f; Social Computing Room, 70f; team training environment, 85f; tracking requirements, 75
Proprioceptive feedback, 351–52 Protocol analysis, 368 Protocol data units (PDUs), 186 Pruitt, J., 263 Psychoacoustics, 92–95, 93f Psychological fidelity, 274–75 Psychologically engagement. See Immersion Psychology of learning, 411 Psychometric functions, 251, 255–56 Psychomotor skills, described, 272–73 Quadro G-Sync, 74 Quadro Plex, 74 Quad Xeon processor, 177–78 Quantization, 95 Quinkert, K. A., 366 Radio frequency trackers, 30–31 Radiosity, 18 Raffin, B., 74 Rakow, N. A., 127 Range of motion, 160, 161, 164, 170 Range of tracking systems, 31, 32t Raskar, R., 63, 77–79, 80, 143 Ray tracing, 18 Realism. See Fidelity Reality-Virtuality continuum, 135, 137f Real time environments, 397–98 Real time tracking, 23, 27 Real World, 213 Refresh rates, 18, 223–24 Regenbrecht, H. T., 150 Registration, 145–46 Rehabilitation through augmented reality systems, 123, 142 Reiners, D., 83 Reliability, 414 Remapping (warping) basics, 76–77, 77f Remotely operated vehicles (ROVs) operations, 398 Rendering and computing requirements, 173–88; for arbitrary 3-D surfaces, 77–79; graphics processing, 180–85; networking, 185–87; overview, 173– 74; PC based stations for training, 174– 80, 175f; rendering programs, 15–20
Rendezvous delay. See Synchronization delays Repeatable errors, 35–38 Resolution, 31, 32t, 55, 222 Reverberance, 98–99, 113 Rhodenizer, L., 366 Richardson, A. R., 165 Rickel, J., 396, 404 Riener, R., 142 RightMark, 111, 112 Ritter, A., 220 Ritter, F., 397, 398 RME, 112 Roberts, B., 398 Robertson, M. M., 326, 328 Robonaut, 121 Robots, xiii–xiv, 19, 124–25 Rockwell Collins, 57 Rogers, E. M., 257 Room acoustics, 97–99. See also Audio Room equalization, 106 Room modes, 98 Roscoe, S. N., 415 Rose, A. M., 414 Roth, E. M., 375 ROVs (remotely operated vehicles) operations, 398 RPA Electronics Design, LLC, 74 Run-time interface (RTI), 186–87 Rushton, S., 226–27 SA. See Situation awareness Safety considerations: of audio systems, 106, 109, 110; flashbacks, 233; motion sickness–like symptoms associated with VE exposure, 232–34 SAF (semi-automated forces) systems, 189–201; distributed simulation, 190– 91; ModSAF (modular semi-automated forces), 198–99; One Semi-Automated Forces (OneSAF) system, 192–93, 193f, 199–200; overview, 189–90; purpose and advantages in training, 191–92; research directions, 200–201; SAF Finite State Machine, 195f; SAF Operator Interface (OneSAF), 193f;
system characteristics, 192–97; validation, 197, 201 SAGE, 84 Salas, E., 263, 272–74, 288, 326, 328, 330, 366 Salience, 414 Sampling, 83, 112 Sandin, Daniel J., 63 Sato, I., 83 Sato, Y., 83 Savery, J. R., 365 SBAT (Synthetic Battlefield Authoring Tool) approach, 288, 289t Scaffolding, 279, 395 Scalability, 191, 201 Scalable Display Technologies, 83 Scanning-laser range finder technology, 16 Scenario based training (SBT), 263–68; defined, 263; events, 265–66; feedback, 267; instructional strategies, 266–67; learning management systems, 267–68; within military training environment, 366; overview, 365–66; performance measurement/assessment/diagnosis, 266; process, described, 263–64, 264f; scenario generators, 213; task analysis/learning objectives/competencies, 265 Scene graph application programming, 150 Schank, R., 382, 384–85 Schmalstieg, D., 142 SCORM (Sharable Content Object Reference Model), 313–15, 317; Content Aggregation Model (CAM), 314; overview of LMS and Simulation Interfaces, 324f; Run-Time Environment (RTE), 314–15; standard specifications for assessment results, 321, 324–25 SCOs (sharable content objects), 314, 317 Seamless rendering, 68, 75–84, 77f, 78f Seasickness, 130. See also Motion sickness–like symptoms Second Life, 185
Index Self-correction skills, 290–91 Self-critiques, 290. See also Feedback Self-tracking, 27 Seligmann, D., 139 Semi-automated forces, 10, 19 Sensics Inc., 57, 60 Sensorama, 116, 117f, 127 Sensor sample rates, 40–41 Sensory input fidelity, 242–50 SEOS Limited, 60, 83 Serial control interface, 72 Serious Games Summit, 205, 212 Servers, 187 Shader models, 17–18, 181–83, 184 Shadrick, S. B., 366–67 Shannon’s sampling theorem, 40 ShapeTape, 28 ShapeWrapIII, 28 Sharable Content Object Reference Model. See SCORM Sharable content objects (SCOs), 314, 317 Sharkey, T. J., 230 Sheldon, B. E., 141–42 Shipboard Mobile Aid for Training and Evaluation (ShipMATE), 288–90, 291 Sickness. See Motion sickness–like symptoms Sielhorst, T., 142 SIGGRAPH (Special Interest Group on Graphics and Interactive Techniques), 63, 256 Signal-to-noise ratio (SNR), 111 Siler, S. A., 395 Silicon Graphics Inc., 63 Silicon X-tal Reflective Display (SXRD) projector, 71 SIM EYE, 57, 59 Simple magnifiers, 51 Simpson, E. J., 273 Simulated Network (SIMNET) program, 186 Simulation code, 321 Simulation programs, overview, 270–71; analog computational programs, xiv, 219; augmented cognition, xv; distributed interactive simulation, xiv;
fundamental components, xiv; history of development, xiv, 270; lack of time concept, 305; medical simulation, xiv– xv; network security concerns, 316 Simulation validation, 197 Simulator sickness, 219, 220. See also Motion sickness–like symptoms Singer, M. J., 244, 410 Sinusoids, 91 Situated cognition, 393 Situation Awareness Behaviorally Anchored Rating Scale (SABARS), 330–31 Situation awareness (SA) development, 326–45; assessment of distributed training participants, 343–45; breakdowns in, 327–28; defined, 327; developing SA measures for VR training, 329–31; Endsley’s model, 329–30; example measures, 330; real time SA probes, 330; review and feedback, 328, 337–41, 341f, 342–43f; SA assessment system, 331; SABARS, 330–31; SA behavioral measure, 335– 37, 336f; SA measurement of team communications, 337, 338–39f; SA probe delivery, 331–35, 333t, 334f, 341f; teamwork skills, 327–28, 330– 31, 343, 344f; validation measures, 345; Virtual Environment Situation Awareness Review System (VESARS), 333–45 6-DOF Optical motion capture system. See Gaiter Skarbez, R., 79 “Sketchpad” (Sutherland), 48 Slater-Usoh-Steed presence questionnaire, 242, 251–52, 256 Slay, H., 148 SLI product, 185 Slope discontinuities, 80 Smallman, H. S., 415 Smart client technologies, 315–16 Smell. See Olfactory display systems SMEs (subject matter experts), 332 Smith-Jentsch, K. A., 273–74, 288 Smode, A. F., 410
SNR (signal-to-noise ratio), 111 Soar Training Expert for Virtual Environments, 194, 400 Sociability of the tracking system, 26, 45 SOCRATES (system of object based components for review and assessment of training environment scenarios), 306–8 Software. See Computing components Song, P., 69 Sonnenwald, Diane H., 257–59 Sony, 63, 71 Sound. See Audio Sound generation cards, 175 Southwest Research Institute, 128 Spatial errors in tracking, 34–35 Spatial issues, 35–38 SPCAP (speaker-placement correction amplitude panning), 111 Speakers. See Audio Special Devices Desk at the Bureau of Aeronautics, xiv Specialization, 191 Speech and language systems, 21–22 Spherical aberration, 51 Squire, K. D., 207 SRAM (static random access memory), 179 Srinivasan, M. A., 122–23 Staircase presentation of stimuli, 255–56 Standard for Reusable Competency Definitions, 322 Standing waves, 98 Stanney, K. M., 205, 220, 232, 233–34 Star Trek, 11–13 State, A., 69, 147 Static assessment, 318 Static field distortion, 35 Static measurement error, 43 Static random access memory (SRAM), 179 Stationary displays, 145 Staveland, L. E., 354 Stealth tutoring, 402 Steed, A., 242, 251–52 Steel Beasts, 206, 207–8 Stereo projection, 145
Sterling, G. C., 130 Sternberg, R., 382 Stevens, R., 82 St. John, M., 415 Stock, I., 150 Stone, M., 148 Story based learning environments, 378– 89; fictionalization of lessons learned, 387–89; historical development of, 378–79, 387; ICT Leaders Project, 380–82; Internet weblogging, 383–84; methods of collecting stories, 382–84; pedagogical agents, 400–401; stories on the fringe of expectation, 384–86; tutorial planning systems, 399, 401, 403 Stripling, R., 142 Studio monitors, 105 Subjective workload assessment, 353–54, 355–56 Subject matter experts (SMEs), 332 Subwoofers, 103, 104, 105, 107 Sumerians, 270 Sun Microsystems, Inc., 63, 315 Surati, R., 63 Surgical training, 121–23, 123f. See also Medical simulation Surround sound, 100–101 Suslick, K. S., 127 Sutherland, Ivan, 48, 57, 139 Swan, J. E., 259 Swanson, R., 383–84 SXRD (Silicon X-tal Reflective Display) projector, 71 Symbolic haptics, 417 “Symbolic Olfactory Display” (Kaye), 127 Synchronization delays, 39, 41–43, 41f, 42f Synchronous logic, 177 Synthetic Battlefield Authoring Tool (SBAT) approach, 288, 289t System control, 413 System latency, 83, 225 System of object based components for review and assessment of training environment scenarios (SOCRATES), 306–8
Systems approach, described, 407–8 Tacit knowledge (implicit knowledge), 364 Tactical Language and Culture Training System (TLCTS) mission environment, 399, 401, 404 Tactile displays, 119 Tactile feedback, 352 Tactile interactions. See Haptics Tactors, 417 TADMUS (tactical decision making under stress), 288–90 Takemura, H., 135 Tangible AR, 148 Tannenbaum, S. I., 273–74 TARGETs (Targeted Acceptable Responses to Generated Events and Tasks), 288, 289t Tasks: described, 198–99, 409; task analysis, 265; task fidelity, 410–11; teamwork skills versus task work skills, 273–74. See also Assessment, of cognitive workload Taste. See Gustatory display systems Taxonomy of Usability Characteristics in Virtual Environments (Gabbard), 24 Taylor series, 44 TDT (team dimensional training), 288 Team training: cross-training, 287; environment, 85f; homeland security (FLEX example), 371–72; networking VEs, 8; situation awareness development, 327–28, 330–31 (see also Enhancing virtual environments to support training); task work skills versus teamwork skills, 273–74; tracking for training, 45. See also Designing user interfaces for training dismounted infantry; Training: guidelines for using simulations Telecommunication switches, 141, 142f Teleoperation, 143 Telestrators, 307 Temperature, 412 Temporal delays, 223, 231 Temporal errors in tracking, 34–35
Terrain models, 16 Testing. See Evaluating VE component technologies Texas Instruments, 11, 71 Text-to-speech systems, 21–22 Texture mapping, 17 Thales, 30–31 3rdTech, 30 Thomas, B., 148 3-DOF Inertial tracker, 161 3D-Perception (company), 69, 83 Three-dimensional (3-D) modeling, 15–20; art assets, 15; modeling object movement dynamics, 18–20; modeling objects: shape and surface appearance, 16–17; recommended references, 15; scene lighting, 17–18; software packages, 16 Throw ratio (D/W), 72 Tiled designs, 56–57 Timestamps, 305 TLCTS (Tactical Language and Culture Training System) mission environment, 399, 401, 404 “To Err is Human” (National Academy of Science, Institute of Medicine), 121 Tool-object interactions, 126 Topolski, R., 375 Total tracker error, 43 Touch displays. See Haptics Touring Machine, 147 Towles, H., 79 Trackers/tracking for training, 23–46; augmented reality (AR) systems, 145–48, 146f; current technologies, 27–33; described, 174; evaluation of, 32f; factors to consider when evaluating tracking systems, 31–33; first-order dynamic error, 39; fundamental usage issues, 33–44; future technologies, 44–46; head-mounted displays, 9; hybrid tracking systems, 29, 31, 147–48; inertial tracking system, 27; motion-induced measurement noise, 39–40, 40f; motion prediction, 43–44; online simulation, 23, 27; overview, 7, 9; posture, 23, 24–25; projection
displays, 9; real time, 23, 27; self-tracking, 27; sensor sample rate, 40–41; sociability of, 25–26; sources of error, 34; spatial issues, 34–38; synchronization delays, 41–43; team training, 45; temporal errors in tracking, 34–35, 38–39; total tracker error, 43; tracking scenarios, 24–27; weight and bulk of body-worn components, 25. See also specific devices by name “Train as you fight,” 366 Trainer for Radar Intercept Officers (TRIO), 398 Training: guidelines for using simulations, 270–95; advantages of simulations for training, 271–72; apprentice level mentors, 284; characterization of training situations, 272; cost savings, 272; enhancing virtual environments to support training, 409; feedback, 280–83, 290–91; instructional strategies, 278–79, 286–87; instructors and instructor training, 283–84; key components, 366–67; logistical issues, 284–86, 291–92; performance measurement/assessment, 280, 287–90; scenario and training environment, 274–77, 286; trainee control, 413–14; training as learning environment tier within assessment, 320–21; training enhancements/instructional features, 410; transfer of training, 356. See also Team training TRANSoM (Training for Remote Sensing and Manipulation), 398 Transverse waves, 91 Traum, D., 22 Triangle mesh, 16 Triangulation, 257 Triggers (scenario conditions), 265, 274, 320 TRIO (Trainer for Radar Intercept Officers), 398 TripleHead2Go, 74 Trust, 414
Truth games, 187 Turing test, 197 Tutoring. See Intelligent tutoring system 2-D Painting programs (paint programs), 17 UASs (Class I unmanned aircraft systems), 375 Ubisense system, 30 Ultrasound imagery, 139 Ultrawide Band (UWB) communications, 30–31 Universal serial bus (USB), 111 University of North Carolina (UNC) at Chapel Hill, 66f, 240, 241, 245–47, 257–59 University of Southern California. See Institute for Creative Technologies (ICT) Unreal Tournament 2003, 381 Unreal 3 game engine, 211, 212 Untethered operation, 31, 32t Update rates, 31, 32t, 224–25 Usability engineering, 259 U.S. Air Force, 413–14 U.S. Army: AAR use, 297–98; Army Excellence in Leadership (AXL) project, 387–89; Army Research Institute (ARI) for the Behavioral and Social Sciences, 328, 366, 416–17; Asymmetric Warfare Virtual Training Technology, 418; Class I unmanned aircraft systems (UASs), 375; Dismounted Infantry Virtual AAR System (DIVAARS), 300–304, 303t, 328; “Every Soldier a Sensor Simulation,” 416; FLEX approach, 375; Future Combat Systems, 375; ICT Leaders Project, 380–89; live simulation, 191; National Training Center, 401; olfactory displays, 128; OneSAF (One Semi-Automated Forces) system, 192–93; Simulated Network (SIMNET) program, 186; “train as you fight,” 366; Virtual Environment Situation Awareness Review System (VESARS), 333–45
U.S. Department of Defense, xiv, 233 U.S. Department of the Navy. See also Naval Research Laboratory User interfaces, 7, 10, 151, 159–61, 165, 250–56. See also Designing user interfaces for training dismounted infantry U.S. Global Positioning System, 147 U.S. Marine Corps (USMC): AAR use, 297; Close Combat: First to Fight, 209; Military Operations on Urban Terrain, 158; tactical decision-making simulations, 215 U.S. Military Academy, 383 U.S. Navy, 123–24, 233 Usoh, M., 242, 251–52. See also the Pit Utsumi, A., 135 UWB (Ultrawide Band) communications, 30–31 VACP (visual, auditory, cognitive, and psychomotor) method, 354 Validity, xv, 197, 201 Valve Source, 150 Vanderbilt University, 378 VBAP (Vector Base Amplitude Panning), 111 VE component technologies, overview, 7–8, 9, 11–13, 12t Vection, 222 Vector Base Amplitude Panning (VBAP), 111 VECTOR (Virtual Environment Cultural Training for Operational Readiness), 416 Vehicle simulators, described, 19 Vergence, 144 Vernik, R., 148 Vertex shaders, 181–83 Vertigo. See Motion sickness–like symptoms VESARS. See Virtual Environment Situation Awareness Review System (VESARS) Vestibular display systems, 129f. See also Multimodal display systems
VETT (Virtual Environments Technology for Training), 123–24 Vibration levels, 351–52 Vickers, D. L., 148 Video games. See Games and gaming technology for training Video output circuitry, 180 Video see-through displays, 144–45 Video synchronization delay, 39 Vidulich, M. A., 356 Viirre, E. S., 232 Vincenzi, D. A., 142 VIRPI, 84 Virtual Environment Cultural Training for Operational Readiness (VECTOR), 416 Virtual Environment Situation Awareness Review System (VESARS), 333–45, 334f; assessment of distributed training participants, 343–45; described, 331; review and feedback, 337–40, 341f, 342–43f; situation awareness measurement of team communications, 337, 338–39f; team communication measures, 343, 344f; validation measures, 345; VESARS behavioral rating system, 335–37, 336f Virtual Environments Technology for Training (VETT), 123–24 Virtual environments (VEs), overview: described, 2, 9, 173–74; immersion, 3, 6; interactivity, 6–7; VE component technologies, 2f, 3f, 4–5t, 7–8, 9, 11–13, 12t Virtual fixtures (artificial force fields), 124–25 Virtual humans, 19 Virtual Patient simulator, 22 Virtual “pit rooms.” See the Pit Virtual Portal, 63 Virtual Reality Peripheral Network, 33 Virtual reality sickness. See Motion sickness–like symptoms Virtual Research V8, 57, 60 Virtual Technologies and Environments program. See Designing user interfaces for training dismounted infantry
Visible interaction volumes, 148–49 Visual, auditory, cognitive, and psychomotor (VACP) method, 354 Visual displays, 8, 9, 226–27. See also Head-mounted displays Vizard, 33 Voice Communication, 305–6 Volpe, C. E., 273–74 Vomiting. See Motion sickness–like symptoms VPL Research, Inc., 63 Vreuls, D., 137 VR Juggler, 33, 64 VRSonic, 111 Vuzix (Icuiti) products, 57–58, 59 Wainess, R., 213–14 Waldinger, H. C., 142 Walk, R. D., 242 Waller, D., 165 Wand based selection, 148 Wang, N., 401 Wann, J. P., 226–27 Ward, L. M., 255 Ware, C., 157 Warp-and-blend rendering techniques, 83, 84 Warping (remapping) basics, 76–77, 77f Waschbuesch, M., 74 Wave field synthesis (WFS), 101–102 Wearability of AR, 151 Web based training. See Interfacing interactive 3-D simulations with learning systems Weblogs, 383–84 Webster, T., 147 Weidenhausen, J., 150 Weight of display technologies, 25, 55, 59, 60, 145, 227–28
Welch, G., 69 Wellek, S., 249 West Point, 383 Wetzstein, G., 69, 83 WFS (wave field synthesis), 102 Wheaton, G. R., 414 Wheeler, A. C., 143 Whelchel, A., 207 Whitton, Mary C., 257–59 Wickens, C. D., 350, 355, 414 Wide fields of view, 53–54, 56 Wierwille, W. W., 355 Wi-Fi based active radio frequency identification tracking, 31 Wii, 11, 205 Wilkinson, Jeff, 212 Wilson, J. R., 224–25 Witmer, B. G., 244 Woods, D. D., 375 Workload. See Assessment, of cognitive workload World of Warcraft, 185 WorldViz LLC, 33 Wright, M. C., 330–31 Wrist motions, 25 X-IST DataGlove, 28 XML (Extensible Markup Language), 307–8 XNA game development environment, 151 Yamauchi, T., 395 Yeh, M., 414 Zeisig, R. L., 288 Zhai, S., 143 Zimmons, P., 247–50, 248f, 249f, 250t
ABOUT THE EDITORS AND CONTRIBUTORS

THE EDITORS

DENISE NICHOLSON, Ph.D., is Director of the Applied Cognition and Training in Immersive Virtual Environments Laboratory at the University of Central Florida’s Institute for Simulation and Training. She holds joint appointments in UCF’s Modeling and Simulation Graduate Program, Industrial Engineering and Management Department, and the College of Optics and Photonics. In recognition of her contributions to the field of Virtual Environments, Nicholson received the Innovation Award in Science and Technology from the Naval Air Warfare Center and has served as an appointed member of the international NATO Panel on “Advances of Virtual Environments for Human Systems Interaction.” She joined UCF in 2005, with more than 18 years of government experience ranging from bench level research at the Air Force Research Lab to leadership as Deputy Director for Science and Technology at NAVAIR Training Systems Division.

DYLAN SCHMORROW, Ph.D., is an international leader in advancing virtual environment science and technology for training and education applications. He has received both the Human Factors and Ergonomics Society Leland S. Kollmorgen Spirit of Innovation Award for his contributions to the field of Augmented Cognition and the Society of United States Naval Flight Surgeons Sonny Carter Memorial Award in recognition of his career improving the health, safety, and welfare of military operational forces. Schmorrow is a Commander in the U.S. Navy and has served at the Office of the Secretary of Defense, the Office of Naval Research, the Defense Advanced Research Projects Agency, the Naval Research Laboratory, the Naval Air Systems Command, and the Naval Postgraduate School. He is the only naval officer to have received the Navy’s Top Scientist and Engineers Award.

JOSEPH COHN, Ph.D., is a Lieutenant Commander in the U.S. Navy and a full member of the Human Factors and Ergonomics Society, the American Psychological Association, and the Aerospace Medical Association. Selected as the Potomac Institute for Policy Studies’ 2006 Lewis and Clark Fellow, Cohn has more than 60 publications in scientific journals, edited books, and conference proceedings and has given numerous invited lectures and presentations.

THE CONTRIBUTORS

G. VINCENT AMICO, Ph.D., is one of the pioneers of simulation, with over 50 years of involvement in the industry. He is one of the principal agents behind the growth of the simulation industry, both in Central Florida and nationwide. He began his simulation career in 1948 as a project engineer in the flight trainers branch of the Special Devices Center, a facility now known as NAVAIR Orlando. During this time, he made significant contributions to simulation science. He was one of the first to use commercial digital computers for simulation, and in 1966, he chaired the first I/ITSEC Conference, the now well-established annual simulation, training, and education meeting. By the time he retired in 1981, he had held both the Director of Engineering and the Director of Research positions within NAVAIR Orlando. Amico has been the recipient of many professional honors, including the I/ITSEC Lifetime Achievement Award, the Society for Computer Simulation Presidential Award, and an honorary Ph.D. in Modeling and Simulation from the University of Central Florida. The NCS created “The Vince Amico Scholarship” for deserving high school seniors interested in pursuing study in simulation, and in 2001, in recognition of his unselfish commitment to simulation technology and training, Orlando mayor Glenda Hood designated December 12, 2001, as “Vince Amico Day.”

JOE ARMSTRONG is a Senior Consultant with CAE Professional Services in Ottawa, Canada, with a background in Cognitive Psychology, Human Factors Engineering, and computational modeling. He has a specific interest in the development and application of computational models of human behavior across a range of applications including R&D, acquisition support, and training.

ÇAĞATAY BAŞDOĞAN, Ph.D., is a faculty member at Koç University and conducts interdisciplinary research in the areas of haptics and virtual environments. Before joining Koç University, he worked at NASA-JPL/Caltech, MIT, and Northwestern University Research Park. He has a Ph.D. degree in mechanical engineering from Southern Methodist University.

MARK BOLAS is an Associate Professor of Interactive Media in the University of Southern California’s School of Cinematic Arts, an IEEE award recipient, and Director of Fakespace Labs. His work explores using HMDs and augmented reality to create virtual environments that engage perception and cognition, making visceral synthesized memories possible.

CLINT BOWERS is a Professor of Psychology and Digital Media at the University of Central Florida. His research interests include the use of technology for individual and team learning.
FRED BROOKS is Kenan Professor, UNC–Chapel Hill. He was Corporate Project Manager for the IBM System/360 hardware and software. He founded UNC’s Computer Science Department. His books include The Mythical Man-Month and (with Blaauw) Computer Architecture: Concepts and Evolution. Brooks received the National Medal of Technology and the Turing Award.

BRAD CAIN is a Defence Scientist and Professional Engineer at Defence Research and Development Canada, with a background in computational modeling for the Canadian Forces. His research interests include human behavior and performance modeling for application in simulation based acquisition and distributed training systems using virtual agents that incorporate human sciences knowledge.

JAN CANNON-BOWERS is a Senior Research Associate at UCF’s Institute for Simulation and Training and Director for Simulation Initiatives at the College of Medicine. Her research interests include the application of technology to the learning process. In particular, she has been active in developing synthetic learning environments for a variety of task environments.

CURTIS CONKEY, from NAWC-TSD-US Navy, is the Principal Investigator for the Learning Technologies Lab, whose primary charter is to investigate emerging technologies for training. Curtis holds a bachelor’s degree in Electronics Engineering and a master’s degree in Computer Science and is a doctoral student in Modeling and Simulation.

LARRY DAVIS, Ph.D., is a Research Associate at the Institute for Simulation and Training at the University of Central Florida. He is a member of the Applied Cognition and Training in Immersive Virtual Environments (ACTIVE) Laboratory, where he is conducting research in interfaces and technology for use in virtual environments.

PATRICIA DENBROOK is a software developer and researcher in the fields of computer graphics, virtual reality, and advanced user interfaces. As a member of the Naval Research Laboratory’s Immersive Simulation Laboratory group, she is the software architect of the Gaiter and Pointman dismounted infantry simulation interfaces.

JULIE DREXLER is the Associate Director for Human-Systems Integration and Engineering Management in the ACTIVE Lab at UCF’s Institute for Simulation and Training. She earned an M.S. in Human Engineering/Ergonomics and a Ph.D. in Industrial Engineering from the University of Central Florida. She has over 12 years of experience as a human factors research professional.

MICA ENDSLEY, Ph.D., is President of SA Technologies, a small business specializing in research, design, and training activities related to cognition and situation awareness. She has conducted research studies on a variety of issues related to situation awareness, including, but not limited to, investigations of human error and analyses of situation awareness requirements in numerous domains.

STEVEN FEINER, Ph.D., is a Professor of Computer Science at Columbia University, where he directs the Computer Graphics and User Interfaces Laboratory. He is co-author of Computer Graphics: Principles and Practice (Addison-Wesley) and currently serves as general co-chair for ACM Virtual Reality Software and Technology 2008.

HENRY FUCHS is Federico Gil Professor of Computer Science and Adjunct Professor of Biomedical Engineering at UNC–Chapel Hill. His interests include projector-camera systems, virtual environments, tele-presence, and medical applications. He is a member of the National Academy of Engineering and recipient of the 1992 ACM-SIGGRAPH Achievement Award.

STEPHEN GOLDBERG, Ph.D., is the Chief of the Orlando Research Unit of the U.S. Army Research Institute. He received a doctorate in Cognitive Psychology from the State University of New York at Buffalo. He supervises a research program focused on feedback processes and training in virtual simulations and games.

ANDREW GORDON is a Research Associate Professor of Computer Science at the Institute for Creative Technologies at the University of Southern California. He received his Ph.D. from Northwestern University in 1999. He is the author of the book Strategy Representation: An Analysis of Planning Knowledge.

MICHAEL GUERRERO is a senior engineer at Delta3D specializing in graphics technologies. He has contributed to commercial games on both the Nintendo DS and the PC and is now pushing the state of the art in simulation, where he makes extensive use of vertex and pixel shaders to enhance the quality of Delta3D’s applications.

Major STEVEN HENDERSON holds an M.S. in Systems Engineering from the University of Arizona and a B.S. in Computer Science from the United States Military Academy. He is currently pursuing a Ph.D. in Computer Science at Columbia University as part of the U.S. Army’s Advanced Civil Schooling Program.

BRAD HOLLISTER holds a B.Sc. degree in Biochemistry and an M.Sc. degree in Computer Science from Clemson University. His interests span many fields. His professional endeavors are primarily associated with real time computer graphics. For most of his career, he has been employed in the simulation industry as a software engineer.
AMANDA HOWEY received the B.S. degree in psychology from Eckerd College and the M.S. degree in modeling and simulation, human systems track, from the University of Central Florida. She is currently working toward the Ph.D. degree in applied experimental and human factors psychology from the University of Central Florida.

LEWIS JOHNSON, Ph.D., is President and Chief Scientist of Alelo TLT LLC and formerly a Research Professor at the Information Sciences Institute of USC. His current research focuses on the adoption of interactive learning environments. He holds a B.A. in linguistics from Princeton University and a Ph.D. in computer science from Yale University.

TYLER JOHNSON is a Ph.D. student in the Department of Computer Science at the University of North Carolina at Chapel Hill. Since graduating from North Carolina State University in 2005, his doctoral work has focused on continuous calibration techniques for multiprojector displays.

DAVID KABER, Ph.D., is professor of industrial and systems engineering at North Carolina State University. He also directs the Ergonomics Laboratory and is an associate faculty member in psychology. He received his Ph.D. from Texas Tech University in 1996 and has published research on presence and situation awareness in virtual environment simulations.

ROBERT KENNEDY, Ph.D., has been a Human Factors Psychologist for over 48 years and has conducted projects with numerous agencies, including DoD, NASA, NSF, DOT, and NIH, on training and adaptation, human performance, and motion/VE sickness. He is also an Adjunct Professor at the University of Central Florida.

DON LAMPTON is a Research Psychologist with the U.S. Army Research Institute (ARI) for the Behavioral and Social Sciences. He is the co-developer of the Virtual Environments Performance Assessment Battery (VEPAB), the Fully Immersive Team Training (FITT) system, and the Dismounted Infantry Virtual After Action Review System (DIVAARS).

H. CHAD LANE, Ph.D., is a Research Scientist at the USC Institute for Creative Technologies who specializes in artificial intelligence, intelligent tutoring systems, and cognitive modeling. His research focuses on learning in game based and immersive environments. He holds a Ph.D. in Computer Science from the University of Pittsburgh, earned in 2004.

R. BOWEN LOFTIN holds a B.S. (physics) from Texas A&M University (1970) and a Ph.D. (physics) from Rice University (1975). Bowen is Vice President and Chief Executive Officer of Texas A&M University at Galveston, Professor of Maritime Systems Engineering, and Professor of Industrial and Systems Engineering at Texas A&M University.

JAMES LUSSIER, Ph.D., has worked for the U.S. Army Research Institute for the Behavioral and Social Sciences since 1984. He is the Chief of the ARI-Fort Bragg Scientific Coordination Office supporting the U.S. Army Special Operations Command and the ARI-Fort Knox Research Unit engaged in battle command and unit-focused training research.

LINDA MALONE is a Professor of Industrial Engineering. She is the co-author of a statistics text and has authored or co-authored over 75 refereed papers. She has been an associate editor of several journals. She is a Fellow of the American Statistical Association.

GLENN MARTIN is a Senior Research Scientist at the University of Central Florida’s Institute for Simulation and Training, where he leads the Interactive Realities Laboratory, pursuing research in multimodal, physically realistic, networked virtual environments and applications of virtual reality technology.

DANNY MCCUE realized his dream of developing commercial video games professionally after graduating with a Bachelor of Science in Computer Science and a minor in Education from the University of California at Santa Cruz. He now works at the MOVES Institute at the Naval Postgraduate School in Monterey, California.

IAN MCDOWALL has been involved in the development of stereo displays since the early 1990s. He is one of the founders of Fakespace Labs and has worked on the design and integration of many different stereo displays.

PERRY MCDOWELL is a Research Associate at the MOVES Institute and the Executive Director for the Delta3D open source game engine developed there. A former naval officer, he has a B.S. in Naval Architecture from the U.S. Naval Academy and an MSCS from the Naval Postgraduate School. He currently teaches graphics and develops simulations for training, primarily military.

LARRY MELIZA earned his doctorate in psychology from the University of Arizona prior to joining the U.S. Army Research Institute. He has over 30 years of research and development experience in the measurement and design of collective training with a focus on feedback issues.

DENISE NICHOLSON, Ph.D., is the Director of the Applied Cognition and Training in Immersive Virtual Environments Laboratory at the University of Central Florida’s Institute for Simulation and Training (IST). Her additional UCF appointments include the Modeling and Simulation Graduate Program, the Department of Industrial Engineering and Management Systems, and the College of Optics and Photonics/CREOL.
ROBERT PAGE has 20 years of experience as a Computer Scientist. Mr. Page is a member of the Immersive Simulation Section at the U.S. Naval Research Laboratory, where he is the software architect responsible for designing virtual environments. Mr. Page received his M.S. in Computer Science from George Washington University.

MIKEL PETTY is Director of the University of Alabama in Huntsville’s Center for Modeling, Simulation, and Analysis. He has worked in modeling and simulation research and development since 1990. He received a Ph.D. in Computer Science from the University of Central Florida in 1997.

JENNIFER RILEY, Ph.D., is a Senior Research Associate with SA Technologies. She received her doctoral degree in Engineering from Mississippi State University, with an emphasis on human factors and cognitive engineering. Riley has conducted and published research on training for situation awareness and presence in virtual environments.

RAMY SADEK is a developer and researcher at the USC Institute for Creative Technologies. His research focuses on immersive audio rendering and high performance audio software architectures for virtual environments.

SCOTT SHADRICK, Ph.D., is a Team Leader and Senior Research Psychologist at the U.S. Army Research Institute’s Fort Knox Research Unit. He has conducted research on the acceleration of adaptive performance, training complex cognitive skills, cognitive task analysis and knowledge elicitation techniques, performance assessment, and leader development.

MOHAMED SHEIK-NAINAR, Ph.D., is a Usability Research Scientist at Synaptics Inc. He received his doctoral degree in Industrial Engineering from North Carolina State University, specializing in Ergonomics. He is currently conducting research on input methodologies, specifically touch interactions for mobile devices.

LINDA SIBERT has over 25 years of experience in the field of human-computer interaction, 19 at the Naval Research Laboratory. Her recent work is in the design and evaluation of interfaces for virtual reality urban combat training systems. Ms. Sibert frequently reviews for journals and conferences and has numerous publications.

MIKE SINGER, Ph.D., is a research psychologist at the Army Research Institute for the Behavioral & Social Sciences. He received his Ph.D. in Cognitive Psychology from the University of Maryland in 1985 and has over 25 years of experience conducting training research, authoring over 75 reports, journal articles, papers, and chapters.
BRENT SMITH has served as Chief Technology Officer for Engineering & Computer Simulations since 1997. While at ECS, he has performed extensive research in the areas of collaborative distributed learning architectures, distributed simulations, and the use of commercial gaming technologies as educational tools for the U.S. military.

RENÉE STOUT received her Ph.D. in Human Factors Psychology from the University of Central Florida in 1994, has worked in the areas of training research, design, and development and human performance measurement for more than 20 years, and has more than 100 publications and professional conference presentations.

JAMES TEMPLEMAN is Professor of Psychology and Management at the University of Central Florida. He received the Distinguished Scientific Contribution Award from the Society for Industrial and Organizational Psychology and is a Fellow in the Society for Industrial and Organizational Psychology, the American Psychological Association, and the American Psychological Society.

HERMAN TOWLES is a Senior Research Engineer in the Department of Computer Science at the University of North Carolina at Chapel Hill. With over 30 years of graphics and video experience, he has been developing projective display systems and camera based calibration methodologies since joining UNC in 1998.

GREG WELCH is a Research Associate Professor of Computer Science at UNC–Chapel Hill. He works on motion tracking systems and telepresence. Prior to UNC he worked on the Voyager Spacecraft Project at NASA’s Jet Propulsion Laboratory and on airborne electronic countermeasures at Northrop-Grumman’s Defense Systems Division.

JEREMY WENDT is a Research Assistant and Graduate Student at the University of North Carolina at Chapel Hill. His research has included modeling and rendering fluids, real time shadow generation, and virtual environments, focused primarily on locomotion interfaces. He works on a company-wide build and test system at NVIDIA.

MARY WHITTON is Research Associate Professor of Computer Science at the University of North Carolina at Chapel Hill. She develops and evaluates techniques to make virtual environments effective for applications such as training and rehabilitation. Ms. Whitton has an M.S. in Electrical Engineering from North Carolina State University.
The PSI Handbook of Virtual Environments for Training and Education
DEVELOPMENTS FOR THE MILITARY AND BEYOND
Volume 3: Integrated Systems, Training Evaluations, and Future Directions
Edited by Joseph Cohn, Denise Nicholson, and Dylan Schmorrow
Technology, Psychology, and Health
PRAEGER SECURITY INTERNATIONAL
Westport, Connecticut • London
Library of Congress Cataloging-in-Publication Data The PSI handbook of virtual environments for training and education : developments for the military and beyond. p. cm. – (Technology, psychology, and health, ISSN 1942–7573 ; v. 1-3) Includes bibliographical references and index. ISBN 978–0–313–35165–5 (set : alk. paper) – ISBN 978–0–313–35167–9 (v. 1 : alk. paper) – ISBN 978–0–313–35169–3 (v. 2 : alk. paper) – ISBN 978–0–313–35171–6 (v. 3 : alk. paper) 1. Military education–United States. 2. Human-computer interaction. 3. Computer-assisted instruction. 4. Virtual reality. I. Schmorrow, Dylan, 1967- II. Cohn, Joseph, 1969- III. Nicholson, Denise, 1967- IV. Praeger Security International. V. Title: Handbook of virtual environments for training and education. VI. Title: Praeger Security International handbook of virtual environments for training and education. U408.3.P75 2009 355.0078’5–dc22 2008027367 British Library Cataloguing in Publication Data is available. Copyright © 2009 by Joseph Cohn, Denise Nicholson, and Dylan Schmorrow All rights reserved. No portion of this book may be reproduced, by any process or technique, without the express written consent of the publisher. Library of Congress Catalog Card Number: 2008027367 ISBN-13: 978–0–313–35165–5 (set) 978–0–313–35167–9 (vol. 1) 978–0–313–35169–3 (vol. 2) 978–0–313–35171–6 (vol. 3) ISSN: 1942–7573 First published in 2009 Praeger Security International, 88 Post Road West, Westport, CT 06881 An imprint of Greenwood Publishing Group, Inc. www.praeger.com Printed in the United States of America
The paper used in this book complies with the Permanent Paper Standard issued by the National Information Standards Organization (Z39.48–1984). 10 9 8 7 6 5 4 3 2 1
To our families, and to the men and women who have dedicated their lives to educate, train, and defend to keep them safe
CONTENTS

Series Foreword xi
Preface by G. Vincent Amico xiii
Acknowledgments xvii

SECTION 1: INTEGRATED TRAINING SYSTEMS
Section Perspective, by Neal Finkelstein 1

Part I: Systems Engineering and Human-Systems Integration 9
Chapter 1: Systems Engineering Approach for Research to Improve Technology Transition, by Denise Nicholson and Stephanie Lackey 9
Chapter 2: Human-Systems Integration for Naval Training Systems, by Katrina Ricci, John Owen, James Pharmer, and Dennis Vincenzi 18
Chapter 3: Virtual Environments and Unmanned Systems: Human-Systems Integration Issues, by John Barnett 27

Part II: Defense Training Examples 33
Chapter 4: U.S. Marine Corps Deployable Virtual Training Environment, by Pete Muller, Richard Schaffer, and James McDonough 33
Chapter 5: Infantry and Marksmanship Training Systems, by Roy Stripling, Pete Muller, Richard Schaffer, and Joseph Cohn 41
Chapter 6: Fielded Navy Virtual Environment Training Systems, by Daniel Patton, Long Nguyen, William Walker, and Richard Arnold 50
Chapter 7: Virtual Technologies for Training: Interactive Multisensor Analysis Training, by Sandra Wetzel-Smith and Wallace Wulfeck II 62
Chapter 8: A Virtual Environment Application: Distributed Mission Operations, by Dee Andrews and Herbert Bell 77
Chapter 9: Virtual Environments in Army Combat Systems, by Henry Marshall, Gary Green, and Carl Hobson 85
Chapter 10: DAGGERS: A Dismounted Soldier Embedded Training and Mission Rehearsal System, by Pat Garrity and Juan Vaquerizo 92
Chapter 11: Medical Simulation Training Systems, by M. Beth Pettitt, Michelle Mayo, and Jack Norfleet 99
Chapter 12: Aviation Training Using Physiological and Cognitive Instrumentation, by Tom Schnell and Todd Macuda 107
Chapter 13: Virtual Environment Lessons Learned, by Jeffrey Moss and Michael White 117

Part III: Game Based Training 125
Chapter 14: So You Want to Use a Game: Practical Considerations in Implementing a Game Based Trainer, by John Hart, Timothy Wansbury, and William Pike 125
Chapter 15: Massively Multiplayer Online Games for Military Training: A Case Study, by Rodney Long, David Rolston, and Nicole Coeyman 131

Part IV: International Training Examples 138
Chapter 16: A Survey of International Virtual Environment Research and Development Contributions to Training, by Robert Sottilare 138

SECTION 2: TRAINING EFFECTIVENESS AND EVALUATION
Section Perspective, by Eric Muth and Fred Switzer 147

Part V: Factors for Training Effectiveness and Evaluation 157
Chapter 17: Training Effectiveness Evaluation: From Theory to Practice, by Joseph Cohn, Kay Stanney, Laura Milham, Meredith Bell Carroll, David Jones, Joseph Sullivan, and Rudolph Darken 157
Chapter 18: Transfer Utility–Quantifying Utility, by Robert C. Kennedy and Robert S. Kennedy 173
Chapter 19: Instrumenting for Measuring, by Adam Hoover and Eric Muth 184

Part VI: Relevance of Fidelity in Training Effectiveness and Evaluation 196
Chapter 20: Identical Elements Theory: Extensions and Implications for Training and Transfer, by David Dorsey, Steven Russell, and Susan White 196
Chapter 21: Assessment and Prediction of Effectiveness of Virtual Environments: Lessons Learned from Small Arms Simulation, by Stuart Grant and George Galanis 206
Chapter 22: Simulation Training Using Fused Reality, by Ed Bachelder, Noah Brickman, and Matt Guibert 217
Chapter 23: Dismounted Combatant Simulation Training Systems, by Bruce Knerr and Stephen Goldberg 232

Part VII: Training Effectiveness and Evaluation Applications 243
Chapter 24: Conducting Training Transfer Studies in Complex Operational Environments, by Roberto Champney, Laura Milham, Meredith Bell Carroll, Ali Ahmad, Kay Stanney, Joseph Cohn, and Eric Muth 243
Chapter 25: The Application and Evaluation of Mixed Reality Simulation, by Darin Hughes, Christian Jerome, Charles Hughes, and Eileen Smith 254
Chapter 26: Trends and Perspectives in Augmented Reality, by Brian Goldiez and Fotis Liarokapis 278
Chapter 27: Virtual Environment Helicopter Training, by Joseph Sullivan, Rudolph Darken, and William Becker 290
Chapter 28: Training Effectiveness Experimentation with the USMC Deployable Virtual Training Environment—Combined Arms Network, by William Becker, C. Shawn Burke, Lee Sciarini, Laura Milham, Meredith Bell Carroll, Richard Schaffer, and Deborah Wilbert 308
Chapter 29: Assessing Collective Training, by Thomas Mastaglio and Phillip Jones 324

SECTION 3: FUTURE DIRECTIONS
Section Perspective, by Rudolph Darken and Dylan Schmorrow 337

Part VIII: Future Visions 341
Chapter 30: In the Uncanny Valley, by Judith Singer and Alexander Singer; Introduction by Kathleen Bartlett 341
Chapter 31: Trends in Modeling, Simulation, Gaming, and Everything Else, by Jack Thorpe 349
Chapter 32: Technological Prospects for a Personal Virtual Environment, by Randall Shumaker 355

Part IX: Military and Industry Perspectives 371
Chapter 33: The Future of Navy Training, by Alfred Harms Jr. 371
Chapter 34: The Future of Marine Corps Training, by William Yates, Gerald Mersten, and James McDonough 377
Chapter 35: The Future of Virtual Environment Training in the Army, by Roger Smith 386
Chapter 36: Future Air Force Training, by Daniel Walker and Kevin Geiss 392
Chapter 37: Factors Driving Three-Dimensional Virtual Medical Education, by James Dunne and Claudia McDonald 399
Chapter 38: Virtual Training for Industrial Applications, by Dirk Reiners 405
Chapter 39: Corporate Training in Virtual Environments, by Robert Gehorsam 413

Part X: Next Generation Concepts and Technologies 420
Chapter 40: Virtual Environment Displays, by Carolina Cruz-Neira and Dirk Reiners 420
Chapter 41: Mindscape Retuning and Brain Reorganization with Hybrid Universes: The Future of Virtual Rehabilitation, by Cali Fidopiastis and Mark Wiederhold 427
Chapter 42: Personal Learning Associates and the New Learning Environment, by J. D. Fletcher 435
Chapter 43: The Future of Museum Experiences, by Lori Walters, Eileen Smith, and Charles Hughes 444

Acronyms 453
Index 461
About the Editors and Contributors 491
SERIES FOREWORD
LAUNCHING THE TECHNOLOGY, PSYCHOLOGY, AND HEALTH DEVELOPMENT SERIES

The escalating complexity and operational tempo of the twenty-first century require that people in all walks of life acquire ever-increasing knowledge, skills, and abilities. Training and education strategies are dynamically changing toward delivery of more effective instruction and practice, wherever and whenever needed. In the last decade, the Department of Defense has made significant investments to advance the science and technology of virtual environments to meet this need. Throughout this time we have been privileged to collaborate with some of the brightest minds in science and technology. The intention of this three-volume handbook is to provide comprehensive coverage of the emerging theories, technologies, and integrated demonstrations of the state of the art in virtual environments for training and education.

As Dr. G. Vincent Amico states in the Preface, an important lesson to draw from the history of modeling and simulation is the importance of process. The human systems engineering process requires highly multidisciplinary teams to integrate diverse disciplines from psychology, education, engineering, and computer science (see Nicholson and Lackey, Volume 3, Section 1, Chapter 1). This process drives the organization of the handbook.

While other texts on virtual environments (VEs) focus heavily on technology, we have dedicated the first volume to a thorough investigation of learning theories, requirements definition, and performance measurement. The second volume provides the latest information on a range of virtual environment component technologies and a distinctive section on training support technologies. In the third volume, an extensive collection of integrated systems is discussed as virtual environment use-cases, along with a section of training effectiveness evaluation methods and results. Volume 3, Section 3 highlights future applications of this evolving technology, ranging from cognitive rehabilitation to the next generation of museum exhibitions. Finally, a glimpse into the potential future of VEs is provided as an original short story entitled “In the Uncanny Valley” from Judith Singer and Hollywood director Alex Singer.
Through our research we have experienced rapid technological and scientific advancements, coinciding with a dramatic convergence of research achievements representing contributions from numerous fields, including neuroscience, cognitive psychology and engineering, biomedical engineering, computer science, and systems engineering. Historically, psychology and technology development were independent research areas practiced by scientists and engineers primarily trained in one of these disciplines. In recent years, however, individuals in these disciplines, such as the close to 200 authors of this handbook, have found themselves increasingly working within a unified framework that completely blurs the lines of these discrete research areas, creating an almost “metadisciplinary” (as opposed to multidisciplinary) form of science and technology. The strength of the confluence of these two disciplines lies in the complementary research and development approaches being employed and the interdependence that is required to achieve useful technological applications.

Consequently, with this handbook we begin a new Praeger Security International book series entitled Technology, Psychology, and Health, intended to capture the remarkable advances that will be achieved through the continued seamless integration of these disciplines, where unified and simultaneously executed approaches of psychology, engineering, and practice will result in more effective science and technology applications. Therefore, the esteemed contributors to the Technology, Psychology, and Health Development Series strive to capture such advancements and effectively convey both the practical and theoretical elements of the technological innovations they describe.

The Technology, Psychology, and Health Development Series will continue to address the general themes of requisite foundational knowledge, emergent scientific discoveries, and practical lessons learned, as well as cross-discipline standards, methodologies, metrics, techniques, practices, and visionary perspectives and developments. The series plans to showcase substantial advances in research and development methods and their resulting technologies and applications. Cross-disciplinary teams will provide detailed reports of their experiences applying technologies in diverse areas, from basic academic research to industrial and military fielded operational and training systems to everyday computing and entertainment devices.

A thorough and comprehensive consolidation and dissemination of psychology and technology development efforts is no longer a noble academic goal; it is a twenty-first century necessity dictated by the desire to ensure that our global economy and society realize their full scientific and technological potentials. Accordingly, this ongoing book series is intended to be an essential resource for a large international audience of professionals in industry, government, and academia. We encourage future authors to contact us for more information or to submit a prospectus idea.

Dylan Schmorrow and Denise Nicholson
Technology, Psychology, and Health Development Series Editors
PREFACE
G. Vincent Amico

It is indeed an honor and pleasure to write the preface to this valuable collection of articles on simulation for education and training. The fields of modeling and simulation are playing an increasingly important role in society. You will note that the collection is titled virtual environments for training and education. I believe it is important to recognize the distinction between those two terms. Education is oriented to providing fundamental scientific and technical skills; these skills lay the groundwork for training. Simulations for training are designed to help operators of systems effectively learn how to operate those systems under a variety of conditions, both normal and emergency. Cognitive, psychomotor, and affective behaviors must all be addressed. Hence, psychologists play a dominant role within multidisciplinary teams of engineers and computer scientists for determining the effective use of simulation for training. Of course, the U.S. Department of Defense’s Human Systems Research Agencies, that is, the Office of the Secretary of Defense, Office of Naval Research, Air Force Research Lab, Army Research Laboratory, and Army Research Institute, also play a primary role; their budgets support many of the research activities in this important field.

Volume 1, Section 1 in this set addresses many of the foundational learning issues associated with the use of simulation for education and training. These chapters will certainly interest psychologists, but they are also written so that technologists and other practitioners can glean some insight into the important science surrounding learning. Throughout the set, training technologies are explored in more detail. In particular, Volume 2, Sections 1 and 2 include several diverse chapters demonstrating how learning theory can be effectively applied to simulation for training.

The use of simulation for training goes back to the beginning of time. As early as 2500 B.C., ancient Egyptians used figurines to simulate warring factions. The precursors of modern robotic simulations can be traced back to ancient China, from which we have documented reports (circa 200 B.C.) of artisans constructing mechanical automata, elaborate mechanical simulations of people or animals. These ancient “robots” included life-size mechanical humanoids, reportedly capable of movement and speech (Kurzweil, 1990; Needham, 1986). In those early days, these mechanical devices were used to train soldiers in various phases of combat, and military tacticians used war games to develop strategies.

Simulation technology as we know it today became viable only in the early twentieth century. Probably the most significant event was Ed Link’s development of the Link Trainer (aka the “Blue Box”) for pilot training; he applied for its patent in 1929. Yet simulation did not play a major role in training until the start of World War II (in 1941), when Navy Captain Luis de Florez established the Special Devices Desk at the Bureau of Aeronautics. His organization expanded significantly in the next few years as the value of simulation for training became recognized. Captain de Florez is also credited with the development of the first flight simulation that was driven by an analog computer. Developed in 1943, his simulator, called the operational flight trainer, modeled the PBM-3 aircraft.

In the period after World War II, simulators and simulation science grew exponentially based upon the very successful programs initiated during the war. There are two fundamental components of any modern simulation system. One is a sound mathematical understanding of the object to be simulated. The other is the real time implementation of those models in computational systems. In the late 1940s the primary computational systems were analog. Digital computers were very expensive, very slow, and could not solve equations in real time. It was not until the late 1950s and early 1960s that digital computation became viable; the first navy simulator to use a commercial digital computer was the Attack Center Trainer at the FBM Facility (New London, Connecticut) in 1959. Thus, it has been only for the past 50 years that simulation has made major advancements. Even today, it is typical that user requirements for capability exceed the ability of available technology. There are many areas where this is particularly true, including rapid creation of visual simulation from actual terrain environment databases and human behavior representations spanning cognition to social networks. The dramatic increases in digital computer speed and capacity have significantly closed the gap, but there are still requirements that cannot be met; these gaps define the next generation of science and technology research questions.

In the past decade or so, a number of major simulation initiatives have developed, including distributed interactive simulation, advanced medical simulation, and augmented cognition supported simulation. Distributed simulation enables many different units to participate in a joint exercise, regardless of where the units are located. The requirements for individual simulations to engage in such exercises are mandated by Department of Defense standards, that is, high level architecture and distributed interactive simulation. An excellent example of the capabilities that have resulted is the unprecedented number of virtual environment simulations that have transitioned from the Office of Naval Research’s Virtual Technologies and Environments (VIRTE) Program to actual military training applications discussed throughout this handbook. The second area of major growth is the field of medical simulation. The development of the human patient simulator clearly heralded this next phase of medical simulation based training, and the field of medical simulation will certainly expand during the next decade. Finally, the other exciting development in recent years is the exploration of augmented cognition, which may eventually enable system users to completely forgo standard computer interfaces and work seamlessly with their equipment through the utilization of neurophysiological sensing.

Now let us address some of the issues that occur during the development process of a simulator. The need for simulation usually begins when a customer experiences problems training operators in the use of certain equipment or procedures; this is particularly true in the military. The need must then be formalized into a requirements document, and naturally, the search for associated funding and development of a budget ensues. The requirements document must then be converted into a specification or a work statement. That then leads to an acquisition process, resulting in a contract. The contractor must then convert that specification into a hardware and software design. This process takes time and is subject to numerous changes in interpretation and direction. The proof of the pudding comes when the final product is evaluated to determine if the simulation meets the customer’s needs.

One of the most critical aspects of any modeling and simulation project is to determine its effectiveness and whether it meets the original objectives. This may appear to be a rather straightforward task, but it is actually very complex. First, it is extremely important that checks are conducted at various stages of the development process. During the conceptual stages of a project, formal reviews are normally conducted to ensure that the requirements are properly stated; those same reviews are also conducted at the completion of the work statement or specification. During the actual development process, periodic reviews should be conducted at key stages. When the project is completed, tests should be conducted to determine if the simulation meets the design objectives and stated requirements. The final phase of testing is validation, whose purpose is to determine if the simulation meets the customer’s needs. Why is this process of testing so important? The entire development process is lengthy, and during that process there is a very high probability that changes will be introduced. The only way to manage the overall process is by performing careful inspections at each major phase of the project. As the organization and content of this handbook make evident, this process has been the fundamental framework for conducting most of today’s leading research and development initiatives. Following section to section, the reader is guided through the requirements, development, and evaluation cycle. The reader is then challenged to imagine the state of the possible in the final, Future Directions, section.

In summary, one can see that the future of simulation to support education and training is beyond our comprehension. That does not mean, however, that the development process requires any less care; the key issues that must be addressed were cited earlier. There is one fact that one must keep in mind: no simulation is perfect. But with care, keeping the simulation objectives in line with the capabilities of modeling and implementation, success can be achieved. This is demonstrated by the number of simulations that are being used today in innovative settings to improve training for a wide range of applications.

REFERENCES

Kurzweil, R. (1990). The age of intelligent machines. Cambridge, MA: MIT Press.
Needham, J. (1986). Science and civilization in China: Volume 2. Cambridge, United Kingdom: Cambridge University Press.
ACKNOWLEDGMENTS
These volumes are the product of many contributors working together. Leading the coordination activities were a few key individuals whose efforts made this project a reality:

Associate Editor: Julie Drexler
Technical Writer: Kathleen Bartlett
Editing Assistants: Kimberly Sprouse and Sherry Ogreten

We would also like to thank our Editorial Board and Review Board members, as follows:

Editorial Board
John Anderson, Carnegie Mellon University; Kathleen Bartlett, Florida Institute of Technology; Clint Bowers, University of Central Florida, Institute for Simulation and Training; Gwendolyn Campbell, Naval Air Warfare Center, Training Systems Division; Janis Cannon-Bowers, University of Central Florida, Institute for Simulation and Training; Rudolph Darken, Naval Postgraduate School, The MOVES Institute; Julie Drexler, University of Central Florida, Institute for Simulation and Training; Neal Finkelstein, U.S. Army Research Development & Engineering Command; Bowen Loftin, Texas A&M University at Galveston; Eric Muth, Clemson University, Department of Psychology; Sherry Ogreten, University of Central Florida, Institute for Simulation and Training; Eduardo Salas, University of Central Florida, Institute for Simulation and Training and Department of Psychology; Kimberly Sprouse, University of Central Florida, Institute for Simulation and Training; Kay Stanney, Design Interactive, Inc.; Mary Whitton, University of North Carolina at Chapel Hill, Department of Computer Science
Review Board (by affiliation)
Advanced Brain Monitoring, Inc.: Chris Berka; Alion Science and Tech.: Jeffery Moss; Arizona State University: Nancy Cooke; AuSIM, Inc.: William Chapin; Carlow International, Inc.: Tomas Malone; CHI Systems, Inc.: Wayne Zachary; Clemson University: Pat Raymark, Patrick Rosopa, Fred Switzer, Mary Anne Taylor; Creative Labs, Inc.: Edward Stein; Deakin University: Lemai Nguyen; Defense Acquisition University: Alicia Sanchez; Design Interactive, Inc.: David Jones; Embry-Riddle Aeronautical University: Elizabeth Blickensderfer, Jason Kring; Human Performance Architects: Richard Arnold; Iowa State University: Chris Harding; Lockheed Martin: Raegan Hoeft; Max Planck Institute: Betty Mohler; Michigan State University: J. Kevin Ford; NASA Langley Research Center: Danette Allen; Naval Air Warfare Center, Training Systems Division: Maureen Bergondy-Wilhelm, Curtis Conkey, Joan Johnston, Phillip Mangos, Carol Paris, James Pharmer, Ronald Wolff; Naval Postgraduate School: Barry Peterson, Perry McDowell, William Becker, Curtis Blais, Anthony Ciavarelli, Amela Sadagic, Mathias Kolsch; Occidental College: Brian Kim; Office of Naval Research: Harold Hawkins, Roy Stripling; Old Dominion University: James Bliss; Pearson Knowledge Tech.: Peter Foltz; PhaseSpace, Inc.: Tracy McSherry; Potomac Institute for Policy Studies: Paul Chatelier; Renee Stout, Inc.: Renee Stout; SA Technologies, Inc.: Haydee Cuevas, Jennifer Riley; Sensics, Inc.: Yuval Boger; Texas A&M University: Claudia McDonald; The Boeing Company: Elizabeth Biddle; The University of Iowa: Kenneth Brown; U.S. Air Force Academy: David Wells; U.S. Air Force Research Laboratory: Dee Andrews; U.S. Army Program Executive Office for Simulation, Training, & Instrumentation: Roger Smith; U.S. Army Research Development & Engineering Command: Neal Finkelstein, Timothy Roberts, Robert Sottilare; U.S. Army Research Institute: Steve Goldberg; U.S. Army Research Laboratory: Laurel Allender, Michael Barnes, Troy Kelley; U.S. Army TRADOC Analysis Center–Monterey: Michael Martin; U.S. MARCORSYSCOM Program Manager for Training Systems: Sherrie Jones, William W. Yates; University of Alabama in Huntsville: Mikel Petty; University of Central Florida: Glenda Gunter, Robert Kenny, Rudy McDaniel, Tim Kotnour, Barbara Fritzsche, Florian Jentsch, Kimberly Smith-Jentsch, Aldrin Sweeney, Karol Ross, Daniel Barber, Shawn Burke, Cali Fidopiastis, Brian Goldiez, Glenn Martin, Lee Sciarini, Peter Smith, Jennifer Vogel-Walcutt, Steve Fiore, Charles Hughes; University of Illinois: Tomas Coffin; University of North Carolina: Sharif Razzaque, Andrei State, Jason Coposky, Ray Idaszak; Virginia Tech.: Joseph Gabbard; Xavier University: Morrie Mullins
SECTION 1
INTEGRATED TRAINING SYSTEMS
SECTION PERSPECTIVE
Neal Finkelstein
HISTORY

The uses of modeling and simulation for military purposes have come a long way with advances in information and computer technology. Long gone are those early days in the 1930s of the ANT-18 Basic Instrument Trainer, known to tens of thousands of fledgling pilots as Ed Link's Blue Box Trainer. The noisy ANT-18s were guaranteed to offer an individual a flight training course for $85 (Vintage Flying Museum, 2005). The military was soon sold on the promise of being able to provide flight training instruction with instruments alone after seeing a demonstration of the ANT-18 technology under some of the harshest conditions. Shortly thereafter, the Army Air Corps purchased six of Link's trainers for $3,500 apiece, thus launching Link Aviation Devices, Inc., and creating an ever-changing industrial base for military training with simulation. In the decades since the military discovered Link's ANT-18, the modeling and simulation industry has drawn many bright, committed, and innovative men and women working to improve methods for modeled and simulated military training. This section includes chapters authored by experts in the fields of modeling and simulation for military training. The authors address specific training applications that use some of the same concepts once employed in the early ANT-18s, as well as cutting-edge research and development, some of which is still in its infancy, awaiting Moore's, Metcalfe's, and Gilder's laws to move the industry a little farther along its path (Pinto, 2002). Throughout this section, the authors discuss a cross-section of modeling and simulation applications used by the various branches of military service. Some of the applications began from the very outset with specific requirements, goals, and customers, while others started in the minds of engineers,
scientists, and warfighters in search of new ways to improve upon warfighters' abilities. However, no matter how projects are started, developed, or challenged by funding issues or requirements creep, the bottom line is that these applications are developed with the best intentions to aid warfighters by physically, mentally, and emotionally immersing them in specific environments to prepare them for the next phase of training or the ultimate test on the battlefield.

RECENT PAST

As the book editors and chapter leads began to discuss a section on the applied use of simulation for military applications, I was quickly reminded of how far we have come in just the last few decades. Specifically, I thought of a meeting on June 24, 1996, at the Pentagon, when LTC Elms of the Department of the Army Inspector General gave a 90-minute briefing to the Vice Chief of Staff of the Army on modeling and simulation management. During this briefing, the Vice Chief of Staff directed the U.S. Army staff to conduct a functional area assessment of modeling and simulation management, driven by three areas of concern:

a. The army is spending more on modeling and simulation than it can afford.
b. The army does not have total visibility into current modeling and simulation investments, especially operations and maintenance accounts.
c. The army cannot define the value added of its modeling and simulation investments.
Many changes occurred following this briefing, including organizational transitions, the drafting of the Army Investment Plan, and reallocations of funding (Department of the Army, 2005), among many other changes I will not go into within the space of this introduction. However, looking back on that historic meeting, it is important to note how the military, industrial, and academic communities rose to the challenge of answering those questions in the following decade, although some could debate that we still have a long way to go. From either perspective on that debate, walking through the 400-plus exhibit booths at the annual Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC; www.iitsec.org) in Orlando, Florida, which drew over 16,000 attendees from around the globe, reveals that something special has happened in the modeling and simulation community. The I/ITSEC experience and offerings clearly demonstrate the industry's worth, introducing a vast array of new technologies each year and highlighting the drive toward research in fields such as embedded simulation; interoperability for live, virtual, and constructive simulations; and holographic visual imaging. These offerings range from the realistic environments of the large training centers, to the smaller mobile military operations in urban terrain facilities being fielded all over the world, to game based/mobile computing platforms and improvised explosive device trainers. That list clearly demonstrates an industry that has shown its value in a very short time.
Those early questions from a Vice Chief of Staff of the Army can be seen as a testament to the advances of the modeling and simulation community and programs. Many times during the early phases of technology growth, leadership feels an obligation not to stifle that growth (for example, the hundreds of unmanned systems companies competing for the Department of Defense marketplace today), and a "budget pass" is given on funding efficiency in order to further science and that specific growth technology. However, when the big dollars start rolling toward a business area or technology, much more scrutiny is brought to bear. And that increased scrutiny is what the modeling and simulation industry saw in the mid-1990s. There should be no doubt that the big dollars are still rolling into the industry. For example, a 2005 estimate by the National Training Systems Association pegged government spending on U.S. military training, training support, and simulation at roughly 8 percent of overall departmental budgets, about $35 billion. In 2008, that total would be approximately $40 billion. Whether or not this funding has fueled a renewed debate on the three questions presented over a decade ago, there is little debate about the potential benefit of modeling and simulation technologies when executed in the truest sense of best value, with correct pedagogy, and tied to acceptable and validated requirements for the military community. Under these conditions, these technologies easily demonstrate their ability to benefit military, homeland defense, academic, and commercial applications. Although the potential for the modeling and simulation industry is certainly there, and the military is certainly seizing that potential, the field has not been without its challenges and pitfalls. Some of the pitfalls are driven by technological immaturity, others by political immaturity, and still others by the economy of bottom lines. The primary challenges the industry still faces are hardware, software engineering practices, software reuse, standards, interoperability, games management, industry and distribution rights, embedded training, and information protection. The chapters of this section discuss many of these challenges, so, as a means of transition, I will discuss three future trends for the next decade: embedded training, game based simulation, and the move toward Hollywood.

TRENDS IN THE MODELING AND SIMULATION COMMUNITY

Embedded Training

The move toward embedded training has been under way within the U.S. Army since 1983, when the U.S. Army Training and Doctrine Command (TRADOC) made it the preferred way of training the U.S. Army (Witmer & Knerr, 1996). Embedded training has a wide variety of definitions in the army, but one that encompasses the complex nature of the term is given by TRADOC Pamphlet 350-37, Training: Objective Force Embedded Training (OFET) Users' Functional Description (Department of the Army, 2003), which states that it is hardware and/or software, integrated into the overall equipment configuration,
that supports training, assessment, and control of exercises on the operational equipment, with auxiliary equipment and data sources as necessary. Embedded training, when activated, starts a training session, or overlays the system's normal operational mode, to enter a training and assessment mode. A case can be made that, to some degree, embedded training has been successful, especially in the U.S. Navy and the U.S. Air Force, where many embedded training systems have been deployed. In a sense, embedded training may be a little easier to deploy in those services than in the U.S. Army, in which the distribution and numbers of systems far outweigh the ability to move training down to the "boots on the ground." A few embedded training programs, such as the multiple integrated laser engagement system, the tank weapons gunnery simulation system, the precision gunnery system, and some air defense systems, have been highly successful. However, the challenges arise in the extreme complexity needed to meet leaders' vision of embedding all training for future systems on the systems themselves. For example, embedding an advanced gunnery training system into the future combat systems so that a soldier, a crew, and even a platoon can seamlessly train within their actual battle-tested vehicles while steaming across an ocean, waiting in a motor pool, or deployed in a desert is no easy task. All this must be accomplished while limiting or eliminating the need for any new hardware for the training to be accomplished on those systems. Embedded training remains a large challenge for our industry and one that will continue to be with us for some time. One light somewhere near the end of the tunnel for embedded training is the reduction of operations and maintenance costs and of negative training, since the actual battle-tested system is used with concurrent software revisions. As long as the first thought in embedded training is good pedagogy and solid requirements, embedded training has tremendous potential to save costs, provide solid returns on investment, and show great benefits. This would have made the Vice Chief of Staff of the Army smile.

Games—Individual, Team, Massively Multiplayer, and Mobile

Another trend for the foreseeable future is computer based learning, which has been going on for decades; however, recent advances in personal computer (PC) based graphics and image generation have led to the building of computer games and massively multiplayer (MMP) environments to train the military warfighter. The first attempt to use a computer game for military training came when Lt. Scott Barnett, a project officer working for the Marine Corps Modeling and Simulation Management Office in Quantico, Virginia, obtained a copy of the commercial game Doom, released in 1993 (http://www.tec.army.mil/TD/tvd/survey/Marine_Doom.html). General Charles C. Krulak (1997), Commandant of the U.S. Marine Corps, issued a directive (Marine Corps Order 1500.55) to use war games for improving "Military Thinking and Decision Making Exercises." Moreover, he entrusted the Marine Corps Combat Development Command with the tasks of developing, exploiting, and approving computer based war games to train U.S.
Marines for "decision making skills, particularly when live training time and opportunities were limited" (p. 1). The first U.S. Army game was developed by the U.S. Army Simulation, Training and Instrumentation Command. The game, called Battle Command 2010, had the stated goal of supplementing lecture training by supplying a more engaging and stimulating system in order to provide warfighters with a better ability to retain the information provided to them (Stottler, Jensen, Pike, & Bingham, 2002). With the hope of low cost solutions and relatively quick development schedules, games seem to be an avenue for the military to supplement live training and provide mission rehearsal and after action capabilities. After the success of America's Army, or the Army Game Project (http://www.americasarmy.com/), which was originally developed as a tactical multiplayer first-person shooter for a public relations initiative to help with U.S. Army recruitment, gaming has now drifted into many aspects of the acquisition process. No matter what generation is involved, a case can be made that humans like playing games. Games are here to stay in the military because warfighters want them and use them. The amount of use these games get is shown by the many hours warfighters will spend playing or training themselves on a game even during their free time, both individually and with their teammates (Solberg, 2006). Whether they are deployed, on the move, or at a home station, the games are capable of getting both attention and time from the warfighters. Warfighters freely choosing to use an approved computer based game during their personal downtime seems to be an ultimate achievement for the creators, trainers, and investors in military training games. Currently, many studies are under way to investigate the effectiveness of these games for schools, businesses, and government agencies. The outcome of these studies is wide ranging; however, once you discount the obligatory statement noting that individual differences may play as large a role as any other factor, you get down to what the scientific community has concluded over and over again about games. Games bring certain elements to training: competition, engagement, repetition, fun, and individual pace, all of which are key components of memory and retention (Farr, 1986). However, game based initiatives face many challenges, which are discussed in this section. Some of these challenges include the digital divide (not everyone plays games) and the fact that some of these games are not built by people who understand the cultural differences of the community they are trying to reach (Rosser, 2007). Other challenges include those faced by the larger-scale simulations, such as data rights, corporate business models, standards, interoperability, requirements creep, and, most importantly, building a well-constructed game for good pedagogy, based on solid requirements that meet the learning goals set out by dedicated trainers. One thing is certain: based on the number of games and game environments being built for industry, academia, and the military, games are here to stay, whether built for an MMP environment, a stand-alone PC, or a mobile computing platform, such as a cell phone or iPod.
Hollywood

A trend that ramped up in the late 1990s and early twenty-first century, but which seems to be slowing now, was the U.S. Army's move to solicit the help of Hollywood as a research and development agency for military training needs. This partnering plan served as the "big idea" for the army to further develop the future of military training and infuse the military and industrial base with new ideas (Ferren, 1999). As such, Hollywood seemed like an idea worth pursuing. Hollywood certainly can make immersive movies and excellent video games, and rides at Disneyland and Sea World capture the science of storytelling and immersion, making a two minute ride seem like an adventure. Hollywood and the Pentagon have a long history of making movies together; in fact, throughout much of the last century the military has frequently needed Hollywood just as Hollywood has frequently needed the military. This mutually beneficial partnering can be traced as far back as the early days of silent films (Roberts, 1997). Hollywood producers get what they want—access to billions of dollars' worth of military hardware and equipment, such as tanks, jet fighters, nuclear submarines, and aircraft carriers—while the military gets what it wants: films that portray the military in a positive light, as well as films that help the services recruit new soldiers. This is well documented in the official U.S. Army publication A Producer's Guide to U.S. Army Cooperation with the Entertainment Industry, in which Lawrence (2005) states that film productions seeking the army's assistance "should help Armed Forces recruiting and retention programs." This continues today as military budgets fund NASCAR, motorcycle racing, extreme events, and thousands of hours of commercial time during television shows, or advertising during the World Series and the Super Bowl. At times, however, that partnership can become troublesome. This is what Hollywood journalist David L. Robb (2004) revealed in his book Operation Hollywood, quoting filmmaker Oliver Stone, who refused military assistance for his Vietnam War films Platoon and Born on the Fourth of July: "They make prostitutes of us all because they want us to sell out to their point of view" (p. 25). One of the main reasons the army looked to Hollywood during the quest for immersive simulation technologies was to help develop an immersive virtual environment that would capture the spirit of the battlefield and immerse soldiers in a system best described as being like the holodeck from Star Trek (Pollack, 1999). This system was to be so real that soldiers would unquestionably believe they had been physically, mentally, and emotionally deployed to the actual battle. In 1999, when E! Entertainment ran a piece just over three minutes in length to promote the army's $45 million investment in the University of Southern California Institute for Creative Technologies, stating at the promotion's end that the army could see this holodeck within the next two years, many legitimate acquisition professionals knew that this was an overly optimistic timeline. Perhaps the reason for the optimism was the belief that Hollywood would easily be able to develop inventive immersive technologies with the artistry and creative genius of the Hollywood mindset; but the military and the entertainment complex are two very different communities, and it is going to take a lot longer for them to get to know each other with regard to the military's requirements based environment.
Additionally, the business model of Hollywood dictates a small percentage of successes within a large number of trials. How many films must Hollywood make to achieve that one big blockbuster? If we are willing to give Hollywood the reins to creatively come up with a solution, then so be it. But be ready for many failures in order to achieve a few very big blockbuster solutions. One way or the other, this relationship is likely to be a rocky one in which both parties frequently do not see eye to eye. As Bette Davis, the great American actress (1908–1989), once said, "Hollywood always wanted me to be pretty. But I fought for realism." Hollywood has hired more and more ex-military people to help in these efforts, and there is tremendous potential for the military/entertainment complex to provide solutions that are just as good as any the military/industrial complex can provide. This should prove to be an interesting relationship to watch in the coming years.

CONCLUSION

While the modeling and simulation community has come a long way since the Pentagon briefing in 1996, many challenges remain. By minimizing the point solutions of the past and embracing new technologies, the road ahead is well paved. With such concepts as blended simulation, sometimes referred to as blended learning (BL), continual progress will be made (Alvarez, 2005). Concepts like BL support opportunities to harness the best of face-to-face interaction with live, virtual, constructive, linked, unlinked, embedded, and mobile technologies to deliver the advantages of all forms of learning together when appropriate. What is unique about BL today is that never before have trainers had so much overlap in their abilities to bring training to the warfighter. With our newfound technology and the ability to move it on the digital highway comes the responsibility of spending as much time as possible on the front end with training requirements and on the back end with scientific training effectiveness evaluations, ensuring that we get the training to the warfighter on time and on budget while meeting the performance needed for knowledge and skill transfer. For this reason, we have chosen a wide variety of experts in the research, development, and acquisition of military applications so that readers of this section are exposed to simulation work that may allow more warfighters to be trained more effectively than ever before possible.

REFERENCES

Alvarez, S. (2005). Blended learning solutions. In B. Hoffman (Ed.), Encyclopedia of educational technology. Retrieved November 6, 2007, from http://coe.sdsu.edu/eet/articles/blendedlearning/start.htm

Department of the Army. (2003, June 1). Training: Objective Force Embedded Training (OFET) users' functional description (TRADOC Pamphlet No. 350-37). Fort Monroe, VA: Author, Headquarters.
Department of the Army. (2005, February 1). Management: Management of Army models and simulations (Army Regulation No. AR 5-11). Washington, DC: Author, Headquarters.

Elms, P. (1996, June 24). DAIG briefing to the VCSA on modeling and simulation management. Alexandria, VA: Pentagon.

Farr, M. (1986). The long-term retention of knowledge and skills: A cognitive and instructional perspective (IDA Memorandum Rep. No. M-205). Alexandria, VA: Institute for Defense Analyses.

Ferren, B. (1999, May–June). Some brief observations on the future of Army simulation. Army Research, Development and Acquisition Magazine.

Krulak, C. C. (1997, April). Military thinking and decision making exercises (Marine Corps Order 1500.55). Washington, DC: U.S. Marine Corps Headquarters, Department of the Navy.

Lawrence, J. S. (2005). Operation Hollywood: How the Pentagon shapes and censors the movies [Book review]. Journal of American Culture, 28(3), 329–331.

National Training Systems Association. (2005). Training and simulation industry market survey 1.0. Arlington, VA: Author.

Pinto, J. (2002). The 3 technology laws. San Diego, CA: Automation.com.

Pollack, A. (1999, August 18). Pentagon looks for high-tech help from film. New York Times. Available online: http://www.amso.army.mil/resources/smart/add-nfo/articles/ict.htm

Robb, D. L. (2004). Operation Hollywood: How the Pentagon shapes and censors the movies. Amherst, NY: Prometheus Books.

Roberts, R. (1997). Sailing on the Silver Screen: Hollywood and the U.S. Navy. The American Historical Review, 102(4), 1246.

Rosser, J. B. (2007, October). We have to operate on you, but let's play games first! Keynote address follow-up presented at the Learning2007 Conference, Orlando, FL.

Solberg, J. (2006, August). Researching serious games: Asking the right questions. Paper presented at the Intelligent Tutoring in Serious Games Workshop, Marina del Rey, CA.

Stottler, R. H., Jensen, R., Pike, B., & Bingham, R. (2002, December). Adding intelligent tutoring system to an existing training simulation. Paper presented at the 2002 Interservice/Industry Training, Simulation & Education Conference, Orlando, FL.

Vintage Flying Museum. (2005, September). Fort Worth, TX: Meacham International Airport. Retrieved from http://www.vintageflyingmuseum.org/

Witmer, B. G., & Knerr, B. W. (1996). A guide for early embedded training decisions—Second edition (Research Product No. 96-06). Alexandria, VA: U.S. Army Research Institute for Behavioral and Social Sciences.
Part I: Systems Engineering and Human-Systems Integration
Chapter 1
SYSTEMS ENGINEERING APPROACH FOR RESEARCH TO IMPROVE TECHNOLOGY TRANSITION
Denise Nicholson and Stephanie Lackey

The scientific community is typically trained within conventional sciences, such as physics, psychology, and computer science, based on the scientific method for research. However, in this highly competitive era, the need to deliver ready-to-be-applied or transitioned products from research, in addition to publishing conclusions, is ever increasing. This emphasis on deliverable products creates a need for project teams to be multidisciplinary and to bring more rigorous project management techniques to research projects, such as the systems engineering procedures typically utilized for the development of complex systems, for example, automobiles and planes. This chapter describes a novel approach adapted from that more complex environment and merged with the scientific method. Following this approach will result in research that can answer both of the following questions: "What is the science?" and "What is the deliverable product?"

SCIENCE AND TECHNOLOGY FUNDAMENTALS

Advancement of scientific methodologies and tools through research and development is intended to benefit follow-on enterprise elements. Research and development enables the introduction of products, systems, and capabilities in order to "maximize value realized and minimize time until realization" (Bodner & Rouse, 2007). Efforts typically correlate to three phases: basic research, applied research, and advanced technology development (Department of Defense, 2006). Basic research focuses on developing the theoretical underpinnings of future technological advancement through systematic investigation (Department of Defense, 2006). These research activities do not focus on specific applications, hardware, or products. However, success at this level may transfer to subsequent applied research projects. One example would be an experiment designed to study cognitive processes and memory using college student
participants who complete basic tasks, such as recalling a list of numbers, words, or objects, within a controlled laboratory setting. It is typically challenging to interpret the impact of such results on real world applications. Applied research aims to address specified needs through targeted development and advancement of an existing knowledge base. The results of these efforts may be presented as designs, systems, methods, or prototype devices in response to general mission requirements. Frequently, these products provide insight into initial technical feasibility or potential technical solutions to general military needs (Department of Defense, 2006). In applied research, the above study would change by asking participants to perform a task similar to an operational task within a laboratory setting, such as simulated driving, operating a control panel, or making decisions based on information presented on a display. The results are easier to translate into use since real requirements are taken into account when the original problem is defined. Advanced technology development concentrates on subsystem and component development intended for integration into field experimentation or simulation efforts (Department of Defense, 2006). Technology demonstrations provide opportunities for product exposition, initial testing outside a laboratory environment, and technology readiness reviews. Success at this level indicates that a technology should be made available for transition. An example would be the study of interface designs to improve cognitive processes during a real operational task. Experiments can include actual operators as participants within the operational environments where the system will be used. In this case it is imperative that the end-use environment be well understood; otherwise, the results could go unused. Current trends in research indicate increased demand for products earlier in the traditional research cycle. This increased demand for insertion of innovative products often correlates with increased technical, cost, and schedule risks. Systems engineering (SE) principles and practices offer opportunities to address the technology transition challenges faced by such endeavors.

SYSTEMS ENGINEERING FUNDAMENTALS

The genesis of modern systems engineering practices emerged from the weapons race between the United States and the Soviet Union following World War II. The complexity of post–World War II military projects increased exponentially compared to their predecessors (Hallam, 2001). Prior to World War II, military departments would rely on a "prime contractor" with expertise in the field required (for example, aircraft) to manage subcontractors and develop military systems. However, the Cold War motivated the integration of aircraft characteristics into advanced weapon systems. The technical advancement of weapon system technology resulted in significantly increased complexity and required an alternative development approach: a systems engineering approach. The Atlas Intercontinental Ballistic Missile Program served as the flagship project upon which systems engineering principles were founded (Hughes, 1998). This project team is credited with pioneering the development of quantitative methods that
form the basis of analytic tools and decision aids still prevalent in the twenty-first century (Hallam, 2001; Hughes, 1998). From an academic perspective, the International Council on Systems Engineering (2004) summarizes systems engineering as an "interdisciplinary approach and means to enable the realization of successful systems." SE requires balancing several related developmental components: operations, performance, testing, manufacturing, cost and schedule, training and support, and disposal. In addition, the Council emphasizes the importance of early identification of customer needs and functional requirements, and documentation of those requirements before system design and validation occur (International Council on Systems Engineering, 2004). Thus, the underlying philosophy of systems engineering focuses on what system entities do prior to determining what the entities are (Badiru, 2006).

SYSTEMS ENGINEERING APPROACHES

Since systems engineering is not dictated by physical properties leading to strict mathematical relationships, an abundant variety of approaches exists. Applications of systems engineering principles often differ depending on the nature of each specific project. However, Bahil and Dean (2007) specify seven tasks inherent to any systems engineering approach:

1. State the problem.
2. Investigate alternatives.
3. Model the system.
4. Integrate.
5. Launch the system.
6. Assess performance.
7. Reevaluate.

This process is known by the acronym SIMILAR (Bahil & Gissing, 1998).
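Because assessment feeds reevaluation, the seven tasks behave less like a checklist than like a loop. As a rough illustration only (the task list comes from the text above, but the function names, the stopping rule, and the iteration cap are hypothetical placeholders of our own), the cycle can be sketched in a few lines of Python:

    # Illustrative sketch: the seven SIMILAR tasks framed as an iterative
    # cycle. Each task is stubbed with a print; real projects substitute
    # actual engineering work, and the acceptance test below is invented.
    SIMILAR_TASKS = [
        "State the problem",
        "Investigate alternatives",
        "Model the system",
        "Integrate",
        "Launch the system",
        "Assess performance",
        "Reevaluate",
    ]

    def performance_is_acceptable(cycle: int) -> bool:
        return cycle >= 2  # placeholder criterion, not a real metric

    def run_similar(max_cycles: int = 3) -> None:
        for cycle in range(1, max_cycles + 1):
            for task in SIMILAR_TASKS:
                print(f"Cycle {cycle}: {task}")
            # "Reevaluate" feeds a revised problem into the next cycle.
            if performance_is_acceptable(cycle):
                break

    run_similar()

The point of the sketch is simply that task 7 closes the loop back to task 1, which is the property the iterative approaches discussed below exploit.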
Commonly used systems engineering approaches that incorporate SIMILAR components include the waterfall method, the spiral approach, and the "Vee" model. The traditional waterfall method is composed of successive steps leading from problem formulation to system testing (Royce, 1970). Each of the typical waterfall phases is followed by a review of progress and documentation in order to determine whether the project is prepared to proceed to the next phase. This approach represents a structured engineering methodology aimed at designing and constructing large, complex systems. The waterfall method is easy to understand and well recognized. However, the waterfall approach neither allows for adequate executive control nor accommodates highly complex systems (Wideman, 2003). Furthermore, the waterfall approach is time consuming and costly, and prototyping is not accommodated (Lackey, Harris, Malone, & Nicholson, 2007).
Spiral approaches also include a series of steps to be completed in succession, but multiple iterations of the process are planned. For example, project objectives, requirements, and constraints provide input for requirements analysis. Lower level functions are then decomposed during functional analysis. Next, the design phase identifies and specifies the elements required to produce system components meeting the specified requirements. Components are then developed based upon the established design. Feedback loops provide opportunities to revisit previous phases if necessary. Each iteration constitutes a spiral and is followed by a progress and documentation review by an oversight function. Oversight typically takes the form of management responsible for trade-off analyses, decision support, scheduling, and integration of technical disciplines (Lackey et al., 2007). Multiple spirals are employed until completion of the final product. The spiral approach provides greater flexibility than the waterfall approach. Each cycle begins with a requirements phase, so amended requirements and prototyping may be incorporated if required (Lackey et al., 2007). Thus, the spiral approach facilitates faster development, but it still requires a significant time investment. Additional advantages of the spiral method include close interaction with the user, iterative requirement refinements, and cyclic development that continues until the product is accepted. A key disadvantage also exists: executive control is challenged by a lack of schedule and budget accountability on the part of developers. Without disciplined implementation of the spiral method, indefinite additions contribute to cost and schedule overruns (Wideman, 2003).
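The cycle-plus-oversight structure just described can be pictured with another small sketch, again ours rather than anything prescribed in the literature; the phase names follow the paragraph above, while the oversight test and requirement amendments are hypothetical stand-ins:

    # Illustrative sketch of the spiral approach: each spiral runs the four
    # phases in succession, an oversight review follows, and amended
    # requirements feed the next spiral. Names and the acceptance test are
    # invented placeholders.
    PHASES = ("requirements analysis", "functional analysis",
              "design", "component development")

    def oversight_accepts(spiral: int) -> bool:
        return spiral >= 3  # placeholder progress/documentation review

    def spiral_model(requirements: list, max_spirals: int = 5) -> str:
        for spiral in range(1, max_spirals + 1):
            for phase in PHASES:
                print(f"Spiral {spiral}: {phase} ({len(requirements)} reqs)")
            if oversight_accepts(spiral):
                return "product accepted"
            requirements.append("amended requirement")  # feedback loop
        return "halted: cost and schedule overrun risk"

    print(spiral_model(["train the operator on task X"]))

Note that nothing in the loop itself bounds the number of spirals; that discipline has to come from the oversight function, which is exactly the executive control weakness described above.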
The Vee model represents a linear approach that can be applied iteratively. Decomposition and definition of tasks make up the downward slope of the "V." Product requirements, design specifications, and test plans are documented. Fabrication and assembly follow. Once built, component verification and validation lead to system integration, testing, and demonstration on the upward slope of the V. Positive attributes of the approach include well-documented requirements and designs, which facilitate auditing. However, the documentation processes can be cumbersome, and stakeholder accountability may be questionable. An alternative approach that incorporates elements of the previously described approaches is the human performance systems model (HPSM). Establishment of the HPSM reflects the U.S. Navy's desire to apply systems processes to human performance issues. However, recent research has demonstrated the applicability of the HPSM to technical research, development, and engineering. The HPSM (see Figure 1.1) comprises four iterative phases: (1) define requirements, (2) define solutions, (3) develop components, and (4) execute and measure. Each of the HPSM phases is presented in detail in the figure. Table 1.1 summarizes the steps for each of the systems engineering approaches.

Figure 1.1. The Human Performance Systems Model: Integrating the Scientific Method and Systems Engineering
Table 1.1. Systems Engineering Approaches

| Step | Scientific Method | SIMILAR | Waterfall | Spiral | Vee | HPSM |
|------|-------------------|---------|-----------|--------|-----|------|
| 1 | State the problem | State the problem | Define requirements | Analyze system functionality and define requirements | Define requirements | Specify knowledge acquisition and requirements |
| 2 | Make observations | Investigate alternatives | Specify design | Specify design | Specify design and verification plan | Define solutions and specify design |
| 3 | Form a hypothesis | Model the system | Build | Develop components | Fabricate and assemble | Model system and develop components |
| 4 | Perform experiment | Integrate | Test | Test performance | Verify and validate components | Conduct experimentation and measure performance |
| 5 | Draw and publish conclusions | Launch the system | Deploy | Provide feedback | Integrate and test | Deliver results: prototype and reports |
|   |                   |         |           | Iterate | Iterate | Iterate |
| 6 |                   | Assess performance | | | | |
| 7 |                   | Reevaluate | | | | |
HPSM: INTEGRATING THE SCIENTIFIC METHOD AND SYSTEMS ENGINEERING

Although the genesis of the HPSM resides within human systems development, the desire for products amid the uncertainty of research endeavors makes the HPSM a viable alternative to traditional systems engineering approaches. The model has demonstrated applicability to prototype and research efforts due to its organized, yet flexible, structure (Lackey et al., 2007). The growing complexity of technology development amplifies the need to carefully plan and execute highly advanced research. Sauser (2007) stresses the importance of identifying appropriate systems engineering approaches to facilitate project success. Both positive and negative consequences must be considered when choosing a systems engineering method, and the categorization of the work to be performed plays an important role in that selection (Bodner & Rouse, 2007; Sauser, 2007; Maier, 2006).

HPSM Phase I—Define Requirements

Requirements established through knowledge acquisition are documented and defined based upon the theories and techniques covered in Volume 1, Sections 1 and 2 of this handbook. Establishment of baseline or standard performance criteria is the focus. Statement of the problem occurs here, as in the SIMILAR method. Constraints and derived requirements are delineated in a similar manner to the spiral method. Like the Vee model, input for test plans and evaluation products is also crafted in this phase. In HPSM Phase I, researchers focus on functionality, scope, and performance criteria. These aspects are defined to a level of detail commensurate with the allowable level of effort. Given the typically smaller investment in research, as opposed to full cycle acquisition (executed under the waterfall, spiral, or Vee methods), this freedom provided by the HPSM is beneficial. Although the formality of reporting may be reduced, traceability can be maintained.

HPSM Phase II—Define Solutions

The second phase presents solutions to the problems and requirements defined in Phase I of the HPSM. (For VE training and education systems, solutions can include the methodologies and technologies described in Volumes 1 and 2 of this handbook.) Clear parallels exist between the activities of this phase and SIMILAR's "investigation of alternatives." This phase comprises the typical design tasking found in the waterfall, spiral, and Vee models. The design phase in research benefits from the simplicity of the HPSM. Large-scale programs warrant formal design review processes; however, research projects suffer from programmatic overload when required to function as full acquisition programs. For example, a preliminary design review for an acquisition program requires the same level of programmatic effort (for example, completion of formal and labor-intensive entrance and exit criteria checklists) as a critical design review. Additionally, the time frame between preliminary design review and critical design review may be several months or more, depending on the level of system complexity. Within research, the appropriate level of effort for a preliminary design review may be peer review, and the entrance and exit criteria may not require intervention by program management. Moreover, the time between preliminary design review and critical design review may be a period of weeks rather than months. The flexibility of the HPSM permits the research team to tailor systems engineering processes to meet each project's unique needs without sacrificing quality.

HPSM Phase III—Develop Components

Next, Phase III focuses on the component development that typifies the other systems engineering methods discussed. (Many of the chapters in this section of the handbook describe prototypes developed via this process.) The team finalizes the design based on feedback from the design review. In relation to the scientific method, the developers "hypothesize" that the design will satisfy the requirements. Any requirements that cannot be satisfied with readily available approaches are documented as new hypotheses for further scientific exploration. Often the development is designed to be implemented in phases, or builds. Each build is named using a version nomenclature, that is, v1.0, 1.5, 2.0, and so forth. Depending upon the final products to be developed and the iteration of the HPSM, each version of the development may focus on an initial or upgraded model, process, software, or hardware, or on the final integration of these components into a system.
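One lightweight way to keep such builds and their open hypotheses visible is sketched below; the versions, focus areas, and the unsatisfied requirement shown are hypothetical examples of our own, not drawn from any program in this handbook:

    # Illustrative sketch: tracking phased builds and the unsatisfied
    # requirements that become new hypotheses for further exploration.
    from dataclasses import dataclass, field

    @dataclass
    class Build:
        version: str
        focus: str                      # model, process, software, hardware
        open_hypotheses: list = field(default_factory=list)

    plan = [
        Build("v1.0", "initial software model"),
        Build("v1.5", "upgraded model plus hardware interface"),
        Build("v2.0", "integration of components into a system"),
    ]

    # A requirement no readily available approach satisfies is recorded as
    # a hypothesis rather than silently dropped.
    plan[0].open_hypotheses.append("display latency target unmet")

    for build in plan:
        print(build.version, build.focus, "| open:", build.open_hypotheses)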
HPSM Phase IV—Execute and Measure

Finally, the product's performance is measured through experimentation and testing against the stated requirements. (See Volume 3, Section 2 for chapters related to this step of the process.) This phase shares similarities with the spiral method's verification loop and the Vee model's integration and verification branch. Like the spiral and Vee methods, the output of the execute and measure phase provides input for the first phase of the next HPSM iteration. Multiple iterations, or spirals, of the HPSM are conducted based upon predetermined evaluation criteria (Human Performance Center, 2003). The results of experimentation verify that the design satisfies the requirements and the hypothesis. Unfavorable results can be used as requirements for design improvements in the next cycle. In the meantime, the current version can be transitioned to the user as a prototype for initial implementation while the components are being upgraded. This approach is routinely seen in the software development field, where initial "Beta" versions are released to users and their feedback becomes a valuable step toward improving the final delivery.

HPSM Iteration

Iterating the HPSM makes some of the spiral method's advantages available, while the clear metrics defined in Phase I alleviate the endless spiral syndrome that the Vee model addresses. Iterative application of the HPSM allows the user to tailor the systems engineering steps to an appropriate level. These features present advantages for smaller engineering efforts, and recent research findings (Lackey et al., 2007) indicate that this approach also reduces technical, cost, and schedule risks during prototype development.
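Pulling the four phases and the stopping rule together, the following minimal sketch (ours, not the Human Performance Center's specification; every function name and the placeholder experiment are invented) shows how metrics fixed in Phase I keep the iteration from becoming an endless spiral:

    # Illustrative sketch of HPSM iteration: the four phases repeat until
    # the evaluation criteria fixed in Phase I are met. All functions are
    # invented stubs; execute_and_measure pretends every metric passes.
    def define_requirements(metrics):  # Phase I
        return {"metrics": list(metrics)}

    def define_solutions(requirements):  # Phase II
        return {"design": "candidate", **requirements}

    def develop_components(design):  # Phase III
        return {"prototype": "build", **design}

    def execute_and_measure(prototype, metrics):  # Phase IV
        return {metric: True for metric in metrics}  # placeholder results

    def hpsm(metrics, max_iterations=5):
        prototype = None
        for _ in range(max_iterations):
            requirements = define_requirements(metrics)
            design = define_solutions(requirements)
            prototype = develop_components(design)
            results = execute_and_measure(prototype, metrics)
            unmet = [m for m, ok in results.items() if not ok]
            if not unmet:
                return prototype       # deliver prototype and reports
            metrics = unmet            # unfavorable results seed next cycle
        return prototype               # current version transitions as a beta

    print(hpsm(["training transfer", "usability"]))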
SUMMARY AND CONCLUSIONS

The human performance systems model offers features similar to those found in the SIMILAR, waterfall, spiral, and "Vee" approaches, without the associated cost and schedule overhead. By enlarging the goal of the traditional scientific method to include the development of deliverables, the HPSM provides a systems engineering alternative that can be customized to meet the needs of individual projects. The flexibility of the model lends itself to efforts that do not require exhaustive documentation and review cycles and/or efforts that are not suited to complex, large-scale engineering processes in need of detailed auditing. This handbook has been organized to help facilitate the human performance systems model systems engineering process. Applying the model to research allows project managers to plan and track milestones and to align deliverables to transition program requirements. The HPSM also permits research to proceed without unwarranted programmatic overhead. Thus, this process enables researchers to balance tasking to most effectively achieve mission objectives. As described in the following chapters, research conducted via this process has resulted in successful programs with substantial contributions to the scientific literature and in delivered products that have transitioned to satisfied end users.

REFERENCES

Badiru, A. B. (2006). Handbook of industrial and systems engineering. Boca Raton, FL: CRC Press.

Bahil, A. T., & Dean, F. F. (2007). What is systems engineering? A consensus of senior systems engineers. Systems Engineering. Retrieved June 24, 2007, from http://www.sie.arizona.edu/sysengr/whatis/whatis.html

Bahil, A. T., & Gissing, B. (1998). Re-evaluating systems engineering concepts using systems thinking. IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews, 28(4), 516–527.

Bodner, D. A., & Rouse, W. B. (2007). Understanding R&D value creation with organizational simulation. Systems Engineering, 10(1), 64–82.

Department of Defense. (2006). DoD financial management regulation 7000.14-R 2B: Budget formulation and presentation. Retrieved October 26, 2007, from http://www.defenselink.mil/comptroller/fmr/

Hallam, C. R. A. (2001). An overview of systems engineering—the art of managing complexity. Massachusetts Institute of Technology Research Seminar in Engineering Systems. Retrieved June 19, 2007, from http://web.mit.edu/esd.83/www/notebook/NewNotebook.htm

Hughes, T. P. (1998). Rescuing Prometheus. New York: Pantheon Books.

Human Performance Center. (2003). Human performance system model. Retrieved November 18, 2005, from https://www.spiderhpc.navy.mil

International Council on Systems Engineering. (2004). What is systems engineering? Retrieved June 22, 2007, from http://www.incose.org/practice/whatissystemseng.aspx

Lackey, S. J., Harris, J. T., Malone, L. C., & Nicholson, D. M. (2007). Blending systems engineering principles and simulation-based design techniques to facilitate military prototype development. Proceedings of the 2007 Winter Simulation Conference (pp. 1403–1409). New York: Institute of Electrical and Electronics Engineers.

Maier, M. W. (2006). System and software architecture reconciliation. IEEE Transactions on Systems Engineering, 9(2), 146–152.

Royce, W. W. (1970, August). Managing the development of large software systems. Proceedings of IEEE WESCON, 26, 1–9.

Sauser, B. (2007). Toward mission assurance: A framework for systems engineering management. IEEE Transactions on Systems Engineering, 9(3), 213–227.

Wideman, M. (2003). Software development and linearity (or, why some project management methodologies don't work): Part 1. Retrieved August 18, 2007, from http://www.maxwideman.com/papers/linearity/linearity1.pdf
Chapter 2
HUMAN-SYSTEMS INTEGRATION FOR NAVAL TRAINING SYSTEMS
Katrina Ricci, John Owen, James Pharmer, and Dennis Vincenzi

For the past several decades, there has been a surge of activity associated with human-systems integration (HSI). Affordability and human performance factors have forced the U.S. Navy, and the Department of Defense (DoD) at large, to reconsider its way of doing business. Within the U.S. Navy, recent HSI activity is seen in the context of policy and organizational changes, research and development activities, and educational initiatives (Pharmer, 2006). The problems of today's acquisition processes echo issues raised in the 1970s (Smootz, 2003). Acquisition programs such as Aquila, a remotely piloted vehicle that was to capitalize on emerging technology, illustrated the need for well-defined HSI policies and procedures. This program started in 1979 and was originally estimated to cost $123 million for a 43-month development effort, followed by planned expenditures of $440 million for procurement of 780 air vehicles (U.S. General Accounting Office, 1997). By the time the army abandoned the program in 1987 due to cost, schedule, and technical difficulties, Aquila had cost over $1 billion (Smootz, 2003; U.S. General Accounting Office, 1997), and future procurement costs were expected to be an additional $1.1 billion for 376 aircraft (U.S. General Accounting Office, 1997). Yet even after the lessons learned from acquisition programs such as Aquila, and considerable investment in initiatives, policies, and processes (for example, MANPRINT [manpower and personnel integration], HARDMAN [hardware versus manpower], and the DoD 5000 series), cost overruns, performance failures, and delivery delays continue to occur. From a training perspective, failures in the acquisition and design process inevitably lead to problems for the training community (Office of the Chief of Naval Operations, 2001). Further, advances in weapon system technology compel the training community to continuously examine the knowledge, skills, and abilities, and the corresponding training technologies and methodologies, needed for today's navy. Among those technologies is the application of virtual environments, affording not only an immersive training environment but also a tool for examining human performance in the systems engineering process.
In order to understand current issues in HSI and training, it is useful to examine the history associated with HSI, the implementation strategies applied to navy and DoD acquisition programs, and the changes that have occurred that require further transformation in the manner in which we design, develop, deliver, and maintain total system performance. Following a brief description of HSI, this chapter will detail the rise of HSI in the Department of Defense and the U.S. Navy. Advances in the processes and tools enhancing HSI and training will be discussed, particularly in the context of recent research and acquisition programs where advanced technology for both design and training applications is being used. Finally, this chapter will examine future issues and emerging challenges for HSI and training.
HSI DEFINED

While a number of definitions exist for describing HSI (for example, Booher, 2003; Defense Systems Management College, 2001; U.S. Department of Defense, 1999), the basic underlying element is that it is a process that includes the human component in the context of systems engineering. It is a continuous, cyclical, and ever-evolving process over the course of system design that integrates the human centric disciplines across the entire lifecycle of the system. These disciplines, or HSI domains, include manpower, personnel, training, human factors engineering, survivability, habitability, safety, and occupational health. Integration occurs at several levels: between the domains themselves, within the systems engineering process, and within the acquisition strategy. First, the individual HSI domains must recognize the implications of design options and decisions for the other domains. The manpower, personnel, and training (MPT) and human factors engineering domains, and the interactions inherent to those domains, embody the importance of HSI processes. A push to reduce manpower has enormous implications for proper human factors engineering, the personnel required to man the system, and the training necessary to ensure successful performance. Failure to recognize this interaction can lead to subsequent system inadequacies. A second interaction must take place between the HSI domains and the systems engineering process, which consists of the iterative execution of activities to analyze and decompose system requirements, to perform functional analyses and allocations, and to synthesize these requirements and functions into a product baseline. At each phase in the process, decisions are made that will have an impact on who the end user will be (personnel), how many users will be required (manpower), the characteristics the system must have to support the user (human factors engineering), and how that user must gain the competencies (training) to safely (safety) and effectively operate and maintain the designed system. Without the direct involvement in the systems engineering process of individuals with expertise in the HSI domains, these critical decisions affecting the end user are made with too much emphasis on hardware and software considerations and not enough emphasis on the end user.
To some degree, this past approach was less problematic than it is today, owing to the resourcefulness and resilience of end users. However, the navy has recognized that manning costs are a very large component of the total cost of ownership of a new system, and this knowledge has driven decisions to reduce the number of operators and maintainers of increasingly complex systems. In this environment of doing more with less, the inclusion of human considerations on equal par with hardware and software considerations in the systems engineering process has become much more important. While human considerations need to be integrated into the systems engineering processes, a need also exists to integrate these considerations within the acquisition strategy and management processes. While systems engineering focuses on development of the end product, the acquisition strategy and management processes ensure that the product is developed to meet the performance capability requirements within cost and schedule constraints. Again, each of the HSI domains plays a role in the trade-offs among cost, schedule, and performance that characterize the acquisition of a new system. Without advocacy for the end user in the process, decisions would be made solely on the basis of reducing cost or maintaining the acquisition schedule. Of particular concern is the fact that huge advances in technology over the last several decades have made it possible to automate many tasks that, in the past, could be performed only by a human operator. On the surface, the widespread application of automation to a program would appear to resolve a number of the concerns of an acquisition program, including the cost of manpower and meeting performance goals. However, with the application of automation, the role of the end user moves from direct manipulator and maintainer of the system to supervisor of, perhaps, multiple complex systems. This has implications for the knowledge, skills, and abilities required of the users (personnel and training), as well as for the characteristics the system itself must have to support situational awareness, maintain manageable cognitive workload (human factors engineering), and protect against errors (safety). Thus, what may have appeared to be a simple solution creates a number of human-related challenges that can be addressed only by including expertise on these issues in the daily trade-offs between cost, schedule, and performance that characterize the acquisition management process. Unfortunately, one challenge for those who advocate human-systems integration is that many of the benefits are not readily seen in the early phases of exploring and refining the developing technology, when the most important decisions about the design of a system are being made. In fact, including human considerations can increase the cost of developing the system in the short term. However, minor changes made early in the system development process are likely to cost substantially less than modifications made after fielding a system. The following section provides a brief history of the roots of this cultural change within the military.

HSI: A BRIEF HISTORY

Imagine this scenario: a major military acquisition is running behind schedule. Costs are skyrocketing, and from what has been tested to date, system
performance is significantly lower than promised. This certainly sounds like a scenario happening today. However, this story has been recurring for decades. Over 30 years ago, as the Vietnam era drew to a close and the ranks of the armed forces decreased, the U.S. military began a force modernization program. Technology insertion was seen as a secure undertaking that promised to increase capability and readiness and to help regain a powerful force. Unfortunately, new weapons systems incorporating advanced technologies often proved difficult to operate, maintain, and support. New systems were delivering far poorer performance than expected, forcing a demand for higher levels of manpower and for more highly educated personnel. A passage from a 1981 General Accounting Office (GAO) report describes just a few of the problems the Department of Defense was experiencing:
The purpose of the 1981 GAO report was to identify some of the more prominent causes of problems with acquiring and fielding major weapon systems and to recommend some meaningful actions to reduce problems with deployed systems in the future. The report focused on a concept termed “ownership considerations”—the factors other than cost, performance, and schedule that influence the effectiveness of a weapon system. These considerations today are termed “ilities” and include aspects of the acquired system that manifest themselves after the system is delivered. Maintainability, survivability, interoperability, and transportability are just a few of the factors that, when not considered in the system design process, can force huge cost surges during the lifecycle of the system.
Policy, Practices, and Technology

The acquisition issues of the post-Vietnam era spurred a number of policy and process changes designed to avoid the types of pitfalls experienced within system acquisition. More recently, research efforts have taken aim at underlying technology advances that, on the one hand, allow trade-offs between manpower and design (for example, automation to reduce manning) and, on the other, foster the development of personnel to meet new knowledge, skill, and ability targets (for example, advanced training technologies). The Department of Defense Directive (DoDD) 5000.1 and the accompanying instruction (DoDI 5000.2) were first released in 1971 and 1975, respectively. Both were seen as a mechanism for effectively managing defense acquisition and controlling cost growth. This first DoDD 5000.1 was relatively small—only seven pages in length, but contained the cornerstone for future releases of the directive: centralizing policy, decentralizing execution, and streamlining
organizations (Ferrara, 1996). Key components and continuing themes in DoDD 5000.1 included the need for acquisition workforce competency and clear and logical requirements definition. Most recently, the May 2003 issuance of DoDD 5000.1 specifies that the acquisition program manager "shall apply human systems integration to optimize total system performance" (Ferrara). In 1985, long before the DoDD requirement to apply HSI, the U.S. Navy introduced HARDMAN as a process to inject manpower and personnel considerations into system design for all major acquisition categories. The process utilized a baseline comparison system to project MPT requirements. In doing so, program managers could make decisions on alternate designs in order to avoid acquisitions that could prove costly to operate and maintain. However, the HARDMAN methodologies were themselves cost and labor intensive and, thus, not always fully embraced. In reaction, the navy produced the training planning process methodology (TRPPM). TRPPM prescribed a much more tailored approach to MPT analysis, allowing that smaller systems—systems less likely to produce large MPT dilemmas—required less analysis. In a similar effort, the army introduced the MANPRINT program in 1982. While the HARDMAN and TRPPM processes dealt specifically with analysis of MPT, MANPRINT is divided into seven domains: manpower, personnel capabilities, training, human factors engineering, system safety, health hazards, and soldier survivability. Although each domain is called out as a separate entity, in practice there is considerable overlap; in fact, the success of MANPRINT relies on the interaction of the seven domains. Throughout the acquisition process, the backbone of MANPRINT is constant communication, interaction, and coordination among the MANPRINT domains, the program manager, and the program integrated product teams (IPTs).
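The baseline comparison logic at the heart of a HARDMAN-style analysis is simple enough to sketch. The following Python fragment is purely illustrative (HARDMAN itself was a document-driven analytic methodology, and the roles, counts, and adjustment factors below are invented), but it captures the core idea: start from the manning of a comparable fielded system and scale it by projected design differences to estimate the new system's MPT burden.

```python
import math

# Invented manning figures for an existing, comparable baseline system.
BASELINE_MANNING = {"operators": 12, "maintainers": 8, "instructors": 2}

# Invented design deltas relative to the baseline (fractional changes):
# added automation trims operator workload, while the more complex
# equipment raises the maintenance burden.
DESIGN_DELTAS = {"operators": -0.25, "maintainers": 0.10, "instructors": 0.0}

def project_mpt(baseline, deltas):
    """Project new-system manning by scaling each baseline role by its delta."""
    return {role: math.ceil(count * (1 + deltas[role]))
            for role, count in baseline.items()}

print(project_mpt(BASELINE_MANNING, DESIGN_DELTAS))
# -> {'operators': 9, 'maintainers': 9, 'instructors': 2}
```

Even a projection this crude makes the trade-off visible: the automation that removes three operators adds a maintainer, which is the kind of early design insight such a methodology was intended to surface.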
Advanced Technology and Training Considerations

While policy and guidance provide a strong supporting plank for sound HSI practices, ever-evolving technology must also be addressed. As Foushee (1990) argues, improved design and increased automation must be met with corresponding changes in training. Where previously emphasis was placed on individual knowledge and skills, an added emphasis must be placed on the interactions of a team and the judgment and decision-making skills of its members. Several research efforts have helped define and address this emerging training challenge. The tactical decision making under stress research program was specifically designed to meet the increasingly complex decision-making environment through both decision support tools and training technologies. The events of July 3, 1988, in which the USS Vincennes mistakenly shot down an Iranian Airbus, killing all 290 passengers and crew onboard, dramatically emphasized the relationship between advanced technology and training considerations. Under a perceived threat and operating in a hostile environment, the crew members of the Vincennes, a guided missile cruiser (CG-49) with the U.S. Navy's most sophisticated battle-management system, mistook the Iranian Airbus for a
probable hostile F-14 (Collyer & Malecki, 1998). Subsequent research provided invaluable knowledge on the human decision-making process (for example, Zsambok & Klein, 1997), the development of critical thinking skills (for example, Cohen, Freeman, Wolf, & Militello, 1995), the behaviors that characterize high performing teams (for example, Smith-Jentsch, Johnston, & Payne, 1998), and the training and performance measurement strategies that produce and identify successful team performance (for example, Smith-Jentsch, Payne, & Johnston, 1996; Cannon-Bowers & Salas, 1998). Additional research has examined the use of technology within the context of team training. Team training in a complex environment, such as a shipboard combat information center, can require a large number of trainers in order to observe and record performance and provide feedback in a timely manner. Embedded training, defined as simulation based training seamlessly integrated into the operational setting (Lyons & McDonald, 2001), provides an opportunity to practice and to receive feedback on critical job skills within the context of the job. Further, advanced technologies minimize the number of instructors needed to conduct training (Lyons & McDonald). The application of automated performance recording, the tracking of trainee performance against expert benchmarks, and the automation of the feedback process provide the opportunity to practice and learn individual and team skills without the traditionally manpower-intensive instructor presence.
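A minimal sketch may make the automated-measurement idea concrete. The Python fragment below is hypothetical (the event names, timings, and tolerance are invented, and fielded embedded trainers are far richer), but it shows how recorded trainee events can be scored against an expert reference timeline and turned into feedback with no instructor in the loop.

```python
# Expert reference timeline (seconds from scenario start); values invented.
EXPERT_TIMELINE = {
    "detect_contact": 5.0,
    "classify_contact": 12.0,
    "report_contact": 15.0,
}
TOLERANCE = 3.0  # allowable deviation, in seconds

def score_trainee(trainee_events):
    """Compare recorded trainee event times to the expert reference."""
    feedback = []
    for event, expert_t in EXPERT_TIMELINE.items():
        t = trainee_events.get(event)
        if t is None:
            feedback.append(f"{event}: MISSED -- required step not performed")
        elif abs(t - expert_t) <= TOLERANCE:
            feedback.append(f"{event}: OK ({t:.1f}s vs. expert {expert_t:.1f}s)")
        else:
            feedback.append(f"{event}: LATE by {t - expert_t:.1f}s")
    return feedback

# A trainee who detected on time, classified late, and never reported:
for line in score_trainee({"detect_contact": 6.2, "classify_contact": 19.5}):
    print(line)
```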
HSI and Training Systems

Training is not only a domain of HSI—there is also an element of HSI in the design of training systems. As one example, the LPD-17 San Antonio class, the navy's newest class of amphibious transport dock ships, embraced training as an important design consideration. As the functional replacement for the 41 ships of the LST-1179, LKA-113, LSD-36, and LPD-4 classes built in the 1960s, there was little doubt that ships of the San Antonio class would be called upon to support a wide range of missions. With that in mind, LPD-17 class training required flexibility and adaptability to changing missions and demands (Phillips et al., 1997). Optimized manpower levels also underscored the need for organic training capabilities—covering everything from introductory, or familiarization, training to total crew team training. Further, it was recognized that the automation that allowed for a reduction in crew size would itself require additional training resources, as the expertise required to work with new technology demands additional practice and exposure to a variety of environmental cues. As with the introduction of any capability, training considerations are required early in the design process. Requirements from the LPD-17 operational requirements document included provisions to provide fleet training services and to maintain readiness by providing for training of the ship's own personnel, including onboard medical personnel and the embarked U.S. Marine Corps. Thus, critical training components, such as dedicated training compartments, embedded training systems, and an onboard training management system, can
be traced to early program documentation. Further, the development of these capabilities necessitated the same formal requirements documentation, software/hardware requirements reviews, and test and evaluation as would be required for any capability. The development of training resources for the LPD-17 took into account emerging technology by providing an electronic resource center and an advanced electronic classroom, as well as supporting access to electronic training through any of the over 300 shipwide area network drops located throughout the ship. Design considerations were also shaped by the need to support a virtual environment training capability for the Marine Corps. Finally, the LPD-17 training program worked alongside the manpower and personnel IPTs in defining the personnel, knowledge, and skills necessary to man the five-member training department—a new department onboard the LPD-17 whose tasks represent jobs usually allocated as collateral duties onboard other navy ship classes. As part of the larger ship design effort, over 100 workshops were conducted that made use of three-dimensional modeling capabilities, allowing visualization of sailors and marines moving through the food service lines, stretcher movement through triage, fully equipped marines moving through air locks and stairs, forklifts operating in storage areas, and even trainees entering and engaging in learning in the advanced electronic classroom. These workshops produced feedback and design recommendations that significantly enhanced the ship's design. Individuals were able to visualize design problems both small (for example, a phone located too far from a workstation) and large (for example, a welding shop located too close to a fuel compartment) that could be easily remedied early in the engineering process but would be costly, if not impossible, to fix later on.
THE FUTURE OF HSI AND NAVY TRAINING

Certainly, the challenges of integrating human considerations into system design are still visible in Department of Defense acquisition. However, a number of innovations in both processes and technologies are paving the way for how we conduct human systems integration. Although technology can afford newer tools and processes to support the infusion of human considerations into system design, the technology associated with modern weapons systems will continue to change. As these changes occur, the training community—as well as the other HSI domains—must also evolve. One emerging challenge for the HSI community is the use of unmanned vehicles, most prominently as aviation assets, but as surface and subsurface platforms as well. The use of and demand for autonomous vehicles (AVs) have risen dramatically since the onset of the Global War on Terrorism and military operations in both Iraq and Afghanistan. In fact, a recent National Research Council study recommended that the navy accelerate the introduction of existing AVs and pursue new AV concepts and technologies (U.S. Department of Defense, 2005).
As the surge of activity surrounding a relatively new military asset continues, there is, ironically, a concurrent debate over the manpower, personnel, training, and human factors issues associated with an "unmanned" system. With the mishap rate for unmanned aircraft systems much higher than that of the manned flight community (Tvaryanas, Thompson, & Constable, 2006), a growing body of empirical data strongly suggests the need for immediate research and development to address fundamental human performance areas associated with the current inventory of AVs. Such questions as how many vehicles a single operator or control station can operate, whether the operator should be a certified aviator, and how much and what type of training is necessary are all unique challenges for the HSI community. More and more, advanced training technologies are providing the tools to meet the training demands of today's navy. Embedded training capabilities, networked personal computer based simulation systems, and distance learning technologies provide contextual opportunities to practice and hone both individual and team skills. Further, immersive interactive training applications provide not only training capabilities, but also opportunities to study design considerations and their impact on human performance.

REFERENCES

Booher, H. (2003). Handbook of human systems integration. Hoboken, NJ: Wiley & Sons.
Cannon-Bowers, J. A., & Salas, E. (1998). Individual and team decision making. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress (pp. 17–38). Washington, DC: American Psychological Association.
Cohen, M. S., Freeman, J. T., Wolf, S. P., & Militello, L. (1995). Training metacognitive skills in naval combat decision making (Tech. Rep. No. 95-4). Arlington, VA: Cognitive Technologies.
Collyer, S. C., & Malecki, G. S. (1998). Tactical decision making under stress: History and overview. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress (pp. 3–15). Washington, DC: American Psychological Association.
Defense Systems Management College. (2001, January). Defense acquisition acronyms and terms (10th ed.). Washington, DC: U.S. Government Printing Office.
Ferrara, J. (1996, Fall). DOD's 5000 documents: Evolution and change in defense acquisition policy. Acquisition Review Quarterly, 109–130.
Foushee, C. H. (1990). Preparing for the unexpected: A psychologist's case for improved training. Flight Safety Digest [Electronic version]. Retrieved July 28, 2007, from http://www.mtc.gob.pe/portal/transportes/aereo/aeronauticacivil/alar_tool_kit/pdf/fsd_mar90.pdf
Lyons, D. M., & McDonald, D. P. (2001). Advanced embedded training with real-time simulation for Navy surface combatant tactical teams. In M. Smith & G. Salvendy (Eds.), Systems, social and internationalization design aspects of human computer interaction (Vol. 2, pp. 859–863). Hillsdale, NJ: Lawrence Erlbaum.
Office of the Chief of Naval Operations. (2001, August). Revolution in training: Executive review of Navy training. Washington, DC: Author.
Pharmer, J. (2006). The challenges and opportunities of implementing human system integration into the navy acquisition process. Defense Acquisition Review Journal, 14(1), 279–291.
Phillips, D., Sujansky, J., Hontz, E. T., Cannon-Bowers, J. A., Salas, E., & Villalonga, J. (1997). Innovative strategies and methods for total ship training on LPD-17. Proceedings of the 19th Annual Interservice/Industry Training, Simulation and Education Conference [CD-ROM]. Arlington, VA: National Training Systems Association.
Smith-Jentsch, K. A., Johnston, J. H., & Payne, S. C. (1998). Measuring team-related expertise in complex environments. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress (pp. 61–87). Washington, DC: American Psychological Association.
Smith-Jentsch, K. A., Payne, S. C., & Johnston, J. H. (1996, April). Guided team self-correction: A methodology for enhancing experiential team training. Paper presented at the 11th annual conference of the Society for Industrial and Organizational Psychology, San Diego, CA.
Smootz, E. R. (2003). Human systems integration and systems acquisition interfaces. In H. R. Booher (Ed.), Handbook of human systems integration (pp. 101–119). Hoboken, NJ: Wiley & Sons.
Tvaryanas, A. P., Thompson, W. T., & Constable, S. H. (2006). Human factors in remotely piloted aircraft operations: HFACS analysis of 221 mishaps over 10 years. Aviation, Space, and Environmental Medicine, 77, 724–732.
U.S. Army. (2005). MANPRINT handbook. Retrieved July 26, 2007, from http://www.manprint.army.mil/manprint/mp-ref-works.asp
U.S. Department of Defense. (1999, May). Department of Defense handbook: Human engineering program process and procedures (MIL-HDBK-46855A). Retrieved April 25, 2008, from http://hfetag.dtic.mil/docs-hfs/mil-hdbk-46855a.pdf
U.S. Department of Defense, Office of the Secretary of Defense. (2005). Unmanned aircraft systems roadmap 2005–2030. Retrieved March 2, 2006, from http://www.acq.osd.mil/usd/uav_roadmap.pdf
U.S. General Accounting Office. (1981). Effectiveness of U.S. forces can be increased through improved weapon system design (Report No. PSAD-81-17). Washington, DC: Author.
U.S. General Accounting Office. (1997, April). Unmanned aerial vehicles: DOD's acquisition efforts (Report No. T-NSIAD-97-138). Washington, DC: Author.
Zsambok, C., & Klein, G. (1997). Naturalistic decision making—Where are we now? Mahwah, NJ: Erlbaum.
Chapter 3
VIRTUAL ENVIRONMENTS AND UNMANNED SYSTEMS: HUMAN-SYSTEMS INTEGRATION ISSUES

John Barnett

People work with unmanned vehicles (UVs) and similar systems in a considerable number of applications. The employment of these systems has increased significantly in recent years, and indications are that the number of applications for UVs will increase in the future. Unmanned systems may include teleoperated vehicles or fixed systems where the operator controls the system's actions, or they may be semi-autonomous vehicles that perform certain functions on their own under the operator's direction. UVs are used primarily where it is dangerous or costly for humans to work. For example, they may explore other planets (Jet Propulsion Laboratory, 2007), neutralize improvised explosive devices (Lawlor, 2005), or operate in dangerous industrial environments (Brumson, 2007). UVs are relevant to virtual environments (VEs) because there are a number of advantages to using VEs to research unmanned systems (Evans, Hoeft, Jentsch, Rehfeld, & Curtis, 2006) and also to train users to operate them. Unmanned systems are controlled primarily by software. Similarly, objects operating in VEs are controlled by software. Although the lines of code may be different, the core architecture is the same. This similarity means that, from a human operator's perspective, objects modeled in VEs will tend to act similarly to robotic systems in the real world. Thus, the behavior of virtual unmanned systems can be used as an analog to that of real unmanned systems. This can be important because humans and UVs sometimes do not work together well. When people give directions to semi-autonomous systems, they can be unpleasantly surprised by the way the system carries out their commands, a phenomenon known as "automation surprise" (Sarter & Woods, 1997). This disconnect occurs because the way people process information differs significantly from the way software processes information, which means that human-system integration can be a challenge, both in the real world and, correspondingly, in VEs. Unmanned systems are a form of automation; thus, to understand how VEs can facilitate their use, it is important to understand the benefits and challenges of
automation. Understanding some of the fundamental differences between how humans and automation operate might illustrate why communication failures and surprises occur and how they could possibly be avoided.

THE PROMISE AND CHALLENGES OF AUTOMATION

Considerable research has been done in the last several decades about how people work with automation. This research is relevant to UVs since many of the issues involving people working with automation apply to people operating UVs. Automation was originally seen as a means of reducing (human) errors and also reducing human workload (Billings, 1997; Bowers, Oser, Salas, & Cannon-Bowers, 1996). What research has found is that, although automation has significant benefits, there are also new challenges associated with human-automation interaction (Billings, 1997), including workload, operator trust, automation failures, and human-automation interface problems.

Workload/Boredom Continuum

Automation often does not reduce human workload so much as change it from performing functions to monitoring the automation. When it does reduce workload, the reduction can come at the wrong time, so that the operator experiences boredom with its consequent loss of vigilance (Billings, 1997). Since vigilance is necessary for monitoring, if the automation malfunctions, the operator will be less likely to notice it immediately.

Dealing with Automation Failure

If the automation does malfunction, it is frequently difficult for the operator to recover from the failure (Sarter & Woods, 1997). Often this is because the operator may not be aware of what the automation was doing at the time of the failure (known as being "out of the loop"), may not notice the failure right away, and may have difficulty reverting to manual means of performing the function.

Trust and the Complacency/Distrust Continuum

The way people establish a level of trust in semi-autonomous and automated systems generally mirrors how they trust other people (Muir, 1994). However, there is one major exception: people who have little experience with a system tend to put too much trust in it. They tend to have overconfidence in the ability of the system to perform its task, a phenomenon known as automation bias (Mosier, Skitka, Heers, & Burdick, 1998), and are surprised when the automation does not perform as expected. Research shows that once people experience automation failure, they tend to lose confidence in the system. Their confidence is slowly restored as they see
the system perform correctly, but it never reaches the pre-failure level (Lee & Moray, 1994; Eidelkind & Papantonopoulos, 1997).

Human-Automation Interface Problems

Human-automation teams are significantly different from human-human teams, and if this fact is not considered in the design of the human-automation interface, problems ensue. Often automated systems will have too many ways to do the same thing (known as mode proliferation), which tends to confuse human operators. This is especially true when feedback about what the automation is doing is not available to the operator, resulting in a lack of mode awareness. A major reason that human-automation teams are significantly different from human-human teams is that people and software process information in very different ways. Some of the major differences in human and machine information processing involve what each "knows" and how that information is "known."

INFORMATION PROCESSING: SOFTWARE VERSUS "BIOWARE"

At the most fundamental level, people and software process information differently. Software processes information as a series of logic trees and finite algorithms. Digits and states are discrete, and a bit is either one or zero. On the other hand, people process information as probabilistic networks. A nerve impulse does not necessarily trigger a subsequent nerve; it increases (or decreases) the probability that the subsequent nerve will fire. This fundamental difference means that software will tend to react the same way to the same stimulus, whereas people may react differently to similar stimuli presented at different times. Another major difference between people and automated systems is that people perceive things by matching patterns, but automated systems normally require exact matches to perceive something. This means that people can recognize something even if pieces are missing or obscured, but semi-autonomous systems find this much more difficult. One difference between people and automation that probably has the greatest impact on misunderstandings is that people share a vast amount of implicit knowledge about the world, but automated systems do not share this same knowledge. Because this knowledge is shared by nearly all humans, it is commonly understood and therefore unstated. For example, if one person asks another to go into a dark room and find an item, it is not necessary to tell him or her to turn on the light. Humans share the implicit knowledge that providing enough light to see is a preliminary step to conducting a visual search. Semi-autonomous systems do not share this implicit knowledge the way people do, and if they are not given explicit instructions, they do not perform as expected. One example from a computer game may illustrate this point. In one game, the player leads a team of commandos with the mission of rescuing "hostages" held
by “terrorists” in an urban environment. Aside from the human player, all of the other entities are computer-generated avatars. In one instance, the human player commanded an avatar teammate to throw a stun grenade in a room where it was suspected hostages were being held by terrorists. The avatar dutifully selected a stun grenade, pulled the pin, and threw it at the closed door. The grenade bounced off the door and stunned the team. Not surprisingly, the human player was startled at this turn of events. The player never thought to open the door, because a human teammate would have understood that as a preliminary step to throwing something into the room. The reason people are surprised at the lack of implicit knowledge in automated systems is twofold. First, people’s implicit knowledge of the world is so deeply ingrained it operates automatically (that is, without conscious thought). Therefore, it rarely occurs to them that the automated system would not have the same world knowledge. Second, people tend to interact socially with automated systems (Sundar, 2004) and frequently expect them to behave as people do (Bergeron & Hinton, 1985; Muir, 1994). When automation violates these expectations, people are caught by surprise. Fortunately, even with such fundamental differences, human-automation teams can work well together if a conscious effort is made to integrate people and automation.
IMPROVING HUMAN-SYSTEM INTERACTION

Given that there are sometimes surprising disconnects between humans and UVs, there are two basic approaches to reducing these difficulties. The first is to provide comprehensive training to UV operators so that they understand how the UVs function at the most basic level, and also to provide extensive practice so that operators will overcome their natural tendency to think of the systems in human terms. Although feasible, this approach is time consuming and expensive: it would require retraining behaviors that people have developed over decades of working with other humans, and such training could easily break down under stress. The second approach would be to program UVs to function more as operators would expect and to provide feedback to the operators so that they can better predict the system's actions. The advantage of this method is that, unlike retraining people, changing the software for a UV changes its actions permanently. Software based systems do not need to be trained and retrained. The complicated part of this approach is that it would require user testing to identify where human-automation conflicts occur so that they could be addressed. However, once the incompatibilities are identified and addressed, the system would not only be easier for the operators to use, but could serve as a model for future user-friendly systems. Conversely, training operators to work with the system, as in the first approach, means each new operator would require not only initial training, but periodic retraining as well. Programming semi-autonomous systems to act more like people expect them to is especially important in VEs when the system is an avatar modeling a human.
Obviously, avatars that model humans should be expected to act like humans as much as possible. If a person in a virtual environment can interact with an avatar in a natural way, it tends to increase his or her sense of presence in the VE (Schroeder, 2006). From a practical standpoint, it may not be possible to make UVs act exactly like humans. There may be cases in which the technical challenges make it impractical to change the actions of the unmanned system. Therefore, the best technique may be a melding of the two approaches; that is, modify the software to reflect human expectations where possible, but, when necessary, train operators to understand system limitations so as to mitigate violations of user expectations and thus minimize surprises.
USING VES TO IMPROVE HUMAN-SYSTEM INTEGRATION

A VE may be the ideal environment in which to improve the fit between people and semi-autonomous systems. Since UVs act in a VE much as they would in the real world, a VE can be used to assess the quality of the interaction between UVs and people under a variety of controlled simulated conditions. A VE can be used to identify those situations where misunderstandings occur between people and the unmanned systems. It can also be used to develop more human-friendly software. The virtual world may be the best environment in which to test human-system fit when it is impractical or dangerous to test in the real world. For example, testing UVs that carry weapons, such as unmanned combat air vehicles or unmanned ground combat vehicles, requires considerable space to maintain safety. Virtual unmanned combat vehicles would not require the same large weapons ranges or raise the same safety concerns as their real counterparts. Testing human-system integration would include testing not only the software routines, but the physical interfaces as well. Again, a VE would be a good environment in which to test the interface, since it is generally easier and less costly to make software changes than to build hardware interfaces. Some interface testing of this type is already being accomplished (Neumann, 2006). Finally, a VE would obviously be good for training operators of unmanned systems; a VE is essentially a training environment. It could be used to train people how to operate the interfaces, as well as to introduce operators to any automation quirks of the system.
CONCLUSION

Semi-autonomous UVs and similar robotic systems are likely to become commonplace in the future. For this to happen, the human operators and the automated systems must learn to work well together. The VE promises to be the ideal environment for designing, testing, and improving the human-systems interface, as well as for training the operator to take advantage of the benefits of the unmanned
system. Both VEs and unmanned systems are evolving technologies that promise to significantly enhance how people work with future technology.

REFERENCES

Bergeron, H. P., & Hinton, D. A. (1985). Aircraft automation: The problem of the pilot interface. Aviation, Space, and Environmental Medicine, 56(2), 144–148.
Billings, C. E. (1997). Aviation automation: The search for a human-centered approach. Mahwah, NJ: Lawrence Erlbaum.
Bowers, C. A., Oser, R. L., Salas, E., & Cannon-Bowers, J. A. (1996). Team performance in automated systems. In R. Parasuraman & M. Mouloua (Eds.), Automation and human performance: Theory and applications (pp. 243–263). Mahwah, NJ: Lawrence Erlbaum.
Brumson, B. (2007). Chemical and hazardous material handling robots. Robotics Online. Retrieved May 17, 2007, from http://www.roboticsonline.com/public/articles/articlesdetails.cfm?id=2745
Eidelkind, M. A., & Papantonopoulos, S. A. (1997). Operator trust and task delegation: Strategies in semi-autonomous agent system. In M. Mouloua & J. M. Koonce (Eds.), Human automation interaction: Research and practice (pp. 46–52). Mahwah, NJ: Lawrence Erlbaum.
Evans, A. W., III, Hoeft, R. M., Jentsch, F., Rehfeld, S. A., & Curtis, M. T. (2006). Exploring human-robot interaction: Emerging methodologies and environments. In N. J. Cooke, H. Pringle, H. Pedersen, & O. Connor (Eds.), Human factors of remotely piloted vehicles (pp. 345–358). Amsterdam: Elsevier.
Jet Propulsion Laboratory. (2007). Mars Exploration Rover mission. Retrieved May 17, 2007, from http://origin.mars5.jpl.nasa.gov/overview
Lawlor, M. (2005). Robots take the heat. Signal Magazine. Retrieved May 17, 2007, from http://www.afcea.org/signal/articles/anmviewer.asp?a=692
Lee, J. D., & Moray, N. (1994). Trust, self-confidence and operators' adaptation to automation. International Journal of Human-Computer Studies, 40, 153–184.
Mosier, K. L., Skitka, L. J., Heers, S., & Burdick, M. (1998). Automation bias: Decision making and performance in high-tech cockpits. International Journal of Aviation Psychology, 8(1), 47–63.
Muir, B. M. (1994). Trust in automation: Part I. Theoretical issues in the study of trust and human intervention in automated systems. Ergonomics, 37(11), 1905–1922.
Neumann, J. L. (2006). Effect of operator control configuration on uninhabited aerial system trainability. Dissertation Abstracts International B, 67(11). (UMI No. AAT 3242458)
Sarter, N. B., & Woods, D. D. (1997). Team play with a powerful and independent agent: Operational experiences and surprises on the Airbus A-320. Human Factors, 39(4), 553–569.
Schroeder, R. (2006). Being there together and the future of connected presence. Presence, 15(4), 438–454.
Sundar, S. S. (2004). Loyalty to computer terminals: Is it anthropomorphism or consistency? Behavior & Information Technology, 23(2), 107–118.
Part II: Defense Training Examples
Chapter 4
U.S. MARINE CORPS DEPLOYABLE VIRTUAL TRAINING ENVIRONMENT

Pete Muller, Richard Schaffer, and James McDonough

Deployable Virtual Training Environment (DVTE) is an evolving U.S. Marine Corps program. It is not a monolithic system, but rather a framework to deliver individual and small-team training and mission rehearsal simulations on networked laptops. DVTE has changed significantly over the years, and it has absorbed a number of personal computer (PC) based training systems. Before we discuss the DVTE program, we will address the evolution of distributed simulation technologies in the Department of Defense (DoD) and of commercial PC games, because each plays a critical role in the development of DVTE.

DEPARTMENT OF DEFENSE DISTRIBUTED SIMULATION

In the early 1990s, the Defense Advanced Research Projects Agency (DARPA), under the leadership of Jack Thorpe, began a visionary program known as simulation network (SIMNET). At the time, the U.S. Air Force had very expensive aircraft simulators that were unable to interoperate. Thorpe envisioned a shared virtual environment where aircrews could train together even if they were not collocated. Ironically, the SIMNET program found a strong supporter not in the air force, but in the U.S. Army, and tank training became a major focus of the program (Cosby, 1999). This eventually led to millions of dollars of army investment, ultimately resulting in the close combat tactical trainer (CCTT) for the M1 Abrams tank. While the army developed the CCTT, Modular Semi-Automated Forces (ModSAF) was developed with DARPA funding to further improve the quality of distributed simulation. As the name implies, it provided a modular architecture upon which researchers could expand. In 1995, DARPA began a three year advanced concept technology demonstration known as synthetic theater of war (STOW) with U.S. Atlantic Command (now Joint Forces Command [JFCOM]) that had very aggressive goals, including developing semi-automated forces for each of the services, developing high resolution terrain, and developing realistic environmental effects, such as weather and smoke (Lenoir & Lowood, 2005; Feldmann & Muller, 1997).
To meet these challenges, DARPA had each service develop its own semi-automated force (SAF) based on ModSAF. Concurrently, the synthetic environment within ModSAF was significantly improved, adding such physically correct features as wind, rain, clouds, fog, smoke, and deformable terrain (Lukes, 1997). Marine Corps Semi-Automated Forces development was led by Naval Research and Development (now known as SPAWAR [Space and Naval Warfare] Systems Center, San Diego, California). Most of the ModSAF development up until that point had focused on platforms, such as aircraft and armored vehicles. Hughes Research Laboratories led the development of simulated infantry, or individual combatants (Howard, Hoff, & Tseng, 1995). After an intense development period, all of the service SAFs were integrated to form Joint Semi-Automated Forces (JSAF). JSAF became one of the first simulations to use the high level architecture (HLA), a DoD-mandated software architecture designed specifically for distributed simulation applications. JSAF continues to evolve and is used today by JFCOM for joint experimentation, by the navy for fleet battle experiments, and by the Marine Corps for DVTE.

COMMERCIAL VIDEO GAMES IN THE U.S. MARINE CORPS

The U.S. Marine Corps (USMC) has a tradition of doing a lot with limited resources. When PC games became popular in the mid-1990s, innovative marines at the Marine Corps Modeling and Simulation Office in Quantico, Virginia, experimented with them to see how they could be used for training. They discovered that the popular first-person shooter game Doom, by id Software, could be modified to look more like a marine training tool. The "space marine" was made to look like an actual marine, realistic-looking weapons replaced the futuristic weapons, and the demons were made to look like conventional opposing forces. After the commercial release of Doom II, these modifications, or mods, were compiled and put on a USMC Web server in 1996. The commandant of the Marine Corps, Gen. Charles Krulak, approved the use of certain PC based games during duty hours on government computers for marines "to exercise and develop their decision making abilities" (Krulak, 1997). In April 1997, Marine Doom even made the cover of Wired magazine (Riddell, 1997). Despite the early promise, modifications of commercial first-person shooter games had many limitations that hindered their effectiveness as training tools. Doom was designed to be entertaining, not tactically accurate. Even though the character looked like a marine, it still acted like a Doom space marine. The marines continued to experiment with commercial off-the-shelf (COTS) games, but the games never became integrated into any formal program of instruction. The early efforts with mods showed that there was strong potential for PC based training simulations to supplement or replace the more traditional Silicon Graphics or Sun workstation based simulations of the time. In 1997, MAK Technologies won a naval phase one Small Business Innovation Research (SBIR) contract for training amphibious forces with video games. This SBIR evolved into Marine Air-Ground Task Force XXI (MAGTF XXI), one of the first tactical decision simulations (TDSs) that was not a modified game. It used
gaming technology, but was designed from the outset for training in a USMC environment (Lenoir, 2000). MAGTF XXI has been continuously improved, including the addition of a capability to stimulate real C4I (command, control, communications, computers, and intelligence) systems. Throughout the late 1990s and into the next century, COTS games continuously improved, particularly those portraying small unit infantry operations. Marines continued to experiment with games such as Medal of Honor: Allied Forces and Rogue Spear. In addition, the Office of Naval Research (ONR) began research into modifying COTS games to make them more effective training tools. This led to such TDSs as Close Combat Marines and Tactical Operations Marine Corps (TacOpsMC). While games were showing their utility for limited training tasks, there was still a need for an integrated system to provide more robust team training. Congress included language and funding in the Defense Authorization Act for Fiscal Year (FY) 2001 that jump-started USMC modeling and simulation. This is known as a "congressional plus-up."

SHIPBOARD SIMULATORS FOR MARINE CORPS OPERATIONS

The budget request included no funding for analysis of shipboard Marine Corps operational simulator technology. The committee is aware of advances made in training simulation technology and the potential that training and rehearsal planning simulators have in supporting Marines deployed at sea. It is clear that technology exists to provide shipboard simulators for many of the expeditionary missions embarked Marines will have to execute. As these simulators will allow Marines an opportunity to train to the fullest extent possible while in transit, the committee believes it is time to explore the availability and applicability of both existing and new training simulators to meet Marine Corps requirements. (National Defense Authorization Act, 2000, p. 177)
REQUIREMENTS SPECIFICATION FOR DVTE

To meet the congressional mandate, the Technical Division of the USMC Training and Education Command, led by Dr. Mike Bailey, began an aggressive program to define the future of Marine Corps simulation and developed a plan that was presented to Congress in 2001. The report said the "training goals are to maintain and expand proficiency in individual skills, improve decisionmaking, and enhance teamwork for both Marine teams and Navy-Marine teams" (Technical Director, Training and Education Command, 2001, p. 2). Because of space limitations aboard ships, the report recommended the use of laptops and proposed that each laptop should be able to run multiple training applications (Technical Director, Training and Education Command). While formal requirements were drafted, Dr. Bailey's team began putting together a prototype configuration that would be tested by marines. In addition
to a number of COTS games, the team assembled a JSAF based federation. They coined the term user scrutiny event (USE) to describe events that put marines in front of the systems. The idea was not to demonstrate a particular capability, but to put as many different capabilities as possible in front of marines to solicit their feedback in support of a future program. To meet the rapid demonstration objectives, requirements were fairly informal. Each simulator had to be interoperable with JSAF via the HLA and the MAGTF federation object model. Each component simulation had to operate on the same configuration of Dell laptop and Microsoft joystick. All of the vehicles had to use the same basic keyboard and joystick commands, and all of the software had to reside on the hard drive. Each laptop had to be able to be rebooted to run any simulator, and each simulator had to have a virtual representation of the same piece of terrain.

DEVELOPMENT OF SYSTEM PROTOTYPES

There have always been two main components to the DVTE program: the Combined Arms Network and the Infantry Tool Kit.

Infantry Tool Kit

The Infantry Tool Kit (ITK) is a collection of COTS and government off-the-shelf (GOTS) simulations that run on the DVTE host computers. The individual applications are not interoperable with each other, but some can be networked to the same simulation on another laptop. The most visible portion of the Infantry Tool Kit is the first-person shooter (FPS) application. Coalescent Technologies Corporation held the exclusive license to develop Operation Flashpoint, a popular FPS application, for the Department of Defense military training market and called the result Virtual Battlefield System 1 (VBS-1). The improved graphics, realism, and networking made it much more suitable than Operation Flashpoint for infantry team training. VBS-1 became the backbone of the USMC DVTE Infantry Tool Kit.

Combined Arms Network

By using JSAF as the virtual environment, the developers were able to rapidly build simulators for a large portion of the marine expeditionary unit. The Combined Arms Network (CAN) was intended to be the interoperable suite of laptop simulators envisioned in the congressional mandate. Raydon developed vehicle simulations of the amphibious assault vehicle, the M1 tank, and the light armored vehicle. Naval Air Systems Command manned the flight simulator at Patuxent River, Maryland, and developed the air simulations, including the AH-1 Cobra helicopter and the AV-8B Harrier. FATS, Inc. built the forward observer trainer, based on its Indoor Simulated Marksmanship Trainer system. A common viewer was initially provided by the naval visualization program, a GOTS product written and maintained by the Naval Surface Warfare Center—Coastal Systems Station, Panama City, Florida.
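Since interoperability through the HLA and a shared federation object model is central to the CAN, a brief sketch of the underlying idea may help. The Python fragment below is emphatically not the HLA API (every class, attribute, and method name is invented), but it illustrates the publish/subscribe pattern that a federation object model enables: each simulator publishes updates to shared object attributes, and any subscriber, such as a common viewer, sees them.

```python
from collections import defaultdict

class TinyFederation:
    """A toy stand-in for an HLA run-time infrastructure (RTI)."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, object_class, callback):
        # A federate declares interest in one class of shared objects.
        self.subscribers[object_class].append(callback)

    def update_attributes(self, object_class, entity_id, attributes):
        # A federate publishes new attribute values; all subscribers see them.
        for callback in self.subscribers[object_class]:
            callback(entity_id, attributes)

rti = TinyFederation()
rti.subscribe("GroundVehicle",
              lambda eid, attrs: print(f"viewer sees {eid}: {attrs}"))

# A vehicle simulator publishes its entity's position and speed:
rti.update_attributes("GroundVehicle", "AAV-7",
                      {"lat": 34.68, "lon": -77.34, "speed_mps": 9.0})
```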
OPERATIONAL OR USER CONSIDERATIONS

Although marines quickly learned how to operate the DVTE system, it required several contractors to set up and maintain. Early experience with the system made it clear that future systems should be much easier to operate and maintain so that contractor support would not be needed. DVTE went through two USEs. USE 1 took place in December 2001 at Camp Lejeune, North Carolina, and focused on marines' evaluations of the basic functional pieces in a company-sized combined arms scenario. USE 2 took place in July 2002 aboard LHD-7, the USS Iwo Jima, pierside in Norfolk, Virginia.

Technical Performance

USE 1 validated that networked laptops and their graphics cards could be used as the basis for a shipboard virtual environment based training system. Although many people were skeptical of the performance of laptops, they proved very capable of running the simulations in real time. One of the major lessons of USE 1 was the need for a single consistent "view" of the world. Although each of the DVTE systems played in the same "box," each one used its own unique database, and each one presented that database using its own individual image generator (IG). In addition to making configuration management difficult, the use of different world views compromised the interoperability of the simulations. After USE 1, DVTE moved to a common database and a common IG, the AAcuity PC-IG developed by SDS International. Another major limitation of the early DVTE was that each laptop had to be started manually. For USE 2, this was replaced by an application that started the entire federation from a single control station (Zeswitz, 2001; Bailey & Guckenberger, 2002).

DVTE (Current Generation) Performance Standards

The original DVTE program successfully demonstrated a number of concepts, but the technology did not become part of a program of record. Fortunately, the Office of Naval Research had a research program called Virtual Technologies and Environments (VIRTE) Demo 3: Multi-Platform Operational Team Training Immersive Virtual Environment (MOT2IVE) that carried forward the spirit of DVTE. The objective of this multiyear (FY2005 to FY2007) demonstration was to develop reconfigurable, deployable prototype training systems and to demonstrate effective team training in virtual environments across dynamic, heterogeneous, networked, interoperable systems.

System Design

The VIRTE program had a significant interest in using the testbed as a tool to conduct research into how marines learn. One of the limitations that the USMC had with the original DVTE prototypes was that each of the developers produced
proprietary components. This led to a complex, difficult-to-maintain system that required expensive licenses. While this is an acceptable situation for a demonstration program, it is not for a research system in which the code must be made freely available to a large number of research teams. One of the first tasks of the MOT2IVE team was to develop GOTS equivalents of all of the proprietary platforms in the original DVTE. Rather than develop each component in a vacuum, the new generation of DVTE was designed as a completely integrated system from the start. Another major change was to replace the proprietary AAcuity IG with an open source IG, DELTA3D. As discussed previously, JSAF, a GOTS product, continued to provide the backbone of the simulation infrastructure, and the DoD HLA used by JSAF allowed the simulations to share data.

DEVELOPMENT OF SYSTEM PROTOTYPES

Infantry Tool Kit

COTS simulations have evolved significantly since the original DVTE prototype was tested. VBS2 will be replacing VBS-1 in the ITK, and the Marine Corps has secured an enterprise license directly from Bohemia Interactive. This next generation FPS application contains many of the improvements to VBS-1 that the U.S. Army, Marine Corps, and allied forces have requested, including the importation of real world terrain and an HLA networking capability (Marine Corps Systems Command, 2006). A DARPA-sponsored simulation called Tactical Iraqi is in the ITK for language and culture training. The ONR-developed MAGTF XXI, Close Combat Marines, and TacOpsMC are part of the ITK, as is the army-developed Recognition of Combat Vehicles series, including its improvised explosive device and suicide bomber modules.

Combined Arms Network

The VIRTE team quickly built a testbed federation that could be freely shared with researchers and developers. The team used a spiral development model, with four major integration events and testbed releases. To support Fire Support Team training experiments, the team expanded and improved the forward observer personal computer simulation, originally developed by students at the Naval Postgraduate School. Due to the critical need to train joint terminal attack controllers (JTACs), the VIRTE team added a commercial head-mounted display and inertial tracker to the DVTE suite and demonstrated the utility of the CAN for training JTAC skills. The marines have subsequently purchased over 40 JTAC trainers based on this prototype, to be fielded to meet an urgent need for JTAC training across the Marine Corps.

Programmatics

As discussed earlier, DVTE got its start as a result of a Congressional Plus-up. Although formally established as a Program of Record in April 2004,
DVTE's funding profile did not include the funding necessary to go beyond the initial prototype built with the Congressional Plus-up. In recent years, Congress has appropriated "supplemental funding" to support the Global War on Terrorism. The USMC took advantage of this funding and began fielding DVTE suites of computers with the ITK in FY2007.

DEMONSTRATIONS AND TRANSITIONS

One of the challenges for the DVTE program has been educating marines on the available capabilities. In addition to demonstrations at trade shows, such as the Interservice/Industry Training, Simulation, and Education Conference and Modern Day Marine, there has been a concerted effort to educate marine leadership. One thrust is supporting deployment through the formal simulation centers located with each marine expeditionary force. In addition, there have been a number of experiments at marine formal schools, such as The Basic School, the Infantry Officer Course, and the Expeditionary Warfare School. Suggested improvements from marines and experimenters were prioritized and added to the development program. This exposure helps educate young officers about capabilities that will be available to them when they reach the fleet.

CONCLUSION

DVTE continues to evolve, absorbing new capabilities from the science and technology and commercial communities to meet the critical training needs of the warfighter. As marines become more familiar with using simulations, we expect DVTE to become an integral part of USMC training and education.

REFERENCES

Bailey, M., & Guckenberger, D. (2002). Advanced distributed simulation efficiencies & tradeoffs: DVTE, DMT, and BFTT experiences. Proceedings of the Interservice/Industry Training, Simulation and Education Conference [CD-ROM]. Arlington, VA: National Training Systems Association.
Cosby, N. L. (1999). SIMNET—An insider's perspective. Simulation Technology, 2(1). Retrieved April 8, 2008, from http://www.sisostds.org/webletter/siso/iss_39/art_202.htm
Feldmann, P., & Muller, P. (1997). DARPA STOW synthetic forces. Proceedings of the 19th Interservice/Industry Training, Simulation and Education Conference (pp. 461–471). Arlington, VA: National Training Systems Association.
Howard, M., Hoff, B., & Tseng, D. (1995). Individual combatant development in ModSAF. Proceedings of the Fifth Conference on Computer Generated Forces and Behavior Representation (pp. 479–486). Orlando, FL: UCF Institute for Simulation and Training.
Krulak, C. C. (1997, April). Military thinking and decision making exercises (Marine Corps Order 1500.55). Washington, DC: U.S. Marine Corps Headquarters, Department of the Navy.
Lenoir, T. (2000). All but war is simulation: The military-entertainment complex. Configurations, 8(3), 289–335.
Lenoir, T., & Lowood, H. (2005). Theaters of war: The military-entertainment complex. In H. Schramm, L. Schwarte, & J. Lazardzig (Eds.), Collection, laboratory, theater: Scenes of knowledge in the 17th century (pp. 427–456). Berlin: Walter de Gruyter. Retrieved April 8, 2008, from http://www.stanford.edu/dept/HPST/TimLenoir/Publications/Lenoir-Lowood_TheatersOfWar.pdf
Lukes, G. (1997). DARPA STOW synthetic environments. Proceedings of the 19th Interservice/Industry Training, Simulation and Education Conference (pp. 450–460). Arlington, VA: National Training Systems Association.
Marine Corps Systems Command, PM Training Systems. (2006, November). Program Manager for Training Systems products & services information handbook (pp. 36–38). Orlando, FL: Author. Available from http://www.marcorsyscom.usmc.mil/trasys/trasysweb.nsf/All/21A46B83F0CFFBB085256FC4005BBA61
National Defense Authorization Act of 2001, Pub. L. No. 106-398, 177 (2000).
Riddell, R. (1997). Doom goes to war. Wired, 5(4), 1–5.
Technical Director, Training and Education Command. (2001, February). Report to Congress: Shipboard simulators for Marine Corps operations. Quantico, VA: Author.
Zeswitz, S. (2001). Shipboard simulation system for Naval combined arms training. Proceedings of the Interservice/Industry Training, Simulation and Education Conference [CD-ROM]. Arlington, VA: National Training Systems Association.
Chapter 5
INFANTRY AND MARKSMANSHIP TRAINING SYSTEMS

Roy Stripling, Pete Muller, Richard Schaffer, and Joseph Cohn

Today's military must train greater numbers of individuals more quickly than in the past, and these learners must master a set of knowledge, skills, and abilities (KSAs) that is continually adapting in response to evolving threats. As part of an overarching training strategy, virtual environments (VEs) may offer one of the most potent tools. A VE is any set of technologies that allows a user to interact with a computer-simulated environment. Because they are primarily software driven, VEs can be quickly updated. Additionally, the footprint of the hardware supporting these applications is generally small enough to be deployed with troops. Moreover, many new KSAs involve maneuvers that are difficult or too risky to train in live environments or that require significant instructor intervention to impart. VEs allow a level of safety and instructor oversight that cannot be duplicated in live training. The challenge lies in understanding the content and interaction requirements that infantry VEs must satisfy in order to deliver on their promise of effective training. The basic KSAs of infantry warfighting include marksmanship, room entry, rapid decision making, team communication, team coordination, and situational awareness. While many of these KSAs are similar to those required in other domains, developing useful infantry training simulations is more challenging. This is because infantry tasks involve direct interaction between the warfighter and the real world. By contrast, vehicle simulations place artificial instruments between the user and the simulated environment. It is relatively easy to build vehicle VEs whose controls and displays are similar to those of the actual system; it is much more difficult to do this for dismounted infantry.

INFANTRY VIRTUAL TRAINING SYSTEMS

Marksmanship Systems

Marksmanship training was one of the earliest applications of infantry-oriented VEs. In their simplest forms, these training systems focus on acquiring and
removing targets. Other skills, such as room clearing, communicating, and maintaining situation awareness, are not covered. Consequently, the associated range of interaction requirements need not be addressed. Even with this seemingly basic skill, however, many interaction challenges need to be addressed, such as field of view, atmospheric variables, and weapon dynamics. The earliest successful system was the Indoor Simulated Marksmanship Trainer (ISMT) developed by FATS, Inc. The early FATS systems used demilitarized weapons that were instrumented with a coded laser initiated by the trigger pull. The coded laser enabled the system to distinguish shots fired from different weapons at the same screen. The weapon was also instrumented with sensors that detected other important marksmanship attributes, such as the position of the safety switch. The weapons were given firing recoil powered by compressed carbon dioxide (CO2) gas from an external supply. Early VE systems were video based, and the scenarios presented shoot/no shoot situations. The videos were hand-coded frame by frame to indicate where in each video frame the humans were located. The video was projected on a screen, and a laser detector determined if and where the people in the video were hit. Eventually, the FATS system added a full computer-generated VE in addition to the video. The latest versions untether the weapons by providing CO2 gas for recoil through a replaceable magazine. Bluetooth communications are used to communicate from the weapon to the computer system. The U.S. Army procured a similar marksmanship trainer known as the Engagement Skills Trainer 2000, produced by Cubic Corporation. Functionally, it is very similar to the current generation of ISMT, and several other systems on the market are also functionally similar. The largest differences among them are their specific VEs and their weapons implementations.
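A simplified sketch of the frame-coded hit detection described above follows. The data structures and names are invented (production systems such as ISMT involve calibrated optics, ballistics modeling, and per-weapon laser codes), but the fragment shows the essential lookup: the detected laser spot is tested against the hand-coded target regions for the video frame on screen when the shot was fired.

```python
# Hand-coded target regions per video frame (all values invented):
# frame number -> list of (x_min, y_min, x_max, y_max) bounding boxes.
FRAME_REGIONS = {
    101: [(220, 140, 260, 300)],
    102: [(224, 140, 264, 300), (400, 150, 430, 280)],
}

def resolve_shot(frame, laser_x, laser_y):
    """Return the index of the target hit in this frame, or None for a miss."""
    for i, (x0, y0, x1, y1) in enumerate(FRAME_REGIONS.get(frame, [])):
        if x0 <= laser_x <= x1 and y0 <= laser_y <= y1:
            return i
    return None

print(resolve_shot(102, 410, 200))  # 1 -> second target hit
print(resolve_shot(101, 50, 50))    # None -> miss
```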
Immersive Systems—Training More Complex KSAs

Marksmanship is a critical skill that all infantry must have, but dismounted combat requires many more complex skills as well. These include maneuvering, maintaining situation awareness, coordinating as a team, communicating, and making decisions in dynamic and uncertain environments. The army pioneered VEs for complex skills training with the Soldier Visualization Station (SVS) by Reality By Design (now Advanced Interactive Systems). This system combines a rear-projection VE with weapon and head position tracking via InterSense inertial-acoustic trackers. This technology does not have the marksmanship-level accuracy of laser hit detection systems, but it does allow for a much more interactive experience. Unlike the marksmanship trainers, in which the trainee stays in one place in the virtual world, this class of system allows the individual and the team to move through the VE. A review of a sample of the available VE systems provides a glimpse of the many interaction technologies necessary to enable these more complex effects within the training environment. These systems can be analyzed in terms of a range of characteristics, including
operating system, human sensory modalities stimulated, navigation and/or interaction methods, footprint, and so forth (see Table 5.1). As Table 5.1 suggests, there are many different approaches to delivering effective interactions in dismounted infantry VE systems. The main challenge with each of these is demonstrating that a selected approach contributes directly to enhanced performance once trainees are faced with actual combat or combat-like situations.
RESEARCH ON TRAINING EFFECTIVENESS

The U.S. military has a long history of pursuing and supporting technology advancement. Over the past few decades, this focus has both encouraged and benefited from the rapid advance of computer technologies and has led to, among other things, multiple generations of simulator and VE training systems. However, this rapid advance has come at a cost. The pace of new system development and deployment often exceeds the rate at which these systems can be evaluated for their training effectiveness. This raises the risk that training systems developed for and purchased by the military may go unused or underused by the training community. For these reasons, the general approach taken in researching these systems is not to focus on specific applications or on specific pieces of equipment, but rather to identify the relative limitations that different VE approaches impose on training (see Table 5.2). For example, rather than determine whether a specific head-mounted display (HMD) supports a specific training objective, researchers may focus on the impact of limited field of view on that training objective, so that any HMD can be evaluated against this criterion. One example of this approach is the set of experiments conducted under the Office of Naval Research program Virtual Technologies and Environments (VIRTE). VEs for dismounted infantry may make use of handheld controllers, such as joysticks and gamepads; they may optically track body movements as the user walks across a monitored space or walks in place; or they may allow the user to walk naturally across a moving platform, such as the omnidirectional treadmill (Darken, Cockayne, & Carmein, 1997) or the VirtuSphere (VirtuSphere, Inc.). Rather than test all of these interfaces and all of the commercial systems that make use of them, researchers sample a cross-section of these interfaces and evaluate them while users undertake the same set of tasks (see Figure 5.1). In some cases, the results of these experiments have been mundane (once mastered, users seem to be able to achieve the same level of precision and accuracy with any of these interfaces) or fairly narrow in scope (proprioceptive/kinesthetic feedback for locomotion is relatively unimportant EXCEPT where visibility is poor AND the user's movement in this portion of the VE will include rotations). Although the number of unique locomotion interfaces means that the number of experiments needed is still relatively large, the hope is that this approach will reduce the need to evaluate each and every system. Once general principles are extracted, purchasers of new systems will be able to make reasonable assessments of the likelihood that a system will meet their training needs.
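As one concrete illustration of how such comparisons can separate systematic error from variability, the sketch below computes accuracy (bias of the mean stopping point from the target) and precision (spread of stopping points around their own mean) for several locomotion-interface conditions. The interface names and data values are invented for illustration, not drawn from the VIRTE experiments.

```python
# Minimal sketch: separate accuracy (bias) from precision (spread) when users
# attempt to stop at the same target point under different locomotion interfaces.
import numpy as np

target = np.array([5.0, 3.0])  # target position in the VE, in meters

endpoints = {  # stopping positions per interface condition (hypothetical data)
    "gamepad":       np.array([[5.1, 3.2], [4.9, 2.9], [5.2, 3.1]]),
    "walk_in_place": np.array([[5.0, 3.1], [5.1, 3.0], [4.8, 3.0]]),
    "real_walking":  np.array([[5.0, 3.0], [5.1, 2.9], [5.0, 3.1]]),
}

for interface, pts in endpoints.items():
    centroid = pts.mean(axis=0)
    accuracy = np.linalg.norm(centroid - target)  # systematic bias (m)
    precision = np.sqrt(((pts - centroid) ** 2).sum(axis=1).mean())  # RMS spread (m)
    print(f"{interface:14s} accuracy={accuracy:.3f} m  precision={precision:.3f} m")
```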
Table 5.1. Partial Summary of VE Infantry and Marksmanship Trainers

Category | Product | Visual | Locomotion | Tracking | 360/180 View? | Rifle Prop? | 6 DOF Aiming | Footprint
PC only | VBS-1 | Operation Flashpoint | Mouse & keyboard | — | Yes | No | No | Desktop
PC only | VBS-2 | — | Mouse & keyboard | — | Yes | No | No | Desktop
PC only | RealWorld | — | Mouse & keyboard | — | Yes | No | No | Desktop
PC only | America's Army | Unreal Tournament | Mouse & keyboard | — | Yes | No | No | Desktop
Projection | Flatworld | Rear, Gamebryo | Real | Inertial | Yes | — | — | —
Projection | SVS | Rear, Proprietary | Joystick on weapon | Inertial | No | Yes | — | Projection space
Projection | VCCT | Front, Proprietary | Joystick on weapon | Inertial | No | Yes | — | Projection space
Projection | VICE | Front, Proprietary | Joystick on weapon | Inertial | No | Yes | Laser | Projection space
Projection | VIRTE Screen Shooter | Front, Gamebryo | Joystick on weapon | None | No | Yes | Laser | Projection space
Immersive VE | VCCT | Iglasses | Knee joystick | Inertial | Yes | Yes | No | —
Immersive VE | IGS | V8 HMD | Joystick on weapon | Inertial | Yes | Yes | No | 6´ × 6´
Immersive VE | Expedition DI | eMagin HMD | Joystick on weapon | Inertial | Yes | Yes | No | Limited
Immersive VE | VIRTE Pointman | HMD, screen, or monitor | Gamepad and foot pedals | Inertial | Yes | No | Yes | Desktop
Immersive VE | VIRTE Pod | NVisor | Joystick on weapon | Inertial or optical | Yes | Yes | Yes | 10´ × 10´
Immersive VE | VIRTE Gaitor | NVisor | Walk-in-place | Optical | Yes | Yes | Yes | 12´ × 12´
Immersive VE | VirtuSphere | Iglasses | Walk inside sphere | Acoustic and inertial | Yes | Yes | No | 10´ × 10´
Table 5.2. Partial Summary of Recent VE Effectiveness Evaluations

Aspect investigated: Locomotion interface
Determining effect of: Different locomotion interfaces, for equivalency of precision and accuracy and of spatial knowledge encoding/recall
Conclusions: Once sufficiently trained, precision and accuracy are equivalent across tested systems. However, the pattern of movement in the VE may differ notably from that in the real world. Also, body-tracked systems improve user position awareness when performing rotations in visually impaired environments, and body-tracked VEs support better spatial recall.
References: Cohn, Whitton, Razzaque, Becker, & Brooks, 2004; Farrell et al., 2003; Grant & Magee, 1998; Stripling et al., 2006; Whitton et al., 2005
See chapter: Templeman, Sibert, Page, and Denbrook, Volume 2, Section 1, Chapter 7

Aspect investigated: Field of view (FOV)
Determining effect of: FOV on performance
Conclusions: Mixed: no measurable effect, or benefits from wider FOV
References: Arthur, 2000; Browse & Gray, 2006; Johnson & Stewart, 1999
See chapter: Bolas and McDowall, Volume 2, Section 1, Chapter 2

Aspect investigated: Passive/active haptics
Determining effect of: Haptic feedback on performance or sense of presence
Conclusions: Haptics can enhance sense of presence in a VE and improve performance of infantry skills
References: Insko, Meehan, Whitton, & Brooks, 2001; Lindeman, Sibert, Mendez-Mendez, Patil, & Phifer, 2005
See chapter: Başdoğan and Loftin, Volume 2, Section 1, Chapter 5

Aspect investigated: Auditory cues
Determining effect of: Auditory cues on task performance and/or sense of presence
Conclusions: Audio cues can enhance sense of presence and performance in memory and localization tasks
References: Larsson, Vastfjall, & Kleiner, 2002; Sanders & Scorgie, 2002; Shilling, 2002
See chapter: Sadek, Volume 2, Section 1, Chapter 4

Aspect investigated: Multimodal cues
Determining effect of: Multimodal cueing on performance and/or sense of presence
Conclusions: Multimodal cues improve sense of presence and response times
References: Hecht, Reiner, & Halevy, 2006; Milham, Hale, Stanney, & Cohn, 2005a; Milham, Hale, Stanney, & Cohn, 2005b
See chapter: Başdoğan and Loftin, Volume 2, Section 1, Chapter 5
Figure 5.1. Diversity of Locomotion Interfaces for Virtual Environments
This experimental approach should prove equally useful for evaluating visual and auditory interfaces. Screen resolution, field of view, and refresh rates are likely to keep increasing, and lag to keep shrinking, in the ensuing years; by following this human-centric approach, assessments of training systems should give developers, buyers, and users a better sense of which new developments will make a difference and which will not.

CONCLUSIONS

Development of VEs for infantry and marksmanship has been driven by several factors, including the need to reduce cost, increase safety, increase training throughput, and provide a greater diversity of experiences during training. In the foreseeable future, shifting mission requirements will likely drive new developments into more cognitive and complex skill domains, such as basic foreign language skills and the awareness and understanding of cultural practices, local customs, and local laws. The successful VE infantry trainer of the future will likely include avatars driven by advanced behavior models that incorporate this type of information, as well as interfaces that permit the trainee to interact with the avatars at this level. Many of these needs may still be met using lower cost desktop or laptop interfaces. However, high end immersive VEs that provide a more complete and interactive representation of the full environment will also be in demand. These systems will allow advanced students to train in an environment where they must contend with both the physical and mental challenges of infantry duties.

REFERENCES

Arthur, K. (2000). Effects of field of view on performance with head-mounted displays. Unpublished doctoral dissertation, University of North Carolina, Chapel Hill.
Browse, R. A., & Gray, D. W. S. (2006). Display conditions that influence wayfinding in virtual environments. Human Vision and Electronic Imaging XI (Article No. 605713). Bellingham, WA: International Society for Optical Engineering.
Cohn, J., Whitton, M., Razzaque, S., Becker, W., & Brooks, F. (2004). Information presentation and control method impact performance on a complex virtual locomotion task. Proceedings of the 48th Annual Meeting of the Human Factors and Ergonomics Society. Santa Monica, CA: Human Factors and Ergonomics Society.
Darken, R. P., Cockayne, W. R., & Carmein, D. (1997). The omni-directional treadmill: A locomotion device for virtual worlds. Proceedings of the 10th Annual ACM Symposium on User Interface Software and Technology (pp. 213–221). New York: Association for Computing Machinery.
Farrell, M. J., Arnold, P., Pettifer, S., Adams, J., Graham, T., & MacManamon, M. (2003). Transfer of route learning from virtual to real environments. Journal of Experimental Psychology: Applied, 9(4), 219–227.
Grant, S. C., & Magee, L. E. (1998). Contributions of proprioception to navigation in virtual environments. Human Factors, 40(3), 489–497.
Hecht, D., Reiner, M., & Halevy, G. (2006). Multimodal virtual environments: Response times, attention, and presence. Presence: Teleoperators and Virtual Environments, 15(5), 515–523.
Insko, B. (2001). Passive haptics significantly enhances virtual environments. Unpublished doctoral dissertation, University of North Carolina, Chapel Hill.
Insko, B., Meehan, M., Whitton, M., & Brooks, F. P., Jr. (2001). Passive haptics significantly enhances virtual environments. Proceedings of the 4th Annual Presence Workshop.
Johnson, D. M., & Stewart, J. E. (1999). Use of virtual environments for the acquisition of spatial knowledge: Comparison among different visual displays. Military Psychology, 11(2), 129–148.
Larsson, P., Vastfjall, D., & Kleiner, M. (2002, June). Better presence and performance in virtual environments by improved binaural sound rendering. 22nd International Congress of the Audio Engineering Society: Virtual, Synthetic, and Entertainment Audio (pp. 31–38).
Lindeman, R. W., Sibert, J. L., Mendez-Mendez, E., Patil, S., & Phifer, D. (2005). Effectiveness of directional vibrotactile cuing on a building-clearing task. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 271–280). New York: ACM.
Milham, L., Hale, K., Stanney, K., & Cohn, J. (2005a, July). Selection of metaphoric and physical fidelity multimodal cues to enhance virtual environment (VE) performance. Paper presented at the VR International Conference, Las Vegas, NV.
Milham, L., Hale, K., Stanney, K., & Cohn, J. (2005b, July). Using multimodal cues to support the development of situation awareness in a virtual environment. Paper presented at the 1st International Conference on Virtual Reality, Las Vegas, NV.
Sanders, R. D., & Scorgie, M. A. (2002, March). The effect of sound delivery methods on a user's sense of presence in a virtual environment. Unpublished master's thesis, Naval Postgraduate School, MOVES Institute, Monterey, CA.
Shilling, R. D. (2002, June). Entertainment industry sound design techniques to improve presence and training performance in VE. Paper presented at the European Simulation Interoperability Workshop, London.
Stripling, R., Templeman, J. N., Sibert, L. E., Coyne, J. T., Page, R. G., La Budde, Z., & Afergan, D. (2006, May). Identifying interface limitations for virtual environment training systems. Paper presented at the 55th meeting of the DoD Human Factors Engineering Technical Advisory Group, Las Vegas, NV.
Whitton, M., Cohn, J., Feasel, J., Zimmons, P., Razzaque, S., Poulton, S., McLeod, B., & Brooks, F. (2005). Comparing VE locomotion interfaces. Proceedings of IEEE Virtual Reality (pp. 123–130). Los Alamitos, CA: IEEE Computer Society.
Chapter 6
FIELDED NAVY VIRTUAL ENVIRONMENT TRAINING SYSTEMS
Daniel Patton, Long Nguyen, William Walker, and Richard Arnold

The U.S. Navy has invested significantly in virtual environment (VE) simulation technology to address contemporary training requirements. VE simulation has replaced many standing training devices (TDs) and tactical training equipment (TTE), which historically were expensive to design, develop, and sustain throughout their lifecycles. TDs and TTE typically required significant building modifications and occupied a large "footprint." Many of these devices consumed considerable power and required ancillary equipment to operate and maintain. In many instances, these training suites also required extensive manpower to monitor student performance during training events; often four to six additional personnel were needed for the complementary watch stations when training a single student. Also, since only one student could be trained at a time, a significant bottleneck occurred, causing many courses to be longer than necessary. Because it avoids the problems associated with the large footprint of earlier TD and TTE efforts, VE simulation has been used to reduce training course durations and the annual cost of sustaining these training capabilities. The implementation of VE training capabilities is preceded by a front-end analysis to ensure the application addresses appropriate, contemporary training requirements. Implementation is followed by a training effectiveness evaluation (TEE) to validate the benefits of virtual training and identify any performance "deltas" that may remain. The Office of Naval Research (ONR) and Naval Air Warfare Center Training Systems Division, through the Virtual Environment Training Technologies and Virtual Technologies and Environments (VIRTE) programs, made some of the first efforts to research, test, and develop VE training technologies and field them in operational military training settings. This chapter describes critical development issues in the design and fielding of three systems cultivated under these programs: the Virtual Environment Submarine
(VESUB), the Conning Officer Virtual Environment (COVE), and the Virtual Environment Landing Craft, Air Cushion (VELCAC).
VIRTUAL ENVIRONMENT SUBMARINE

Submarine piloting and navigation skills are taught at the Submarine Training Facility in Norfolk, Virginia, and the Naval Submarine School in Groton, Connecticut. Traditionally, the classroom based training is augmented with TTE, TDs, or simulation based exercises. While this provides basic ship-handling skills, many of the finer points of submarine ship handling can be internalized only through hands-on experience. However, for dangerous maneuvers, such as navigating a surfaced submarine through a harbor or channel, opportunities for on-the-job training are limited and are generally reserved for the more experienced officers of the deck (OOD). For such tasks, a high fidelity simulation based trainer provides a safe and effective training solution (Nguyen, Cohn, Mead, Helmick, & Patrey, 2001; Hays, Seamon, & Bradley, 1997). One potential simulation based training solution is the VE, which can provide a high sense of presence, approximating the critical on-the-job experience that is unavailable to the novice junior officer (JO). The effectiveness of such a training system depends heavily on the fidelity of the training cues (for example, auditory and visual) and the intuitiveness of the trainee interface, which, in turn, depend upon the degree to which VE technology areas have matured. Fortunately, VE technology has matured rapidly in the areas of three-dimensional (3-D) visualization and audio cues, the most critical sensory modalities in the submarine handling domain. Similarly, speech recognition and position-tracking VE interface technologies exist; they have been successfully used in other application domains, including training. In order to determine whether VE was a viable training technology for submarine handling, in the late 1990s ONR began a three-stage process: (1) identifying the training requirements for submarine piloting, (2) developing a prototype VESUB system, and (3) evaluating the effectiveness of this VE system for training.

The first stage of the VESUB program was identifying training requirements. VESUB sought to enhance novices' skills in rare, difficult, and dangerous submarine maneuvers, such as harbor and channel navigation. A task analysis and a determination of the required training cues were conducted. Submarine commanding officers and other senior officers were interviewed to gain a clear understanding of how the operational task was accomplished and how they decided when a JO was qualified to be designated as an OOD. The consensus answer for qualification was when a JO developed the "seaman's eye": the total situation awareness of the ship-handling environment and the ability to safely maneuver the vessel in all conditions (Hays et al., 1997). Additional submarine subject matter expert (SME) interviews and focused group discussions further defined this seaman's eye concept as having 8 perceptual and 12 cognitive task components (Tables 6.1 and 6.2).
Table 6.1. Eight Perceptual Components of the "Seaman's Eye" (Hays et al., 1997)
1. Locating navigation aids
2. Judging distance
3. Identifying turn start/stop
4. Avoiding obstacles
5. Sense of ship's responsiveness
6. Recognizing environmental conditions
7. Recognizing equipment failures
8. Detecting and filtering communications
These perceptual and cognitive tasks were mapped to 73 overall training objectives. The training objectives provided quantifiable measures of performance to facilitate training evaluation and optimization of the training system design (Hays et al., 1997). The 8 critical perceptual elements and associated training objectives focused on supporting the training task, which was primarily visual. Extensive measures were taken to establish adequate visual fidelity in the VESUB system. Adequate visual resolution, instantaneous field of view, scene refresh rate, unlimited gaze angle, cultural features, detailed ship models, marine environment cues, weather and time of day cues, and ship visual motion cues were essential factors. VE technologies in the areas of high resolution head-mounted display (HMD), eye or head tracking, high end image generation, visual models, hydrodynamic modeling, and environmental models were areas that warranted investment for VESUB.
Table 6.2. Twelve Cognitive Components of the "Seaman's Eye" (Hays et al., 1997)

1. Understanding visual cues and their representations on navigation charts
2. Understanding relative size, height, range relations, and angle on the bow
3. Understanding advance and transfer
4. Understanding the effects of tides, currents, wind, and seas
5. Understanding rules of the road
6. Understanding relative direction and speed
7. Understanding methods to differentiate and prioritize traffic contacts
8. Understanding ship's operation
9. Understanding methods to deal with uncooperative traffic
10. Understanding operation of ship's systems
11. Understanding how to take corrective actions
12. Understanding communication procedures
While not as dominant as the visual cues, audio cues were also essential in all 8 perceptual tasks identified in the seaman's eye concept. Since piloting the vessel is conducted exclusively through voice commands from the OOD or trainee to the crew, a speech interface was also essential. Accordingly, high fidelity audio cues, such as spatial audio for fog signals, environmental sounds, and engine sounds, added to the realism of the virtual environment. The human-computer interface (HCI) involves primarily visual and audio cues, eye-position tracking, and voice-recognition interactions. The user interacts primarily through speech commands, and the VE responds with verbal feedback. Critical developmental issues include determining the level of necessary immersion, defining speech recognition requirements, designing the hydrodynamic models, identifying visual/auditory cues, and determining a computer platform sufficient to run the entire simulation. For the VESUB simulation, the 12 cognitive elements and associated training objectives drove the design of the training scenario and the development of instructional tools. The trainee's level of mastery and understanding of the task had to be measured, and the scenario had to be controlled in real time. Accordingly, an instructor operator station (IOS) with numerous features, including scenario generation, scenario control, performance measurement, and performance tracking, was warranted for VESUB.

With the training task analysis completed and a good understanding of the perceptual and cognitive tasks required, VESUB's second stage, the prototype development stage, began. The analyses indicated that a high resolution (high res) HMD was needed in order to accommodate the visual cue fidelity required for the submarine handling task. nVision Inc. developed a fully immersive, 1,280 × 1,024 pixel, 40 horizontal (H) × 30 vertical (V) degree field of view (FOV), cathode ray tube (CRT), monoscopic HMD, which was integrated into the VESUB prototype. While this was the highest resolution available for HMDs, it did not provide enough detail for 20/20 vision (40 degrees spread across 1,280 pixels is roughly 1.9 arcminutes per pixel, coarser than the approximately 1 arcminute of detail that 20/20 vision resolves). However, this HMD resolution was adequate to meet the training objectives. Since the visual scene is considered a far-field environment, stereoscopy is not required, and a monoscopic, 100 percent overlap HMD was used, which renders one visual image that is the same for both eyes. While the HMD provides a 40H × 30V degree instantaneous FOV (Figures 6.1 and 6.2), the training and system requirement was full-range FOV, that is, 360° horizontal, vertical, and rotation (pitch, yaw, and roll). The high level design also required the viewpoint to be anywhere within the submarine bridge. Therefore, a 6 degrees of freedom (6 DOF) position and angle tracking system was necessary to determine trainee eye position and gaze angle. A Polhemus magnetic based 6 DOF head tracking system was integrated into the prototype to meet this requirement. Since the primary method for the OOD (trainee) to pilot the submarine is through speech commands to the crew below deck, a highly accurate voice-recognition system was required.
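As a rough illustration of how a small, closed lexicon keeps recognition tractable, the following sketch matches recognized utterances against a fixed set of conning commands and echoes an acknowledgment. The phrases, numeric values, and responses are hypothetical; they are not drawn from VESUB's actual lexicon.

```python
# Toy sketch of closed-grammar command handling for a voice-piloted trainer.
# Phrases and values below are invented examples, not VESUB's command set.

COMMANDS = {
    "right full rudder":    ("rudder", +35.0),   # rudder angle in degrees (assumed)
    "left full rudder":     ("rudder", -35.0),
    "rudder amidships":     ("rudder", 0.0),
    "all ahead one third":  ("engine", 0.33),    # fraction of full power (assumed)
    "all ahead two thirds": ("engine", 0.66),
    "all stop":             ("engine", 0.0),
}

def handle_utterance(text: str) -> str:
    """Match a recognized utterance against the closed command set."""
    order = COMMANDS.get(text.strip().lower())
    if order is None:
        return "Say again, sir."          # out-of-lexicon utterance is rejected
    system, value = order
    # ...here the order would be applied to the hydrodynamic model...
    return f"{text}, aye."                # verbal acknowledgment to the trainee

print(handle_utterance("Right full rudder"))    # -> 'Right full rudder, aye.'
print(handle_utterance("full speed ahead"))     # -> 'Say again, sir.'
```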
Figure 6.1. Naval Officer in VESUB Trainer. Courtesy of B. Walker.
To increase accuracy and reliability, the prototype system design was optimized for a task specific "lexicon" of only 250 words. The voice synthesis feedback was similarly constrained. A Silicon Graphics, Inc. (SGI) Onyx provided the image generation capability to handle the high graphics demand. Two SGI Indy desktops were included in the instructor operator station to author, monitor, and control the scenario and to track the trainee's performance in real time. The virtual geospecific area included all training relevant visual and audio cues, including the submarine features, ship traffic, ocean waves and wakes, cultural objects, buoy sounds, weather effects, and so forth. The underlying software used hydrodynamics models to approximate ship dynamics, accounting for such variables as wind and current.

With a working prototype, VESUB's third stage, the evaluation stage, began. VESUB underwent two system evaluations: formative evaluation and TEE. Formative evaluations using submarine SMEs provided for iterative improvements to the prototype. TEEs determined the effectiveness of VESUB and offered recommendations on how the technology could be effectively integrated into navy training (Seamon, Bradley, & Hays, 1999). Formative evaluations of the prototype were performed with daily guidance from submarine SMEs. Inputs from the active fleet and SMEs at the submarine school were also collected periodically as the prototype matured through the iterative design phases.
Figure 6.2. View from the VESUB Bridge. Courtesy of B. Walker.
These provided continuous improvements in functionality and trainee interface (Hays et al., 1997). A TEE found several areas in which VESUB improved performance, such as an increase of 57 percent in contact management skills, a decrease of 44 percent in reaction time during a man-overboard event, an increase of 40 percent in using commands during a yellow-sounding event, an increase of 39 percent in checking range markers, an increase of 39 percent in visually checking the rudder, an increase of 29 percent in using correct commands during a man-overboard event, and an increase of 13 percent in issuing correct turning commands (Hays, Vincenzi, Seamon, & Bradley, 1998).

VESUB is currently fielded in six locations: Groton, Connecticut; Norfolk, Virginia; Kings Bay, Georgia; San Diego, California; Bangor, Washington; and Pearl Harbor, Hawaii. The VESUB system today still uses an SGI computer for visual image generation and a personal computer (PC) based IOS. Newer hardware includes a lighter NVIS, Inc. high resolution liquid-crystal-on-silicon HMD and an InterSense 6 DOF position tracker. IBM's ViaVoice is used for speech recognition and speech synthesis. Also, the system has evolved to include radar, voyage management system (VMS), and global positioning system (GPS) displays. These displays are switched into the student's HMD view when the student looks at the corresponding display in the virtual scene and presses a button.
Planned improvements include changing the image generator from an SGI computer to a high end PC. Finally, VESUB requires geospecific harbors so that students can practice navigating the peculiarities of particular waterways. Currently, 32 harbors from around the world are modeled for the VESUB system. These models are compatible and directly shared with COVE, the next VE training system discussed.
CONNING OFFICER VIRTUAL ENVIRONMENT

The COVE training system at Surface Warfare Officers School (SWOS), Newport, Rhode Island, is a derivative of the VESUB trainer (Figure 6.3). The goals of the COVE research project were to demonstrate high fidelity ship simulation on a PC platform, with speech recognition and an intelligent tutor. The training goal was for the OOD student to develop the cognitive skills needed to safely command the ship's movement, with the virtual reality system supplying the requisite visual cues. These factors drove the need for a high fidelity visual environment. COVE arose from several lessons learned from the VESUB prototype; specifically, the challenges of cost, instructor workload, training transfer, and deployability were addressed. The goals of COVE included reduced instructor operation requirements, a performance-driven system for training the "seaman's eye," and development of both schoolhouse and deployed applications.
Figure 6.3. Officer Practicing Maneuvers in COVE Trainer. Courtesy of MC1 David Frech, Surface Warfare Magazine.
A central COVE concept was the interaction of artificial intelligence techniques with a VE designed for PCs. Initial applications included the DDG-51 class destroyer and the AOE-6 class supply ship for simulating underway replenishment (UNREP). Visual rendering was executed in PowerScene. The system was operable either in a high fidelity immersive HMD mode or with a CRT and mouse. The primary interface for communication from the user to the system was the BBN Hark speech recognizer, and communication from the system to the user incorporated audio playback of prescripted responses to realistically simulate the helmsman or other bridge members. While the first iteration of COVE focused on an UNREP task, other tasks, such as harbor transit and pierwork, were being incorporated. This expansion allowed the navy to employ a novel method for determining the relevant cues necessary to support effective scenario development: data-driven knowledge engineering (DDKE) (Cowden, Burns, & Patrey, 2000). VESUB relied solely on interviews and verbal reports to establish these elements, a method that is typically time consuming, costly, and error prone. DDKE instead utilizes fuzzy logic tools to represent those elements of the task that directly affect performance (Cowden et al., 2000). Once identified, these task elements can serve as a guide for designing a high fidelity VE training scenario (Cohn, Helmick, Meyers, & Burns, 2000).

The embedded dynamic intelligent tutoring system (OMAR, with a Java interface) was designed with a newly commissioned officer in mind, but provides scenarios that even prospective commanding officers may find challenging. The tutor was derived from extensive interviews with surface warfare officers and ship-handling instructors and from several task analyses, both GOMS (goals, operators, methods, and selection rules) based and Soar based (Norris, 1998; Tenney, 1999). The development was iterative in that the simulation was constructed first, with continual SME feedback. The tutor provided instant feedback and direction during the simulation, concentrating on critical decision points that the student must note and actions that must be taken to perform the evolution successfully. Student performance was recorded throughout the session and compared with a set of validated performance metrics. Students were given grades that compared their performance with that of a prototypical expert ship handler.

At the time of transition to a production COVE system, a market survey found that VShip had improved and met most of the COVE design requirements. VShip was therefore chosen as a cost-effective solution for COVE, with the thought of continued development to meet all the research goals. Twelve systems were installed at SWOS in Newport, Rhode Island. These systems are referred to as COVEs 1 and 2. They have a single visual channel powered by a high end PC driving an HMD. Also included are PC-driven radar and VMS displays. COVEs 1 and 2 are used to train JOs in the basics of ship handling. Six more stations are installed as the COVE 3 systems. These have a three-channel, flat-panel display in addition to, and as an alternative to, the HMD. The COVE 3s are used to train senior officers who are prospective commanding or executive officers in the finer points of ship handling for their particular class of ship.
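The expert-comparison grading described above might be sketched as follows. The metric names, expert values, and tolerances here are invented placeholders rather than the tutor's validated performance metrics; the point is only the pattern of scoring a student against a prototypical expert profile.

```python
# Schematic sketch of expert-comparison grading for a ship-handling evolution.
# All metrics, expert values, and tolerances are hypothetical.

EXPERT_PROFILE = {
    # metric: (expert value, deviation at which credit falls to zero)
    "approach_speed_kts":     (15.0, 2.0),
    "lateral_separation_ft":  (180.0, 20.0),
    "station_keeping_err_ft": (10.0, 5.0),
}

def grade(student: dict) -> float:
    """Return 0..100: how closely the student tracks the expert profile."""
    scores = []
    for metric, (expert, tol) in EXPERT_PROFILE.items():
        deviation = abs(student[metric] - expert) / tol
        # Linear falloff: full credit at the expert value, none at or
        # beyond one full tolerance of deviation.
        scores.append(max(0.0, 1.0 - deviation))
    return 100.0 * sum(scores) / len(scores)

print(grade({"approach_speed_kts": 16.5,
             "lateral_separation_ft": 205.0,
             "station_keeping_err_ft": 12.0}))   # -> 28.3 for this (poor) run
```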
Leveraging the installed base of COVE systems, a full mission bridge (FMB) was installed using the COVE software and 12 visual channels. It has a circular screen and displays a full 360°. The FMB is used to train middle- and senior-grade officers in antiterrorism force protection. It can be linked to the other COVE systems for combined multiship tactics training. The latest training system installed is a bridge navigation trainer for the littoral combat ship (LCS). It again uses the underlying COVE software and has a five-channel, flat-panel visual display. An HMD channel is provided for simulating driving the LCS from the ship's side. It is being upgraded to be reconfigurable for both the LCS 1 and LCS 2 classes of ships.
VIRTUAL ENVIRONMENT LANDING CRAFT, AIR CUSHION

The landing craft, air cushion (LCAC) hovercraft transports materiel and personnel of marine amphibious forces from ship to shore. Its core manning consists of a three-man crew: craftmaster, engineer, and navigator. The LCAC's 17-week training course focuses primarily on simulation based training in the LCAC full-mission trainer (FMT), supplemented by live missions. After completing preliminary training, students report to their operational commands, where they continue with advanced qualification training. In 2002 the LCAC fleet was undergoing a service life extension program (SLEP), which included a significant cockpit redesign. The SLEP effort was scheduled to upgrade three to four craft per year, with a fully converted fleet planned by 2015. The FMT would not be upgraded to reflect the SLEP configuration until half the LCAC fleet had undergone SLEP, which was projected to occur in 2009 (Muller, Cohn, & Nicholson, 2003). Thus, a requirement was established for interim training for LCAC crews designated to operate SLEP LCACs prior to 2009. VELCAC was developed primarily to address this requirement.

The development of the VIRTE VELCAC prototype trainer was accomplished via an iterative process, with user input refining the training requirements, design, and development, and with user feedback informing each research and development spiral (Schaffer, Cullen, Cohn, & Stanney, 2003). It was determined during requirements refinement that the SLEP changes would have the greatest impact on the engineer position, followed by the navigator position. The subsequent iterative design and feedback approach focused on the specific cockpit and interface features and functions that represented the most significant changes at the engineer and craftmaster stations. Schaffer et al. (2003) have described the high level design considerations for the PC based VELCAC science and technology prototype trainer. The development environment was Microsoft Visual Studio .NET. Configuration management was via Concurrent Versions System. The NetImmerse graphics engine by Numerical Design Limited was selected for its well-structured application programming interface (API) and its ability to track the latest graphics hardware and lower level API capabilities via updates.
For the synthetic natural environment, the VELCAC used technologies developed by the Defense Advanced Research Projects Agency (DARPA) synthetic theater of war (STOW) program and the Defense Modeling and Simulation Office environmental federation. The terrain database was the U.S. Army Topographic Engineering Center's Camp Lejeune terrain database. VELCAC also included ephemeris and illumination models. The ephemeris model was reused from the DARPA STOW program, and its output fed the illumination model, which was based on the U.S. Army Research Laboratory's ILUMA (illumination under realistic weather conditions) model. Surface wave and ocean dynamics were based on models developed by the Naval Surface Warfare Center, Carderock Division, and craft dynamics were based on the LCAC full-mission trainer craft dynamics model. To meet the requirement for distributed training, VELCAC was developed in accordance with the Department of Defense high level architecture (HLA), version 1.3. Two run-time infrastructures (RTIs) were used to implement HLA: RTI-1.3NG and RTI-s. VELCAC used the MCO2 federation object model (FOM) for FOM attribute data sharing (Schaffer et al., 2003).

The VIRTE program used an iterative integration and transition approach. Since the VELCAC transition involved use of the trainer for interim SLEP "differences" training, the system was installed locally at the training site, Coastal Systems Station, Panama City, Florida. The product of each development spiral was pushed to the local site, with usability, functionality, and configuration feedback solicited from instructors and incorporated into subsequent development spirals. The transitioned prototype was thus the product of design input grounded in extensive user testing. Task analysis was used to elicit the data required to enable task and mission performance standards to be incorporated into system design. Cognitive task analyses were performed to elicit training objectives and their supporting elements and to provide guidance on training scenario design. For the SLEP interim training, the key high level elements elicited by the task analyses were an interactive three-dimensional environment and live instrumentation for all three crew stations (Muller et al., 2003). An HCI evaluation was conducted to identify the sensory modalities that needed to be represented in the VELCAC and to review available technologies capable of representing them. Ultimately the evaluation identified two critical sensory modalities for the VELCAC system: vision and haptics (Muller et al., 2003). A system usability analysis was conducted to assess three critical usability factors: effectiveness, intuitiveness, and subjective perception. The results took the form of ranked redesign recommendations elicited from users and SMEs, which informed design decisions during the iterative design and integration spirals.

The VELCAC S&T prototype (Figure 6.4) addressed the LCAC community's most urgent training requirements for the SLEP LCAC. Specifically, the engineer station, and to a lesser extent the navigator station, received greater attention than the craftmaster station. Post-transition, the Naval Sea Systems Command has funded development of full functionality for all three starboard cabin crew stations.
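Returning to the HLA-based distribution described above: at heart, HLA lets federates share FOM-defined attribute updates through an RTI. The sketch below is deliberately schematic; the class and method names are invented to show the publish/subscribe pattern only and are not the RTI 1.3 API.

```python
# Deliberately simplified, schematic sketch of HLA-style attribute sharing.
# This is NOT the RTI 1.3 API; all names are invented for illustration.

class ToyRTI:
    """Stand-in for a run-time infrastructure: routes attribute updates."""
    def __init__(self):
        self.subscribers = {}  # object class name -> list of callbacks

    def subscribe(self, object_class, callback):
        self.subscribers.setdefault(object_class, []).append(callback)

    def update_attributes(self, object_class, attributes):
        # A real RTI filters by declared interest and the FOM definitions.
        for cb in self.subscribers.get(object_class, []):
            cb(attributes)

rti = ToyRTI()

# Another federate (say, an instructor station) subscribes to LCAC state.
rti.subscribe("LCAC", lambda attrs: print("IOS sees:", attrs))

# The VELCAC federate publishes craft-state attributes each simulation frame.
rti.update_attributes("LCAC", {"lat": 34.65, "lon": -77.35,
                               "heading_deg": 270.0, "speed_kts": 35.0})
```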
Figure 6.4. VELCAC Training System. Courtesy of R. Wrenn, Unitech.
VELCAC has continued to be incorporated into the SLEP differences course after the course was relocated from Coastal Systems Station, Panama City, Florida, to the operational units. The SLEP VELCAC software architecture, originally adapted from the legacy LCAC full-mission trainer, is now serving as the baseline software architecture for the SLEP full-mission trainer, scheduled for completion in fiscal year 2009 (FY2009). With VELCAC now on site at the operational LCAC units, other uses are under consideration, such as currency training and mission rehearsal.

Virtual environment training devices such as VESUB, COVE, and VELCAC have demonstrated the utility and flexibility of VE in addressing emergent naval operational training needs. VE component technologies have continued to improve rapidly in cost and performance. As operational costs rise, the use of VE based training solutions to provide effective, affordable training is likely to increase significantly. The legacy of pioneering VE trainers such as VESUB, COVE, and VELCAC will be to provide a foundation of lessons learned and technologies to underpin the next generation of VE training systems.

REFERENCES

Cohn, J. V., Helmick, J., Meyers, C., & Burns, J. (2000). Training-transfer guidelines for virtual environments (VE). Proceedings of the 22nd Annual Interservice/Industry Training Systems Conference (pp. 1000–1010). Arlington, VA: National Training Systems Association.
Cowden, A., Burns, J., & Patrey, J. (2000). Data driven knowledge engineering. Proceedings of the 22nd Annual Interservice/Industry Training Systems Conference (pp. 11–12). Arlington, VA: National Training Systems Association.
Hays, R. T., Seamon, A. G., & Bradley, S. K. (1997). User-oriented design analysis of the VESUB technology demonstration system (Tech. Rep. No. 97-013). Orlando, FL: Naval Air Warfare Center Training Systems Division.
Hays, R. T., Vincenzi, D. A., Seamon, A. G., & Bradley, S. K. (1998). Training effectiveness evaluation of the VESUB technology demonstration system (Tech. Rep. No. 98-003). Orlando, FL: Naval Air Warfare Center Training Systems Division.
Muller, P., Cohn, J., & Nicholson, D. (2003, November). Developing and evaluating advanced technologies for military simulation. Paper presented at the 2003 Interservice/Industry Training, Simulation and Education Conference, Orlando, FL.
Nguyen, L., Cohn, J., Mead, A., Helmick, J., & Patrey, J. (2001, August). Real-time virtual environment applications for military maritime training. Proceedings of the HCI International Conference (pp. 864–868). Mahwah, NJ: Lawrence Erlbaum.
Norris, S. D. (1998). Task analysis of underway replenishment for virtual environment ship-handling simulator scenario development. Unpublished master's thesis, Naval Postgraduate School, Monterey, California.
Schaffer, R., Cullen, S., Cohn, J., & Stanney, K. M. (2003, November). A personal LCAC simulator supporting a hierarchy of training requirements. Paper presented at the 2003 Interservice/Industry Training, Simulation and Education Conference, Orlando, FL.
Seamon, A. G., Bradley, S. K., & Hays, R. T. (1999). VESUB technology demonstration: Project summary (Tech. Rep. No. 1999-02). Orlando, FL: Naval Air Warfare Center Training Systems Division.
Tenney, K. R. (1999). A virtual Commanding Officer, intelligent tutor for the underway replenishment ship-handling virtual environment simulator. Unpublished master's thesis, Naval Postgraduate School, Monterey, California.
Chapter 7
VIRTUAL TECHNOLOGIES FOR TRAINING: INTERACTIVE MULTISENSOR ANALYSIS TRAINING
Sandra Wetzel-Smith and Wallace Wulfeck II

The Interactive Multisensor Analysis Training program, called IMAT, is a series of developmental efforts that have explored the use of virtual technologies for training and performance aiding since the early 1990s. This work has been conducted jointly by the Space and Naval Warfare Systems Center in San Diego, California, and by the Naval Surface Warfare Center, Carderock Division, in West Bethesda, Maryland. The term IMAT refers both to these development efforts and to some of their products, which are now used in day-to-day training and operations in the U.S. Navy.

IMAT development has been done in the context of antisubmarine warfare (ASW). ASW is a branch of naval warfare in which many teams of people on submarines, ships, and aircraft employ sensors to detect, locate, classify, and, if necessary, interdict opposing submarines, while avoiding counterdetection or counterattack. The tasks involved in ASW include the following:

• Choosing, configuring, placing, and operating sensors so as to detect an opposing submarine;
• Locating or "localizing" the opposing submarine and determining speed, course, and depth—a process called "target motion analysis";
• Classifying or identifying the submarine once it is detected;
• Maintaining contact with the submarine and, if necessary, attacking it.
Supervisory tasks include the following:

• Planning operations so as to assure detection, while minimizing both search time and the number of ships, aircraft, and consumable sensors required;
• Coordinating ASW operations with other naval tasks, such as air and missile defense;
• Monitoring and adjusting tactics during ASW operations;
• Analyzing or "reconstructing" completed ASW events; and
• Planning exercises to develop and maintain skill among ASW practitioners.
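To give a feel for the computational side of one of these tasks, the sketch below implements a classic pseudo-linear least squares formulation of bearings-only target motion analysis for a constant-velocity target. It is a textbook-style simplification (real TMA must contend with measurement noise, estimator bias, and maneuvering targets), and all the numbers are synthetic.

```python
import numpy as np

def tma_pseudolinear(times, own_xy, bearings_rad):
    """Estimate target initial position and constant velocity from bearings.
    Solves (x0 + vx*t - ox)*cos(b) - (y0 + vy*t - oy)*sin(b) = 0 in least
    squares; x is east, y is north, bearings measured clockwise from north."""
    c, s = np.cos(bearings_rad), np.sin(bearings_rad)
    A = np.column_stack([c, -s, times * c, -times * s])
    b = own_xy[:, 0] * c - own_xy[:, 1] * s
    (x0, y0, vx, vy), *_ = np.linalg.lstsq(A, b, rcond=None)
    return x0, y0, vx, vy

# Synthetic check: target starts at (1000, 5000) m moving 5 m/s due west;
# ownship runs east, then turns north (a maneuver is needed for observability).
t = np.array([0., 100., 200., 300., 400., 500.])
own = np.array([[0, 0], [500, 0], [1000, 0],
                [1500, 0], [1500, 500], [1500, 1000]], float)
tgt = np.column_stack([1000.0 - 5.0 * t, np.full_like(t, 5000.0)])
brg = np.arctan2(tgt[:, 0] - own[:, 0], tgt[:, 1] - own[:, 1])
print(tma_pseudolinear(t, own, brg))  # ~ (1000, 5000, -5, 0) with clean data
```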
These tasks are incredibly complex (Wulfeck & Wetzel-Smith, 2008) because they involve dynamic abstraction; multiple interacting sources of nonlinear variation; and both ambiguity and uncertainty. Further, these tasks may involve many people and platforms: a recent exercise in the Pacific involved about 30,000 individuals (3,000 of them directly participating in ASW operations), nearly 30 ships, dozens of aircraft, and many millions of dollars. Clearly, extensive training and experience are required to prepare people for the massive complexity of such exercises and, more importantly, for the possibility of real warfare operations.

VIRTUAL ENVIRONMENTS IN THE IMAT PROJECT

There is no generally accepted definition of the expression virtual environment. Characterizations range from "systems that use computers" to complete virtual reality systems that provide an immersive computer representation of a space in which users can move, change their point of view, interact with objects, and interact or collaborate with other users or simulated actors. The IMAT project has developed several different learning and performance support systems, including four distinct efforts with different goals and different underlying technology developments:

• Instructor- or student-controlled visualization tools for classroom learning,
• Deployable systems for operational training,
• Collaborative systems for collective training in multiship ASW operations, and
• Command level training and performance support systems for senior level staff.
Shore School Based IMAT Training (1994–1998)

Initial work developed visualizations (for example, of the acoustic properties of rotating machinery) to explain the physical phenomena that are the basis for passive acoustic detection and classification. This work began in the aviation warfare (AW) apprentice school and was subsequently extended to apprentice and advanced courses in all the ASW communities (air, surface, subsurface, and surveillance). These technologies have since transitioned into over 20 different training programs in the surface, subsurface, and air communities. Many are still in use, and some of the laboratories have been reimplemented to provide Web based individual interactive training.

Figure 7.1 shows one view of a computer-modeled laboratory for understanding the sources of acoustic energy emitted by submarines. The propulsion and other mechanical systems are animated, and their characteristics (such as speed or gearing arrangement) can be varied to illustrate their effect on the generation of acoustic signals. The sounds can be played aloud and shown on the sound spectrogram in the bottom part of the display. The acoustic laboratory includes modeled engines, gears, pumps, motors, generators, compressors, turbines, blowers, clutches, and other devices. They are coupled to a high fidelity acoustic simulator that can be "operated" to show how changes in operating mode and speed are related to changes in the visual and auditory displays.
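The machinery-line bookkeeping that such a laboratory animates can be suggested with a few lines of code: given a shaft rate and some assumed propeller and gear parameters, list the narrowband tonals a passive display would show, and observe how every line shifts together when speed changes. The blade count, tooth count, and shaft rates below are invented for illustration.

```python
# Toy sketch of narrowband tonal bookkeeping for rotating machinery.
# Parameter values are hypothetical, not any real platform's signature.

def machinery_tonals(shaft_rpm, n_blades=5, gear_teeth=40, harmonics=3):
    shaft_hz = shaft_rpm / 60.0
    lines = {}
    for k in range(1, harmonics + 1):
        lines[f"shaft x{k}"] = k * shaft_hz              # shaft-rate harmonics
        lines[f"blade x{k}"] = k * shaft_hz * n_blades   # blade-rate harmonics
    lines["gear mesh"] = shaft_hz * gear_teeth           # mesh rate = shaft rate x teeth
    return lines

for rpm in (120, 180):   # speeding up shifts every line proportionally
    print(rpm, {k: round(v, 1) for k, v in machinery_tonals(rpm).items()})
```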
Figure 7.1. Acoustic Laboratory
In addition to this acoustic laboratory, many other laboratories were developed that similarly illustrate such concepts as sound propagation in the ocean and the properties of acoustic sensors and sensor arrays. These laboratories were implemented on high end graphics workstations, which at the time provided the needed computation and display power. It is important to note that all these developments relied on navy-validated models (for example, for propagation loss) and approved databases of oceanographic and atmospheric parameters. These included, for example, bathymetric "maps" of the ocean floor (because sound reflects from the bottom) and bathythermographic information (variation in temperature with depth, because temperature affects sound speed, which in turn affects the ducting or focusing of sound due to refraction).

A good example is the propagation loss laboratory shown in Figure 7.2, which allows visual exploration of sound propagation paths due to reflection and refraction. In this display, the leftmost panel is a color code for the amount of loss in decibels (dB) (here coded in gray scale due to printing limitations). Next, a sound speed profile (SSP) is displayed on the left edge of the main display. (Sound speed is inversely related to refractivity, which is a function of pressure, temperature, and salinity variation with depth.)
Figure 7.2. Propagation Loss Laboratory
The bottom type, SSP, and bottom contour data can be manually entered or extracted from high resolution databases. The top right panel shows an example full-field plot of energy loss. All the factors that affect transmission loss, such as spreading, absorption or reflection by the bottom, and scattering at the bottom and surface, are modeled and contribute to the interactive display. In this example, it is easy to see several different ways that acoustic energy may propagate from a source at the upper left, including direct spreading, bottom bounce, and refraction. The striations in the display are a typical pattern resulting from in-phase or out-of-phase multipath interference.

Aside from building the visualizations themselves and implementing the physics based computational models that drive them, the main developmental question was whether such advanced visualization technologies could improve individual learning in school, for example, by providing context and underlying physical explanation. Evaluations were conducted in six different training courses ranging in length from 3 to 65 days, with the number of subjects ranging from 47 to 117. In all these courses (and in all other IMAT work), care was taken to apply modern principles of instructional science in the instructional design. These included (a) use of cognitive objectives, (b) scenario and context based explanations and examples, (c) development of appropriate mental models, (d) use of the laboratories for "what-if" explorations, and (e) tests involving problem solving as opposed to mere multiple-choice recognition (see, for example, Ellis, Knirk, Taylor, & McDonald, 1992; Reigeluth, 1999).
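For orientation, a radically simplified transmission-loss calculation is sketched below, assuming only spherical spreading plus Thorp's empirical seawater absorption. The navy-validated full-field models behind displays like Figure 7.2 additionally handle refraction, bottom interaction, and multipath, none of which is represented here.

```python
import numpy as np

def thorp_absorption_db_per_km(f_khz):
    """Thorp's empirical formula for low frequency seawater absorption (dB/km)."""
    f2 = f_khz ** 2
    return (0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2)
            + 2.75e-4 * f2 + 0.003)

def transmission_loss_db(range_m, f_khz):
    """TL = 20*log10(r) spherical spreading plus absorption over the path."""
    return 20 * np.log10(range_m) + thorp_absorption_db_per_km(f_khz) * range_m / 1000

for r in (1e3, 1e4, 5e4):  # 1 km, 10 km, 50 km at 1 kHz
    print(f"{r/1000:5.0f} km: {transmission_loss_db(r, 1.0):6.1f} dB")
```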
Results indicate that these visualization technologies, together with the new curricula based on improvements in instructional science, yielded test-score improvements of one to over three standard deviations compared to conventional training, while reducing time to train and training development costs (Czech, Walker, Tarker, & Ellis, 1998; Ellis, Devlin, & Allen, 1999; Ellis & Parchman, 1994; Ellis, Tarker, Devlin, & Wetzel-Smith, 1997; Wetzel-Smith, Ellis, Reynolds, & Wulfeck, 1995). Evaluations of training effectiveness in shore schools indicated that IMAT technologies are among the most successful classroom training technologies ever introduced in the navy. In 1997, the Naval Studies Board of the National Academy of Sciences (Committee on Technology for Future Naval Forces, 1997) noted the following:

• IMAT students outperform students in conventional classroom instruction, and in many cases score higher than qualified fleet personnel with 3 to 10 years' experience. Improved performance has been observed in apprentice and advanced training in aviation and submarine ASW courses. Evaluations consistently show gains of two to three standard deviations on comprehension, reasoning, and problem solving tasks. Overall, the IMAT approach is much more effective than conventional lecture instruction, or technologies such as interactive video or computer based training.
• Instructors report that IMAT increases their ability to teach difficult topics, respond to student questions, and reinforce critical principles.
• IMAT students score higher on attitude scales measuring attention, relevance, confidence, and satisfaction than students in standard Navy classrooms or students in specially designed individualized computer based training.
• IMAT development costs for initial courses are equivalent to or less than conventional courses, and less expensive than other new-technology courses. Subsequent development of related training is up to 90% less expensive.
Deployable Sonar Operations Training (1997–2003)

The initial goal of the next phase of IMAT was to develop new technologies for platform level team training on tasks involving real world performance (rather than school knowledge). The goal expanded in two directions as it became clear that IMAT visualization and modeling technologies had tactical value for performance aiding and could also be used for simulation based training and performance support. We were therefore particularly interested both in deployable technologies that could be used at sea and in the development of simulation based training coupled with visualization techniques.

As a result of the IMAT schoolwork, flag officers and senior officials from the submarine community challenged the team to apply IMAT training methods at sea and to determine whether improvements in at-sea performance could be obtained. This led first to development of a personal computer (PC) version of the prior workstation based IMAT visualization and modeling programs, and then to 10 at-sea developmental tryouts of the evolving software system and associated training. These at-sea tryouts yielded rapid development and revision of the software, and new features were built in that directly supported real world tasks.
With at-sea instruction, use of these systems resulted in demonstrable improvement in at-sea performance (Chatham & Braddock, 2001). For example, in some at-sea sub-on-sub exercises, the propagation modeling capabilities described above were used to make better predictions about the detectability of acoustic signals from opposing submarines than had previously been available. Following this developmental period, the system was independently tested by the submarine force. PC-IMAT then became an ASW mission support and training system, approved by Submarine Development Squadron Twelve (2000) as a navy-standard tactical decision aid for submarines, and was used aboard all submarines and most surface combatants. The sonar tactical decision aid (STDA), based on the same technologies but integrated with the acoustic rapid commercial off-the-shelf insertion (ARCI) combat system, has since replaced PC-IMAT on submarines, while development has continued on PC-IMAT for other ASW platforms. It is currently approved for use on integrated shipboard network systems (ISNS), the OCONUS (outside contiguous United States) Navy Enterprise Network (ONEnet), and the Navy Marine Corps Intranet (NMCI).

Meanwhile, development also continued on upscaled versions of the IMAT sensor performance prediction programs (Beatty, 2000) and on real time acoustic simulation and fast propagation modeling for operator team training. This led first to the development of the sonar employment trainer (SET; Wetzel-Smith & Wulfeck, 2000), which later transitioned into acquisition by the Naval Sea Systems Command (Wulfeck, Wetzel-Smith, Beatty, & Loeffler, 2000). The SET combines operator console simulations, a real time propagation simulation engine, and visualizations like those described earlier. It provides an "immersive" virtual environment, even though it does not actually submerge. The primary purpose of the SET is to provide instructor-controlled, scenario based training with what-if capabilities for submarine sonar teams (four operators plus a sonar supervisor), coupled with a highly visual explanation and debriefing capability. This training supports the development of reasoning about sonar system employment and tactics by exposing trainees to experiences that might otherwise have been encountered only opportunistically during mission deployments. The SET includes a large number of acoustic contacts to provide multiple-contact experience, including such effects as contact merging/masking and tracker sharing. To avoid mere memorization of target characteristics, targets support operating mode changes; appropriate steady state and speed-related components; and changes in signature appropriate for speed, course (aspect), and depth changes. Target simulations also support transients and other nontraditional acoustic events related to significant changes in target mode, speed, aspect, and depth.

The SET presents to the trainee the acoustic effects of complex ocean environments. The ocean environment includes variable resolution bathymetry, sound velocity, salinity, and ambient noise databases. Scenarios control surface wind speed; sea state; ambient noise, such as rain or shipping noise; and appropriate local biological, physical, and man-made noise effects. The ocean modeling system properly models effects from shallow water bathymetry, such as steep slope effects, ridges, and trenches; models short-range effects; and computes an adequate number of paths for complex shallow water environments.
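Detectability what-ifs of the kind described above ultimately rest on the standard passive sonar equation, sketched below with invented placeholder values; fielded systems compute each term from validated models and measured data rather than the constants used here.

```python
# Hedged sketch of the passive sonar equation: signal excess
# SE = SL - TL - (NL - AG) - DT, where SL is source level, TL transmission
# loss, NL ambient noise level, AG array gain, and DT detection threshold.
# All numbers below are invented placeholders, not real platform values.

def signal_excess_db(SL, TL, NL, AG, DT):
    """Positive SE suggests detection; negative suggests the contact is missed."""
    return SL - TL - (NL - AG) - DT

# Example what-if: compare two paths with different transmission loss.
for path, TL in (("direct path", 55.0), ("bottom-bounce path", 75.0)):
    se = signal_excess_db(SL=120.0, TL=TL, NL=65.0, AG=15.0, DT=10.0)
    print(f"{path:18s} TL={TL:5.1f} dB -> signal excess {se:+5.1f} dB")
```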
effects, ridges, and trenches; short-range effects; and computes an adequate number of paths for complex shallow water environments. This level of performance from the ocean-environment model is necessary to show how and why environmental conditions affect sensor performance and to provide what-if evaluation of alternative tactics to deal with environmental variability. The SET also provides scenario control capability for use by instructors or exercise controllers, including scenario startup, pause/resume, and backup/resume. These features are necessary so that explanations for physical effects on sensor performance can be given and so that alternative courses of action can be explored (see Figure 7.3). The SET is now in place at Naval Submarine School. More importantly, the technologies demonstrated in the SET led directly to a new version of the submarine multimission team trainer (SMMTT). The SMMTT is a full mission simulator for the submarine combat center. Versions of the SMMTT are being developed for the Los Angeles, Seawolf, and Virginia classes of attack submarines and for SSBN (nuclear-powered ballistic missile submarine) and SSGN (nuclear-powered cruise missile submarine) Trident variants. SMMTT simulators are installed at submarine training facilities at Norfolk, Groton, San Diego, Pearl Harbor, Bangor, and Kings Bay (Lotring & Johnson, 2007).
Figure 7.3. Sonar Employment Trainer
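The propagation modeling referred to throughout this section is performed by full physics based models, but the basic relationship it captures can be illustrated with textbook spreading loss. The following sketch (in Python, illustrative only) combines spherical-then-cylindrical spreading with a simplified passive sonar equation; the source level, noise level, and transition range are arbitrary assumptions for the example, not values from IMAT, the STDA, or the SET.

    import math

    def transmission_loss_db(range_m, transition_m=1000.0):
        """Textbook spreading loss: spherical (20 log r) out to a notional
        transition range, cylindrical (10 log r) beyond it. Fleet systems
        use full physics based propagation models instead."""
        if range_m <= transition_m:
            return 20.0 * math.log10(range_m)
        return (20.0 * math.log10(transition_m)
                + 10.0 * math.log10(range_m / transition_m))

    def passive_snr_db(source_level_db, noise_level_db, range_m):
        """Simplified passive sonar equation: SNR = SL - TL - NL
        (array gain and processing gain omitted for brevity)."""
        return source_level_db - transmission_loss_db(range_m) - noise_level_db

    # Illustrative only: a quieter target or higher ambient noise shrinks
    # the range at which SNR stays above a detection threshold.
    for r in (1_000, 5_000, 20_000):
        print(f"{r:>6} m: SNR = {passive_snr_db(140.0, 60.0, r):5.1f} dB")

Even this toy model conveys the qualitative point the SET is built to teach: detectability falls off with range in an environment-dependent way, so sensor employment decisions depend on modeling the ocean rather than memorizing target characteristics.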
Finally, the sensor performance prediction and visualization systems developed in the IMAT work led to development of the STDA incorporated in the ARCI process for combat-systems development and now installed on many submarines (replacing the stand-alone PC-IMAT). Versions of the STDA are also being transitioned to surface ships and the surveillance community.
Surface Platform, Strike-Group (Battle-Group), and Network Level Training (FY2002–2007) The third major phase of IMAT work has focused on ASW on surface ships operating in battle groups, and on using network connectivity to achieve multiplatform collaborative ASW. The objective was to provide training and performance support systems for network centric ASW—systems that did not then exist. Specific products to the fleet include shore based and at-sea training, exercise support, and feedback systems for multiplatform undersea warfare planning and tactical execution that support network-enabled collaboration. Supporting products included task and mission analyses for ASW operations and tactics, training curricula for surface active ASW, new-technology network based training systems, tactical planning and prediction systems for use as planning tools, tactical decision aids, and assessment/reconstruction tools at the platform and strike-group levels. In general, the development approach was to provide early versions of performance support systems and training to fleet users and then progressively to refine them through heavy fleet interaction. An example of this approach was the first multiplatform version of PC-IMAT, which was developed and tested with Destroyer Squadrons Fifteen and Seven. The following is an early report from a DESRON (destroyer squadron) commodore on the use of the initial multiplatform version of PC-IMAT: During development, PC-IMAT was used as a planning tool for Composite Training Unit Exercises (COMPTUEXs). Commanders found that PC-IMAT superbly supported tactical planning and that it provided a basis for a common understanding among the DESRON team. The associated mobile training team taught and reinforced tactical sensor employment strategies and complex relationships among threat characteristics, the properties of sensors, and environmental variables. Second, the at-sea use of PC-IMAT helped to verify and validate the accuracy of the physics models underlying the system. In addition, a new capability allowed PC-IMAT to ingest current environmental data in situ, so as to further increase the fidelity of prediction. PC-IMAT allowed individual ships’ sensor capabilities to be integrated to form a comprehensive and understandable Strike-Group ASW picture. Users noted that this is key for optimum asset placement for both offensive and defensive postures. Third, users were enthusiastic about the ease of system operation. PC-IMAT developers tailored the user interface to support the search-planning and tactical-execution tasks. The system is designed to allow users to automatically access network resources such as threat intelligence, low level sensor system characteristics, and environmental data.
Based on experiences like this, the IMAT program initiated the IMAT mobile training teams, which transitioned to the Naval Mine and ASW Command in 2004. Detachments of the training teams are now located in Norfolk and Jacksonville on the East Coast and in San Diego, California, the Pacific Northwest, Pearl Harbor, Hawaii, and Yokosuka, Japan. Over the past five years, the IMAT mobile training teams, and in many cases IMAT scientists and researchers, have provided training and fielded decision-aiding systems for every carrier strike group and most expeditionary strike groups deploying to the western Pacific and Indian oceans. Subsequent development and refinement continued to 2008. Products include the following: • Active-sonar optimization: Our colleagues at the University of Texas Applied Research Lab have developed a new active-sonar processing system and associated training, called the Advanced Acoustic Analysis Adjunct for IMAT (A4I). The system provides enhanced sonar processors and digitized displays for surface ships to improve their ability to detect and track submarines. On the basis of at-sea trials, the program executive officer for integrated warfare systems has adopted A4I as an adjunct processor for active acoustic data and installed it on 23 surface combatants (Ma, 2005). The A4I combines active signal processing and display software along with IMAT.Explore A4I lessons. The A4I processing is based upon advanced echo tracker classifier processing software, which is the core of AN/SQQ-89(A)V15 active hull functional segment signal processing. The A4I software contains real time and faster than real time signal processing and display of fleet hull-mounted active recordings. Over 24 hours of recorded AN/SQS-53C data packaged into 17 scenarios are included.
The A4I team has been collecting evaluative data on the effectiveness of A4I training. The training is given to shipboard ASW personnel already fully qualified and trained on ASW tasks who have had all the training the navy offers for their ratings/grades. Earlier studies on the effectiveness of IMAT approaches have consistently shown effect sizes between experimental and control or pretest–posttest groups ranging from .84 to 2.0 standard deviation units (Wulfeck, Wetzel-Smith, & Dickieson, 2004). This compares very favorably with typical effect sizes of about .78 on principle and conceptual tasks in problem based learning. In the current A4I study (Wulfeck & Wetzel-Smith, 2008), we have pretest–posttest data on 50 subjects. The effect size is 2.62, and the difference between tests is highly significant [paired t(49) = 15.37, p < .001]; a worked sketch of this computation appears at the end of this section. • Incorporation of IMAT products with composable FORCEnet (CFn). PC-IMAT visualization technologies provide the ASW portion of the CFn system currently installed on CTF 74, CTF 72, CTF 72.2, USS Blue Ridge (C7F), USS Kitty Hawk, USS Ronald Reagan, and at NOPF (Naval Ocean Processing Facility) Whidbey Island, Washington. Adm. Gary Roughead (then Commander, United States Pacific Fleet, now the Chief of Naval Operations) spearheaded the initial test of CFn. CFn resulted in a major improvement in the ASW warfighting capability. In a subsequent interview, Admiral Roughead described IMAT products as follows:
“We are very good at antisubmarine warfare, but we can be better,” Roughead said Sept. 16 in a telephone interview with Navy Times. Roughead wants to expand the basic antisubmarine warfare skills of the fleet. “There’s always much more to learn about the ocean and the environment in which we operate,” he said. The ASW exercises will put some new technologies to the test. (Fuentes, 2005)
One such technology, composable FORCEnet, is a new networking tool designed to generate and integrate information and intelligence and move both quickly. Roughead wants to know how to incorporate technologies to move information, especially with submarines moving at speed undersea, so commanders and operational planners can quickly make decisions. “This is a great tool being used at multiple levels in the antisubmarine warfare game,” Roughead said (Fuentes, 2005, p. 3). In August 2006, the Chief of Naval Operations, Adm. Mike Mullen (now Chairman, Joint Chiefs of Staff), commended SPAWAR (Space and Naval Warfare) Systems Center, San Diego, for meritorious service for work on composable FORCEnet for antisubmarine warfare. The award cited “unparalleled improvements in the Fleet Commander’s understanding of the tactical situation and ability to protect carrier strike groups from submarine attack.” The award further cited the Center’s “tenacious dedication to the mission and unsurpassed technical acumen in developing and implementing a new network centric means for performing Anti-Submarine Warfare Command, Control, Communications, Intelligence, Surveillance, and Reconnaissance.” Elements of the Composable FORCEnet concept were developed and installed at key intelligence and command and control nodes in the Pacific Theater, enabling substantially improved operational management of antisubmarine warfare forces and tactical antisubmarine warfighting. • Development of a Web based online learning system. In order to support the learning and retention of critical knowledge related to effective ASW performance after initial shore training, the IMAT project has developed a new system, called IMAT.Explore, for training development and distribution based on the IMAT family of products (for example, it uses PC-IMAT to generate visualizations automatically during instruction run time). This system is currently used to teach the integrated ASW course for strike group training, as part of the STDA ARCI program, and is fielded to most of the air warfare classrooms through dedicated installations and as part of NAVAIR’s air-crew online.
Current navy requirements for development and transition of online performance aiding or learning systems are complex and severe. They include Sharable Content Object Reference Model conformance, the Functional Area Manager and Department of the Navy Application Database Management System processes, a formal Authority to Operate from the fleet or a systems command, and then formal certification testing for security and network compliance for three different networks: NMCI, ONE-net, and ISNS, each of which has its own requirements. Without these certifications, no computer based
training or performance-aiding system of any kind can be connected to a ship or navy network, even for testing. IMAT.Explore v.2.0 completed ONE-net and NMCI “Ready to Deploy” certification in July 2007, with ISNS pending (IMAT Development Team, 2007). During 2006 and 2007, IMAT.Explore courseware was developed for over 20 online training courses, on such topics as submarine characteristics, directional noise measurement, ASW mission planning, and oceanographic environmental characteristics for areas of interest in the world’s oceans. These courses have been provided for transition customers and have been briefed extensively to theater warfare commanders, the Naval Meteorology and Oceanography Command, and the Naval Mine and ASW Command. In addition, many pre-deployment briefings have been given to individual ships and helicopter squadron weapons and tactics units, and this development contributed heavily to new senior ASW seminars at the Naval Mine and ASW Command. All courses are being made available over the Secret Internet Protocol Router Network during 2007 and 2008. For our next effort, the IMAT.Explore platform will serve as the foundation for an integrated training and performance support system. Although it began as a training tool, it will become an interactive repository and intelligent support system for all planning factors that go into theater level planning and mission execution.
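As a worked illustration of the statistics reported for the A4I study above, the following sketch computes a paired t statistic and a paired effect size, here taken as the mean pre/post difference divided by the standard deviation of the differences (one common convention; others exist). It assumes NumPy and SciPy are available, and the scores are synthetic numbers invented for the example, not the actual A4I data.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Synthetic pre/post scores for 50 trainees; post scores are shifted
    # upward to mimic a training gain.
    pre = rng.normal(55.0, 10.0, size=50)
    post = pre + rng.normal(12.0, 5.0, size=50)

    # Paired t-test on the pre/post differences, as in the study design.
    t_stat, p_value = stats.ttest_rel(post, pre)

    diff = post - pre
    effect_size = diff.mean() / diff.std(ddof=1)

    print(f"paired t({len(diff) - 1}) = {t_stat:.2f}, p = {p_value:.3g}")
    print(f"effect size = {effect_size:.2f}")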
Integrated IMAT Training and Performance Support for Theater Level ASW Operations (2008–2012) The IMAT work above has focused on the knowledge requirements for effective performance at various levels of command (for example, individual sensor operator, sensor team, platform command team, squadron, and strike group). The culmination of this approach is to extend the effort to the highest levels of command that deal with ASW—the theater/force level. The new effort directly supports navy requirements. The Chief of Naval Operations’ recent ASW task force, Team Bravo, recommended the development of a high fidelity physics based training and mission support environment to prepare commanders and senior staff at the theater and force levels for ASW operations using modern C4ISR (command, control, communications, computers, intelligence, surveillance, and reconnaissance) systems. Such an environment is required to provide practice on the full range of tasks among all levels of command, at realistic levels of complexity, against competent and alerted opponents, in highly complex multithreat ASW scenarios. No realistic planning, practice, and reconstruction environment currently exists. The overall objective of this effort is the development of mission-rehearsal-quality training systems and mission support applications for theater level commanders and their staffs. The approach will be to identify knowledge requirements for effective performance at the force/theater command levels, to characterize high level decision making, and to identify critical command tasks that
can be addressed by training and decision support tools. The effort will then develop simulation based training systems to support expert performance throughout the warfighting continuum of training, mission planning, execution, and reconstruction. It will also develop physics based simulation and visualization tools to create theater commander–level views of, and intelligent decision aids to support the management of, the detect-to-engage sequence for dynamic, complex ASW against multiple opponents. For this effort, Anti-Submarine Warfare in the Pacific Theater was designated as the immediate application and training domain by several Chiefs of Naval Operations and the last several Commanders of the Pacific Fleet. Theater- and Force-Level Anti-Submarine Warfare is the most critical warfighting priority for the Pacific Fleet. LESSONS LEARNED FROM IMAT EXPLORATIONS IN VIRTUAL ENVIRONMENTS FOR TRAINING User Involvement The history of IMAT involvement with the fleet is long and extensive. The strategy of conducting development, test, and revision of products directly in concert with operational users of IMAT products leads directly to task-centered system design and to highly relevant training. In addition, it greatly facilitates the transition of developmental products into fleet use. Fidelity Requirements of Underlying Physical Models Early in training development, or to save money, it is tempting to adopt cartoon or notional or fake visualizations or display environments to depict relationships or even to “simulate” system operation. This is an extremely bad idea, for several reasons. First, it leads to oversimplification in the training process. This reductive bias (Feltovich, Hoffman, Woods, & Roesler, 2004; Feltovich, Spiro, & Coulson, 1991) leads to poor learning. Second, it often omits key underlying parameters because their importance to outcomes may not be realized. High fidelity physical models, however, force the inclusion of all variables known to affect the phenomenon under study and allow their exploration during system refinement. Third, the lack of physical rigor makes further transition more difficult because most of the system ultimately needs to be reimplemented. A related issue involves the adoption of probabilistic models of system outcomes, rather than fully modeled predictions. This hides variation and uncertainty in statistical error, rather than making them objects of study where, for training complex tasks, they belong. SUMMARY The IMAT program has developed virtual environments for training and performance support systems designed to make difficult scientific and technical
concepts comprehensible to the operational users of advanced sensor systems. Products of the effort integrate computer models of physical phenomena with scientific visualization technologies to demonstrate interactive relationships for training, to simulate sensor/processor/display systems for team training, and to provide tactical decision-aiding systems for use in the fleet. The IMAT vision is to integrate training, operational preparation, tactical execution, and postmission analysis into a seamless support system for developing and maintaining mission-related critical skills. In many ways, IMAT is a prototype for future human-performance support systems that transcend the traditional dichotomy between formal school training and actual live performance, to span career-long skill development and expert performance from apprentice to master levels, across missions, platforms, and communities. ACKNOWLEDGEMENT/DISCLAIMER The Interactive Multisensor Analysis Training (IMAT) program is a joint effort of the Space and Naval Warfare Systems Center, San Diego, and the Naval Surface Warfare Center, Carderock Division, supported by several contractor companies. Over 50 people have worked on the IMAT effort, and the authors in particular thank Bill Beatty and Rich Loeffler (NSWC-CD), Eleanor Holmes (Rite Solutions, Inc., Middletown, RI), Kent Allen (Anteon Corp.), and Joe Clements (Applied Research Laboratories, University of Texas at Austin) for their contributions to the program. The Capable Manpower Future Naval Capability program at the Office of Naval Research, Code 34, supports portions of the work described in this chapter. The views and opinions expressed herein are those of the authors and should not be construed as official or as reflecting those of the Department of the Navy. NOTES 1. The principal investigators’ initial work on ASW-related issues began in the 1980s, during which time Ms. Wetzel-Smith conducted a number of studies concerning knowledge and skill retention in ASW operations, and Mr. Beatty conceived and managed a classified sensor/signal processing program for maritime patrol ASW that led to initial visualization techniques for sensor performance. 2. In the 1990s predecessor organizations were the Navy Personnel Research and Development Center, San Diego, California, and the Naval Surface Warfare Laboratory, White Oak, Maryland.
REFERENCES Beatty, W. F. (2000). The design of a high-speed transmission loss server and its tactical implications (U). U.S. Navy Journal of Underwater Acoustics, Winter. [SECRET] Chatham, R., & Braddock, J. (2001). Training superiority and training surprise (Report of the Defense Science Board Task Force). Washington, DC: Defense Science Board. Available online: http://www.dtic.mil/ndia/2001testing/chatham.pdf
Committee on Technology for Future Naval Forces. (1997). Technology for the United States Navy and Marine Corps, 2000–2035: Becoming a 21st-century force: Volume 4. Human resources. Washington, DC: National Academy Press. Available online: http://books.nap.edu/openbook.php?record_id=5865&page=48 Czech, C., Walker, D., Tarker, B., & Ellis, J. A. (1998). The Interactive Multisensor Analysis Training (IMAT) system: An evaluation of the airborne acoustic mission course (Rep. No. NPRDC TR 98-2). San Diego, CA: Navy Personnel Research and Development Center (ADA338076). Ellis, J. A., Devlin, S., & Allen, K. (1999). The Interactive Multisensor Analysis Training (IMAT) system: An evaluation in Sonar Technician Submarine (STS) “A” School (Tech Rep. No. 99-3). San Diego, CA: Navy Personnel Research and Development Center. Ellis, J. A., Knirk, F. G., Taylor, B. E., & McDonald, B. A. (1992). The course evaluation system. Instructional Science, 21(4), 313–334. Ellis, J. A., & Parchman, S. (1994). The Interactive Multisensor Analysis Training (IMAT) system: A formative evaluation in the Aviation Antisubmarine Warfare Operator (AW) Class “A” School (Tech. Note NPRDC TN 94-20). San Diego, CA: Navy Personnel Research and Development Center. Ellis, J. A., Tarker, B., Devlin, S. E., & Wetzel-Smith, S. K. (1997). The Interactive Multisensor Analysis Training (IMAT) system: An evaluation of acoustic analysis training in the Aviation Antisubmarine Warfare Operator (AW) Class “A” school (Rep. No. NPRDC-TR-97-3). San Diego, CA: Navy Personnel Research and Development Center. (DTIC AD No. ADA328827). Feltovich, P. J., Hoffman, R. R., Woods, D. D., & Roesler, A. (2004). Keeping it too simple: How the reductive tendency affects cognitive engineering. IEEE Intelligent Systems, 90–94. Available online: http://www.ihmc.us/research/projects/EssaysOnHCC/ReductiveExplanation.pdf Feltovich, P. J., Spiro, R. J., & Coulson, R. L. (1991). Learning, teaching, and testing for complex conceptual understanding. In N. Frederiksen, R. J. Mislevy, & I. I. Bejar (Eds.), Test theory for a new generation of tests (pp. 181–217). Hillsdale, NJ: Lawrence Erlbaum. Fuentes, G. (2005, October 3). Pacific Fleet Commander keeps eyes below water: Roughead stresses anti-sub technology. Navy Times, 3. Available online: http://www.navytimes.com/legacy/new/0-NAVYPAPER-1140361.php IMAT Development Team. (2007). IMAT.Explore NMCI documentation. West Bethesda, MD: Naval Surface Warfare Center Carderock Division. Lotring, A. O., & Johnson, E. A. (2007). Improving Fleet ASW training for submariners. Proceedings of the U.S. Naval Institute, 133(6), 34–39. Ma, J. (2005, February 7). Enhanced processors, displays to help surface ships’ ASW capability: LaFleur finds money for A4I. Inside the Navy [Online]. Reigeluth, C. M. (Ed.). (1999). Instructional-design theories and models: A new paradigm of instructional theory. Mahwah, NJ: Lawrence Erlbaum. Submarine Development Squadron Twelve. (2000). IMAT tactical employment manual (U) (TM-FZ1460-1-00). Groton, CT: Author. [Confidential] Wetzel-Smith, S. K., & Czech, C. (1996, August). The Interactive Multisensor Analysis Training system: Using scientific visualization to teach complex cognitive skills (Rep. No. NPRDC TR 96-9). San Diego, CA: Navy Personnel Research and Development Center. (ADA313318)
Wetzel-Smith, S. K., Ellis, J. A., Reynolds, A. M., & Wulfeck, W. H. (1995). The interactive multisensor analysis training (IMAT) system: An evaluation in operator and tactician training (Tech. Rep. No. NPRDC TR 96-3). San Diego, CA: Navy Personnel Research and Development Center. Wetzel-Smith, S. K., & Wulfeck, W. H. (2000). Tactical sensor employment training (U). U.S. Navy Journal of Underwater Acoustics, Winter. [SECRET] Wulfeck, W. H., & Wetzel-Smith, S. K. (2008). Use of visualization tasks to improve high-stakes problem solving. In E. L. Baker, J. Dickieson, W. H. Wulfeck, & H. F. O’Neil (Eds.), Assessment of problem solving using simulations (pp. 223–238). New York: Lawrence Erlbaum. Wulfeck, W. H., Wetzel-Smith, S. K., Beatty, W. F., & Loeffler, R. (2000). Military characteristics for the Sonar Employment Trainer (SET) (MC No. N87-MC-S-20-00-03). Wulfeck, W. H., Wetzel-Smith, S. K., & Dickieson, J. L. (2004). Interactive Multisensor Analysis Training. In Advanced technologies for military training (RTO Meeting Proceedings No. MP-HFM-101, pp. 4.1–4.14). Neuilly-sur-Seine Cedex, France: Research and Technology Organisation.
Chapter 8
A VIRTUAL ENVIRONMENT APPLICATION: DISTRIBUTED MISSION OPERATIONS Dee Andrews and Herbert Bell
U.S. AIR FORCE TRAINING AND OPERATIONAL CHALLENGES1 Operation Enduring Freedom and Operation Iraqi Freedom have shown clearly that military forces must remain flexible as they conduct new types of warfare. In Iraqi Freedom, after the first phase of the war, it became clear that the advantage allied troops held in maneuver warfare was greatly affected by the type of insurgent tactics used by the enemy. Fresh approaches to training help ensure that coalition forces can optimally adapt to new battle conditions. Virtual environments provide some of the new approaches to training that are required. This chapter describes a form of virtual environment training, distributed mission operations (DMO), that has provided the United States Air Force (USAF) with an effective method for training newly required skills and competencies. While our focus will be on the use of DMO principles in USAF training, it is important to point out that distributed virtual environments for training are being used by all U.S. services and, indeed, by all coalition forces to increase readiness. Current U.S. Air Force warfighter training and operational needs are driven by a number of different factors. There are increased operations and constant deployments as the United States fights wars in a number of countries and conducts both wartime and peacetime missions in many more (Andrews, 2001). These increases in operations not only put strain on personnel and equipment, but they decrease training opportunities because personnel are engaged in real world missions. They also take warfighters away from many training resources at their home bases. In addition, the increased operations tempos put more hours on aging equipment (some airframes are being flown by the grandchildren of the original aircrews), and there is a desire to limit training time on these equipment sets.
1. The opinions expressed in this chapter are those of the authors and do not necessarily represent the official views of the Department of the Air Force or the Department of Defense.
Also, there is growing pressure on training ranges due to population growth, environmental concerns, and competition for airspace. These pressures make it more difficult to expand existing training ranges and even to maintain the range areas that currently exist. Increasing fuel costs have caused decision makers to seek ways to train, at least part of the time, that are less expensive than training in the actual equipment. At the same time these constraints are being felt, the need for better and more frequent training has grown as complex, perishable skills have multiplied and the need for refresher training has accelerated. Finally, USAF senior management would like to use current modeling and simulation technology to break down organizational “stovepipes” that prevent different U.S. Air Force organizations from training and operating with organizations elsewhere in the Department of Defense and among coalition allies. DISTRIBUTED MISSION OPERATIONS The Air Force Research Laboratory developed a construct, and attendant methods and technologies, called distributed mission training (DMT), that has helped the USAF to overcome many of the problems discussed above (Grant, Greschke, Raspotnik, & Mayo, 2002). After DMT showed its capability to solve those training problems, senior USAF management determined that the DMT construct could also be used to improve actual operations and the term distributed mission operations was coined. DMO connects live, virtual, and constructive environments to form a synthetic battle space for training and for operations. DMO helps break down stovepipes between military units so they can have better communication and understanding of how best to work together. DMO TECHNOLOGIES DMO links virtual and constructive technologies with live equipment (for example, actual aircraft) via interconnection technologies. Virtual technologies include human-in-the-loop, immersive capabilities, such as flight simulators. Constructive technologies include computer-generated entities and wargames. The goal is to allow USAF warfighters to train as they intend to fight. This imposes performance requirements on participating simulators. For example, if the time required to send information from one simulator to another is too long, simulator performance may appear unrealistic and may negatively impact training effectiveness. Therefore, a performance goal is to keep the transmission delays between simulators to 100 ms or less. DMO technologies include communication technologies for brief/debrief of the missions. These include typical video and telephonic devices, as well as electronic whiteboards that allow instructors and trainees to transmit photos, PowerPoint slides, and maps. A key feature of the electronic whiteboards is the capability for instructors or trainees to immediately communicate with all other participants on the network. Experience has shown that because DMO participants may well have never worked together before, any means by which they
can rapidly develop a shared mental model helps build the trust necessary for effective teamwork (Crane, 1999). It is also very helpful to capture data as the exercise unfolds so the entire exercise can be played back for the trainees after the exercise is finished. Freeze features and the capability to replay the exercise at slower or faster speeds also are very helpful (Bennett, Schreiber, & Andrews, 2002). DMO technology improves the capability to measure training performance in two ways. The first involves capturing objective data by embedding measurement technologies in the computers that run the DMO exercises. These measurement capabilities allow digital data to be captured from fast-moving training exercises as they happen. Second, measurement technologies that allow instructors and observers to record subjective data in real time can be invaluable in helping to highlight key performance failures during the debriefing and later for analysis (Schreiber, Carolan, MacMillan, & Sidor, 2002; Rowe, Schvaneveldt, & Bennett, 2007). It is important to again note that all of the technologies and methods that make DMO viable for training can also be applied to operational purposes. The USAF believes that eventually these technologies will be used to carry out operational missions. So, in many cases the human-in-the-loop equipment, although installed in a building, could be used not only to train warfighters, but also to let them perform their missions on the same suite of equipment. While embedded training (providing training exercises on operational consoles) has been used for some time, DMO would now allow that concept to include exercises even for equipment not tethered to a fixed facility. For example, as unmanned aerial systems (UAS) have been introduced into the inventory, they are now flown over the operational theater halfway around the world by operators in fixed sites who can use their control consoles to both train and operate. DMO METHODS AND ISSUES When warfighters first start to use a DMO capability, they have a tendency to revert to the same training methods that they are accustomed to using on a live training range. While DMO can make use of those training methods, trainees soon learn that DMO can support additional methods that can provide better learning and learning retention. A few examples include the following: • DMO can allow for exercises to be frozen in midflight so that points can be made by the instructor and errors corrected before they become reinforced; • DMO provides real time kill removal capability, which means that synthetic and human-in-the-loop entities can be taken out of the scenario as soon as they become casualties. This has an important learning benefit because it means that trainees will not spend time worrying about entities that are no longer germane to the training exercise. Currently, when an aircrew is informed by a range controller that it has been hit, the aircrew flies the aircraft to a “regenerate” zone, from which it is then allowed to come back into the exercise. However, while it is transiting to the regenerate area, it may be mistaken for an active player by aircrews that do not know it has already
become a casualty. In that case, it may be attacked again, which takes the trainees who do the attacking away from a part of the exercise that is still active and relevant. • DMO allows exercises to be flown so that an aircraft that is hit does not suffer real time kill removal. That is, the aircrew is signaled that the aircraft has been hit (usually by flashing the out-the-window displays red) but is allowed to keep flying in the exercise. This feature is used when an instructor believes it would harm the integrity of a multiteam exercise to take out one of the aircraft early in an exercise. This condition is often referred to as “shields up.” • Because of the digital nature of DMO exercises, the same conditions for exercises can be re-created over and over again. In training range exercises it is very difficult to re-create exact conditions, and therefore it is difficult to measure progress from one exercise to the next.
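Several of the capabilities just listed (freeze, replay at variable speed, and exact re-creation of exercise conditions) rest on keeping a timestamped log of exercise events. The sketch below shows the core idea in minimal form; it is illustrative only, since fielded DMO debrief systems capture full entity state, voice communications, and display imagery rather than text events.

    import time
    from dataclasses import dataclass, field

    @dataclass
    class ExerciseRecorder:
        """Timestamped event log supporting variable speed playback."""
        events: list = field(default_factory=list)
        t0: float = field(default_factory=time.monotonic)

        def record(self, event: str) -> None:
            self.events.append((time.monotonic() - self.t0, event))

        def replay(self, speed: float = 1.0) -> None:
            """Replay events, compressing (speed > 1) or stretching
            (speed < 1) the original timeline."""
            last_t = 0.0
            for t, event in self.events:
                time.sleep((t - last_t) / speed)
                print(f"[t={t:6.2f}s] {event}")
                last_t = t

    rec = ExerciseRecorder()
    rec.record("lead aircraft detects contact")
    time.sleep(0.2)
    rec.record("wingman engages; real time kill removal applied")
    rec.replay(speed=2.0)  # debrief the exercise at twice real time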
A key issue that affects DMO training is the need to have a training strategy that systematically trains new concepts and measures results, rather than merely a practice strategy. This problem plagues all simulator based training, but is particularly pronounced in DMO because DMO may have a live component (Andrews, 1988). When the “practice strategy” is used, the general feeling is that all the instructors have to do is set up a realistic scenario with high fidelity entities and let the trainees fight in the way they normally would in an actual operation (Allesi, 1989). There is no doubt that such practice exercises do produce some learning in a discovery mode; however, the learning is generally haphazard, unsystematic, and unpredictable. DMO instructors who instead follow the instructional system development approach in developing the training exercises find that considerably more learning takes place when training is designed and conducted in that mode (Rothwell & Kazanas, 2003). Prerequisite skills should be defined before the training starts, and then clear training objectives must be stated based upon training requirements. Using this front-end analysis, the scenarios are then planned, with appropriate measures of process and product stated. Trainees should be given time to familiarize themselves with the simulators and constructive models. Then, and only then, should the actual training exercise start. Instructors must decide beforehand about the following issues: whether or not freeze will be used (“freeze” refers to the strategy of stopping the scenario at certain points for instructional purposes), if and how new entities will be introduced once the exercise is under way, and whether real time kill removal will be used. Significant DMO experience has shown that these systematic steps are crucial to the instructional effectiveness of the training exercise.
DMO EVALUATION Evaluating the effectiveness of DMO is a complex undertaking. Metrics are necessary for assessing the trainees’ process as well as mission effectiveness on the actual battlefield (Bell, Bennett, Denning, & Landrum, 2003). Such DMO evaluations follow many of the same procedures and use many of the same
process and product measures as are used when training is delivered in non-DMO modes. These include process and product metrics such as number and types of communications between teammates, degree of coordination, accuracy of situational assessments, correctness of command and control decisions, and impact of the mission’s effects on the simulated battlefield (Schvaneveldt, Tucker, Castillo, & Bennett, 2001). In addition to the evaluation of trainee and operator actions in the DMO environment, technologists can also measure the effectiveness of the technology in providing a realistic synthetic battle space. Examples include the following: • Interdevice transport delay—How long does it take for an output of one DMO device to be distributed to other nodes on the DMO network? • Adequacy of communication quality over the DMO network—This is typically measured subjectively by instructors who are listening to the DMO exercise. The criteria for measuring quality of communications have to do with the type of communication, the actual message, and the timeliness and completeness of the message. • Network security—Is the DMO network protected from external intrusions? Is the network protected from internal intrusions; that is, can all the sites inside the DMO network be sure that other sites inside the network do not intrude into parts of their computer complex for which they do not have authorization? • Mission planning—Can the warfighters who are planning the missions access and send relevant information in a time frame consistent with the mission requirements? What is the quality of the missions that are planned?
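The interdevice transport delay metric above can be estimated with a simple timestamped echo test, as in the sketch below. Halving the measured round trip assumes roughly symmetric network paths, which a rigorous assessment would not rely on; the peer address is hypothetical, and the 0.100 s budget is the 100 ms goal cited earlier for simulator interconnection.

    import socket
    import struct
    import time

    def measure_transport_delay(peer, trials=20):
        """Estimate one-way delay to a node that echoes UDP packets back,
        taken as half the round trip (assumes symmetric paths)."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(1.0)
        samples = []
        for seq in range(trials):
            t_send = time.monotonic()
            sock.sendto(struct.pack("!Id", seq, t_send), peer)
            sock.recvfrom(64)  # peer echoes the packet back
            samples.append((time.monotonic() - t_send) / 2.0)
        sock.close()
        return max(samples)  # the worst case is what matters for a budget

    worst = measure_transport_delay(("192.0.2.10", 5005))  # hypothetical peer
    status = "within" if worst <= 0.100 else "exceeds"
    print(f"worst one-way delay {worst * 1000:.1f} ms ({status} 100 ms budget)")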
IMPACT OF DMO VIRTUAL ENVIRONMENTS Training The DMO construct has had a considerable positive impact on USAF training. In addition, over the years that impact has spread to the other U.S. military services and to coalition allies. DMOs are used in many of the air force’s major commands, including Space Command, Air Mobility Command, and Special Operations Command. To provide an example of how DMO works in the U.S. Air Force, we now briefly examine DMO use in the Air Combat Command. The Air Combat Command has installed “mission training centers” (MTCs) at many of its fighter bases. These MTCs consist of two or four flight simulators and attendant instructional support systems. The simulators have wraparound domes with out-the-window 360° visual scenes. The cockpits have high physical and high functional fidelity. The trainees can fly as a two- or four-ship formation just as they do in the real world. The MTC can be linked to other simulation centers that might simulate command and control platforms, U.S. Army or Navy units, or coalition partners. Training exercises can include air-to-air and air-to-surface missions. The DMO simulators can be used to provide a range of training opportunities for the warfighters: individual procedural training, two-ship or four-ship element level team training, as well as team of teams training with other DMO sites including
coalition partners. Instructors are provided an instructor/operator station that allows them to view the formation from a “God’s eye view,” as well as the entire mission evolution. In addition, they can see what the trainees are seeing out of their front windscreens. They hear all communication between the pilots and other entities on the network. When the exercise is completed, the trainees can debrief, as was described above, by replaying the data from the exercise. The trainees or instructor can stop and start the exercise debrief as needed. Operations The full DMO concept has not yet been realized in operations. Perhaps the best current application is in mission rehearsal. In mission rehearsal, various DMO entities can be linked to create a synthetic mission environment that closely mimics the environment the warfighters will be encountering when they perform their mission. Terrain, cultural features, humans, weather, threat effects, and many other elements can be modeled in high fidelity to present a very close approximation to what the warfighters will see when conducting the mission. As mentioned above, a good example of this principle is the operation of a UAS over a battlefield while its operator is many thousands of miles away at a control station. Physically, a UAS control station used for operational control is difficult to differentiate from one used as a training simulator; the difference lies only in purpose: actual operational control versus training. FUTURE OF DMOS The DMO concept has been adopted (often with different names) across the DoD and in many allied countries. We expect increased use of DMO technologies and methods as budgets become tighter, the military seeks to relieve stress on personnel, mission deployment training needs increase, and alliances increase in size and complexity. These four factors are explained in more detail below. DMOs can provide significant cost savings for both training and operations. Current fuel costs and wear and tear on actual equipment can make flying even a one-ship training mission very expensive. When those costs are multiplied across entire multiplayer exercises, they can easily reach millions of dollars for a large exercise. DMO allows the trainees to train in relatively inexpensive-to-operate simulators and constructive models as opposed to actual equipment. Orlansky and Chatelier (1983) provide an excellent framework for determining the cost-effectiveness of single simulators for training. It is believed that this model can be used to determine the cost-effectiveness of DMOs. While the DMO concept allows for live equipment entities (for example, aircraft) to be part of DMO exercises, it is expected that their role will become more limited in the DMO future as simulators and constructive models improve. Having said that, it is important to note that there will always be a place for live equipment in those exercises. DMO is expected to allow simulators and constructive models to be
used more and more in actual operations as supports and/or replacements for some actual equipment. That will potentially save lives. Since the end of the Cold War, coalition forces have deployed at a much greater frequency because many of the forward-deployed bases used in the Cold War were closed. That means that the warfighters are generally away from home base more often than before. Not only does this frequency of deployment affect the warfighters’ personal and family lives, but it makes it much more difficult to meet their training goals. Therefore, the training they do get at home must be as effective as possible with high skill retention. DMO technologies can assist in increasing the effectiveness of the training time they do get. In addition, DMO assets are becoming more deployable and will be going with the warfighters to their deployed bases more often. Warfighters will rely more and more on DMO to help them prepare for and carry out missions. Mission rehearsals at home and in the area of operations will rely increasingly on DMO technologies. These include rapidly updatable databases that will give DMO scenarios remarkable fidelity for the missions that will be conducted. These database updates can include real time weather changes, as well as new threats. Virtual and constructive DMO technologies are used more and more to actually support the mission, including having warfighters conduct their mission at very long distances through the use of weapon systems such as the UAS. Finally, the U.S. military forces will seldom conduct operations, especially large operations, by themselves anymore, but instead will fight with coalition partners. Obviously, physically bringing together large units from many different countries to train together is limited due to distance and cost. This DMO coalition concept for distributed mission training across countries, continents, and oceans has already been tested, and this approach will become much more widespread in the future. In like manner, entire operations of coalition partners will see the positive impact of DMO as virtual and constructive entities work with live operational equipment in the theater to support the mission (McIntyre & Smith, 2002). For all of these reasons, DMO will be a major factor in future training and operations around the world. REFERENCES Allesi, S. M. (1989). Fidelity in the design of instructional simulations. Journal of Computer-Based Instruction, 15, 40–47. Andrews, D. H. (1988). Relationships among simulators, training devices, and learning: A behavioral view. Educational Technology, 28, 48–53. Andrews, D. H. (2001). Distributed mission training. In W. Karwowski (Ed.), International encyclopedia of ergonomics and human factors (Vol. II, pp. 1214–1217). New York: Taylor and Francis. Bell, J. A., Bennett, W., Jr., Denning, T. E., & Landrum, L. (2003). Tactics development and training program validation in distributed mission training: A case study and evaluation with the USAF Weapons School. Proceedings of the 2003 Interservice/Industry
Training, Simulation and Education Conference [CD-ROM]. Arlington, VA: National Training Systems Association. Bennett, W., Jr., Schreiber, B. T., & Andrews, D. H. (2002). Developing competency-based methods for near-real-time air combat problem solving assessment. Computers in Human Behavior, 18(6), 773–782. Crane, P. M. (1999, April). Designing training scenarios for distributed mission training. Paper presented at the 10th International Symposium on Aviation Psychology, Columbus, OH. Grant, S., Greschke, D., Raspotnik, B., & Mayo, E. (2002). A complex synthetic environment for real-time, distributed aircrew training research. Proceedings of the 2002 Interservice/Industry Training, Simulation and Education Conference [CD-ROM]. Arlington, VA: National Training Systems Association. McIntyre, H. M., & Smith, E. (2002, April). Collective training for operational effectiveness. Paper presented at the NATO RTO Studies, Symposium on Air Mission Training Through Distributed Simulation (MTDS)—Achieving and Maintaining Readiness, Brussels, Belgium. Orlansky, J., & Chatelier, P. R. (1983). The effectiveness and cost of simulators for training. International Conference on Simulators (Publication No. 226, pp. 297–305). London: Institution of Electrical Engineers. Rothwell, W., & Kazanas, H. (2003). Mastering the instructional design process: A systematic approach (3rd ed.). San Francisco: Pfeiffer. Rowe, L. J., Schvaneveldt, R. W., & Bennett, W., Jr. (2007). Measuring pilot knowledge in training: The pathfinder network scaling technique. Proceedings of the 2007 Interservice/Industry Training, Simulation, and Education Conference [CD-ROM]. Arlington, VA: National Training Systems Association. Schreiber, B. T., Carolan, T. F., MacMillan, J., & Sidor, G. J. (2002, March). Evaluating the effectiveness of distributed mission training using “traditional” and innovative metrics of success. Paper presented at the NATO SAS-038 Working Group Meeting, Brussels, Belgium. Schvaneveldt, R., Tucker, R., Castillo, A. R., & Bennett, W., Jr. (2001). Knowledge acquisition in distributed mission training. Proceedings of the 2001 Interservice/Industry Training, Simulation and Education Conference. Arlington, VA: National Training Systems Association.
Chapter 9
VIRTUAL ENVIRONMENTS IN ARMY COMBAT SYSTEMS Henry Marshall, Gary Green, and Carl Hobson This chapter provides an overview of past and current efforts to embed virtual environments (VEs) into operational army combat vehicles for training and augmented reality. It addresses embedded VEs for vehicles, such as the M1 Abrams tank, the Bradley fighting vehicle, and the Stryker family of vehicles. Discussion includes the requirement for embedded training (ET) and the technology challenges that have been and are being addressed to use VE in military systems. TASK ANALYSIS Performance Standards The Army Training and Doctrine Command (TRADOC) defines ET as training capabilities built into or added onto operational systems, subsystems, or equipment to enhance and maintain the skill proficiency of personnel (Department of the Army, 2003). Vehicles equipped with ET may be employed as stand-alone trainers to sustain crew or individual skills or, when networked together with other vehicles or simulators, for combined arms training and mission rehearsal. The operational requirements documents for the army’s future combat system (FCS), Abrams, Bradley, and Stryker all require ET. Providing the technologies to support these requirements has been a major army research focus for the past 10 years. Technology Requirements Virtual simulation is one of the primary ET technologies. This term includes numerous supporting technologies, such as computational systems, image generation, rapid terrain database development, computer-generated forces (CGF), local and wide area networks, position and orientation tracking, and miniaturization. ET can be integrated into vehicles at different levels. To define these levels TRADOC provided the following criteria (Department of the Army, 2003).
Fully Embedded “Fully embedded” means built into the operational systems with no training-unique hardware. This is the ultimate goal of ET. Currently FCS is pursuing this level of integration as it is being designed. ET is a key performance parameter for FCS and is a mandatory capability before fielding. The FCS program mandates horizontal integration between the vehicle and training system developers to capitalize on innovative methods to integrate ET with operational needs. This horizontal integration is an ongoing effort and will continue through the life of the FCS vehicles. Appended “Appended” means added to an existing operational system. Most current force ET systems fall into this category. This level of integration typically adds one or more line replaceable unit (LRU) computer systems to the vehicle to perform ET functions and provide virtual imagery to the crew members’ displays. The current Stryker ET system is an example of an appended system and is the only ET system fielded in current force vehicles. This ET system provides remote weapon station gunnery training. The system uses an appended ET module as the simulation host and interfaces to the video display terminal, a device used by the training manager as the instructor operator station (IOS) to select a gunnery training exercise and monitor trainee progress. Sustainment/opportunity gunnery training has also been demonstrated in the M1A2 Abrams tank and in the Bradley fighting vehicle. Its focus is keeping soldiers proficient on gunnery skills that have been proven to be perishable over time. Currently the army has an ongoing effort to develop and field a common embedded training system for the Abrams M1A2 system enhancement package tank and the Bradley M2A3 fighting vehicle in 2009. The common solution will initially provide a sustainment/opportunity gunnery training capability, followed by mission rehearsal and live training capabilities by 2012. Umbilical “Umbilical” means connected to the operational system as needed. An example is the Abrams full crew interactive simulator trainer (AFIST) (Department of the Army, 2000). AFIST appends monitors outside the Abrams’ vision blocks for trainee visualization and attaches sensors to the vehicle controls to capture trainee operation of vehicle systems during simulation. Data are routed to computers outside of the vehicle to drive the simulation software, and training sessions are controlled using an IOS. SYSTEM HIGH LEVEL DESIGN Fully embedded, appended, and umbilical approaches each present implementation issues. The fully embedded approach requires a significant redesign of the vehicle electronics (vetronics) architecture. The appended approach requires the addition of one or more LRUs to an already space-constrained vehicle, increases
the thermal dissipation requirements, and requires some vetronics modification. The umbilical approach requires significant add-on hardware and connections to vehicle control devices and LRUs, increases the logistical footprint, and is time-consuming to set up and tear down. The fundamental issue in providing ET to any current force system is the routing of information from vehicle controls and subsystems to an ET system for simulation of the vehicle’s operation in the virtual environment, which is typically called ownship simulation among ET developers, and generation of data to replicate the vehicle’s sensor imagery. Typically, analog-intensive vehicles are the most problematic for ET and require appended devices to capture their state information. Examples include steering wheels and braking systems that are linked directly to hydraulic systems; switches and dials that are purely analog; and direct view optics, such as vision blocks and degraded mode weapon sights, which back up the digital optical systems in case of failures. A multifunctional vision block concept that can switch between direct view and simulated view modes is under development to address this latter issue (Montoya et al., 2007). Multifunctional vision blocks also provide an operational enhancement with a mixed view mode that overlays the outside view of the real world with synthetic imagery, such as a heads-up display for system state information and for augmented reality. Safety is a major concern since ET systems must interoperate with vehicle software that controls potentially dangerous components. Examples of common safety concerns include weapon computers, laser range finders, automated ammunition systems, and movement of a turret or ramp. Typically these systems must be disabled in preparation for training. At the same time, vehicle caution and warning systems should not be inhibited, since the trainee could become confused about what is simulated and what is actual while a potentially dangerous condition is occurring. There is a notional high level design for ET in combat vehicles (Department of the Army, 2003). This architecture shows how the various vehicle components are related to ET and how an ET system could interface to external systems. In operation, a crew member receives simulated information from the vehicle displays and sensors and reacts to this stimulus by making menu selections, pushing buttons, flipping switches, firing a weapon, and so forth. The ET application senses the crew actions and injects a stimulus that causes the displays to transition, which then requires the crew member(s) to react to the new situation. In a combined arms training exercise, when actions by the crew change the world state (by moving the vehicle, destroying another vehicle, and so forth), the world state is updated for all the other players in the exercise.
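The notional design just described amounts to a sense/inject cycle: read crew actions from the vehicle, advance the simulated world, and drive the crew displays with the result. One way such a cycle might be structured is sketched below; the interface names and the safety interlock are illustrative assumptions, not part of the TRADOC architecture.

    from abc import ABC, abstractmethod

    class OwnshipSimulation(ABC):
        """Sketch of the ET sense/inject cycle for one vehicle."""

        @abstractmethod
        def sense_crew_actions(self) -> dict:
            """Read switch, trigger, and menu state from the vetronics."""

        @abstractmethod
        def inject_display_stimulus(self, world_state: dict) -> None:
            """Drive crew displays and sights with simulated imagery."""

        def advance_world(self, world_state: dict, actions: dict,
                          dt: float) -> dict:
            """Update ownship and CGF entities; stub for the simulation
            host, which a concrete vehicle implementation would supply."""
            return world_state

        def training_step(self, world_state: dict, dt: float) -> dict:
            actions = self.sense_crew_actions()
            # Safety interlock: never run the simulation while potentially
            # dangerous vehicle systems remain live.
            if actions.get("weapons_live", False):
                raise RuntimeError("disable live weapon systems for training")
            world_state = self.advance_world(world_state, actions, dt)
            self.inject_display_stimulus(world_state)
            return world_state  # shared with other players in the exercise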
EMBEDDED TRAINING RESEARCH There have been three distinct army ET technology programs, as follows: The first, called inter-vehicle embedded simulation and training (INVEST; Institute for Simulation and Training, 2002), focused on appended ET for the
Abrams and Bradley (A/B) during the period 1998–2002. It examined issues of architectures, communications, live-virtual correlation, and mixed reality for live and virtual ET. The two areas of primary emphasis of the INVEST Program were the development of an A/B kit architecture for the appended ET hardware/software and the exploration of methods for reducing the bandwidth requirements of the CGF, which are the computer-controlled simulated forces that typically play the opposing forces and any friendly forces not controlled by a manned simulation. A/B kit architecture employs the concept of a B kit that contains common hardware/software applications that can be used across multiple vehicle platform ET applications, such as CGF or after action review. The A kit interfaces to items unique to the vehicle configuration of a given platform, such as handles and displays. The CGF research focused on the exploration of synchronization and meta level representation of crew interactions and vehicle dead reckoning. The program concluded with a networked demonstration of Abrams and Bradley vehicles and an Abrams testbed with various degrees of ET technology performing a combined arms training exercise. The follow-on to INVEST was named embedded combined arms team training and mission rehearsal (ECATT/MR; Research Development and Engineering Command, 2007). ECATT spanned the years 2002–2006 and shifted the research focus from current force vehicles to FCS. At the time FCS was considered the replacement for both the Abrams and Bradley vehicles in the 2014 time frame. This program investigated issues for fully embedding ET applications into the vehicle. It researched ET architectures based on the army’s standard semiautomated forces program, the use of intelligent structured training as a replacement for on-site instructors, and interfaces between mounted and dismounted ET and augmented reality. This program culminated with a demonstration of dismounted ET for mission planning, mission rehearsal, and after action review at the Army Aerial Expeditionary Force Experiment in November 2006 (Marshall, Garrity, Roberts, & Green, 2007). The most recent program, scalable ET and mission rehearsal (SET-MR), began in December 2006 and again focused on current force vehicles. SET-MR is researching common ET components applicable to multiple vehicles. This program is also seeking to develop highly accurate, miniaturized sensors to determine weapon location and orientation for pairing of shooters and targets without line of sight. In addition to the army’s Abrams, Bradley, and Stryker vehicles, the U.S. Marine Corps expeditionary fighting vehicle is also participating in this ET research. SET-MR developed software requirements for ET on each of the vehicles of interest (Oasis Advanced Engineering, 2007). It also defined a variety of use cases for ET on these vehicles.
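The CGF bandwidth problem that INVEST studied is conventionally attacked with dead reckoning, as in the DIS protocol: every simulation extrapolates remote entities from their last reported position and velocity, and the owning simulation broadcasts a new update only when the extrapolation error exceeds a threshold. The sketch below shows the first-order version of the idea; the threshold and the motion model are illustrative choices, not INVEST parameters.

    import math

    class DeadReckonedEntity:
        """First-order dead reckoning with threshold-triggered updates."""

        def __init__(self, pos, vel, threshold_m=1.0):
            self.sent_pos, self.sent_vel = list(pos), list(vel)
            self.sent_time = 0.0
            self.threshold_m = threshold_m

        def extrapolate(self, t):
            """Position peers will assume, given the last broadcast state."""
            dt = t - self.sent_time
            return [p + v * dt for p, v in zip(self.sent_pos, self.sent_vel)]

        def maybe_send_update(self, t, true_pos, true_vel):
            """Return True only when a network update is actually needed."""
            if math.dist(self.extrapolate(t), true_pos) > self.threshold_m:
                self.sent_pos, self.sent_vel = list(true_pos), list(true_vel)
                self.sent_time = t
                return True
            return False  # peers' extrapolation is still close enough

    # A gently turning vehicle needs only occasional updates: true motion
    # is x = 10t, y = 0.05t**2, sampled at 10 Hz for 10 seconds.
    ent = DeadReckonedEntity(pos=[0.0, 0.0], vel=[10.0, 0.0])
    sends = sum(
        ent.maybe_send_update(t, [10.0 * t, 0.05 * t * t], [10.0, 0.1 * t])
        for t in (i / 10.0 for i in range(1, 101))
    )
    print(f"{sends} updates sent over 100 ticks instead of 100")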
USER CONSIDERATIONS

The INVEST program primarily examined issues of embedded gunnery training. Since an objective of the program was to minimize the addition of new hardware to the vehicle, INVEST used existing vehicle displays to provide information to the crew. For the Abrams and the Bradley, this limited involvement to the vehicle commander and gunner stations, as they have the only displays capable of presenting virtual imagery. In actual operations the driver also plays a major role in target detection, but at the time there was no means to support the driver's visualization of the virtual environment. ECATT and SET-MR expanded the research to include the entire crew for mission rehearsal.

PROTOTYPES

Each of the research programs built a variety of prototypes. Recent prototypes included an FCS infantry carrier vehicle (ICV) for demonstration and experimentation, an FCS command and control vehicle for robotics management experimentation, and several dismounted soldier systems ranging from man-wearable, fully immersive systems to handheld computer versions. SET-MR is also prototyping simulation software for the future force warrior program.

DEMONSTRATIONS

The ET research programs have conducted a number of demonstrations over the years. Most recently, in April 2007 the SET-MR research program participated with the program manager for Stryker in an ET interoperability experiment (Optimetrics, Inc., 2007). The goal was to show the potential of collective ET and mission rehearsal for the Stryker vehicle and to obtain early feedback from the user community on the path forward for ET objective requirements. In order to move from the current gunnery ET simulation to a collective capability involving the full crew and supporting dismounted soldiers, the experiment required that the vehicle's driver, the commander's C2 (command and control) capabilities, and the infantry soldiers that ride in the vehicle be included. The experiment included two ICVs, one command and control vehicle (C2V), and one mobile gun system (MGS). The systems were networked using high level architecture (HLA) and a gateway between HLA and dismounted soldier simulation systems using the distributed interactive simulation (DIS) protocol. To support the collective training scenario, the existing Stryker ICV ET software was modified to accept ownship control from a joystick in the driver's station, and a video feed was routed to the driver visualization enhancement device to provide the driver's view. The remote weapons station, which permits the commander to fire the top-mounted weapon from inside the vehicle, is supported by the current ET system and was used without modification. A similar configuration was designed for the main gun on the MGS. To replicate the dismounted forces, several models were explored; one of the more successful used a system built around the Half-Life game engine that was interoperable with the DIS protocol (Institute for Simulation and Training, 2002; Research Development and Engineering Command, 2007). The exercise scenario was based on a breaching tactic in which the MGS fired rounds to create a hole for the infantry to enter the building. The ICVs then moved into position, where the infantry soldiers dismounted and moved through the breach to conduct a raid. When the opposing forces were cleared, the soldiers remounted the ICV and departed the area.

The demonstration generated a number of lessons learned that are being incorporated into the research (Optimetrics, Inc., 2007). Among them were the inability of current computer-generated forces systems to realistically play opposing forces in urban environments and the cost and difficulty of maintaining the existing HLA implementation. In general, the feedback from soldiers was positive. They stated that the collective ET system could be a great tool to use during home station training and rehearsals, but they had concerns about its effectiveness in a theater of operations because small units typically have little time for rehearsals between the receipt of orders and the start of a mission. Other areas of concern include setup time and ease of use.

SUMMARY

Program managers for each of the vehicles of interest are active participants in the ongoing SET-MR research. Plans call for technologies developed by SET-MR to transition directly to the vehicle program managers. SET-MR sensor development has already transitioned to the program manager for tactical engagement simulation. Dismounted soldier prototypes have also transitioned to various organizations. A final demonstration of SET-MR technologies is planned for 2009. This will be a field demonstration of small unit mission rehearsal with a mix of current force vehicles working with dismounted soldiers. It is likely that additional ET research will be approved after SET-MR ends in 2009.

REFERENCES

Department of the Army. (2000, May). Tank gunnery training devices and usage strategies (Field Manual No. 17-12-7). Washington, DC: Author, Headquarters.

Department of the Army. (2003, June). Objective Force Embedded Training (OFET) users' functional description (TRADOC Pamphlet 350-37). Ft. Monroe, VA: Author, Headquarters.

Institute for Simulation and Training. (2002, April). Distributed embedded simulation and training research, Summary of Findings—2001–2002 (Final Rep. No. IST-CR-02-02). Orlando, FL: Author.

Marshall, H., Garrity, P., Roberts, T., & Green, G. (2007, November). Initial real-world testing of dismounted Soldier embedded training technologies. Paper presented at the Interservice/Industry Training, Simulation and Education Conference, Orlando, FL.

Montoya, J., Lamvik, M., White, J., Frank, G., McKissick, I., & Cornel, G. (2007, November). Switchable vision blocks: The missing link for ET. Paper presented at the Interservice/Industry Training, Simulation and Education Conference, Orlando, FL.

Oasis Advanced Engineering, Inc. (2007, October). Software requirements specification for Embedded Mission Rehearsal (eMR)—Version 1.1. Auburn Hills, MI: Author.

Optimetrics, Inc. (2007, April). Stryker ET interoperability experiment (Final Rep.). Ann Arbor, MI: Author.
Research Development and Engineering Command, SFC Paul Ray Smith Simulation & Training Technology Center. (2007, September). Embedded simulation testbed research and FCS technology integration—Consolidated final report 2005–2006 (Final Rep.). Orlando, FL: Author.

Stryker ET systems layout—Remote weapons station [Briefing]. (n.d.). Sterling Township, MI: General Dynamics Land Systems.
Chapter 10
DAGGERS: A DISMOUNTED SOLDIER EMBEDDED TRAINING AND MISSION REHEARSAL SYSTEM

Pat Garrity and Juan Vaquerizo

Since the late 1990s, the visual simulation community has been in the midst of a revolution in the price, performance, size, scalability, and features available from personal computer (PC) based visual image generation systems. In 2003 this revolution entered a new, accelerated phase of system evolution in which size, portability, power consumption, and overall system performance were being stretched to the limits of technology to meet the requirements for fully embedded, battery-operated, man-worn training systems required by such army programs as land warrior (LW) and future force warrior (FFW). These programs have embedded training key performance parameters, written in their respective capability development documents, that call for embedded training applications to conduct virtual training exercises and mission rehearsals in the live, virtual, and constructive domains anywhere, anytime. The army lacked a comprehensive training and simulation system for the dismounted soldier to meet these requirements. Such a man-portable system would provide an immersive synthetic environment for the soldier to practice and improve individual and team collective skills with simulation on demand. Packaging, weight, miniaturization, and power requirements have in the past been the primary technology obstacles to building such a system. Rapidly advancing technologies for mobile computing, three-dimensional/two-dimensional (3-D/2-D) graphics, interactive simulation environments/models, and immersive virtual reality now make it possible to redress this shortfall.

The U.S. Army Research, Development and Engineering Command (RDECOM) Simulation and Training Technology Center (STTC) had been involved in researching dismounted training in both virtual and augmented reality applications to address the challenge of a man-portable simulation system for training the next generation of soldiers. Under the embedded training for dismounted soldiers (ETDS) science and technology objective (STO), RDECOM spearheaded the research with a small team of private industry partners to develop the capabilities required to train dismounted infantrymen (DI) using wearable computers similar to the operational equipment contemplated for the current LW system and the FFW system in the 2010 to 2012 time frame. As part of this development, STTC needed to research capabilities and test prototypes in a testbed. The DAGGERS (distributed advanced graphics generator and embedded rehearsal system) project supported STOs in this area through the ETDS STO and the subsequent embedded combined arms team training and mission rehearsal army technology objective.

During the course of early research, the ETDS team investigated several hardware technologies, including commercial off-the-shelf computing platforms, wireless communications, motion trackers, head-mounted displays, battery technologies, and human factors for an advanced warfighter-machine interface prototype. Although the goal of this research was to provide a realistic and highly immersive environment for the soldier to train anywhere and at anytime, it was never considered a replacement for the live training that dismounted soldiers perform at military operations in urban terrain sites or training ranges.

Current operations in both Afghanistan and Iraq have identified a persistent need for a virtual training simulation to build and sustain the training readiness of dismounted infantry units. These units are required to conduct close combat operations in complex/restrictive terrain against asymmetric forces. Virtual training can prepare and build confident and adaptive dismounted infantry leaders and soldiers to dominate battlefield situations in varying conditions (combined arms; joint, interagency, and multinational; and contemporary operational environments with urban and complex terrain). The DAGGERS system offers a unique platform with which to research, evaluate, and develop new training doctrine for twenty-first century warfighters. Historically, there has never been a widely accepted virtual/immersive dismounted soldier training system, due to computer form factor, weight, power, display resolution, wireless connectivity limitations, and other human factors restrictions. While these limitations remain a challenge and merit further development, the DAGGERS system demonstrates sufficient performance to offer the warfighter a completely untethered dismounted soldier training and mission rehearsal system.
EARLY STO OBJECTIVES The ETDS STO, managed by the U.S. Army Simulation, Training and Instrumentation Command, was a three-year (fiscal year [FY] 2002–FY2004) research program to develop and demonstrate revolutionary embedded training and simulation capabilities for the dismounted soldier. DAGGERS was one of the main projects under the ETDS STO, and its main objective was the development of a proof of concept embedded training system, completely untethered, soldier worn, battery powered, and requiring no external facilities or infrastructure to operate. Critical advances in graphics processor technology and low power, high performance central processing units provided the foundation for development of the man-portable visual computing system called Thermite, the heart of the DAGGERS system.
CONCEPT OF OPERATION

The DAGGERS system is intended to provide dismounted soldiers with an embedded virtual training and mission rehearsal capability. Using geospecific or geotypical synthetic environment databases, the system provides the ability to move, shoot, and communicate in a combined-arms virtual battlefield. The system is utilized in a distributed (networked) configuration. When the distributed interactive simulation network interface is activated, the system can interface with several other distributed simulation systems, such as other virtual simulators and computer-generated forces (that is, DI-SAF [Dismounted Infantry Semi-Automated Forces], OneSAF [One Semi-Automated Forces], and so forth).
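As a rough illustration of what such a distributed interactive simulation (DIS) interface involves, the sketch below maintains a table of remote entities from incoming entity-state messages. It uses JSON over UDP as a stand-in for the binary protocol data units defined by the DIS standard (IEEE 1278.1), so it shows the pattern rather than the actual wire format; the message fields are invented for illustration.

import json
import socket

def run_dis_style_listener(port=3000):
    """Listen for simplified entity-state messages (JSON stand-ins for DIS
    PDUs) and keep a table of remote entities: other virtual simulators,
    DI-SAF or OneSAF forces, and so forth."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    remote_entities = {}
    while True:
        data, addr = sock.recvfrom(4096)
        msg = json.loads(data)
        # Entity IDs key the table so repeated updates replace stale state.
        remote_entities[msg["entity_id"]] = msg
        print(f"{len(remote_entities)} remote entities; last update from {addr}")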
DAGGERS ARCHITECTURE

The basic DAGGERS system is composed of two hardware systems: the soldier station and an 802.11g (802.11b compatible) wireless broadband router. The router can support multiple simultaneous soldier stations. The soldier station software receives orientation updates from the helmet-mounted tracker, wired via serial input, while the weapon-mounted wireless tracker sends orientation updates via the wireless serial port. These tracker updates are received from an InterSense library using a published application programming interface. The soldier station software also receives digital input information from the weapon-mounted controller via the wireless serial interface.

User Interfaces

The primary viewing device for the soldier station is the eMagin Z800 head-mounted display (HMD). All operating system and image generation software displays are rendered on the HMD. Users control their views in the virtual environment by turning their heads and/or their bodies, kneeling, standing, or going prone. By synchronizing virtual motions to corresponding physical motions and maintaining a consistent real time video update, motion sickness is almost nonexistent in the DAGGERS design. The helmet-mounted motion tracker and the weapon motion tracker are used to determine the user's view in the virtual environment with a tracking accuracy of 1° root mean square or less. Users can also control their weapon aiming in the virtual environment by aiming their actual weapons in the real environment; the weapon-mounted tracker (in combination with the helmet-mounted tracker) controls weapon orientation in the virtual environment.

Once the system is running, users can maneuver through the virtual environment using the weapon-mounted controller. This device has a joystick that can be used to move the soldier forward and back by pushing the joystick in the up or down direction (relative to the up-down axis of the M4A1 weapon). By rotating their bodies (or their heads) in the real world, users can turn left or right; the same concept applies for looking up or down. In addition, to move (slide) left and right in the virtual world, the users press the joystick left or right (again relative to the left-right axis of the M4A1 weapon).
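The mapping from tracker and joystick inputs to virtual motion can be sketched as follows. This is a minimal illustration under assumed conventions (yaw in degrees, a flat ground plane, translation referenced to the weapon's horizontal heading); the actual DAGGERS implementation is not published at this level of detail.

import math

def camera_view(head_yaw_deg, head_pitch_deg):
    """The HMD view follows the helmet tracker only; the weapon tracker
    never drags the camera, unlike mouse-driven games."""
    return {"yaw": head_yaw_deg, "pitch": head_pitch_deg}

def move_soldier(pos, weapon_yaw_deg, stick_fwd, stick_right, speed, dt):
    """Translate the soldier in the ground plane: forward/back and slide
    left/right, with the joystick axes read relative to the weapon, as in
    the DAGGERS controller. Axis conventions here are assumptions."""
    yaw = math.radians(weapon_yaw_deg)
    fwd = (math.cos(yaw), math.sin(yaw))       # unit vector along weapon
    right = (math.sin(yaw), -math.cos(yaw))    # perpendicular (slide) axis
    x = pos[0] + speed * dt * (stick_fwd * fwd[0] + stick_right * right[0])
    y = pos[1] + speed * dt * (stick_fwd * fwd[1] + stick_right * right[1])
    return (x, y)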
There are a few menu selections available to the users that allow for sight-type selection and calibration of the helmet and weapon motion trackers, among others. The weapon controller has a total capability of six independent buttons: one is tied directly to the weapon trigger mechanism; a second is tied directly to the weapon magazine release; a third acts as a menu mode toggle (as indicated above); and the other three buttons provide a quick sight change (top button), a "reincarnate" function that is useful during tests (second from top), and a change from standing to kneeling and back (second from bottom) for configurations where the leg tracker is not used. The functions of the buttons might change in the future based on user feedback and might eventually be fully programmable to allow users to configure the system to their preferences. Audio cues are presented to the users via the headset that forms part of the helmet subassembly.

GAME BASED ENVIRONMENTS AND APPLICATIONS

As part of the research for the DAGGERS project, several different game based training environments were evaluated on the DAGGERS prototype system. One of the more interesting games that was integrated and tested on the DAGGERS platform was the America's Army game (Unreal Tournament–U.S. Army).

America's Army (AA)

Developed by the army as a recruiting tool, AA can be downloaded free of charge via the Internet. The game was built using the Unreal Tournament game engine and has become one of the most played first-person-shooter games over the last few years. Since the game's concept was based on introducing players to the basic training skills required for becoming a U.S. Army soldier (using a first-person perspective), it is a natural platform for the development of a training system. Part of the research that the ETDS STO undertook was to integrate the DAGGERS prototype system with one of the commercial versions of AA. The goal was to test the flexibility of the DAGGERS design by integrating its controls with the game controls and then evaluating the usability of the resulting merger of technologies.

Initially the interface was limited to a simple keyboard/mouse emulation that supported the commercial game interface without modifications. This initial effort exposed basic inconsistencies between the DAGGERS and AA user input paradigms. The DAGGERS design implements all user-motion tracking as a highly accurate, absolute measure of the orientation of the trainee's head, body, and weapon. AA's design simplifies the user controls and the level of control granularity available to the player: AA uses a computer mouse's input to control the avatar's direction and gaze angle, and, in order to simplify the interface for the average PC user, the avatar's weapon and head (view) move in unison with the mouse's motion. The game also supports a joystick that essentially acts as a more precise mouse. While these simplifications improved the game's playability for PC users with standard interfaces, they minimized and in some cases defeated the usability and functionality of the DAGGERS system design. The main issues centered on the integration of the mouse-driven user interface and the reduction of the functionality of the DAGGERS system caused by "locking" together the motion of the avatar's head and weapon.
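A straightforward way to bridge absolute trackers to a mouse-driven game, and presumably the kind of approach the initial keyboard/mouse emulation took, is to convert changes in tracked orientation into relative mouse deltas. The sketch below is a hypothetical reconstruction; the sensitivity constant and the input-injection callback are invented for illustration. Note that it inherits exactly the limitation discussed above: the emitted deltas can steer only one thing, the fused head-plus-weapon view.

class TrackerToMouseEmulator:
    """Bridge an absolute orientation tracker to a mouse-driven game like
    AA by emitting relative mouse deltas proportional to the change in yaw
    and pitch. All constants here are hypothetical."""

    def __init__(self, send_mouse_delta, counts_per_degree=20.0):
        self.send_mouse_delta = send_mouse_delta  # e.g., an OS input injector
        self.counts_per_degree = counts_per_degree
        self.prev = None

    def on_tracker_sample(self, yaw_deg, pitch_deg):
        if self.prev is not None:
            dyaw = yaw_deg - self.prev[0]
            dpitch = pitch_deg - self.prev[1]
            # Wrap yaw so crossing 359 -> 0 degrees is not a huge spin.
            if dyaw > 180.0:
                dyaw -= 360.0
            elif dyaw < -180.0:
                dyaw += 360.0
            self.send_mouse_delta(int(dyaw * self.counts_per_degree),
                                  int(-dpitch * self.counts_per_degree))
        self.prev = (yaw_deg, pitch_deg)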
AA's User Weapon-Control Integration

Like most modern first-person-shooter games, AA portrays the avatar's view of self and whatever weapon it holds as one unit that moves together throughout the environment. This means that the weapon is always in the ready to fire position and is always pointing in the same direction as the avatar's view (implied head). This approach works well for these types of games since they require fast, coordinated motion to point and shoot and because most PC users have a mouse (or joystick) that they can use to control the avatar and its weapon. That approach also increases the target audience, since anyone with a PC can play the game. In contrast, the DAGGERS system was designed to increase the level of virtual immersion that a user experiences, in part by allowing the avatar's view direction to be separate and independent from the weapon's direction. By using two independent motion trackers (one for the head and one for the weapon) and incorporating a display (HMD) that physically follows the user's head motion, the DAGGERS system creates a very realistic feeling of "being there" that allows the user to experience a deep level of immersion.

FUTURE APPLICATIONS OF THE TECHNOLOGY

Although the DAGGERS system was designed as a self-contained dismounted soldier training device using today's technologies, the result was a system with a vast field of potential applications (commercial and government) and large potential for expansion and growth. As identified in the gaming technology experiments, by mixing the right components of system hardware, software, and human-machine interface designs, the DAGGERS concept can provide very immersive and engaging virtual environments. In addition to the obvious use of this system to train dismounted soldiers, it could easily be adapted to provide an immersive environment for almost every station of a mounted crew trainer, with the added advantage of having the ability to dismount and mount from the vehicles at will, all while remaining immersed in a full surround 3-D scenario. Used this way, or with modest upgrades, Department of Defense services, such as the army, the marines, the navy, and Special Operations Command, could have complete crew training systems (9 to 12) fielded anywhere in the world, using a single small transport crate and requiring essentially no support infrastructure (see Figure 10.1).
Figure 10.1. Small Unit Training Using DAGGERS
Other potential applications of this technology are in training police, emergency response, and SWAT (special weapons and tactics) teams. By replacing the weapon subsystem with a firefighting tool subsystem, teams of firefighters could train side by side to learn how to work as a team to contain large urban or wild fires.

Beyond regular training applications, the DAGGERS system design is ideally suited for in-theater mission rehearsal. Due to its ability to be used anywhere at anytime and its full 3-D immersion capabilities, the system represents the ultimate way for a team to rehearse the details of an upcoming mission in a way that allows all participants to experience the environment as if they were there. Participants can analyze all moves, identify all relevant structures around them, and play "what if?" games while rehearsing the mission rather than guessing once they are there. The system design also allows individuals outside the mission team to participate from anywhere in the world, relating relevant experiences about certain locations or structures and even guiding the mission team virtually through the environment.

THE FUTURE OF DAGGERS' SUCCESSORS

In looking toward the future and analyzing some of the deficiencies in the technologies or components used by the DAGGERS system, some of the future concepts look very promising.
One of the areas where the current system is lacking is HMD technology. While the HMD used in the system is the best available today within a reasonable budget, it is still deficient in image resolution (800 × 600), field of view (40° diagonal FOV), and optical distortion effects. Future DAGGERS successors should provide at least 1,280 × 1,024 image resolution per eye (perhaps even 1,600 × 1,200), while providing a FOV greater than 200° horizontal and 100° vertical, covering the entire field of human vision with distortion-free, very fast (>60 hertz), deterministic video portraying highly detailed visual representations of virtual environments. Although the tracking technology used by the current system design works relatively well, newer motion tracking technologies and human-machine interfaces will have to be used to provide a system that can support a wide variety of communication metaphors, such as hand, body, and facial gestures and eye movement. Future successor designs will have to do all of this while maintaining or lowering the cost of the system, making user operation of the system natural and intuitive, and maintaining most if not all of the basic design principles of the original DAGGERS system.
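The resolution deficiency can be made concrete with a little arithmetic. Treating the diagonal FOV as spread evenly across the pixel diagonal (a rough small-angle approximation), the angular size of a pixel works out as follows.

import math

def arcmin_per_pixel(width_px, height_px, diagonal_fov_deg):
    """Approximate HMD angular resolution: FOV spread over the pixel
    diagonal (a rough approximation that ignores optical distortion)."""
    diagonal_px = math.hypot(width_px, height_px)
    return diagonal_fov_deg * 60.0 / diagonal_px

print(arcmin_per_pixel(800, 600, 40))    # ~2.4 arcmin/pixel (Z800-class)
print(arcmin_per_pixel(1280, 1024, 40))  # ~1.5 arcmin/pixel at the same FOV

Since normal visual acuity resolves roughly 1 arcminute, an 800 × 600 display at a 40° diagonal FOV is well short of eye-limited imagery, and widening the FOV toward the full field of human vision multiplies the pixel count required even further.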
Chapter 11
MEDICAL SIMULATION TRAINING SYSTEMS M. Beth Pettitt, Michelle Mayo, and Jack Norfleet
BENEFITS AND NEED OF SIMULATION IN MEDICAL TRAINING

Medical training has changed significantly throughout the years. Traditional classroom lectures are still useful for teaching the fundamentals of medicine, but hands-on training is critical for developing lifesaving skills. Through the years, hands-on training has progressed from live animals to cadavers, to high tech patient simulators and haptic interfaces, to virtual environments throughout the military and civilian medical communities. The old model of "see one, do one, teach one" is effective in controlled environments, such as universities and teaching hospitals, but gaining the necessary numbers of patient contacts in military training is nearly impossible given the number of students needing training. Simulation has emerged as an important player in training medical personnel to master the skills needed to preserve human life (Mayo, 2007).

HISTORY OF THE USE OF SIMULATION IN MEDICAL TRAINING

Although medical professionals have used simulation devices for hundreds of years, the military began institutionalizing computerized simulation for medical training only about 10 years ago. Even then, simulation was primarily used for verification and validation of the simulation technology's training effectiveness. The stated model for employing medical simulation and training in existing military medical curriculums was the aviation simulation community. The operation of actual flights is hazardous and potentially fatal without proper hands-on training. Didactic training is still important, but it does not teach the tactile awareness and the psychomotor skills necessary to fly an aircraft. The military realized this and addressed the problem as early as World War I, when pneumatic motion platforms powered mock-up cockpits for hands-on training. Pilots were able to learn the controls and movement of the aircraft without ever becoming airborne. Medical training faced some of the same challenges as aviation training, but it took much longer for simulation to be accepted by the military. Hands-on training with patients was nearly impossible to achieve given the large number of students versus the small number of available patients. Until a decade ago, military medical students (especially prehospital caregivers, such as the combat medic) were forced to practice their skills on animals, on cadavers, and occasionally on other students and current patients in military medical facilities.

Over the past 10 years, the army's research and development community has striven to improve medical training through the use of simulation and has changed the way the army trains combat medics and combat lifesavers. In 1997, the Simulation, Training and Instrumentation Command's technology base, now part of the Research, Development and Engineering Command (RDECOM) Simulation and Training Technology Center (STTC), developed the first distributed medical simulation training system, called the combat trauma patient simulation (CTPS). The CTPS system was the first attempt by the army to introduce distributed interoperable simulation products into a military medical training community that had, up to that time, relied heavily on didactic and live tissue training. The CTPS program yielded significant research and user data for its time that guided an additional 10 years of research and development of simulation technologies for military medical training (Rasche & Pettitt, 2002).

The purpose of the CTPS program was to provide a simulated capability to realistically assess the impact of battlefield injuries on the military medical care structure. Commercial off-the-shelf and government off-the-shelf components provided a network infrastructure to simulate medical handling and treatment of combat injuries through every echelon of care on the battlefield. Capabilities include simulating, replicating, and assessing combat injuries by type and category; monitoring the movement of casualties on the battlefield; capturing the time of patient diagnosis and treatment; comparing interventions and outcomes at each military treatment level; and passing injury data to and from warfighter simulation systems (Carovano & Pettitt, 2002).
STAND-ALONE VERSUS NETWORKED MEDICAL TRAINING SYSTEMS

CTPS is a networked medical training system designed with an open system architecture to allow interoperability with any simulation system, including other patient simulators, distance learning systems, constructive simulations, and instrumentation systems (Figure 11.1). The system consists of six treatment nodes, a casualty transfer network, pre-programmed clinical scenarios, and an after action review capability. Computerized mannequins representing treatment nodes are the most recognizable CTPS components, as they provide the primary student interface with hands-on training for users. Sophisticated physiological models drive the computerized mannequins, forcing users to assess and treat the casualty. The transfer network electronically moves casualties through each echelon of care. Clinical scenarios provide standardized training on specific medical skills and can be varied for trainees of differing skill levels. The mannequins have the ability to accurately represent hundreds of injury and disease conditions, both battlefield specific and those primarily seen only by civilian care providers.

Figure 11.1. Combat Trauma Patient Simulation System Cycle of Care

The CTPS program set out to develop the end-all medical simulation training system, which unfortunately, with a large and costly footprint, was not practical for training the huge numbers of combat medics and combat lifesavers. The CTPS program did, however, generate vast amounts of data that identified training gaps and user requirements. These data were used as guides to develop smaller, lighter, cheaper, and more portable stand-alone training system solutions. A prime example of technology that emerged from the CTPS program is the mannequin known commercially as the Emergency Care Simulator (ECS). The ECS, requiring only compressed air and a single laptop for operation, is a stand-alone system with a much smaller footprint than the networked CTPS system or the more complex mannequins originally used in the CTPS system. As part of the ECS development, it was incorporated into the CTPS system to bolster the pre-hospital nodes that trained combat medics, combat lifesavers, and first responders; the pre-hospital levels of care encompass the vast majority of military medical caregivers. This new technology introduced a capability that was affordable and of the right fidelity for training large numbers of students in both classroom and nonclassroom settings, making training more realistic and effective. These users, both military and civilian medical personnel, were finally testing the limits of current simulation training systems, which subsequently led to new requirements.
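The casualty transfer network's cycle of care can be pictured with a small sketch: a casualty record carries a timestamped event log as it is handed from treatment node to treatment node, which is what allows interventions and outcomes to be compared at each level afterward. The node names below are generic illustrations, not the actual CTPS configuration, which networks six treatment nodes.

import time
from dataclasses import dataclass, field

@dataclass
class Casualty:
    casualty_id: str
    injury: str
    events: list = field(default_factory=list)  # (time, node, action) tuples

# Illustrative echelon chain only.
ECHELONS = ["point of injury", "battalion aid station",
            "forward surgical team", "combat support hospital"]

def transfer_through_echelons(casualty, treat):
    """Move a casualty record node to node, logging when each treatment
    occurred so interventions and outcomes can be compared afterward."""
    for node in ECHELONS:
        action = treat(casualty, node)   # caller-supplied treatment logic
        casualty.events.append((time.time(), node, action))
    return casualty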
These requirements included increased portability, more realistic physiological attributes, and ruggedness suited to movement in atypical training environments. The result was the Stand-Alone Patient Simulator (SAPS) (Figure 11.2), which blends physiologically accurate injuries, sensor technologies, miniaturization/packaging technology, and wireless networking with state-of-the-art patient simulation technologies. Designed primarily to meet military training needs derived from user test data, the SAPS is a rugged, full body patient simulator that is physiologically based and completely wireless, enabling soldiers to move their patients and truly train as they fight.

Figure 11.2. A Stand-Alone Patient Simulator (SAPS)

The original CTPS system transferred patients through an electronic network, but SAPS requires the physical evacuation of patients from the battlefield. This enhanced capability allows more accurate resource management data to be collected from the simulation. For example, instead of the trainee hitting a button and sending the virtual patient on its way, the SAPS continues to deteriorate during transport. With the possibility of the patient dying, the evacuation system is exercised and en route care is required. This capability opens a training realm that has not been possible in the past and has the potential to better prepare warfighters to save lives and resources.

Other simulation technologies that have emerged from the CTPS program include partial task trainers developed to train specific skills (Figure 11.3), game based systems for patient management and basic care, and surgical trainers for more advanced procedures, such as the cricothyroidotomy.

Figure 11.3. A Partial Task (Tourniquet) Trainer

As CTPS research has proven over the years, there is no end-all training system for medical simulation. Stand-alone systems, partial task trainers, and networked, interoperable systems all serve specific purposes and contribute to the overall success of training medical personnel for the difficulties of providing care on the battlefield.
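The en route deterioration that distinguishes SAPS-style physical evacuation can be caricatured with a toy model: vital signs continue to decay during transport until an appropriate intervention is applied. All of the rates and thresholds below are invented for illustration and bear no relation to SAPS's actual physiological models or to clinical values.

def step_patient(vitals, interventions, dt_min):
    """Toy deterioration model: untreated bleeding steadily degrades blood
    pressure and raises heart rate; a tourniquet halts the decline."""
    bleeding = not interventions.get("tourniquet", False)
    if bleeding:
        vitals["blood_pressure"] -= 2.0 * dt_min   # mmHg per minute (invented)
        vitals["heart_rate"] += 3.0 * dt_min       # compensatory tachycardia
    vitals["alive"] = vitals["blood_pressure"] > 40.0
    return vitals

vitals = {"blood_pressure": 120.0, "heart_rate": 80.0, "alive": True}
for minute in range(30):                 # a 30 minute simulated evacuation
    step_patient(vitals, {"tourniquet": minute >= 10}, dt_min=1.0)
print(vitals)  # stabilizes once the tourniquet is applied at minute 10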
SUCCESS STORIES

Transitioning medical research and development efforts to actual funded programs has been a government challenge for years. Because of the dispersed nature of medical simulation and training facilities and, consequently, of medical simulation research and development, this has been a particularly difficult challenge for the medical simulation community. There are, however, some success stories that have emerged from medical simulation research at RDECOM and other organizations. None of these simulation success stories resulted from any type of formal or typical acquisition program. For example, the CTPS system was developed with congressional research funding by a commercial vendor, Medical Education Technologies, Inc. This research has been performed through the years by academia, industry, and government partners, including the University of Central Florida Institute for Simulation and Training in Orlando, Florida; the Tekamah Corporation in Fairfax, Virginia; the National Center for Simulation in Orlando, Florida; and the U.S. Army Medical Research and Materiel Command (MRMC) located at Fort Detrick, Maryland. RDECOM STTC has provided oversight and management for this government initiative over the past 10 years through numerous congressional awards. User based design and a spiral engineering methodology guided development throughout all nine phases of the CTPS program.
Early phases yielded a test and evaluation program that ensured functionality and usability of the components. By phase 4, a full system test and evaluation was implemented that measured the training efficacy of the system in a simulated operational environment. The last few phases of the program incorporated stand-alone trainers and refined the overall design. User tests are still ongoing at Fort Gordon, Georgia; Fort Polk, Louisiana; Camp Pendleton, California; and the Defense Medical Readiness Training Institute, Fort Sam Houston, Texas.

After several successful installations and user evaluations, the CTPS system became a catalyst for driving more medical simulation efforts within the army. The army funded three science and technology objectives to build upon the CTPS architecture and develop advanced medical training technologies for combat medics and combat lifesavers. The most recent technological development is the SAPS. The SAPS is changing the way the military medical community conducts training, while finally gaining appropriate support from army leadership. This is a direct result of the revolution in medical training that CTPS started 10 years ago.

Throughout the years of CTPS user tests, research, and development, the army has realized the positive impact medical simulation devices have on training. In 2005, Lt. Gen. Kevin C. Kiley (Kiley, 2005), under orders from the Vice Chief of Staff of the Army (VCSA), signed an operational needs statement (ONS) for medical simulation training centers (MSTC) to be established for combat lifesaver (CLS) and tactical combat casualty care (TC3) training. This ONS marked the first time the army stated that medical simulation is critical to the success of the army's mission. It also established the very first medical simulation acquisition program, which is currently managed by the program executive office for simulation, training and instrumentation. Although the military has been training with medical simulation devices for many years, the MSTC program provides life-cycle support for the fielded training devices. Sites are also provided instructors, a building with all classroom equipment, and the training devices necessary for combat medic advanced skills training, TC3, and CLS training. Some researchers speculate that the CTPS program jump-started the MSTC acquisition program. Without the CTPS system and its derivative technologies being used at installations for user tests and training, soldiers would not have known the capability existed. As a result, units and installations would not have procured and sustained systems on their own, and the VCSA would not have realized the power of simulation in medical training or seen the need to establish the MSTC. Furthermore, the training device that was selected and fielded as the army standard patient simulator for the MSTC was the ECS.
CHALLENGES TO THE USE OF SIMULATION IN MEDICAL TRAINING

As with all simulation and training systems, verification and validation (V&V) must be performed. However, there must be well-established test objectives and criteria, and the V&V must be based on current training practices and objectives. It is obvious that there is no 100 percent solution for any simulation and training system. To continue to validate a system with no well-defined performance criteria is futile, and all of the results become null and void as soon as the system changes. Some in the military research community have used government V&V as the end goal and ignored the need to use technology to improve training and ultimately save lives. Many of the technologies in question have been proven effective training tools in the civilian sector by improving test scores and by providing training opportunities that would not otherwise exist. Many times these systems are not new at all; they are simply new to the government. V&V should not be used as a roadblock, but as a way to measure the strengths and weaknesses of a system.

REQUIRED BREAKTHROUGHS FOR THE GROWING USE OF SIMULATION IN MEDICAL TRAINING

One of the next big challenges for medical simulation and training devices will be to improve the realism of the simulated injury or disease state. The goal is to create training devices so realistic that they are indistinguishable from real skin, flesh, blood, and bone, including smells and textures. To be successful as training devices, the new technologies must also be durable and reusable to meet training needs. Toward these goals, the army has established an advanced technology objective, jointly funded by RDECOM STTC and MRMC, to investigate the next generation of simulated injuries and injury repair. As with so many other military medical simulation developments, these results will serve multiple medical professionals, from frontline medics to surgeons.

WHAT THE CIVILIAN AND MILITARY COMMUNITIES CAN LEARN FROM EACH OTHER

Technologies developed for military medical simulation are directly applicable to civilian use. Additionally, civilian developments in medical simulation often meet military needs with little or no additional research. For example, the mannequins that were the main components of the original CTPS system came from industry. The military adapted those devices for its use, adding such capabilities as trauma and chemical and biological scenarios that have become indispensable in the post-9/11 world. There are also a number of game based platforms that have easily flowed between military and homeland defense uses. Medical simulation is a global requirement, as all caregivers, regardless of uniform, must master their skills in order to save lives. Saving lives is the ultimate goal.

REFERENCES

Carovano, R. G., & Pettitt, M. B. (2002, December). The combat trauma patient simulation system: An overview of a multi-echelon, mass casualty simulation research and development program. Paper presented at the Interservice/Industry Training, Simulation, and Education Conference, Orlando, FL.
Kiley, K. C. (2005, April). Operational needs statement for medical simulation training centers for Combat Lifesavers (CLS) and Tactical Combat Casualty Care (TC3) training [Memorandum to Deputy Chief of Staff, G-3].

Mayo, M. (2007). The history of the Army's research and development for medical simulation training. Paper presented at the Fall 2007 Simulation Interoperability Workshop, Orlando, FL.

Rasche, J., & Pettitt, M. B. (2002, December). Independent evaluation of the Combat Trauma Patient Simulation (CTPS) system. Paper presented at the Interservice/Industry Training, Simulation, and Education Conference, Orlando, FL.
Chapter 12
AVIATION TRAINING USING PHYSIOLOGICAL AND COGNITIVE INSTRUMENTATION

Tom Schnell and Todd Macuda

Effective flight training requires a pedagogical approach that provides training scenarios and quantitative feedback. The goal of flight training is to provide the student with a lasting knowledge base of relevant information, a set of cognitive and motor skills, and the tools to exercise judgment in situation assessment. Learning a skill such as flying is a cognitively involved process. Today, instructors use objective measures of task performance, supplemented by estimated, subjective data, to assess the cognitive workload and situation awareness of students. These data are very useful in training assessment, but trainees can succeed at performing a task purely by accident (referred to as "miserable success"). Additionally, the student can be in a less than optimal state for learning when the instructor/operator applies brute force training tasks and methods with little regard to the learning curve; the training can then be too easy or, more often, too difficult, thereby inducing negative learning. By using neurocognitive and physiologically based measures to corroborate objective and subjective measures, instructors will be much better able to diagnose training effectiveness.

At the Operator Performance Laboratory (OPL), we quantify cognition using a system that we call the quality of training effectiveness assessment (QTEA) tool. QTEA is based on the cognitive avionics tool set (CATS) that has been developed over the last four to five years at the OPL. QTEA is a systems concept that allows the instructor pilot to assess a flight student in real time using sensors that quantify cognitive and physiological workload. Using QTEA, the instructor can quantify the student's workload level in real time so that scenarios can be adjusted to an optimal intensity. The cognitive and physiological measures also serve as a measure of a student's learning curve, making it possible for the trainer to detect plateaus in learning and to assess the need for further training. The basic idea of QTEA is to give the instructor a real time picture of the performance of a student based on physiological and cognitive data, flight technical data, and mission-specific data. With the help of the data collected by the instrumentation, the instructor is able to provide the student with a detailed after action review.
This provides the student with unambiguous knowledge of results (KR), a key ingredient in closing the learning loop. The deployment of the QTEA instrumentation goes hand-in-hand with instrumented flight training assets, such as those available at the OPL and the National Research Council Flight Research Laboratory (NRC-FRL). The OPL has focused its instrumented training paradigms on fixed-wing simulators and aircraft, while the NRC-FRL has specialized in rotary-wing aircraft. This chapter illustrates a training system that can be developed from those core capabilities. QTEA leverages years of basic research in augmented cognition (Schmorrow, 2005) and applies it to the field of flight training. Wilson et al. (1999, 2003, 2005) generated a sound foundation of workload measurement using neurophysiological measures in the aviation context, and Schnell, Keller, and Macuda (2007a, 2007b) described their workload assessment research in actual flight environments. The introduction of human neurocognitive data to quantify the effectiveness of training represents a breakthrough application in the operational field of flight training.
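The closed-loop use of workload data that QTEA envisions, raising scenario intensity when the student is underloaded and easing it when the student is overloaded, reduces to a simple control rule. The sketch below is illustrative only; the 0 to 1 scales and band edges are invented, and QTEA's actual adaptation logic is more sophisticated.

def adapt_scenario_intensity(workload, intensity, low=0.4, high=0.7):
    """Keep the student near an optimal loading band: raise scenario
    intensity when measured workload is low, back off when it is high."""
    if workload < low:
        intensity = min(1.0, intensity + 0.1)  # e.g., add a rerouting task
    elif workload > high:
        intensity = max(0.0, intensity - 0.1)  # e.g., provide steering vectors
    return intensity

intensity = 0.5
for workload in [0.2, 0.3, 0.8, 0.9, 0.5]:     # classifier output per epoch
    intensity = adapt_scenario_intensity(workload, intensity)
    print(f"workload={workload:.1f} -> intensity={intensity:.1f}")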
INSTRUMENTED TRAINING SYSTEMS REQUIREMENTS

Several methodologies have evolved in flight training over the years, and effective instructor pilots are familiar with the broad principles of learning (FAA-H-8083-9, 1999). A critical component of these principles is motivating students to learn. It is also well known that exercise and repetition are required elements of flight training, but care must be taken not to "burn out" the student with mindless rote memory and recall exercises. Through cause and effect, students learn skills more quickly, especially if the effect is associated with a positive feeling of accomplishment. In the training context, it is not productive to expose students to scenarios for which they are not ready and in which they are guaranteed to fail; the feeling of defeat that comes from such exposure is sure to hamper the learning effect. The principle of primacy is important as well: it is the instructor's job to ensure that a skill is taught correctly the first time, because habit-forming mistakes are hard to eliminate once they become engrained. An instrumented quantitative approach such as our QTEA system concept provides the instructor with the tools to detect bad habits before they become engrained.

Training scenarios need to have an intensity that is appropriate to the student's level of expertise. Instrumented training systems such as QTEA should provide the instructor with real time cognitive loading data for the student, thus allowing real time adaptation of scenario intensity. Figure 12.1 shows an example of how QTEA can be integrated into training station software, such as the Common Distributed Mission Training Station (CDMTS). CDMTS is the platform used to generate and manage the training scenarios and analyze the results for after action review. Through a plug-in software interface inside CDMTS, QTEA provides such components as a timeline with traces of workload, level of fatigue, and situation awareness, as well as a workload meter.

Figure 12.1. Quality of Training Effectiveness Assessment System Incorporated in the Common Distributed Mission Training Station
In this concept, the user also has the capability to drill into the timeline at a specific time in the scenario to launch the QTEA application, as shown in Figure 12.1. This drill-down capability allows the user to find detailed information about pilot and aircraft states at selected locations on the timeline. In conversations with potential users, we identified the following core requirements (a sketch of the sensor interface implied by requirements 3b and 3c follows this list):

1. User Interface
   a. Synchronized timeline of scenario events—This allows events to be shown as flags on a timeline for easy visualization.
   b. Quality of training metric based on student neurophysiological and aircraft state data—This metric is described in Schnell, Keller, Cornwall, and Walwanis-Nelson (2008).
   c. Ability to drill down into the sensor and aircraft state data to determine root causes of training problems—This is illustrated in Figure 12.1. The user should be able to click at any location along the timeline to explore the underlying aircraft and human performance data.
   d. Graphical workload meter (real time)—This meter should represent an overall score of workload, such as the National Aeronautics and Space Administration (NASA) Task Load Index (Hart & Staveland, 1988). The overall score may consist of such subscores as task difficulty, time pressure, flight technical performance, sensory effort, physical effort, frustration, and so forth. The subscores may be revealed in the drill-down mode.
   e. Summary of performance (pass/fail)—This score is based on flight and mission technical performance (Schnell et al., 2008).
   f. Real time adaptation of scenario intensity based on operator state and aircraft state characterization—This system capability is intended to keep the student optimally stimulated. Scenario intensity and difficulty may be increased by adding tasks such as clearance changes (rerouting, holding, and approaches), weather changes, or non-normal situations. Scenario difficulty may be reduced by removing additional tasks and by providing steering vectors.
   g. Quantitative after action review using hard sensor data from the instrumented flight training system—This is one of the main purposes of QTEA.
   h. Ability to interact with established training tools, such as CDMTS.
2. Operator State Classification
   a. Reliable sensor and classification system.
   b. Simple to train/calibrate, with little or no need for sensor preparation.
   c. Real time workload and situation awareness gauge.
   d. Indicator of available cognitive resources.
3. Architecture
   a. PC based and low cost.
   b. Ability to connect to a multitude of sensors with different sampling rates.
   c. Sensor manufacturer independent—The training system should be flexible enough to accommodate sensors from different providers.
   d. Rugged and robust system for use by flight instructors without expertise in neural measurement.
   e. Networked, high level architecture (HLA) federation, and Ethernet—This will provide the training system with the ability to participate in live virtual training exercises involving several participants across a networked federation of flight simulators and flight assets.
   f. Automatic synchronization of data with a robust protocol.
   g. Distributed mission operations capable, for team training and assessment of team cognition.
   h. Easy to integrate in a virtual environment.
   i. Integrate and fuse neural, physiological, aircraft, and mission state data.
   j. Rugged and tested.
4. Usability
   a. Sensors must be easy to set up.
   b. No burden on the trainer; easy to learn and use.
   c. Instrumented system must save training time without loss in quality of training.
   d. Better KR.
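As noted above, requirements 3b and 3c suggest a manufacturer-neutral sensor abstraction in which each device reports its own sampling rate and produces timestamped samples, with a simple epoch-averaging step to fuse streams of different rates. The interface and the ECG adapter below are hypothetical sketches, not QTEA's actual classes.

from abc import ABC, abstractmethod

class Sensor(ABC):
    """Manufacturer-neutral sensor interface (hypothetical): every device,
    whatever its vendor API, is wrapped to report a sampling rate and to
    yield (timestamp_s, value) samples."""
    rate_hz: float = 0.0

    @abstractmethod
    def read(self):
        """Return the next (timestamp_s, value) sample."""

class EcgAdapter(Sensor):
    """Hypothetical wrapper around a vendor ECG SDK object."""
    rate_hz = 256.0

    def __init__(self, vendor_device):
        self.vendor_device = vendor_device

    def read(self):
        return self.vendor_device.next_sample()  # vendor-specific call

def fuse_to_epochs(samples, epoch_s=1.0):
    """Average samples from any sensor into common fixed epochs so streams
    with different rates can be compared on one timeline."""
    epochs = {}
    for t, v in samples:
        epochs.setdefault(int(t // epoch_s), []).append(v)
    return {k: sum(vs) / len(vs) for k, vs in sorted(epochs.items())}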
In our work toward an instrumented training platform, we addressed these user requirements by translating them to engineering requirements as embodied in our QTEA concept. For example, to achieve a high system level of acceptability early on, we feel that a phased approach of sensor deployment may be sensible (Schnell et al., 2008). Sensors that are ready for deployment with high payoff from a flight training point of view include eye tracking, electrocardiogram (ECG), and respiration amplitude and frequency. Past research by Schnell, Kwon, Merchant, Etherington, & Vogl (2004) indicated the value of eye tracking systems in determining the scanning patterns of instrument pilots. Also, as a certified instrument flight instructor, Schnell found that some instrument flight students who have problems flying accurate instrument approaches used inappropriate instrument scanning techniques. Detailed after action review using eye scanning records will provide such instrument flight students with evidence that their scans can be improved. On this basis, we are confident that these sensors can benefit the training community today with considerable cost savings. Next in line would be neural sensors, such as electroencephalogram (EEG), that could be developed to field hardened robustness in a matter of a few years. The key is to use an underlying core architecture such as QTEA that can grow with maturing sensor technology.
SYSTEM HIGH LEVEL DESIGN The QTEA architecture concept (see Figure 12.2) was developed by OPL to interface with the airborne and simulation assets at the OPL and the NRC-FRL. Additional connectivity to established training systems, such as CDMTS, was enabled through a plug-in architecture, and connectivity to distributed training
112
Integrated Systems, Training Evaluations, and Future Directions
Figure 12.2.
Quality of Training Effectiveness Assessment System Architecture
QTEA uses several neural and physiological sensors, including dense array EEG, ECG, galvanic skin response, pulse oximetry, respiration (amplitude and frequency), and noncontact measures, such as facial feature point location, facial temperature differences, and eye tracking. Not all sensors need to be used to perform operator state characterization. During a transition from the research world to the training world, it may be best to deploy the sensors that are ready for field use, such as eye tracking and ECG, first, and gradually migrate the other sensors as they mature into fieldable products.
as eye tracking and ECG first and gradually migrate the other sensors as they mature to fieldable products. DEVELOPMENT OF QTEA SYSTEM PROTOTYPES QTEA is an instrumented aviation training systems concept developed by the OPL. The instrumented aircrafts that OPL and NRC-FRL use in their cognitive avionics research are the Computerized Airborne Research Platform (CARP) at OPL and the Bell 412 Advanced Systems Research Aircraft (ASRA) at NRCFRL. The CARP is a Beech A-36 Bonanza equipped with the full QTEA suite of sensors, a tactile cueing seat, and liquid crystal display overlay displays that can be controlled to systematically manipulate pilot situation awareness, spatial orientation, and workload. The Cognitive Delfin (COD) is an Aero Vodochody L-29 Delfin training aircraft that OPL acquired in February 2008. Once fully equipped with QTEA, this aircraft will support research in advanced cognitive avionics and concepts of airborne simulation for training of aviators. Airborne simulation refers to a concept where part of the scenario is simulated and part of it is real. That is, the aircraft is flown in real airspace with real dynamics, but in simulated battle space. Through a high speed datalink, the COD will be able to federate with ground based training assets. The ASRA facility at NRC-FRL is a single-string (simplex), full authority, flyby-wire (FBW) system (Gubbels, Carignan, & Ellis, 2000). The ASRA facility has a unique capability to change the control laws of this airborne platform to allow the simulation of other helicopter types and to vary the workload experienced by the evaluation pilot. This provides test pilot school participants with a rotorcraft platform that can emulate a wide range of handling qualities. The native handling quality of normal commercial helicopters is so refined that test pilot school participants cannot experience the range of poor handling qualities the Cooper-Harper rating scale can offer. By manipulating the control laws of the FBW system, the NRC-FRL can give students a sense what a helicopter with poor handling qualities would feel like. Combined with our operator state classification technology, it is becoming feasible to systematically manipulate pilot workload by manipulating handling quality and simultaneously monitoring operator state for the purpose of providing KR in pilot training programs. OPERATIONAL CONSIDERATIONS OF INSTRUMENTED AVIATION TRAINING SYSTEMS QTEA uses a battery of neurocognitive and physiological sensors on the student, fuses these data with aircraft and mission data, and applies sophisticated signal processing and classification techniques to gain a fuller picture of training effectiveness. The aviation training community could benefit from such quantitative tools that measure the effectiveness of training on the basis of human performance data. Using this real time data, training scenarios can be adapted to optimal intensity so as to maximize the effectiveness of learning. This will likely
save money through shorter training times and less time spent on actual airborne training assets, while also yielding higher operational success levels and improved performance based specifications for flight simulators.
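To make the adaptation loop concrete, here is a minimal sketch, assuming a normalized workload index fused from a handful of sensor features; the feature names, weights, and thresholds are illustrative assumptions for this sketch, not QTEA's published algorithms.

```python
# Hypothetical sketch of the adaptation loop described above: fuse a few
# physiological features into a workload index, then nudge scenario
# intensity toward a moderate-load band. All weights and thresholds are
# invented for illustration.

from dataclasses import dataclass

@dataclass
class SensorSample:
    heart_rate_bpm: float       # from ECG
    blink_rate_hz: float        # from eye tracking
    fixation_entropy: float     # scan-pattern dispersion, 0..1

def workload_index(s: SensorSample) -> float:
    """Map raw features to a 0..1 workload estimate (illustrative weights)."""
    hr = min(max((s.heart_rate_bpm - 60.0) / 60.0, 0.0), 1.0)  # 60-120 bpm -> 0..1
    blink = min(max(1.0 - s.blink_rate_hz / 0.5, 0.0), 1.0)    # fewer blinks ~ higher load
    scan = 1.0 - s.fixation_entropy                            # tunnel vision ~ higher load
    return 0.5 * hr + 0.25 * blink + 0.25 * scan

def adapt_intensity(current: float, load: float,
                    target_low: float = 0.4, target_high: float = 0.7,
                    step: float = 0.05) -> float:
    """Ease off when the student saturates; add task elements when underloaded."""
    if load > target_high:
        current -= step
    elif load < target_low:
        current += step
    return min(max(current, 0.0), 1.0)

# Example: one update cycle
sample = SensorSample(heart_rate_bpm=105, blink_rate_hz=0.1, fixation_entropy=0.2)
intensity = adapt_intensity(current=0.6, load=workload_index(sample))
```

In a fielded system the fused index would come from validated classifiers of the kind Wilson and Russell (2003) describe, but the shape of the control loop would remain the same.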
ITERATIVE INTEGRATION AND TRANSITION

Transitioning a tool such as QTEA may be best accomplished through an initial deployment of the mature components, such as eye tracking and ECG based operator state measures. Other sensors, including EEG, can be transitioned in iterative steps as they reach an operational readiness that satisfies user requirements. In the naval aviation context, we identified the CDMTS framework as the most effective way for an initial transition of QTEA to the fleet. CDMTS is an instructor software package that handles all aspects of training, ranging from scenario development to after action review. Through a plug-in mechanism, it is possible to integrate the neurocognitive measures from QTEA into the training timeline for dynamic adaptation of scenario intensity and for quantitative after action review. An early demonstration of the QTEA system was held at the 2007 Interservice/Industry Training, Simulation, and Education Conference in Orlando, Florida. The system was integrated in a fixed-base jet procedure training flight simulator and connected to the CDMTS station.
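The CDMTS plug-in interface itself is not documented in this chapter, so the following is only a hypothetical sketch of the integration pattern described: a plug-in publishes time-stamped neurocognitive measures that the host can consume for live scenario adaptation and retain for quantitative after action review. All names are invented.

```python
# Hypothetical plug-in shape: pushes (timestamp, name, value) events to a
# host training station, keeping its own timeline for after action review.

import time
from typing import Callable, List, Tuple

class QteaMeasurePlugin:
    """Illustrative plug-in: publishes time-stamped measures to a host."""

    def __init__(self, publish: Callable[[float, str, float], None]):
        self.publish = publish
        self.timeline: List[Tuple[float, str, float]] = []

    def on_measure(self, name: str, value: float) -> None:
        stamp = time.time()
        self.timeline.append((stamp, name, value))  # retained for after action review
        self.publish(stamp, name, value)            # available for live adaptation

    def aar_events(self, name: str, threshold: float):
        """Timeline segments where a measure exceeded a threshold,
        e.g., workload spikes to review with the student."""
        return [(t, v) for (t, n, v) in self.timeline if n == name and v > threshold]

# Example host hookup
plugin = QteaMeasurePlugin(publish=lambda t, n, v: print(f"{t:.1f} {n}={v:.2f}"))
plugin.on_measure("workload", 0.82)
spikes = plugin.aar_events("workload", threshold=0.75)
```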
TEST AND EVALUATION

Tests of the underlying neurocognitive operator state characterization system have been conducted over the past few years at OPL and NRC-FRL. Research on the ASRA and CARP was discussed by Schnell, Keller, and Macuda (2007a, 2007b) and Schnell, Macuda, and Poolman (2006). The detailed architectures of the CARP and the ASRA are discussed by Schnell, Macuda, and Keller (in press). Going forward, we intend to test QTEA in environments that will iteratively approach operational status. Of particular importance will be the development of detailed performance measures that the instructor pilot can use to assess the student's performance. These measures are discussed in detail by Schnell et al. (2008). There are two types of performance measures in QTEA: mission specific and physiological. The mission specific measures will change with the use case of QTEA. For naval aviation in a close air support task, they will include airmanship (flight technical), administrative, and tactical measures. The physiological measures describe the body's reaction to the task and include cognitive measures quantifying stress, cognitive resources, and attention (alertness). The physiological measures subsystem of QTEA will function automatically and in real time, providing the trainer with a metric that indicates cognitive loading. The mission specific performance measures will be integrated and partially automated with trainer override. QTEA will provide the trainer with a leading
indication of impending task saturation, thus providing a measure of the student's cognitive bandwidth.

CONCLUSIONS

Instrumented aviation training systems, such as QTEA, provide instructors with quantitative data on the student's performance. These data can be used by automated scenario generation systems to adjust scenario intensity in real time to maximize learning by keeping stimulation at its optimal level. The quantitative data generated by QTEA also provide for superior after action review, offering the instructor and the student the ability to review deviations in mission or flight technical domains, as well as the occurrence of cognitive (workload) bottlenecks, poor control manipulation, or ineffective eye scanning technique. Through review and discussion of such quantitative data, the instructor and the student can develop training strategies that achieve the training goal in a shorter time than would be possible without such advanced tools.

REFERENCES

FAA-H-8083-9. (1999). Aviation instructor's handbook. U.S. Department of Transportation, Federal Aviation Administration (FAA), Flight Standards Service.
Gubbels, A. W., Carignan, S., & Ellis, K. (2000). Bell 412 ASRA safety system assessment (Tech. Rep. No. LTR-FR-162). Ottawa, Canada: Flight Research Laboratory.
Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of theoretical and empirical research. In P. A. Hancock & N. Meshkati (Eds.), Human mental workload (pp. 239–250). Amsterdam: North Holland Press.
Schmorrow, D. (Ed.). (2005). Foundations of augmented cognition. Mahwah, NJ: Lawrence Erlbaum.
Schnell, T., Keller, M., Cornwall, R., & Walwanis-Nelson, M. (2008). Tools for virtual environment fidelity design guidance: Quality of Training Effectiveness Assessment (QTEA) tool (Rep. No. N00014-07-M-0345-0001AC, Contract No. N00014-07-M034). Arlington, VA: Office of Naval Research.
Schnell, T., Keller, M., & Macuda, T. (2007a, October). Application of the Cognitive Avionics Tool Set (CATS) in airborne operator state classification. Paper presented at the Augmented Cognition International Conference, Baltimore, MD.
Schnell, T., Keller, M., & Macuda, T. (2007b, April). Pilot state classification and mitigation in a fixed and rotary wing platform. Paper presented at the Aerospace Medical Association (ASMA) annual conference, New Orleans, LA.
Schnell, T., Kwon, J., Merchant, S., Etherington, T., & Vogl, T. (2004). Improved flight technical performance in flight decks equipped with synthetic vision information system displays. International Journal of Aviation Psychology, 14(1).
Schnell, T., Macuda, T., & Keller, M. (in press). Operator state classification with the cognitive avionics tool set. In D. Schmorrow & K. Stanney (Eds.), Augmented cognition: A practitioner's guide.
Schnell, T., Macuda, T., & Poolman, P. (2006, October). Toward the cognitive cockpit: Flight test platforms and methods for monitoring pilot mental state. Paper presented at the Augmented Cognition International Conference, San Francisco, CA.
Wilson, G. F., & Russell, C. A. (1999). Operator functional state classification using neural networks with combined physiological and performance features. Proceedings of the Human Factors and Ergonomics Society 43rd Annual Meeting (pp. 1099–1102). Santa Monica, CA: Human Factors and Ergonomics Society.
Wilson, G. F., & Russell, C. A. (2003). Real-time assessment of mental workload using psychophysiological measures and artificial neural networks. Human Factors, 45, 635–643.
Wilson, G. F. (2005). Operator functional state assessment in aviation environments using psychophysiological measures. Cognitive Systems: Human Cognitive Models in System Design Workshop, Santa Fe, NM.
Chapter 13
VIRTUAL ENVIRONMENT LESSONS LEARNED

Jeffrey Moss and Michael White

This chapter provides an overview of lessons learned from the integration and transition of research prototypes into joint interoperable training systems. Specific examples we cite include joint interoperability special events from the Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC), as well as the virtual at sea trainer, virtual technologies and environments, virtual fire support trainer, virtual simulation training toolkit, and deployable virtual training environment programs. The term "joint" is used to describe actions, events, or operations within or associated with the military that involve more than one service. Highlights of our activities included the following:

1. Innovative reuse of existing technologies,
2. Development of new capabilities that allow multiple simulation systems to interact with one another along with live battle command systems, creating a training environment that allows military members to train as they fight,
3. Replacement of proprietary software solutions with open source applications,
4. Use of rapid prototyping, allowing users to experiment with and provide feedback on systems that are still under development, and
5. Integration of science (that is, psychologists, industrial hygienists, and educators conducting usability studies, training effectiveness evaluations, and so on) and technology (discussed in items 1–4).
TASK ANALYSIS

Performance Standards

The process used for performing task analysis was generally the same for all programs. In each instance, a thorough understanding of the military knowledge domain was indispensable in defining the capabilities required of the systems. The knowledge acquisition process began with investigation of existing military doctrine, as well as current tactics, techniques, and procedures. The initial
analysis culminated in the preparation of a systems requirements document, which served as the basis for rapid prototyping, for tracing requirements to training needs, and for iterative testing and integration.

Lesson Learned

Tether performance standards to military doctrine.

Technology Requirements

Technology requirements were derived from a survey of current and emerging technologies. Desired capabilities were based on performance requirements and tempered with system optimization and, of course, budgetary considerations. Hardware and software system configurations were selected based on intended use. Low cost hardware alternatives were employed when their use facilitated simulation goals (Stripling et al., 2006). For prototype systems targeted toward transitioning to the Marine Corps' deployable virtual training environment, we chose to adopt a notebook computer configuration. Early virtual technologies and environments experimental systems (demonstration I and demonstration II) employed rack-mounted computers as well as desktop and notebook computer configurations. At least one of the virtual technologies and environments demonstration II systems, the Warfighter Human Immersive Simulation Laboratory's Infantry Immersive Trainer (see Stripling, Muller, Schaffer, and Cohn, Volume 3, Section 1, Chapter 5), adopted a combination of single-person pods, high end head-mounted displays, and infrared tracking cubes, all mounted within a medium room–sized tubular framework. Innovative reuse of existing technologies was a common theme throughout all the efforts. For instance, the multihost automated remote command and instrumentation application, originally developed under the synthetic theater of war advanced concept technology demonstration, served as the core element of the battle master exercise control component in virtual technologies and environments demonstration I and again as the core element of the common distributed mission training station in virtual technologies and environments, virtual fire support trainer, virtual simulation training toolkit, and deployable virtual training environment, where it is used to launch and control multiple computer applications across computers interconnected by a local area network or a wide area network. Another example of technology innovation common to these projects was development of the joint live virtual constructive data translator. Initially funded as a research and development effort for Joint Forces Command, the joint live virtual constructive data translator allows multiple disparate simulation systems using such communication protocols as distributed interactive simulation, high level architecture, and the training and testing enabling architecture to interact with one another. Beyond facilitating communications across multiple simulation protocols, the team adapted the joint live virtual constructive data translator, enabling stimulation of live battle command systems by virtual and constructive
simulations to create a comprehensive training environment, allowing military personnel to train as they fight. For instance, the development team created a joint live virtual constructive data translator module that translates digital messages between the advanced field artillery tactical data system and the computer-generated forces application known as Joint Semi-Automated Forces, resulting in appropriate, realistic interactions between the live battle command system and constructive artillery elements. One of the challenges we sought to overcome was that of replacing proprietary software solutions with open source applications, saving the government money on licensing fees while enhancing the training experience. For example, the original virtual fire support trainer prototype employed a proprietary image generation system that imposed a $10,000 license fee per computer. Considering the implications of this expense for the Marine Corps deployable virtual training environment program's initial fielding plan of nearly 1,000 computers, the integrated product team of developers and customer stakeholders decided to adopt DELTA3D (http://www.delta3d.org), a technologically mature, open source gaming application developed under the sponsorship of the Naval Postgraduate School's MOVES Institute, as the visualization engine.

Lesson Learned

To the maximum extent possible, reuse work that has already been done for the Department of Defense; where appropriate, software reuse saves the customer money and time.
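The internals of the joint live virtual constructive data translator are not given here; the sketch below illustrates only the modular translation pattern described above, in which a module is registered per source/target pair so that, for example, a live fire mission message can be re-expressed as a constructive forces tasking. The message fields are invented for illustration.

```python
# Minimal sketch of a modular message translator: modules register
# themselves for a (source, target) pair and map one system's message
# into another's. Field names are hypothetical, not AFATDS or JSAF formats.

from typing import Callable, Dict, Tuple

Translator = Callable[[dict], dict]
_registry: Dict[Tuple[str, str], Translator] = {}

def register(source: str, target: str):
    def deco(fn: Translator) -> Translator:
        _registry[(source, target)] = fn
        return fn
    return deco

@register("AFATDS", "JSAF")
def fire_mission_to_jsaf(msg: dict) -> dict:
    # Map an (invented) fire-mission message into an (invented) JSAF tasking.
    return {
        "task": "indirect_fire",
        "target_grid": msg["target_grid"],
        "rounds": msg["rounds"],
        "munition": msg["shell_type"],
    }

def translate(source: str, target: str, msg: dict) -> dict:
    try:
        return _registry[(source, target)](msg)
    except KeyError:
        raise ValueError(f"no module registered for {source} -> {target}")

# Example
jsaf_task = translate("AFATDS", "JSAF",
                      {"target_grid": "38SMB1234567890", "rounds": 4, "shell_type": "HE"})
```

The appeal of the pattern is that adding support for a new battle command system or simulation protocol means writing and registering one more module, not touching the systems on either side.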
SYSTEM HIGH LEVEL DESIGN

The system architecture employed in the virtual technologies and environments' multipurpose operational team training immersive virtual environment was initially adopted in the virtual at sea trainer, improved during the virtual fire support trainer effort, and further refined in the virtual technologies and environments and virtual simulation training toolkit development efforts to be included in the deployable virtual training environment. The linchpin of the architecture is the joint live virtual constructive data translator, which enables integration of live battle command systems with simulation components, as well as integration of simulation systems operating on disparate communications protocols. Its modular construction facilitates bidirectional translation of real world battle command system messages, usually in variable message format variants, between such applications as the advanced field artillery tactical data system or command and control personal computer and such simulation components as the Joint Semi-Automated Forces. The joint live virtual
constructive data translator application also enables simulation systems operating on different protocols to interoperate as parts of a complete simulation federation.

Lesson Learned

Even when embarking on leading-edge research and development to create innovative training systems, common commercial items may be appropriate.

DEVELOPMENT OF PROTOTYPES

Although we employed different development lifecycle models for the projects we address in this chapter, a common theme was rapid development of a prototype capability (see Nicholson and Lackey, Volume 3, Section 1, Chapter 1). Worthy of note, the virtual technologies and environments effort shares with the virtual fire support trainer the unique experience of end-user interaction with prototypes while the software and hardware were still under development, providing invaluable feedback as the development progressed. User input on the prototypes enabled us to respond rapidly to customer needs.

Prototype Development

After development and delivery of the initial virtual fire support trainer prototype to the 10th Marine Regiment at Camp Lejeune, North Carolina, the marines immediately pressed it into service to assist in training forward observers. Although the virtual fire support trainer implemented only a minimal set of advanced field artillery tactical data system messages, the Artillery Training School instructors integrated the device into their program of instruction, relying on it as a training aid for 2 days of the 10 day course designed to teach forward observers how to employ the advanced field artillery tactical data system to prosecute targets with field artillery. During the virtual technologies and environments effort, the team developed a prototype amphibious assault vehicle turret trainer. Combining a salvaged and refurbished amphibious assault vehicle turret, a demilitarized .50 caliber machine gun and Mk19 grenade launcher enhanced with infantry skills marksmanship trainer technology, a projection system featuring the DELTA3D visualization engine, and a deployable virtual training environment prototype, the multipurpose operational team training immersive virtual environment system provided the marines of Company D, 3d Assault Amphibian Battalion, 1st Marine Division with a useful part-task amphibious assault vehicle (AAV) crew trainer. After arrival in Orlando, the salvaged AAV turret and stand were repaired, reconditioned, and fitted with the simulation hardware, software, and weapons. The amphibious assault vehicle turret trainer team conducted systems integration and testing and delivered the prototype system to the marines, who immediately put it to use. Based on the performance of the amphibious assault vehicle turret
trainer prototype, the Marine Corps has decided to purchase 16 additional training devices. One example of the integration of science and technology that permeated the virtual technologies and environments effort was the human factors research using the prototype multipurpose operational team training immersive virtual environment platform as the subject of operational field experiments conducted by the training effectiveness evaluation virtual product team. As a case study, the training effectiveness evaluation virtual product team collected data on the interactions of a Marine Corps Fire Support Team engaged in supporting arms training and compared their observations with hypotheses developed in an exhaustive search of the learning science and cognition literature, in an effort to suggest potential remediation in techniques or technology to improve training performance. The results of this study will be forthcoming. To support the training effectiveness evaluation virtual product team data collection, the multipurpose operational team training immersive virtual environment system was introduced to users as early as practicable and at various stages throughout the iterative development lifecycle. This spiral development and rapid prototyping enabled the training effectiveness evaluation virtual product team to collect training data while simultaneously providing system and software developers with meaningful and timely feedback that was incorporated into each subsequent release. At nearly every data collection event, marines were provided updated software based on feedback gathered during the previous event. This type of spiral development allowed the team to provide the marines with a simulation system that incorporated their user input. Normally the marines could see the results of their input and use the resultant tools in the subsequent spiral.

Lessons Learned

User input to spiral development gave the marines a simulation tool they understood how to use upon delivery. At the lowest level of the command structure, marines know what tools they need to train.

Operational or User Considerations

The primary user of the prototype virtual fire support trainer, the virtual simulation training toolkit, and the multipurpose operational team training immersive virtual environment systems, as well as the deployable virtual training environment to which they are transitioning, is the U.S. Marine Corps. From very early in the development lifecycle, marines employed the systems in existing facilities with no special provisions required for the simulation systems. Power supply and distribution, a perennial issue with multistation, personal computer based simulations, was somewhat mitigated through the implementation of laptop based technologies, thus reducing electrical power requirements. However, the ability to host multiple workstations is constrained by physical space, the number of available power receptacles, and the appropriate load distribution across the circuits
servicing the outlets. The use of power strips or other current splitting devices provides a reasonable approach, but caution must be used in the distribution of electric current, as the power draw can quickly overwhelm a circuit. For example, twenty notebook computers drawing 90 watts each represent a 15 amp load at 120 volts, close to the 16 amp continuous load limit of a single 20 amp circuit. In most cases, any expanded use of the simulation systems will need to be supported by two or more 20 amp circuits. Of particular interest to scientists and engineers alike is system usability. The typical user of the simulation system does not have a robust background in computer engineering, but possesses sufficient computer skills to use Microsoft Windows–like products and point-and-click navigation techniques. Since the users of the simulations we developed are military members, we realize members of the training audience may have a widely divergent set of computer skills. Therefore, in most cases we found that some minimal level of system familiarization is required prior to using the systems for training. Familiarization training with first-person simulations is usually brief since the simulators are easy to use and often replicate, to a reasonable level of accuracy, the tools available to the marine while in the field. Anecdotal evidence indicates that a large portion of the training audience learned the systems quickly, since many of the applications either replicate real world military systems or present icons and symbology with which users are familiar. The human-computer interface for the instructor/operator presented a greater challenge because the workstation controls both Windows and Linux applications and offers a more robust set of controls with which to present a synthetic battle space to the training audience. As one might imagine, many instructor/operators have never used Linux or the applications that run on this operating system. We discovered that it is usually advisable to allow sufficient time (three to four days) to train prospective instructors. As with the acquisition of other computer skills, nothing takes the place of hands-on experience to achieve an adequate level of proficiency.

Lesson Learned

Even when using common interfaces, it will take a nominal amount of time to train a new user; plan for the time.
Iterative Integration and Transition

Use of Department of Defense Standards

With the exception of the voice over Internet protocol communications emulation, which operates under distributed interactive simulation standards, all the systems we have discussed use the Department of Defense high level architecture standard. High level architecture is a general-purpose architecture for simulation reuse and interoperability. It was developed under the leadership of the Defense Modeling and Simulation Office to support reuse and interoperability across the large number of different types of simulations developed and maintained by the Department of Defense. The Defense Modeling and
Simulation Office has since been renamed the Modeling and Simulation Coordination Office (http://www.msco.mil/).

Lesson Learned

Use industry standard interfaces and protocols; integrating with other applications will be easier later.

Demonstrations or Transitions

The virtual fire support trainer (2005, 2006), the virtual technologies and environments' amphibious assault vehicle turret trainer (2005) and multipurpose operational team training immersive virtual environment (2006), and the deployable virtual training environment (2006, 2007) were all featured demonstrations at I/ITSEC (http://www.iitsec.org/) over the last few years. The expeditionary fighting vehicle trainer and the amphibious assault vehicle turret trainer are currently being used by marines (Schmorrow, 2005). Key system attributes showcased during the I/ITSEC demonstrations included portability, distributed training capability, simulation system interoperability, and linking virtual and constructive simulations with live battle command systems. We encountered a few technical challenges during these demonstrations, but none that were insurmountable. For instance, demonstrating the interoperability of the virtual fire support trainer and the deployable virtual training environment with the advanced field artillery tactical data system required the use of a single channel ground and airborne radio system, necessitating a request for permission to broadcast on the appropriate military radio frequencies during the show. A more mundane, but nonetheless important, consideration was the power requirement for the radios, usually provided by a vehicle when mounted and by portable battery packs when employed in a dismounted configuration. In preparation for transition to the Marine Corps' deployable virtual training environment, the virtual fire support trainer, virtual simulation training toolkit, and virtual technologies and environments multipurpose operational team training immersive virtual environment systems participated in numerous demonstrations, user acceptability tests, and data collection events. Because all were designed to transition to the deployable virtual training environment, travel to and from the events was relatively painless. The hardened travel cases and modular design reduced the logistical burden of packaging, shipping, unpacking, setting up, and placing the systems into service.
SUMMARY

Through the reuse of existing technologies, the development of new capabilities, the use of open source applications, the use of rapid prototyping, and the integration of science and technology, we have assessed various aspects of integrating and transitioning research prototypes into joint interoperable
training systems. The purpose of this study, the lessons learned, and the overall military goal share a common aim: to create a training environment that allows military members to train as they fight.

REFERENCES

Schmorrow, D. (2005, May 25). Marine Corps modeling & simulation review. Available from www.techdiv.org/erik.jilson/2005%20USMC%20M&S%20Review/14%2005MCMSMOVIRTE.ppt
Stripling, R. M., Templeman, J. N., Sibert, L. E., Coyne, J., Page, R. G., La Budde, Z., & Afergan, D. (2006). Identifying virtual technologies for USMC training, information technology and communication [Electronic version]. 2006 NRL Review. Available from http://www.nrl.navy.mil/Review06/images/06Information(Stripling).pdf
Part III: Game Based Training
Chapter 14
SO YOU WANT TO USE A GAME: PRACTICAL CONSIDERATIONS IN IMPLEMENTING A GAME BASED TRAINER

John Hart, Timothy Wansbury, and William Pike

There is tremendous interest across a wide spectrum of training domains in the use of computer based games to enhance training and education. Dr. Robert T. Hays (2005) presents an excellent summary of the use of games for education and training in K–12 education, college, and the workplace, citing over 50 examples from these arenas. In addition, the U.S. Department of Defense has invested significantly in identifying best practices in how to effectively design, develop, and use computer based games in both institutional and distributed training environments. The interest in using games in training continues to grow across academia, business, and all of the military services. Many of the books and articles devoted to the use of games for training provide different definitions of what a "game" is, with many distinguishing among such terms as game, simulation, game based simulation, simulation game, nonsimulation game, and so forth. The fact is that the term "game" means different things to different people. For purposes of this chapter, the term "games" refers to personal computer (PC) based computer games, and the discussion focuses on six key recommendations that trainers should consider when preparing for the development or use of a PC based computer game for training. These six recommendations were derived from lessons learned through a number of research efforts conducted by the U.S. Army, including a recent effort to develop a PC based computer game called Bilateral Negotiation (BiLAT). BiLAT is a game based tool that provides an environment for soldiers to demonstrate their knowledge and skills on how to plan for and conduct successful bilateral meetings and negotiations in different cultural settings. The recommendations discussed in this chapter are of such significant importance that they should all be addressed by any trainer considering the use of "a game" as part of a future training program.
RECOMMENDATIONS

The U.S. Army Research, Development and Engineering Command, Simulation and Training Technology Center (RDECOM STTC) has led several research efforts focused on identifying the "keys to success" in how to design, develop, and use PC based computer games to train soldiers effectively. These projects involve a variety of training objectives, including instruction on traditional, small unit, infantry leadership tasks (Pike & Hart, 2003), tactical combat casualty care (Fowler, Smith, & Litteral, 2005), asymmetric warfare (Mayo, Singer, & Kusumoto, 2006), and cultural awareness training (Hill et al., 2006). Lessons learned from these research efforts have resulted in six recommendations that trainers should consider before using PC based computer games for training. The first four directly influence the trainer's decision regarding whether a game should, in fact, be used for training in the first place, as well as the selection of the "right" game for the training exercise. The last two recommendations affect how the trainer plans to actually use the game in a training event. The importance of these recommendations has been validated in developing the BiLAT as well as in other follow-on prototype efforts. This chapter will discuss the following recommendations within the context of developing the BiLAT prototype:

1. Define how the game fits into the overall instructional framework for the training.
2. Define the specific learning objectives for the training exercise.
3. Define how the game activities support the overall learning objectives.
4. Identify how performance assessment and feedback will be accomplished during game play and following completion of an exercise.
5. Assess the relative experience of the trainees in using games, and be prepared to provide the additional time and training required to help those with little or no game-playing experience.
6. Assess the experience level of the instructors or trainers in using games for training, and provide the required assistance in order to ensure the trainers are adequately prepared to use the games effectively.
Define How the Game Fits into the Overall Instructional Framework

Identifying the right game and deciding how to integrate it into an established curriculum can be a difficult task. As Hays argues, in order to be successful, games must be "incorporated logically into an instructional framework" (Hays, 2006, p. 252). Key questions have to be addressed, such as who is the training audience, what knowledge does this target audience already possess, how much training content must be provided before the trainees participate in a game exercise, and how much content must be included within the game itself. The trainer must decide whether the game should present such training content as initial concepts or whether it should be used solely as a practice environment allowing trainees to demonstrate their knowledge in specific training scenarios. Identifying how a game fits
into the overall instructional framework is a critical first step to using a game effectively in training. The U.S. Army Command and General Staff College, School for Command Preparation (SCP) at Fort Leavenworth, Kansas, was the first organization to use the BiLAT. SCP conducts the army's battalion and brigade commanders' pre-command course (PCC), a three week resident training course focused on preparing senior officers for key command assignments in the army. The BiLAT is a purpose-built game created specifically by the U.S. Army to support training at the SCP. Defining how the BiLAT would fit into the overall instructional framework of a training program was an essential step that drove key design and development decisions during the early phases of the project. Instructional designers, software designers, and SCP course instructors devoted significant effort at the beginning of the project to evaluating how the BiLAT could be integrated into the overall PCC course of instruction. Key decisions were made identifying the functions and tasks to be performed in the application, as well as the tasks that were to be addressed by other forms of training. For instance, the designers and instructors agreed that basic instruction on the core principles of cultural awareness and negotiation would be provided by other means, such as lectures, readings, and classroom discussions, and that BiLAT would serve only as a practice environment for honing skills in the art of bilateral meetings and negotiation. As a result, no preliminary instruction is provided in the BiLAT game environment. The decisions about how to integrate BiLAT into the overall instructional design laid the foundation for the overall success of the BiLAT project.
Establish Learning Objectives

Identification of clearly defined learning objectives in any training exercise is the second important step that must be addressed to successfully create or select a game for training. Identification of the learning objectives is also the single most important factor that differentiates a "training game" from an "entertainment game." As Hays states, "instructional objectives must be determined a priori and depend on job requirements" (Hays, 2006, p. 260). Oftentimes, trainers try to evaluate different games in order to select the best one for use in a training exercise. A key consideration during that evaluation process is the extent to which a game supports the achievement of the trainer's overall learning objectives. Finding an existing game that satisfies specific learning objectives can be difficult. As a result, a trainer must be prepared to modify how a game is used or, in the worst case, to develop a whole new game in order to ensure it supports the specific learning objectives to be achieved. The design team for BiLAT devoted significant effort to identifying strategic learning objectives and tested those objectives initially through the use of a paper based version of the game. The flow of the paper based version provided valuable insight that allowed the designers to validate the key learning objectives before
costly software development began. Once the learning objectives were finalized, work commenced on developing the software for the PC version of the BiLAT.

Link Game Activities to Learning Objectives

The next major issue in the evaluation and selection process is to ensure that the specific activities performed by a trainee in a game scenario actually support the overall learning objectives for the exercise. Hays states that "a simulation game is ineffective if it does not directly link game events to instructional objectives and does not ensure that the learner understands whether he or she has met those objectives" (Hays, 2006, p. 252). Again, the designers of BiLAT used the low cost, paper based prototype as a method of vetting the specific steps that should be performed by an individual preparing for and conducting a successful meeting and negotiation. We utilized a number of subject matter experts and employed a process called Cognitive Task Analysis (CTA) to guide the development effort. "Cognitive Task Analysis (CTA) uses a variety of observation and interview strategies to capture a description of the knowledge which experts use to perform complex tasks" (Clark, Feldon, van Merrienboer, Yates, & Early, 2006). Using such techniques as the CTA ensured that the tasks performed during each BiLAT scenario were linked to and supported the strategic learning objectives for the application.

Assess Performance and Provide Feedback

Assessing performance and providing feedback is essential in determining the overall success in using a game for training. Ideally, assessment should take place within the game itself, and feedback should be provided in the form of a tutor or coach during game play. Feedback must also be provided to the trainee in the form of an after action review (AAR) at the end of the exercise. A critical factor that differentiates between "good" and "poor" assessment and feedback is the ability to link performance to the overall learning objectives. The trainer who is able to link performance in the game experience to the overall learning objectives greatly increases the likelihood of success in using the game in training. BiLAT incorporates an enhanced set of artificial intelligence tools, methods, and technologies to provide thorough assessment and feedback throughout each BiLAT exercise. It provides a rich coaching and tutoring capability through both the meeting and negotiation stages of each exercise and provides a thorough AAR using a virtual coach at the end of each meeting engagement. These assessment and feedback capabilities make the BiLAT an effective trainer in both instructor-facilitated and distance learning environments.

Assess the Experience Your Trainees Have Using Games before Conducting the Training Exercise

A PC based computer game can be a powerfully effective tool for training individuals; however, even well-designed and well-built games can result in a waste
of time if they are not used effectively. Assessing the relative experience that trainees have in using games is an important step in the overall planning process. Trainers should recognize the likelihood that a significant number of their students are not "gamers" and may harbor a natural fear of, or reluctance toward, using the application unless additional support is provided. The Army Research Institute recently reported that

fewer than 32% of 10,000 Soldiers surveyed across all ranks in the US Army admitted to playing videogames recreationally on a weekly basis (numbers vary by rank) . . . Consistently, our research shows that the assumption that most Soldiers are "gamers" is exaggerated. Continuing to act on that assumption can be troublesome unless certain precautions are taken. (Belanich, Orvis, Moore, & Horn, 2007, p. 958)
The researchers go on to stress that "instructors should assess trainees' game experience" and "provide targeted opportunities to gain prerequisite experiences prior to training" (Belanich et al., 2007, p. 963). One of the best ways to overcome this issue of game experience is to ensure that an adequate overview/introduction is provided prior to beginning the actual training event. This overview should include a discussion of the learning objectives, a demonstration of the "knobology" (for example, instruction on how to "play the game"), and an opportunity for the students to practice with the game before actual training begins. SCP instructors learned this lesson with the BiLAT and today set aside approximately one hour of additional classroom time prior to a BiLAT exercise to ensure that each student is ready to conduct the training.

Assess the Experience Level of the Instructors Who Will Be Using the Game in Training

The use of PC based computer games in training is a relatively new phenomenon, and many instructors do not have much experience using games for training. Game developers are encouraged to provide support materials that can be used by new instructors and trainers to help them learn how to use a game in training before they attempt to do so for the first time. The developers of BiLAT addressed this issue by creating a simple-to-use learning support package (LSP), providing new trainers with detailed instructions on how to set up and conduct a successful BiLAT training exercise. This LSP is distributed along with the BiLAT to users across the army.

CONCLUSION

There is growing interest in schools, business, and the military in using PC based computer games to enhance training opportunities. As a result of its experience in conducting research into the design, development, and use of PC based computer games, the RDECOM STTC has identified six key recommendations
that trainers should address prior to using a game for training. These recommendations and lessons learned have been validated in the successful development of the BiLAT, a game that is being deployed and used across the army. These lessons are being consistently revalidated as new game prototypes are developed and tested in educational settings.

REFERENCES

Belanich, J., Mullins, L. N., & Dressel, J. D. (2004). Symposium on PC-based simulations and gaming for military training (Army Research Institute Product 2005-01). Retrieved May 22, 2007, from http://www.hqda.army.mil/ari/wordfiles/RP%202005-01.doc
Belanich, J., Orvis, K. A., Moore, J. C., & Horn, D. B. (2007). Fact or fiction—soldiers are gamers: Potential effects on training. Proceedings of the Interservice/Industry Training, Simulation & Education Conference—I/ITSEC (pp. 958–964). Arlington, VA: National Training Systems Association.
Clark, R. E., Feldon, D., van Merrienboer, J. J. G., Yates, K., & Early, S. (2006). Cognitive task analysis. In J. M. Spector, M. D. Merrill, J. J. G. van Merrienboer, & M. P. Driscoll (Eds.), Handbook of research on educational communications and technology (3rd ed.). Mahwah, NJ: Lawrence Erlbaum. Retrieved April 18, 2008, from http://www.cogtech.usc.edu/publications/clark_etal_cognitive_task_analysis_chapter.pdf
Fowler, S., Smith, B., & Litteral, D. J. (2005). A TC3 game-based simulation for combat medic training. Paper presented at the 2005 Interservice/Industry Training Simulation & Education Conference, Orlando, FL.
Hays, R. T. (2005). The effectiveness of instructional games: A literature review and discussion (Tech. Rep. No. 2005-004). Orlando, FL: Naval Air Warfare Center Training Systems Division.
Hays, R. T. (2006). The science of learning: A systems theory perspective. Boca Raton, FL: BrownWalker Press.
Hill, R., Belanich, J., Lane, C., Core, M., Dixion, M., Forbell, E., Kim, J., & Hart, J. (2006). Pedagogically structured game-based training: Development of the ELECT BiLAT simulation. Paper presented at the 25th Army Science Conference, Orlando, FL.
Mayo, M., Singer, M. J., & Kusumoto, L. (2006). Massively multi-player (MMP) environments for asymmetric warfare. Journal of Defense Modeling and Simulation, 3(3), 155–166.
Pike, W. Y., Anschuetz, R., Jones, C., & Wansbury, T. (2005). The rapid decision trainer: Lessons learned during an R & D development and fielding process. Proceedings of the 2005 Interservice/Industry Training Simulation & Education Conference [CD-ROM]. Arlington, VA: National Training Systems Association.
Pike, W. Y., & Hart, D. C. (2003). Infantry officer basic course (IOBC) rapid decision trainer (RDT). Proceedings of the 2003 Interservice/Industry Training Simulation & Education Conference. Arlington, VA: National Training Systems Association.
Chapter 15
MASSIVELY MULTIPLAYER ONLINE GAMES FOR MILITARY TRAINING: A CASE STUDY

Rodney Long, David Rolston, and Nicole Coeyman

Massively multiplayer online games (MMOGs) are one of the fastest growing forms of entertainment, with such games as World of Warcraft claiming more than 10 million users (World of Warcraft, 2008). Large-scale online social environments, such as Second Life (www.secondlife.com) and There (www.there.com), draw millions of additional users. These online environments are examples of virtual worlds—online three-dimensional (3-D) synthetic "worlds" that have the following characteristics:

• A large number of players, typically thousands;
• Large-scale geographic areas, often up to continent-sized worlds;
• Avatars (graphical characters in the virtual world) that may be controlled by a player's actions;
• Nonplayer characters that are controlled by artificial intelligence;
• Team environments that emphasize interaction among many players;
• Geographically distributed players that can log into the server from any location around the world that has a high speed Internet connection;
• Dynamic objects that can be moved, carried, placed, operated, and so forth;
• Free-form activity where avatars can move around freely, exploring and interacting with the environment; and
• Persistent environments that are intended to exist continuously and indefinitely, evolving over time.
In 2003, the U.S. Army Simulation and Training Technology Center began a research program, Asymmetric Warfare—Virtual Training Technologies, to investigate the use of MMOG technology to train soldiers for asymmetric warfare while operating in large urban areas, which are crowded with civilians.
TASK ANALYSIS

In the 2003 military operating environment of Afghanistan and Iraq, the U.S. Army was no longer fighting a force-on-force operation as had been the case in past conflicts. Soldiers were fighting an asymmetric war in urban cities, surrounded by a civilian population. As soldiers performed such dismounted infantry operations as traffic control points, patrols, and building searches for weapons and high value targets, they were being attacked by improvised explosive devices (IEDs) and small arms fire. The goal of the research program was to create a virtual world that could be used to train soldiers for this environment and the wide variety of tasks they were required to perform.

Terrain Development

The virtual world had to reflect the type of terrain conditions where soldiers were currently operating and where they might be operating in the future. Having the virtual world reflect the current terrain not only enhanced realism and immersion, but also improved training scenarios. For example, the amount of trash and rubble on city streets made it very difficult for soldiers to detect IEDs. For our program, we provided terrain environments typical of urban areas in Southwest Asia and the United States (for Homeland Security), as well as rural desert and jungle areas. MMOG technology allowed us to provide large terrain areas and have all of these different areas available at the same time for concurrent operations.

Avatars

Using capabilities inherent in MMOGs, the role-players could change their avatars' facial features, skin/hair color, clothing, shoes, and so forth according to the culture. Civilian role-players had the clothing typically used in theater, while insurgent role-players could wear a uniform or a shemagh/ghutra to hide their faces. They also had the weapons to implement the enemy tactics being used, such as IEDs and guns. Soldiers were also provided the tools to perform their tasks. This included uniforms for identification, basic weapons, military vehicles, and radios for communications. Damage to avatars and vehicles was assessed based on weapon type. For military vehicles, the damage included mobility kill, firepower kill, or catastrophic kill.

Communications

Given the interactions between soldiers and civilians in urban operations, communication mechanisms, both verbal and nonverbal, were provided. Simulated military radio networks enabled communications among military personnel to support operations over large geographic areas. Voice communication was implemented to reflect the verbal interactions among military
and civilians in urban environments. The voice communication was tied to the speaker through lip synching and was spatially accurate in 3-D, attenuating with distance. Animated gestures, or emotes, commonly found in MMOGs, were provided to reflect culture and nonverbal communication. Role-players could select a button on the screen to have their avatars perform Arabic gestures for greeting, thanking, showing anger, and so forth. Soldiers could use emotes to give nonverbal commands to civilians who might not understand English, for example, when operating a traffic control point.

Performance Standards

The virtual simulation used the scalability of an MMOG to support a large number of soldier trainees, as well as civilian and enemy role-players. Nonplayer characters could also be used, driven by artificial intelligence rather than human role-players. The goal was to support approximately 200 characters/avatars in the virtual world at the same time to reflect urban clutter.

Technology Requirements

While the virtual world provided a very flexible and scalable simulation environment, there were other technical challenges that had to be solved to support effective training. Nonplayer characters were needed to create realistic, crowded urban cities without requiring a large number of human role-players. The artificial intelligence had to reflect such typical civilian behavior as wandering and reacting to gunfire and IEDs. Also, a record and replay capability was needed to support after action review of the training exercise. Considering that one of the strengths of MMOG technology is supporting many players over the Internet, the after action review tool needed to support a distributed after action review, including distributed playback of the recorded exercise.

SYSTEM HIGH LEVEL DESIGN

To support large numbers of simultaneous players/avatars and large geographic areas, the game engine runs on a cluster of servers. The large geographic area is broken down into sectors, with each sector assigned to a different server in the cluster. As an avatar moves through the virtual world, the modeling of the avatar is passed to different servers as it crosses sector boundaries. These boundaries also help filter the data that flow over the network to the individual client machines, as avatars in one sector do not need to know about events and interactions in the other sectors. Of course, there are exceptions to this, such as weapons and radio communications. To ensure a smooth transition, software logic handles the movement of avatars from one sector to another. While the trainee's computer provides control of his avatar in the virtual world, it also has to display what is happening around him. In an urban area with many civilians, the client computer and graphics card could easily become overloaded.
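The commercial engine's sharding code is proprietary, so the sketch below only illustrates the sectoring idea just described: the world is tiled into fixed-size sectors, each owned by one server, and an avatar's modeling is handed off when it crosses a boundary. The sector size and all names are assumptions made for this sketch.

```python
# Illustrative sector-ownership logic: position -> owning sector, with a
# handoff callback fired when an avatar crosses a sector boundary.

SECTOR_SIZE = 1000.0  # meters per sector edge (illustrative)

def sector_of(x: float, y: float) -> tuple:
    """Which sector (and therefore which server) owns this position."""
    return (int(x // SECTOR_SIZE), int(y // SECTOR_SIZE))

class AvatarState:
    def __init__(self, avatar_id: str, x: float, y: float):
        self.avatar_id, self.x, self.y = avatar_id, x, y
        self.owner = sector_of(x, y)

    def move(self, x: float, y: float, handoff) -> None:
        self.x, self.y = x, y
        new_owner = sector_of(x, y)
        if new_owner != self.owner:
            handoff(self, self.owner, new_owner)  # transfer modeling to new server
            self.owner = new_owner

# Example: crossing a sector boundary triggers a handoff
def log_handoff(avatar, old, new):
    print(f"{avatar.avatar_id}: sector {old} -> {new}")

a = AvatarState("rifleman_1", 995.0, 40.0)
a.move(1005.0, 40.0, log_handoff)
```

Sector ownership also bounds how much event traffic each client must receive; the level-of-detail scheme discussed next addresses the complementary problem of how much each client must draw.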
To manage the load, the game engine uses different levels of detail, with the closest 20 avatars displayed in full detail at a fast update rate. The remaining avatars in the scene have a much lower level of detail and are updated only at a three second rate. As the trainee's avatar moves through the environment, the set of 20 avatars changes, and avatars with the lower level of detail transition to the higher level of detail and begin updating at the higher rate.

DEVELOPMENT OF SYSTEM PROTOTYPES

Given the network latency inherent in MMOGs, the simulation was designed to support training for operations that involved interactions between the soldiers and the civilian population, as well as IEDs and small groups of insurgents, as opposed to high intensity combat. The commercial game engine chosen was very strong in human interactions and communication, which included integrated 3-D, spatially accurate voice. To explore MMOG technology and how it could be leveraged to improve military training, prototypes were developed that focused on specific training scenarios, which were then evaluated through experimentation with soldiers.

Prototype Development

One of the first scenarios was checkpoint operations, which involve a great deal of soldier-civilian interaction. Using the strength of the simulation in manipulating objects in the environment, the materials used to set up checkpoints, for example, concertina wire, cones, signs, jersey barriers, and so forth, were added to the soldiers' inventories. Soldiers used these objects to practice setting up the checkpoint while receiving feedback from the instructor. Once properly set up, the checkpoints could then be operated by the soldiers, with role-players acting as civilians driving cars. Using verbal communication, as well as hand gestures, the soldiers were able to successfully operate the checkpoint (Mayo, Singer, & Kusumoto, 2006). Functionality was gradually added to the virtual world to support other scenarios, including building searches, patrols, and so forth.

Operational or User Considerations

One of the greatest strengths of MMOGs and virtual worlds is the ability to interact in a virtual environment with people across the world. However, getting permission to run the software on a military network is a long and difficult process. Because of the number of ports used, the ports needed to operate the MMOG were often blocked by firewalls. This made running exercises on a military installation challenging. As a result, exercises were often run over a local area network. This was still an issue because permission was needed to load software on the military computers. Also, most of the military computers did not meet performance specifications, especially for the graphics cards, and could not run the MMOG software without being upgraded.
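As a companion to the server-side sketch above, the client-side level-of-detail policy described at the start of this section reduces to two small decisions: which avatars receive full detail, and when a low detail avatar is due for a refresh. The engine's real scheduler is not public; this is an illustrative reconstruction using the figures quoted in the text (closest 20 avatars in full detail, the rest refreshed about every three seconds).

```python
# Illustrative level-of-detail selection: rank avatars by distance from
# the viewer; the nearest N get full detail and per-frame updates, the
# rest are simplified and refreshed on a slow timer.

import math

FULL_DETAIL_COUNT = 20
LOW_DETAIL_PERIOD_S = 3.0

def split_by_detail(viewer_pos, avatars):
    """avatars: dict of id -> (x, y). Returns (full_detail_ids, low_detail_ids)."""
    ranked = sorted(avatars, key=lambda aid: math.dist(viewer_pos, avatars[aid]))
    return ranked[:FULL_DETAIL_COUNT], ranked[FULL_DETAIL_COUNT:]

def due_for_update(avatar_id, full_ids, last_update_s, now_s):
    """Full-detail avatars update every frame; the rest roughly every 3 s."""
    if avatar_id in full_ids:
        return True
    return (now_s - last_update_s) >= LOW_DETAIL_PERIOD_S

# Example
full, low = split_by_detail((0.0, 0.0), {"a": (5.0, 0.0), "b": (1200.0, 90.0)})
```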
Use of Department of Defense Standards

To be interoperable with other existing military training simulations and simulation support tools, the design decision was made to implement a network interface using the Institute of Electrical and Electronics Engineers Standard for Distributed Interactive Simulation (IEEE 1278.1). This standard allows heterogeneous simulations to interoperate by standardizing the way data are shared through the network. Through this network interface, we were able to populate the virtual world with nonplayer characters from the U.S. Army's One Semi-Automated Forces simulation. This simulation allowed large numbers of nonplayer characters to be generated and controlled by a single operator and had the artificial intelligence required to support civilian and military behaviors. We were also able to integrate with the dismounted infantry virtual after action review system developed by the Army Research Institute to support exercise recording, replay, and critique of the trainee's performance.
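IEEE 1278.1 specifies complete binary protocol data unit (PDU) formats; the snippet below is a deliberately simplified stand-in, not the real wire format, meant only to show the core idea that interoperability comes from every simulation emitting entity state in one agreed-upon layout. The field layout and port number are illustrative assumptions.

```python
# Toy illustration of DIS-style entity-state broadcasting: a fixed binary
# layout every participant can pack and unpack. This is NOT the IEEE 1278.1
# Entity State PDU; it is a radically simplified stand-in.

import socket
import struct

DIS_LIKE_PORT = 3000  # illustrative; real exercises agree on a port/group

# entity_id (u32), kind (u8), x, y, z, vx, vy, vz, heading (7 floats)
ENTITY_STATE_FMT = "!IB7f"

def pack_entity_state(entity_id, kind, pos, vel, heading):
    return struct.pack(ENTITY_STATE_FMT, entity_id, kind, *pos, *vel, heading)

def unpack_entity_state(data):
    entity_id, kind, x, y, z, vx, vy, vz, heading = struct.unpack(ENTITY_STATE_FMT, data)
    return {"id": entity_id, "kind": kind, "pos": (x, y, z),
            "vel": (vx, vy, vz), "heading": heading}

# Broadcast one state update (e.g., a civilian avatar walking)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
pdu = pack_entity_state(42, 1, (100.0, 250.0, 0.0), (1.2, 0.0, 0.0), 90.0)
sock.sendto(pdu, ("255.255.255.255", DIS_LIKE_PORT))
```

Because every participant shares the layout, a constructive forces generator, a virtual trainer, and an after action review recorder can all consume the same stream, which is essentially what the standard interface made possible here.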
Demonstrations

In 2005, a demonstration of the prototype was provided to the Secretary of the Army and the Army Science Board, using Iraqi role-players at the National Training Center, a National Guard unit in Orlando, Florida, and role-players in other parts of the United States. The scenario highlighted how personnel assets at the National Training Center and soldiers with recent experience in theater could all be tied into a training exercise, over a wide area network, with soldiers who were preparing to deploy. The Iraqi role-players spoke Arabic in the scenario, bringing language and culture aspects into the scenario, while the soldiers with experience in theater could share their observations and lessons learned. The demonstration showcased how the distributed nature of MMOG technology could improve training, providing a realistic simulation environment to support unit training at a home station.
TEST AND EVALUATION

Technical Performance

The goal of the training simulation was to be able to model a crowded urban environment on personal computers. At a simulation and training conference in December 2006, the Simulation and Training Technology Center demonstrated 200 avatars in the virtual world over the Internet, using a combination of live role-players across the United States and nonplayer characters generated by the One Semi-Automated Forces simulation. Using a broadband Internet connection, the computers had a 128 megabyte graphics card, 2 gigabytes of memory, and a 2.8 gigahertz Pentium 4 processor.
Usability

The accessibility of game based simulations provides a new way to train soldiers, supplementing training at home stations, schoolhouses, and live training centers. These simulations can prepare soldiers for the types of situations they will experience in theater. By training on varied scenarios, soldiers can prepare for deployment by reinforcing basic skills, situational awareness, and decision-making techniques. While simulation based training may not replace live training, it can hone the skills needed to prepare soldiers for combat and enhance the live training they do receive.

One of the key features of the simulation was its flexibility and adaptability. Using the simulation as a virtual stage, a wide variety of training scenarios could be supported simply by changing the objects in the environment and the dialogue and actions of the live role-players. Before beginning a particular scenario, the trainer is given the opportunity to set it up, using the scenario editor to place dynamic objects (IEDs, barriers, and so forth) in specific locations, allowing for variations in training scenarios. This flexibility could enable the military to rapidly and effectively communicate and train new tactics down to the unit level. While developed to support training for urban combat operations, the simulation also showed potential for Homeland Security training, supporting a force protection/antiterrorism exercise at Fort Riley, Kansas (Stahl, Long, & Grose, 2006).
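Conceptually, the scenario editor described above amounts to authoring a placement list of dynamic objects that the engine loads before an exercise begins. The much-simplified sketch below shows what such a scenario description and loader might look like; the object types, field names, and the engine.spawn call are hypothetical and do not reflect the actual system's file format or API.

```python
scenario = {
    "name": "checkpoint_ops_v2",
    "dynamic_objects": [
        # type, world position (x, y, z), facing in degrees
        {"type": "jersey_barrier", "pos": (102.0, 0.0, 55.5), "heading": 90},
        {"type": "concertina_wire", "pos": (98.0, 0.0, 60.0), "heading": 90},
        {"type": "ied", "pos": (140.0, 0.0, 72.0), "heading": 0,
         "trigger_radius_m": 5.0},
    ],
}

def load_scenario(engine, scenario):
    """Place each authored dynamic object into the world before the exercise."""
    for obj in scenario["dynamic_objects"]:
        extras = {k: v for k, v in obj.items()
                  if k not in ("type", "pos", "heading")}
        engine.spawn(obj["type"], obj["pos"], obj["heading"], **extras)
```

Because the same virtual stage can be re-dressed by editing this list alone, trainers can produce meaningful scenario variations from one run to the next without any new content development.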
Training Effectiveness Evaluation

The Simulation and Training Technology Center worked closely with the Army Research Institute to evaluate this new training technology. The Army Research Institute conducted formative evaluations with soldiers, consisting of an overview of the simulation, background questionnaires (experience, rank, training expertise, and computer familiarity), hands-on training in how to use the simulation, presentations of the features and functionality of the system, and questionnaires addressing system aspects and features. After structured discussions on the system capabilities, tools, and features, the soldiers performed a specific preplanned mission. Following completion of the mission, the soldiers conducted an after action review and completed final questionnaires regarding the exercise and the system as a whole.

The Army Research Institute has conducted four evaluations of the simulation to date. The overall results of these studies showed that soldiers do benefit from the training provided. The soldiers recognized the simulation's ability to support and supplement situational awareness training exercises through the diverse environment provided by the system and its ability to model unpredictable behaviors. Another major training benefit was the after action review capability, namely, being able to replay a simulation exercise (Singer, Long, Stahl, & Kusumoto, 2007).

As the popularity of these virtual worlds has grown, so has interest in our research program. Working with the Naval Air Warfare Center and our allies,
we continue to explore how this technology can be leveraged to support training for the warfighter in joint and coalition warfare environments.

REFERENCES

Mayo, M., Singer, M. J., & Kusumoto, L. (2006, July). Massively Multi-player (MMP) environments for asymmetric warfare. JDMS: The Journal of Defense Modeling and Simulation: Applications, Methodology, Technology, 3(3), 155–166. Retrieved April 19, 2008, from http://www.scs.org/pubs/jdms/vol3num3/JDMSIITSECvol3no3Mayo155166.pdf

Singer, M., Long, R., Stahl, J., & Kusumoto, L. (2007, May). Formative evaluation of a Massively Multi-player Persistent (MMP) environment for asymmetric warfare exercises (Technical Rep. No. 1227). Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

Stahl, J., Long, R., & Grose, C. (2006, June). The application and results of using MMOG technology for force protection/anti-terrorism training. Paper presented at the European Simulation Interoperability Workshop, Stockholm, Sweden.

World of Warcraft reaches new milestone: 10 million subscribers. (2008, January). Retrieved April 19, 2008, from http://www.blizzard.com/us/press/080122.html
Part IV: International Training Examples
Chapter 16
A SURVEY OF INTERNATIONAL VIRTUAL ENVIRONMENT RESEARCH AND DEVELOPMENT CONTRIBUTIONS TO TRAINING

Robert Sottilare

The increased complexity of operational missions and environments has prompted researchers worldwide to evolve virtual environment (VE) technology to support more complex training missions and environments. Research and development in VEs has profound implications for all levels of training in the United States and around the world. These implications include, but are not limited to, improved representations of virtual humans and other intelligent agents to support training and training management, increased capabilities to support rapid construction of geospecific mission rehearsal environments, and increased capabilities to mix live and virtual environments to support more realistic training by incorporating real world data (for example, command and control information).

Other chapters have addressed existing VE technologies within the United States. This chapter focuses on international activities in the research, development, and application of "VEs for training" external to the United States. It would take much more space than has been allocated here to do justice to all the excellent training research worldwide that utilizes VE technologies. This chapter attempts to (1) examine where VE research is taking place worldwide, (2) summarize a few of the current research vectors in VE for training, and (3) look ahead toward future research topics.

What follows is a sample of organizations conducting "training research involving the use of virtual environments." These organizations are reviewed in terms of their research thrusts, objectives, applications, and key publications. For the purposes of discussion, we will consider mission rehearsal to be "training for a specific mission" and therefore a subset of training. Specifically, the scope of Volume 3, Section 1 focuses on a sampling of research and technology development related to visual, aural, and haptic interaction in virtual environments. For this purpose, we consider any reality that contains a virtual component to
be part of our discussions of virtual environments. This includes virtual reality and mixed reality (including augmented virtuality and augmented reality). Fundamental research in VEs, which has a potential impact on training, is also considered in this survey.

VE RESEARCH AT THE VIRTUAL REALITY LAB

The Virtual Reality Lab (VRLab) at the Swiss Federal Institute of Technology in Lausanne, Switzerland, was established in 1988 and is focused on the modeling and animation of "three-dimensional inhabited virtual worlds," including real time virtual humans, multimodal (visual, aural, and haptic) interaction, and immersive VE (Virtual Reality Lab, 2007). VRLab's research portfolio includes "JUST in Time Health Emergency Interventions" (Manganas et al., 2005), "Stable Real-Time AR Framework for Training and Planning in Industrial Environments" (Vacchetti et al., 2004), and "Immersive Vehicle Simulators for Prototyping, Training and Ergonomics" (Kallmann et al., 2003). VRLab is currently a research partner in seven European Union (EU) and four Swiss national projects involving virtual reality.

VE RESEARCH AT MIRALAB

MIRALab at the University of Geneva in Switzerland was founded in 1989 and presently includes about 30 researchers from various fields, including computer science, mathematics, medicine, telecommunications, architecture, fashion design, cognitive science, haptics, and augmented reality, who conduct research in computer graphics, computer animation, and virtual worlds. One MIRALab foundation project is the virtual life network (VLNET), a networked collaborative virtual environment that includes highly realistic virtual humans and allows users to meet in shared virtual worlds where they communicate and interact with each other and with the environment. The virtual human representations have appearance and behaviors similar to real humans in order to "enhance the sense of presence of the users in the environment as well as their sense of being together in a common virtual world" (Capin, Pandzic, Noser, Magnenat-Thalmann, & Thalmann, 1997). Extensions of this work can be seen in Joslin, Di Giacomo, and Magnenat-Thalmann's (2004) review of the creation and standardization of collaborative virtual environments.

VE RESEARCH AT THE VIRTUAL ENVIRONMENT LABORATORY

The Virtual Environment Laboratory (VEL) was established at Ryerson University in Toronto, Ontario, Canada, in 2003 to "advance the integration of geospatial, modeling, visualization and virtual reality technologies for use in urban environment applications" (Ryerson University, 2007). VEL's research and development goals include the automated detection, identification, correlation,
and extraction of remotely sensed imagery; and the modeling of complex human-environmental interactions in urban environments. While initial operational applications of this technology include land use management, three-dimensional (3-D) urban modeling, landscape mapping, disaster management, and transportation planning, this cross-section of technologies could also be applied to urban environment training for first responders (that is, police, fire, and emergency medical personnel).
TNO DEFENSE SECURITY AND SAFETY

TNO in Soesterberg, the Netherlands, conducts extensive training research in the areas of human factors, behavioral representation, artificial intelligence, games research, and movement simulation. Two projects that highlight TNO's expertise in VE for training are Ashley and Desdemona (DESoriëntatie DEMONstrator Amst). Ashley is a virtual human whose roles include wingman, unmanned aerial vehicle pilot, and training mentor. Current versions of Ashley are being used to monitor trainee performance and provide feedback. Future versions will be able to perceive natural language and provide feedback when the trainee requests it (Clingendael Center for Strategic Studies, 2006). Desdemona is a movement simulator that provides an extensive range of motion through its combination of a hexapod and a centrifuge. This motion base can be used for realistic virtual training for aircraft and automobiles or for analysis of vehicle performance. Impressive accelerations and complex curves are possible where the additional realism of movement is required to support training objectives (TNO, 2008).
CHADWICK CARRETO COMPUTER RESEARCH CENTER

At the Chadwick Carreto Computer Research Center in Mexico City, Mexico, Menchaca, Balladares, Quintero, and Carreto (2005, p. 40) defined "a set of tools, based on software engineering, HCI techniques and Java technologies, to support the software development process of 3D Web based collaborative virtual [environments] (CVE) populated by non autonomous interactive entities." The thrust of this work is to define "a methodology supported by design, analysis and implementation tools that assist [in] the development of Web-based CVE" (Menchaca et al., p. 40). This research emphasizes collaboration and interaction of the VE entities. The tools defined include a model of "social groups, a graph-based, high level notation to specify the interactions among the entities, and a Java-based software framework that gives support to the model and the interaction graph in order to facilitate the implementation of the CVE" (Menchaca et al., 2005, p. 40). This research has the potential to support a more flexible, tailor-made training VE based on the trainees and their relationships.
CENTER FOR ADVANCED STUDIES, RESEARCH, AND DEVELOPMENT IN SARDINIA

The Center for Advanced Studies, Research, and Development in Sardinia (CRS4) is an applied research center "developing advanced simulation techniques and applying them, by means of High Performance Computing, to the solution of large scale computational problems" (CRS4, 2007). This research has significance for complex training in virtual environments where large numbers of trainees interact (that is, network centric warfare scenarios) and includes work on massive-model rendering techniques, in which various output-sensitive rendering algorithms are used to overcome the challenge of rendering very large 3-D models in real time, as required for interactive training (Dietrich, Gobbetti, & Yoon, 2007).

VE RESEARCH AT THE VIRTUAL REALITY AND VISUALIZATION RESEARCH CENTER

The Virtual Reality and Visualization (VRVis) Research Center in Vienna, Austria, has ongoing research in six key areas: real time rendering, virtual habitats, scientific visualization, medical visualization, virtual reality, and visual interactive analysis. Three of these research areas have a direct impact on training. The basic research being conducted by VRVis in "virtual reality" includes goals to provide more realistic virtual environments, more interactive 3-D representations, and rapid display of realistic objects within a 3-D environment (Matkovic, Psik, Wagner, & Gracanin, 2005). The training benefit of this work is the ability to present more complex and realistic environments to the trainee. Additional benefits are faster and more cost-efficient content development for virtual environments. Another research goal being pursued by VRVis is "interactive rendering," the real time response of complex 3-D graphics to human actions for such applications as museum displays, one-to-one training, and location based services. A third research goal that is applicable to VE training is "medical visualization," which includes the development of extremely fast renderers that allow large, complex datasets to be viewed in real time for virtual training of surgeons. Endoscopy has recently been applied to pituitary surgery as a minimally invasive procedure for the removal of various kinds of pituitary tumors. Surgeons performing this procedure must be both familiar with the individual patient's anatomy and well trained. A VE for endoscopy training permits very precise training and pre-operative planning using a realistic representation of the patient's anatomy with no risk to the patient (Neubauer et al., 2005).

REALISTIC BROADCASTING RESEARCH CENTER

Kim, Yoon, and Ho (2005) at the Gwangju Institute of Science and Technology (GIST), Gwangju, Korea, have defined a multimodal immersive medium termed "realistic broadcasting," in which 3-D scenes are created by acquiring
immersive media using a depth based camera or a multiview camera. According to Kim et al. (p. 164), "after converting the immersive media into broadcasting contents," the content is sent to users via high speed/high capacity transmission techniques, and the user can experience realistic 3-D display, 3-D sound, and haptic interaction with the broadcast VE.

VE RESEARCH AT THE MARMARA RESEARCH CENTER OF TURKEY

The Marmara Research Center (MRC) in Kocaeli, Turkey, is a multidisciplinary organization that participates in fundamental and applied simulation research, including partnerships in the European Simulation Network (ESN) project. ESN is developing procedures and software to overcome risks and obstacles in preparation for the creation of a Europe-wide military training and exercise simulation environment by performing a virtual environment analysis of the network requirements with the aid of artificial intelligence software (Marmara Research Center, 2007). Additional research vectors at MRC include the development and use of command and control simulations to train decision makers and support the development of new concepts (Hocaoğlu & Fırat, 2003), and the research, development, and evaluation of a common evaluation model and an intelligent evaluation system to simplify and minimize the time (and cost) of training evaluations where training is conducted in a synthetic environment (Öztemel & Öztürk, 2003).

VE RESEARCH WITHIN THE NORTH ATLANTIC TREATY ORGANIZATION

The complex missions entrusted to the military of the North Atlantic Treaty Organization (NATO) countries are driving new performance requirements for military personnel (Alexander et al., 2005) and the need for new and improved (1) approaches to training, (2) methods to analyze changes to tactics, techniques, and procedures, and (3) virtual environments to support the transition and application of training research via experimentation. NATO countries are researching and developing virtual environment technology to support the need for more realistic training and more natural/realistic human-systems interaction (HSI). Collaborative activities, topics, and VE research themes within NATO organizations include the following:

• Human Factors and Medicine (HFM) Panel: The mission of the HFM Panel is to provide the science and technology base for optimizing health, human protection, well-being, and performance of the human in operational environments with consideration of affordability. In 2005, the HFM Panel focused on the study of "Virtual Environments for Intuitive Human-System Interaction" (Alexander et al., 2005). This study and its associated workshop provided an overview of VE research in Canada, Denmark, Germany, Sweden, the Netherlands, the United Kingdom, and the United States. The application of VE technology encompassed vehicle operation training,
individual skills and collective tactical training, and command and control training for maritime, ground combat, and aviation environments.

• Simulated Mission and Rehearsal Training (SMART) initiative: The mission of SMART is to "coordinate, develop, and implement the potential for interactive virtual military training amongst participant nations' armed forces" (NATO Research and Technology Organization, 2007). The "potential" includes emerging technologies developed by NATO panels and other research organizations.
VE RESEARCH IN THE EUROPEAN UNION

As training missions become more complex and the student's cognitive load increases, it is critical to provide students with tools to manage information within the VE. The management of information in VE training is being enhanced by the architecture and technologies for inspirational learning environments (ATELIER) project. ATELIER is an EU project within the programs for future and emerging technologies and includes participants from Italy, Sweden, Austria, and Finland. ATELIER focuses on learning environments for architecture and interaction design students, and its main purpose is to implement methods for students to manage, using intuitive criteria, the large amounts of data collected in a VE during the learning process (Loregian, Matkovic, & Psik, 2006).
FINDINGS AND CONCLUSIONS

First, in surveying the VE literature, it is evident that significant contributions to the research and development of VE are being made worldwide. Collaborations between research centers are the norm, with journal article by-lines that often include authors from multiple countries. Next, the science involved in creating and interacting with VEs can be, and often is, broadly applied not only to training but to other operational missions as well. The need for very complex VEs is driving innovative thinking, resulting in the application of existing technologies in new ways and the creation of new technology to support large-scale VEs, highly realistic VEs, and more interactive VEs for planning, training/rehearsal, and evaluation. The research vectors consist of HSI design (including navigation, manipulation, and data management), rapid construction of VEs, large-scale simulation architectures, and the integration of real world data from sensors and other sources to form augmented and mixed reality environments.

Significant work remains to evaluate the effectiveness of evolving VE technologies for training applications. This future work will result in new standards, heuristics, and best practices. The integration of new VE technology with commercial products (that is, games) will continue to expand their capabilities and make them more affordable and available to a wider training audience (for example, the Nintendo Wii includes a wireless game controller that can be used as a handheld pointing device and can detect acceleration in three dimensions).
Finally, this chapter covers a very small percentage of the research being conducted worldwide. However, it can still serve as a guide to lead readers to pockets of expertise in VE, innovative methods and applications of VE, the identification of the strengths and limits of current VE technology, and the VE technology gaps that define future research and development programs.

REFERENCES

Alexander, T., Goldberg, S., Magee, L., Borgvall, J., Rasmussen, L., Lif, P., Gorzerino, P., Delleman, N., McIntyre, H., Smith, E., & Cohn, J. (2005). Virtual environments for intuitive human-system interaction: National research activities in augmented, mixed and virtual environments (RTO Tech. Rep. No. RTO-TR-HFM-121-Part-I). Neuilly-sur-Seine Cedex, France: NATO Research and Technology Organization.

Capin, T. K., Pandzic, I. S., Noser, H., Magnenat-Thalmann, N., & Thalmann, D. (1997). Virtual human representation and communication in VLNET networked virtual environments. IEEE Computer Graphics and Applications, Special Issue on Multimedia Highways, 17(2), 42–53.

Center for Advanced Studies, Research, and Development in Sardinia (CRS4). (2007). Research and technological development activities. Retrieved August 29, 2007, from http://www.crs4.it/

Clingendael Center for Strategic Studies. (2006). Where humans count: Seventh of a series of nine essays on the future of the Air Force (p. 19).

Dietrich, A., Gobbetti, E., & Yoon, S. E. (2007). Massive-model rendering techniques: A tutorial. IEEE Computer Graphics and Applications, 27(6), 20–34.

Hocaoğlu, M. F., & Fırat, C. (2003, October). Exploiting virtual C4ISR simulation in training decision makers and developing new concepts: TUBITAK's experience. Paper presented at the C3I and Modeling and Simulation (M&S) Interoperability, NATO RTO Modeling and Simulation Conference, Antalya, Turkey.

Joslin, C., Di Giacomo, T., & Magnenat-Thalmann, N. (2004). Collaborative virtual environments, from birth to standardization. IEEE Communications Magazine, Special Issue on Networked Virtual Environments, 42(4), 65–74.

Kallmann, M., Lemoine, P., Thalmann, D., Cordier, F., Magnenat-Thalmann, N., Ruspa, C., & Quattrocolo, S. (2003, July). Immersive vehicle simulators for prototyping, training and ergonomics. Paper presented at Computer Graphics International CGI-03, Tokyo, Japan.

Kim, S. Y., Yoon, S. U., & Ho, Y. S. (2005). Realistic broadcasting using multi-modal immersive media. Gwangju, Korea: Gwangju Institute of Science and Technology (GIST).

Loregian, M., Matkovic, K., & Psik, T. (2006). Seamless browsing of visual contents in shared learning environments. Proceedings of the Fourth IEEE International Conference on Pervasive Computing and Communications Workshops—PERCOMW'06 (pp. 235–239). Washington, DC: IEEE Computer Society.

Manganas, A., Tsiknakis, M., Leisch, E., Ponder, M., Molet, T., Herbelin, B., Magnenat-Thalmann, N., & Thalmann, D. (2005). JUST in time health emergency interventions: An innovative approach to training the citizen for emergency situations using virtual reality techniques and advanced IT tools (The VR Tool). Journal on Information Technology in Healthcare, 2, 399–412.
Marmara Research Center (MRC). (2007). European simulation network. Retrieved December 9, 2007, from http://www.mam.gov.tr/eng/

Matkovic, K., Psik, T., Wagner, I., & Gracanin, D. (2005). Dynamic texturing of real objects in an augmented reality system. Proceedings of IEEE Virtual Reality Conference (VR 2005), 329, 257–260.

Menchaca, R., Balladares, L., Quintero, R., & Carreto, C. (2005). Software engineering, HCI techniques and Java technologies joined to develop web-based 3D-collaborative virtual environments. Proceedings of the 2005 Latin American Conference on Human-Computer Interaction (pp. 40–51). New York: Association for Computing Machinery.

NATO Research and Technology Organization. (2007). Simulated Mission and Rehearsal Training (SMART). Retrieved August 29, 2007, from http://www.rta.nato.int/panel.asp?panel=SMART

Neubauer, A., Wolfsberger, S., Forster, M., Mroz, L., Wegenkittl, R., & Buhler, K. (2005). Advanced virtual endoscopic pituitary surgery. IEEE Transactions on Visualization and Computer Graphics, 11(5), 497–507.

Öztemel, E., & Öztürk, V. (2003, April). Intelligent evaluation definition of training systems in synthetic environments. Paper presented at the International Training and Education Conference (ITEC 2003), London, United Kingdom.

Ryerson University. (2007). Virtual environment laboratory. Retrieved December 27, 2007, from http://www.ryerson.ca/civil/research/laboratories/

TNO. (2008). Desdemona: The next generation in movement simulation. Retrieved April 17, 2008, from http://www.tno.nl/downloads/veilig_training_desdemona_S080014_EN.pdf

Vacchetti, L., Lepetit, V., Papagiannakis, G., Ponder, M., Fua, P., Thalmann, D., & Magnenat-Thalmann, N. (2004). Stable real-time AR framework for training and planning in industrial environments. In S. K. Ong & A. Y. C. Nee (Eds.), Virtual and augmented reality applications in manufacturing. London: Springer-Verlag.

Virtual Reality Lab (VRLab) [website]. (2007). Retrieved December 7, 2007, from http://ligwww.epfl.ch/About/about_index.html
SECTION 2
TRAINING EFFECTIVENESS AND EVALUATION

SECTION PERSPECTIVE

Eric Muth and Fred Switzer

Any training effectiveness evaluation (TEE) study that does not consider the training context would be useless. The effectiveness of a training system is always a joint function of the training system itself and the tasks, goals, and operational environment of the activities being trained. Training, and determining the effectiveness of that training, is an interdisciplinary problem that crosses numerous fields, including fields focused on human behavior, such as psychology and education; fields that focus on human-system interactions, such as human factors and industrial engineering; and fields that focus on system development, including computer engineering and computer science.

While some of the information presented below is covered in more depth in other sections of this book, we felt it necessary to review some basic principles of training to give the reader some broad context in which to place TEE. In addition, we discuss some of the training outcomes for which virtual environments (VEs) may be particularly suited and some of the outcomes for which VE may be inappropriate or ineffective. In any case, evaluating training effectiveness (and its economic dimension, the utility of training) must always occur in context. Finally, the evaluation of training effectiveness should begin before system development, or else system development is not justifiable; that is, if current training is effective, why develop a new system? VE developers cannot simply develop a system and then wait until the system is operational to begin the evaluation process and justify the need for and use of the system. This technology push is costly, not only in development dollars but potentially in training dollars misappropriated to a good salesperson. Therefore, we discuss some context issues that should drive the early informal and formal evaluation that should occur long before "full-up" operational evaluation.
A PRIMER ON HOW PEOPLE LEARN

There are a number of key principles in learning and skill acquisition—Levy (2006) referred to these principles as the "learning context" in training. These principles include active versus passive learning, massed practice (all the training is received at once) versus distributed practice (training is given over time), whole versus part learning, positive versus negative transfer of training, feedback, and practice (Schultz & Schultz, 2002). These factors must be considered in designing a training program, especially when training novel and/or complex tasks.

Active learning (also known as active practice) refers to the doing that may be necessary for trainees to learn. While observational learning can be helpful, active learning is often necessary for true training effectiveness. For example, a trainee cannot just be told or shown how to go about clearing a building (see Hoover and Muth, Chapter 19, and Knerr and Goldberg, Chapter 23). He or she must actively experience it to quickly acquire the understanding and skills necessary to be effective. VE training systems can lend themselves to either approach, but VE offers unique opportunities for the trainee to actively participate in the learning process. For example, Darken and Banker (1998) found that VE training outperformed real world training in some conditions in an orienteering task. The design process should include at least some informal evaluation of the degree to which the training system allows the trainee to actively practice.

Massed and distributed practice refer to two different approaches. In massed practice, trainees are given only one or a few long practice sessions. In distributed practice, trainees are given many short practice sessions. Military logistics usually dictate that a massed practice approach be taken. However, it is important to note that distributed practice typically results in more effective learning. Shoot houses, such as those discussed in Hoover and Muth (Chapter 19), afford massed practice. Such VEs as those described in Grant and Galanis (Chapter 21), Bachelder, Brickman, and Guibert (Chapter 22), and Knerr and Goldberg (Chapter 23) afford distributed practice, especially when they can be run without supervision and contain appropriate feedback and after action review functions. Also, the spacing of practice sessions may have important motivational effects (discussed in more detail below).

Whole and part learning refer to the portion of a task being learned. Whole learning refers to learning the entire task as a whole, for example, flying a plane. Part learning refers to breaking a task into its component parts, for example, emergency procedures, stick and rudder skills, communications, and so forth. The effectiveness of whole or part learning depends on the complexity of the task and the ability of the trainees. Typically, when a novice is confronted with a complex task that can be broken down into subskills, part learning will be more effective. The decision about how to subdivide the content of a training program has to be a joint function of an effective and accurate task analysis and an understanding of learning principles. Dorsey, Russell, and White (Chapter 20) and Goldiez and Liarokapis (Chapter 26) touch on aspects of this process in identifying common ("identical") elements in the training task and the real task and in integrating virtual training and live training.
One of the biggest contributions VEs may make to the training process is the increased opportunity to practice. Obviously practice is a critical component of high level performance on complex tasks. But the practice must include feedback to avoid reinforcing behaviors that merely appear effective, that is, practicing bad habits instead of good ones. Also, practicing beyond the point of mastering the target skill (practicing to automaticity, or "overlearning") has value, especially for the kinds of tasks that are likely to be taught in VEs.

Feedback (also called "knowledge of results" or "KR") refers to giving the trainees information on how they are doing in the training and how they are progressing. If trainees are not given feedback, they may continue to use inappropriate strategies and perform ineffective behaviors because they have no information on whether those strategies and behaviors are working. The feedback should be immediate and frequent, if possible, and both positive and negative feedback are effective when delivered correctly. Hoover and Muth (Chapter 19) discuss instrumentation for training and how data can be derived from training systems to provide training feedback. Lampton, Martin, Meliza, and Goldberg (Volume 2, Section 2, Chapter 14) discuss the use of after action review systems in VEs.

Transfer of training is, in many ways, the "bottom line" of training. Transfer of training refers to the application of knowledge and skills acquired during training to the actual work environment (Baldwin & Ford, 1988). The training program needs to give careful attention to the gap between the training and work environments to ensure that skills mastered during training transfer to performance in the work environment. For example, the relevance to actual flight of any simulations, demonstrations, or lectures on such a topic as spatial disorientation (perceiving one's position in three-dimensional space to be different from one's true position) must be made obvious to trainees. Note that transfer of training can be both positive and negative. Negative transfer is acquiring bad habits, misinformation, and misperceptions during training and applying those negative behaviors in the actual work environment. Negative transfer can be reduced by ensuring that critical elements of the training environment match the critical elements of the operational environment. This is Edward L. Thorndike's famous "identical elements theory." Sullivan, Darken, and Becker (Chapter 27) and Becker, Burke, Sciarini, Milham, Carroll, Schaffer, and Wilbert (Chapter 28) discuss transfer of training; and Dorsey, Russell, and White (Chapter 20) discuss extensions of identical elements theory for use in VEs.

A PRIMER ON TRAINING EVALUATION

As in good training design, accurate training evaluation depends on having done the appropriate task, job, needs, and organizational analyses to determine the knowledge, skills, and abilities that the training is intended to deliver. Only then can the training outcome measures be chosen rationally and effectively. Those outcome measures have traditionally (Kirkpatrick, 1976) been divided into four categories: reaction measures, learning measures, behavioral measures, and results measures.
Reaction measures, that is, measures of the trainees' attitudes (their positive or negative evaluation of the training program), have long been used in training evaluation. However, it has also long been known that reaction measures have limited usefulness in evaluating training effectiveness (including the finding that there is often a positive bias in reaction measures). Alliger and Janak (1989) have, however, proposed looking at trainees' "utility" reactions, that is, their perceptions of the amount of positive transfer of training they expect to occur. This type of measure may prove more useful than the traditional affective reactions measure. Even so, affective reactions to training should not be ignored. Participants' post-training attitudes (and especially belief and attitude changes caused by training—see below) may have effects on such motivation-critical variables as self-efficacy/confidence, motivation to engage in further training, and, in the case of teams, team efficacy, team communication, backup behavior, and so forth.

Learning measures, typically implemented by some form of testing at the conclusion of the training, have proven useful to a degree. Kraiger, Ford, and Salas (1993) proposed three categories of learning criteria:

1. Cognitive outcomes (knowledge and memory; this would likely include both semantic and procedural memory);

2. Skill-based outcomes—skills, procedures, and performance; and

3. Affective outcomes—beliefs and attitude changes as a result of training.
In addition, other taxonomies of learning criteria have been proposed, including Bloom's original taxonomy of educational objectives (Bloom, Englehart, Furst, Hill, & Krathwohl, 1956) and its later revision (Anderson et al., 2001). The revised version of Bloom's taxonomy categorizes learning criteria on two dimensions: knowledge (factual, conceptual, procedural, and metacognitive) and cognitive process (remembering, understanding, applying, analyzing, evaluating, and creating).

Behavioral measures are those that quantify actual changes in behavior in the operational environment as a result of training. As Levy (2006, p. 244) noted, "An evaluation of a training intervention that didn't include measures of behavioral criteria would be seriously flawed." In some senses, behavioral measures are the "gold standard" of training evaluation. That said, there are typically serious design and measurement issues that often restrict or prevent the acquisition of accurate behavioral measures (see the discussion below).

Finally, the fourth type of outcome measure is "results." Results outcomes are the value of the training program to the organization and its goals. In other words, was the training worth the time, effort, expense, and so forth expended by the organization? While in many ways results are (at least theoretically) the ultimate measure of the effectiveness of a training program, two factors often argue against the use of results measures (or, worse, generate misleading data). One factor is criterion deficiency. This refers to situations in which outcome measures do not tap into the full range of target behaviors and results. In other words, important aspects of performance are being left out of the
measures, such that they form an incomplete picture of actual training effectiveness. The other factor is criterion contamination. This refers to situations in which the outcome measure taps into irrelevant behaviors and results and therefore generates misleading data.

The other major issue in evaluating training effectiveness is the design of the evaluation process. This is the issue of research design—how should the evaluation study be configured and analyzed to give accurate and unconfounded answers to the question "did this training work?" Research design is such a broad topic that we cannot even adequately introduce it in this short primer. But good research design is critical to accurate evaluation of training effectiveness and positive transfer.
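As a concrete illustration of why design matters, one common approach is a pretest-posttest design with a comparison group, in which the training effect is estimated as the difference in gains between the trained and untrained groups. The sketch below is a textbook-style illustration with fabricated example scores, not data from any study discussed in this section.

```python
from statistics import mean

def training_effect(trained_pre, trained_post, control_pre, control_post):
    """Difference-in-gains estimate: how much more the trained group improved
    than the comparison group improved over the same period."""
    trained_gain = mean(trained_post) - mean(trained_pre)
    control_gain = mean(control_post) - mean(control_pre)
    return trained_gain - control_gain

# Hypothetical scores on a 0-100 task performance measure.
effect = training_effect(
    trained_pre=[52, 48, 55, 50], trained_post=[71, 66, 74, 69],
    control_pre=[51, 49, 54, 50], control_post=[58, 55, 60, 56],
)
print(f"estimated training effect: {effect:.1f} points")  # about 12.5 points
```

Subtracting the comparison group's gain removes improvement attributable to retesting, maturation, and other confounds, which is precisely what an unconfounded answer to "did this training work?" requires.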
BUILDING EFFECTIVE VIRTUAL ENVIRONMENTS: SOME ISSUES (AND UNANSWERED QUESTIONS)

VEs are famous (or notorious) for allowing more extreme situations than are typically presented in most training settings. VEs allow the trainee to experience a variety of variants of the worst case scenario and its close relatives. VEs allow a degree of control and precision over the training process seldom seen in previous training methods, primarily because VEs allow control over a greater number of variables in the training process than has been possible previously. But this also creates some new problems. Control over variables means that choices must be made about the levels of those variables to implement in the VE. The instructor thus faces a new challenge—the requirement to specify the ranges and values (even if those values are fixed or made the default) for a potentially huge number of variables. Very often the current research literature offers little or no guidance about the proper choices. And there is a corollary to this issue—what variables can safely be ignored, minimized, or eliminated entirely? Further, as Salas, Wilson, Burke, and Bowers (2002, p. 21) point out, it is a myth that "everyone who has ever learned anything or has gone to training is a training expert and therefore can design it (the training)." Therefore, defaulting to subject matter experts may not be the best solution either. Salas and Burke (2002) offered a set of useful guidelines for the use of simulations in training. There is one hidden benefit to the development and use of VEs: we will be forced to simultaneously look at a broad range of variables in the training process, specify the relative importance of these variables to training outcomes, and specify the useful ranges and values of these variables to guide the trainer.

The development of VEs for training is spotlighting other issues that either have not been present before or have had minimal impact on the effectiveness of training: for example, the physical interaction of the training equipment (that is, the actual training apparatus itself) with the trainee. Does the weight or weight distribution of a head-mounted display adversely affect the training effectiveness of HMDs when the training goes on for long periods? Is the limited field of view (FOV) found in VEs, such as driving simulators,
reducing training effectiveness (and how)? For that matter, can limited FOV be a benefit by reducing distractions that interfere with early skill or knowledge acquisition? The issue of simulator sickness (nausea brought on by the VE apparatus; see Drexler, Kennedy, and Malone, Volume 2, Section 1, Chapter 11) has spawned an entire research literature of its own.

Of course there is the classic issue of the optimal level of fidelity. Note that we are characterizing the fidelity issue as seeking the optimal, not the highest, level of fidelity. As Salas et al. (2002, p. 23) point out, it is another myth that "the higher the fidelity of the simulation, the better one learns." There is an implicit assumption in much of the VE training literature (one that stems very naturally from the larger simulation literature in general and identical elements theory in particular) that higher fidelity is always better in simulations. Is this necessarily true? In classic training methods, simplified, "lower fidelity" situations are often used effectively in the early stages of learning. Think of the elementary school reading primer ("See Dick run.") as the classic example. Is there an analog in VE training? When is higher fidelity a liability rather than an asset? Again, the classic issue of psychological fidelity versus physical fidelity must be considered if the goal is developing an effective VE. One point should be made here. Unfortunately, VE training system designers and developers will be faced with design decisions that the general training literature has not yet answered. Another hidden benefit of VEs may be to force training researchers (such as ourselves) to answer questions that have not been adequately addressed previously. For example, task analysis has not been refined to the point where we can easily and accurately answer such questions as "what are the critical elements of a process-control task mental model such that the process control team has an adequate shared mental model?"

Another area in which VE may push training research is post-exercise learning (for example, debriefings, after action reviews, follow-up training sessions, remedial training sessions, and so forth; see also our comments on feedback above). An often-overlooked capability of VE systems is to make a complete recording of an entire training session. While this capability may be approached in a real world system (for example, by using multiple video cameras), VEs can literally record every variable presented to the trainees. This capability offers enormous possibilities for post-exercise reviews. The digital nature of the data can allow trainers to literally put trainees in someone else's (virtual) shoes. A trainee could see precisely how his or her actions affected a teammate's behavior by replaying the scenario from the teammate's point of view. For that matter, a combat simulator could put the trainee in the enemy's shoes (the "bad-guy cam").

This same high density recording capability also provides a wealth of new opportunities and challenges in training performance measurement (and in training system evaluation). The opportunities come largely in two forms. First, more fine-grained, even microscopic, analysis of training performance is possible by examining very short time periods for very specific variables. At the other end of the level-of-analysis scale, sophisticated multiple weighted composite performance
variables can be created that capture the multiple-objective nature of most tasks (rather than oversimplifying task outcomes for convenience or to accommodate cruder performance measures). Note that this latter possibility demands a clear understanding of all of the task objectives, the performance measures associated with each, and the relative importance of each objective to overall task performance—a tall (but desirable) order.

TRAINING MOTIVATION AND VIRTUAL ENVIRONMENTS

In many theories of training motivation and skill acquisition, the pacing of the training is hypothesized to have a significant effect on trainees' motivation. One of the strengths of VE training is often the ability of VE systems to dynamically alter the timing and difficulty of the tasks being trained (as opposed to more traditional training methods, which are often linear and operate at a fixed pace). This ability to dynamically change task timing and duration, and even introduce unplanned pauses in the training scenario, may give VE training a unique advantage. This may be especially valuable in the (common) training situation in which the trainees exhibit various initial levels of ability. For example, Kanfer and Ackerman (1990, as cited in Kanfer, 1996) found that, at the same stage of training, lower ability trainees were more likely to suffer from low self-efficacy and negative affect during skill acquisition, while higher ability trainees were subject to boredom because they had already achieved a level of proficiency with the task. This implies that task pacing and timing could be varied in order to keep the slower-acquisition trainees' self-efficacy up and the faster-acquisition trainees' boredom down. It should be noted that Kanfer and Ackerman (along with a number of other self-regulation and skill acquisition researchers; Horvath, 1999) used a VE (a simulated air traffic control task) to test their hypotheses. Kanfer, Ackerman, Murtha, Dugdale, and Nelson (1994) found that setting goals during training impaired performance during massed practice. However, they found that this effect could be ameliorated by providing short breaks during training, presumably because the breaks allowed the trainees to self-regulate (process and assess the goals relative to their immediately preceding training performance). VEs are not only typically amenable to building in this kind of pause, but the timing of the pauses and breaks could potentially be tailored to the individual trainee's progress (or lack of it).

TRAINING CRITICAL TEAM VARIABLES IN VIRTUAL ENVIRONMENTS

Salas, Sims, and Burke (2005) have identified eight variables critical to team functioning (five primary variables and three "coordinating mechanisms"): team leadership, mutual performance monitoring, backup behavior, adaptability, and team orientation as the primary variables; and shared mental models, closed-loop communication, and mutual trust as the coordinating mechanisms. If VEs are to be used effectively in team training, we must understand
their capabilities and limitations relative to these critical variables. For example, VEs offer a wealth of opportunities to observe, measure, and test mutual performance monitoring. VEs can often monitor directly (or restrict for testing) the information channels by which team members monitor each other. Potentially, measures of both the quantity and the quality of mutual performance monitoring can be implemented in VE. Likewise, many forms of backup behavior, another critical team variable, can be observed, measured, and manipulated during team training (and observed and evaluated during debriefing replays and AARs).

At least two of Salas et al.'s (2005) coordinating mechanisms, shared mental models and closed-loop communication, have characteristics for which VEs likely offer unique and useful possibilities. VEs allow mental model assessments (for example, Endsley's SAGAT procedure, 1988), which can be used to probe the trainees' progress and identify problem areas, as well as allowing testing of shared (and individual) mental models by dynamically varying the training situation, introducing system failures and emergencies, simulating communication problems (forcing the trainees to rely on their shared models for tacit coordination), and so forth. As noted above, VEs also have the capability for innovative and unique training in team (closed-loop) communication. For example, VEs may be able to vary the quality and quantity of communication among team members. This could help develop trainees' skills in overcoming communication problems, make them aware of the necessity for communication, and train restraint in unnecessary communications.

Although speculative, it is not too much of a stretch to imagine VE training systems that can help train more esoteric team variables, such as mutual trust and team orientation. Mutual trust is typically developed over time in experienced teams, but VEs may allow new teams in training to experience a wide range of situations and challenges. Properly managed, that sequence of VE experiences might help new teams reach a higher level of mutual trust (and even develop higher levels of team orientation and team efficacy) than would be possible with other training methods. VEs certainly offer a range of possibilities for training adaptability. If adaptability is a mix of experience and creativity (and creativity is "10 percent inspiration and 90 percent perspiration"), then VEs should allow the kinds of practice and the range of experience necessary to improve teams' adaptability. Certainly, too, VEs allow the testing of adaptability by allowing trainers to simulate failures and situations that put a premium on team adaptability and prohibit routine solutions. Of course, VEs offer the opportunity for virtual teammates. Since virtual teammates are under the control of the trainer, many aspects of team communication and coordination, mutual performance monitoring, backup behavior, and even team leadership can potentially be trained and tested.

SECTION OVERVIEW

The chapters in this section discuss how to assess and evaluate the effectiveness of training, with a focus on training systems. Effectiveness is examined from
a variety of perspectives, including transfer of training analyses and cost-benefit trade-offs of various training approaches. A variety of training systems is discussed, including low tech and high tech VEs.

Technology innovations in the classroom are not new. Chalkboards, then overheads, then whiteboards, and now computers have, over time, changed the way traditional lectures are delivered. As computational power becomes less expensive and more pervasive, the types of technologies available for training will only increase in complexity and availability. Nonetheless, the same question persists: "Does it matter?" Those who develop the technology, and many consumers of it, either assume that it does or are forced to abandon previous training tools as older tools are replaced by newer, higher tech ones. Those who study training, as do the authors of these chapters, challenge the "develop it and it will train" philosophy. The main goal of this section is not to debate whether technology in training is good or bad; it is simply to give those interested in using technology in training an awareness and understanding that there are techniques to answer the question, "Does it matter?"

This section is organized into three subsections: factors for TEE, relevance of fidelity in TEE, and applications of TEE. The goals of the first subsection are to explain when, why, where, and how TEE should be completed. The goal of the second subsection is to examine the relationship between fidelity in training systems and effective training; that is, low tech versus high tech: does it matter, and if so, when? The goal of the third subsection is to present some illustrative examples of TEE in action to serve as templates for future studies of TEE.

REFERENCES

Alliger, G. M., & Janak, E. A. (1989). Kirkpatrick's levels of training criteria: Thirty years later. Personnel Psychology, 42, 331–342.

Anderson, L. W., Krathwohl, D. R., Airasian, P. W., Cruikshank, K. A., Mayer, R. E., Pintrich, P. R., Raths, J., & Wittrock, M. C. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. New York: Longman.

Baldwin, T. T., & Ford, J. K. (1988). Transfer of training: A review and directions for future research. Personnel Psychology, 41, 63–105.

Bloom, B. S., Englehart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational objectives: The classification of educational goals. Handbook 1: Cognitive domain. New York: McKay.

Darken, R. P., & Banker, W. P. (1998). Navigating in natural environments: A virtual environment training transfer study. Proceedings of the IEEE 1998 Virtual Reality Annual International Symposium (pp. 12–19). Atlanta, GA.

Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). Proceedings of the IEEE 1988 National Aerospace and Electronics Conference (Vol. 3, pp. 789–795). Dayton, OH.

Horvath, M. (1999). Self-regulation theories: A brief history, and analysis, and applications for the workplace. Unpublished manuscript.
Kanfer, R. (1996). Self-regulating and other non-ability determinants of skill acquisition. In P. M. Gollwitzer & J. A. Bargh (Eds.), The psychology of action: Linking cognition and motivation to behavior (pp. 404–423). New York: The Guilford Press.

Kanfer, R., Ackerman, P. L., Murtha, T. C., Dugdale, B., & Nelson, L. (1994). Goal setting, conditions of practice, and task performance: A resource allocation perspective. Journal of Applied Psychology, 79, 826–835.

Kirkpatrick, D. L. (1976). Evaluation of training. In R. L. Craig (Ed.), Training and development handbook. New York: McGraw-Hill.

Kraiger, K., Ford, K., & Salas, E. (1993). Application of cognitive, skill-based, and affective theories of learning outcomes to new methods of training evaluation. Journal of Applied Psychology, 78(2), 311–328.

Levy, P. E. (2006). Industrial/organizational psychology: Understanding the workplace. Boston: Houghton Mifflin.

Salas, E., & Burke, C. S. (2002). Simulation for training is effective when . . . Quality and Safety in Health Care, 11, 119–120.

Salas, E., Sims, D. E., & Burke, C. S. (2005). Is there a "Big Five" in teamwork? Small Group Research, 36(5), 555–599.

Salas, E., Wilson, K. A., Burke, C. S., & Bowers, C. A. (2002). Myths about crew resource management training. Ergonomics in Design: The Quarterly of Human Factors Applications, 10(4), 20–24.

Schultz, D., & Schultz, S. E. (2002). Psychology and work today (9th ed., pp. 167–170). Upper Saddle River, NJ: Prentice Hall.
Part V: Factors for Training Effectiveness and Evaluation
Chapter 17
TRAINING EFFECTIVENESS EVALUATION: FROM THEORY TO PRACTICE

Joseph Cohn, Kay Stanney, Laura Milham, Meredith Bell Carroll, David Jones, Joseph Sullivan, and Rudolph Darken

Training effectiveness evaluation (TEE) is a method of assessing the degree to which a system facilitates training on targeted objectives. Organizations can utilize TEEs to better understand the overall value of an existing or newly implemented training program by illustrating the strengths and weaknesses of the program that should be maintained, further developed, or improved upon to benefit the organization's performance as a whole. Although the inherent value of a TEE is undeniable, instantiating this practice has proven challenging. Three primary difficulties must be addressed when planning and performing a TEE.

The first is the difficulty of developing meaningful and collectable measures of performance to indicate whether targeted knowledge, skills, and attitudes have been acquired. Traditional TEEs have primarily been designed around Kirkpatrick's (1959) four level model of training evaluation: (1) reactions, (2) learning, (3) behavior, and (4) results. While Kirkpatrick's approach does provide a structural framework and an approach for its implementation across these four levels, there is a cost/time trade-off associated with developing each level of metric. The more advanced levels, (3) and (4), often require larger data collection efforts over greater time horizons; consequently, many TEEs have fallen short of evaluating beyond trainee reactions and declarative learning, levels (1) and (2), failing to capture training transfer behaviors in the operational environment and the overall organizational impact of such training. This traditional approach to TEE is difficult to justify, since it will not likely provide any diagnostic value or insight into why a particular training intervention was more or less effective. Additionally, TEE methods that rely on performance evaluation generally assume continuous learning curves that reflect steady performance improvements over time. Clark (2006) provides examples, such as language acquisition skills, where improvements in underlying cognitive ability (transitioning from rule-based to
schema based performance) initially result in poorer performance. These underlying cognitive effects are of much greater interest, but they are much harder to measure and apply.

The second difficulty associated with operational TEEs is the need to have an operational system or prototype to facilitate empirical evaluation. Traditionally, TEEs are performed after a training system is fully developed and instantiated in the training curriculum. By that point the new system may already have replaced a legacy trainer, making a direct comparison of the two systems’ value impossible. Although post-instantiation TEEs provide data regarding the utility of a training system or approach, many times these results cannot be leveraged to modify the system in order to increase the utility of the overall training program, due to financial and time constraints. Thus, there is a need to take a more proactive approach by addressing how to build training effectiveness into the system/program from the conceptual design.

The third difficulty is the multiple logistical constraints that often surround evaluation of a system in the operational field. For instance, in evaluating military training there are several factors inherent to the domain (for example, lack of experimental control in operational environments and lack of access to fielded systems) that threaten the validity of inferences made based upon training results (Boldovici, Bessemer, & Bolton, 2002). Other limiting logistical factors include limited numbers of participants and rigid scheduling, both of which require that evaluations be designed to be performed during trainees’ traditional training courses. Sullivan, Darken, and Becker (Volume 3, Section 2, Chapter 27) provide an example of the problems and benefits of operational field tests of novel virtual environment (VE) training systems.
LIFECYCLE APPROACH TO TEE

This chapter aims to bridge the gap between theory and practice by detailing a feasible approach to TEE that allows the difficulties detailed above to be addressed during system development and evaluation. The solution lies in a lifecycle approach to TEE, designed to follow the development lifecycle in order to provide high level input as the training system is initially being developed and more precise guidance as training system releases are made. The first advantage of this approach is that, driven by a detailed task analysis, training needs can be identified up front, not only allowing training effectiveness to be built in during the development process but also facilitating the development of metrics to assess whether the training is meeting intended training goals. The second and most evident advantage is that the input provided by training needs analysis can be used to shape future development of the training system, as well as theoretical evaluation of the system’s ability to support targeted training objectives, prior to development. This spiral approach also provides an opportunity for multiple system evaluations, which allows training system designers and evaluators the opportunity to assess whether or not any instantiated changes are effectively increasing the utility of the training system under development and the training program as a whole. Third, the approach leverages quasi-experimental methods to allow evaluations of the system to be made with some level of
certainty, grounded in research methods theory. This spiral TEE approach (compare Gagne, Wager, Golas, & Keller, 2005) thus incorporates an iterative cycle that encompasses training needs analysis, training systems user-centered design, and training effectiveness and training transfer evaluations to help mold a system that meets targeted training objectives (see Figure 17.1). This method can be used throughout the iterative design cycle: at conceptual design, with prototypes, and with fully functional systems pre- and post-fielding.

Figure 17.1. Lifecycle Approach to Training System Design and Evaluation

Training Needs Analysis

To ensure a training system is targeting the intended objectives, it is first necessary to identify training goals early in the training system design cycle. Training goals provide a foundation for all future stages of the TEE; if a system is not focusing training on appropriate goals and objectives, it will not have the desired effect on trainee performance. Hence, the first step in training evaluation should be the same as that of training development: determining who and what should be trained. For training courses that do not have defined training goals, these must be derived by performing a training needs analysis (TNA) (see Milham, Carroll, Stanney, and Becker, Volume 1, Section 2, Chapter 9). A TNA is the process of collecting data to determine what training needs exist to allow development of training that facilitates accomplishment of an organization’s goals (Brown, 2002). TNAs are accomplished through doctrine review, instructor interviews, and direct observation. As such, in order to complete a thorough TNA it is necessary to have (1) access to documentation that adequately explicates the targeted tasks, (2) access to and dedicated time with instructors or subject matter experts (SMEs) with detailed knowledge of the tasks to be trained, and (3) access to training exercises or operational performance in order to observe task performance (see Milham, Carroll, Stanney, and Becker, Volume 1, Section 2, Chapter 9). These sources of information provide the background knowledge necessary to form a solid base of domain expertise to direct training objectives and training content design. Training objectives can be defined at a very high level (for example, mission level), which may lack the granularity needed to derive meaningful objectives to target, or at a more specific level (for example, task level), facilitating the identification of precise training objectives. In cases where the training goals are defined at a high level, it is crucial to eventually drill down into the task to operationalize the high level goals into specific task training objectives.

Task Analysis

To move from high level training goals down to detailed task training objectives, it is necessary to perform a task analysis focused on identifying the specific tasks and subtasks necessary to complete a mission, along with task flow, frequency, timing, operators, and task performance requirements. As a result, training objectives identified during the TNA can be realized in the
training content design, and metrics to gauge task performance can be developed. With respect to the user, it is important that the task analysis identify the target trainee expertise level and the associated knowledge, skills, and attitudes (KSAs) currently possessed by trainees (pre-training), as well as those KSAs that are intended to be acquired through training. In order to identify scenario elements to be incorporated in the training content design, it is necessary to understand the subtasks with respect to how trainees gather information (what cues are required and through what mode—visual, auditory, or haptic) and how they act upon the environment in the real world. This form of task analysis, referred to as a sensory task analysis (STA), enables identification of the multimodal cues and functionalities experienced in real world performance. The outcome from the task analysis is deep, contextually rich data that can be used to inform and direct user-centered design (UCD) of the training system, thereby leading to design efficacy (see Soloway, Guzdial, & Hay, 1994, for an early example of learner-centered design).

Training Systems Design

Based on the results of the TNA and task analysis, a mixed or “blended” fidelity training solution can be designed, which identifies how best to meet learning objectives that will close the target performance gap. The solution should integrate the optimal mix of classroom instruction, training technologies, and live events throughout a given course of training to ensure a desired level of readiness is achieved (Carter & Trollip, 1980). The composition of the material to be trained, characteristics of the target training community, cost of delivery, and operational requirements all contribute to the determination of the optimal blend of training delivery methods (Cohn et al., 2007). Thus, in addition to identifying the training goals and objectives, these must be translated into system requirements, including the following: cue and interface fidelity, overall training management approach, and underlying metrics. Once a design is agreed to, iterative usability analyses can provide insight into system potential during the build phases, allowing developers time to make high impact and critical modifications if they are identified. From the composition design perspective, training systems design can be supported through UCD input. UCD ensures that a given training solution affords training by identifying the interactive, egocentric, and affective cues that best support training objectives, as well as functionalities that allow trainees to practice key competencies. UCD is driven by results from the task analysis, including the STA, which extends traditional task analysis beyond task breakdown, flow, sequence, and so forth to include identification of the critical multimodal cues (visual, auditory, and haptic) and functionality required to complete a task as defined by the operational context. UCD translates this contextual information into interface requirements by determining how the identified cues should be presented to the trainee to afford training of the task. This is done by coupling task information with training objectives to determine whether physical, psychological, or functional fidelity is required for successful task training. For instance, if
a task requires one to detect aircraft orientation and project its response to changing environmental conditions, then the fidelity requirements demand high resolution visuals and realistic aircraft dynamic modeling, not just a low fidelity representation of an aircraft. On the other hand, given a task that primarily focuses on decision making in response to low granularity cues, low fidelity visuals and a keyboard interface should suffice. In addition to multimodal cue fidelity requirements, a training system must also support operational functionalities and coordination requirements. To support transfer, systems should facilitate performance of the actions and procedures trainees are required to execute in the field. These functionalities are identified through the task analysis, which facilitates systematic identification of the system functionalities/capabilities required for an operator to successfully complete each task (for example, system requirements for buttonology, temporal realism, and so forth). Cue fidelity and functionality requirements are defined to facilitate interface design of an effective training environment that allows practice of targeted tasks.

Practice alone, however, does not necessarily result in effective training, regardless of the quality of the training environment. In addition to the environment, a training management component is necessary to ensure effective training. Training management describes the process of understanding training objectives within a domain, creating training events to allow practice of targeted objectives, measuring and diagnosing performance, and providing feedback to trainees (Oser, Cannon-Bowers, Salas, & Dwyer, 1999). The training management component relies on both contextual data from the TNA and task analysis (for example, training objectives, metrics, scenario events, and scenario manipulation variables) to facilitate targeting training objectives at varying levels of difficulty, as well as theoretical and empirical data from the training science community to ensure effective training strategies (for example, feedback methods) are incorporated.

Usability analyses, which use this information, should be conducted to examine the degree to which the human-system interaction is optimized within the training system. First, a heuristic evaluation consisting of exploratory and scripted interaction with the system can be performed to identify any existing usability issues that could result in user perception and interaction errors. Second, SMEs conduct cognitive walkthroughs of the system employing a “thinking aloud” protocol, from which the cognitive rationale underlying task performance with the system’s interface is identified. Third, user testing is conducted to validate and extend findings from the heuristic evaluation and cognitive walkthroughs, examining the degree to which interface requirements identified in previous steps are instantiated within the system. The results of the usability evaluation are problem/solution tables describing interaction issues with the interface and redesign recommendations to enhance the training system design. The outcome from the UCD stage is (1) a training system environment designed to achieve physical, psychological, and functional fidelity through the specification of contextually derived cue fidelity and interface requirements and (2) a training management component designed to ensure training objectives are effectively targeted and to diagnose trainees’ performance and provide
feedback or mitigations that enhance training outcomes, which should in turn lead to training efficacy.

Training Systems Evaluation

Training system evaluations are performed through iterative theoretical and empirical evaluation. This provides a framework for identifying problems in the simulation design and for evaluating how well the training system affords performance enhancement. The most well-known model for evaluating the effectiveness of training systems is the aforementioned Kirkpatrick (1959, 1998) four-tier model. Level 1 (reaction) assesses a trainee’s qualitative response (that is, how relevant the trainee felt the learning experience was) to the training system via interviews, questionnaires, informal comments, and focus group sessions. Level 2 (learning) assesses the trainee’s increase in KSAs based on interaction with the training system, typically via instructor evaluations or, more formally, via pre- and post-test scores and team assessments. Level 3 (behavior) assesses the applied learning (transfer) to the target task via on-the-job observation over time or self-assessment questionnaires. Level 4 (results) assesses the organizational impact of the training (for example, return on investment, increased production, decreased cost, retention, quality ratings, and turnover) via post-training surveys, interviews with trainees and managers, ongoing appraisals, quality inspections, or financial reports. Thus, a well-planned TEE will utilize an integrated suite of methods to assess the degree to which a system facilitates training on targeted objectives, as defined at each level.

In general, each level can be assessed via either process or outcome measures. Process measures examine the manner in which a task is accomplished (Cannon-Bowers & Salas, 1997; Salas, Milham, & Bowers, 2003), whereas outcome measures focus on how well a trainee accomplishes the overall task mission. Taken together, process and outcome measures can be useful diagnostic tools, in that they provide information about what happened and why (Fowlkes, Dwyer, Milham, Burns, & Pierce, 1999). Using such measures, TEEs provide training developers with data to better understand the overall value of an existing or newly implemented training program by illustrating strengths and weaknesses that should be maintained, further developed, or improved upon to ensure that training goals are met. We have developed a three-tier TEE framework through which to gather evaluative data at all four levels of Kirkpatrick’s model: (1) theoretical TEE, (2) trainee performance evaluation, and (3) transfer performance evaluation.

Theoretical TEE: Determining Training System Design Effectiveness

By performing a theoretical TEE, practitioners can answer the questions of whether the training system design itself is effective in affording learning, and whether this can be determined before the system is ever developed. Two “theoretical” TEE methods are used to answer these questions early in the development lifecycle, before trainees are ever brought to the system. Cue fidelity evaluations examine the degree to which the system includes the cue fidelity requirements
identified in the UCD stage as necessary to address goal accomplishment (Herbert & Doverspike, 1990). This analysis identifies gaps between requirements driven by the task analysis and actual training system specifications. Next, required capabilities analyses are conducted to determine if the training system supports the operational functionalities and coordination requirements necessary for a trainee to effectively perform targeted tasks. For example, given the task of using a laser designator to mark a target, some representation of the tool functionality must be present in order to facilitate practice of the skills required to perform this task; the required fidelity depends on whether the goal is to train the cognitive or physical aspects of the task. The capabilities analysis identifies gaps between the functionalities required to perform such tasks and system capabilities. These theoretical evaluations can be complemented by user testing to support Kirkpatrick’s level 1 (reaction) evaluation, which provides data on how relevant the training is to the trainee. Trainees can be asked to provide subjective evaluations via survey or interview after exposure to the training system. The reactions assessed vary based on the training program but may include the following (a notional scoring sketch follows the list):

• The relevance of the training to the trainee’s job,

• The usability of the training system (for example, process measures can include efficiency and intuitiveness, while outcome measures can include effectiveness and satisfaction), and

• The adaptability of the training system (for example, the ability to personalize the learning both in terms of content and assessment).
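To make such level 1 data usable, ratings are typically aggregated by dimension. The sketch below shows one minimal way this might be done; the dimension names, the 1–5 rating scale, and the sample responses are illustrative assumptions, not instruments prescribed by this chapter.

```python
# Minimal sketch: summarizing Level 1 (reaction) survey data by dimension.
# Dimension names, the 1-5 scale, and the sample ratings are hypothetical.
from statistics import mean

# Each trainee's ratings, keyed by reaction dimension.
responses = [
    {"relevance": 4, "usability": 3, "adaptability": 5},
    {"relevance": 5, "usability": 4, "adaptability": 4},
    {"relevance": 3, "usability": 2, "adaptability": 4},
]

def summarize_reactions(responses):
    """Return the mean rating per reaction dimension."""
    dimensions = responses[0].keys()
    return {d: mean(r[d] for r in responses) for d in dimensions}

print(summarize_reactions(responses))
# e.g., {'relevance': 4.0, 'usability': 3.0, 'adaptability': 4.33}
```

In practice, such summaries would be broken out by training program and trainee cohort before any redesign decisions are made.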
Trainee Performance Evaluation: Did Learning Occur?

Trainee performance evaluation (that is, evaluating the learning that has taken place due to exposure to a training system) focuses on determining the following: What knowledge was acquired? What skills were developed or enhanced? What attitudes were changed? Kraiger, Ford, and Salas (1993) delineate a number of training outcomes related to the development of specific KSAs that can be used to structure an overall assessment of Kirkpatrick’s level 2 (learning) metrics. Bloom’s taxonomy, a widely used and accepted framework, can be used to assess learning behaviors along three dimensions—cognitive, psychomotor, and affective (Bloom, Englehart, Furst, Hill, & Krathwohl, 1956). Many VE training systems developed to date have targeted improved cognitive and/or psychomotor performance and are thought to be ideal for the development of complex cognitive task behaviors, such as situational awareness, cognitive expertise, and adaptive experience (Stanney et al., in press). While both cognitive and psychomotor behaviors are important to task learning and performance, the third component of learning identified by Bloom et al. (1956), affective behavior, encompasses the attitudes that students have toward the learned content. The attitudes of trainees have a significant effect on their learning potential. Thus, it is important to address all three learning outcomes. Cognitive outcomes involve the development of declarative, procedural, and strategic knowledge, the latter of which supports
distinguishing between optimal and nonoptimal task strategies, as well as mental models, situation awareness, and self-regulation. Psychomotor, skill based outcomes assess how well skills associated with a given task or process (for example, perceptual, response selection, motor, and problem solving skills) have been developed and capture proficiency in domain-specific procedural knowledge (Kraiger et al., 1993; Proctor & Dutta, 1995). Attitudinal outcomes describe how attitudes are changed through training. These measures should assess such factors as affective behaviors that are critical to task persistence when faced with difficult operational objectives (for example, motivation, self-efficacy, and goal commitment; Kraiger et al., 1993) and physiological responses associated with task demands (for example, stress). Targeted emotional responses (for example, remaining calm under pressure) may be achieved when trainees are given the opportunity to practice in affectively valid environments (that is, those that produce emotional responses similar to the operational environment).

A multifaceted approach to performance measurement, capturing each of these training metrics, is critical if training effectiveness is to be successfully interpreted. Specifically, competent performance in complex operational environments requires not only the basic knowledge of how to perform various tasks, but also a higher level conceptual and strategic understanding of how this knowledge is applied in order to optimally select the appropriate strategies and actions to meet task objectives (Fiore, Cuevas, Scielzo, & Salas, 2002; Smith, Ford, & Kozlowski, 1997). Moreover, it is also critical that trainees possess both well-defined, highly organized knowledge structures and the necessary self-regulatory skills to monitor their learning processes (Mayer, 1999). Thus, training evaluation should measure changes in the trainee (for example, post-training self-efficacy, cognitive learning, and training performance). Self-efficacy, or confidence with respect to trained tasks, which is an affective outcome of training, has been shown through several studies to be correlated with performance (Alvarez, Salas, & Garofano, 2004). Self-efficacy can be assessed via self-efficacy questionnaires (compare Scott, 2000), which are modified specifically for the task(s) being trained.

Given the logistical constraints imposed by factors inherent to the domain (for example, lack of experimental control in operational environments or limited participants), which threaten the validity of inferences made based upon the training results (Boldovici et al., 2002), quasi-experimental methods can be leveraged to provide some level of control when determining whether trainees have learned targeted skills. For example, often training cannot be withheld from a group of soldiers. In these cases, soldiers in pre-deployment training programs will receive all possible training intervention opportunities available to ensure their safety. As such, experiments comparing a control group (either with or without alternative training) to an experimental group may be infeasible. Given that inferences made from performance results cannot be unequivocally attributed to the training itself without a baseline control group against which to compare, evaluation techniques must be extended. In these cases, Sackett and Mullen (1993) suggest several preexperimental designs: pre-test–post-test no-control-group designs or post-test-only nonequivalent control group designs.
Although the pre-test and post-test
design provides change data, Sackett and Mullen (1993) point out that training cannot be credited as the sole cause of the change. Haccoun and Hamtiaux (1994) proposed using an internal referencing strategy (IRS) within a pre-test and post-test design without a control group, which introduces both trained, relevant material and untrained, nonrelevant material. Training effectiveness is demonstrated by showing that positive changes on trained material are significantly greater than those on untrained material. By carefully selecting materials that are orthogonal in training relevance, implications about training effectiveness can be reported with confidence using this technique, which allows trainees to serve as their own control.
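A minimal sketch of the IRS logic follows; the pre-/post-test scores are hypothetical, and a real evaluation would test the difference in gains inferentially (for example, with a paired statistical test) rather than by simple comparison.

```python
# Minimal sketch of the internal referencing strategy (IRS):
# trainees serve as their own control by comparing gains on trained
# (relevant) items against gains on untrained (nonrelevant) items.
# All scores below are hypothetical.
from statistics import mean

# Per-trainee (pre, post) scores on trained and untrained material.
trained = [(55, 85), (60, 88), (50, 80)]
untrained = [(52, 58), (58, 60), (49, 55)]

def mean_gain(pairs):
    """Average post-minus-pre change across trainees."""
    return mean(post - pre for pre, post in pairs)

gain_trained = mean_gain(trained)      # 29.3
gain_untrained = mean_gain(untrained)  # 4.7

# Evidence for training effectiveness requires the trained-material gain
# to be significantly larger than the untrained-material gain.
print(f"Trained gain: {gain_trained:.1f}, untrained gain: {gain_untrained:.1f}")
```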
Training Transfer Evaluation: Did Learning Impact Mission Success?

Training transfer evaluation (that is, Kirkpatrick’s level 3 [behavior—transfer]) involves evaluating trainees’ capabilities to perform learned skills in the target operational environment (that is, transfer of acquired KSAs). This can be accomplished by establishing a relationship between training objectives and operational requirements (that is, combat capability, operational readiness, and business competitiveness), as captured by well-defined process and outcome measures. Specifically, level 3 metrics can link training objectives to operational requirements. This linkage can be established in the form of a taxonomy based on the outcome of the documentation review (for example, military standards, instruction and training manuals, and universal task lists) and task analysis and then refined, structured, and prioritized via interviews with SMEs and trainers. Resulting training metrics should be directly related to operational effectiveness; be qualitatively or quantitatively defined to allow for measurement and evaluation; have associated targeted performance levels, allowing for gradations (for example, trained, partially trained, and untrained); and have relative weights associated with impact on operational readiness. The objective is to achieve a set of precisely defined training metrics that enable trainers to predict how effective a given training solution can be in achieving the highest levels of operational readiness and mission effective performance.

To examine how training affects mission success, training transfer studies can be conducted to determine the impact of VE pre-training on performance in operational environments. Transfer performance (that is, the degree to which training results in behavioral change on the job) is the gold standard of training (Alvarez et al., 2004). In the military domain, this could be expanded to include live fire exercises, given that live training is a rare and costly commodity. Yet, given logistical constraints, it is challenging to determine how best to use training technologies to optimize live fire training and to reduce, or make better use of, the time spent in live fire. Within the aviation community, numerous studies have demonstrated the training efficacy and transfer of training from ground based flight simulators to actual flight operations (Finnegan, 1977; Flexman, Roscoe, Williams, & Williges, 1972; Jacobs & Roscoe, 1975; Lintern, Roscoe, & Sivier, 1989). To guide such studies, early on Roscoe and Williges (1980) provided a systematic approach for establishing the amount of live training time that could be saved through substitution of simulator training. One downfall of these approaches, however, is the need for a large number of participants, a particular challenge in operational environments. To address this, methods have been developed to adapt the aviation transfer of training methods to deal with operational constraints (Champney, Milham, Bell-Carroll, Stanney, & Cohn, 2006) and to provide an experimental paradigm that answers two questions: (1) how much VE pre-training should be provided to produce stable and significant learning, and (2) for a set number of VE pre-training trials, how much live fire training can be saved? Ultimately, to evaluate transfer of training for a training system, the performance of a baseline group (trained to criterion in the live fire environment) is compared to that of trainees who receive VE pre-training (for a succession of training increments [trials/time intervals]) and are then trained to criterion in the live environment.

Transfer Efficacy

If the TNA and task analysis have led to design efficacy, which has in turn through UCD led to training efficacy, then transfer efficacy should be achieved, thereby leading to operational readiness and mission effectiveness. To determine if these have been achieved, training results need to be translated to and compared against the organization’s overall goals. Kirkpatrick’s level 4 (results) evaluation involves determining the organizational impact (for example, transfer effectiveness and operational readiness), financial impact (for example, training throughput and return on investment), and internal impact (for example, achieving excellence; supporting organizational change or growth of trainees) of the training. To examine organizational impact, the transfer effectiveness ratio (TER; Roscoe, 1971) can be used to specify the trials/time saved in the live environment as a function of prior trials/time in the training platform. The incremental transfer effectiveness ratio (ITER; Flexman et al., 1972) can also be used to determine the transfer effectiveness of successive increments of training in the training platform, with successive increments predicted to decrease the average TER and ITER to a point where additional training is no longer effective. By examining the ITER, a training instructor can perform a trade-off analysis between live and simulator training time and prescribe the number of trials/time increments that should be run for a trainee to reduce the amount of live training needed to meet the performance criterion (for example, spend 10 hours in live training versus spending 2 hours in simulator training and 7.5 hours in live training to obtain the equivalent performance criterion, thereby saving 2.5 hours of live training time). However, to examine the ITER, a large number of participants are required (that is, an experimental group is needed for each increment of time). As such, a method has been developed to identify a single point at which to compare the transfer effectiveness of VE training (Champney et al., 2006), allowing the calculation of TER and percent transfer despite logistical limitations.
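Using the hypothetical figures given above (10 live hours without pre-training versus 2 simulator hours plus 7.5 live hours), the TER computation can be sketched as follows; the function and variable names are our own.

```python
# Minimal sketch of the transfer effectiveness ratio (TER; Roscoe, 1971),
# using the hypothetical figures from the text: 10 live hours to criterion
# without pre-training, or 7.5 live hours after 2 simulator hours.
def transfer_effectiveness_ratio(live_no_pretraining, live_with_pretraining,
                                 simulator_time):
    """Live training time saved per unit of simulator time."""
    return (live_no_pretraining - live_with_pretraining) / simulator_time

ter = transfer_effectiveness_ratio(10.0, 7.5, 2.0)
print(ter)  # 1.25 -> each simulator hour saved 1.25 hours of live training
```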
In terms of operational readiness, the U.S. Armed Forces primarily use the status of resources and training system (SORTS) to report readiness in four critical areas: personnel, equipment on hand, equipment serviceability, and training (Knapp, 2001). A “C rating” is produced for each of the four areas, as well as an overall rating assessment:

• C-1 indicates the unit has the requisite resources and can support the full wartime mission(s) for which it is assigned.

• C-2 indicates the unit can do most of the missions assigned.

• C-3 indicates the unit can do many, but not all, portions of the missions assigned.

• C-4 indicates the unit needs more resources (people, parts, or training) before it can do its assigned missions.
Knapp (2001) recommends the addition of two reports that address higher levels of training and readiness: (1) a training report from the training event coordinator and (2) a concurrent readiness report. The training report assesses the level of training planned against the training goal and reports how well the training program was accomplished. The concurrent readiness report ties the current capabilities depicted in the training report to the current resources (for example, manning, equipment, and parts) available to accomplish the mission.

As for the financial impact, the ITER and the cumulative transfer effectiveness ratio (CTER) allow for an examination of training-cost efficiency. Specifically, ITER describes the time saved in one training situation due to successive increments of training in another (Roscoe & Williges, 1980). As opposed to percent transfer methods, this method accounts for the time spent in a trainer and considers the savings on one training platform (for example, live training) that is realized by each training interval in another (generally less costly) training platform. Theoretical and empirical work suggests that ITER and CTER are negatively accelerated functions; even if the percent transfer is still increasing, each additional interval in the original trainer adds only a diminishing fraction of value. Hypothetically speaking, 1 hour in a trainer may save more than an hour and a half in flight, whereas 15 hours in a trainer may only save 7.75 hours
of flight. The ITER and CTER can be graphed to illustrate this function and the point of diminishing returns (see Figure 17.2).

Figure 17.2. Training Transfer Effectiveness: Cumulative Transfer Effectiveness Ratio (CTER) and Incremental Transfer Effectiveness Ratio (ITER)

In our work, we have applied the Roscoe and Williges (1980) methodology to evaluate the ITER and CTER of several types of virtual environments, in an effort to examine the value of different levels of VE fidelity. Table 17.1 presents examples and descriptions of the Level 1–Level 4 metrics used in an operational TEE performed on the multiplatform operational team training immersive virtual environment system.

Table 17.1. TEE Evaluation Metrics

Level 1—Reaction
  Metric: Self-efficacy
  Metric description: Questionnaire assessing self-confidence in performing the targeted task
  Example item: “How confident are you that you can determine correction from visual marks PROFICIENTLY during live fire training?” 1 (certain cannot do it)–10 (certain can do it)

Level 2—Learning
  Metric: Knowledge test
  Metric description: Test that targets procedural knowledge
  Example item: “You and your team are performing a close air support mission. The call for fire (including suppression of enemy air defense) and 9 line have been called in and time on target is approaching. What will you and each of your team members be doing at this time?”

Level 3—Behavior
  Metric: Event based checklist
  Metric description: A checklist filled out by observers that queries which tasks were/were not performed by the trainee and which tasks were instructor assisted
  Example item: Corrections from mark communicated to pilot (occurred, instructor assisted, instructor corrected)

  Metric: Instructor evaluation form
  Metric description: A form in which the instructor rates trainee performance (poor, average, or excellent) on targeted training objectives
  Example item: Multiple effective marks on deck and simultaneous marks differentiated

Level 4—Results*
  Metric: Throughput
  Metric description: Number of trainees trained per unit time
  Example item: In 1 hour, how many trainees can complete the training scenario?

* Notional metric.
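A metric plan such as Table 17.1 could be captured in software so that data collection tools can be generated from it. The sketch below shows one possible encoding; the class, enum, and field names are our own illustrative assumptions, not part of the evaluation described above.

```python
# Illustrative sketch: encoding a TEE metric plan such as Table 17.1.
# The structure and field names are assumptions for illustration only.
from dataclasses import dataclass
from enum import Enum

class KirkpatrickLevel(Enum):
    REACTION = 1
    LEARNING = 2
    BEHAVIOR = 3
    RESULTS = 4

@dataclass
class TEEMetric:
    level: KirkpatrickLevel
    name: str
    description: str
    example_item: str

plan = [
    TEEMetric(KirkpatrickLevel.REACTION, "Self-efficacy",
              "Questionnaire assessing self-confidence in the targeted task",
              "How confident are you that you can determine correction from visual marks?"),
    TEEMetric(KirkpatrickLevel.BEHAVIOR, "Event based checklist",
              "Observer checklist of tasks performed, assisted, or corrected",
              "Corrections from mark communicated to pilot"),
]

for m in plan:
    # List each metric under its Kirkpatrick level.
    print(f"Level {m.level.value} ({m.level.name.title()}): {m.name}")
```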
CONCLUSIONS

Developing, planning, and performing a training effectiveness evaluation is a deceptively complex exercise. From the basic knowledge elicitation efforts through metric development to the actual assessment, there are many different factors that must be accounted for when attempting to quantify the impact of a training system. Oftentimes, significant trade-offs must be made, such as conducting an assessment that identifies benefits to the individual user at the cost of conducting a more holistic assessment to identify the return on investment to the organization. An enduring challenge for training system developers, procurers, and end users is ensuring that a complete lifecycle approach to TEEs is laid out as early as possible so that the costs, benefits, and risks to all stakeholders can be clearly identified and trade-offs made proactively, not reactively. This chapter provides one such approach, starting with analyses to identify specific training goals and objectives, and then defining the types of scenario elements necessary to realize these goals. The results from these early analyses can then be used to inform system design, support iterative usability analyses interspersed through the development cycle, and guide the development of a training management component that ties together the other elements into a cohesive training system. Last, training system effectiveness may be determined by quantifying system utility across several levels: reaction, which focuses on training usability; learning, which quantifies how well the system provides the necessary information; behavior, which provides insight into the degree to which performance is improved; and results, which demonstrates long-term impact to the organization through the use of the developed system. Ideally, all four levels would be determined through a comprehensive TEE; in practice, however, constraints may force some levels to be emphasized over, or to the exclusion of, others. Done well, though, these trade-offs may be made, and the assessments performed, in a way that still allows all those involved to understand the overall utility of the training system.

REFERENCES

Alvarez, K., Salas, E., & Garofano, C. M. (2004). An integrated model of training evaluation and effectiveness. Human Resource Development Review, 3(4), 385–416.

Bloom, B., Englehart, M., Furst, E., Hill, W., & Krathwohl, D. (1956). Taxonomy of educational objectives: The classification of educational goals. Handbook I: Cognitive domain. New York: Longmans, Green.

Boldovici, J. A., Bessemer, D. W., & Bolton, A. E. (2002). The elements of training effectiveness (Rep. No. BK2002-01). Alexandria, VA: The U.S. Army Research Institute.

Brown, J. (2002). Training needs assessment: A must for developing an effective training program. Public Personnel Management, 31(4), 569–579.

Cannon-Bowers, J. A., & Salas, E. (1997). A framework for developing team performance measures in training. In M. T. Brannick, E. Salas, & C. Prince (Eds.), Team performance assessment and measurement: Theory, methods, and applications. Series in applied psychology (pp. 45–62). Mahwah, NJ: Lawrence Erlbaum.
Carter, G., & Trollip, S. R. (1980). A constrained maximization extension to incremental transfer effectiveness, or, how to mix your training technologies. Human Factors, 22, 141–152.

Champney, R., Milham, L., Bell-Carroll, M., Stanney, K., & Cohn, J. (2006). A method to determine optimal simulator training time: Examining performance improvement across the learning curve. Proceedings of the Human Factors and Ergonomics Society 50th Annual Meeting (pp. 2654–2658). Santa Monica, CA: Human Factors and Ergonomics Society.

Clark, E. V. (2006). Color, reference, and expertise in language acquisition. Journal of Experimental Child Psychology, 94, 339–343.

Cohn, J. V., Stanney, K. M., Milham, L. M., Jones, D. L., Hale, K. S., Darken, R. P., & Sullivan, J. A. (2007). Training evaluation of virtual environments. In E. L. Baker, J. Dickieson, W. Wulfeck, & H. O’Neil (Eds.), Assessment of problem solving using simulations (pp. 81–105). Mahwah, NJ: Lawrence Erlbaum.

Finnegan, J. P. (1977). Evaluation of the transfer and cost effectiveness of a complex computer-assisted flight procedures trainer (Tech. Rep. No. ARL-77-07/AFOSR-77-6). Savoy: University of Illinois, Aviation Research Lab, Institute of Aviation.

Fiore, S. M., Cuevas, H. M., Scielzo, S., & Salas, E. (2002). Training individuals for distributed teams: Problem solving assessment for distributed mission research. Computers in Human Behavior, 18, 729–744.

Flexman, R. E., Roscoe, S. N., Williams, A. C., Jr., & Williges, B. H. (1972, June). Studies in pilot training (Aviation Research Monographs, Vol. 2, No. 1). Savoy: University of Illinois, Aviation Research Lab, Institute of Aviation.

Fowlkes, J. E., Dwyer, D. J., Milham, L. M., Burns, J. J., & Pierce, L. G. (1999). Team skills assessment: A test and evaluation component for emerging weapons systems. Proceedings of the 1999 Interservice/Industry Training, Simulation, and Education Conference (pp. 994–1004). Arlington, VA: National Training Systems Association.

Gagne, R. M., Wager, W. W., Golas, K. C., & Keller, J. M. (2005). Principles of instructional design (5th ed.). Belmont, CA: Wadsworth/Thompson Learning.

Haccoun, R. R., & Hamtiaux, T. (1994). Optimizing knowledge tests for inferring learning acquisition levels in single group training evaluation designs: The internal referencing strategy. Personnel Psychology, 47, 593–604.

Herbert, G. R., & Doverspike, D. (1990). Performance appraisal in the training needs analysis process: A review and critique. Public Personnel Management, 19(3), 253–270.

Jacobs, R. S., & Roscoe, S. N. (1975). Simulator cockpit motion and the transfer of initial flight training (Tech. Rep. No. ARL-75-18/AFOSR-75-8). Savoy: University of Illinois, Aviation Research Lab, Institute of Aviation.

Kirkpatrick, D. L. (1959). Evaluating training programs (2nd ed.). San Francisco: Berrett-Koehler.

Kirkpatrick, D. L. (1998). Another look at evaluating training programs. Alexandria, VA: American Society for Training and Development.

Knapp, J. R. (2001). Measuring operational readiness in today’s inter-deployment training cycle. Newport, RI: Naval War College. Retrieved June 3, 2007, from http://stinet.dtic.mil/cgi-bin/GetTRDoc?AD=ADA393512&Location=U2&doc=GetTRDoc.pdf
Kraiger, K., Ford, J. K., & Salas, E. (1993). Application of cognitive, skill-based, and affective theories of learning outcomes to new methods of training evaluation. Journal of Applied Psychology, 78, 311–328.

Lintern, G., Roscoe, S. N., & Sivier, J. E. (1989). Display principles, control dynamics, and environmental factors in pilot performance and transfer of training (Tech. Rep. No. ARL-89-03/ONR-89-1). Savoy: University of Illinois, Aviation Research Lab, Institute of Aviation.

Mayer, R. E. (1999). Instructional technology. In F. T. Durso, R. S. Nickerson, R. W. Schvaneveldt, S. T. Dumais, D. S. Lindsay, & M. T. H. Chi (Eds.), Handbook of applied cognition (pp. 551–569). Chichester, England: John Wiley & Sons.

Oser, R. L., Cannon-Bowers, J. A., Salas, E., & Dwyer, D. J. (1999). Enhancing human performance in technology-rich environments: Guidelines for scenario-based training. In E. Salas (Ed.), Human/technology interaction in complex systems (Vol. 9, pp. 175–202). Stamford, CT: JAI Press.

Proctor, R. W., & Dutta, A. (1995). Skill acquisition and human performance. London: Sage.

Roscoe, S. (1971). Incremental transfer effectiveness. Human Factors, 13, 561–567.

Roscoe, S. N., & Williges, B. H. (1980). Measurement of transfer of training. In S. N. Roscoe (Ed.), Aviation psychology (pp. 182–193). Ames: Iowa State University Press.

Sackett, P., & Mullen, E. (1993). Beyond formal experimental design: Towards an expanded view of the training evaluation process. Personnel Psychology, 46, 613–627.

Salas, E., Milham, L., & Bowers, C. (2003). Training evaluation in the military: Misconceptions, opportunities, and challenges. Military Psychology, 15(1), 3–16.

Scott, W. (2000). Training effectiveness of the VE-UNREP: Development of an UNREP self-efficacy scale (URSE). Orlando, FL: Naval Air Warfare Center, Training Systems Division.

Smith, E. M., Ford, J. K., & Kozlowski, S. W. J. (1997). Building adaptive expertise: Implications for training design strategies. In M. A. Quinones & A. Ehrenstein (Eds.), Training for a rapidly changing workplace: Applications of psychological research (pp. 89–118). Washington, DC: American Psychological Association.

Soloway, E., Guzdial, M., & Hay, K. (1994). Learner-centered design: The challenge for HCI in the 21st century. Interactions, 1(2), 36–48.

Stanney, K. M., Cohn, J., Milham, L., Hale, K., Darken, R., & Sullivan, J. (in press). Deriving training strategies for spatial knowledge acquisition from behavioral, cognitive, and neural foundations. Military Psychology.
Chapter 18
TRANSFER UTILITY—QUANTIFYING UTILITY

Robert C. Kennedy and Robert S. Kennedy

Transfer effectiveness can be evaluated from numerous perspectives and can be quantified using a variety of metrics depending on the practical interests of the evaluator. This chapter provides several approaches that explain the challenges of transfer utility assessment and that can be implemented or modified for use in evaluating a virtual environment (VE) training system of interest. Such metrics as hours of practice, training days, simulator time, and flight time each quantify the training process in units of contact or learning time. Because of the large economic costs associated with administering VE systems, it is also useful to quantify such programs in fiscal units, such as budgetary dollars, capital outlay, or dollars saved over equivalent field training. Flight simulator training incurs no direct fuel costs, unlike actual flight time, and with U.S. jet fuel consumption of over 600 million barrels annually (Energy Information Administration, 2007), simulator training administrators have an interest in looking at the marginal utility of a VE device in terms of fuel-cost savings. For example, Orlansky and String (1977) examined VE systems considering multiple cost variables and found that the investment in a multimillion dollar flight simulator could be amortized over about two years using fuel savings alone as an effectiveness criterion. Flight simulator administrators could also find value in quantifying the environmental impact of reduced fuel consumption, considering that jet engines release over 20 pounds of carbon dioxide per gallon of fuel (Energy Information Administration, 2007). Clearly, VE training evaluation draws interest from various and often independent factions, and thus the process of assessment may require a variety of quantitative indices. The process begins with criterion definition. In many cases, the outcomes of interest are well specified in terms of proximal performance; less common are those realized at higher levels of the organization (see Goldstein & Ford, 2002, on criterion development). The remainder of Volume 3, Section 2 discusses several practical approaches to evaluating VE transfer effectiveness.
TRADE-OFF MODELS

VE training systems provide opportunities for the trainee to develop important knowledge and skills, with the expectation that virtual training can alleviate some of the risks and costs associated with on-watch training. It is not always clear, however, what those benefits are and, more importantly, how the relative utilities of the multiple benefits of implementing VE devices over their real world counterparts should be weighed. Some have attempted to compare relative utility using a trade-off approach. For example, early research on the incremental training benefits of various fidelity qualities was done by Miller (1954), who suggested that while engineering fidelity development was important, its utility would tend to show diminishing returns at high levels. One approach is the incremental transfer effectiveness ratio (ITER; Roscoe, 1971, 1980; Roscoe & Williges, 1980), which captures the relationship between time spent in a flight simulator and the resulting savings in flight time [see Eq. (1)].
ITER = (Y_(X−ΔX) − Y_X) / ΔX      (1)

Here X represents hours of simulator time, ΔX is the most recent increment of simulator time, and Y_X indicates the subsequent flight time necessary to reach criterion performance requirements after X hours in the simulator. Using tabular data presented in Roscoe and Williges (1980), a graph may be plotted showing the relationship between ITER and simulator hours (see Figure 18.1). The plot supports
Figure 18.1. The relationship between ITER and simulator hours, plotted from tabular data presented in Roscoe and Williges (1980).
their contention that the first hour of training in a flight simulator saves more than an hour of training in an aircraft, with each subsequent hour saving less time, until after 14 hours there is no significant savings in flight training time per unit of VE training. This approach illustrates that, when evaluating the cost of training to some criterion level, the costs associated with a VE system can be compared with those of flight training to determine the more cost-effective approach.
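To make the ITER computation concrete, the short sketch below applies Eq. (1) to an invented training schedule. The (simulator hours, flight hours to criterion) pairs are illustrative assumptions chosen only to show the diminishing-returns shape; they are not the Roscoe and Williges (1980) values.

```python
# Minimal sketch of Eq. (1): ITER = (Y_[X-dX] - Y_X) / dX.
# The data points below are hypothetical, chosen only to show the
# diminishing-returns shape seen in Figure 18.1.
schedule = [  # (simulator hours X, flight hours to criterion Y_X)
    (0, 20.0),
    (2, 17.0),
    (4, 14.8),
    (6, 13.2),
    (8, 12.2),
    (10, 11.6),
]

for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
    iter_value = (y0 - y1) / (x1 - x0)
    print(f"Simulator hours {x0}-{x1}: ITER = {iter_value:.2f}")
# ITER falls from 1.50 toward 0.30: each added simulator increment
# saves less flight time, until added simulator time is no longer
# worth its cost.
```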
A related trade-off model called isoperformance derives contour lines or surfaces from all possible combinations of two or more determinants of a given level of performance (Jones, Kennedy, Kuntz, & Baltzley, 1987; Kennedy, Jones, & Baltzley, 1988). Like the ITER approach, isoperformance defines effectiveness at some desired outcome level, while the determinants act as boundaries in an iterative maximization analysis. The tabular data used in the previous example can also be used to construct an isoperformance curve. Figure 18.2 includes the ITER plot from Figure 18.1 with the addition of an isoperformance curve developed from the same tabulated flight and simulator training times for a given performance criterion. The comparison illustrates that the ITER slope approximates the derivative of the isoperformance curve relative to simulator time. The isoperformance approach assumes a value for the performance criteria and provides a means for trading off various predictor values, such as simulator and flight time in this case. Other variables could be incorporated using the same approach, such as classroom training, previous flight experience, or previous VE experience, to name a few.

Figure 18.2. Figure 18.1 with an isoperformance curve developed from the same tabulated flight and simulator training times for a given performance criterion.
The need for objective data on fidelity payoffs was advanced through programmatic investigations such as that of Collyer and Chambers (1978), in which the architecture of the U.S. Navy aviation wide-angle visual system (AWAVS) program was defined. They noted the cost and complexity associated with unnecessarily high levels of physical or engineering fidelity and presented a program for acquiring behavioral research data designed to investigate these questions empirically. Consider the following four related elements: physical fidelity, realism, training effectiveness, and cost. If a seasoned aviator were asked to pick a good simulator, he or she would probably prefer one that has a high degree of realism. An engineer asked the same question might select one that possesses technological advances and that faithfully depicts the physical characteristics of the environment. Alternatively, a human resources manager, whose job it is to place the operator in an operational setting and demonstrate satisfactory performance, may consider training effectiveness most important. The fourth element is cost, which is likely to increase along with the other three. We readily admit that all four elements are likely to be correlated with each other, but to the extent that they are not isomorphic, focusing on one does not guarantee that the other criteria are also met. A realistic scene can have high fidelity, which would be expected to yield high training transfer, and so on, but these are empirical questions, and when they have been studied, the correspondence among the elements has proven far from perfect. It is important to remember that these four elements are not likely to be perfectly correlated, and all too often we can adhere to one element when another is intended, which can result in erroneous or unreliable assessment results. Furthermore, there may be overriding reasons (time, resource constraints, and user acceptance) to emphasize one of these elements over another in making managerial decisions related to a particular VE configuration or alternative training approach. From the standpoint of best return on investment, however, it is critically important to keep a clear focus on the ultimate goal of empirically demonstrated effective performance in the operational setting. Evaluation plans should be based on the fullest possible understanding of each of these elements and how they can be traded off, and experimental questions should be framed that will allow such trade-offs to occur.
TIME-COST MODELS

The evaluation of VE and other training systems involves several factors, including cognitive, attitudinal, affective, and skill improvement (Kraiger, Ford, & Salas, 1993). One of the most robust findings from the behavioral sciences is the one that gives rise to learning theory: practice improves performance, and a key element of practice is repetition (Ladd & Woodworth, 1911). This is particularly challenging for military applications given the significant turnover of military personnel. By design, officer and enlisted training programs assume that a large proportion of their graduates will complete an initial tour of duty and then separate from the armed services. There are certainly those who are career focused, but the rule has long been observed both in truth and in jest,
“Half the military spends half its time training the other half” (R. Ambler, personal communication, ca. 1960). Thus, the extremely costly military training model necessitates aggressive actions for identifying the most equitable training programs. VE training evolved, in part, from the need to provide a high level of practice trials while minimizing the impact of variables such as raw training dollars and risk/danger to the servicemember or equipment; from the need for specialized practice that is not possible in operational settings; and from the need to consider secondary and tertiary consequences of training, such as the environmental impact of fossil fuel consumption. Many VE systems have directly replaced or supplemented existing programs at reduced labor, fuel, operations, and maintenance costs while improving on safety and environmental impact. Each of these variables, and many more, could potentially describe the effectiveness of a VE training system. In other words, training effectiveness, which is often viewed in terms of efficacy in facilitating learning and practice, can also be evaluated in terms of business factors, such as return on investment, environmental impact, or even scrap-metal inventory.

The most basic time-cost evaluation involves the comparison of early performance levels with those subsequent to a VE training program. Hays and Vincenzi (2000) conducted a training effectiveness evaluation (TEE) on the virtual environment submarine training device, which provides simulated training for such shipboard tasks as determining position, ship handling, and emergency operations. They conducted routine analyses of variance on pre-measures and post-measures, which showed positive learning effects. This approach may be ideal when post-training test performance is assumed to be a valid proxy for at-sea performance, and it can provide important data for use in cost analysis. At this level, a VE system can be compared to other programs of training. Even the most complex programs consist of a series of simpler training exercises, measurable using a time-cost metric. Applying the time-cost metric in laboratory tasks involves little more than specifying a level of performance (for example, 85 percent of the items correct or latencies less than a specified level) and then determining how much time, or how much practice, it takes subjects drawn from a specified population to meet those standards. Real world training environments are not so straightforward. As a model for the process of real world training where sufficient data are available for analysis, a Red Cross sponsored swimming program was evaluated (Kennedy et al., 2005). Though this example does not specifically evaluate VE systems, the model could prove quite useful when attempting to model VE transfer utility. In the program evaluated, children first begin their swimming classes with a goal of achieving the level of “Beginner.” Analogous to flight training, upon successful completion of the Beginner level, the next level is “Advanced Beginner,” followed by “Intermediate,” and finally “Swimmer.” Each level has well-defined criteria that must be successfully achieved in order to move to the next level of qualification. Beginners are required to tread water for 30 seconds, swim the elementary backstroke for
Table 18.1. The Data from the Swimmers as They Progressed through Training

Level                H(N, z|z)    qz
Nonswimmer              0.00     1.00
Beginner               14.84      .72
Advanced Beginner      23.58      .54
Table 18.1 presents data from a sample of 287 swimmers as they progressed through the various levels of training. H(N, z|z) is the average number of hours taken to reach level z beginning at Nonswimmer (N); qz is the proportion of swimmers reaching level z. As with military flight training, the results show that each level retains fewer and fewer participants as they complete their requirements: the set of participants at each consecutive level is a subset of those at the previous level. Moreover, as a group, those who advance are faster learners than the group of all children who reached Beginner. Children who advance to higher levels (analogous to student pilots) reach the lower levels faster than all children who reach those lower levels. The Advanced Beginners, who averaged 23.58 hours to reach that level, are not the same children as the Beginners, who averaged 14.84 hours to reach the Beginner level; likewise, the Intermediates took more time to reach Intermediate (30.42 hours) than the Swimmers did.

When analyzing training data, it is important to control for the potential confound that trainees who do not continue to a subsequent level may not be a random subset of those who reach a level. In the case of the Swimmers, they are slower to reach that level than others who get there, and the further the Swimmers advance, the stronger the company they keep becomes, until finally they are progressing no faster than average. If the Swimmers were to continue, they would reach a level at which they were among the late arrivals.

VE training programs are likely to involve substantially more complex sets of tasks, which require more complex consideration of the interactions between skill levels, longer training times, and the restricted range of trainees at higher training levels when conducting this type of analysis. However, the parallels with the learning processes involved in swimming training suggest we have much to learn from the association, including potential approaches to evaluating the transfer effectiveness of various VE systems.
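The quantities in Table 18.1 can be computed directly from individual training records. The Python sketch below uses a few hypothetical swimmer records (not the actual data) to show how H(N, z|z) and qz are obtained; note that the mean hours at each level are computed only over the subset of trainees who reached that level, which is exactly the selection effect discussed above.

from statistics import mean

# Hypothetical hours at which each swimmer reached each level
# (None if the level was never reached).
swimmers = [
    {"Beginner": 12.0, "Advanced Beginner": 20.5},
    {"Beginner": 16.5, "Advanced Beginner": 26.0},
    {"Beginner": 14.0, "Advanced Beginner": None},
    {"Beginner": 18.5, "Advanced Beginner": None},
]

def level_stats(records, level):
    reached = [r[level] for r in records if r[level] is not None]
    h = mean(reached) if reached else None   # H(N, z|z)
    qz = len(reached) / len(records)         # proportion reaching level z
    return h, qz

for level in ("Beginner", "Advanced Beginner"):
    print(level, level_stats(swimmers, level))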
COST JUSTIFICATION

Orlansky and String (1977, 1979) approach effectiveness evaluation in terms of an economic efficiency criterion, which they suggest must satisfy one of the following:

1. Minimize economic cost for a given level of performance effectiveness or production; or
2. Maximize performance effectiveness for a given cost.

Often the latter is of interest when there is a fixed budget and the transfer or performance is manipulable. In their flight training example, cost criteria are required for flight time as well as simulator time; cost per flight hour is specified through a process that distinguishes between relevant and irrelevant cost elements. Simulator effectiveness varies not only with training time, but also with how available the unit is for simulator time (Orlansky & String, 1977).

Orlansky and String (1979) evaluated multiple simulators in terms of cost and performance effectiveness using existing data from a Coast Guard program (Isley, Corley, & Caro, 1974) and a Navy program (Browning, Ryan, Scott, & Smode, 1977). They assume that transfer of performance is constant, focusing on the economic efficiency of the programs after the implementation of a new VE (flight simulator) system. In the Coast Guard study, they note that flight training without the simulator cost $3.1 million per year, compared to $1.6 million per year for flight training and simulator training combined. Using these figures, the simulator is amortized over 2.1 years. The Navy study showed similar results. The P-3C simulator cost $4.2 million, and its implementation reduced the required flight training time from 15 hours to 9 hours. Based on an average of 200 pilots per year, their approach showed a savings of $2.5 million per year, amortizing the simulator costs in less than two years. They further noted that the system also reduced the number of aircraft required for the training program, which would result in a 10-year savings of over $44 million.
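The amortization arithmetic behind these conclusions is simple enough to check directly. A minimal sketch in Python, using only the figures reported above:

# Coast Guard program: annual savings from adding the simulator.
annual_savings_cg = 3.1e6 - 1.6e6             # $1.5 million per year
implied_investment = annual_savings_cg * 2.1  # a 2.1-year amortization implies
                                              # roughly a $3.15 million investment

# Navy P-3C program: $4.2 million simulator, $2.5 million annual savings.
payback_navy = 4.2e6 / 2.5e6                  # 1.68 years, i.e., under two years
print(implied_investment, payback_navy)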
UTILITY ANALYSIS

Training systems that use VE and other sophisticated training devices are quite expensive to administer and can run into millions of budgetary dollars. There may therefore be a desire to index training program effectiveness on economic scales using macroanalytic methods, such as utility analysis (UA; Brogden, 1949; Brogden & Taylor, 1950; Cronbach & Gleser, 1965; Taylor & Russell, 1939). UA is a general term describing a number of methods used in human resources program evaluation, especially in employment selection. Brogden (1949) and Cronbach and Gleser (1965) developed procedures that provided measures of selection program effectiveness in terms of dollars of profit. Iterations of these models are still in use, one of which is appropriately referred to as the Brogden-Cronbach-Gleser (BCG) model. The BCG model assumes a linear relationship between a predictor and a performance criterion, and employs the following formula:

ΔU = (Ns)(rxy)(SDY)(μXs) − (N)(C)
Here, ΔU is the incremental utility gain from the selection instrument or program; Ns is the number of applicants hired; rxy is the validity coefficient of the measure; SDY is the standard deviation of job performance in dollar terms; μXs is the mean predictor score of the selectees; N is the total number of applicants; and C is the average cost of administering the instrument to one applicant.
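A minimal numerical sketch of the BCG computation follows; all input values are hypothetical and are chosen only to illustrate the formula.

def bcg_utility(n_selected, validity, sd_y, mean_z_selected,
                n_applicants, cost_per_applicant):
    # Brogden-Cronbach-Gleser incremental utility in dollars, assuming
    # standardized predictor scores and top-down selection.
    gain = n_selected * validity * sd_y * mean_z_selected
    testing_cost = n_applicants * cost_per_applicant
    return gain - testing_cost

# Hypothetical example: hire 50 of 400 applicants.
du = bcg_utility(n_selected=50, validity=0.40, sd_y=12000.0,
                 mean_z_selected=1.2, n_applicants=400,
                 cost_per_applicant=30.0)
print(round(du))  # 276000, an incremental gain of $276,000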
This model also assumes that the predictor scores are standardized (Z scores), that the organization selects on a top-down hiring policy, and that all those offered a position accept the offer (as discussed in Gatewood & Field, 2001; Cabrera & Raju, 2001). Others have conducted utility analyses, with one of the substantive differences across these studies being how the authors chose to measure SDY (for example, Bobko, Karren, & Kerkar, 1987; Bobko, Karren, & Parkington, 1983; Boudreau, 1983; Cascio & Ramos, 1986; Raju, Cabrera, & Lezotte, 1996; Reilly & Smither, 1985; Schmidt, Hunter, McKenzie, & Muldrow, 1979; Schmidt, Hunter, & Pearlman, 1982; Weekley, Frank, O’Connor, & Peters, 1985).

The principles of UA laid the foundation for various economic models of training program evaluation (for example, Barati & Tziner, 1999; Cascio, 1989, 1992; Cohen, 1985). Honeycutt, Karande, Attia, and Maurer (2001) adapted this approach in conducting a utility analysis of a sales training program, using the following formula:

U = (T)(N)(dt)(SDy) − (N)(C)

Here, utility (U) is the financial impact of the sales training program on the organization as a whole; time (T) is the period over which the training continues to impact the firm; N is the number of trained employees who remained with the firm; dt represents the performance change resulting from the training; SDy is the pooled standard deviation of employee performance; and C is the per-trainee cost of the training. Their calculations show that the sales training program resulted in a profit (U) of over $45,000, which equated to $2.63 of revenue ($1.63 of profit) for each dollar spent on the training program.
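Given these summary figures, the cost side of the analysis can be recovered arithmetically. The short Python check below uses only the reported utility and the profit-per-dollar ratio (the raw model inputs are not reproduced here):

utility = 45000.0          # reported profit (U), approximate
profit_per_dollar = 1.63   # reported profit per training dollar

training_cost = utility / profit_per_dollar  # roughly $27,600
revenue = training_cost + utility            # roughly $72,600
print(round(revenue / training_cost, 2))     # 2.63, matching the reported ratio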
There are certainly limitations to this and to each of the other methods; as with any attempt at data interpretation, it is important to understand the underlying assumptions and to test them accordingly. Given the assumptions stipulated by Honeycutt et al. (2001), their method provides a potentially useful implementation of utility analysis models for use in training transfer. The extremely high costs associated with many VE training systems suggest a high likelihood that this or similar types of macroanalysis will be applied.

SUMMARY

As with any training program, VE programs seek to provide some treatment that will result in learning or practice ultimately intended for application on the job. The process of transfer effectiveness evaluation is intended to provide valid metrics that present training outcomes in terms of departmental and organizational objectives. Trade-off models, such as effectiveness ratios and isoperformance curves, are available and can be adapted for analyses in which evaluators seek to identify optimal configurations of training factors, such as fidelity and training time. Operational costs, such as maintenance, electrical/mechanical costs, trainer salaries, and trainee salaries, each have the potential to impact the overall effectiveness of a VE system, as well as the corresponding departments. A careful consideration and definition of the effectiveness metrics are critical in
prescribing a transfer effectiveness evaluation approach that will produce meaningful data and subsequent inferences.

REFERENCES

Barati, A., & Tziner, A. (1999). Economic utility of training programs. Journal of Business and Psychology, 14, 155–164.
Bobko, P., Karren, R., & Kerkar, S. P. (1987). Systematic research needs for understanding supervisor-based estimates of SDY in utility analysis. Organizational Behavior and Human Decision Processes, 40, 69–95.
Bobko, P., Karren, R., & Parkington, J. J. (1983). Estimation of standard deviations in utility analyses: An empirical test. Journal of Applied Psychology, 68, 170–176.
Boudreau, J. (1983). Effects of employee flows on utility analysis of human resource productivity improvement programs. Journal of Applied Psychology, 68, 396–406.
Brogden, H. E. (1949). When testing pays off. Personnel Psychology, 2, 171–185.
Brogden, H. E., & Taylor, E. K. (1950). The dollar criterion—Applying the cost accounting concept to criterion construction. Personnel Psychology, 3, 133–154.
Browning, R. F., Ryan, L. E., Scott, P. G., & Smode, A. F. (1977). Training effectiveness evaluation of device 2F87F, P-3C Operational Flight Trainer (TAEG Report No. 42). Orlando, FL: Training Analysis and Evaluation Group.
Cabrera, E. F., & Raju, N. S. (2001). Utility analysis: Current trends and future directions. International Journal of Selection and Assessment, 9, 92–102.
Cascio, W. F. (1989). Using utility analysis to assess training outcomes. In I. L. Goldstein (Ed.), Training and development in organizations (pp. 63–88). San Francisco, CA: Jossey-Bass.
Cascio, W. F. (1992). Managing human resources: Productivity, quality of work life, profits. New York: McGraw-Hill.
Cascio, W. F., & Ramos, R. A. (1986). Development and application of a new method for assessing job performance in behavioral/economic terms. Journal of Applied Psychology, 71, 20–28.
Cohen, S. I. (1985). A cost-benefit analysis of industrial training. Economics of Education Review, 4, 327–339.
Collyer, S. C., & Chambers, W. S. (1978). AWAVS, a research facility for defining flight trainer visual requirements. Proceedings of the Human Factors Society 22nd Annual Meeting. Santa Monica, CA: Human Factors Society.
Cronbach, L. J., & Gleser, G. C. (1965). Psychological tests and personnel decisions. Urbana: University of Illinois Press.
Energy Information Administration. (2007). Aviation gasoline and jet fuel consumption, price, and expenditure estimates by sector, 2005. Retrieved August 15, 2007, from http://www.eia.doe.gov/emeu/states/sep_fuel/html/fuel_av_jf.html
Gatewood, R. D., & Field, H. S. (2001). Human resource selection (5th ed.). Fort Worth, TX: Harcourt.
Goldstein, I. L., & Ford, J. K. (2002). Training in organizations: Needs assessment, development, and evaluation (4th ed.). Boston: Wadsworth/Thomson.
Hays, R. T., & Vincenzi, D. A. (2000). Fleet assessments of a virtual reality training system. Military Psychology, 12, 161–186.
Honeycutt, E., Karande, K., Attia, A., & Maurer, S. (2001). An utility based framework for evaluating the financial impact of sales force training programs. Journal of Personal Selling & Sales Management, 21, 229–238.
Isley, R. N., Corley, W. E., & Caro, P. W. (1974). The development of U.S. Coast Guard aviation synthetic training equipment and training programs (Final Rep. No. FR-D674-4). Fort Rucker, AL: Human Resources Research Organization.
Jones, M. B., Kennedy, R. S., Kuntz, L. A., & Baltzley, D. R. (1987, August). Isoperformance: Integrating personnel and training factors into equipment design. Paper presented at the Second International Conference on Human-Computer Interaction, Honolulu, HI.
Kennedy, R. S., Drexler, J. M., Jones, M. B., Compton, D. E., & Ordy, J. M. (2005). Quantifying human information processing (QHIP): Can practice effects alleviate bottlenecks? In D. K. McBride & D. Schmorrow (Eds.), Quantifying human information processing (pp. 63–122). Lanham, MD: Lexington Books.
Kennedy, R. S., Jones, M. B., & Baltzley, D. R. (1988). Optimal solutions for complex design problems: Using isoperformance software for human factors trade-offs. Proceedings of the Operations Automation and Robotics Workshop: Space Application of Artificial Intelligence, Human Factors, and Robotics (pp. 313–319).
Kraiger, K., Ford, J. K., & Salas, E. (1993). Application of cognitive, skill-based, and affective theories of learning outcomes to new methods of training evaluation. Journal of Applied Psychology, 78, 311–328.
Ladd, G. T., & Woodworth, R. S. (1911). Elements of physiological psychology: A treatise of the activities and nature of the mind, from the physical and experimental points of view. New York: C. Scribner’s Sons.
Miller, R. B. (1954). Psychological considerations in the design of training equipment (Tech. Rep. No. TR-54-563). Wright-Patterson Air Force Base, OH: Wright Air Development Center.
Orlansky, J., & String, J. (1977). Cost-effectiveness of flight simulators for military training: Volume I. Use and effectiveness of flight simulators (IDA Paper No. P-1275). Arlington, VA: Institute for Defense Analyses.
Orlansky, J., & String, J. (1979). Cost effectiveness of computer-based instruction in military training (IDA Paper No. P-1375). Arlington, VA: Institute for Defense Analyses.
Raju, N. S., Cabrera, E. F., & Lezotte, D. V. (1996, April). Utility analysis when employee performance is classified into two categories: An application of three utility models. Paper presented at the Annual Meeting of the Society for Industrial and Organizational Psychology, San Diego, CA.
Reilly, R. R., & Smither, J. W. (1985). An examination of two alternative techniques to estimate the standard deviation of job performance in dollars. Journal of Applied Psychology, 70, 651–661.
Roscoe, S. N. (1971). Incremental transfer effectiveness. Human Factors, 13(6), 561–567.
Roscoe, S. N. (1980). Transfer and cost-effectiveness of ground based trainers. In S. N. Roscoe (Ed.), Aviation psychology (pp. 194–203). Ames: Iowa State University Press.
Roscoe, S. N., & Williges, B. H. (1980). Measurement of transfer of training. In S. N. Roscoe (Ed.), Aviation psychology (pp. 182–193). Ames: Iowa State University Press.
Schmidt, F. L., Hunter, J. E., McKenzie, R. C., & Muldrow, T. W. (1979). Impact of valid selection procedures on work force productivity. Journal of Applied Psychology, 64, 609–626.
Schmidt, F. L., Hunter, J. E., & Pearlman, K. (1982). Assessing the economic impact of personnel programs on workforce productivity. Personnel Psychology, 35, 333–347.
Taylor, H. C., & Russell, J. T. (1939). The relationship of validity coefficients to the practical effectiveness of tests in selection. Journal of Applied Psychology, 23, 565–578.
Weekley, J. A., Frank, B., O’Connor, E. J., & Peters, L. H. (1985). A comparison of three methods of estimating the standard deviation of performance in dollars. Journal of Applied Psychology, 70, 122–126.
Chapter 19
INSTRUMENTING FOR MEASURING

Adam Hoover and Eric Muth

This chapter considers the problem of instrumentation for the recording of live building clearing exercises. Such recordings can help training in a number of ways. The recorded data can assist an after action review by allowing an instructor or group of trainees to replay an exercise to look for weaknesses and errors. They also allow for the development of automated methods to analyze performance; such automation could potentially make training more objective and more readily available where dedicated facilities and instructors are lacking. Over time, a database of millions of exercises could be built, facilitating deeper studies of the variability of team performance, as well as of the evolution of individual teams' performance as training progresses.

In order to record live exercises, some amount of instrumentation is necessary. The instrumentation can be built into a dedicated training facility, for example, by placing sensors throughout the buildings used to conduct exercises. Instrumentation can also be placed on the trainees in order to record individual actions. In either case, there is some trade-off with regard to the variability and face validity of the allowed exercises: in general, the more data that are to be recorded, the more restricted the exercises. Instrumentation is by nature somewhat fragile. If trainees are required to wear or carry instrumentation, then their actions must be somewhat limited to ensure the correct operation of the instrumentation and to prevent its destruction. The same holds true for the facility and any instrumentation permanently deployed in the infrastructure. For example, trainees must not physically break through walls or fire rounds in the direction of cameras or other instruments recording the exercise. In addition, instrumentation fixed into the infrastructure of a facility is not easily redeployed; therefore, all the exercises must take place in the same buildings. This limits variability in that the same structural layout and floor plans must be used every time.

Training for urban operations is not a new idea; it is the instrumentation that is changing. Many military bases already use facilities in which building clearing exercises can be conducted. These facilities range in size from a single building to a multiple-block area consisting of tens of buildings. The buildings are usually made of concrete and have no glass windows or other easily broken materials. Soldiers practice against mock enemy forces using either simunitions,
generally made either from paint or rubber, or “laser-tag” instrumented weaponry. It is important to note that while these facilities provide practice in urban operations, they already give up some realism because of the costs associated with simulating actual urban warfare. For example, the buildings cannot be harmed through breached entry without making repairs prohibitively costly for the next group of trainees. Thus, adding instrumentation to augment and learn from training exercises is merely an extension of existing practices.

Some existing training facilities have already been instrumented to allow the recording of exercises. Examples include the concrete towns at Fort Benning, Georgia; Fort Polk, Louisiana; Quantico, Virginia; and Twentynine Palms, California. These facilities are in general focused on platoon-sized operations involving multiple buildings. Cameras may be placed throughout a facility to record video from a variety of angles; however, the video is not used to track trainees and is correlated manually (if at all) for playback. Some facilities use the global positioning system (GPS) to track participants, which limits position tracking to outdoors and limits accuracy to several meters, at best. Weapons can be instrumented with equipment that tracks the number of shots fired. MILES (multiple integrated laser engagement system) gear is the most widely known type of training gear; it operates like laser-tag equipment. This type of gear can track when participants were shot and by whom. However, it is intended for outdoor use and is not suited to the shorter distances involved in indoor engagements.

In contrast to these facilities, we are interested in the action that occurs inside a single building. While such actions can involve platoon-sized and larger forces, we are interested in how a single fire team (four to five men) cooperates during building clearing. All the action we are interested in takes place indoors; therefore, GPS cannot be used to track the locations of participants. In addition, we require location accuracy on the order of 10 centimeters (cm) so that it is possible to identify what position a person occupies inside a room, not just what room that person is in. We also desire instrumentation that tracks where trainees are aiming weapons, and where they are looking, at all times. Monitoring the coverage of weapons and lines of sight should allow for a deeper analysis of team performance.

The rest of this chapter describes a facility we built to meet these needs. We call our facility the Clemson Shoot House. It consists of reconfigurable walls so that the floor plan layout can be changed. It uses a network of cameras to automatically track trainee locations. We constructed custom laser-tag-style weapons and helmets to track shots, hits, and the orientations of weapons and heads. We also constructed heart-rate monitors to provide some physiological monitoring of the trainees. All the tracking information is gathered at a central recording station, where it can be stored and replayed. While describing our facility, we break down the options currently available and discuss the lessons learned during its construction. To our knowledge, the Clemson Shoot House represents the current cutting edge of this type of facility.

There is almost no literature published regarding the construction of a shoot house or instrumentation for the recording of building clearing exercises.
Even the relatively well-known MILES gear is barely discussed in the research
literature. Therefore, many of the lessons learned must be reported without reference to published literature; we hope that by documenting our facility and experiences this trend will change.
WALLS AND FACILITY INFRASTRUCTURE

We constructed our facility at the 263rd Army Air and Missile Defense Command (Army National Guard) site in Anderson, South Carolina. This site was chosen because of its proximity to Clemson University (about 15 kilometers) and the large area available. The Army National Guard provided space within a large warehouse that has a 6.1 meter (m) (20 foot) high ceiling. Our facility covers approximately 200 square meters (sq m), the size of a single-floor house, and is constructed entirely inside the warehouse. It consists of a shoot house and an instructor operator station. The shoot house is approximately 180 sq m of reconfigurable rooms and hallways. The instructor operator station houses equipment and provides for centralized training observation and after action review. By constructing the shoot house entirely inside an existing warehouse, we were able to leave off a ceiling or roof and yet still protect the instrumentation in the facility from the environment. Figure 19.1 shows a computer-aided design (CAD) diagram of the facility; the instructor operator station is on the left side.

The configuration of the shoot house can be changed by inserting walls at hallway junctions (creating various L, T, and Z hallways) and by removing entire walls between rooms (creating larger rectangular or L-shaped rooms). There are several external entrances to the shoot house so that various exercises can be scripted.
Figure 19.1. Computer-Aided Design Diagram of the Clemson Shoot House
Within the shoot house, a small amount of furniture is placed in fixed positions. The furniture can be moved between exercises but is expected to remain stationary during a single building clearing run. This is necessary to preclude the confusion that could be caused by tracking moving furniture and mistaking it for people. Figure 19.2 shows a picture overlooking a portion of the shoot house containing furniture. Note that in this picture the shoot house is configured differently from the CAD diagram (the walls between three of the small rooms have been removed, creating a large L-shaped room).

Figure 19.2. An Overhead View of Part of the Shoot House

The materials used for the walls are similar to those used for office partitions. They consist of thick Styrofoam sandwiched between two pieces of paneling. The framing is metal and bolted into the concrete floor. Support at the top is provided by additional framing spanning halls and other open areas. These materials withstand simple collisions and pressure from people leaning on them, but could be damaged by strong actions or point loads. Compared to using concrete blocks, the benefit is that it takes less than one hour to reconfigure the floor plan.

Lessons Learned

• Partition walls are sufficient (2.4 m [8 feet] high, movable, sturdy enough) when constructed inside a warehouse.
• The facility should have varying room sizes, shapes, and door placements. Opposing symmetric doorways in a hallway are more challenging than offset doorways or single entries.
• If live observation is desired, an observation platform or tower near the center of the facility is useful, to avoid having to go too high to see inside.
POSITION TRACKING

Our shoot house is equipped with 36 cameras (visible on top of the walls in Figure 19.2) wired to a rack of seven computers. The cameras are calibrated offline to a common coordinate system (Olsen & Hoover, 2001). The computers record the video feeds and process them in real time to track the spatial locations of subjects (Hoover & Olsen, 1999). Tracks are updated 20 times per second and are accurate to approximately 10 cm in two dimensions. This provides for a detailed analysis of team motion, such as how a team moves through a doorway and what positions are taken during the clearing of a room. The raw video is also recorded and can be viewed during playback along with the tracking data.

Figure 19.3 shows a screenshot from a replay of a four-man exercise. The location of each tracked person is displayed as a colored circle on the floor plan of the shoot house. Walls are displayed as white lines, and stationary furniture is displayed as white rectangles. The lines coming out of each circle represent the weapon orientation of each tracked person (this equipment is discussed in more detail in a following section). The orientations of helmets (not shown in this screenshot) can be displayed similarly. The video from a nearby camera is displayed at the right side of the screenshot, showing a live view of some of the action.
Figure 19.3. A Screenshot of a Replay of an Exercise
The camera view can be manually selected to any of the 36 cameras, or it can be automated to select the camera closest to a particular track or to groups of people. At the bottom right of the screenshot are data from the heart-rate monitors worn by each trainee (this equipment is also discussed in a later section). The replay of data can be controlled through standard video controls, including pause, play, fast forward, and rewind. The replay can also be stepped forward or backward between successive shots fired; this mode of control lets those watching the replay quickly find moments of weapons action and examine their outcomes.

The deployment of the camera network and the computers that process the camera feeds can take up to a full day. Most of this time is spent running wires and could be reduced by using wireless cameras. Changing the floor plan of the shoot house can require repositioning some cameras; moving a few cameras takes little time (less than one hour), including recalibration.

The position tracking software is made aware of wall and furniture locations in order to help maintain continuity. For example, knowing where a wall is prevents a track from inadvertently being associated with another track on the opposite side of the wall. This similarly helps with tracking trainees as they move around furniture, preventing a track from inadvertently locking onto the furniture. The floor plan of walls and furniture is stored in a file that is loaded when position tracking starts and can be changed very quickly.

The lighting in the shoot house is provided by overhead spotlights of the variety commonly used in warehouses. The quality of this lighting is poor and causes a great deal of shadows: a single person standing in the middle of a room may cast three to four shadows of varying depth, each cast by a different overhead spotlight. These lighting conditions are among the worst possible for automated image processing because the shadows are difficult to differentiate from actual people. Although our tracking system is designed to be resilient to this problem, there are cases where multiple tracks within a small area are not properly differentiated. In such cases, a track may inadvertently lock onto a shadow, or two tracks may become mutually confused as shadows cross each other.

The position tracking also tends to suffer in hallways and in doorways. If several people bunch up in a hallway, the cameras typically do not have enough vantage to see the correct position of each person, because the cameras are placed at opposing ends of the hallway, looking toward each other; if four people stand in a line between the cameras, then neither camera can see the two people in the middle. In a room this problem does not occur because cameras are placed in all four corners and generally provide complete coverage. In a doorway, tracks must be “handed off” from one camera to another as a person moves through. Our tracking system tackles this problem by maintaining a global position for each track (as seen in Figure 19.3) and using that information to assist in camera hand-off. However, when multiple people quickly pass through the same doorway, there can be some momentary confusion while the system performs the hand-offs of the multiple tracks.
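Returning to the automated camera selection mentioned above, the choice reduces to a nearest-neighbor computation over the calibrated camera positions. A minimal sketch in Python, in which the camera positions and the track coordinate are hypothetical:

import math

# Hypothetical calibrated camera positions (meters, common coordinate system).
cameras = {"cam01": (0.0, 0.0), "cam02": (6.0, 0.0), "cam03": (6.0, 8.0)}

def closest_camera(track_xy, cams=cameras):
    # Return the id of the camera nearest to a tracked position.
    return min(cams, key=lambda c: math.dist(cams[c], track_xy))

print(closest_camera((5.2, 7.1)))  # prints "cam03"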
Overall, we estimate that our position tracking performs correctly roughly 70 percent to 80 percent of the time with no operator intervention. In order to create “clean” tracking data, a recording is manually reviewed by a human operator using the replay tool. The operator uses the video to check the automatically recorded tracks; if an error is observed, the operator can override the automatically recorded position and manually fill in a corrected position. Depending on the number of people tracked throughout the exercise (four to eight people), the length of the exercise (0:30–6:00 minutes), and the quality of the data, the position cleaning process can take anywhere from 15 minutes to 1 hour.

In addition to cleaning the position tracking data, the recording is also post-processed to correlate rifle, helmet, and heart-rate data to individual tracks. This is accomplished by indicating which devices correspond to which position tracks. Finally, the shots fired and the hits are correlated to identify kill shots. Typically, the entire system registers a hit roughly 50–100 ms after the corresponding shot (about one to two time steps at our 20 Hz [hertz] sampling rate). These correlations are identified and saved in the final post-processed (or clean) data file in order to facilitate subsequent analysis.

During the construction of this facility, we had the opportunity to observe several tracking technologies, such as radio frequency identification (RFID) based systems. It is our opinion that no currently available indoor tracking technology works better than what we have developed, and that none of them (ours included) has fully solved the problem. There is still a need for a reliable, fully automated indoor tracking system that can provide accuracy on the order of 10 cm and an update rate of 30 Hz. Preferably, the tracking system should be easy to deploy in any infrastructure and require minimal instrumentation on the bodies of tracked subjects. Until such a technology is developed, it will continue to be difficult to obtain tracking data on indoor operations.
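The shot-to-hit correlation step described above amounts to a windowed match over time-stamped events. The Python sketch below is illustrative only; the 50–100 ms window follows the description above, but the event format and device identifiers are hypothetical.

# Time-stamped events (seconds) from weapons (shots) and helmets (hits).
shots = [(12.40, "rifle_3"), (15.05, "rifle_1")]
hits = [(12.47, "helmet_7"), (15.11, "helmet_2")]

def correlate(shots, hits, window=(0.05, 0.10)):
    # Pair each hit with any shot registered 50-100 ms earlier.
    pairs = []
    for t_hit, target in hits:
        for t_shot, shooter in shots:
            if window[0] <= t_hit - t_shot <= window[1]:
                pairs.append((shooter, target, round(t_hit - t_shot, 3)))
    return pairs

print(correlate(shots, hits))
# [('rifle_3', 'helmet_7', 0.07), ('rifle_1', 'helmet_2', 0.06)]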
Lessons Learned

• Video cameras are probably still the best sensor option for tracking people indoors at high accuracy (for example, on the order of 10 cm), even though they will not work near 100 percent of the time.
• Based upon observations of the performance of other sensor types (such as RFID, sonar, and wireless signal strength based tracking), all currently available options still suffer in hallways and doorways. A completely automated, hands-off solution for tracking inside a building remains unknown.
• Position data recorded at 20 Hz and at 10 cm accuracy allow for the visualization of the motion of trainees at a level heretofore unseen. Based upon watching over 1,000 exercises, we believe this fidelity of data should allow for new types of analysis of team performance; studies to validate this hypothesis are still ongoing.

WEAPONS AND BODY INSTRUMENTATION

The primary purpose of instrumentation on the weapons and bodies of trainees is to track who shot whom. However, the instrumentation can also be used to
track the actions of trainees during periods when no firing is taking place. For example, it may be useful to track the orientations in which trainees hold their weapons, collectively and individually, or to track where trainees are looking at all times. Keeping weapons oriented properly, covering the blind areas of fellow team members, and watching weapons coverage are all likely related to team performance. Therefore, we desired instrumentation to track the orientations of the heads and weapons of trainees.

We investigated the availability of MILES gear and the suitability of gear produced by the laser-tag gaming industry. Neither was found to meet our needs. While MILES is a familiar term to many involved in this field, we found surprisingly little information on vendors, and we were unable to find any literature detailing the MILES standard. In the gaming industry, we had more success identifying and communicating with vendors. However, all the systems we found were sold as closed products, making them difficult to modify to suit our needs. The vendors tend to be small companies, with small markets, and are not geared toward custom solutions.

Facing these obstacles, we decided to design and construct our own custom laser-tag gear. We constructed several embedded devices, including weapons, helmets, and heart monitors. All of the devices are wireless and completely untethered. They use a chip built on the 802.11b networking standard to communicate data to the rack of computers in the instructor operator station. This allows us to use a commercial off-the-shelf 802.11 router to communicate with all our embedded devices, with all its advantages of throughput and error recovery. All data are updated between 5 and 20 times per second, depending on the update rate of the individual sensors.

Our weapons are plastic M16 replicas (see Figure 19.4) gutted and fitted with electronics to facilitate tracking. An orientation sensor fitted in the barrel of the weapon (see Figure 19.5) measures the three-dimensional orientation of the weapon relative to Earth’s magnetic field. The weapon emits a custom infrared signal upon firing, designed to avoid interference from ambient signals; range is good to over 50 m, well beyond the size of the shoot house. The weapon is also instrumented with a detector for the infrared signal, for determining hits. All electronics are wired to a custom circuit board (see Figure 19.6). The circuitry details of our system can be found in Waller, Luck, Hoover, and Muth (2006).

Our helmets (see Figure 19.7) are constructed using many of the same parts. The electronic compass is stored in the top of the helmet and allows us to roughly track where a subject is looking. Four infrared detectors are used, one on each side of the helmet, to determine when a subject has been shot. The same circuit board used in the weapon is used in the helmet to control all the parts and communicate with the instructor operator station.

Our heart monitors (see Figure 19.8) use a standard electrocardiogram to measure heart activity. Individual heartbeats are detected onboard the device. The time between heartbeats is then used to compute heart-rate variability (Hoover & Muth, 2004). Heart-rate variability gives a longer-term measure related to the state of autonomic arousal of the subject, while heart rate gives the more familiar shorter-term measure related to physical activity.
Figure 19.4. Custom Laser-Tag-Type Weapon
Figure 19.5. Orientation Sensor in Weapon Barrel
Figure 19.6. Custom Circuit Board in Weapon Stock
Figure 19.7. Custom Helmet for the Laser-Tag System
Figure 19.8. Wireless Heart-Rate Monitor
We expect the data measured by this device to be noisy, as the subject will be mobile and active. Therefore, the monitor includes methods to overcome errors in heartbeat detection while still accurately measuring heart-rate variability (Rand et al., 2007).
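Both measures are derived from the interbeat (R-R) interval series. As an illustration, the Python sketch below computes heart rate and RMSSD, one common variability index, from hypothetical interbeat intervals; the real-time index actually used by the monitor is the one described by Hoover and Muth (2004).

from math import sqrt

ibi = [0.82, 0.80, 0.85, 0.78, 0.88, 0.81]  # hypothetical R-R intervals (s)

heart_rate = 60.0 / (sum(ibi) / len(ibi))   # beats per minute

# RMSSD: root mean square of successive differences, a common HRV index.
diffs = [b - a for a, b in zip(ibi, ibi[1:])]
rmssd = sqrt(sum(d * d for d in diffs) / len(diffs))

print(round(heart_rate, 1), round(rmssd * 1000, 1))  # bpm and RMSSD in ms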
Lessons Learned

• MILES gear is not well published; neither is laser-tag equipment in general. It is difficult to find schematics (ours is now in the literature; Waller et al., 2006). The laser-tag gaming industry does not produce equipment suitable for research and military training.
• Wireless connections should be handled by a standard, such as 802.11. Off-the-shelf chips are available for use in embedded systems without the need to design custom radio frequency methods. Essentially, wireless connections have become a commodity; for building-sized distances, 802.11 is ideal.
• Orientation sensing for small embedded systems is a relatively new endeavor and is not yet standardized (Waller, Hoover, & Muth, 2007). Until there is a standard method for performance evaluation, engineers and scientists should be wary of commercially published specifications for orientation tracking.
• Instrumentation should be worn on a helmet, a vest, and a weapon. Although the torso does not necessarily need to be tracked independently of other body parts, it is the most commonly fired-upon body part. It is useful to track the head and weapon orientations of trainees.
CONCLUSION

In an ideal situation, it would be possible to track the actions of soldiers as they clear buildings in active combat; of course, the conditions of that task leave no time or concern for instrumentation for data collection. It has already become standard practice for trainees to practice building clearing exercises in mock towns and buildings, and this practice is now being extended by instrumenting those facilities in order to record the exercises. To date, very little literature has been published concerning the construction and instrumentation of shoot houses, or facilities where building clearing exercises are conducted. Our goal with this chapter was to document our facility and the lessons we learned in the hope of changing this trend.

REFERENCES

Hoover, A., & Muth, E. (2004). A real-time index of vagal activity. International Journal of Human-Computer Interaction, 17(2), 197–209.
Hoover, A., & Olsen, B. (1999, May). A real-time occupancy map from multiple video streams. Proceedings of the IEEE International Conference on Robotics and Automation (pp. 2261–2266). Washington, DC: IEEE.
Olsen, B., & Hoover, A. (2001). Calibrating a camera network using a domino grid. Pattern Recognition, 34(5), 1105–1117.
Rand, J., Hoover, A., Fishel, S., Moss, J., Pappas, J., & Muth, E. (2007). Real-time correction of heart interbeat intervals. IEEE Transactions on Biomedical Engineering, 54, 946–950.
Waller, K., Hoover, A., & Muth, E. (2007). Methods for the evaluation of orientation sensors. Proceedings of the 2006 World Congress in Computer Science, Computer Engineering, and Applied Computing (pp. 141–146).
Waller, K., Luck, J., Hoover, A., & Muth, E. (2006). A trackable laser tag system. Proceedings of the 2006 World Congress in Computer Science, Computer Engineering, and Applied Computing (pp. 416–422).
Part VI: Relevance of Fidelity in Training Effectiveness and Evaluation
Chapter 20
IDENTICAL ELEMENTS THEORY: EXTENSIONS AND IMPLICATIONS FOR TRAINING AND TRANSFER

David Dorsey, Steven Russell, and Susan White

In this chapter, we consider the issue of training transfer from an identical elements theory perspective. When organizations invest time and money in training, they do it for a specific reason: to realize a return on investment in terms of increased performance. However, even assuming task- or job-relevant training, there are no guarantees that what trainees learn in a training context will help them perform better on the job. This is the “transfer of training” problem that remains unresolved after 100 years of discussion in the academic literature (see Barnett & Ceci, 2002; Cox, 1997). One of the most fundamental topics in this debate, and the one of central interest in this chapter, is how to construct a training environment for maximum transfer. In particular, we consider the implications of this debate for training that occurs in virtual environments (VEs).

Our starting point is the seminal work of Thorndike and Woodworth (1901a, 1901b, 1901c) and the idea of identical elements. The basic tenet of this theory is that transfer is facilitated to the extent that the training environment matches the performance environment in terms of physical features, cognitive requirements, and the like; the more dissimilar the features, the more transfer will be degraded. The theory has great intuitive appeal, and its basic approach guides many modern theories of instructional design.

From its conception, identical elements theory was quite different from the theories that prevailed at the time of Thorndike and Woodworth’s research. For example, their work appeared when the “formal” or “mental discipline” model was popular, which argued that people can strengthen their mental capacities (that is, general cognitive capabilities) by exercising their brains. By learning difficult subject matter, such as Latin, one’s capacity for doing other difficult things, such as performing difficult jobs, would presumably be improved. Despite its intuitive appeal, little research evidence supports the idea that performance on a task can be improved by improving performance on an unrelated one.

Identical elements theory was also a marked departure from the perspectives of the Gestaltists. The idea of dividing a training environment into “elements” is
inconsistent with the view that the whole is greater than the sum of its parts. Of course, putting identical elements theory in a Gestalt context does remind us to consider that it may be combinations or groups of features that need to be consistent between training and performance environments, and that some elements may not be separable from others or may cue others spontaneously (Cormier, 1984).

When we describe the training environment and its essential features, we should clarify that we are referring to the features that contribute to creating a particular psychological or perceptual state within learners. Note that in Thorndike’s original works (for example, Thorndike & Woodworth, 1901a), the similarities of interest between training and performance environments were those between tasks or “mental habits,” not just the physical environment. Thus, in identifying the elements that should be matched, we focus on those that create a similarity in cognitive functioning between the two environments, as well as those that re-create the physical environment. Thinking of identical elements theory only in terms of matching features of the physical environment, and failing to distinguish between the physical and psychological features of the environment, substantially limits the depth and applicability of the theory. A wide variety of environmental dimensions impact the degree of transfer between learning and performance environments (see Barnett & Ceci, 2002, for a comprehensive taxonomy of transfer dimensions).

MODERN EXTENSIONS OF IDENTICAL ELEMENTS THEORY

Although the difficulty of operationalizing identical elements has been recognized since their introduction, most modern accounts of training transfer invoke, at a minimum, the tenets of identical elements theory as a jumping-off point. Fortunately, training research has progressed to the point of suggesting the transfer mechanisms necessary to overcome the absence of identical elements. In this chapter, we review these mechanisms in terms of how they extend identical elements theory, particularly as it applies to virtual environments.

Transfer mechanisms can be arranged into three general categories: person based, design based, and situation based. This type of classification scheme is consistent with previous efforts to organize the transfer-of-training literature (for example, Baldwin & Ford, 1988). Examples of person based mechanisms include cognitive abilities, noncognitive traits, training motivation, and skills. The most relevant design based mechanism for a discussion of virtual environments concerns VE fidelity. Last, situation based mechanisms include characteristics of the social/training environment and levels of analysis (for example, individual training versus team training versus teams-of-teams training). Each of these mechanisms extends identical elements theory and will be discussed in turn.

Person Based Transfer Mechanisms

Person based transfer mechanisms are individual characteristics (knowledge, skills, and abilities) that promote transfer in the absence of identical elements
between learning and transfer contexts. A review of these mechanisms will assist us in building a profile of the trainees most likely to demonstrate transfer.

Cognitive capabilities are among the most robust of these transfer mechanisms (Chen, Thomas, & Wallace, 2005; Day, Arthur, & Gettman, 2001; Colquitt, LePine, & Noe, 2000; Holladay & Quiñones, 2003; Singley & Anderson, 1989). For example, Colquitt et al. meta-analyzed studies from the training motivation literature and reported a corrected r = .43 relationship between cognitive ability and transfer. This suggests that high ability trainees can more easily overcome gaps between learning and transfer environments.

The manner in which trainees symbolically represent knowledge also plays an important role in transfer. John R. Anderson and his Carnegie Mellon University colleagues reframed Thorndike’s identical elements to refer to units of procedural and (to a lesser extent) declarative knowledge across settings (Anderson, Corbett, Koedinger, & Pelletier, 1995; Singley & Anderson, 1989). Specifically, transfer occurs when students have learned the same production rules (that is, if-then rules) in one setting that are required to succeed in a second setting. Without knowledge of the exact production rules required in a transfer context, cognitive skill can be demonstrated only to the extent that the learner manages to convert or translate known productions or declarative knowledge into new actions. Many cognitive based training interventions seek to influence the thought processes of novices by exposing them to experts’ knowledge representations of training tasks. Anderson’s laboratory has produced an impressive body of evidence supporting its approach to computer based tutoring (based on the adaptive control of thought–rational [ACT-R] theory of learning; Anderson et al., 1995), which tutors students by giving feedback based on discrepancies between student behaviors and a cognitive model of the problem. In a related vein, the authors of this chapter are presently conducting research investigating the effect of feedback on performance across different U.S. Navy air defense warfare (ADW) scenarios; the feedback we are using, not unlike the ACT-R tutors, is based on comparisons of student behaviors to those of an “expert” computational model. The positive link between trainee/expert knowledge structure similarity and transfer has also been demonstrated by Day et al. (2001) in a complex video game task.

A number of noncognitive characteristics can also facilitate transfer. Self-efficacy, the belief one has in his or her ability to successfully complete a given task (Bandura, 1997), is perhaps the most frequently studied of these mechanisms (for example, Ford, Smith, Weissbein, Gully, & Salas, 1998; Mathieu, Martineau, & Tannenbaum, 1993; Mitchell, Hopper, Daniels, George-Falvy, & James, 1994; Stevens & Gist, 1997). In their meta-analysis of the training literature, Colquitt et al. (2000) reported moderate-sized relationships between self-efficacy and transfer, including corrected correlations of r = .47 for pre-training self-efficacy and r = .50 for post-training self-efficacy. Thus, in our efforts to profile the trainees most likely to transfer new skills to the work environment, two critical components are ability and belief in one’s ability.

Ford et al. (1998) found support for a model that linked both self-efficacy (directly) and an “identical elements” learning strategy (indirectly) to transfer in
an ADW training simulator. In a novel extension of identical elements theory, learners in this study were asked to choose the task complexity of their practice trials, with the understanding that the most complex practice trial would be the most similar to the final transfer task. This allowed Ford et al. to operationalize identical elements between practice and transfer tasks simply as the number of self-selected practice trials of maximum difficulty; students choosing more complex practice trials minimized the practice-performance gap and performed better on the final task.

Other noncognitive moderators of transfer include goal orientation (for example, Fisher & Ford, 1998; Heimbeck, Frese, Sonnentag, & Keith, 2003), personality traits (for example, Herold, Davis, Fedor, & Parsons, 2002), motivation (for example, Mathieu, Tannenbaum, & Salas, 1992; Colquitt et al., 2000), and such work “involvement” variables as job involvement and organizational commitment (for example, Colquitt et al., 2000). However, as Baldwin and Ford (1988) noted, the utility of identifying individual difference transfer mechanisms is limited by the fact that in most circumstances all employees in a given organizational unit must undergo training, not merely the employees most likely to demonstrate transfer. Unfortunately, an aptitude-treatment approach to personnel training, whereby individual trainees are matched to optimal training interventions (for example, low fidelity versus high fidelity simulations), has received little research attention and still represents a “next frontier” for the instructional sciences.

A final person based transfer mechanism worth noting is the acquisition of general learning skills that foster the acquisition of other skills. Metacognition and adaptive behavior are two such skills. Metacognition (sometimes used interchangeably with “self-regulation”) refers to an individual’s knowledge of, and control over, his or her own thoughts (Flavell, 1979; Ford et al., 1998). An individual who is trained to monitor his or her own learning progress, for example, can more easily identify trouble areas and adjust learning strategies than trainees with less developed metacognitive skills. The advantage of metacognitive strategies is their wide applicability, which does not depend upon specific content or contexts. Although the value of metacognitive procedures for transfer has been pointed out in educational research (Cox, 1997; Pressley, Borkowski, & O’Sullivan, 1984), metacognitive based instructional strategies have been slow to catch on in the classroom (Moley et al., 1992). Recent research involving computer based learning tasks (Ford et al., 1998; Keith & Frese, 2005) reinforces the notion that metacognition promotes transfer and is likely to generalize to virtual environments.

In addition, current approaches to personnel training assume that ever-changing situational elements can be overcome by learning “adaptive” behavioral skills (for example, Bell & Kozlowski, 2002; Pulakos, Arad, Donovan, & Plamondon, 2000; Pulakos, Schmitt, Dorsey, Hedge, & Borman, 2002). Individuals who successfully modify their behaviors in response to changing work conditions are more adaptive than those who do not modify their behaviors (or those who do, but choose unsuccessful behaviors). Recent
research has documented that adaptability relates to training performance (Lievens, Harris, & Van Keer, 2003) and to other job performance criteria (Johnson, 2001). Although such a proposition has not been tested directly in the training literature, we suspect that individuals who can demonstrate adaptive performance will be more likely to transfer new skills beyond original learning contexts.

Design Based Transfer Mechanisms

Design based transfer mechanisms represent instructional design characteristics of the virtual environment that can be altered to promote transfer. Several instructional design features may impact transfer of training (for example, the use of advance organizers in training, encouraging self-reflection as a learning tool, and goal setting techniques), but we will maintain our identical elements focus here and discuss fidelity between the virtual learning and performance environments. Blade and Padgett (2002) defined fidelity as the “degree to which a VE . . . duplicates the appearance and feel of operational equipment (i.e., physical fidelity) and sensory stimulation (i.e., functional fidelity) of the simulated context” (p. 19). The transfer context is typically a real world setting (for example, the cockpit of an actual military aircraft); however, in certain instructional pipelines or sequences, VE training may be a precursor to training in a next-step, higher fidelity simulator.

According to identical elements theory, high fidelity training environments should facilitate greater transfer than low fidelity environments because the former more closely approximate the situational characteristics of the transfer environment (for example, ergonomic design, visual or auditory features, time pressure, and distractions). There is considerable evidence in conventional training and educational research that physical differences between the learning and transfer contexts affect transfer negatively (for example, Ceci, 1996; Chen & Klahr, 1999; Rovee-Collier, 1993; Spencer & Weisberg, 1986). Despite widely held views to the contrary, however, there is little evidence available to support a relationship between the degree of physical fidelity of VEs and transfer success (Koonce & Bramble, 1998; Lathan, Tracey, Sebrechts, Clawson, & Higgins, 2002; Salas, Bowers, & Rhodenizer, 1998). In fact, there is some evidence to suggest that distorting or augmenting the visual capabilities of VEs, in ways that no longer mirror real world parameters, can improve transfer (Dorsey, Campbell, & Russell, in press). Lathan et al. suggest that the primary benefit of high physical fidelity lies in motivating trainees: VEs with high tech appeal have greater “face” validity and therefore will be more likely to engage trainees.

Functional or psychological fidelity, however, does appear to be in play for training within virtual environments. In a convincing demonstration of the power of identical elements, Lathan et al. (2002) described a program of research demonstrating that participants who learn routes using VEs develop “orientation specificity.” Participants tested on routes aligned with their original training were successful, but struggled to transfer route-learning skills when the required route ran in a direction opposite (contra-aligned) to what had been learned in virtual
training. Rizzo, Morie, Williams, Pair, and Buckwalter (2005) described an ongoing research program examining the impact of what might be termed “emotional” fidelity on training outcomes. Specifically, Rizzo et al. were applying state-dependent learning principles (Overton, 1964) to test whether stress induced during virtual reality training could improve transfer to similarly stressful military environments. More research is needed to determine the strength of the relationship between degree of functional fidelity and transfer success, but at least some research does suggest that overlap in the functional/psychological elements of VEs promotes transfer.

Situation Based Transfer Mechanisms

Situation based transfer mechanisms represent social characteristics of either the training or transfer environment that can be altered to promote transfer. In conventional training studies, a number of researchers have examined perceptions of the post-training work environment as facilitators of transfer, including organizational culture, transfer climate, and supervisory support (for example, Facteau, Dobbins, Russell, Ladd, & Kudisch, 1995; Tracey, Tannenbaum, & Kavanagh, 1995; Smith-Crowe, Burke, & Landis, 2003). The social context of the transfer environment can determine whether skills that are learned in training will be supported and maintained or will be extinguished. Rouiller and Goldstein (1993) outlined several facets of an organization’s climate that promote transfer, including goal cues, social cues, task and structural cues, and feedback. Tracey et al. found empirical support for a model linking both transfer climate and an organizational culture of continuous learning (that is, a culture that places high value on social support, continuous innovation, and competitiveness) to transfer among a managerial population. Although the post-training environment has not been systematically studied in the VE literature, we believe that an unsupportive training climate can undermine the effectiveness of any training intervention.

Levels of Analysis

Last, levels of analysis can impact how both person based and situation based mechanisms operate. Chen et al. (2005) conducted one of the few existing studies examining how individual characteristics and processes influence transfer at both the individual and team levels. Using a low fidelity flight simulator, they found that task knowledge and skill had a greater influence on transfer at the individual level than at the team level, whereas the impact of efficacy was greater at the team level. Thus, previously conflicting findings from individual and team training research may be better understood using a multilevel perspective. Regarding situation based mechanisms, levels of analysis might moderate the impact of organizational climate on transfer. Because transfer climates (and cultures of continuous learning) probably differ among work groups within organizations, as well as across organizations, Baldwin and Ford (1988) speculated that the same training program conducted in different work groups or organizations might
result in different degrees of transfer. Until a greater number of multilevel studies of transfer accumulate, however, the influence of levels-of-analysis factors on transfer will remain largely speculative. Regardless, consideration of identical elements at multiple levels of analysis may be fruitful for both theory and practice. For example, a VE simulator may be quite similar to the actual performance environment at the level of individual performers, yet lack critical features at a team or group level that impact transfer.
CONCLUSION AND IMPLICATIONS FOR RESEARCH AND PRACTICE

Although some elements of Thorndike’s original identical elements theory were not fully explicated in terms of understanding the range of cognitive, social, and emotional elements and mechanisms that can impact transfer, training research and practice are unlikely to escape the fundamental tenets of identical elements. Completely identical training and performance environments may be an unrealistic goal, but, as suggested by Cox (1997), the practice of decomposing training and transfer situations into their constituent elements is fundamental to both developing effective learning interventions and conducting experimental science. Understanding similarity in elements may be even more important in VE training domains, as little is understood about transfer in such settings. For example, the kinds of knowledge or skills that are best suited to VE training have received little, if any, research attention. Declarative knowledge (that is, facts and figures) does not seem to be as natural a fit to VE training as procedural knowledge (that is, how-to knowledge), but what about spatial or sensorimotor skills, vigilance, memory, or complex problem solving? Van Buskirk, Cornejo, Astwood, Russell, Dorsey, and Dalton (Volume 1, Section 1, Chapter 6) present a framework for beginning to address these questions by mapping a taxonomy of training interventions to a complementary taxonomy of learning objectives.

To facilitate future work on identical elements in virtual settings, we offer a few summary thoughts and ideas regarding future research. First, as discussed above, any simulated environment, including a virtual one, can be characterized along various dimensions of fidelity (for example, physical fidelity and functional fidelity). From a design perspective, the fidelity of various features and elements represents potential points of correspondence between training and performance environments. Second, various factors known to moderate training effectiveness—be they cognitive, skill, social, affective, or environmental—must be considered alongside issues of identical elements. Such factors may act to amplify or attenuate the effects of identical elements discrepancies on transfer. Third, there is currently a dearth of research on many of the issues highlighted here. Further theory development and empirical research are needed, including measurement models and approaches to assess fidelity and identical elements correspondence in a multifaceted/multilevel manner. Research to inform the choice of VE based instructional techniques and strategies, in order to maximize identical elements and transfer across a wide variety of domains, does not currently exist.
By specifying the theory of identical elements, Thorndike and Woodworth (1901a, 1901b, 1901c) provided an important and foundational perspective on training and performance environments and related issues of transfer. By continuing to reflect upon their ideas, while considering modern extensions of identical elements, researchers and practitioners have much to gain in designing learning interventions that optimize transfer and promote learning that is reflected in the real world.

REFERENCES

Anderson, J. R., Corbett, A. T., Koedinger, K. R., & Pelletier, R. (1995). Cognitive tutors: Lessons learned. The Journal of the Learning Sciences, 4, 167–207.
Baldwin, T. T., & Ford, J. K. (1988). Transfer of training: A review and directions for future research. Personnel Psychology, 41, 63–105.
Bandura, A. (1997). Self-efficacy: The exercise of control. New York: Freeman.
Barnett, S. M., & Ceci, S. J. (2002). When and where do we apply what we learn? A taxonomy for far transfer. Psychological Bulletin, 128(4), 612–637.
Bell, B. S., & Kozlowski, S. W. J. (2002). Adaptive guidance: Enhancing self-regulation, knowledge, and performance in technology-based training. Personnel Psychology, 55, 267–306.
Blade, R. A., & Padgett, M. L. (2002). Virtual environments standards and technology. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications. Mahwah, NJ: Lawrence Erlbaum.
Ceci, S. J. (1996). On intelligence: A bioecological treatise on intellectual development. Cambridge, MA: Harvard University Press.
Chen, Z., & Klahr, D. (1999). All other things being equal: Acquisition and transfer of the control of variables strategy. Child Development, 70, 1098–1120.
Chen, G., Thomas, B., & Wallace, J. C. (2005). A multilevel examination of the relationships among training outcomes. Journal of Applied Psychology, 90, 827–841.
Colquitt, J. A., LePine, J. A., & Noe, R. A. (2000). Toward an integrative theory of training motivation: A meta-analytic path analysis of 20 years of research. Journal of Applied Psychology, 85, 678–707.
Cormier, S. (1984). Transfer of training: An interpretive review (Technical Report No. 608). Alexandria, VA: Army Research Institute for the Behavioral and Social Sciences.
Cox, B. D. (1997). The rediscovery of the active learner in adaptive contexts: A developmental-historical analysis of transfer of training. Educational Psychologist, 32(1), 41–55.
Day, E. A., Arthur, W., Jr., & Gettman, D. (2001). Knowledge structures and the acquisition of a complex skill. Journal of Applied Psychology, 86, 1022–1033.
Dorsey, D., Campbell, G., & Russell, S. (in press). Adopting the instructional science paradigm to encompass training in virtual environments. Theoretical Issues in Ergonomic Science.
Facteau, J. D., Dobbins, G. H., Russell, J. E., Ladd, R. T., & Kudisch, J. D. (1995). The influence of general perceptions of the training environment on pretraining motivation and perceived training transfer. Journal of Management, 21, 1–25.
Fisher, S. L., & Ford, J. K. (1998). Differential effects of learner effort and goal orientation on two learning outcomes. Personnel Psychology, 51, 397–420.
Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry. American Psychologist, 34, 906–911.
Ford, J. K., Smith, E. M., Weissbein, D. A., Gully, S. M., & Salas, E. (1998). Relationships of goal orientation, metacognitive activity, and practice strategies with learning outcomes and transfer. Journal of Applied Psychology, 83, 218–233.
Heimbeck, D., Frese, M., Sonnentag, S., & Keith, N. (2003). Integrating errors into the training process: The function of error management instructions and the role of goal orientation. Personnel Psychology, 56, 333–361.
Herold, D. M., Davis, W., Fedor, D. B., & Parsons, C. K. (2002). Dispositional influences on transfer of learning in multistage training programs. Personnel Psychology, 55, 851–869.
Holladay, C. L., & Quiñones, M. A. (2003). Practice variability and transfer of training: The role of self-efficacy generality. Journal of Applied Psychology, 88, 1094–1103.
Johnson, J. W. (2001). The relative importance of task and contextual performance dimensions to supervisor judgments of overall performance. Journal of Applied Psychology, 86, 984–996.
Keith, N., & Frese, M. (2005). Self-regulation in error management training: Emotion control and metacognition as mediators of performance effects. Journal of Applied Psychology, 90, 677–691.
Koonce, J. M., & Bramble, W. J., Jr. (1998). Personal computer-based flight training devices. International Journal of Aviation Psychology, 8, 277–292.
Lathan, C. E., Tracey, M. E., Sebrechts, M. M., Clawson, D. M., & Higgins, G. A. (2002). Using virtual environments as training simulators: Measuring transfer. In K. M. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications. Mahwah, NJ: Lawrence Erlbaum.
Lievens, F., Harris, M. M., Van Keer, E., & Bisqueret, C. (2003). Predicting cross-cultural training performance: The validity of personality, cognitive ability, and dimensions measured by an assessment center and a behavior description interview. Journal of Applied Psychology, 88, 476–489.
Mathieu, J. E., Martineau, J. W., & Tannenbaum, S. I. (1993). Individual and situational influences on the development of self-efficacy: Implications for training effectiveness. Personnel Psychology, 46, 125–147.
Mathieu, J. E., Tannenbaum, S. I., & Salas, E. (1992). Influences of individual and situational characteristics on measures of training effectiveness. Academy of Management Journal, 35, 828–847.
Mitchell, T. R., Hopper, H., Daniels, D., George-Falvy, J., & James, L. R. (1994). Predicting self-efficacy and performance during skill acquisition. Journal of Applied Psychology, 79, 506–517.
Moley, B. E., et al. (1992). The teacher’s role in facilitating memory and study strategy development in the elementary school classroom. Child Development, 63, 653–672.
Overton, D. A. (1964). State-dependent or “dissociated” learning produced with pentobarbital. Journal of Comparative and Physiological Psychology, 57, 3–12.
Pressley, M., Borkowski, J. G., & O’Sullivan, J. T. (1984). Memory strategy instruction is made of this: Metamemory and durable strategy use. Educational Psychologist, 19, 94–107.
Pulakos, E. D., Arad, S., Donovan, M. A., & Plamondon, K. E. (2000). Adaptability in the workplace: Development of a taxonomy of adaptive performance. Journal of Applied Psychology, 85, 612–624.
Pulakos, E. D., Schmitt, N., Dorsey, D. W., Hedge, J. W., & Borman, W. C. (2002). Predicting adaptive performance: Further tests of a model of adaptability. Human Performance, 15, 299–324.
Rizzo, A., Morie, J. F., Williams, J., Pair, J., & Buckwalter, J. G. (2005). Human emotional state and its relevance for military VR training. Proceedings of the 11th International Conference on Human Computer Interaction.
Rouiller, J. Z., & Goldstein, I. L. (1993). The relationship between organizational transfer climate and positive transfer of training. Human Resource Development Quarterly, 4, 377–390.
Rovee-Collier, C. (1993). The capacity for long-term memory in infancy. Current Directions in Psychological Science, 2, 130–135.
Salas, E., Bowers, C. A., & Rhodenizer, L. (1998). It is not how much you have but how you use it: Toward a rational use of simulation to support aviation training. International Journal of Aviation Psychology, 8, 197–208.
Singley, M. K., & Anderson, J. R. (1989). The transfer of cognitive skill. Cambridge, MA: Harvard University Press.
Smith-Crowe, K., Burke, M. J., & Landis, R. S. (2003). Organizational climate as a moderator of safety knowledge-safety performance relationships. Journal of Organizational Behavior, 24, 861–876.
Spencer, R. M., & Weisberg, R. W. (1986). Context-dependent effects on analogical transfer. Memory & Cognition, 14, 442–449.
Stevens, C. K., & Gist, M. E. (1997). Effects of self-efficacy and goal-orientation training on negotiation skill maintenance: What are the mechanisms? Personnel Psychology, 50, 955–978.
Thorndike, E. L., & Woodworth, R. S. (1901a). The influence of improvement in one mental function upon the efficiency of other functions. Psychological Review, 8, 247–261.
Thorndike, E. L., & Woodworth, R. S. (1901b). The influence of improvement in one mental function upon the efficiency of other functions: The estimation of magnitudes. Psychological Review, 8, 384–395.
Thorndike, E. L., & Woodworth, R. S. (1901c). The influence of improvement in one mental function upon the efficiency of other functions: Functions involving attention, observation, and discrimination. Psychological Review, 8, 553–564.
Tracey, J. B., Tannenbaum, S. I., & Kavanagh, M. J. (1995). Applying trained skills on the job: The importance of the work environment. Journal of Applied Psychology, 80, 239–252.
Chapter 21
ASSESSMENT AND PREDICTION OF EFFECTIVENESS OF VIRTUAL ENVIRONMENTS: LESSONS LEARNED FROM SMALL ARMS SIMULATION1

Stuart Grant and George Galanis

Fielded small arms appear to be reaching the limits of development (Jane’s Information Group, 2006/2007), and the operational environment facing Western militaries continues to increase in complexity. A greater likelihood of close quarters battle, more difficult friend-versus-foe discriminations, and the wider presence of noncombatants on the battlefield increase performance demands. Training and training technologies are one avenue for meeting the challenge. Virtual environments (VEs), as represented by the current generation of simulators for the training of marksmanship skills, are readily available as training solutions. Indeed, a number of commercially available marksmanship simulators are built from commercial off-the-shelf components. Such devices appear to offer significant cost savings compared to expensive-to-operate live ranges (English & Marsden, 1995). In addition, the simulators offer unprecedented levels of safety given that they do not employ live ammunition. Training is not subject to adverse weather conditions, and because simulators are instrumented extensively, there are possibilities for coaching and feedback to trainees that are not available in live ranges.

Compared to flight simulators, rifle-range simulators appear to be relatively simple environments: the simulation of a rifle and its ballistics is simpler than that of a modern aircraft’s systems and flight dynamics. However, researchers evaluating small arms simulators have been perplexed at the difficulty they have encountered in finding quantitative evidence of transfer of training or significant levels of correlation between marksmanship skills in the live range and performance in the simulators. To find greater levels of transfer and closer relationships between live and simulated fire, this chapter argues that higher levels of marksmanship training and more knowledge of how humans employ live and simulated small arms are required.
ASSESSING TRANSFER OF TRAINING

In evaluating a simulator for marksmanship training, the transfer of training to live-fire performance is the principal criterion. Although criteria for assessing a training device are naturally influenced by the benefits stakeholders seek from the device (for example, increased safety, reduced environmental impact, lower operating cost, or smaller footprint), if it cannot be demonstrated that the device contributes to successful live-fire performance, the other criteria are moot. Kirkpatrick (1959) identified trainee reactions, knowledge obtained in training, subsequent performance in the operational environment, and the impact on overall organizational performance as possible criteria. However, meta-analysis (Alliger, Tannenbaum, Bennett, Traver, & Shotland, 1998) indicates that neither evaluation of the training by the trainees nor the amount of knowledge acquired during training correlated strongly with subsequent performance on the job (r ≤ 0.21). If the goal is to determine the effect of training on performance in the operational setting, then it should be assessed directly.

In assessing transfer, the existing assessment tests employed by the target training organization are very valuable performance metrics because they permit comparison against historical data and have inherent meaning and validity to the training organization, which will strongly influence its acceptance of the device. However, these tests may incorporate factors that, while certainly relevant to effectively employing a rifle, are not strictly marksmanship per se. For an accurate evaluation, careful consideration must then be given to the requirements of the training device and how it is used. For example, the Canadian Personal Weapons Test—Level 3, used for assessing infantry use of the C7A1 assault rifle, includes a “run down” serial (Department of National Defence, 1995) that requires the firer, wearing fighting gear, to run 100 meters between successive timed target exposures. The physical fitness of the firer will certainly affect the soldier’s score. Whether and how a marksmanship training device should train physical fitness pertaining to marksmanship should therefore be established up front to frame the assessment.

In addition, many military marksmanship tests count hits of targets as the measure of marksmanship. This provides a single binary data point for each round fired. Although that is a meaningful result for combat, it is a relatively impoverished way to score performance. For these reasons, the collection of additional measures is worthwhile. Among other possible measures, constant and variable error of the impact point from the target’s center have desirable properties (Johnson, 2001; Taylor, Dyer, & Osborne, 1986). As continuous variables, they provide more information than binary scoring, and being measured on a ratio scale, they can support various summary and inferential statistics. Finally, they correspond to the zeroing and grouping aspects of marksmanship and so have inherent meaning to the subject matter.
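As a concrete illustration (a minimal sketch, not a procedure drawn from the studies cited above), constant error can be computed as the offset of the mean point of impact from the target center, and variable error as the dispersion of shots about that mean point. The coordinates and units below are invented for the example.

```python
import numpy as np

# (x, y) impact coordinates of each round, measured from the target center
# in centimeters (values are hypothetical).
impacts = np.array([[1.2, -0.5], [0.8, 0.3], [1.5, -1.1], [0.9, 0.0]])

mpi = impacts.mean(axis=0)            # mean point of impact
constant_error = np.linalg.norm(mpi)  # systematic bias: relates to zeroing
# root-mean-square radial distance about the MPI: relates to grouping
variable_error = np.sqrt(((impacts - mpi) ** 2).sum(axis=1).mean())

print(f"constant error: {constant_error:.2f} cm")
print(f"variable error: {variable_error:.2f} cm")
```

Because both quantities are continuous and ratio scaled, standard summary and inferential statistics apply to them directly, unlike hit/miss counts.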
Performance during skill acquisition is governed by the learner’s evolving supporting knowledge base. Initially performance is based largely on declarative knowledge that is more readily communicated verbally, but further practice of the skill results in the chaining together of the initially separate components of performance, until ultimately a smooth, automatic level of performance is reached (Anderson, 1983; Fitts, 1964). In applying this model of skill acquisition to marksmanship, Chung, Delacruz, de Vries, Bewley, and Baker (2006) noted that different training devices could support different stages of marksmanship skill. This suggests not only that different types of training devices can support expert performance by supporting different stages of performance, but also that estimates of the efficacy of those devices may differ depending on the stage of skill attained when the estimate is made.
PREDICTION

Accurate predictions by a simulation are desirable for the purposes of validation and effective employment. The correspondence between the results people obtain in the simulated and live environments speaks to the validity of the simulation. Furthermore, a training simulator that accurately predicts live performance allows live training and testing to be scheduled only when the trainees are ready. The ability to predict live-fire performance from results obtained in marksmanship simulators is limited (see Chung et al., 2006, for a good set of references). Correlations between performance in marksmanship simulators and live fire are typically in the r = 0.3 to 0.4 range, usually accounting for less than 20 percent of the variance in live-fire scores (variance explained equals r², so r = 0.3 to 0.4 corresponds to roughly 9 to 16 percent). This is typical across various types of simulators. Simulators with dedicated simulator weapons that use lasers have provided correlations with live-fire scores ranging from 0.01 to 0.45 (Filippidis & Puri, 1999); 0.02 to 0.28 (Filippidis & Puri, 1996); 0.4 (Gula, 1998); 0.41 (Yates, 2004); and 0.68 (Hagman, 1998). A training device employing a laser insert for regular service weapons has achieved similar correlations, ranging from 0.16, 0.24, and 0.55 (Smith & Hagman, 2000, 2001) to 0.5 and 0.55 (Smith & Hagman, 2003).

A surprising outcome of research looking for correlations between marksmanship performance in simulators and the live range is that performance in simulators appears to be worse (Filippidis & Puri, 1996; Yates, 2004). This is surprising because anecdotal evidence based on face validity suggests that recoil in simulators is significantly weaker and that the absence of live ammunition should make the simulator a less stressful environment; hence the expectation is that simulators should yield superior marksmanship performance. Further investigations into this effect indicate that pixelation of targets in the simulator degrades marksmanship performance. When targets on the live range were modified to exhibit pixelation similar to that present in a simulator, marksmanship performance degraded by the same amount found in simulated conditions (Temby, Ryder, Vozzo, & Galanis, 2005). This finding suggests that eye-limited resolution of targets is a necessary requirement for simulators that are to be used for prediction of live-fire performance.

Predictions that can account for substantial amounts of variance in live-fire scores (especially up to 46 percent) can have practical value in screening trainees for live-fire training or testing (Smith & Hagman, 2003). It is worth noting,
however, that questionnaires regarding affect and knowledge have shown equivalent predictive power (Chung et al., 2006).

TRANSFER OF TRAINING

Researchers attempting to find transfer of skill acquired in marksmanship simulators to live firing have often focused their attention on measuring the ability of the devices to train for performance on defined rifle marksmanship qualification tests. This approach achieves direct relevance to the military client’s training requirement and exploits the underlying validity of the qualification test. The typical control condition is the standard method of instruction. Obtaining solid support for the training effect has proven surprisingly elusive. Support is often based on the finding of equivalent live-fire test scores for those trained on the device of interest and those trained in the conventional manner. However, this approach is dependent on the statistical power available for the comparison.

White, Carson, and Wilbourn (1991) substituted marksmanship simulator training for the dry fire and sighting exercises used for U.S. Air Force Security Police weapons training. Their results showed no overall difference between the simulator-trained group and the control group, although the trainees with less prior weapons experience achieved higher scores if they received simulator training. The relatively large sample size (n = 247) provided the basis for good statistical power, making the claim of training equivalence convincing. The treatment effect was weak, however: an experimental manipulation of 30 minutes of conventional training versus 10–20 minutes of simulator training did not appear to provide much room for a training effect. Both the control and experimental groups achieved low, failing scores on the post-training, live-fire practice test, but then more than doubled their scores when the test was immediately repeated for qualification purposes.

Hagman (2000) found significant benefits of using a laser insert device (Laser Marksmanship Training System) over a suite of other training devices in grouping, zeroing, and known range firing. These were the tasks actually trained on the devices. The experimental group’s advantage was not repeated on other tasks that comprised the marksmanship course. Both the control and experimental groups performed well on the live record fire test, with no significant difference between them. Yates (2004) also found equivalence between one platoon trained using a dedicated laser based simulator and another trained with dry fire. Both platoons achieved success on the final qualification test, although inclement weather experienced by the experimental group on the range could have suppressed a training benefit. English and Marsden (1995) detected no difference between the scores of soldiers trained with a dedicated laser system and those of soldiers trained using live fire. Testing similar simulator technology, Grant (2007) found that soldiers trained entirely in simulation could obtain qualification results that were successful and indistinguishable from those of soldiers trained entirely with live fire, and that an equivalent amount of training using a mix of live and simulated fire produced significantly superior results.
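Because several of the studies above infer training value from a failure to find a difference, the statistical power behind such equivalence claims deserves explicit attention. The sketch below is illustrative only; the effect size and group sizes are assumptions, not values taken from the studies cited.

```python
# Power to detect a medium effect (Cohen's d = 0.5) in a two-group
# comparison at alpha = 0.05, for two hypothetical sample sizes.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()
print(power.solve_power(effect_size=0.5, nobs1=20, alpha=0.05))   # roughly 0.34 with 20 per group
print(power.solve_power(effect_size=0.5, nobs1=124, alpha=0.05))  # roughly 0.97 with 124 per group
```

With small groups, a null result says little about training equivalence; with samples on the order of White, Carson, and Wilbourn’s (n = 247 overall), a claim of equivalence carries considerably more weight.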
CHALLENGES IN ASSESSING SIMULATIONS FOR MARKSMANSHIP TRAINING

Marksmanship scores encountered in the assessment of training devices typically show a large amount of error variance (Torre, Maxey, & Piper, 1987). Attempts to attribute scores in a live-fire environment to prior experience in a simulation environment, or to predict live-fire scores on the basis of performance on a training device, must contend with the fact that subject performance is unstable, as one would expect from various skill acquisition theories (Anderson, 1983; Fitts, 1964). Studies using live-fire training for live-fire testing as a control group typically show only weak consistency in scores. Torre et al. found correlations between live-fire sessions of 0.3 and 0.54. Over a one-year interval the correlation between successive live-fire tests has been found to be 0.37 (Smith & Hagman, 2000). Indeed, Hagman (2005) examined 180 trainees firing a 40-round test and found that the score achieved after 20 rounds accounted for less than 70 percent of the variance in the final score.

Assessments of marksmanship training devices are frequently hampered by the limitations imposed by the subject matter (the use of deadly weapons) and the subject populations (transient military personnel). Ideally, transfer of training studies administer a pre-test prior to any training to provide assurance that there are no preexisting differences among the experimental groups that could be mistaken for a differential training effect (or at least to provide a basis for controlling for such differences). This is not feasible if the training audience is without any prior experience with firearms. Limiting the control group’s training to that required to safely discharge the weapon (Boyce, 1987) may be the condition closest to the experimentalist’s ideal.

FIDELITY IN MARKSMANSHIP SIMULATION

Training technologies continue to increase in power. For example, Moore’s law (Moore, 1965) predicts a doubling of computational power every two years; theoretical predictions appear to suggest that this increase should continue for the coming decade, and some futurists suggest that this trend may continue in new forms beyond that period (for example, Moore, 1995). The implication is that new technology may provide new possibilities and modes of training delivery—including new types of VEs.

One of the simplest ways to apply new VEs to training systems is to replace existing live-training systems with new VEs. This approach to VE design has been common in the past and will arguably continue to be prevalent in the near future. In such a paradigm, the simulator designer’s role is reduced to analyzing an apparently functional live training environment and replicating that functional environment with a more cost-effective VE. Instructional staff and trainees are already familiar with the existing live environment, and hence there is no requirement to make significant changes to instructional and learning techniques once the new VE is introduced into service. This evolutionary approach places the emphasis of the VE design on the analysis of the existing live training
environment. As such, the main disciplines for the analysis of the live environment, and for the synthesis of the VEs replicating it, are the physical sciences and engineering. This approach minimizes—but does not completely eliminate—the requirement for detailed, costly studies of human learning and instructional techniques.

Determining Fidelity Requirements for Simulators

When considering VE fidelity requirements, a typical goal is to learn what elements of the environment must be simulated, and to what degree of fidelity, for the task to be trained. This question is framed by Lintern (1996) in the following form:

For instruction of high-speed, low-level flight in a fixed wing aircraft, a simulated visual scene with a Field of View of w° by h°, a frame rate of f Hz and a scene content level of s units when placed on a simulator cockpit of c type can substitute for n% of the hours normally required in the aircraft to reach proficiency. The time required in the simulator to achieve proficiency is t hours.
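A template of this kind also lends itself to being captured as structured data for any task domain. The sketch below is purely illustrative: the field names are hypothetical, and the marksmanship values are invented placeholders rather than validated requirements.

```python
from dataclasses import dataclass

@dataclass
class FidelityRequirement:
    """A Lintern-style fidelity statement captured as data."""
    task: str
    field_of_view_deg: tuple        # (width, height) in degrees
    frame_rate_hz: float
    scene_content_level: int
    platform_type: str
    substitution_percent: float     # n% of live hours replaced
    simulator_hours_to_proficiency: float

requirement = FidelityRequirement(
    task="known-distance rifle marksmanship",
    field_of_view_deg=(40.0, 30.0),
    frame_rate_hz=60.0,
    scene_content_level=3,
    platform_type="instrumented service rifle mock-up",
    substitution_percent=50.0,
    simulator_hours_to_proficiency=10.0,
)
```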
Although Lintern’s (1996) framing of the simulation fidelity problem is stated in terms of flight simulation and is limited to issues related to out-the-window scenery, such an approach can be translated into statements for other tasks and VEs. Given the apparently extensive knowledge of marksmanship, it would appear a relatively straightforward matter to list the issues identified in marksmanship, and, using the knowledge of one or more subject matter experts, produce a statement of performance requirements for a small arms trainer. For example, one could refer to such documents as an army marksmanship pamphlet and begin listing the major considerations in marksmanship (for example, Department of National Defence, 1994). Such factors as grouping requirements, target clarity, wind, and lighting requirements could be included. Similarly, design engineers could also refer to the design manuals for particular rifles (and other documents) to ascertain the operational characteristics of a rifle to determine the type of projectile, caliber of the rifle, characteristics of the firing mechanisms, and the functioning of the sights. Once a training system was designed, the evaluation process of the training devices could then include comparison of the training device to the specifications, as well as the subject assessment of the complete operation by expert marksmen. However, it appears that there are problems with the design approach discussed above and that such problems persist even though there has been research for several decades in this area. It has been suggested by a number of researchers that the design process places an overreliance on subject matter experts for developing the statement of requirements, and evaluation of the VEs, and that both aspects continue to have an overreliance on face validity as a measure of suitability of the design (Salas, Milham, & Bowers, 2003). We are not disputing the requirement for subject matter experts in the design and evaluation of such
systems, but we suggest that some of the shortfalls of current marksmanship simulators are occurring despite this design practice.
Limitations of the Rational and Replication Approach

One of the reasons why overreliance on face validity and subjective assessment of a training system occurs is that there is a disconnect between how psychomotor skills are actually performed and people’s ability to verbalize them: humans tend to confabulate explanations when asked how such skills are performed. Hence even subject matter experts may not be aware that they are performing the task in a manner quite different from the way in which they describe it. So, for example, when marksmen are asked how they perform the task, they may report recognizing the target, positioning and holding the weapon, pointing the weapon toward the target, and then carefully releasing the projectile toward the target. However, how exactly a target was “recognized,” the nature of the rates of movement involved in “pointing,” and the way the trigger was activated are difficult to articulate; research suggests these are not actually available to verbal consciousness.

Consider the apparently simple task of picking up a coin that is lying on a table. The broad parameters (similar to marksmanship) might include recognizing the coin to be picked up, positioning oneself close enough to reach the coin, and then reaching out to grasp the coin. However, even this apparently simple act has all sorts of complications. Research conducted by Westwood and Goodale (2003) investigated what was involved in picking up variously shaped objects. One example considered how the apparent size of a coin changes when it is surrounded by other coins. If the surrounding coins are smaller than the central coin, then the central coin “looks” larger than it really is, whereas if the surrounding coins are larger, the central coin looks smaller than it really is. However, Westwood and Goodale also found that although experimental subjects could verbalize the apparent change in size of the coin to be picked up, when subjects actually reached for the coin, the reaching and grasping behavior (the psychomotor component of the task) did not reflect the verbal descriptions. Goodale speculates that two different regions of the brain are possibly involved in such tasks—one region for the verbalization and recognition skills and the other for the psychomotor skills.

The academic research in coin-reaching experiments of the 1990s is reflected in real applications involving VEs. In the late 1950s and early 1960s, considerable research effort was directed at airplane accidents occurring in the approach to landing at night. During the investigations into how pilots actually perform this task, Lane and Cumming (1956) administered a questionnaire asking experienced airline pilots to indicate the geometry they believed they used in performing the slant perception task during the approach to landing. To their astonishment, just over 50 percent of the responses were geometrically implausible, while another 25 percent of the subjects stated they did not know how they performed the task. Only 25 percent of the respondents indicated geometrically plausible explanations. Lane and Cumming concluded that the only way to design an
improved landing system was through a lengthy process of analysis and evaluation of final task performance—and that expert opinion was of limited value.

Later research investigated the effects of simulator artifacts on the approach to landing. For example, simulator displays are pixelated, the scenery is not displayed at eye-limited resolution, and textures in synthetic scenery are not as rich or dense as those found in the real world. A series of rigorous evaluations conducted by Lintern and a number of co-researchers in the 1980s demonstrated that these simulator artifacts create biases in pilots’ slant perception, so there is a danger that pilots will incorrectly calibrate their perceptions while training in simulators (Lintern & Walker, 1991). The work on approach to landing thus lends weight to the research in visual perception (such as Westwood & Goodale, 2003): for psychomotor skills learning, VEs must be validated by empirical experiments, since expert verbalization cannot be relied upon to reveal the principles underlying human performance in complex psychomotor tasks.

The implication for VE design, then, is that asking a subject matter expert how a psychomotor task is performed may not reveal the actual learning underlying performance of the manual control part of a task. An analysis of the task to be performed (flying an aircraft at a low level or aiming a weapon) must therefore be based on empirical data, not solely on simple verbalizations. It requires models and evaluations of performance of the complete task as it is actually performed. Such analyses and evaluations are often time consuming and expensive, but the scientific literature indicates that they are critical.
CONCLUSION

Marksmanship simulators have been used as part of successful marksmanship training programs. Their contribution to trainee success is not always easy to estimate, however, and the relationship between performance in the VE and in live fire is weak. As these systems are refined, effort should be invested in obtaining more and better data regarding how the task is performed and what is learned. In particular, data should be collected on skilled firers whose performance shows little variable error within the rifle-firer system. Although the highest levels of expertise can take years to achieve (for example, Crossman, 1959), thereby making true experts difficult to find, using subjects who can demonstrate a high level of consistency across repeated live-fire tests will provide greater precision to developers and trainers. If reliable discrepancies can be found between the power of simulator and live-fire data to predict performance on a live-fire test, then researchers will be in a position to understand and overcome a simulator’s limiting factors. This does not assume that live fire will be the most reliable predictor or even that achieving a comparable level of prediction with a simulator demonstrates that all the underlying factors in marksmanship have been captured in the
simulator, but simply that a reliable research tool is available for evaluating simulator design. Additionally, extensive data collection on the acquisition of marksmanship skill should be sought. Theory-driven data collection on novices, experts, and people transitioning between those levels should be used to complement marksmanship subject matter experts. These data will inform decisions regarding visual resolution requirements, acceptable transport delays, and the type and precision of data required of the scoring systems.

Finally, improving the prediction and demonstrable transfer of training of current marksmanship simulators is a significant challenge, and the challenge may be increasing. There is an emerging call for marksmanship training to explicitly and thoroughly address the highly dynamic, close quarters, built-up situations that are characteristic of current operations (Ellison, 2005). Current marksmanship simulators face significant obstacles in presenting these situations (Muller, Cohn, & Nicholson, 2004). These situations can be created for soldiers using live simulation, but marksmanship training was not the driving force behind the technologies used to instrument the soldiers and simulate their weapons fire. Nevertheless, the knowledge gained in overcoming the challenges to existing marksmanship trainers will go a long way toward solving them in the live domain.

NOTE

1. This chapter was originally published by the Government of Canada, DRDC Toronto Publications, Jack P. Landdt, Ph.D., Editor.
REFERENCES

Alliger, G. M., Tannenbaum, S. I., Bennett, W., Traver, H., & Shotland, A. (1998). A meta-analysis of the relations among training criteria (Rep. No. AFRL-HE-BR-TR-1998-0130). Brooks Air Force Base, TX: Air Force Research Laboratory.
Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Boyce, B. A. (1987). Effect of two instructional strategies on acquisition of a shooting task. Perceptual and Motor Skills, 65, 1003–1010.
Chung, G. K., Delacruz, G. C., de Vries, L. F., Bewley, W. L., & Baker, E. L. (2006). New directions in rifle marksmanship research. Military Psychology, 18(2), 161–179.
Chung, G. K., Delacruz, G. C., de Vries, L. F., Kim, J., Bewley, W. L., de Souza e Silva, A. A., Sylvester, R. M., & Baker, E. L. (2004). Determinants of rifle marksmanship performance: Predicting shooting performance with advanced distributed learning assessments (Rep. No. A178354). Los Angeles: UCLA CSE/CRESST.
Crossman, E. R. F. W. (1959). A theory of the acquisition of speed-skill. Ergonomics, 2(2), 153–166.
Department of National Defence. (1994). The rifle 5.56 mm C7 and the carbine 5.56 mm C8 (Report No. B-GL-317-018/PT-001). Ottawa, Ontario, Canada: Department of National Defence.
Department of National Defence. (1995). Shoot to live: Part 1—Policy (Report No. B-GL-382-002/FP-001). Ottawa, Ontario, Canada: Department of National Defence.
Ellison, I. W. (2005). Current inadequacy of small arms training for all military occupational specialties in the conventional army (Master’s thesis; Rep. No. A425634). Fort Leavenworth, KS: U.S. Army Command and General Staff College.
English, N., & Marsden, J. (1995). An evaluation of the training and cost effectiveness of SAT for recruit training (Report No. DRA/CHS/HS3/CR95039/01). Farnborough, Hampshire, United Kingdom: Defence Research Agency.
Filippidis, D., & Puri, V. P. (1996). An analysis of Fire Arms Training System (FATS) for small arms training. In Annual Meeting of TTCP HUM Technical Panel 2. Toronto, Ontario, Canada: The Technical Cooperation Program.
Filippidis, D., & Puri, V. (1999, November). Development of training methodology for F-88 Austeyr using an optimum combination of sim/live training. Paper presented at the Land Weapons System Conference, Salisbury, South Australia.
Fitts, P. M. (1964). Perceptual-motor learning. In A. W. Melton (Ed.), Categories of human learning (pp. 243–285). New York: Academic Press.
Grant, S. C. (2007). Small arms trainer validation and transfer of training: C7 rifle (Rep. No. TR 2007-163). Toronto, Ontario, Canada: Defence Research and Development Canada.
Gula, C. A. (1998). FATS III combat firing simulator validation study (DCIEM Rep. No. 98-CR-26). North York, Ontario, Canada: Defence and Civil Institute of Environmental Medicine.
Hagman, J. D. (1998). Using the engagement skills trainer to predict rifle marksmanship performance. Military Psychology, 10(4), 215–224.
Hagman, J. D. (2000). Basic rifle marksmanship training with the laser marksmanship training system (Research Rep. No. 1761). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Hagman, J. D. (2005). More efficient live-fire rifle marksmanship evaluation (Rep. No. A762144). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Jane’s Information Group. (2006/2007). Executive overview: Infantry weapons. Jane’s infantry weapons. Alexandria, VA: Jane’s Information Group.
Johnson, R. F. (2001). Statistical measures of marksmanship (Rep. No. TN-01/2). Natick, MA: U.S. Army Research Institute of Environmental Medicine.
Kirkpatrick, D. L. (1959). Techniques for evaluating training programs. Journal of ASTD, 13(11), 3–9.
Lane, J. C., & Cumming, R. W. (1956). The role of visual cues in final approach to landing (Human Engineering Note 1). Melbourne, Australia: Aeronautical Research Laboratories, Defence Science and Technology Organisation.
Lintern, G. (1996). Human performance research for virtual training environments. Proceedings of the Simulation Technology and Training (SimTecT) Conference (pp. 239–244). Melbourne, Australia: Simulation Industry Association of Australia.
Lintern, G., & Walker, M. B. (1991). Scene content and runway breadth effects on simulated landing approaches. The International Journal of Aviation Psychology, 1(2), 117–132.
Moore, G. E. (1965). Cramming more components onto integrated circuits. Electronics, 38(8), 114–117.
Moore, G. E. (1995). Lithography and the future of Moore’s law. Proceedings of SPIE—Volume 2437 (pp. 2–17). Santa Clara, CA: The International Society for Optical Engineering.
Muller, P., Cohn, J., & Nicholson, D. (2004). Immersing humans in virtual environments: Where’s the Holodeck? Proceedings of the Interservice/Industry Training, Simulation, and Education Conference (pp. 1321–1329). Arlington, VA: National Training Systems Association.
Salas, E., Milham, L. M., & Bowers, C. (2003). Training evaluation in the military: Misconceptions, opportunities, and challenges. Military Psychology, 15(1), 3–16.
Smith, M. D., & Hagman, J. D. (2000). Predicting rifle and pistol marksmanship performance with the laser marksmanship training system (Tech. Rep. No. 1106). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Smith, M. D., & Hagman, J. D. (2001). A review of research on the laser marksmanship training system (ARI Research Note No. 2001-05). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Smith, M. D., & Hagman, J. D. (2003). Using the laser marksmanship training system to predict rifle marksmanship qualification (Research Rep. No. 1804). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Taylor, C. J., Dyer, F. N., & Osborne, A. (1986). Effects of rifle zero and size of shot group on marksmanship scores (ARI Research Note 86-15). Fort Benning, GA: U.S. Army Research Institute.
Temby, P., Ryder, C., Vozzo, A., & Galanis, G. (2005). Sharp shooting in fuzzy fields: Effects of image clarity on virtual environments. Proceedings of the 10th Simulation Technology and Training (SimTecT) Conference. Sydney, Australia: Simulation Industry Association of Australia.
Torre, J. P., Maxey, J. L., & Piper, S. (1987). Live fire and simulator marksmanship performance with the M16A1 rifle. Study 1: A validation of the artificial intelligence direct fire weapons research test bed (Vol. 1, Technical Memorandum No. 7-87). Aberdeen Proving Ground, MD: U.S. Army Human Engineering Laboratory.
Westwood, D. A., & Goodale, M. A. (2003). Perceptual illusion and the real-time control of action. Spatial Vision, 16, 243–254.
White, C. R., Carson, J. L., & Wilbourn, J. M. (1991). Training effectiveness of an M-16 rifle simulator. Military Psychology, 3(3), 177–184.
Yates, W. W. (2004). A training transfer study of the indoor simulated marksmanship trainer. Unpublished master’s thesis, Naval Postgraduate School, Monterey, CA.
Chapter 22
SIMULATION TRAINING USING FUSED REALITY

Ed Bachelder, Noah Brickman, and Matt Guibert

This chapter describes a novel mixed reality technique for real time, color based video processing for training applications, called “fused reality,” that combines software and off-the-shelf hardware. This technique allows an operator, using a helmet-mounted display, to view and interact with the physical environment in real time while viewing the virtual environment through color-designated portals (that is, painted surfaces, such as window panels). Additionally, physical objects can be deployed in real time into the virtual scene (for example, a person gesturing outside the simulator cabin can be virtually moved relative to the vehicle). Fused reality’s adaptive feature recognition allows for realistic set lighting, colors, and user movement and positioning. It also enables multiple keying colors to be used (versus just blue or green), which in turn allows “reverse chromakeying”—preserving only keyed colors and rendering all others transparent. This technology enables hands-on immersive training for a very wide range of environments and tasks.

FUSED REALITY

Due to physical constraints and fidelity limitations, current simulation designs often fail to provide both functional utility and immersive realism. Fused reality (Bachelder, 2006) is a mixed reality approach that employs three proven technologies—live video capture, real time video editing, and virtual environment simulation—offering a quantum jump in training realism and capability. Video from the trainee’s perspective is sent to a processor that preserves pixels in the near-space environment (that is, the cockpit) and, using blue screen imaging techniques, makes transparent the pixels of the far-space environment (outside the cockpit windows). This bitmap is overlaid on a virtual environment and is sent to the trainee’s helmet-mounted display (HMD). Thus the user can directly view and interact with the physical environment, while the simulated outside world serves as a backdrop.
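A minimal sketch of the compositing step just described follows; the function and array names are assumptions for illustration, not STI’s implementation. Given a live camera frame, a rendered virtual frame, and a boolean mask marking where the keyed (far-space) color was detected, the fused image preserves the near-space pixels and lets the virtual scene show through the portals.

```python
import numpy as np

def fuse(live_frame: np.ndarray, virtual_frame: np.ndarray,
         far_space_mask: np.ndarray) -> np.ndarray:
    """Overlay preserved near-space pixels on the virtual backdrop."""
    fused = live_frame.copy()
    fused[far_space_mask] = virtual_frame[far_space_mask]  # keyed pixels become portals
    return fused
```

Inverting the mask logic would give the “reverse chromakeying” mode mentioned above, in which only keyed pixels are preserved and all others are replaced.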
Fused reality is a technique conceived at Systems Technology, Inc. (STI). It is similar in certain respects to the blue screen technique that Hollywood has long used (such as that employed by Alfred Hitchcock in Vertigo). However, Hollywood processes its blue screening offline—STI conducts it in real time—and, in contrast with blue screening, the backdrop required by fused reality tolerates large variations in color aberration and lighting intensity.

HISTORY OF CHROMAKEY

Well before the film industry employed modern computer-generated imagery (CGI) to create stunning visual effects, many directors relied on simpler techniques. One of the earliest methods developed was the static matte. This technique is referred to as a static matte because the same roll of film is used to create the effect—there is no need (and really no ability) to overlay different mattes. The most common application of mattes consists of exposing two different parts of film to light at different times, which is known as a double-exposure matte. For example, many directors used the double-exposure technique to combine a tranquil terrain scene with a turbulent sky that is seemingly in fast-forward (with dark, seething clouds). This was accomplished by filming the tranquil scene with a sheet of black paper over the upper portion of the camera lens to prevent the sky from exposing the film. Once the initial scene was captured, the film would then be rewound and the terrain side of the film would be masked with the black paper (to shield the film that had already been exposed). The cameraman then filmed the stormy sky at a slower film speed, so that when the final film was played, two very different elements were combined into the same scene.

Another simple example of using static mattes is the creation of widescreen films. By simply placing thin black strips of paper on the top and bottom edges of a lens, the film is instantly converted to widescreen dimensions. This process is referred to as a hard matte (whereas a soft matte requires a film projectionist to mask the projector to create the widescreen effect). Unfortunately, static mattes are not very versatile and cannot be used with moving objects.

Following the invention of the static matte technique, the traveling matte was developed as a more complicated improvement that allows mattes to “follow” moving objects. In a traveling matte shot, multiple mattes are used to designate the exact shapes of different elements of a scene. For instance, if the scene consists of an actor falling from a building, one film would capture the actor simulating a fall in a studio and another would simply capture the building on location. Two different mattes are then created: one with the actor’s figure masked in black and one with the actor’s background (the studio) in black. For each frame, a new matte is created to adjust to the actor’s movement along the background (to account for arm and leg movements and so forth). Once the filtering is complete, the film consists of four different pieces: the two originals and the two mattes. Finally, the building image is combined with the actor’s blackened figure (so that a dark “gap” appears in the building); then the film is rewound and reexposed to the matte with the actor in it. Although this process did provide more flexibility in filming, it was very difficult to accomplish and required a tremendous amount of time and effort.

As CGI technology improved in the 1950s, the film industry began to conduct research in order to create better techniques for more efficient use of traveling
mattes. Two of the most prominent researchers in this field were Arthur Widmer and Petro Vlahos, who are widely credited with the development of the chromakey process (also referred to as blue screen or green screen). Widmer first began developing chromakey while working for Warner Bros., and it was soon used in the 1958 making of The Old Man and the Sea, the adaptation of Ernest Hemingway’s novel. Petro Vlahos’s work earned him an Oscar in 1964 for blue screen compositing technology.

With chromakey, a predetermined key color (often blue or green) is rendered transparent in one image in order to reveal the image behind it. Chromakey typically refers to the use of analog methods, whereas more modern processes rely on digital compositing techniques (henceforth, the process will be referred to as “blue screen”). The blue screen technique allows for the combination of multiple sets of film (or computer images) into one. The process begins by filming an actor in front of a blue background. When filming is complete, the images are put through a blue filter that allows only the background to be exposed on black and white film. This new image, referred to as the female matte, now consists of a black background with a blank space where the actor stood. Next, the original blue screen shot is processed through a green and red filter in order to capture the actor’s figure. This time, the black and white film, referred to as the male matte, shows a black figure where the actor stood and a clear background. It is important to note that in both the male and female mattes, the areas that are not black are actually clear (not white) because they are unexposed.

With the actor’s (inclusive and exclusive) mattes completed, it is now possible to start combining the background and foreground images. First, the background image is filmed using the male matte as a filter so that the male matte occludes the background image and prevents portions of it from being exposed (now the background image has an unexposed gap where the actor’s figure can be placed). Afterward, the original blue screen film (with the actor and the screen) is refilmed using the female matte as a filter so that only the actor’s figure is exposed on the film and not the background. Finally, the images are combined frame by frame using high powered computers or special film equipment (such as optical printers). This process can also be accomplished during production: rather than filming the entire scene and then compositing it afterward (postproduction), computers can be used to break down each frame as it is filmed and generate the composite. The ability of computers to do this in real time has opened the doors for many modern applications, such as the weather screen in many TV stations. The weather screen is what TV viewers see, but not what the weather anchor sees—in order for the weather anchor to view the simulated environment from his or her eyepoint, a helmet-mounted display is required.

Blue and green are typically used as the key colors because these two colors are not noticeably present in human skin tones. Furthermore, digital cameras preserve more detail in the green channel, and the color has a higher luminance value, so it needs less light. However, it is always important to consider the background image of the current scenario. If the background contains a lot of natural scenery (that is, grass or sky), it would be wise to use magenta as the key.
The Naval Postgraduate School has conducted research in flight training using chromakey, and its most recent project is called the Chromakey Augmented Virtual Environment (ChrAVE) 3.0 System (Hahn, 2005). This system is more hardware-intensive than fused reality, employing (1) a compositing device, (2) a video graphics array–to–digital scan converter, and (3) an analog-to-digital signal converter. Green light-emitting diode ring lighting is used to illuminate highly reflective material to produce the keying color; however, the main disadvantage of this technique is that the user’s viewing angle is limited to small deviations from head on. In other words, if the surface is viewed at an angle of 45° relative to the surface’s normal, very little of the source’s energy returns to the eye’s viewpoint, destroying the chromakey effect. Oda (1996) at Carnegie Mellon University uses stereoscopic pixel-by-pixel depth information in the form of a depth map as a switch (z-keying), allowing space to be segmented based on the detailed shape and position of surfaces. The frame rate of this process is very low (15 frames per second), which makes it unsuitable for much real time simulation training. Another serious drawback of the technique is that it fails when the background surface is featureless (such as a uniformly painted wall).
CHROMAKEY PROCESS
The blue in blue screening was chosen, as mentioned above, because blue is not present in human skin tones. These backdrops preclude the use of similar colors in the physical environment. Similarly, fused reality uses magenta as the target color, since it is rarely encountered in simulation environments. The color recognition technique used in fused reality can accurately distinguish between skin tones and magenta. Figure 22.1 shows the red, green, and blue (RGB) components that comprise the magenta color target. Due to nonuniformities across the material surface, as well as sensor and lens artifacts (there are darker areas within the magenta screen), there is a wide variance in RGB values. In order to algorithmically define the target color, scatter plots of the pixel colors were created, as shown in Figure 22.2, with the areas of the scatter plots approximated by bounding polygons. This polygon template technique was initially used by fused reality. Hue-saturation-value decomposition has since been identified as a simpler technique than RGB and is now used by fused reality to define a surface’s color. For the image in Figure 22.1, the corresponding saturation and value scatter plots can be defined via bands (instead of complex and relatively imprecise polygons) based on their probability densities (shown below the scatter plots). Thus it is possible to statistically define the color characteristics of an image simply through lower and upper boundaries—a much simpler process than the RGB mapping (which requires linear interpolation) shown in Figure 22.2. The robustness of this technique is demonstrated in Figure 22.3, where a magenta surface (shown top) mounted on a placard serves as a virtual display. The bottom photos in Figure 22.3 show two very different lighting environments (note the desk reflection brightness), but the magenta is correctly identified despite the variation in lighting.
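Band based classification lends itself to a very compact implementation. The following sketch uses OpenCV’s HSV conversion; the magenta band boundaries shown are illustrative assumptions, not the values used by fused reality (which would be fitted to measured scatter plots such as those just described).

```python
import cv2
import numpy as np

# Illustrative band boundaries for a magenta backdrop (OpenCV HSV ranges:
# hue 0-179, saturation and value 0-255).  A real system would fit these
# lower/upper bounds to the measured probability densities.
LOWER = np.array([140, 80, 60], dtype=np.uint8)
UPPER = np.array([170, 255, 255], dtype=np.uint8)

def key_mask(frame_bgr: np.ndarray) -> np.ndarray:
    """Return a mask that is 255 wherever the key color appears.

    Classifying with independent hue/saturation/value bands replaces the
    polygon tests needed in RGB space with six scalar comparisons.
    """
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    return cv2.inRange(hsv, LOWER, UPPER)

def fuse(frame_bgr: np.ndarray, virtual_bgr: np.ndarray) -> np.ndarray:
    """Show the virtual scene through every key-colored pixel."""
    mask = key_mask(frame_bgr)
    out = frame_bgr.copy()
    out[mask > 0] = virtual_bgr[mask > 0]
    return out
```

Because only the banded thresholds matter, per-pixel light level or color can also be remapped in the same pass, which is what allows a brightly lit backdrop to display a simulated night scene.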
Figure 22.1. RGB Components That Comprise the Magenta Color Target
The advantages of fused reality and its current recognition scheme thus include the following: (1) target color backdrops can be made from inexpensive and widely available cotton sheets, (2) the chromakey is independent of the viewing angle, (3) it can use any lighting (incandescent or fluorescent) and brightness that makes the backdrop visually distinguishable from its surroundings, (4) more than one target color can be used, and (5) the lighting level does not have to be low to simulate low light visual environments or night vision, as each pixel can be operated on to change the light level or color displayed to the user.
Figure 22.2. RGB Mapping of the Scatter Plot Shown in Figure 22.1
Figure 22.3. Lighting Variations of the Magenta Color Target
Fused reality divides space into the near tangible environment and the distant virtual environment and maintains high perceptual fidelity in both domains while minimizing computational expense. The user naturally encounters the high detail of the physical world through vision and touch, while excellent perception of the distant virtual world requires only a low to medium level of detail.
VISUAL SYSTEM
A preliminary visual system is shown in Figure 22.4, where a Sumix camera has been mounted onto an eMagin HMD. A 12 mm Computar lens is mounted on the camera, and an inertial head tracker made by Intersense (IC2) is attached. The camera is flush with eye level. The HMD has a diagonal 40° field of view and a resolution of 800 × 600 pixels. The system frame rate is approximately 70 hertz.
DEMONSTRATION
Fused reality was integrated with two of Systems Technology’s simulation products: ParaSim (a parachute simulator) and STISIM Drive (a driving simulator).
Figure 22.4. Preliminary Visual System
ParaSim
ParaSim was first developed for the U.S. Forest Service to train firefighting smokejumpers. More recent versions are used to train navy and air force aircrews, and a multiple concurrent jumper version has been developed for the Special Operations Command for mission planning and rehearsal. A typical configuration for ParaSim is shown in Figure 22.5, where a jumper is suspended in an actual parachute harness attached to scaffolding. The user views a virtual jump scene using an HMD. The immersive effect is limited, however, by the absence of visual cues corresponding to the physical tangible environment: harness, jump equipment, and limb location. The capability to view the virtual scene relative to one’s own boots would be especially helpful in assisting the jumper’s spatial orientation. In Figure 22.6 the scaffolding is shrouded with magenta cloth on all sides (including top and bottom) except the rear, which the jumper cannot turn to view. The trainee in the fused reality simulation sees the simulation display wherever the key color (in this case, magenta) exists. The monochrome drape over the simulation frame becomes an immersive display, completely surrounding the trainee, yet allowing him to see his own arm movements, body position, and the direction of his feet, as shown in Figure 22.7. The instructor’s displays in Figure 22.8 show the simulation controls, the simulation display, and the live video feed from the camera mounted on the HMD.
Figure 22.5. Typical ParaSim Configuration
STISIM Drive
This simulator was originally developed for the Arizona Highway Patrol to evaluate the fitness for duty of long-haul truckers (Stein, Parseghian, Allen, & Rosenthal, 1991). Recent applications include use by medical research institutions to study, for example, the effects of new drugs, the cognitive impact of brain injuries, and the effects of HIV (human immunodeficiency virus) medications. Current STI research applications include programs for the National Institutes of Health to study the impact of simulator based training on novice drivers (Allen et al., 2000) and the cognition of impaired drivers.
Figure 22.6. Shrouded ParaSim Configuration
The STISIM Drive device shown in Figure 22.9 is built around off-the-shelf hardware components and has the following key system features: a Pentium processor; the Windows 2000 Professional operating system; an nVidia GeForce4 graphics processor; a multiple-screen display providing a 135° field of view; an input/output interface processor for control inputs (that is, steering, braking, throttle, and optional clutch); steering force feedback (feel system); a network for distributed processing; and an optional motion base. Recent advances in commercial off-the-shelf laptop computer technology have enabled both of these simulators to be operated from a laptop computer, greatly enhancing their portability. Figure 22.10 shows a scene generated by STISIM Drive.
Figure 22.7. Immersive View of ParaSim Configuration
The STISIM Drive configuration employed with fused reality used a Honda car cab with a force-feedback steering wheel, a brake pedal, and a gas pedal as inputs to the simulation. Dashboard instruments, such as the speedometer and the tachometer, responded to the simulated car states. Magenta cloth was draped in front of the car cab as shown in Figure 22.11, so that the driver’s field of regard was approximately 135°. A flat screen monitor, mounted outside the rear left window, displayed the same image the user was seeing in the HMD. Figures 22.11 and 22.12 show views of the user looking forward right and forward, respectively (note the driver’s hands on the steering wheel in Figure 22.12).
Figure 22.8. ParaSim Configuration Instructor’s View
Figure 22.9. STISIM Drive Device
Some key advantages that fused reality offers in the driving simulation include the following:
• Unlimited field of regard—every window can be covered with magenta to create a virtual portal;
• Enhanced experimental capability—drivers can interact with physical objects (such as maps and cell phones);
• Enhanced immersion realism—drivers can observe and operate real equipment while being framed in an authentic near-field environment, also known as embedded simulation; and
• Lighting effects, such as blinding headlights, can be applied to both the virtual and video layers.
Figure 22.10. Scene Generated by STISIM Drive
Figure 22.11. View of Driver Looking Forward Right
Figure 22.12. View of Driver Looking Forward
VIRTUAL DEPLOYMENT OF PHYSICAL OBJECTS
One of the most powerful aspects of fused reality is the capability to capture real time video and maneuver that video within the virtual scene, allowing it to be occluded by virtual objects. This technique can significantly enhance realism and component recognition by users, since components of a virtual SolidWorks assembly could be textured with real time images of the actual assembly, with the physical images extracted individually by the object recognition tool. Figure 22.13 shows an example of this in the current version of fused reality. Here the physical object, a hand, is identified by pixel brightness (the background is a black cloth). The pixels associated with the hand are maneuvered in 3-D virtual space via a joystick. Note that the strut of the water tower is occluding the part of the hand that is behind it.
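The occlusion behavior just described amounts to a per-pixel depth test. The sketch below is a minimal illustration under assumptions of our own (the physical object is keyed by brightness against a black backdrop and carried at a single scalar depth), not the representation fused reality actually uses.

```python
import numpy as np

def extract_bright_object(video_gray: np.ndarray,
                          threshold: float = 0.35) -> np.ndarray:
    """Mask of pixels belonging to the physical object.

    With a black-cloth backdrop, anything brighter than the threshold
    is assumed to be the object (here, the hand).
    """
    return video_gray > threshold

def composite_with_depth(virtual_rgb: np.ndarray,
                         virtual_depth: np.ndarray,
                         object_rgb: np.ndarray,
                         object_mask: np.ndarray,
                         object_depth: float) -> np.ndarray:
    """Place the live-video object at a chosen depth in the virtual scene.

    An object pixel is drawn only where it is both part of the mask and
    closer to the eye than the virtual geometry already rendered there --
    this is what lets the water tower strut occlude the hand behind it.
    """
    visible = object_mask & (object_depth < virtual_depth)
    out = virtual_rgb.copy()
    out[visible] = object_rgb[visible]
    return out
```

Steering the object’s position and depth from a joystick (or from the simulated car state, in the driving example that follows) makes the live image shrink into the distance or loom larger as it approaches.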
Figure 22.13. Physical Image of a Hand in the Current Version of Fused Reality
Figure 22.14. Example of Fused Reality Technique in a Driving Simulation
The following is an example of how this technique could be used in a driving simulation. An actual person standing in front of the driver could gesture as a policeman commanding traffic at an intersection (Figure 22.14). Although the policeman is physically fixed at some distance from the car, the real time bitmap of the policeman that the driver sees can be made to move anywhere within the virtual scene. In this way the policeman’s gesturing image would appear small in the distance and loom larger as the vehicle approaches. The person playing the part of the policeman could view a screen mounted on top of the car so that he or she can respond to the simulated motion and position of the driver’s car. It should be noted that this scenario does not require the policeman to be in the same physical location as the trainee. Allowing the policeman (or a specific person, such as a police chief) to be filmed at a remote location makes the technology even more useful. Thus two trainees in different locations could be set up to interact with each other in a virtual world, enabling teamwork training. A key advantage of using live persons in fused reality rather than scripted models is that models perform according to a script, limiting scenario flexibility and adaptability. With live actors, all participants can interact in a more natural flow of events. As an example, if the driver does not initially notice the signals of the traffic policeman (and the policeman is virtually in danger of being run over), the officer can become decidedly more animated in his or her gestures—perhaps even dodging the vehicle as a last resort.
FUTURE FEATURES USING FUSED REALITY
Target areas (portals) could be designated by infrared (IR) and ultraviolet (UV) reflection, giving rise to virtual reality portal generation on command (that is, by directing an IR or UV source toward a reflective surface). Dual reality portals could be made by coating glass with a transparent material that reflects IR or UV light, so that the trainee sees a virtual environment while a naked-eye observer can view the actual environment that exists beyond the glass. This would allow training in actual vehicles, such as cars (driven in vacant parking lots) and aircraft, while a safety observer looks for potential conflicts or other hazards. Thus a trainee would experience the actual forces that he or she is effecting while operating in a virtual world.
CONCLUSION
This general description of fused reality, as applied to a parachute flight training system and a driving simulator, has demonstrated the difference between fused reality and traditional blue screen techniques. Fused reality offers a more cost-effective solution for real time image compositing, supporting multiple color keys, wider ranges of color aberration, and greater robustness to a variety of environmental lighting conditions. Virtual deployment of physical objects further expands the capabilities of simulation and greatly enhances user immersion. The superiority of fused reality lies in the fact that the near tangible environment and the distant virtual environment are both accessible to the human operator and a high perceptual fidelity is maintained in both domains with minimal computational expense. The user experiences the details of the physical world plus excellent perception of the generally less-detailed, distant virtual world.
REFERENCES
Allen, R. W., Cook, M. L., Rosenthal, T. J., Parseghian, Z., Aponso, B. L., Harmsen, A., et al. (2000). A novice driver training experiment using low-cost PC simulation technology. Paper presented at the Driving Simulator Conference (DSC) 2000, Paris, France.
Bachelder, E. N. (2006). Helicopter aircrew training using fused reality. In Virtual Media for Military Applications (RTO Meeting Proceedings No. MP-HFM-136, pp. 27-1–27-14). Neuilly-sur-Seine, France: Research and Technology Organisation.
Hahn, M. E. (2005). Implementation and analysis of the Chromakey Augmented Virtual Environment (ChrAVE) version 3.0 and Virtual Environment Helicopter (VEHELO) version 2.0 in simulated helicopter training. Master’s thesis, Naval Postgraduate School, Monterey, CA.
Oda, K. (1996). Z-Key: A new method for creating virtual reality. Retrieved April 25, 2008, from http://www.cs.cmu.edu/afs/cs/project/stereo-machine/www/z-key.html
Stein, A. C., Parseghian, Z., Allen, R. W., & Rosenthal, T. J. (1991). High risk driver project: Validation of the truck operator proficiency system (TOPS) (STI-TR-240601). Hawthorne, CA: Systems Technology, Inc.
Chapter 23
DISMOUNTED COMBATANT SIMULATION TRAINING SYSTEMS
Bruce Knerr and Stephen Goldberg
The term “dismounted combatant” may invoke a variety of colorful images, from a weather-beaten American Civil War cavalryman holding the reins of his horse to a medieval knight pinned to the ground by the weight of his armor. We will use it to describe contemporary army soldiers or marines who perform their missions in direct contact with the people, places, and objects in their environments rather than from inside a combat vehicle or via remote sensors or weapons. These are the soldiers whom we have traditionally described as “infantry.” Over the past decade two factors have conspired to make the job of the dismounted combatant more complex: changes in the variety and types of missions they perform and changes in the environment in which they perform those missions.
VARIETY AND TYPES OF MISSIONS
Dismounted combatants have traditionally been trained to conduct combat operations in an environment occupied almost exclusively by friendly and enemy forces. Today they are required to carry out a variety of activities ranging from food distribution and traffic checkpoint operation to combat, in an environment that includes a large number of people who are not clearly either friendly or enemy. Moreover, they must frequently transition rapidly from one type of activity to another. Their success is often dependent on the decisions and actions of relatively junior personnel (Krulak, 1999).
THE ENVIRONMENT
While we previously trained our infantrymen to operate in open terrain, there are two reasons why we must prepare them for urban combat as well. First, the world is becoming increasingly urban. Second, the two Gulf wars demonstrated U.S. superiority in open terrain; we can expect future enemies to attempt to fight us on urban terrain, which is more to their advantage. Urban areas are more complex than open terrain; buildings add a vertical dimension, limit visibility and communication, and are usually occupied by noncombatants, whose presence complicates decision making and limits options for the use of force.
Dismounted combatant simulations have different requirements than more traditional training simulations, such as flight or vehicle simulators, because dismounted tasks frequently have the following characteristics.
DIRECT INTERACTION WITH THE SIMULATED PHYSICAL ENVIRONMENT
Unlike crew members of aircraft or armored vehicles, dismounted combatants interact directly with their weapons and the objects in their environment. They walk and run through streets, climb stairs, throw grenades, and drop behind barricades. They fire their weapons by lifting them into position and pulling a trigger. They obtain information about their environment directly through their basic senses (sight, hearing, touch, and smell), not through an electronic display.
DIRECT PERSON-TO-PERSON INTERACTION UNMEDIATED BY EQUIPMENT
Communication with others is face-to-face and direct. They make eye contact and interpret posture and gestures.
EMPHASIS ON PHYSICAL ACTIVITY
The actions that dismounted combatants take are predominantly physical. While they make situation assessments, plan, evaluate alternative courses of action, and make decisions, the results of those cognitive activities are physical actions.
EARLY HISTORY—THE 1990s
The simulation networking program, begun in the early 1980s, and the close combat tactical trainer program, begun in 1992, established the feasibility of using networked simulators to train for combat in ground vehicles (Cosby, 1995). Interest in the use of immersive simulation for dismounted infantry training began in the early 1990s. Partly as a result of the efforts of Gorman (1990), a conference held in 1990 to discuss individual soldier systems and the role that an individual immersive simulator would play in their development provided the impetus for the initiation of research programs in the area (Goldberg & Knerr, 1997). The navy was the first service to produce a prototype of a virtual individual combatant simulator: the team tactical engagement simulator program, begun in 1993. The team tactical engagement simulator consisted of an 8' × 10' rear projection display, a demilitarized rifle tracker, a head tracker, a computer graphics generator, and system software.
Trainees moved through the virtual world using a foot pedal; pressure on the front of the pedal moved the trainee in the direction of gaze, while pressure on the back of the pedal moved the trainee in the opposite direction. J. H. Lind (Lind & Adams, 1994; Lind, 1995) used a structured process to obtain subject matter expert ratings, which indicated high potential usefulness of the team tactical engagement simulator for training tactical situations, marksmanship, discretionary decisions, mission preview, and mission rehearsal. No empirical evaluations were conducted. The army began its dismounted warrior network program in 1997. This program developed and evaluated a variety of different simulators and simulator technologies that could be used by dismounted combatants. The program did not evaluate training effectiveness, but did obtain data about task performance in the virtual simulators used (Lockheed Martin Corporation, 1997, 1998; Pleban, Dyer, Salter, & Brown, 1998; Salter, Eakin, & Knerr, 1999). It showed that soldiers could perform basic infantry tasks, such as engaging targets, simulating locomotion, and identifying people and objects, in the simulators. It also revealed limitations. Salter et al. identified four major areas for improvement following experiments conducted at the conclusion of the program. First, improved position and orientation tracking was necessary to improve weapons accuracy. Second, “walking” in the virtual world required conscious effort and may have impaired the performance of other tasks or interfered with training; soldiers did not acquire full proficiency in the simulators in the time available and consequently moved more slowly in the virtual world than in the real world and frequently collided with walls. The third was providing a means of nonverbal communication, such as gestures and facial expressions. The fourth was increasing the field of view of the visual displays.
RECENT HISTORY: 1999–2005
From 1999 to 2005, five army organizations conducted a series of related programs that shared a common assessment methodology and used nearly identical measures of task performance and training effectiveness. The organizations were the U.S. Army Research Institute Simulator Systems and Infantry Forces Research Units, the U.S. Army Simulation Training and Instrumentation Command, the U.S. Army Research Laboratory Human Research and Engineering Directorate, and the U.S. Army Research Laboratory Computational and Information Sciences Directorate. Each organization had a particular area of interest, but all worked together to explore concepts and systems. Evaluations conducted in 1999, 2001, 2002, 2004, and 2005 will be described. These were comprehensive assessments conducted with squads of soldiers, using as much of the developed technology as was feasible, in a realistic training exercise. The squads consisted of a squad leader (the primary trainee) and one or two four-man fire teams. Each assessment involved a different group of soldiers. The assessments differed in their procedural details, but each involved three squads of six to nine soldiers. Each squad conducted a series of tactical scenarios in the simulators over one or, more commonly, two days. Each scenario consisted of a planning period, scenario execution in the simulators (lasting about 20 minutes), and an after action review.
Only the 2004 assessment used a mix of immersive and desktop simulators; otherwise, immersive simulators were used exclusively. More detail on these assessments can be found in Knerr (2007). Typical hardware and software included the following networked components:
• SVS (Soldier Visualization Station) individual soldier simulators. The SVS is a personal-computer (PC) based dismounted infantry simulator developed by Advanced Interactive Systems, Inc. Immersive and desktop versions are functionally similar, but have different displays and controls. The immersive SVS uses a rear-screen projection system to present images (800 × 600 resolution) on a screen approximately 10 feet wide by 7.5 feet high. The soldier’s head and weapon are tracked using an acoustic tracking system. The soldier navigates through the environment via a thumb switch located on the weapon. The desktop SVS is functionally similar to the immersive SVS, but the soldier sits at a PC and views the simulation on a monitor. Immersive and desktop visual displays presented the same information. A joystick is used to control view, movement, and weapon use. In these assessments, squad and fire team leaders always used the immersive SVS, role-players always used the desktop SVS, and fire team members usually used the immersive SVS. The simulators were typically equipped with radio headsets, which permitted verbal communication within the squad and between the squad leader and his higher headquarters.
• Dismounted Infantry Semi-Automated Forces Operator Station. An operator and the exercise controller used this station. Dismounted Infantry Semi-Automated Forces was developed by SAIC to provide a realistic representation of simulated entities.
• Dismounted Infantry Virtual After Action Review System. The Dismounted Infantry Virtual After Action Review System is a PC based system developed by the Army Research Institute and the University of Central Florida Institute for Simulation and Training specifically to meet the after action review requirements for dismounted infantry in urban combat (Knerr, Lampton, Martin, Washburn, & Cope, 2002). It was used in all evaluations beginning in 2001. Its key capabilities are replay with synchronized audio and video, including the ability to jump to pre-designated segments or views, and the production of tabular data summaries. It also includes the capability to display building interiors and to capture and replay voice communications.
All assessments used essentially the same list of 54 soldier activities to determine how well soldier tasks could be performed in the simulators (from Pleban, Eakin, & Salter, 2000). Soldiers rated their ability to perform each activity as very poor, poor, good, or very good. The list of activities is shown in Table 23.1. Squad and fire team leaders also completed a questionnaire that asked them to rate their improvement in 11 areas as a result of their training. Improvement was rated on a four-point scale from “no improvement” (0) to “vast improvement” (3). The 11 areas are shown in Table 23.3.
ASSESSMENT RESULTS
The capabilities of the SVS were fairly static during the 2002–2005 period, and for this reason the ratings of the 53 soldiers who completed simulator capability ratings during those assessments were combined to produce the summary shown in Table 23.1.
Table 23.1. Combined Ratings of Simulator Capability (2002–2005)

Task | Mean Rating
Move through open areas as a widely separated group. | 2.49
Identify civilians. | 2.47
Execute planned route. | 2.42
Move in single file. | 2.38
Fire weapon in short bursts. | 2.35
Understand verbal commands. | 2.31
Locate assigned areas of observation, for example, across the street. | 2.30
Identify assigned sectors of observation. | 2.29
Move according to directions. | 2.29
Identify noncombatants within a room. | 2.27
Identify sector of responsibility. | 2.25
Execute the assault as planned. | 2.25
Communicate enemy location to team member. | 2.22
Move quickly to the point of attack. | 2.21
Aim weapon. | 2.21
Communicate spot reports to squad leader. | 2.21
Fire weapon accurately. | 2.19
Coordinate with other squad members. | 2.17
Identify covered and concealed routes. | 2.16
Identify safe and danger areas. | 2.13
Assume defensive positions. | 2.11
Locate support team positions. | 2.11
Use handheld illumination (flares). | 2.10
Identify enemy soldiers. | 2.08
Locate buddy team firing positions. | 2.06
Employ tactical handheld smoke grenades. | 2.03
Maintain position relative to other team members. | 2.00
Identify areas that mask supporting fires. | 2.00
Maneuver below windows. | 2.00
Use flash-bang grenades to help clear rooms. | 2.00
Take hasty defensive positions. | 1.98
Engage targets within a room. | 1.98
Scan from side to side. | 1.90
Look around corners. | 1.89
Determine other team/squad members’ positions. | 1.84
Take position to one side of a doorway. | 1.81
Locate enemy soldiers inside buildings firing at your unit. | 1.80
Move close to walls. | 1.77
Scan the room quickly for hostile combatants. | 1.75
Maneuver/move around obstacles. | 1.75
Use fragmentation grenades. | 1.70
Estimate distances from self to a distant object/point. | 1.67
Maneuver close to others. | 1.66
Take a tactical position within a room. | 1.66
Move past furniture in a room. | 1.63
Climb up or down stairs. | 1.59
Maneuver around corners. | 1.53
Visually locate the source of enemy fire. | 1.52
Move quickly through doorways. | 1.40
Distinguish between friendly and enemy fire. | 1.39
Maneuver past other personnel within a room. | 1.36
Determine the direction from which enemy rounds are coming. | 1.33
Scan vertically. | 1.19
Determine the source of enemy fire by sound. | 1.15

Note: N varies from 41 to 53. Ratings are on a scale from “very poor” (0) to “very good” (3).
Activities are ordered from best (very good, 3.0) to worst (very poor, 0.0). Thirty of the 54 activities were rated good or better (2.00 and above), 18 were rated between the good/poor midpoint (1.50) and good (2.00), and 6 were rated poor (1.00–1.49). Activities that were rated highly included outdoor movement, identification of types of people (civilians, noncombatants within a room, and enemy soldiers), identification of tactically significant areas (sectors of observation and responsibility), and individual weapons use (but not grenades). Poorly rated items included maneuvering indoors (close to others, past furniture, close to walls, around objects, past other personnel, around corners, through doorways, and up and down stairs) and identifying the source and type of fire (enemy or friendly) by either auditory or visual cues. Issues with maneuver and auditory cues will be addressed further in the discussion section.
Table 23.2 summarizes the overall results of the administration of the Training Effectiveness Questionnaire over the five assessments. It was completed by squad and fire team leaders in every assessment and by fire team members in the assessments conducted in 2004 and 2005. The pattern is consistent, with mean leader ratings increasing every year, from 0.82 (less than slight improvement) in 1999 to 2.06 (moderate improvement) in 2005.

Table 23.2. Training Effectiveness Questionnaire Results

Assessment Year | Leaders Mean (N) | Soldiers Mean (N) | Combined Mean (N)
1999 | 0.82 (9) | – | –
2001 | 1.24 (9) | – | –
2002 | 1.45 (9) | – | –
2004 | 1.74 (9) | 1.82 (18) | 1.79 (27)
2005 | 2.06 (9) | 1.30 (17) | 1.55 (26)
Combined 2002–2005 | 1.75 (27) | 1.57 (35) | 1.65 (62)

Note: Ratings are on a scale from “no improvement” (0) to “vast improvement” (3).

Table 23.3 compares leader and soldier ratings on individual tasks. Leaders report the most improvement in controlling their units, assessing the tactical situation, and communication. Soldiers report the most improvement in
communication and planning a tactical operation. Overall, the general pattern is for more improvement to be reported for planning, coordination, and control tasks and less improvement for more rigidly described tasks and drills. The 2002 assessment included an objective measure of unit performance, a 14-item checklist of unit behaviors that was scored independently by three raters for each scenario. Scores on similar scenarios improved with practice, providing rare evidence, beyond trainee opinion, that the training was effective.

Table 23.3. Combined Training Effectiveness Questionnaire Results (2002–2005)

Question | Leaders | Soldiers | Combined
N | 27 | 35 | 62
Assess the tactical situation. | 2.09 | 1.68 | 1.86
Control of squad/fire team movement during the assault. | 2.17 | 1.61 | 1.85
Communicate with members of your team or squad. | 1.80 | 1.83 | 1.82
Coordinate activities with your chain of command. | 1.80 | 1.71 | 1.75
Plan a tactical operation. | 1.72 | 1.76 | 1.74
Control squad or fire team movement while not in contact with the enemy. | 1.78 | 1.38 | 1.56
React to Contact Battle Drill. | 1.55 | 1.53 | 1.54
Control your squad or fire team. | 1.83 | 1.30 | 1.53
Clear a building. | 1.50 | 1.54 | 1.53
Locate known or suspected enemy positions. | 1.54 | 1.47 | 1.50
Clear a room. | 1.48 | 1.42 | 1.45
Mean | 1.75 | 1.57 | 1.65

Note: Ratings are on a scale from “no improvement” (0) to “vast improvement” (3).
DISCUSSION
Training Effectiveness
Soldiers and small unit leaders have reported that their skills improved as a result of training in dismounted combatant simulations, with the most improvement in controlling, coordinating, communicating, and planning and less improvement in the mechanics of tasks. Their reports have generally, if informally, been confirmed by observers. Objective measures of performance, obtained only in the 2002 assessment, indicated improvement in performance over the course of the training. There have been no successful attempts to measure the transfer of this training to a more advanced phase of training, such as a live simulation. These results were obtained despite a number of limitations. The total amount of time that trainees spent conducting training scenarios was relatively short (no more than three hours and usually considerably less). In most cases, the software was still under development and not fully problem-free. The scenarios were usually not tailored to the skill level of the leader or unit. The trainer/after action review leader was frequently unfamiliar with the technology and did not necessarily use it to best advantage. In terms of an overall training approach, dismounted infantry simulation appears well suited to provide “walk” level training in a “crawl, walk, run” sequence of instruction. In the assessments it was used in that way, although it was not usually possible to include the “run” phase. The crawl level consists of individual skill training and initial demonstration and practice of the collective tasks. The run level consists of exercises conducted in an instrumented facility using military versions of laser tag or paintball. Both the walk and run (virtual and live) training consist of the planning and execution phases of tactical scenarios, with each exercise followed immediately by feedback. In accordance with standard army practice, this feedback takes the form of an after action review during which the trainees seek to discover what happened, why it happened, and how to improve.
Soldier Task Performance
Fine or precision movement in confined areas, such as movement indoors, has consistently emerged as the most important improvement required. The difficulties of indoor maneuver are likely a result of several contributing factors: the size of the bounding box that detects collisions between soldiers and other objects and between soldiers, the limited texture and shading cues on the walls inside buildings, the relatively narrow field of view of both the desktop and the immersive visual displays, the problem of representing objects in the immersive simulators that are located between the soldier and the rear-projection screen, and the linkage between the direction of gaze and the direction of movement. While soldiers reported difficulty localizing auditory cues in the simulators, subject matter experts reported that it is also difficult to localize auditory cues in real urban combat situations.
It is not clear, therefore, whether this difficulty reflects limitations in the simulators or the complexity of the real world. Simulators may differ in the specifics of the physical actions that they permit, particularly with regard to locomotion and touch. For example, trainees in the SVS could not physically “throw” grenades. They selected a type of grenade from a menu, chose direction and launch angle by pointing their mock weapons, adjusted the desired force by moving a slider with a pair of buttons on the weapon, and pulled the triggers to launch the grenades. While this bears little resemblance to the physical act of throwing a real grenade, it does provide practice in deciding which type of grenade to use and where and when to employ it. Such constraints limit training effectiveness only if the actions that cannot be performed in the simulator are not trained by other means. It is important that dismounted combatant simulations be used as part of a planned sequence of instruction that provides trainees with the prerequisite skills prior to the simulation and provides subsequent live training to improve the physical skills that cannot be trained in the virtual simulation. Even though a simulator may not allow soldiers to perform many of the physical actions they need to perform in the way they need to perform them, it may allow them to practice and learn the cognitive skills they need. These include planning prior to the start of missions, maintaining situational awareness, and making appropriate decisions in complex situations.
RECOMMENDATIONS
Conduct a Large-Scale Evaluation Using Current Technology
The evidence that immersive simulation can provide effective training is not sufficient to justify the immediate acquisition of such systems on a large scale. However, it is sufficient to justify a rigorous training effectiveness evaluation of a prototype system that would permit quantification of the effectiveness of the training and comparison with alternatives. While this would be costly, in terms of both dollars and soldier time, it would lead to a more informed acquisition decision. This evaluation should consider objective measurement of both skill improvement on the simulator and transfer of those skills to a live simulation environment. It should involve enough units that any meaningful differences in training effectiveness can be detected. The evaluation should be embedded in the unit’s normal training progression, and unit personnel should be involved in its development and delivery.
Consider Cost-Effectiveness of Fully Immersive and Desktop or Laptop Systems
The cost differential between fully immersive simulators using large projection displays or head-mounted displays and simulators using desktop or laptop computers to provide the same functionality can be enormous. Knerr (2006) estimated the difference at over $75,000 per individual simulator.
While the cost difference is likely to decrease as the cost of large visual displays decreases, the major cost drivers for fully immersive simulators are the interface devices, particularly the position and orientation trackers and visual display systems; the computers are relatively cheap. The interface devices also increase space and support requirements. In contrast, the research evidence indicates that any difference in training effectiveness between immersive and desktop systems is likely to be small. Loftin et al. (2004) found small differences in effectiveness between immersive and desktop simulators and questioned the cost-effectiveness of the immersive simulator. A comparison of soldier (trained in the desktop simulators) and leader (trained in the immersive simulators) ratings of training effectiveness in the 2004 assessment raises the same question. Other research addressing the effect of interface fidelity on training effectiveness is limited. The basic concept is that immersive systems, as compared to desktop systems, provide the trainee with more information about their orientation in and movement through physical space. It appears that head-tracked visual displays, body-controlled movement, or a combination of the two can improve the performance of spatially oriented tasks and the acquisition of spatial knowledge (for example, Grant & Magee, 1998; Lathrop & Kaiser, 2005; Singer, Allen, McDonald, & Gildea, 1997; Waller, Hunt, & Knapp, 1998), but this difference does not appear to be large. Whether immersive simulators are also better when training squads to conduct urban or counterinsurgency operations and, if so, whether the difference is large enough to justify the increased cost are unknown, but should be investigated, perhaps in conjunction with the large-scale evaluation recommended above.
REFERENCES
Cosby, L. N. (1995). SIMNET: An insider’s perspective (IDA Document D-1661). Alexandria, VA: Institute for Defense Analyses. (ADA294786)
Goldberg, S. L., & Knerr, B. W. (1997). Collective training in virtual environments: Exploring performance requirements for dismounted soldier simulation. In R. J. Seidel & P. R. Chatelier (Eds.), Virtual reality, training’s future? (pp. 41–52). New York: Plenum Press.
Gorman, P. F. (1990). Supertroop via I-Port: Distributed simulation technology for combat development and training development (IDA Paper No. P-2374). Alexandria, VA: Institute for Defense Analyses. (ADA229037)
Grant, S. C., & Magee, L. E. (1998). Contributions of proprioception to navigation in virtual environments. Human Factors, 40(3), 489–497.
Knerr, B. W. (2006). Current issues in the use of virtual simulations for dismounted soldier training. In Virtual Media for Military Applications (RTO Meeting Proceedings No. RTO-MP-HFM-136, pp. 21-1–21-11). Neuilly-sur-Seine, France: Research and Technology Organization.
Knerr, B. W. (2007). Immersive simulation training for the dismounted soldier (Study Rep. No. 2007-01). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences. (ADA464022)
Knerr, B. W., Lampton, D. R., Martin, G. A., Washburn, D. A., & Cope, D. (2002). Developing an after action review system for virtual dismounted infantry simulations. Proceedings of the 2002 Interservice/Industry Training, Simulation and Education Conference. Arlington, VA: National Training Systems Association.
Krulak, C. C. (1999, January). The strategic corporal: Leadership in the three block war. Marines Magazine. Retrieved April 15, 2008, from http://www.au.af.mil/au/awc/awcgate/usmc/strategic_corporal.htm
Lathrop, W. B., & Kaiser, M. K. (2005). Acquiring spatial knowledge while traveling simple and complex paths with immersive and non-immersive interfaces. Presence, 14(3), 249–263.
Lind, J. H. (1995). Perceived usefulness of the Team Tactical Engagement Simulator (TTES): A second look (Rep. No. NPS-OR-95-005). Monterey, CA: Naval Postgraduate School.
Lind, J. H., & Adams, S. R. (1994). Team Tactical Engagement Simulator (TTES): Perceived training value (Rep. No. NAWCWPNS TM 7724). China Lake, CA: Naval Air Warfare Center Weapons Division.
Lockheed Martin Corporation. (1997). Dismounted warrior network front end analysis experiments (Advanced Distributed Simulation Technology II, Dismounted Warrior Network DO #0020, CDRL AB06, ADST-II-CDRL-DWN-9700392A). Orlando, FL: U.S. Army Simulation, Training and Instrumentation Command. (ADA344365)
Lockheed Martin Corporation. (1998). Dismounted warrior network enhancements for restricted terrain (Advanced Distributed Simulation Technology II, Dismounted Warrior Network DO #0055, CDRL AB01, ADST-II-CDRL-DWN-9800258A). Orlando, FL: U.S. Army Simulation, Training and Instrumentation Command. (ADA370504)
Loftin, R. B., Scerbo, M. W., McKenzie, R., Catanzaro, J. M., Bailey, N. R., Phillips, M. A., & Perry, G. (2004, October). Training in peacekeeping operations using virtual environments. Paper presented at the RTO HFM Symposium on Advanced Technologies for Military Training, Genoa, Italy. (ADA428142)
Pleban, R. J., Dyer, J. L., Salter, M. S., & Brown, J. B. (1998). Functional capabilities of four virtual individual combatant (VIC) simulator technologies: An independent assessment (Tech. Rep. No. 1078). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences. (ADA343575)
Pleban, R. J., Eakin, D. E., & Salter, M. S. (2000). Analysis of mission-based scenarios for training soldiers and small unit leaders in virtual environments (Rep. No. RR 1754). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Salter, M. S., Eakin, D. E., & Knerr, B. W. (1999). Dismounted warrior network enhancements for restricted terrain (DWN ERT): An independent assessment (Research Rep. No. 1742). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences. (ADA364607)
Singer, M. J., Allen, R. C., McDonald, D. P., & Gildea, J. P. (1997). Terrain appreciation in virtual environments: Spatial knowledge acquisition (Tech. Rep. No. 1056). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences. (ADA325520)
Waller, D., Hunt, E., & Knapp, D. (1998). The transfer of spatial knowledge in virtual environment training. Presence: Teleoperators and Virtual Environments, 7(2), 129–143.
Part VII: Training Effectiveness and Evaluation Applications
Chapter 24
CONDUCTING TRAINING TRANSFER STUDIES IN COMPLEX OPERATIONAL ENVIRONMENTS
Roberto Champney, Laura Milham, Meredith Bell Carroll, Ali Ahmad, Kay Stanney, Joseph Cohn, and Eric Muth
Training effectiveness evaluations (TEEs) are used to assess the amount of learning that occurs from a prescribed training regime and the degree to which the regime results in observable performance changes in the domain environment (that is, transfer of training [ToT]). Transfer effectiveness evaluations are a method for assessing the degree to which a system facilitates training on targeted objectives (see Cohn, Stanney, Milham, Carroll, Jones, Sullivan, and Darken, Volume 3, Section 2, Chapter 17, for a comprehensive review of training effectiveness evaluations). A TEE is critical to perform, as it is the primary method for truly understanding the efficacy of a training system or program. In particular, a ToT evaluation can assist in (1) making decisions regarding the adoption of new training regimes or systems by providing quantitative data to compare their relative value, (2) determining the level of training offered by different training platforms (for example, classroom, simulated environments, live training, and so forth), and (3) determining the correct mix of those platforms. Considering the importance of evaluating training effectiveness, it is surprising that there are reports of a general lack of robust evaluation practices in industry (Carnevale & Shultz, 1990; Eseryel, 2002). Similarly, the American Society for Training and Development (ASTD) has reported on the limited assessment of training effectiveness (compare Bassi & van Buren, 1999; Thompson, Koon, Woodwell, & Beauvais, 2002). The application of training effectiveness evaluations has been problematic (for example, Baldwin & Ford, 1988; Saks & Belcourt, 2006), marked by a lack of rigor and limited application of the trained constructs. Furthermore, as pointed out by Cohn et al. (Volume 3, Section 2, Chapter 17) and others (compare Flexman, Roscoe, Williams, & Williges, 1972), TEEs, and in particular transfer of training evaluations, are a resource-intensive process that is often infeasible to apply in real world settings (this is discussed in more detail later in this chapter). This has made it difficult for practitioners to conduct TEEs in the field.
The application of TEEs in the field faces three broad challenges, as illustrated by Cohn et al. (Volume 3, Section 2, Chapter 17): (1) the difficulty of developing meaningful measures of skill transfer and the large, time-consuming data-collection efforts required to implement them, (2) the need for an operational system or prototype to support the empirical evaluation, and (3) the multiple logistical constraints that surround TEEs (accessibility to trainees, scheduling, and so forth). The authors have tackled many TEE challenges and, in this chapter, address two of the issues presented above that are encountered while applying training evaluation methods: (1) the logistical constraints that lead to the utilization of untrained undergraduates in TEEs to assess learning and transfer and (2) the resource-intensive nature of traditional transfer of training evaluation methods, which requires more efficient approaches. The chapter discusses these challenges and provides a case study in which alternative methodologies were applied in a military operations on urban terrain (MOUT) domain (the sorts of tasks the infantry trains to perform, such as room clearing). These approaches are believed to render TEEs more feasible and cost-effective to conduct.
ROADBLOCKS TO CONDUCTING TRAINING EFFECTIVENESS EVALUATIONS
Training effectiveness evaluation participants representative of the target domain are required in order to ensure validity of a training regime, yet it can be difficult to obtain access to individuals from the target population. This chapter addresses one way to resolve this problem. Also, determining an appropriate transfer evaluation methodology (that is, a methodology for calculating the amount of training transfer to operational performance that a training system or program can achieve) and the amount of training a participant should receive in order to evaluate transfer is a challenge which, if incorrectly prescribed, can limit the ability to draw conclusions about transfer effectiveness. This chapter proposes an approach developed to address this issue.
Test Population
In applied fields, it is common to face limitations with respect to obtaining representative domain samples to conduct TEEs early enough in the training lifecycle to influence the training system design. Representative samples are often limited by availability, restricted access, willingness to participate, resource constraints (for example, cost), and so forth. In these instances, evaluators often recruit from substitute populations to compare design alternatives or to conduct other forms of empirical validation (Ward, 1993). It is imperative that the chosen substitute population sample possess sufficient knowledge, skills, and attitudes (KSAs) regarding the target domain to ensure validity of results. For example, a common practice is to recruit undergraduate students as experimental participants due to their availability. In these cases, unless these students obtain knowledge, skill, and attitude levels close enough to those of the target population to be representative, they may not provide an appropriate sample from which to generalize to the target population (Wintre, North, & Sugar, 2001).
There are other potential differences between a student population and a target population that may limit generalizability, including a lack of context (for example, task meaningfulness) and the artificiality of the experimental setting (Gordon, Schmitt, & Schneider, 1984). Lack of appropriate KSAs, however, is particularly important given the need for representative data when making inferences regarding the target population. Hence, novel ways of increasing the validity of utilizing undergraduates with limited basic competencies as transfer effectiveness evaluation participants were sought. The approach adopted and presented herein was to bring a sample of undergraduate students “up to speed” by putting them through a “bootcamp” (Champney, Milham, Bell Carroll, Stanney, Jones, et al., 2006). Specifically, the objective of the bootcamp was to bring undergraduate student participants closer to their target population counterparts in terms of knowledge, skills, attitudes, and an understanding of “what is at stake.” To accomplish this, the bootcamp consisted of a lecture, practice, rehearsal, feedback, and contextual references (for example, videos and pictures). The bootcamp methodology is discussed within the context of a MOUT case study reported later in the chapter. It is important to note that such a methodology is not intended to replace a real sample of the target population; it should be used only in the absence of a domain sample, to bring a substitute sample closer to a domain sample in terms of KSAs and context and thereby protect the validity of conclusions.
Transfer Evaluation Methodologies
Transfer of training refers to the extent to which learning of one task is facilitated or hindered by the prior learning of another task (Roscoe & Williges, 1980). In other words, transfer of training can be viewed as the degree to which a trainee’s abilities in the real world have been improved (or made worse) by prior training. TEEs use this degree of impact on a task to gauge the effectiveness of a training program or system (for example, percent transfer). Nonetheless, there are multiple options for how this impact can be computed, and it is the selection of the more robust measures that proves resource intensive. Of the multiple options for measuring transfer, the approaches that consider not only the impact produced by the training system or program but also its efficiency are the most robust. This is because two distinct training regimes or systems may have the same impact on training transfer but completely different efficiencies (for example, one could require more time than the other to produce the same effects). Roscoe (1971) sought to address this issue with the transfer effectiveness ratio (TER), which takes into account the amount of prior training (that is, in the training system or program under evaluation) by specifying the savings in time (or trials) in the live environment to reach a criterion. This time savings is expressed as a function of a predetermined single amount of time (or trials) in the alternate trainer (for example, X time saved in an airplane for Y time spent in a simulator).
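Stated symbolically, and using conventional notation rather than symbols taken from this chapter, the standard quantities from Roscoe (1971, 1972) and Roscoe and Williges (1980) can be rendered as follows, where Y_0 is the time (or trials) a control group needs to reach criterion in the live environment and Y_x is the time needed by a group first given X units of training in the system under evaluation (the ITER, discussed in the next paragraph, measures the payoff of one further increment of such training):

```latex
% Percent transfer: savings in the live environment relative to the
% no-simulator control group.
\[
\text{Percent transfer} = \frac{Y_{0} - Y_{x}}{Y_{0}} \times 100
\]

% Transfer effectiveness ratio: the same savings expressed per unit of
% prior training, so that TER > 1 means one unit of simulator time
% saves more than one unit of live time.
\[
\mathit{TER} = \frac{Y_{0} - Y_{x}}{X}
\]

% Incremental transfer effectiveness ratio: savings produced by one
% additional increment \Delta x of training beyond x.
\[
\mathit{ITER} = \frac{Y_{x} - Y_{x+\Delta x}}{\Delta x}
\]
```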
One limitation to this approach is that the transfer effectiveness ratio does not consider the incremental gains from variations in the amount of training time or trials in the alternate training platform; it considers only a single amount of training time and its effect on transfer to the live environment (for example, how would results differ among 10, 20, or 30 training trials, and how quickly would trainees then reach criterion in the live environment?). Given that there is no guidance regarding the point at which the evaluation should occur (for example, after how much time or how many training trials), this presents the risk of drawing conclusions about training transfer at points along the learning curve that have not yet stabilized, and any inferences made at such points may be of limited validity and utility. To address this issue, Roscoe (1972) adopted the incremental transfer effectiveness ratio (ITER) approach, which takes into consideration the effectiveness of successive increments of training in each training platform by comparing multiple training time regimes in an alternate platform and their associated transfer to a live environment; however, it requires considerable time (that is, the evaluation must be done for 1 trial, 2 trials, 3 trials, and so forth until a point of diminishing returns is identified) and a large number of participants. As such, it is not always possible to conduct ITER studies given the copious resources this approach requires (compare Roscoe & Williges, 1980). When faced with limited resources, an alternative is to use the transfer effectiveness ratio and couple it with a technique for systematically specifying the point at which transfer should be evaluated, using the diminishing learning rates in the training system or program under evaluation as a guide. Thus, rather than evaluating increments of training and relative transfer effectiveness (as in the ITER approach), an effort is made to identify the single point at which the transfer effectiveness of a system should be evaluated. To address this, a learning curve methodology was developed in which continuous monitoring of performance across trials is used to identify a “plateau” in learning improvements (Champney, Milham, Bell Carroll, Stanney, & Cohn, 2006). The next section presents the learning curve methodology within the context of a TEE for a MOUT trainer, where the goal was to compare the ToT between a low and a high fidelity training solution.
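A minimal sketch of such a plateau test follows, under assumptions of our own (a relative-improvement criterion held over a sliding window of trials); the published methodology (Champney et al., 2006) should be consulted for the actual criteria used.

```python
import numpy as np

def plateau_trial(scores: np.ndarray, window: int = 3, tol: float = 0.05) -> int:
    """Return the first trial index at which learning has 'plateaued'.

    scores  -- per-trial performance, higher is better
    window  -- number of consecutive trials the criterion must hold
    tol     -- maximum relative gain still counted as 'flat'

    Transfer would then be evaluated (for example, a TER computed) using
    the amount of training accumulated at this point.
    """
    # Relative gain from each trial to the next; decreases (negative
    # gains) also count as flat for this simple sketch.
    rel_gain = np.diff(scores) / np.maximum(np.abs(scores[:-1]), 1e-9)
    flat = rel_gain < tol
    for i in range(len(flat) - window + 1):
        if flat[i:i + window].all():
            return i + 1  # first trial of the flat stretch
    return len(scores) - 1  # no plateau found; fall back to last trial

# Example: improvement slows markedly after the fifth trial.
scores = np.array([40.0, 55.0, 66.0, 74.0, 79.0, 81.0, 82.0, 82.5, 83.0])
print(plateau_trial(scores))  # -> 5
```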
To develop these objectives into a bootcamp, two activities were conducted: (1) domain understanding through task analysis and training objective identification and (2) use of this knowledge to create an instructional course (training plan, procedure, and material).

Task Analysis and Training Objective Identification

The identification of relevant KSAs was performed using a task analysis of the MOUT domain (that is, specifically the room-clearing task; Milham, Gledhill-Holmes, Jones, Hale, & Stanney, 2004). The task analysis included interviews and collaborations with subject matter experts, observation of task demonstrations, and reviews of military doctrine. It resulted in a breakdown of the room-clearing task into subtasks, identification of the KSAs necessary to complete these tasks and subtasks, and creation of metrics and performance standards. These data were then used to identify training objectives, which served as a blueprint for developing training materials matched to the associated performance metrics and standards (see Table 24.1).

Curriculum

After identifying the training objectives, a training curriculum for the bootcamp was developed with the participation of a subject matter expert. The curriculum focused on the constructs to be learned (for example, knowledge, skills, and attitudes) and on domain context to make tasks meaningful. To instill the desired KSAs and context, the bootcamp curriculum was built around multiple components, each designed to address a particular aspect of the training experience, including the following:

1. Initial classroom instruction: Used to introduce participants to the domain and teach the desired constructs. Beyond conventional lecture conveyance of the requisite KSAs, videos and images were also used to immerse participants in the target domain, thereby supporting contextual training.

Table 24.1. Sample of Training Objectives and Performance Metrics

Training Objective            Performance Metric
Engagement/acknowledgment     Enemies neutralized; noncombatants acknowledged; missed shots
Room clearing                 Percentage of room scanned; time to clear room
Survivability                 Shots taken
Exposure                      Exposure to danger areas (doorways, windows, and entryways); enemy line of sight
2. Practical instruction and evaluation: For physical tasks (for example, maneuvering through corridors while manipulating a rifle), practice opportunities were provided, with the subject matter expert providing instruction, evaluation, and direct feedback.

3. Rehearsal: Participants were given a rehearsal worksheet consisting of a mnemonic that provided an organizational framework for the learned content. Participants were given an opportunity to rehearse the mnemonic during the period between pre-training (bootcamp) and training with the target training system (that is, days to a few weeks).

4. Review: To mitigate any memory decay from delays between when the bootcamp took place and when the training system was used, a domain refresher review was instantiated in a short video, which highlighted the requisite KSAs using the mnemonic. This was in addition to a familiarization practice, in which participants were allowed to familiarize themselves with the training system.

5. Scenario-based feedback: Subject matter expert feedback was instantiated in different forms in the curriculum, first following the practical instruction of the physical skills and later during training (while using the actual training system in evaluation). After the physical skills training, the subject matter expert feedback consisted of verbal instruction on the aspects of the task execution that were correct or incorrect. During training with the system, participants were periodically evaluated and given feedback using an assessment instrument designed around the mnemonic. In an operational context, this feedback represents the after action review that is provided during field training operations.
The bootcamp was designed to provide quality pre-experimental instruction to trainees, resulting in an experimental group that had the basic KSAs for interacting in the targeted MOUT domain.
Transfer Evaluation Methodology: TER at a Point Informed by Learning Curve Analysis

Learning has been shown to follow a universal power law of practice, generally adhering to a pattern of rapid improvement followed by ever-diminishing further improvements with practice (Ritter & Schooler, 2001). This implies that the rate of return for additional practice eventually reaches a point where additional training is no longer cost-effective. Unfortunately, in practice an asymptote usually becomes observable only after extensive trials (for example, in some cases over 1,000 trials; Newell & Rosenbloom, 1981); thus an operational definition of a plateau in terms of operational metrics (for example, cost) is required. Plateau analysis of learning curves may be accomplished through several methods, such as fitting curves and finding asymptotes or utilizing parametric or nonparametric statistical tests, such as analysis of variance (Grantcharov et al., 2004) or Friedman tests (for example, Grantcharov, Bardram, Funch-Jensen, & Rosenberg, 2003). Another approach, which may be suitable for limited samples, involves identifying a period along the curve where improvement variability has slowed to a predetermined level.
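For illustration, the power law of practice can be fit directly to trial data to see where gains flatten out. The following is a minimal sketch using SciPy with synthetic data; all names and values are ours, not from the studies cited:

    import numpy as np
    from scipy.optimize import curve_fit

    def power_law(n, a, b):
        # Power law of practice: time (or error) falls off as a * n**(-b)
        return a * n ** (-b)

    trials = np.arange(1, 31)  # hypothetical 30-trial pilot data set
    times = 40.0 * trials ** (-0.3) + np.random.normal(0.0, 1.0, trials.size)

    (a, b), _ = curve_fit(power_law, trials, times, p0=(40.0, 0.3))
    print(f"fitted curve: {a:.1f} * n^(-{b:.2f})")
    # The flattening tail of the fitted curve suggests where additional
    # practice yields diminishing returns, although a true asymptote may
    # emerge only after far more trials (Newell & Rosenbloom, 1981).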
Plateau analysis was applied in a MOUT training system ToT evaluation. A plateau analysis involves more than simple visual inspection. While it may at first seem tempting to determine the location of plateaus visually, the appeal of such an approach fades once one observes the effect of a graph’s scale on one’s selection of plateaus (that is, slopes may appear smaller at larger scales and vice versa). Such an approach lacks objectivity and reliability, which is why it is important to define an objective measure by which to determine where a plateau is present (within the acceptable parameters one defines as a plateau; a description follows).
Parameters

There are three parameters to consider in identifying a plateau within operational constraints: (1) percent variability, (2) period of variability stabilization, and (3) general location of the plateau. These are illustrated in Figure 24.1 and described below. The process for determining these parameters, as applied to the MOUT domain, is explained later in this chapter.
Figure 24.1. Parameters to consider in identifying a plateau within operational constraints: (1) percent variability, (2) period of variability stabilization, and (3) general location of plateau.
1. Percent variability: The variability of a measure’s cumulative average between subsequent trials is less than X percent. This parameter defines the acceptable gains (or losses) in performance across trials; the threshold is set by weighing the resources needed for one additional trial against the acceptable amount of gain (for example, 10 percent improvement). In cases where limited data are available, the cumulative aggregate data are not as “smooth,” so an acceptable percentage is used as a gauge of variability for either gains or losses in performance. This parameter is of practical importance given that a true plateau may in theory be observed only after very extensive periods (for example, ~1,000 trials; Newell & Rosenbloom, 1981). Values used in the MOUT training system ToT evaluation ranged from 2.5 to 10 percent variability across trials.

2. Period of variability stabilization: To determine that a plateau has occurred at an acceptable variability level, one must specify a suitable range of continuous performance (for example, performance variability within 10 percent for Y number of trials). Depending on the application, this may range from a few trials to “all remaining trials.” Three to five trials have been used with adequate success (for example, Champney, Milham, Bell Carroll, Stanney, & Cohn, 2006).

3. General location of plateau: When using limited datasets, it is possible, using only parameters (1) and (2), to observe localized plateaus (that is, periods where performance stabilizes and later increases outside the initial parameters). In such cases one might be required to establish a general rule for the location of the plateau. While this might be determined through visual inspection, specifying a rule ensures consistency across multiple measures. In the MOUT training system ToT evaluation, the latter 1/3 of the performance range was selected as the general area qualifying as a plateau [that is, the period in the performance scale where performance is hypothesized to stabilize per the universal law of learning; see the shaded area (2) in Figure 24.1]. Given the shape of a typical power curve, the true plateau is generally guaranteed to lie at the tail of the curve.
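To make parameter (1) concrete, the percent variability at trial t can be computed as the percent change in the cumulative average from trial t − 1 to trial t. A minimal sketch (function and variable names are ours):

    def percent_variability(cum_avg, t):
        # Percent change of a measure's cumulative average from trial
        # t - 1 to trial t (trials are 1-indexed; cum_avg[0] is trial 1).
        return abs(cum_avg[t - 1] - cum_avg[t - 2]) / cum_avg[t - 2] * 100.0

    # With the example data of Table 24.2, the change from trial 5 (63.60)
    # to trial 6 (66.33) is roughly 4.3 percent -- inside a 10 percent
    # criterion but outside a 2.5 percent one.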
Learning Curve Methodology

The proposed learning curve methodology involves a series of phases to arrive at an objective recommendation for the point at which to evaluate transfer, under the premise that training on a simulator should be optimized by maximizing improvement gains before testing for transfer performance. These steps, as applied to the MOUT training system evaluation, are discussed below.

1. Understanding the data: Before parameters can be established, the nature and behavior of the selected metrics, which are aligned with specific tasks and training objectives, should be analyzed using data from a pilot study. For the MOUT TEE, this was done by constructing cumulative average plots and tables from the performance data collected (see Table 24.2). Important data characteristics to determine are valid ranges, expected variability from trial to trial, and expected data values (for example, continuous or discrete; note that this method is better suited for continuous data).

2. Determining parameter (1): This is performed by identifying acceptable performance gains per trial per metric through a cost-benefit analysis or, in the absence of cost criteria, an arbitrary threshold (for example, 10 percent). In operational terms, this means that when improvement variability across trials is less than this threshold, the cost of continuing to train outweighs the desired benefits (the expected training gains). For the MOUT TEE, parameter (1) ranged from 2.5 to 10 percent variability across trials, depending on the particular measure.
Table 24.2. Example of Performance Data and Cumulative Data

Trial    Performance    Cumulative Performance
1        50%            50.00%
2        60%            55.00%
3        63%            57.67%
4        70%            60.75%
5        75%            63.60%
6        80%            66.33%
3. Determining parameter (2): This step involves determining an appropriate parameter Y, the number of trials used to establish that a plateau is present. There is no established standard for determining this parameter other than one’s desire for rigor (a higher number gives added assurance of a true plateau). Five trials were used in the MOUT evaluation because, as reported by Champney, Milham, Bell Carroll, Stanney, and Cohn (2006), three produced too many relative plateaus, so a more restrictive number of trials was required.

4. Identifying plateaus in data tables: Using cumulative data tables, the plateau is identified as the first trial at which the established criteria, parameters (1) and (2), are met. For the MOUT TEE, the plateau was found at different trials depending on the measure; trial 25 was selected as the point of plateau because more than 2/3 of all measures had reached a plateau by that trial number.

5. Visual inspection: Using cumulative data plots, the plateau is then verified to be an absolute plateau with respect to the available data and not just a local one. If it is determined visually that considerable learning is still occurring past the identified trial, parameter (3) should be applied before reapplying parameters (1) and (2). For the MOUT TEE, it was necessary to apply parameter (3), which resulted in the identification of a more conservative plateau (that is, later in the curve, implying more trials) to minimize the probability of selecting a relative plateau.
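Steps 2 through 4 can be approximated programmatically. The sketch below applies parameters (1) through (3) to a series of cumulative averages; it assumes that higher values mean better performance, and all names and defaults are ours rather than the original study’s:

    def find_plateau(cum_avg, max_var_pct=10.0, window=5, tail_frac=1.0 / 3.0):
        """Return the first 0-indexed trial meeting parameters (1)-(3),
        or None if no plateau is found in the available trials."""
        lo, hi = min(cum_avg), max(cum_avg)
        floor = hi - tail_frac * (hi - lo)  # parameter (3): latter 1/3 of range
        for t in range(1, len(cum_avg) - window + 1):
            stable = all(  # parameters (1) and (2): a window of small changes
                abs(cum_avg[i] - cum_avg[i - 1]) / cum_avg[i - 1] * 100.0
                <= max_var_pct
                for i in range(t, t + window)
            )
            if stable and cum_avg[t] >= floor:
                return t
        return None

Visual inspection of the cumulative plots (step 5) remains the final check that the returned trial marks an absolute rather than a local plateau.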
The described TEE learning curve methodology can be used to identify the point at which a TER (Roscoe, 1971) evaluation should be performed; this calculation can then be leveraged to determine the number of trials of live training saved as a result of pre-training. Applying this approach avoids drawing conclusions about training transfer at points along the learning curve that have yet to stabilize, which is a risk when using the TER alone.
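As a sketch of the final calculation, Roscoe’s (1971) TER divides the live trials (or time) saved by the amount of prior training in the system under evaluation; the example values below are hypothetical:

    def transfer_effectiveness_ratio(control_live, transfer_live, trainer_trials):
        # control_live: live trials to criterion with no pre-training
        # transfer_live: live trials to criterion after pre-training
        # trainer_trials: trials spent in the training system, e.g., the
        #     plateau point identified by the learning curve methodology
        return (control_live - transfer_live) / trainer_trials

    # Hypothetical example: a control group needs 20 live trials and a
    # pre-trained group needs 12 after 25 simulator trials:
    # (20 - 12) / 25 = 0.32 live trials saved per simulator trial.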
CONCLUSION

With the methodologies provided in this chapter, practitioners can conduct TEEs with considerably less resource involvement, in terms of both participant and evaluation costs. While the methodologies are not intended to replace the value afforded by the use of target domain participants or incremental evaluations, in the absence of available resources the approaches can provide data to support informed decisions regarding training effectiveness.
REFERENCES

Baldwin, T. T., & Ford, J. K. (1988). Transfer of training: A review and directions for future research. Personnel Psychology, 41, 63–105.
Bassi, L. J., & van Buren, M. E. (1999). 1999 ASTD state of the industry report. Alexandria, VA: The American Society for Training and Development.
Carnevale, A. P., & Schulz, E. R. (1990). Economic accountability for training: Demands and responses. Training and Development Journal Supplement, 44(7), s2–s4.
Champney, R. K., Milham, L., Bell Carroll, M., Stanney, K. M., & Cohn, J. (2006). A method to determine optimal simulator training time: Examining performance improvement across the learning curve. Proceedings of the Human Factors and Ergonomics Society 50th Annual Meeting (pp. 2654–2658). Santa Monica, CA: Human Factors and Ergonomics Society.
Champney, R. K., Milham, L. M., Bell Carroll, M., Stanney, K. M., Jones, D., Pfluger, K. C., & Cohn, J. (2006). Undergraduate boot camp: Getting experimental populations up to speed. Proceedings of the Interservice/Industry Training, Simulation & Education Conference (No. 2976). Arlington, VA: National Defense Industrial Association.
Eseryel, D. (2002). Approaches to evaluation of training: Theory and practice. Journal of Educational Technology and Society, Special Issue: Integrating Technology into Learning and Working, 5(2), 93–98.
Flexman, R. E., Roscoe, S. N., Williams, A. C., Jr., & Williges, B. H. (1972, June). Studies in pilot training (Aviation Research Monographs, Vol. 2, No. 1). Savoy: University of Illinois, Institute of Aviation.
Gordon, M. E., Schmitt, N., & Schneider, W. (1984). An evaluation of laboratory research bargaining and negotiations. Industrial Relations, 23, 218–233.
Grantcharov, T. P., Bardram, L., Funch-Jensen, P., & Rosenberg, J. (2003). Learning curves and impact of previous operative experience on performance on a virtual reality simulator to test laparoscopic surgical skills. American Journal of Surgery, 185(2), 146–149.
Grantcharov, T. P., Kristiansen, V. B., Bendix, J., Bardram, L., Rosenberg, J., & Funch-Jensen, P. (2004). Randomized clinical trial of virtual reality simulation for laparoscopic skills training. British Journal of Surgery, 91(2), 146–150.
Milham, L., Gledhill-Holmes, R., Jones, D., Hale, K., & Stanney, K. (2004). Metric toolkit for MOUT (VIRTE Program Report, Contract No. N00014-04-C-0024). Arlington, VA: Office of Naval Research.
Newell, A., & Rosenbloom, P. S. (1981). Mechanisms of skill acquisition and the law of practice. In J. R. Anderson (Ed.), Cognitive skills and their acquisition (pp. 1–51). Hillsdale, NJ: Lawrence Erlbaum.
Ritter, F. E., & Schooler, L. J. (2001). The learning curve. In International encyclopedia of the social and behavioral sciences (pp. 8602–8605). Amsterdam: Pergamon.
Roscoe, S. N. (1971). Incremental transfer effectiveness. Human Factors, 13(6), 561–567.
Roscoe, S. N. (1972). A little more on incremental transfer effectiveness. Human Factors, 14(4), 363–364.
Roscoe, S. N., & Williges, B. H. (1980). Measurement of transfer of training. In S. N. Roscoe (Ed.), Aviation psychology (pp. 182–193). Ames: Iowa State University Press.
Sadri, G., & Snyder, P. F. (1995). Methodological issues in assessing training effectiveness. Journal of Managerial Psychology, 10(4), 30–32.
Saks, A. M., & Belcourt, M. (2006). An investigation of training activities and transfer of training in organizations. Human Resource Management, 45, 629–648.
Thompson, C., Koon, E., Woodwell, W. H., & Beauvais, J. (2002). Training for the next economy: An ASTD state of the industry report (Rep. No. 790201). Alexandria, VA: The American Society for Training and Development.
Ward, E. A. (1993). Generalizability of psychological research from undergraduates to employed adults. Journal of Social Psychology, 133(4), 513–519.
Wintre, M. G., North, C., & Sugar, L. A. (2001). Psychologists’ response to criticisms about research based on undergraduate participants: A developmental perspective. Canadian Psychology, 42, 216–225.
Chapter 25
THE APPLICATION AND EVALUATION OF MIXED REALITY SIMULATION

Darin Hughes, Christian Jerome, Charles Hughes, and Eileen Smith

Mixed reality (MR) is a blending of technologies that leverages the advantages and challenges of combining the real world with virtual objects and processes. Like virtual reality (VR) systems, MR can create entirely synthetic environments, objects, characters, and interactions; unlike VR, MR can merge these virtual components with real world environments, objects, human characters, and human-to-human interactions. The advantage of MR lies in its ability to set experiences in everyday environments and build on the kinds of interactions that occur in real world experiences, all the while leveraging the power of richly layered visuals, sounds, and physical effects that are generated computationally. With these benefits come the unique challenges of creating MR simulations. Virtual objects and real objects must be able to exist side by side and be properly registered in three-dimensional (3-D) space. Real objects must occlude virtual objects, and vice versa, depending on their relative location to a user. For example, if a user moves a hand in front of his or her face, virtual objects that are intended to be farther away must not “pop” out of space and into the user’s hand. Additionally, virtual sounds and physical effects must be developed in such a way that real world sounds and physical phenomena are not precluded or contradicted. This chapter describes the MR infrastructure, illustrated in Figure 25.1, developed by the Media Convergence Laboratory at the University of Central Florida; provides an overview of several unique simulations generated using this infrastructure (MR for training, education, entertainment, and rehabilitation); and, last, presents an evaluation of several of these simulations in terms of training effectiveness and outcomes, knowledge, skill, attitude, presence, and simulator sickness.

MR INFRASTRUCTURE

The MR infrastructure described in the following sections has four central components: visual, auditory, haptic and digital multiplex (DMX) effects, and scripting.
Figure 25.1. A Basic Overview of the MR Infrastructure
Taken together, these components enable the creation of richly layered and immersive mixed reality simulations.

VISUAL

The visual blending of real and virtual objects requires an analysis and understanding of the real objects so that proper relative placement, interocclusion, illumination, and intershadowing can occur. In the system we describe here, we will assume that, with the exception of other humans whose range of movement is intentionally restricted, the real objects in the environment are known and their positions are static. Other research we are carrying out deals more extensively with dynamic real objects, especially in collaborative augmented virtuality environments. Note, for instance, in Figure 25.2 that two people are sitting across from each other in a virtual setting; each has a personal point of view of a shared virtual environment, and each can see the other. In this case, we are using unidirectional retroreflective material so each user can extract a dynamic silhouette of the other (C. E. Hughes, Konttinen, & Pattanaik, 2004).
Figure 25.2. Retroreflective Technology Being Used to Extract Dynamic Silhouettes of Real World Objects
These silhouettes can be used to correctly register players relative to each other and, consequently, relative to virtual assets.

The primary visual issues are (a) lighting of real objects by virtual ones and vice versa and (b) shadowing of virtual objects on real ones and vice versa. We and our colleagues developed the requisite real time algorithms (see, for example, Nijasure, Pattanaik, & Goel, 2003; C. E. Hughes, Konttinen, et al., 2004), some of which were based on work by Haller, Drab, and Hartmann (2003). Here we just note that each real object that can interact with virtual ones has an associated phantom or occlusion model. These phantoms have two purposes. When used as occlusion models, invisible renderings of phantom objects visually occlude other models that are behind them, providing a simple way to create a multilayered scene; for example, the model of a sea creature is partially or fully hidden from view when it passes behind a display case. When used for lighting and shadows on real objects, these phantom models help us calculate shading changes for their associated pixels. Thus, using them, we can increase or decrease the effects of lights, whether real or virtual, on each pixel. The specific algorithms we have developed can run simply and efficiently on the shaders of modern graphics cards. This graphics processing unit implementation, as well as careful algorithm design, allows us to achieve an interactive frame rate, despite the apparent complexity of the problem (McGuire, Hughes, Egan, Kilgard, & Everitt, 2003). In one simple demonstration involving a real flashlight, a real box, and a virtual teapot and ball, the virtual objects are lit by the flashlight, and the real box is both lit by the flashlight and darkened by the shadows cast from the virtual teapot and ball. For this demonstration, we tracked the box, the “hot spot” on the table, and the cylinder using ARToolKit, an image based tracking library. In general, though, our preferred tracking method is acoustical, with physical trackers attached to movable objects.

Viewing these scenes can be done with a video see-through head-mounted display (HMD), a mixed reality window (a tracked flat screen monitor that can be reoriented to provide differing points of view), or a mixed reality dome. While the HMD is more flexible, allowing the user to walk around an MR setting, even to stare virtual three-dimensional (3-D) characters in the eye, it is more costly and creates far more problems (for example, hygiene, breakage, and physical discomfort) than the MR window or dome. Both the MR window and the dome require an added navigation interface (for example, control buttons and/or a mouse), since neither is movable, unlike the HMD, whose user can walk around somewhat freely. The window is more flexible than the dome in that it can be physically reoriented, but it lacks the convenient audience view and the sense of immersion (both visual and auditory) of the dome.
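The occlusion role of the phantom models can be illustrated with a standard two-pass depth-buffer trick. The sketch below is ours, not the actual engine code; it assumes an active OpenGL context (here via PyOpenGL) and hypothetical draw_phantoms and draw_virtual callbacks:

    from OpenGL.GL import (GL_DEPTH_TEST, GL_FALSE, GL_TRUE,
                           glColorMask, glDepthMask, glEnable)

    def render_frame(draw_phantoms, draw_virtual):
        # Depth testing lets phantom depth values cull hidden virtual fragments.
        glEnable(GL_DEPTH_TEST)
        # Pass 1: render phantoms of the real objects into the depth buffer
        # only; with color writes off, the live camera view of the real
        # object remains visible on screen.
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE)
        glDepthMask(GL_TRUE)
        draw_phantoms()
        # Pass 2: restore color writes and draw the virtual objects; any
        # fragment that falls behind a phantom fails the depth test, so a
        # real crate or display case correctly occludes a virtual character.
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE)
        draw_virtual()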
AUDITORY

In the film industry, there is an expression that “the audio is half the experience.” This expression is validated by the careful attention paid to sound and music in film.
However, audio production in the simulation community is often given little attention, if any at all. While simulation companies or research institutes may own expensive audio equipment (such as tracked 3-D headphones), they often do not allocate resources toward production techniques or sound designers, depending instead upon nonaudio specialists to insert generic sound effects from purchased libraries. The end result of this process is a shallow, unrealistic, nonimmersive auditory environment. The irony in this arrangement is that audio is at least half of the human experience. Auditory cues are perceived in 360° and on all three axes. Sound can travel through walls and around corners, providing information that is well out of the line of sight. Additionally, audio plays a crucial role in environmental recognition, immersion, and presence and is essential in most forms of communication.

The desire to create an easily configurable, powerful audio engine and a high level interface came about over the course of designing audio for interactive experiences during the last several years. These experiences include exhibits at SIGGRAPH (Special Interest Group on Graphics and Interactive Techniques) 2003, ISMAR (International Symposium on Mixed and Augmented Reality) 2003/2004, I/ITSEC (Interservice/Industry Training, Simulation, and Education Conference) 2002/2003, IAAPA (International Association of Amusement Parks and Attractions) 2002/2003, and the Orlando Science Center, as well as long-term installations at the U.S. Army’s Simulation Technology Training Center, Orlando, Florida.

Standard media production tools such as SONAR, Pro Tools, and Cubase, while very useful for synthesizing, mixing, and mastering, cannot provide the kind of dynamic control necessary for interactive simulation. In addition to lacking any support for real time spatialization, they do not have features to compensate for suboptimal speaker placement or expanded, multitiered surround systems. Both of these features are essential, since many interactive experiences must occur in environments where optimal speaker placement is not possible and where sounds along the vertical plane are essential both for immersion and for accuracy of training. The shortcomings of using one of these media production tools in simulations are well documented in the Institute for Creative Technologies’ 2001 audio technical report (Sadek, 2001). Their system was based around Pro Tools. Sounds were generated dynamically by sending MIDI (musical instrument digital interface) triggers to the applications. While this arrangement had some success with “tightly scripted and choreographed scenarios,” it was entirely incapable of creating dynamically panned sounds. Additionally, their system was further hampered by an inability to trigger more than three premixed sound sets simultaneously.

We also purchased and investigated the AuSIM 3D Goldminer system, an integrated hardware and software solution for audio simulation. While producing fairly realistic surround impressions, its main drawback for mixed reality is its use of headphones as the delivery method. It is very important to be able to hear real world sounds in an MR experience. This is especially true in a military training scenario, where it is essential to hear the footsteps and movements not only of virtual characters but also of other human interactors.
More information about AuSIM 3D can be found through the Web site http://ausim3d.com.

For the specific demands of a highly dynamic and immersive MR simulation, it became clear that a system had to be built either on top of an existing application programming interface (API) or from scratch. Some of our early attempts involved the use of the Java Media Framework, which, while providing dynamic cueing, did not support multichannel output, spatialization, and channel control, among many other things (C. E. Hughes, Stapleton, et al., 2004). An extensive review of the available technology was conducted, including both proprietary and open source hardware and software solutions. Due to the specific demands of MR audio, EAX (environmental audio extension) 2.0 was selected as the appropriate environment for building interactive audio experiences. However, this technology was also limited in its ability to address specific channels and to provide control of multiple hardware arrangements through a single application. In addition, EAX does not provide low level support for the creation of digital signal processing (DSP) effects; rather, a static set of effects is provided. The frustration of working with these technologies, given the particular demands of interactive, immersive simulation, led to the design and creation of a custom-built audio engine and high level interface called SoundDesigner.

SoundDesigner was conceived as an intuitive application that allows the user to create or modify entire soundscapes, control all output channels of connected sound cards, designate sound states, and support a variety of delivery systems and classifications. The user of SoundDesigner can individually address audio channels and assign some to a surround system while leaving others open for use with such devices as point source speakers. To achieve this, SoundDesigner needed to be built on top of an API that allows low level control over audio hardware. Most computer-generated simulations lack the ability to be reconfigured easily and quickly. SoundDesigner allows nonprogrammers and audio novices to assert a high level of control over the soundscape and auditory structure of a scenario. This is particularly useful in simulations where various factors, such as ambient noise, cueing, expectation, and other important variables, can be modified for purposes of evaluation. This software also allows for easy configuration of new audio scenarios or the alteration of previous simulations without the need for reprogramming.

The implementation of a high level interface to an advanced audio architecture such as SoundDesigner requires the definition of new abstractions to represent system components in an intuitive way. The SoundDesigner interface represents individual sound clips in terms of buffers, which represent the discrete samples of a sound, and sources, which represent instances of a buffer currently being played. These are common audio concepts in other libraries, such as OpenAL, but SoundDesigner is unique in providing an explicit representation of individual speakers, which it groups and addresses through the interface of channels. Each sound source is played on a specific channel, which is to say that the samples generated by that source are mixed, filtered, and output to the speakers bound to that channel.
The two fundamental channel types are point-source channels (which simply mix and copy samples to all speakers) and spatialized channels (which use information about the position of sounds and speakers to perform a per-speaker attenuation on samples in order to associate each source with a specific spatial direction).

An overview of SoundDesigner features follows: support for 3-D sound; assignable channels (3-D and point source); multitiered speaker configurations; configurable speaker placement; real time spatialization; user placement compensation; timeline triggers; prescripted paths with waypoints (linear and curved); real time capture and playback of sound (with full SoundDesigner support); basic DSP (echo and reverb); savable configuration files; such standard features as looping, volume control, and envelopes; and the ability to address multiple sound cards.

Once sounds have been arranged inside a SoundDesigner configuration file or set aside as mono, real time sounds, a “naming and associations” document is passed along to the MR StoryEngine programmer for the final phase of integration into a scenario. This document contains all of the user IDs, their story associations (for example, “trainee_fire” is called when the gun is triggered), an indication of whether each sound is prescripted or real time, and the appropriate file directory in which to find the configuration file and real time sounds. For full implementation details, see D. E. Hughes (2005).

HAPTICS AND PHYSICAL EFFECTS

The MR infrastructure includes a special-effects engine that employs traditional scenography and show control technology from theme parks to integrate the physical realities of haptic and olfactory devices. For lights, smoke machines, olfactory devices, and so forth, a standard protocol (DMX) is utilized. This allows for dynamic, real time, and variable control over these physical effects. Haptic vests are controlled through a series of actuators that allow for different levels of intensity.

In addition to typical haptic devices, such as vests, haptic audio devices are employed as well. Haptic audio refers to sounds that are felt more than heard. Such effects can be achieved using subwoofers and mounted “bass shakers” that physically vibrate floors, walls, and other surfaces. They can be used to increase the sense of realism and impact, but they can also be used to provide informational cues. In the case of haptic vests, pressure points can be used to alert participants to the direction of targets or potential threats. These feedback mechanisms can be used in coordination with detection devices. As a personalized, tracked audio display within a haptic vest, these devices provide directional cues without cluttering up the already intense acoustic audioscape. With the use of speakers that vibrate more for feeling than for hearing, an intimate communication of stimulating points on the body provides the approximate orientation of potential threats that may not be heard or seen.
Thus, a threat may be identified by a vibration or combination of vibrations. This information can give an immediate sense of the direction of a threat and its proximity (for example, by making the vibration’s intensity vary with the distance to the threat). It works in essence like a tap on the shoulder, telling the user of a direction without adding to or distracting from the visual or acoustic noise levels. This message is transferred to an alternative sense and thus allows this critical datum to cut through the clutter of the audiovisual simulation. With targets outside of the line of sight, this approach can significantly reduce a user’s response time.

SCRIPTING

The current incarnation of our framework utilizes an XML (Extensible Markup Language) scripting language based on the concepts of interacting agents, behaviors, guards, and state information. This separates the scriptwriters from the internals of the engine, while providing them a meaningful and effective context in which to encode simple, direct behaviors (O’Connor & Hughes, 2005). The reorganization of the system prompted the development of other supporting engines, dubbed auxiliary physics engines (APEs). These engines are responsible for tasks such as pathfinding and ray casting, since our revised architecture attempts to make the tasks of each engine distinct and clear.

The philosophy of a distributed system was key to the construction of this framework. The StoryEngine is the hub, providing scriptwriters access to any presentation requirements they need. For complex cases, our XML based script language allows one to escape into a special sublanguage, dubbed the advanced scripting language, or ASL. The ASL provides the ability to code behaviors using C-style programming constructs, including loops, conditionals, basic arithmetic, and assignment operations. The script defines a set of agents, each of which generally embodies some character the user may interact with (directly or indirectly). Agents are defined in terms of behaviors, which include actions, triggers, and reflexes, and a set of state variables that define an agent’s current state. Each behavior can perform several tasks when called, such as state modification and the transmission of commands to the presentation engines. Thus, agents are the fundamental building blocks of the system. The ability of agents to communicate with each other allows a “world-direct” representation to be built: developers define a set of agents in terms of how they want them to act around each other, rather than such actions being a side effect of a more program-like structure.

The graphics and audio engines understand the same basic set of commands. This allows the scriptwriter to easily generate worlds that offer visual and audio stimulation. Each engine also has a set of commands unique to its particular functionality (for example, audio clips can be looped, and visual models can have associated animations). The SFX engine utilizes the DMX protocol, but control over it originates from the StoryEngine through a series of commands, most of which are defined by loadable “DMX scripts.”
These scripts are direct control specifications that offer a set of basic functions (typically setting a device to a value between 0 and 255, meaning off to fully on and anything in between). These primitives are hooked together to form complex DMX events.

In our older versions of the system, agent information, such as position and orientation, was managed by the graphics engine. This required the StoryEngine to request regular updates, causing network congestion when the number of agents was high. The current incarnation of the system does away with this, and all physics simulation is now performed by the StoryEngine. The data are transmitted as a binary stream, encapsulated in a cross-platform and cross-language format. The data stream is denoted the “control stream,” as it controls the position, orientation, velocities, and accelerations of agents. A given control stream is broken up into numbered channels, one channel for each agent (channel numbers are automatically assigned to agents and are accessible through the reserved state variable name channel). This enables us to transmit only a subset of the data, usually only that which has changed since the last transmission. The system scales remarkably well.

Many of the distributed capabilities involve not only the major engines but also a set of utility servers. One type of utility server, the auxiliary physics engine, was referenced earlier. Two APEs were developed for our projects: one to control pathfinding on a walk mesh and another to manage ray casting in a complex 3-D universe. These engines plug in at run time and simply serve the requests of agents. Another utility server is the sensor server, which abstracts data from position and orientation sensors into data streams that are then transmitted across a network to interested clients. This allows any number of agents to utilize the data. The data stream is transmitted via transmission control protocol/Internet protocol for reliability purposes. The data format follows that of the StoryEngine’s control stream data. Thus, to a graphics or audio engine, it is immaterial where control data come from; a given agent’s control may be governed by the user’s own movements or by those of a simulated entity. The sensor server also enables us to record user movement, a vital piece of information for after action review (a military training term, but equally important for such rehabilitation applications as MR Kitchen) and cognitive experimentation.

The ability to define a set of behaviors to be reused in several scripts came to life in the “script component” architecture. This architecture allows “component” files to be written by the scriptwriter and then included in any number of scripts. Behaviors or entire agents can be scripted and, consequently, included in the main script. This also means that difficult-to-code behaviors and algorithms can be written once and used repeatedly, without having to perform copy-and-paste operations or rename a vast number of states and agents. The StoryEngine allows object-oriented capabilities, such as prototype based inheritance and delegation, to make coding agents reasonably straightforward and simple.
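The agent/behavior/guard pattern that the script language encodes might be sketched in Python as follows; the class and example below are ours, not the StoryEngine’s actual API:

    class Agent:
        def __init__(self, name, **state):
            self.name = name
            self.state = dict(state)   # state variables defining current state
            self.behaviors = []        # (guard, action) pairs

        def add_behavior(self, guard, action):
            # guard: a predicate over agents (the script's trigger/guard);
            # action: a state update and/or commands to presentation engines.
            self.behaviors.append((guard, action))

        def step(self, other):
            for guard, action in self.behaviors:
                if guard(self, other):
                    action(self, other)

    # Hypothetical foe that hides after the trainee fires three times:
    trainee = Agent("trainee", shots_fired=3)
    foe = Agent("foe1", hidden=False)
    foe.add_behavior(
        lambda a, o: o.state["shots_fired"] >= 3 and not a.state["hidden"],
        lambda a, o: a.state.update(hidden=True),  # would also cue graphics/audio
    )
    foe.step(trainee)  # foe.state["hidden"] is now True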
A final and rather recent innovation to the architecture is the remote system interface. Originally designed to allow remote (over-the-network) access to agent state information for display on a graphical user interface, the remote graphical user interface (GUI) protocol also provides a way to transmit information back to the StoryEngine. It is, in effect, a back door into the virtual world controlled by a given script, whereby agent command and control can be affected by a purely remote, alien program. We recently took advantage of this capability to link our system to DISAF (Dismounted Infantry Semi-Automated Forces), an artificial intelligence system that provides behaviors used in many distributed interactive systems applications.

Used for its original purpose, the remote GUI protocol and program architecture allow graphical interfaces to be defined by a simple XML file. The file specifies the graphical components to be used, as well as options for each component that link it to agents and states in the script. An example would be to display the number of times a particular agent was encountered by the user: a simple state variable in the agent itself would keep count, and changes to that information would be retrieved and displayed by the remote GUI. This approach is amenable to a drag-and-drop style of creating such GUIs, something we will do in the next release of the software.

All of the previously described technologies have been configured or hardwired into a number of testbeds for different applications. The following sections describe notable configurations that have shown promise as useful training testbeds.

MR FOR TRAINING—MR MOUT

The MR MOUT (military operations in urban terrain) testbed is a training simulation that re-creates urban façades to represent a 360° mini MOUT site. Tracking employs the InterSense IS-900 acoustical/inertial hybrid system. The tracked area contains virtual people (friends, foes, and neutrals), real props (crates, doors, and a swinging gate), a realistic tracked rifle, real lights, and building façades. Standing inside the mini MOUT creates the sense of the reality faced by a dismounted soldier who is open to attack on all sides and from high up. Using a combination of blue screen technology and occlusion models, the real and virtual elements are layered and blended into a rich visual environment. The trainee has the ability to move around the courtyard and hide behind objects, with real and virtual players popping out from portals to engage in close-combat battle. The most effective and powerful result of this mixed reality training is the fact that the virtual characters can occupy the same complex terrain as the trainees. The trainees can literally play hide-and-seek with virtual foes, thereby leveraging the compelling nature of passive haptics.

Figure 25.3 shows the mini MOUT from the observer station. In the middle, to the right of the observer, you can see the participant with HMD and a rifle. That person’s view is shown on the screen mostly blocked by the observer; the other three views are from an observer camera (middle right) and two virtual characters (lower right and top center).
Figure 25.3. MR MOUT: Demonstrates Observer Views (Mixed and Virtual) with a Direct View of the Real World Containing an MR Participant
Notice the crates in the view on the middle right side. The models that match these physical assets are rendered invisibly, providing appropriate occlusion (they clip the rendered images of characters that they would partially or totally occlude in the real world).

Special effects complete the creation of a realistic combat scenario in which the real world around the trainee feels physically responsive. This is done using the SFX engine to control lights, smoke from explosions, and other types of on/off or modulated actions. The system can react to the trainee based on position, orientation, or actions performed with a gun that is tracked and whose trigger and reload mechanism are sensed. For example, the lights on the buildings can be shot out (we use a simple ray casting auxiliary physics engine that returns a list, sorted by distance, of all intersected objects), resulting in audio feedback (the gunshot and shattered glass sounds) and physical world visual changes (the real lights go out).

With all the compelling visual and haptic effects, users’ hearing and training can provide a competitive edge, due to a heightened acoustical situational awareness (D. E. Hughes, Thropp, Holmquist, & Moshell, 2004). Users cannot see or feel through walls, around corners, or behind their heads. However, their ears can perceive activity where they cannot see it. In urban combat, where a response to a threat is measured in seconds, realistic audio representation is vital to creating a combat simulation and to training soldiers in basic tactics. Standard 3-D audio with earphones shuts out critical real world sounds, such as a companion’s voice or a radio call. The typical surround audio is still two dimensional (x and z axes), with audio assets designed for a desktop video game that tend to flatten the acoustical capture. Our system allows audio to be synchronized temporally and spatially, leading to an immersive experience.
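The distance-sorted hit list that the ray casting APE returns can be sketched with sphere bounding volumes (a simplification; the real engine’s geometry handling is not detailed here, and all names are ours):

    import math

    def ray_hits(origin, direction, spheres):
        """Return (distance, tag) pairs for every sphere the ray pierces,
        sorted nearest first. origin and direction are 3-vectors, with
        direction assumed normalized; spheres is an iterable of
        (center, radius, tag) bounding volumes."""
        hits = []
        for center, radius, tag in spheres:
            oc = [o - c for o, c in zip(origin, center)]
            b = 2.0 * sum(d * v for d, v in zip(direction, oc))
            c = sum(v * v for v in oc) - radius * radius
            disc = b * b - 4.0 * c
            if disc >= 0.0:
                t = (-b - math.sqrt(disc)) / 2.0  # nearest intersection point
                if t > 0.0:
                    hits.append((t, tag))
        return sorted(hits)

    # A shot fired down the z axis through a light's bounding sphere:
    # ray_hits((0, 0, 0), (0, 0, 1), [((0, 0, 10), 1.0, "light_3")])
    # -> [(9.0, "light_3")]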
MR FOR EDUCATION—MR SEA CREATURES

The experience begins with the reality of the Orlando Science Center’s DinoDigs exhibition hall: beautiful fossils of marine reptiles and fish in an elegant, uncluttered environment. As visitors approach the MR dome, a virtual guide walks onto the screen and welcomes them to take part in an amazing journey. While the guide is speaking, water begins to fill the “hall” inside the dome. As it fills, the fossils come to life and begin to swim around the pillars of the exhibit hall. The dome fills with water, and visitors experience the virtual Cretaceous environment. The visitors are able to navigate a rover through the ocean environment to explore the reptiles and fish. The viewing window of the rover is shown in the heads-up display of the MR dome. (See Figure 25.4.)

As the experience winds down, the water begins to recede within the dome, and the unaugmented science center hall begins to emerge again. At about the point where the water is head high, a pterodactyl flies overhead, only to be snagged by a tylosaur leaping out of the water. Holding the pterodactyl in its mouth, the tylosaur settles back down to the ocean floor. When all the water drains, the reptiles and fish return to their fossilized reality at their actual locations within the hall.
Figure 25.4. MR Sea Creatures: Free Choice Learning Using Mixed Reality at Orlando Science Center
A walk into the exhibit space reveals that the tylosaur was trapped in time with the pterodactyl in its mouth. This connection of the MR experience back to the pure real experience is intended to permanently bond the experiences together in the visitor’s mind.

The purpose of an informal education experience is to inspire curiosity, create a positive attitude toward the topic, and engage the visitor in a memorable experience that inspires discussion long after the visit. One of our research initiatives is in creating experiential learning landscapes, where the currently harsh boundaries between learning in the classroom, learning at a museum, and learning at home become blurred. MR Sea Creatures is our first MR museum installation intended for this purpose. We have, in fact, already experimented with a non-MR installation that supported extended experiences at home and school
(C. E. Hughes, Burnett, Moshell, Stapleton, & Mauer, 2002). Its success, though on a small scale, has helped to strengthen our convictions.
MR FOR ENTERTAINMENT—MR TIME PORTAL

MR Time Portal, publicly shown at SIGGRAPH 2003, was the first widely seen experience we developed that involved complex 3-D models with rich animations and a nontrivial storyline. Its goal was to immerse participants within a story, with some people at the center of the action and others at the periphery. Figure 25.5 (left) is a scene from an animatic we produced that helped us test story elements while still in VR mode. Figure 25.5 (right) shows the full MR, with one person on the right wearing an HMD in order to be embedded in the experience and two people at a vision dome on the left observing the experience from the perspective of an unseen second participant. In essence, this is an MR version of a theme park experience employing those venues’ notion of divers (the ones who get in the action), swimmers (those who influence the action), and waders (those who observe from afar) (Stapleton & Hughes, 2003, 2005).

In 2003, our StoryEngine was based on the concept of Java objects holding the states and primitive behaviors of actors, each actor having an associated finite state machine (Coppin, 2004) that controlled the manner in which these behaviors were invoked based on stimuli such as timed events, GUI inputs, and interactions with other actors. Most actors reflected the virtual and active real objects of the MR world, but some existed to play the roles of story directors, encouraging players in directions deemed most supportive of the underlying story. For instance, MR Time Portal contained actors associated with a back-story movie, the portal through which threats to our world arose, various pieces of background scenery, each robotic threat, each friendly portal guard, a futuristic physical weapon, a ray-tracing beam to make it easier to aim the gun, a number of virtual explosions, the lighting rig above the exhibit area, several abstract objects operating as story directors, and, of course, the real persons who were experiencing this world. Each of these actors had optional peers in our graphics engine, audio engine(s), and special effects engine. These peers are optional because, at one extreme, abstract actors have no sensory peers; at the other extreme, robotic threats have visual representations, audio presentations synchronized in time and place with the visuals, and special effects when the robots hit the ground (a bass shaker vibrates the floor under the shooter); in between are such things as the lighting rig, which has only a special effects peer.

An actor component, when added to the authoring system, had a set of core behaviors based on its class. An actor class sat at the top of this hierarchy, providing the most common default behaviors and abstract methods for required behaviors for which no defaults exist. A finite state machine, consisting of states and transitions, was the primary means of expressing an actor’s behavior. Each transition emanated from a state; had a set of trigger mechanisms (events) that enabled the transition and a set of actions that were started when the transition was selected; and specified a new state that was entered as a consequence of carrying out the transition.
Figure 25.5. MR Time Portal: Experiential Movie Trailer: (Left) Animatic and (Right) Mixed Reality
A state could have many transitions, some of which had overlapping conditions. If multiple transitions were simultaneously enabled, one was selected at random. The probability of selecting a particular transition could be increased by repeating a single transition many times (the cost was just one object handle per additional copy). States and transitions could have associated listeners, causing transitions to be enabled or conditions to be set for other actors.
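The transition semantics just described (random selection among simultaneously enabled transitions, with probability raised by repetition) can be sketched as follows; the names are ours, not the actual Java classes:

    import random

    class Transition:
        def __init__(self, trigger, actions, target):
            self.trigger = trigger    # predicate over (actor, event)
            self.actions = actions    # callables started when selected
            self.target = target      # new state entered afterward

    class Actor:
        def __init__(self, state, transitions):
            self.state = state
            self.transitions = transitions  # state -> list of Transitions

        def handle(self, event):
            # Collect transitions out of the current state whose triggers
            # fire; if several are enabled at once, pick one at random.
            # Listing a transition more than once raises its probability.
            enabled = [t for t in self.transitions.get(self.state, [])
                       if t.trigger(self, event)]
            if enabled:
                chosen = random.choice(enabled)
                for action in chosen.actions:
                    action(self, event)
                self.state = chosen.target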
MR FOR REHABILITATION—MR KITCHEN

The goal of the MR Kitchen was to demonstrate the use of MR in simulating a cognitively impaired person’s home environment for purposes of helping that individual regain some portion of independence. More broadly, the goal was to experiment with the use of MR as a human experience modeler, an environment that can capture and replicate human experiences in some context. Here the experience was making breakfast, and the context was the individual’s home kitchen (Fidopiastis et al., 2005).

Experience capture starts by recording the spatial, audio, and visual aspects of an environment. This is done at the actual site being modeled (or a close approximation) so we can accurately reproduce the space and its multisensory signature. To accomplish this we employ a 3-D laser scanner (Riegl LMS-Z420i), a light capture device (Point Grey Ladybug camera), and various means of acoustical capture (a Holophone H2-PRO, stereo microphones on grids, transducers to pick up vibrations and sounds in microenvironments, and even hydrophones for underwater soundscapes). Once captured, models of this real environment can be used to augment a real setting or to serve as a virtual setting to be augmented by real objects. This MR setting immerses a user within a multimodal hybrid of real and virtual that is dynamically controlled and augmented with spatially registered visual, auditory, and haptic cues.

For our MR Kitchen experiment, we went to the home of a person who had recently suffered traumatic brain injury due to an aneurysm. Spending about two hours there, we “captured” his kitchen (see the bottom right monitor of Figure 25.6 for an image of him in his home kitchen). This capture included a point cloud, textures, and the lighting signature of the kitchen and its surroundings (audio was not used for the experiment). We then built parts of the real kitchen out of plywood to match the dimensions and locations of critical objects (pantry, silverware drawers, and so forth). We purchased a refrigerator, cupboard doors, a coffee maker, and a toaster oven and borrowed common items (cups, utensils, and a favorite cereal). Figure 25.6 shows two participants in this kitchen. The screen on the left shows the view from the man on the right. Notice that the real objects are present; however, the textures of the counter and doors are the same as in the subject’s home.

All aspects of the subject’s movement and his interaction with objects and the human therapist are captured by our system, as seen in the center monitor of Figure 25.6. This capture includes a detailed map of the subject’s movement and head orientation, allowing for analysis and replay. Additionally, cameras can be positioned to capture any number of observer viewpoints of his activities and those of his therapist.
Figure 25.6. MR Kitchen: Cognitive Rehabilitation—Demonstrates Real and Mixed Views, Captured Movement Data for One Participant, and Subject’s Home Kitchen
Replaying the experience allows viewing events from multiple perspectives and with appropriate augmentation (for example, data on the user’s response times and measured stress levels).

EFFECTIVENESS

The lessons learned throughout the MR evolution have led to the design and production of a robust training environment that is capable of simulating real world tasks in a safe setting. Field testing across various applications, including installations for entertainment, free-choice learning, training, and cognitive rehabilitation, provided much feedback to the iterative design life cycle, as well as revealing new training insights. The success of our MR environment as a training tool has been assessed in a number of ways during a number of research efforts. The following sections summarize how effective this environment can be as part of a training program.

TRAINING OBJECTIVES AND/OR OUTCOMES

Training effectiveness can be operationally defined in a number of different ways, summarizing various facets of the participants’ personal change or improvement over their pre-training state.
The overall objectives or outcomes of interest for the training system must be decided upon in order to determine how well the system meets its training goals. Effectiveness is typically quantified by measures of how much information is obtained during and retained after the training tasks, by various measures of skill improvement via task performance, and by changes in motivation or attitude.

Knowledge

One way to assess training effectiveness is to observe the amount of information a participant is able to recall after training has occurred in the MR environment. MR has high face validity as a learning tool since it presents information from multiple sensory modalities, which improves recall and recognition. Visual images are especially helpful; much research has shown that people seem to be better at remembering images than remembering words (Bower & Winzenz, 1970); however, this imagery benefit can be exploited to improve memory for words as well (Sweeney & Bellezza, 1982). This can be seen nonexperimentally with the MR Sea Creatures and MR Kitchen experiences. MR Sea Creatures presents fossils of marine fish and reptiles from the Cretaceous period in an uncluttered learning environment (C. E. Hughes, Stapleton, Hughes, & Smith, 2005). The goal was to present an environment to users in a way that would be highly memorable, as well as fun, in order to inspire curiosity and create a positive attitude toward the topic. Subjective questionnaires revealed that 98 percent of the users felt that the environment encouraged longer time spent exploring the sea creatures and that more than 80 percent felt that the system encouraged them to return to the exhibit again. Although this reveals more of the users’ attitude than knowledge gained, more time in the system would likely lead to more information retention; 83 percent of users even reported that they felt as though they had learned more about the sea creatures of the Cretaceous period.

Skill

Skill or task performance can also be used to assess training effectiveness. If training has been successful, an improvement in task performance can be seen over performance without the training system. Recent research by Jerome, Witmer, and Mouloua (2005) and Jerome (2006) used the MR MOUT environment to investigate human perception, attention, and performance. The first goal was to determine whether the user can successfully locate where stimuli are being displayed visually, auditorily, or haptically using the MR system. Further, this research explored whether people can focus attention on specific spatial locations of the visual scene when cued by visual, auditory, or tactile cues; whether there are any differences when the user is cued using similar cues, but with no spatial information; whether the user can be cued to focus attention of differing breadths; and how workload may interact with these modalities.
attention of differing breadths; and how workload may interact with these modalities.

In the first experiment, the effectiveness of the cues was assessed, that is, whether spatial information can be determined from the computer-generated stimuli. Spatial cues were of three types: visual cues, auditory cues, and a combination of visual and audio cues (unfortunately, the tactile vest was not ready in time for the first study). Visual cues led to more targets being accurately acquired than audio cues (the mean difference was −0.537, standard error (SE) = 0.044, 95 percent confidence interval (CI) −0.628 to −0.447; because the interval does not include zero, the null hypothesis of zero difference can be rejected), and the combination of audio and visual cues together produced even better performance than the visual cues alone (mean difference 0.100, SE = 0.044, 95 percent CI 0.009 to 0.191). The presence of visual cues in the visual and audiovisual conditions allowed participants to pinpoint the targets more accurately. The primary value of including audio cues is to direct participants to targets that are not within the immediate line of sight. Audio cues and visual cues did not differ significantly in affecting the speed of acquiring targets. In contrast, the audiovisual combination produced significantly faster acquisition times than either cue modality used alone.

In the second experiment, these spatial cues (including tactile cues) were incorporated into an interactive simulated scenario and were also varied in size (small, medium, large, none) to determine the performance effects of different levels of spatial information. During the experimental task, 64 participants searched for enemies (while cued by visual, auditory, tactile, combinations of two, or all three modality cues) and tried to shoot them while avoiding shooting the civilians (fratricide) for two 2-minute low workload scenarios and two 2-minute high workload scenarios. The results showed significant benefits of attentional cuing on visual search task performance. When cues were displayed alone, accuracy with tactile cues (M = 0.44, SD = 0.13) was significantly better than with visual cues (M = 0.31, SD = 0.10), t(42) = −2.57, p < 0.01, and accuracy with audio cues (M = 0.36, SD = 0.07) was significantly better than with no cues (M = 0.26, SD = 0.07), t(42) = 1.99, p < 0.05. The combination of visual and tactile cues together (M = 0.43, SD = 0.09) was significantly better than the combination of audio and visual cues together (M = 0.32, SD = 0.13), t(42) = 2.31, p < 0.05. Fratricide occurrence was amplified by the presence of audio cues, that is, significantly higher for audio cues (M = 0.047, SD = 0.035) than for the control group (M = 0.016, SD = 0.022), t(42) = 2.63, p < 0.01. The two levels of workload produced differences within individuals' task performance for accuracy, F(1, 56) = 6.439, p < 0.05, and reaction time, F(1, 56) = 11.426, p < 0.001. Accuracy and reaction time were significantly better with the medium-sized cues than with the small cues [accuracy: F(1, 56) = 13.44, p < 0.01; reaction time: F(1, 56) = 4.31, p < 0.05], the large cues [accuracy: F(1, 56) = 17.37, p < 0.01; reaction time: F(1, 56) = 8.56, p < 0.01], and the control condition (cues with no spatial information) [accuracy: F(1, 56) = 63.62; reaction time: F(1, 56) = 17.67, p < 0.01] during low workload, and marginally better during high workload. Generally, cue specificity resulted in better accuracy and reaction time with the medium cues.
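The confidence interval logic used in the first experiment can be made concrete with a short calculation. The sketch below (a minimal illustration in Python, not the authors' analysis code) checks whether a 95 percent interval around a reported mean difference includes zero; the 1.96 multiplier assumes a normal approximation, so it yields slightly narrower bounds than the t-based intervals reported above.

def ci_95(mean_diff, se, critical=1.96):
    # 95 percent confidence interval around a mean difference
    # (normal approximation; the chapter's intervals use a t value).
    return (mean_diff - critical * se, mean_diff + critical * se)

def rejects_null(interval):
    # The null hypothesis of zero difference is rejected when the
    # interval excludes zero.
    lower, upper = interval
    return lower > 0 or upper < 0

# Visual versus audio cues: mean difference -0.537, SE = 0.044.
print(ci_95(-0.537, 0.044))                # about (-0.623, -0.451)
print(rejects_null(ci_95(-0.537, 0.044)))  # True

# Audiovisual versus visual alone: mean difference 0.100, SE = 0.044.
print(ci_95(0.100, 0.044))                 # about (0.014, 0.186)
print(rejects_null(ci_95(0.100, 0.044)))   # True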
Attitude

Attitude and motivation can also be used to assess the effectiveness of a training system. Although knowledge gained and skill improvement are more objective measures that more clearly represent training effectiveness, training could be undermined without motivation and a positive attitude toward the system. Subjective data representing users' feelings toward the system were also collected in the Sea Creatures research described above. Results revealed that 88 percent of users felt that the experience was entertaining, and 84 percent would be motivated to visit this exhibit or similar exhibits in the future.

PRESENCE

Like attitude and motivation, presence does not usually represent training effectiveness itself; however, research shows that without a feeling of presence, training effectiveness can be undermined, and with presence, it can be enhanced. Presence refers to becoming immersed, that is, having the feeling of "being there" in the artificial environment while becoming removed from real world stimuli; it enhances the individual's experience in a VE (Witmer & Singer, 1998). Individual differences can moderate the effects of a particular immersive VE on an individual's feeling of presence. Individual differences in immersive tendency, aspects of the technology affecting the sense of presence, and negative side effects of the VE causing sickness symptoms may mediate VE task performance and training effectiveness using these systems. Measuring these mediating effects is of great importance for understanding the relationships among them and, of course, for maximizing the effectiveness of the training simulation.

In a study by Jerome and Witmer (2004), data from 203 Orlando, Florida, college students from five separate studies using various VEs and tasks were analyzed using the structural equation/path analysis module in STATISTICA. To judge the fit of a hypothesized model to the data, the goodness-of-fit index (GFI) is generally used and should be above 0.90 to indicate good fit. The GFI for the hypothesized model was 0.94, showing good fit to the data. The results suggest that a sense of presence in VEs may have a direct causal relationship with VE performance, while immersive tendency and simulator sickness may have indirect relationships with VE performance, both fully mediated through presence. The findings imply that improving the virtual experience to enhance the feeling of presence may improve human performance on virtual tasks. Also, since immersive tendency is a characteristic of the individual, it may not be easily or quickly manipulated; therefore, VE task performance may not be improvable through immersive tendency, though immersive tendency may serve as a post hoc explanation of low or high task performance. VE designers may benefit from such results by tweaking an environment to capitalize on presence-enhancing features. Consequently, the VE may become a more usable, entertaining, and effective training tool.
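The structure of this fully mediated model can be illustrated with a short sketch. The path coefficients below are purely hypothetical placeholders (the study's actual path weights are not reported here); the point is only that, under full mediation, each upstream variable's total effect on performance is the product of its path into presence and the presence-to-performance path.

# Hypothetical path coefficients for a fully mediated model; the
# actual weights from Jerome and Witmer (2004) are not reported in
# this chapter.
paths = {
    ("immersive_tendency", "presence"): 0.40,   # assumed
    ("simulator_sickness", "presence"): -0.30,  # assumed
    ("presence", "performance"): 0.50,          # assumed
}

def indirect_effect(source, mediator, target):
    # Under full mediation there is no direct source-to-target path,
    # so the total effect is the product of the two path coefficients.
    return paths[(source, mediator)] * paths[(mediator, target)]

print(indirect_effect("immersive_tendency", "presence", "performance"))  # 0.2
print(indirect_effect("simulator_sickness", "presence", "performance"))  # -0.15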
SIMULATOR SICKNESS

Sickness symptoms caused by simulator exposure have long been thought to undermine training effectiveness (Kennedy, Fowlkes, & Lilienthal, 1993). Recent research has suggested that the relationship might be indirect; that is, simulator sickness does not directly cause poor task performance and reduced learning, but first causes a reduction in the feeling of presence, and the reduced presence is directly related to the reduction in training effectiveness. Jerome and Witmer (2004) showed these results in the structural equation modeling analysis described above. The analysis showed a negative zero-order correlation between simulator sickness and performance; however, when entered into the structural equation model along with presence and immersive tendency, all of the variance explaining the relationship between sickness and performance was rerouted through the presence construct. These results suggest that reducing the occurrence and severity of simulator sickness may improve task performance indirectly by increasing the sense of presence felt. The MR environments used in the Media Convergence Laboratory have produced very few simulator sickness symptoms while creating a high level of presence, as subjectively reported by users. These MR environments should therefore not suffer significant reductions in training effectiveness from these issues.

CONCLUSION

This chapter describes the evolution of one specific system for authoring and delivering MR experiences. We make no specific claims about its comparative benefits over other systems, such as AMIRE (authoring mixed reality) (Traskback, 2004), MX Toolkit (Dias, Monteiro, Santos, Silvestre, & Bastos, 2003), Tinmith-evo5 (Piekarski & Thomas, 2003), and DELTA3D (http://www.delta3d.org). Rather, our goal is to note the challenges we faced creating complex MR experiences and, within this context, to describe our means of addressing these issues. As in any project coping with an evolving technology, we must sometimes provide solutions using existing and new technologies (for example, solving clipping problems with blue screens and then employing unidirectional retroreflective material in contexts that require the dramatic effects of changing real light). Other times we need to develop new scientific results, especially in the algorithmic area, as in addressing realistic illumination and associated shading and shadowing properties in interactive time (Konttinen, Hughes, & Pattanaik, 2005). Yet other times we must create new artistic conventions to deal with issues not easily solved by technology or science (for example, taking advantage of people's expectations in audio landscapes) (D. E. Hughes et al., 2004).
We believe that the most important properties of the framework we evolved are its use of open software, its protocols for delivering a scalable distributed solution, and its flexible plug-in architecture. In general, flexibility in all aspects of the system has been the key to our success and is helping us move forward with new capabilities, such as a bidding system for story based rendering. In its present form, our framework still requires scripts to be written, or at least reused, to create a new experience. Our goal (dream) is to be able to use our experience capture capabilities to evolve the behaviors of virtual characters in accordance with the actions performed by human participants, as well as those of other successful virtual characters. For instance, in a training environment, the actions of an expert at room clearing could be used to train virtual SWAT (special weapons and tactics) team members by example. In a rehabilitation setting, the actions of a patient could be used as a model for those of a virtual patient that is, in turn, used to train a student therapist in the same context. Of course, this is a rather lofty goal, and just making authoring more intuitive, even with drag-and-drop, would help.

The MR framework described here is a system intended to generate, deploy, capture, analyze, and synthesize an interactive story. Whether these stories are designed to train, teach, sell, or entertain is immaterial. The point is that we drive an MR experience by generating a world within, on top, beneath, and around the real world and real senses in which we live. Our goals for this framework and for mixed reality in general are bounded only by our temporal imagination. Tomorrow, we will conceive of new applications of MR, leading to new requirements that continue to guide the evolution of our system and place new demands on our creativity.

NOTE

1. An animatic is a simple visual rendering of the story from a single point of view. Its purpose is to communicate the vision of the creative team. This allows the art director, the audio producer, and the lead programmer to effectively exchange ideas and determine each team's focus.
REFERENCES

Bower, G. H., & Winzenz, D. (1970). Comparison of associative learning strategies. Psychonomic Science, 20, 119–120.

Coppin, B. (2004). Artificial intelligence illuminated. Sudbury, MA: Jones and Bartlett Publishers.

Dias, J. M. S., Monteiro, L., Santos, P., Silvestre, R., & Bastos, R. (2003). Developing and authoring mixed reality with MX toolkit. In IEEE International Augmented Reality Toolkit Workshop (pp. 18–26). Tokyo, Japan.

Fidopiastis, C. M., Stapleton, C. B., Whiteside, J. D., Hughes, C. E., Fiore, S. M., Martin, G. A., Rolland, J. P., & Smith, E. M. (2005, September). Human experience modeler: Context driven cognitive retraining and narrative threads. Paper presented at the 4th International Workshop on Virtual Rehabilitation (IWVR2005), Catalina Island, CA.
Haller, M., Drab, S., & Hartmann, W. (2003). A real-time shadow approach for an augmented reality application using shadow volumes. Proceedings of ACM Symposium on Virtual Reality Software and Technology—VRST'03 (pp. 56–65). Osaka, Japan.

Hughes, C. E., Burnett, J., Moshell, J. M., Stapleton, C. B., & Mauer, B. (2002). Space-based middleware for loosely-coupled distributed systems. Proceedings of SPIE, 4862, 70–79.

Hughes, C. E., Konttinen, J., & Pattanaik, S. N. (2004). The future of mixed reality: Issues in illumination and shadows. Proceedings of the 2005 Interservice/Industry Training, Simulation & Education Conference. Arlington, VA: National Training Systems Association.

Hughes, C. E., Stapleton, C. B., Hughes, D. E., & Smith, E. (2005). Mixed reality in education, entertainment and training: An interdisciplinary approach. IEEE Computer Graphics and Applications, 26(6), 24–30.

Hughes, C. E., Stapleton, C. B., Micikevicius, P., Hughes, D. E., Malo, S., & O'Connor, M. (2004). Mixed fantasy: An integrated system for delivering MR experiences [CD-ROM]. Proceedings of the VR Usability Workshop: Designing and Evaluating VR Systems.

Hughes, D. E. (2005, July). Defining an audio pipeline for mixed reality. Paper presented at the Human Computer Interaction International 2005 (HCII2005), Las Vegas, NV.

Hughes, D. E., Thropp, J., Holmquist, J., & Moshell, J. M. (2004, November 29–December 2). Spatial perception and expectation: Factors in acoustical awareness for MOUT training. Paper presented at the 24th Army Science Conference (ASC 2004), Orlando, FL.

Jerome, C. J. (2006). Orienting of visual-spatial attention with augmented reality: Effects of spatial and non-spatial multi-modal cues. Dissertation Abstracts International, 67(11), 6759. (UMI No. 3242442)

Jerome, C. J., & Witmer, B. (2004, October). Human performance in virtual environments: Effects of presence, immersive tendency, and simulator sickness. Poster presented at the Human Factors & Ergonomics Society's Annual Conference, New Orleans, LA.

Jerome, C. J., Witmer, B., & Mouloua, M. (2005, July). Spatial orienting attention using augmented reality. Paper presented at the Augmented Cognition Conference, Las Vegas, NV.

Kennedy, R. S., Fowlkes, J. E., & Lilienthal, M. G. (1993). Postural and performance changes following exposures to flight simulators. Aviation, Space, and Environmental Medicine, 6(10), 912–920.

Konttinen, J., Hughes, C. E., & Pattanaik, S. N. (2005). The future of mixed reality: Issues in illumination and shadows. Journal of Defense Modeling and Simulation, 2(1), 51–59.

McGuire, M., Hughes, J. F., Egan, K. T., Kilgard, M. J., & Everitt, C. (2003). Fast, practical and robust shadows (Brown University Computer Science Tech. Rep. No. CS-03-19). Retrieved September 27, 2004, from http://www.cs.brown.edu/publications/techreports/reports/CS-03-19.html

Nijasure, M., Pattanaik, S. N., & Goel, V. (2003). Interactive global illumination in dynamic environments using commodity graphics hardware. Proceedings of Pacific Graphics 2003 (pp. 450–454). Canmore, Alberta, Canada.
O'Connor, M., & Hughes, C. E. (2005). Authoring and delivering mixed reality experiences. Proceedings of 2005 International Conference on Human-Computer Interface Advances in Modeling and Simulation—SIMCHI'05 (pp. 33–39). Las Vegas, NV.

Piekarski, W., & Thomas, B. H. (2003). An object-oriented software architecture for 3D mixed reality applications. Proceedings of the IEEE and ACM International Symposium on Mixed and Augmented Reality—ISMAR 2003 (pp. 247–256). Tokyo, Japan.

Sadek, R. (2001). 3D sound design and technology for the sensory environments evaluations project: Phase 1 [Online]. http://www.ict.usc.edu/publications/ICT-TR01-2001.pdf

Stapleton, C. B., & Hughes, C. E. (2003). Interactive imagination: Tapping the emotions through interactive story for compelling simulations. IEEE Computer Graphics and Applications, 24(5), 11–15.

Stapleton, C. B., & Hughes, C. E. (2005). Mixed reality and experiential movie trailers: Combining emotions and immersion to innovate entertainment marketing. Proceedings of 2005 International Conference on Human-Computer Interface Advances in Modeling and Simulation—SIMCHI'05 (pp. 40–48). Las Vegas, NV.

Sweeney, C. A., & Bellezza, F. S. (1982). Use of the keyword mnemonic in learning English vocabulary words. Human Learning, 1, 155–163.

Traskback, M. (2004). Toward a usable mixed reality authoring tool. In the 2004 IEEE Symposium on Visual Languages and Human Centric Computing (pp. 160–162). Rome, Italy.

Witmer, B. G., & Singer, M. J. (1998). Measuring presence in virtual environments: A presence questionnaire. Presence: Teleoperators and Virtual Environments, 7(3), 225–240.
Chapter 26
TRENDS AND PERSPECTIVES IN AUGMENTED REALITY

Brian Goldiez and Fotis Liarokapis

Training in the real environment is not easy, mainly due to sociotechnological barriers. This chapter explores the potential effectiveness of augmented reality (AR) applied to training. We discuss previous applications of AR in live training and findings arising from formative evaluations of these systems. Various approaches for applying AR to training are discussed. Overviews of the most characteristic evaluation methods, as well as suggestions on assessing the performance of AR in training, are provided.

INTRODUCTION

AR describes a technology where the real world is the baseline and additional information from a computer-generated sensory display is added. AR is contrasted with virtual reality (VR), where the baseline is a synthetic (artificial) environment and the desired state is complete immersion of the human sensory system within a computer-created environment. As one adds more computer augmentation to the real world, the demarcation between virtual and augmented becomes blurred, and rapid advances in technology have contributed to this blurring. Milgram (2006) has characterized a variety of continuums between the real and virtual worlds that reflect different ways one can view the interaction and use of the technology. The confluence of views of realities provides opportunities for adapting technologies from one domain to another and the opportunity to adapt human performance studies across domains.

There are three major characteristics of AR systems described by Azuma (1997). First, AR systems must seamlessly combine the real world with virtual information. This combination is typically considered in the visual domain, but is not exclusively restricted to it. Second, an AR system must operate in real time; that is, an AR system must provide responses commensurate with the system using the AR, in this case a human. Third, an AR system must spatially register the display with the real world in three-dimensional (3-D) space. Currently there are two broad application areas for AR: decision-making tasks, where mobility
is not critical, and tasks where mobility is of primary importance (for example, navigation).

Technical Requirements for AR Systems

Research in tracking, display, and interaction technologies contributes to the immersiveness of AR systems, and new mathematical algorithms improve the effectiveness of the software system and the realism of the visualization output. To correctly register the computer-generated objects and the real world, accurate tracking of the coordinates of the participant's point of view is required. When training requires a stationary user position, the registration process is much easier than if the user is moving (Azuma, 1997). In both cases, accurate alignment between the virtual and real objects is required to avoid the appearance of floating objects.

The common tracking techniques are vision based or sensor based. Sensor based systems typically use magnetic or sonic devices to detect the user's position, while vision based systems use cameras and visually distinctive markers to locate the user's position in an environment. These markers, called fiducials, are used in indoor systems with good results (Kato, Billinghurst, Poupyrev, Imamoto, & Tachibana, 2000). Other indoor systems employ hybrid tracking technologies, such as magnetic and video sensors, to achieve good registration results. In outdoor and mobile systems, other sensor devices are used, such as the global positioning system (GPS) coupled with orientation sensors (for example, digital compasses).

AR systems typically use special displays to immerse participants in augmented environments. Head-mounted displays (HMD) for AR include optical or video see-through devices, which project merged computer-generated and real images onto the user's eyes. However, there are other ways to immerse the user, such as large-area displays or stereoscopic glasses. Regarding interaction technologies, many commercially available hardware devices can be used to increase the level of interaction between participants and computing devices. Thus, a robust AR system must integrate an ergonomic software and hardware framework and address the following issues: (1) calibration and accurate user-viewing position, (2) natural interaction, and (3) realistic rendering of virtual information.

REVIEW OF STATE OF THE ART IN AR

The literature is organized by successively considering AR applications, prototypes, components, and concepts. AR applications exist (principally in laboratories or as prototypes) and have been subjected to some type of evaluation by humans. Prototypes are essentially AR systems that have been created, but generally have not been evaluated by users. Components are subsystems of AR systems. Concepts are ideas or concerns that have not been reduced to practice. A further categorization of AR literature supports work by Goldiez (2004) that pointed to technological hurdles in AR in the areas of tracking and visualization.
Tracking can currently be accomplished at a precise level in small spaces and in gross terms in larger areas. Visualization that makes added content indistinguishable from the real world requires graphics processing and display technology that does not currently exist in a mobile computing environment and only minimally exists in a fixed setting. Many technical problems are mitigated when mobility is restricted, but with large impacts on cost and/or flexibility in AR usage. As AR systems are deployed for experimentation and demonstration, new issues arise, principally in human factors and ergonomics, because the focus of AR expands from purely technical concerns to encompass usage. A more complete review of the literature can be found in Goldiez, Sottilare, Yen, and Whitmire (2006).

It is worth mentioning that there is no perfect AR technology; all existing technologies have advantages as well as limitations. To overcome the limitations of each technology, hybrid AR systems can be employed to meet requirements that do not fall strictly into one category noted above. These hybrid AR systems combine different vision techniques and hardware devices to achieve results that better meet a user's requirements. Obviously, hybrid systems can further immerse participants, but they will also generally increase the overall cost of the AR system because they might stretch the limits of the technology and have special integration and operational needs.
AR Application Domains

AR systems have been developed to facilitate improved human performance in such areas as entertainment, medicine, communications, navigation/decision making, and military-oriented operations.

Entertainment

AR is being used in several areas of the entertainment industry. As examples, Liarokapis (2006b) describes how to transform a traditional arcade game into 3-D and then into an AR interactive game. Initial studies found that users preferred the AR experience in terms of enjoyment. Cavazza, Martin, Charles, Marichal, and Mead (2003) created an interactive system that immerses the storyteller into the background environment, while Gandy et al. (2005) integrate users into a scenario based on the Wizard of Oz. A simple tennis game has been developed using commercially available Bluetooth cellular technology (Henrysson, Billinghurst, & Ollila, 2005).

Medicine

The medical field currently benefits from AR systems. For example, a virtual retinal display is being used for patients who suffer from poor vision and as a surgical display (Viirre, Pryor, Nagata, & Furness, 1998). Scheuering, Rezk-Salama, Barfuß, Schneider, and Greiner (2002) report on using a video see-through HMD to overlay imagery during surgical procedures. Also, Vogt, Khamene, Sauer, Keil, and Niemann (2003) developed a system to visualize X-rays, CT scans,
and so forth, onto a person or mannequin by utilizing a retroreflective marker tracking system.

Communication

Several AR systems have been developed to facilitate communication and collaboration. Regenbrecht et al. (2003) describe an AR conferencing system allowing users to meet without leaving their desks. Billinghurst, Belcher, Gupta, and Kiyokawa (2003) describe two experiments investigating face-to-face collaboration using a multiuser AR interface. These experiments, however, found no advantage in using AR due to limitations from restricted peripheral vision.

Navigation

AR has been used to facilitate navigation and wayfinding. As part of the LOCUS project, Liarokapis (2006c) developed a system that uses AR and VR techniques to enhance mobile navigation by guiding pedestrians between locations in urban environments. Two prototypes were developed for outdoor navigation, one based on manually placed fiducials and another based on natural feature selection. The first prototype has robust tracking but limited range, while the opposite is true for the second prototype. A hybrid approach using natural features and GPS is being researched that should provide better tracking efficiency. Goldiez (2004) utilized the Battlefield Augmented Reality System (BARS) to study the benefits of using AR in search and rescue navigation by exploring the use of different map displays to facilitate navigation through a maze. Results determined that BARS does improve user performance in specific situations.

Spatial Relations Using AR

Bennet and Stevens (2004) describe a projection augmented, multimodal system to explore how interaction with spatially coincident devices affects perception of object size. Results showed that performance in combined (visual/haptic) conditions was more accurate in distance estimation, supporting the theory that a person's perception of size is enhanced by using more than one sense. Grasset, Lamb, and Billinghurst (2005) investigated how a pair of users, one utilizing AR (an exocentric view of a maze) and one utilizing VR (an egocentric view of the maze), can accomplish a collaborative task. The results indicated that mixed-space AR collaboration does not disrupt task efficiency.

Military-Oriented AR Systems

BARS is an important military based AR application that was developed by the Naval Research Laboratory for use in urban settings. BARS has served as a de facto integration platform for a number of technological and human-performance research efforts. For example, it has been used in several experiments investigating the impact of various technological innovations on human performance (for example, Goldiez, 2004; Livingston, Brown, Julier, & Schmidt,
2006). Livingston et al. developed innovative algorithms to facilitate pointing accuracy and the sharing of information among BARS users. Additionally, Franklin (2006) discussed experiments using a system similar to BARS, but developed by QinetiQ, to assess the maturity of AR to supplement live training. In the QinetiQ-developed system, a virtual aircraft was inserted among live ground assets; the virtual aircraft could see and interact with live participants, but the live participants had no knowledge of the virtual world. The results suggested that a more robust interface to the live environment was necessary and that the bulkiness of the AR equipment was an impediment to performance. To overcome limitations in the field of view, users suggested the use of small visual icons on the display periphery to cue the user to the aircraft position. Discrepancies between the real and synthetic worlds with respect to environmental effects were problematic for training.
AR Components

At a top level, AR components include visual software and hardware, spatial tracking devices, other sensory devices, computing, and consideration of ergonomics. Integrating these components creates an AR system.

Visual Components

Visual software and hardware are key factors distinguishing AR from VR. Superimposing virtual images onto a real background is challenging and relies on efficient processing to create realistic scenes, compensation for motion, and tracking tools for placing images in the correct position (a minimal compositing sketch appears at the end of this subsection). Several factors contribute to the VR-AR distinction, including the need in AR to accommodate dynamic changes in brightness and contrast between the real and virtual parts of the scene, latency in overlaying the virtual image onto the real world, image fidelity differences, helmet-mounted display weight, and so forth.

A variety of visualization research has been conducted to enhance AR. A novel approach was taken by Fischer, Bartz, and Straßer (2005), who reduced the visual realism of the real environment to better match the computer-generated object(s) being superimposed onto the real world. An alternative approach for interacting with smaller 3-D objects in AR is suggested by Lee and Park (2005), who use blue augmented foam as a marker. Mohring, Lessig, and Bimber (2005) describe the technology of video see-through AR and its development on a consumer cell phone achieving 16 frames per second. Ehnes, Hirota, and Hirose (2005) have developed an alternative to the HMD based on a computer-controlled video projection system that displays information in the correct place for a user.
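As a concrete illustration of the basic superimposition step, the following sketch blends a rendered virtual layer onto a camera frame using a per-pixel alpha mask. This is a minimal sketch in Python with NumPy under our own assumptions, not any particular system's pipeline; real AR compositing must also handle the latency, brightness, and contrast issues noted above.

import numpy as np

def composite(camera_frame, virtual_layer, alpha_mask):
    # camera_frame, virtual_layer: H x W x 3 uint8 images.
    # alpha_mask: H x W float array in [0, 1]; 1.0 means fully virtual.
    a = alpha_mask[..., np.newaxis]
    blended = (a * virtual_layer.astype(np.float32)
               + (1.0 - a) * camera_frame.astype(np.float32))
    return blended.astype(np.uint8)

# Example with synthetic data: overlay a green virtual patch on a gray frame.
frame = np.full((480, 640, 3), 128, dtype=np.uint8)
layer = np.zeros((480, 640, 3), dtype=np.uint8)
mask = np.zeros((480, 640), dtype=np.float32)
layer[200:280, 300:380] = (0, 255, 0)  # the virtual object
mask[200:280, 300:380] = 1.0           # opaque only where it exists
out = composite(frame, layer, mask)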
Tracking Components

Tracking in AR is the operation of measuring the position of real 3-D objects (or humans) that move in a defined space. Six degrees of freedom (6 DOF) tracking refers to the simultaneous measurement of position and orientation in some fixed coordinate system, such as the earth's. It is normally required that the location of the tracking device (for example, a camera) and the item being tracked (for example, a trainee) be simultaneously and continuously known in 6 DOF. The most significant technologies available for tracking in AR environments can be subdivided into six broad categories: mechanical, electromagnetic, optical, acoustic, inertial, and GPS. As with visual systems, tracking systems drive AR implementations into fixed or limited-motion situations to allow for display rendering and for precisely tracking human appendages or important components. Wider-range motion AR systems are less precise and therefore limit the degree to which the virtual image aligns with the real world.

Computer vision tracking is also a major area of research for AR. Vision based tracking (Neumann & You, 1999) enables the potential recognition of an object in a natural environment that serves as a fiducial. Software algorithms have been developed by Behringer, Park, and Sundareswaran (2002) to use vision tracking to recognize buildings and/or structures. Naimark and Foxlin (2005) describe the development of a hybrid vision-inertial self-tracker that utilizes light emitting diodes (LEDs). Tenmoku, Kanbara, and Yokoya (2003) describe an alternative to vision based tracking that integrates magnetic and GPS sensors for indoor and outdoor environments. In their system, the user's location is tracked utilizing a combination of radio frequency identification (RFID) tags deployed in the environment, GPS (outdoors), and magnetic (indoors) sensors.
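To make the fiducial idea concrete, the sketch below recovers a camera pose from the four corners of a square marker using OpenCV's solvePnP. The marker size, detected corner pixels, and camera intrinsics are placeholder values of our own; in a real system the image points come from a marker detector and the intrinsics from a camera calibration step.

import numpy as np
import cv2

MARKER_SIZE = 0.10  # marker edge length in meters (assumed)

# 3-D marker corners in the marker's own coordinate frame.
object_points = np.array([
    [-MARKER_SIZE / 2,  MARKER_SIZE / 2, 0.0],
    [ MARKER_SIZE / 2,  MARKER_SIZE / 2, 0.0],
    [ MARKER_SIZE / 2, -MARKER_SIZE / 2, 0.0],
    [-MARKER_SIZE / 2, -MARKER_SIZE / 2, 0.0],
], dtype=np.float32)

# Detected corner pixels (placeholders) and assumed camera intrinsics.
image_points = np.array([[310, 220], [390, 225], [385, 305], [305, 300]],
                        dtype=np.float32)
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)  # assume an already undistorted image

ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                              camera_matrix, dist_coeffs)
# rvec and tvec place the marker in camera coordinates; a renderer can
# use them to draw virtual content registered to the marker.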
Human Factors/Mobility

Even a system with flawless tracking and visual augmentation would be worthless if the user were unable to perform the desired tasks comfortably and effectively; thus, ergonomics cannot be overlooked in AR development. Weight, location of controls, and mobility all influence user performance. Liarokapis (2006a) presents an overview of a multimodal AR interface that can be decomposed into offline, commercially produced components. A variety of interaction paradigms, such as the use of fiducial based icons, support physical manipulation of an object. Vogelmeier, Neujahr, and Sandl (2006) of the European Aeronautic Defence and Space Company discuss the need for similarity between the various sensory interactions experienced when wearing AR and/or VR equipment and those of the real world.

An attractive feature of AR is mobility, and with it come possible extensions in the variety and range of human interactions. Tappert et al. (2001) and Espenant (2006) discuss the possibilities of using AR based wearable devices as visual memory prosthetics or for training. Mobility in AR will also require considering user location. For example, Butz (2004) discusses approaches that consider using radio links and infrared or third-generation cellular technology to support mobility, enabling the acquisition of the user's location for subsequent processing of relevant data.
ADVANCED CONCEPTS IMPACTING AR

The markets will determine when several technologies important to AR emerge, as it appears that several needed technical innovations depend upon developments in the commercial sector. These interrelated areas include advances in power management, computer packaging, and communications. Power management (power sources and power-consuming devices) is important to sustained mobility and operations in AR. Computer packaging is another area where the commercial market will determine what products become available. The literature alludes to the need for devices that consume less power and are more compactly packaged.

Handheld and mobile computing may become an advantageous platform for hosting AR applications. Emerging mobile technology employs on-board computing and graphics rendering resources that are useful for AR applications. Researchers (for example, Liarokapis, 2006c) are exploiting this technology, but are not creating the hardware or software operating systems. They are dependent upon the mobile industry to create products that are useful to AR while also serving the wider cellular marketplace. This type of leveraging is advantageous because development costs and economies of scale are borne by someone other than the AR community. However, the AR community must stand on the sidelines and wait for developments that may or may not occur.

A review of the literature suggests that when real and virtual environments are mixed, handling interruptions is a major unresolved issue. Unanticipated items (for example, people) crossing the field of view could result in unacceptable anomalies in the AR visualization. The work of Drugge, Nilsson, Liljedahl, Synnes, and Parnes (2004) showed that interruptions in AR occur due to unforeseen events (for example, someone walking across a scene causing visual anomalies), but also due to the tasks conducted by the user (for example, divided attention tasks). This work could be significant to AR in providing a strategy for handling events that occur in the virtual world when mixed with the real world. Conceptually, one could envision an AR user marking an item of interest and having the AR system report back if the item's situation had changed, thereby possibly mitigating divided attention related issues (a minimal sketch of such a watcher appears at the end of this section).

Context is another area where a better understanding of the impact of mixing environments is needed to create viable AR implementations. Because AR uses the real world, which is naturally multimodal, it is not yet clear what information needs to be captured prior to and during an AR experience to understand the human activity that occurs during the experience. A wide range of environmental data and externally originated sensory stimuli could be relevant to creating an appropriate and dynamic AR experience.

In conclusion, AR systems-oriented research and development progress in the United States has been principally technological. Formal evaluations of this technology are not yet evident in the training-related areas. Future work currently sponsored by the European Commission will create new VR and AR systems along with formal evaluations for various purposes.
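The following sketch illustrates the marked-item strategy just described: the user marks items of interest, and the system reports back only when a marked item's state changes, rather than demanding continuously divided attention. All identifiers and states here are hypothetical.

class ItemWatcher:
    # Tracks user-marked items and reports only on change, a simple
    # strategy for mitigating divided attention in mixed environments.
    def __init__(self):
        self.watched = {}  # item_id -> last known state

    def mark(self, item_id, state):
        self.watched[item_id] = state

    def update(self, item_id, new_state):
        # Returns a report only if a watched item has changed state.
        if item_id in self.watched and self.watched[item_id] != new_state:
            self.watched[item_id] = new_state
            return f"Item {item_id} changed: now {new_state}"
        return None

watcher = ItemWatcher()
watcher.mark("doorway_3", "clear")
print(watcher.update("doorway_3", "blocked"))  # reports the change
print(watcher.update("doorway_3", "blocked"))  # None; nothing new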
AR UTILITY FOR TRAINING

AR seems ideally suited to support training in navigation, manipulation of items, and decision making. Experimentation has indicated benefits for using AR in training for certain applications. Early work demonstrated its usefulness in manipulation and spatial experiments (Goldiez, 2004). AR's role in supporting decision making requires a longer-term view, with enhancements needed in technology before human performance benefits can be realized (Franklin, 2006). AR has been shown to enhance human performance in navigation, and near-term training benefits appear promising. In live (or live/virtual) exercises, AR could serve as an on-board instructor, guiding the trainee should he or she become lost or venture outside the desired training area. This capability could greatly simplify the tracking problems in AR by allowing the use of GPS or RFIDs for gross tracking and a more precise tracking mechanism at critical locations. Thus, for training, the aforementioned tracking problem can be controlled by appropriate scenario design coupled with the use of AR as a surrogate instructor.

AR offers the opportunity to improve various training subsystems. Visual simulation immediately comes to mind because of the potential for video or optical see-through devices to add content to (or subtract it from) a scene. AR, though, can also augment the instructor by providing in situ tutoring (such as hints when the trainee is lost while learning to navigate) and individualized after action review of trainee activity in live and/or virtual exercises. Mobile AR also offers the potential for personalized training by providing information in a form most suitable for the user's needs.

At a conceptual level, AR can also be envisioned as a technology that will facilitate better methods in team training. Because of its ability to provide additional information display, as well as information storage and persistence, AR can help mitigate team situational awareness issues by providing pointers and nonverbal communication that direct the team's attention. It is logical to envision this sharing of information and enhanced situational awareness being used as a tool for training. Dr. Walter Van de Velde, Program Officer for the European Commission's Future and Emerging Technology Initiative, noted the following in a brochure e-mailed to one of the authors on August 11, 2006:

Current virtual and augmented reality environments try to provide the best display realism, taking for granted that this automatically leads to the best user-experience. Practice shows that this is not true: users do not easily feel fully engaged in hightech VR worlds. On the other hand they can feel extremely present in simpler environments, like when chatting on line or when reading a book. A better understanding of this [presence] will give rise to new immersive interface technologies that exploit human perceptual, behavioral, cognitive and social specificities for stimulating a believable and engaging user-experience of presence, in spite of using artificial stimuli. (Van de Velde, 2006)
Investigations into measuring and controlling presence are potentially critical for training using AR because users will be interacting with real and virtual items and could need to distinguish between the two. Properly structured research in this area would thus yield valuable insights into strategies for handling interruptions.

After action review systems for live and virtual training have been prototyped; however, AR adds new complexities. An appropriate after action review for AR should include the following: capturing relevant contextual information in the real world, identifying interruptions, and handling or correlating varying spatial positions and poses of the trainee with his or her real and virtual positions. Moreover, AR has huge potential for improving training by integrating new and existing skills. In some cases, this might be done by providing AR training systems that have unique capabilities for testing and evaluating trainees. From another perspective, more research into AR interface issues will likely help answer some key questions, as well as help foster better training solutions and applications. Some additional aspects of the utility of AR for training could include enhanced assessment and diagnostic capabilities in the real time portion of the system, allowing trainees to review actions and decisions from different perspectives. Potentially, such AR systems could visually compare the trainee's paths, actions, decisions, and so forth to those of experienced experts, such that trainees could see (and the instructor could discuss) differences between the novice's and the expert's actions.

Several aspects of human-centered design should be studied with respect to making AR better suited to supporting training in various vocations. These include personalizing the training software for certain classes of individuals and the human factors considerations for hardware noted above. The work of Liarokapis (2006c) using mobile technology adapted for VR and AR shows great promise for training, offering virtual scenes at modest prices with good operating performance. Coupling location awareness (through techniques such as RFIDs) with a digital compass provides reasonable information on user location. Rendering time and data transfer rates are currently insufficient for real time operation, but advances are being made by the cellular community. These types of devices represent a viable future delivery mechanism.
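A minimal sketch of that coarse localization scheme follows: the most recently read RFID tag supplies an approximate position (the surveyed location of the tag), and a digital compass supplies heading. The tag identifiers, locations, and readings are hypothetical placeholders of our own.

from dataclasses import dataclass

@dataclass
class CoarsePose:
    x: float        # meters, in site coordinates
    y: float
    heading: float  # degrees clockwise from north

# Surveyed tag locations (hypothetical values).
TAG_LOCATIONS = {
    "tag_17": (42.0, 108.5),
    "tag_18": (61.0, 110.0),
}

def coarse_pose(last_tag_id, compass_heading_deg):
    # Assume the user is near the last tag read; heading comes from the
    # compass. Precision is bounded by the tag's read range.
    x, y = TAG_LOCATIONS[last_tag_id]
    return CoarsePose(x, y, compass_heading_deg % 360.0)

print(coarse_pose("tag_17", 275.0))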
CONCLUSIONS

AR is an exciting technological development offering the opportunity to overcome many of the limitations of individualized virtual environment systems. These include performance limitations, such as self-motion, and programmatic limitations, such as high costs and relatively large facility requirements. AR has its own set of issues, as noted in this chapter, that are being addressed by research teams across the globe. Most AR activity has focused on computer graphics fused to the real world to create an immersive environment. While fully immersive systems are beneficial, there are more immediate and near-term opportunities for less immersive AR systems. A principal benefit of using AR is its apparent ease of deployment. Such deployable systems employing wearable
computers provide increased flexibility for AR's use when and where needed. Moreover, coupling the broader view of AR with its classification into three categories and two usage areas encourages experimentation and development along more focused lines of research.

ACKNOWLEDGMENTS

Part of the work presented herein was supported by the U.S. Army Research Institute for Behavioral Sciences. Also, part of the work presented has been conducted within the LOCUS project. The views expressed herein, though, are those of the authors and do not reflect an official position of a government agency.

REFERENCES

Azuma, R. T. (1997). A survey of augmented reality. Presence: Teleoperators & Virtual Environments, 6(4), 355–385.

Behringer, R., Park, J., & Sundareswaran, V. (2002). Model-based visual tracking for outdoor augmented reality applications. International Symposium on Mixed and Augmented Reality, 01, 277–322.

Bennet, E., & Stevens, B. (2004). The effect that haptically perceiving a projection augmented model has on the perception of size. Third IEEE and ACM International Symposium on Mixed and Augmented Reality, 03, 294–295.

Billinghurst, M., Belcher, D., Gupta, A., & Kiyokawa, K. (2003). Communication behaviors in collocated collaborative AR interfaces. International Journal of Human-Computer Interaction, 16(3), 395–423.

Butz, A. (2004). Between location awareness and aware locations: Where to put intelligence. Applied Artificial Intelligence, 18(6), 501–512.

Cavazza, M., Martin, O., Charles, F., Marichal, X., & Mead, S. J. (2003). User interaction in mixed reality interactive storytelling. The Second IEEE and ACM International Symposium on Mixed and Augmented Reality, IEEE Computer Society, 304–305.

Drugge, M., Nilsson, M., Liljedahl, U., Synnes, K., & Parnes, P. (2004). Methods for interrupting a wearable computer user. Proceedings of the Eighth International Symposium on Wearable Computers (pp. 150–157). Washington, DC: IEEE Computer Society.

Ehnes, J., Hirota, K., & Hirose, M. (2005). Projected augmentation-augmented reality using rotatable video projectors. Third IEEE and ACM International Symposium on Mixed and Augmented Reality, 03, 26–35.

Espenant, M. (2006). Applying simulation to study human performance impacts of evolutionary and revolutionary changes to armoured vehicle design. In Virtual Media for Military Applications (RTO Meeting Proceedings No. RTO-MP-HFM-136, pp. 17-1–17-2). Neuilly-sur-Seine, France: Research and Technology Organisation.

Fischer, J., Bartz, D., & Straßer, W. (2005). Stylized augmented reality for improved immersion. IEEE Virtual Reality, 01, 195–202.

Franklin, M. (2006). The lessons learned in the application of augmented reality. Virtual Media for Military Applications (RTO Meeting Proceedings No. RTO-MP-HFM-136, pp. 30-1–30-8). Neuilly-sur-Seine, France: Research and Technology Organisation.
Gandy, M., MacIntyre, B., Presti, P., Dow, S., Bolter, J., Yarbrough, B., & O'Rear, N. (2005). AR karaoke: Acting in your favorite scenes. Fourth IEEE and ACM International Symposium on Mixed and Augmented Reality, 04, 114–117.

Goldiez, B. F. (2004). Techniques for assessing and improving performance in navigation and wayfinding using mobile augmented reality. Dissertation Abstracts International, 66(02), 1206B. (UMI No. 3163584)

Goldiez, B. F., Sottilare, J., Yen, C., & Whitmire, J. (2006, November). The current state of augmented reality and a research agenda for training (Tech. Rep., Contract No. W74V8H-06-C-0009). Orlando, FL: U.S. Army Research Institute for Behavioral Sciences.

Grasset, R., Lamb, P., & Billinghurst, M. (2005). Evaluation of mixed-space collaboration. Fourth IEEE and ACM International Symposium on Mixed and Augmented Reality, 04, 90–99.

Henrysson, A., Billinghurst, M., & Ollila, M. (2005). Face to face collaborative AR on mobile phones. Fourth IEEE and ACM International Symposium on Mixed and Augmented Reality, 04, 80–89.

Kato, H., Billinghurst, M., Poupyrev, I., Imamoto, K., & Tachibana, K. (2000). Virtual object manipulation on a table-top AR environment. Proceedings of the International Symposium on Augmented Reality (pp. 111–119). Washington, DC: IEEE Computer Society.

Lee, W., & Park, J. (2005). Augmented foam: Tangible augmented reality for product design. Fourth IEEE and ACM International Symposium on Mixed and Augmented Reality, 04, 106–109.

Liarokapis, F. (2006a). An augmented reality interface for visualizing and interacting with virtual content. Virtual Reality, 11(1), 23–43.

Liarokapis, F. (2006b). An exploration from virtual to augmented reality gaming. Simulation & Gaming, 37(4), 507–533.

Liarokapis, F. (2006c). Location based mixed reality for mobile information services. Advanced Imaging: Solutions for the Electronic Imaging Professional, 21, 22–25.

Livingston, M. A., Brown, D. G., Julier, S. J., & Schmidt, G. S. (2006). Mobile augmented reality: Applications and human factors evaluations. Advanced Information Technology Code 5580. Washington, DC: Naval Research Laboratory.

Milgram, P. (2006). Some human factors considerations for designing mixed reality interfaces. Virtual Media for Military Applications (RTO Meeting Proceedings No. RTO-MP-HFM-136, pp. KN1-1–KN1-14). Neuilly-sur-Seine, France: Research and Technology Organisation.

Mohring, M., Lessig, C., & Bimber, O. (2005). Video see-through AR on consumer cellphones. Third IEEE and ACM International Symposium on Mixed and Augmented Reality, 3, 252–253.

Naimark, L., & Foxlin, E. (2005). Encoded LED system for optical trackers. Fourth IEEE and ACM International Symposium on Mixed and Augmented Reality, 4, 150–153.

Neumann, U., & You, S. (1999). Natural feature tracking for augmented reality. IEEE Transactions on Multimedia, 1, 53–64.

Regenbrecht, H., Ott, C., Wagner, M., Lum, T., Kohler, P., Wilke, W., & Mueller, E. (2003). An augmented virtuality approach to 3D videoconferencing. The Second IEEE and ACM International Symposium on Mixed and Augmented Reality, 02, 290–291.

Scheuering, M., Rezk-Salama, C., Barfuß, H., Schneider, A., & Greiner, G. (2002). Augmented reality based on fast deformable 2D-3D registration for image guided surgery.
In S. K. Mun (Ed.), Medical Imaging 2002: Visualization, Image-Guided Procedures, and Display (pp. 436–445). Bellingham, WA: International Society for Optical Engineering.

Tappert, C. C., Ruocco, A. S., Langdorf, K. A., Mabry, F. J., Heineman, T. A., Brick, D. M., et al. (2001). Military applications of wearable computers and augmented reality. In W. Barfield & C. Thomas (Eds.), Fundamentals of wearable computers and augmented reality (pp. 625–647). Mahwah, NJ: Lawrence Erlbaum.

Tenmoku, R., Kanbara, M., & Yokoya, N. (2003). A wearable augmented reality system for navigation using positioning infrastructures and a pedometer. The Second IEEE and ACM International Symposium on Mixed and Augmented Reality, 2, 344–345.

Van de Velde, W. (2006). Presence and interaction in mixed-reality environments (FET Proactive Initiative). Unpublished manuscript.

Viirre, E., Pryor, H., Nagata, S., & Furness, T. A. (1998). The virtual retinal display: A new technology for virtual reality and augmented vision in medicine. In D. Stredney & S. J. Weghorst (Eds.), Proceedings of Medicine Meets Virtual Reality (pp. 252–257). Amsterdam: IOS Press and Ohmsha.

Vogelmeier, L., Neujahr, H., & Sandl, P. (2006). Interaction methods for virtual reality applications. In Virtual Media for Military Applications (RTO Meeting Proceedings No. RTO-MP-HFM-136, pp. 14-4–14-8). Neuilly-sur-Seine, France: Research and Technology Organisation.

Vogt, S., Khamene, A., Sauer, F., Keil, A., & Niemann, H. (2003). A high performance AR system for medical applications. The Second IEEE and ACM International Symposium on Mixed and Augmented Reality. Los Alamitos, CA: IEEE Computer Society.
Chapter 27
VIRTUAL ENVIRONMENT HELICOPTER TRAINING

Joseph Sullivan, Rudolph Darken, and William Becker

Over the last decade, the virtual environment (VE) community became interested in the use of VEs for training a variety of tasks, particularly spatial tasks. Because VEs, unlike conventional interactive computing environments, are inherently spatial, it is reasonable to assume that spatial tasks might be performed better in VEs and possibly trained better in VEs. We began a line of research focused on spatial navigation in VEs, initially confining ourselves to terrestrial navigation in both urban and natural terrains (Banker, 1997; Darken & Banker, 1998; Goerger et al., 1998; Jones, 1999). We were able to show how a VE could be used to develop spatial knowledge of a real environment via exposure to a VE simulation of that place. While much progress was made, a common criticism in the training domain (as opposed to mission rehearsal) was that it was cheaper to train land navigation by practicing map and compass skills in a physical environment rather than in a VE. The same could not be said, however, when we began to look at helicopter navigation, where every hour in the air is extremely expensive and consequently limited. What we did not know was how much of what we had learned about navigation on the ground would translate to the air. In addition to advancing our basic understanding of human spatial cognition, we needed to significantly improve how aviators are trained to navigate from the air.

We started by contrasting the tasks of helicopter navigation and land navigation. In the military setting, navigation is rarely a primary goal; it is a necessary component of a larger mission. Land navigation and helicopter navigation both rely on terrain association skills: the ability to match a two-dimensional map representation of a terrain feature with a feature within the field of view. The primary differences are altitude, speed, maneuvering limitations, and available field of view. At typical helicopter speeds and altitudes, fewer terrain features will be in view for a shorter amount of time. In the time it takes a novice to look down to reference a map, an entirely new set of terrain features may come into view. It is rarely feasible for a helicopter crew to stop to regain orientation. This makes error recovery a notoriously difficult task. Navigation is inherently a crew task. All crew members share the task of avoiding terrain and obstacles
and relaying information on navigation cues. Ironically, the aircraft proves to be a difficult platform for training. The instructor is responsible for basic aircraft control and obstacle avoidance, which affords few opportunities to observe a trainee's procedures and provide guidance. Given these unique challenges, VEs are an appealing training solution. They are inherently spatial, and many of the important characteristics of the real world can be faithfully re-created while many of the real world limitations can be removed.

There were several practical issues involved in applying VE technology to helicopter training. As is the case in many application domains, cost was an issue. The form of the solution was also key. Because most of our work was for naval aviators (U.S. Navy and U.S. Marine Corps), any training device had to be small and rugged enough to fit aboard ship. We were also concerned about usage modes. The VE training literature is rife with examples of unsuccessful automated intelligent tutoring systems (ITS) for complex skills. For a good discussion of general issues of ITS, see Psotka, Massey, and Mutter (1988). We did not want to rely on ITS technology, but requiring that an instructor be physically present at all times was not an attractive alternative either. Thus, knowing what modes of usage were appropriate and how they affect training was a necessary component of this line of research.

This introduction frames our research program, which has spanned over 10 years of applying emerging VE technology to helicopter training and evaluating the results. We begin with our earliest efforts to apply VE simulation to this unique problem domain and take the story to the present, where the characteristics of the problem remain unchanged but the form of the solution has changed dramatically. See Figure 27.1.
MAP INTERPRETATION AND TERRAIN ASSOCIATION VIRTUAL ENVIRONMENT

Helicopter pilots are currently trained to navigate in a number of ways. The navy has a course of instruction called the map interpretation and terrain association course, or MITAC, that specifically teaches these skills. Conventional classroom instruction is used to teach the basic concepts, such as the use of displays, map coordination, dead reckoning, and compass use. Noninteractive video is sometimes used to practice the task: the view from a flight is shown, and the trainee must follow along on a paper map. However, navigation is an inherently interactive task. A video that does not respond to the actions of the trainee is of little use in learning the cause and effect relationship of movement to spatial orientation. To a large degree, the shortcomings of this video training were a key motivator for the use of simulation.

Instruction then moves to the aircraft, where the trainee must perform a complex navigation task in the cockpit under the pressure of all the other things that a pilot has to be aware of in flight. Scaffolding techniques in the aircraft are difficult if not impossible to achieve. As noted by Ward, Williams, and Hancock (2006, p. 252), "intuition and emulation" tend to guide this process more than
Figure 27.1. A Road Map of Navigation Research for the Office of Naval Research
Virtual Environment Helicopter Training
293
evidence-based practices. There are few options for the instructor between classroom materials and the real task in which to train. The result is that, upon graduation from flight school, many pilots are far from expert navigators. Can simulation be used to remedy this?

We first prototyped a simple navigation training device that would run on standard hardware and that would be small enough and inexpensive enough to function in the training space of a typical squadron. We developed a three-screen display format on a Silicon Graphics desktop graphics computer that used a simple joystick for control (Sullivan, Darken, & McLean, 1998; Sullivan, 1998). We called the system the map interpretation and terrain association virtual environment system, or MITAVES, as these were exactly the skills our simulator was intended to address. See Figure 27.2. The screen resolution was poor on the initial prototype, and we needed to move to more common personal computer (PC) hardware, so we reimplemented the system two years later using an Intergraph PC with Wildcat graphics cards (McLean, 1999). With much improved resolution and an improved interface, the results were encouraging. See Figure 27.3. Most importantly, pilot navigation became active as opposed to the passive video training that was used previously. The wide field of view display was a critical element for the helicopter domain because a typical flight profile for a helicopter is slow enough that a feature, such as a hilltop or ridgeline, will remain in view long enough to be useful as a navigation aid. Had we opted for a single-screen narrow field of view display, we would have been training pilots to focus only on features directly in front of them rather than looking side to side, which is not a good practice.
Figure 27.2. Initial Prototype of the VE Navigation Training Device, MITAVES
Figure 27.3. The Second Version of MITAVES
Another element of the MITAVES design was the joystick control, which purposely did not simulate flight dynamics. The trainer is for navigation skills, and in real flight the navigating pilot will not be at the controls, so we simplified the control mechanism. The joystick was used to "point" the aircraft in a direction, and it would follow the terrain until the direction or altitude was changed.

The last part of the design central to this discussion is the use of maps, especially "you-are-here" maps, in MITAVES. Being disoriented is a part of navigational training, but being hopelessly lost, which frequently occurs in video training, can be damaging. Providing some sort of you-are-here map was therefore appropriate, but making it available while navigating would prove counterproductive: pilots stop looking at the "out the window" view and stare at the moving map display, which is completely artificial. The solution was to provide a you-are-here map, but in a mode that stops active navigation. When the pilot requests the map, flight is stopped while the map is displayed and the pilot reorients on the paper map. The map is then dismissed and flight continues. We can also count map views as a measure of performance in addition to actual navigation performance. See Figure 27.4.
Figure 27.4. The MITAVES Map View
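The map-on-request mechanic lends itself to a precise statement. The following sketch is purely illustrative, with hypothetical names, and is not the MITAVES implementation; it captures only the two design decisions described above: requesting the map halts active flight, and each request is logged as a performance measure.

```python
class YouAreHereMap:
    """Illustrative sketch of a MITAVES-style map-on-request mode.

    Requesting the map pauses active navigation; the number of
    requests is recorded alongside navigation performance.
    """

    def __init__(self):
        self.flying = True
        self.map_view_count = 0  # fewer requests suggests better orientation

    def request_map(self, position):
        """Pause flight and show the you-are-here map at the current position."""
        self.flying = False
        self.map_view_count += 1
        return {"show_map": True, "marker": position}

    def dismiss_map(self):
        """Hide the map and resume active flight."""
        self.flying = True
        return {"show_map": False}
```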
We tested both versions of MITAVES at Helicopter Antisubmarine Squadron Ten (HS-10) at Naval Air Station North Island in San Diego, California. We discovered several modes of use for MITAVES. The most obvious mode is as a practice tool for asynchronous use by trainees: after completing classroom material, student pilots can use MITAVES on their own to further develop their navigation skills. Instructors wanted to use MITAVES in the classroom once they realized how much more effective it is than the MITAC videotapes in current use. Experienced pilots can use the system for refresher training; since navigation is a perishable skill, pilots can use MITAVES to keep their navigation skills sharp when they are in nonflying status.

Using a subjective evaluation method, HS-10 instructors evaluated student pilots on a standard navigation flight normally scheduled as part of their syllabus. They did not know which students had received MITAVES training. Trainees were evaluated on overall flight performance, ability to recover from errors, and ability to correctly identify features. The number of pilots in the initial studies was low, but the data suggest that pilots who received VE training were better prepared for the navigation flight than those who did not.

These early studies revealed that time and distance estimation is a key element of successful air navigation and that the cues provided might be inadequate. The ability to perceive relative motion based on surface detail is an important cue that expert pilots use effectively. VEs often have such poor resolution on the ground that it is difficult if not impossible for the pilot to determine airspeed and distance traveled from anything other than cockpit displays. In the actual aircraft, by contrast, the pilot develops the ability to estimate relative distance based on the optical flow of surface detail on the ground; that surface detail is often nonexistent in a VE. We conducted a separate study to determine how much detail was needed to sustain reasonable performance on this element of helicopter navigation. Using the amount of surface detail as the independent variable, we were able to show that just a 1 percent density of ground detail allows a helicopter pilot to maintain a reasonable hover, as compared to when only a flat texture is provided (Peitso, 2002). See Figure 27.5.

Although the low cost PC based system showed promise as an effective trainer for helicopter pilots, results were limited exclusively to natural terrains. It could be assumed that the system would be equally effective in urban terrains, but rendering a full fidelity urban terrain in the VE to facilitate positive training remained a challenge. Using the same implementation as the second iteration of MITAVES, we developed a model of the northern Virginia area around Tysons Corner with the assistance of Marine Helicopter Squadron One (HMX-1, the Presidential Helicopter Squadron). See Figure 27.6. Using only experienced Marine Corps helicopter pilots (because Marine Corps pilots often fly in urban environments), we studied the effectiveness of this new approach to rapidly building urban terrains for training and mission rehearsal. Two groups of pilots received paper maps and charts with which to prepare for an evaluation flight in the simulation, in which they would be asked to identify key buildings and features. One group was also able to practice the flight path using the VE, while the control group was not. We then used a high resolution video produced for us by HMX-1 as the transfer task. Pilots who received the VE training were significantly better at identifying features and checkpoints during flight than the control group.
Figure 27.5. Surface Detail Used to Hold a Hover
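The 1 percent figure from the surface-detail study above translates directly into scene-generation terms: the combined footprint of the detail objects should cover roughly 1 percent of the terrain area. The sketch below, with hypothetical names and parameters, shows one simple way to compute and place that detail; the study's actual generation method is not described in this chapter.

```python
import random

def scatter_ground_detail(terrain_w, terrain_h, item_area, density=0.01, seed=0):
    """Place detail items whose combined footprint covers `density` of the terrain.

    With density=0.01 (the 1 percent level found sufficient for hover),
    the item count is the covered area divided by the per-item footprint.
    """
    rng = random.Random(seed)
    n_items = int(terrain_w * terrain_h * density / item_area)
    return [(rng.uniform(0, terrain_w), rng.uniform(0, terrain_h))
            for _ in range(n_items)]

# Example: a 1 km x 1 km patch with 1 m^2 detail items at 1 percent density
points = scatter_ground_detail(1000.0, 1000.0, item_area=1.0)
print(len(points))  # 10,000 items
```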
Thus, the nature of the task can be used to identify where to concentrate the effort in building virtual databases of real places. Since Marine Corps pilots were most interested in essential features, such as roads and intersections, rivers, bridges, and the most salient buildings, nonessential features, such as houses, cemeteries, and shopping malls, were not included. Using this approach, we can quickly produce large-scale virtual urban terrains that can be effectively used for mission rehearsal (Wright, 2000).

Figure 27.6. Virtual and Real Tysons Corner, Virginia

Although the VE training device built for helicopter pilots operated on low cost hardware, had a relatively small footprint, showed positive effects on navigation performance in the air, and had terrain databases built for it, significant shortcomings remained. For example, within MITAVES, the pilot can stop the simulation at any time to take a break, to think about an error, or for any other reason. In the air, stopping is never an option except in an emergency; successful navigation has to be performed under conditions of extreme stress. In addition, even though MITAVES ran on an off-the-shelf PC using standard displays, a typical deployed squadron would be hard-pressed to find room for it on board ship. Could it be made any smaller?

THE CHROMAKEY AUGMENTED VIRTUAL ENVIRONMENT
We tried to address these issues, as well as several others, using a new approach to VE simulation for aviation. We called it the Chromakey Augmented Virtual Environment, or ChrAVE (Darken, Sullivan, & Lennerton, 2003; Lennerton, 2003). The approach was to use chromakey technology to mix the real environment of the actual cockpit with the virtual environment, computed separately and delivered to the pilot's eyes via a head-mounted display. The glass of the cockpit canopy is covered in blue material. The head-mounted display (HMD) has a camera and a 6 degrees of freedom tracking device mounted on it. As the pilot looks around, the video feed from the camera is passed to a video mixing unit. The position and orientation of the head is passed to the simulation software, which renders the appropriate image for that frame. The mixing unit replaces anything the camera sees as blue with the virtual environment, resulting in an augmented, composite image in which the cockpit interior and the pilot's body are seen, but the glass is replaced with the VE. See Figure 27.7.

Figure 27.7. The First Version of the Chromakey Augmented Virtual Environment

In our laboratory, we built a helicopter cockpit mock-up with no glass canopy, so the blue-screen material was placed in frames in front of and to the side of the apparatus. We used illumination to create an even distribution of light across the surface of the fabric. During training in the mock-up, the seated pilot wears the HMD and uses a paper map, the cockpit displays, and the VE to navigate the course. Flight commands are given using the same verbal protocol used on an actual flight. In practice, the system would be composed of a shock-mounted rack for the PC and the video mixer, with cables going to the HMD. Further reductions in size may be possible, but, most importantly, the simulation platform becomes the helicopter itself and does not require a simulation device that would need space on board ship. See Figure 27.8.

Figure 27.8. The Camera (Left), VE (Center), and HMD (Right) Views

The design of the HMD apparatus was nontrivial. We needed a robust mounting of the camera to the display so it would not loosen or break easily. We considered using a small form factor camera mounted inside the housing of the HMD, but decided against that because it would be too fragile and we would lose the ability to alter the focal length of the camera. We decided to mount the camera on top of the HMD with the spatial tracker mounted above it. While this gave us a rugged apparatus, it effectively moved the placement of the pilot's eyes to a different location. This altered perception considerably and therefore caused us to investigate whether it had adverse effects on overall performance. See Figure 27.9.

Figure 27.9. The ChrAVE Head-Mounted Display Showing Visual Offset

We used a ball-tossing exercise as a measure of hand-to-eye coordination. We measured the participants' ability to catch a small ball tossed to them from a few feet away and compared their performance unhooded to hooded with the HMD initially, hooded with the HMD post-exposure, and finally unhooded again post-exposure. As expected, eye displacement minimally impairs hand-to-eye coordination, and performance returns to baseline levels very quickly after exposure.

The main study measured the performance of 15 experienced pilots as they prepared for and then executed an approximately hour-long flight over virtual Southern California. Several subjective evaluations were used to compare performance in the simulator to actual performance. Visual scan patterns in the ChrAVE were similar to actual cockpit scan patterns, as was the response to added stress in the task. For example, a secondary task required pilots to listen to simulated radio calls for an individually assigned call sign and report whether it had been called. The simulated environment became so difficult at times that participants lost track of the radio calls altogether, and performance on this secondary task dropped significantly. This indicated that task complexity was mirroring reality and that performance was becoming comparable as well.
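The per-frame ChrAVE pipeline described above (read the head pose, render the VE for that viewpoint, and key the camera's blue pixels out in favor of the rendered image) reduces to a small compositing step. The numpy sketch below illustrates only the keying logic; the threshold, the tracker, and the renderer are hypothetical stand-ins, and the actual hardware mixer keys on chroma rather than a simple channel comparison.

```python
import numpy as np

def chromakey_composite(camera_rgb, ve_rgb, blue_margin=40):
    """Replace blue-screen pixels in the camera frame with the VE frame.

    camera_rgb, ve_rgb: uint8 arrays of shape (H, W, 3).
    A pixel is keyed out when its blue channel exceeds both red and
    green by more than `blue_margin` (a crude stand-in for a chroma key).
    """
    cam = camera_rgb.astype(np.int16)  # avoid uint8 wraparound in subtraction
    r, g, b = cam[..., 0], cam[..., 1], cam[..., 2]
    is_blue = (b - np.maximum(r, g)) > blue_margin
    out = camera_rgb.copy()
    out[is_blue] = ve_rgb[is_blue]  # cockpit and body stay; "glass" becomes VE
    return out

# Per-frame loop, with hypothetical tracker/renderer stand-ins:
# pose = tracker.read()                  # 6 degrees of freedom head pose
# ve_frame = renderer.render(pose)       # VE image for that viewpoint
# display.show(chromakey_composite(camera.read(), ve_frame))
```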
One of the things we gave up when we moved to the HMD in the ChrAVE was wide field of view. While wide field of view is critical for helicopter pilot navigation, a narrow field of view in an HMD is less critical because (a) the field of view can be artificially extended by back and forth head movement and (b) this is exactly how night vision goggle (NVG) usage works. In the military context, a large percentage of flights are made under NVG conditions, so having NVG simulation capabilities is critical to the adoption of the system. In the next phase we focused on how to address NVG flight using the ChrAVE (Beilstein, 2003). We replicated the initial ChrAVE study using the same apparatus, but instead of a daylight simulation for the VE, we replaced it with an NVG simulator. The NVG simulation image generator used had two modes: physically based and nonphysically based. When using the physically based mode, the material properties of the environment are used to calculate the illumination of each pixel in the display. When not physically based, everything is approximated using simple heuristics to create a believable display, but the display is not capable of changing based on the type of materials or the moon state. Using the same criteria as the initial study, we determined that performance in the NVG simulator was a close approximation to real flight and that, again, the simulated stress of the primary task was valid. Interestingly, even the most experienced subject pilots were not affected by the physics of the NVG simulation; they performed equally well with or without physical realism. Given the heavy pre-simulation and run-time costs of computing NVG imagery to be physically authentic, questions remain as to which sorts of tasks require physical reality and which do not. Given our results, we believe the default should be to approximate NVG imagery.

In addition to knowing how the ChrAVE would perform, before taking it to the fleet it was necessary to determine whether the fleet saw it as a plausible way to train and rehearse missions if adopted for large-scale usage. At this time, the Office of Naval Research program adopted the ChrAVE as the third part of its three-platform research, development, and concept demonstration for the U.S. Navy and the U.S. Marine Corps. The ChrAVE became VEHELO (Virtual Environment Helicopter) and was soon to perform in a networked simulated training environment along with VELCAC (Virtual Environment Landing Craft, Air Cushion) and VEAAAV (Virtual Environment Advanced Amphibious Assault Vehicle) in the Virtual Technologies and Environments (VIRTE) program.

Transitioning from laboratory to field experimentation involved changes in the physical equipment, the software, and the experimental design.
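The contrast between the two NVG image generator modes described above can be made concrete with a toy luminance model. This sketch is illustrative only; the actual generator's models and constants are not described in the chapter, and every name and number here is invented.

```python
import numpy as np

def nvg_heuristic(scene_luma, gain=4.0, noise_std=0.02, seed=0):
    """Nonphysically based mode: boost scene luminance and add noise.

    Material properties and moon state are ignored; the output only
    needs to look believable (displayed as monochrome green).
    """
    rng = np.random.default_rng(seed)
    noisy = scene_luma * gain + rng.normal(0.0, noise_std, scene_luma.shape)
    return np.clip(noisy, 0.0, 1.0)

def nvg_physical(reflectance, moon_illuminance, sensor_response, gain=4.0):
    """Physically based mode: derive each pixel from material reflectance
    and moon illumination, so the image changes with materials and moon state."""
    radiance = reflectance * moon_illuminance  # light returned by each surface
    return np.clip(sensor_response * radiance * gain, 0.0, 1.0)
```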
Changes to the physical equipment for the first two experiments involved mounting the PC and electronic equipment in a portable rack-mount container and simplifying the mock cockpit to make it more portable. The portable mock cockpit included only the seat, the flight controls, and a liquid crystal display (LCD), mounted in front of the user, representing the instrument panel. It did not include the cockpit frame structure. In the laboratory setting, the pilots had a restricted field of view based on the physical cockpit; in the first two field experiments, pilots had a less restricted field of view. The physical setup for the first two studies is shown in Figure 27.10.

Figure 27.10. VEHELO Portable Configuration

In designing the first transfer study we attempted to leverage and extend previous work on helicopter overland navigation. The goal changed from similar performance and behavior to improved performance. The key questions centered on the ideal treatments for the experimental and control groups, as well as on performance measurement. Boldovici, Bessmer, and Bolton (2002) provide an excellent summary of the typical difficulties encountered in conducting and drawing meaningful conclusions from such field studies. To eliminate the potential bias of comparing a group that received training to a control group that received no additional training, an alternative treatment was devised consisting of a detailed review of the techniques of overland navigation applied to the specific navigation route that students would fly. The experimental group flew the route using VEHELO, while the control group reviewed the route using "best practices." The experimental and control groups received equal time in their respective training treatments. This training followed normal classroom training leading up to syllabus flight events, which were used as the real world transfer task.

It is difficult to determine reliable performance measurements for overland navigation. Experience from previous studies highlighted the fact that simple navigation plots are not always indicative of a trainee's ability. A navigator who is accidentally on the route would be incorrectly rated as better than a pilot who is aware of his or her position and surroundings but intentionally deviates from the route. Similarly, navigation plots from training sessions are not necessarily a reliable indicator of the potential value of a training session. A trainee who is consciously trying to practice a variety of navigation techniques may appear less proficient than a pilot who is inappropriately relying on a single method, such as dead reckoning. This is closely related to the concept of deliberate practice outlined by Ericsson, Krampe, and Tesch-Romer (1993). As noted by Ericsson et al., it is difficult to evaluate and measure via observation when deliberate practice occurs.

To augment the navigation data, subjects completed questionnaires that included self-efficacy and workload ratings. Subjects also completed static terrain association tests designed to measure a subject's ability to match a plotted position on a map with the corresponding out-the-window view. We also extended the normal grading criteria associated with the syllabus flight events: instructor pilots who flew with test subjects completed a more detailed grade card with more specific assessment of navigation performance and workload. Members of the experimental design team also flew as aircrew on syllabus flights to provide subjective assessment.

While the small sample size made it difficult to demonstrate statistically significant differences between the control and experimental groups, several conclusions were clear. Pilots who flew the route virtually were subjectively assessed by their instructors as having better overall navigation performance in terms of ability to locate position, identify key navigation features, and manage the cockpit workload. Pilots who flew in VEHELO were also better at maintaining track.

Based on this initial field study, two additional studies involving several major changes were conducted at the Marine Corps' H-46 Fleet Replacement Squadron (FRS)—HMM(T)-164 in Camp Pendleton, California (Kulakowski, 2004; Hahn, 2005). The experimental design was changed significantly. In the initial experiment we wanted to avoid any confounds associated with individual instruction; thus, the protocol limited the feedback that the individual running VEHELO could provide to warnings at prescribed distances from the intended track. Given the positive indications of potential training value from the first study, we removed this constraint for the second study. In the second study, the evaluator [a former FRS instructor at HMM(T)-164] was allowed to provide whatever instruction he deemed appropriate.

The task was extended from terrain association and navigation to explicitly include crew resource management. In practice, navigation is a collective responsibility in which every crew member has a role. In the H-46, the aerial observer and the crew chief provide information on salient visual cues within their fields of view, and the nonflying pilot is responsible for coordinating the overall effort. To give trainees experience coordinating these efforts, the evaluator filled the role of the other crewmen by simulating typical intercockpit communications system (ICS) calls for the various crew positions. To increase the sample size and minimize the impact on the syllabus, the terrain association test and student questionnaires were omitted.

In the second study, the difference in ability to maintain track was significantly better for subjects who trained using VEHELO. Additionally, their instructor pilots rated them better at navigation performance, as well as at crew resource management. While these results were promising, there were still issues with the VEHELO configuration: the equipment was cumbersome, difficult to set up and adjust, and did not provide the immersive environment originally intended.
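The track-maintenance results above imply a measurable deviation from the planned route. The chapter does not define the metric, but a standard choice is mean cross-track distance from the intended leg, sketched here with hypothetical names:

```python
import math

def cross_track_distance(p, leg_start, leg_end):
    """Perpendicular distance from point p to the leg, in the same planar units.

    Assumes leg_start != leg_end and an area small enough that a
    flat-earth (x, y) approximation is reasonable.
    """
    (px, py), (ax, ay), (bx, by) = p, leg_start, leg_end
    dx, dy = bx - ax, by - ay
    return abs(dx * (ay - py) - dy * (ax - px)) / math.hypot(dx, dy)

def mean_track_error(flight_path, leg_start, leg_end):
    """Average cross-track distance over a recorded flight path."""
    return sum(cross_track_distance(p, leg_start, leg_end)
               for p in flight_path) / len(flight_path)
```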
Coincidentally, a system that would address these issues was being developed by the entertainment industry, which also needed highly portable and easy-to-set-up chromakey. The solution was a ring of light emitting diodes (LEDs) that could be mounted around a camera lens, paired with a special retroreflective material that worked with the wavelength of the LEDs and reflected light only directly back at the source. This configuration was used on one of HMM(T)-164's helicopters dedicated to maintenance training. The windscreen area was covered with a sheet of retroreflective material, a ring of LEDs was added to the camera/HMD, and an LCD panel was used to render the instrument panel. With the new configuration, the setup time changed from hours to minutes. Subjects now faced many of the constraints of the operational environment: the same field of view and obstructions, communication via the aircraft's ICS, and limited space to manage maps, checklists, and route kneeboard cards. We were also able to upgrade the HMD from a 640 × 480 display to a 1,280 × 1,024 display. See Figure 27.11.

The experimental design for the third round of transfer studies did not change, and the results echoed previous work. Subjects who were exposed to virtual environment training maintained track better. They were also subjectively evaluated as superior at crew resource management, navigation, and management of cockpit workload.
Figure 27.11. The Field-Tested HMM(T) Apparatus for VEHELO
CONCLUSIONS

Our years of study in the use of VEs for training navigation skills in helicopter pilots have proven fruitful. We have learned how to build low cost VEs for training that are valuable tools for spatial skill and knowledge development. We have refined our techniques to include night vision capabilities and urban terrains. We have extended the concepts toward embedded training that will be far more suitable for many training domains. In all cases, a thorough understanding of the training domain is key to properly constraining the design so that it fits the training requirements, the trainees' needs, and the environment and situations for use. These are all critical for military training.

Most importantly, this work revealed that conventional training transfer experiments are simply not practical in the military setting most of the time. The work represents a creative approach that controls exposure, treatment, and trainees in such a way that researchers can learn what works without sacrificing effective training for experimental subjects. The results of this work contribute to real training systems for military helicopter pilots and crew.

REFERENCES

Banker, W. P. (1997). Virtual environments and wayfinding in the natural environment. Unpublished master's thesis, Naval Postgraduate School, Monterey, CA.

Beilstein, D. L. (2003). Visual simulation of night vision goggle imagery in a chromakeyed, augmented, virtual environment. Unpublished master's thesis, Naval Postgraduate School, Monterey, CA.

Boldovici, J., Bessmer, D., & Bolton, A. (2002). The elements of training evaluation. Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

Darken, R., Sullivan, J., & Lennerton, M. (2003). A chromakey augmented virtual environment for deployable training. Proceedings of I/ITSEC [CD-ROM]. Arlington, VA: National Training and Simulation Association.

Darken, R. P., & Banker, W. P. (1998). Navigating in natural environments: A virtual environment training transfer study. Proceedings of the Virtual Reality Annual International Symposium (pp. 12–19). Washington, DC: IEEE Computer Society.

Ericsson, K., Krampe, R., & Tesch-Romer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100(3), 363–406.

Goerger, S., Darken, R., Boyd, M., Gagnon, T., Liles, S., Sullivan, J., et al. (1998, April). Spatial knowledge acquisition from maps and virtual environments in complex architectural spaces. Paper presented at the 16th Applied Behavioral Sciences Symposium, U.S. Air Force Academy, Colorado Springs, CO.

Hahn, M. E. (2005). Implementation and analysis of the Chromakey Augmented Virtual Environment (ChrAVE) version 3.0 and Virtual Environment Helicopter (VEHELO) version 2.0 in simulated helicopter training. Unpublished master's thesis, Naval Postgraduate School, Monterey, CA.

Jones, Q. (1999). The transfer of spatial knowledge from virtual to natural environments as a factor of map representation and exposure duration. Unpublished master's thesis, Naval Postgraduate School, Monterey, CA.
Kulakowski, W. W. (2004). Exploring the feasibility of the Virtual Environment Helicopter system (VEHELO) for use as an instructional tool for military helicopter pilots. Unpublished master's thesis, Naval Postgraduate School, Monterey, CA.

Lennerton, M. (2003). Exploring a chromakeyed augmented virtual environment as an embedded training system for military helicopters. Unpublished master's thesis, Naval Postgraduate School, Monterey, CA.

McLean, T. (1999). An interactive virtual environment for training map-reading skill in helicopter pilots. Unpublished master's thesis, Naval Postgraduate School, Monterey, CA.

Peitso, L. (2002). Visual field requirements for precision nap-of-the-earth helicopter flight. Unpublished master's thesis, Naval Postgraduate School, Monterey, CA.

Psotka, J., Massey, L. D., & Mutter, S. A. (Eds.). (1988). Intelligent tutoring systems: Lessons learned. Hillsdale, NJ: Lawrence Erlbaum.

Sullivan, J., Darken, R., & McLean, T. (1998, June 2–3). Terrain navigation training for helicopter pilots using a virtual environment. Paper presented at the 3rd Annual Symposium on Situational Awareness in the Tactical Air Environment, Piney Point, MD.

Sullivan, J. A. (1998). Helicopter terrain navigation training using a wide field of view desktop virtual environment. Unpublished master's thesis, Naval Postgraduate School, Monterey, CA.

Ward, P., Williams, A., & Hancock, P. (2006). Simulation for performance and training. In K. Ericsson, N. Charness, P. Feltovich, & R. Hoffman (Eds.), The Cambridge handbook of expertise and expert performance (pp. 243–262). New York: Cambridge University Press.

Wright, G. T. (2000). Helicopter urban navigation training using virtual environments. Unpublished master's thesis, Naval Postgraduate School, Monterey, CA.
Chapter 28
TRAINING EFFECTIVENESS EXPERIMENTATION WITH THE USMC DEPLOYABLE VIRTUAL TRAINING ENVIRONMENT—COMBINED ARMS NETWORK
William Becker, C. Shawn Burke, Lee Sciarini, Laura Milham, Meredith Bell Carroll, Richard Schaffer, and Deborah Wilbert

Military teams are increasingly confronted with a need to be adaptive as they operate against asymmetric threats across a wide variety of environments and mission types ranging from combat to stability, support, transition, and reconstruction operations (U.S. Department of Defense, 2006). In efforts to achieve this goal there has been a movement toward training troops to operate independently in smaller, "scalable" units. It is argued that these units, acting in concert with commander's intent, will be able to exercise initiative to locate, close with, and destroy enemies (Fleet Marine Force Field Manual 6-5, 1991). While the move toward smaller, scalable units may make military forces more agile, it does not guarantee that the skills needed to foster adaptation of strategy, structure, and process will be present. In order to promote such skills, the military relies on a wide variety of training methods, not the least of which is the use of experiential based simulation. Due to the current operational tempo, where resources and time are at a premium, it is essential that these simulations be based on the science of learning paired with the appropriate use of technology such that maximum efficiency and learning occur (Salas & Burke, 2002).

Among the many units seeking to utilize such training are Fire Support Teams (FiSTs). A FiST is a small group of marines within the Marine Air-Ground Task Force charged with the tactical coordination of air and indirect fire assets. Key FiST members include the FiST leader, forward observers (FOs) (for artillery and mortar crews), and a forward air controller (FAC). Together the FiST creates and communicates an attack plan to supporting artillery, mortar, and aircraft units to achieve mission success.
Currently, the majority of FiST training is conducted through classroom instruction in which declarative and procedural knowledge is received. Trainees then practice the application of such knowledge through mental simulation that incorporates a sand table or battle board. Finally, environment and organizational resources permitting, trainees engage in a live-fire exercise. This type of training is commonly referred to as the "crawl, walk, run" method. Unfortunately, the majority of the Marine Corps' current training options are limited in their ability to provide accurate visualizations or temporal accuracy until live-fire training.

In an effort to address these gaps, an Office of Naval Research program known as the Multiplatform Operational Team Training Immersive Virtual Environment was field-tested in the fall of 2006. This program, sponsored under the Virtual Technologies and Environments (VIRTE) program office, later became a program of record for the Marine Corps as the Deployable Virtual Training Environment (DVTE), a subpart of a larger system known as the Deployable Virtual Training Environment—Combined Arms Network (DVTE-CAN). DVTE-CAN is a network of simulation and support systems that incorporates advances in simulation technology along with knowledge about individual and team learning to provide a virtual training environment for FiST members, as well as air and ground support personnel. The system provides an environment in which the forward air controller and forward observers can train to provide coordinated close air support, mortar, and artillery fires. For a more detailed description, the reader is referred to Volume 3, Section 2, Section Perspective.

The program objective for DVTE-CAN was to provide a simulation based experiential learning environment that focuses on the tasks requiring mastery for scalable marine infantry units, such as FiSTs. The requirement placed upon the experimental team was to determine the usability, utility, and effectiveness of DVTE-CAN as a supplemental training tool during each phase of the FiST training process. This chapter documents a portion of this process and the lessons learned.

METHODS

One of the primary goals of this effort was to determine the usability and utility of the DVTE-CAN system across a variety of marine populations. To this end, experimentation was conducted that examined a variety of experience levels with units at several locations, including students at the Infantry Officer Course (IOC) (introductory training), the Expeditionary Warfare School (EWS) (advanced refresher training), members of the active reserve (basic refresher training), and marines preparing for deployment (advanced training). Other United States Marine Corps (USMC) entities involved in field-testing included Expeditionary Warfighter Training Group Atlantic and Expeditionary Warfighter Training Group Pacific.

Introductory Training

The research team had three separate opportunities to insert DVTE-CAN into the curriculum at IOC. IOC prepares officers to be platoon leaders and
is the second school a new Marine Corps officer attends. Included in this training are several days dedicated to call for fire (CFF), close air support (CAS), and FiST procedures and tactics. Participating in data collection efforts at IOC were 154 marines, all 2nd lieutenants, with an average age of 23 and a majority having less than one year in service. In line with their time in service, this population of marines had minimal exposure to or experience with CFF or CAS.

Basic Refresher Training

Data collection efforts with the 1st Battalion, 23rd Marines (1/23) afforded the team an opportunity to examine DVTE-CAN within an activated reservist unit conducting field training. The reservists reported having no experience as actual FiST members and minimal training in supporting arms or CAS. Twelve marines representing a range of ranks (that is, lance corporal to sergeant) and an average of five years in service used the system to train for the FO role within a FiST environment.

Advanced Refresher Training

Data collection at EWS allowed DVTE-CAN to be implemented and feedback to be collected from experienced, active-duty marines (1st lieutenants to lieutenant colonels with an average age of 32). DVTE-CAN was utilized as a refamiliarization tool and training device with 12 squads approximately six months into their yearlong training cycle at EWS. Over half of the marines reported that they had prior training experience as a FiST member, and 80 percent had less than 500 hours of training in supporting arms or close air support. As compared to many of the other implementations of DVTE-CAN, the marines at EWS reported a wider variety of military occupational specialties (MOSs), including infantry, artillery, logistics, administration, police, aircrew, armor, medical, and combat engineer. Within the MOSs reported, 55 percent were classified as ground, 25 percent air, and 18 percent supporting agencies.

Advanced Training

An intact FiST team organic to 3rd Battalion, 4th Marines (3/4) used DVTE-CAN as an advanced team training device while training at the USMC Air-Ground Combat Center; 3/4 was about halfway through its 180 day training cycle prior to deployment. FiST members ranged in age from 21 to 31, and time in service ranged from 2 to 12 years. DVTE-CAN was used to supplement the predeployment FiST training package with a focus on ensuring the team was capable of correctly executing 3/4's FiST battle drill in a variety of scenarios.

EXPERIMENTAL DESIGN AND PROCEDURE

The design of choice was a between-subjects design with two levels of the independent variable, practical application method. Specifically, marines within
the control condition used mental simulation or sandbox-type exercises as the basis of their practical application, while those in the experimental condition used DVTE-CAN to develop and execute their battle plans. Marines in both conditions were given the same commander's intent (for example, rules of engagement, available assets, and deadlines) and operating picture. In developing their battle plans, marines in the control condition used standard planning tools (for example, battle board, compass, and map), while those in the experimental condition used the planning tools embedded within the DVTE-CAN system augmented with the standard tools.

Upon arrival, marines completed an informed consent document and demographics and self-efficacy questionnaires. Marines in the experimental condition then participated in a brief system orientation session, facilitated by their instructor or a resident contractor. At the conclusion of this session, commander's intent was delivered to the FiST teams for the first scenario. FiST members then used either DVTE-CAN or the standard practical application tools (as described above) to develop their battle plans. Instructors and squad members then subjected FiSTs within both conditions to a critique of their battle plans. At the conclusion of the critique, the plan was executed. Within the control condition, execution occurred via mental simulation, while within the experimental condition DVTE-CAN allowed marines to see execution take place in real time and space. At the conclusion of the practical application, FiST teams again completed a series of questionnaires. Upon completion of the in-class execution phase, squads participated in live-fire training, when available.

While the preferred experimental design was as detailed above, resource constraints often dictated modifications to the initial design (see Table 28.1). For example, on several occasions, training schedules did not permit the creation of true control and experimental conditions; consequently, the experimental design across the populations reflected a combination of true control groups and experimental pre-/post-test-only groups.

CORRESPONDING INSTRUMENTS

Depending on the purpose of each data collection opportunity, the exact questionnaires varied, but potential questionnaires included pre-/post-self-efficacy, reaction/utility, usability, and the ability of the system to be used to achieve specified learning goals.

Self-Efficacy

A 33-item questionnaire assesses a FiST member's confidence in his own ability, as well as in the team's ability, to accomplish key FiST tasks (α = .93).
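The reported reliability is Cronbach's alpha: for k items, α = (k / (k − 1)) · (1 − Σ item variances / variance of the total score). A minimal computation over a respondents-by-items score matrix, included here only to unpack the statistic:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars / total_var)
```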
Table 28.1. DVTE Application Variants

Training Unit: IOC
DVTE Configuration: Joint Semi-Automated Forces (JSAF), forward observer artillery (FO ARTY), forward observer mortars (FO MORT), and FAC
Training Environment: Schoolhouse
Training Details:
• DVTE replaced the traditional sand table indoctrination training in the experimental group IOC classes.
• Three training days: two schoolhouse; one live-fire day.
• Two classes received live-fire training. One class used DVTE in place of live fire due to the lack of available assets.
• Four-member teams: (1) FiST leader, (2) FOs, (1) FAC.
• FOs and FACs used DVTE; the FiST leader used the battle board.

Training Unit: 1/23
DVTE Configuration: JSAF, FO ARTY, and FO MORT
Training Environment: Field tent
Training Details:
• DVTE replaced the traditional sand table and mental simulation.
• System powered by a deployable generator.
• Small teams worked together to build battle plans for multiple scenarios.
• Participants trained in the role of the FO, providing the instructor solutions to the scenarios.
• Individuals practiced simulated FO communications with the instructor while the plan was executed in the environment and projected for observation.

Training Unit: EWS
DVTE Configuration: JSAF, FO ARTY, FO MORT, FAC, Combined Arms Planning Tool (CAPT), and AH1 simulation
Training Environment: Schoolhouse
Training Details:
• DVTE replaced the traditional sand table C4 (command, control, communications, and computers) training.
• FiSTs practiced developing, critiquing, and executing battle plans.
• The JSAF operator also played the fixed-wing role.
• One trainee (a pilot) operated the AH1 Cobra simulation.
• Acting commanding officers (COs) received scenarios from the instructor, developed intent, and passed it to the FiST.
• FiST members created the plan and communicated it to supporting agencies.
• The plan was collaboratively critiqued by the instructor, the acting CO, and the FiST.
• The finalized plan was executed; corresponding effects and coordination requirements were observed.

Training Unit: 3/4
DVTE Configuration: JSAF, FO ARTY, FO MORT, and FAC
Training Environment: Schoolhouse
Training Details:
• DVTE replaced the traditional sand table training and spanned two days.
• Day One: The CO (also the instructor) received personal JSAF training while trainees received group instruction on the other DVTE components.
• Marines used free play to reinforce guided instruction.
• Marines received system setup instruction.
• System training culminated with execution of two basic scenarios.
• Day Two: Trainees set up the DVTE system in <30 minutes.
• The CO operated JSAF and conducted training the way he would have in the traditional setting.
Reaction/Utility

An 18-item questionnaire assesses reactions. It assessed system utility in terms of promoting FiST-related skills and the degree of confidence marines had that DVTE-CAN prepared them for live fire and to lead a FiST during live fire. Items also assessed whether marines would recommend that DVTE-CAN be given to other trainees or used while on deployment. Finally, several open-ended items encouraged the marines to identify system areas that were especially useful, as well as areas in need of improvement.

Usability

A 25-item questionnaire reflected screen aspects, terminology/system information, learning, interactions, and system capabilities.

Learning Goals

Learning goals assess the degree to which DVTE-CAN could be used to train key FiST learning objectives and tasks. Tasks and corresponding learning objectives were identified based on a task analysis (Bell et al., 2006) and an examination of Marine Corps documents. Instructor and observer ratings of FiST performance were also collected during training and live fire when possible, but those results will not be reported here. Only a brief subset of the results, those that most closely relate to spiral development and user acceptance, will be discussed in detail within the current chapter. Results that relate to utility, learning, and transfer are to be documented in future publications.

RESULTS

Infantry Officer Course

The collection of data in this applied setting posed unique challenges with respect to logistics, training tempo, and availability of assets. As a result, there are varying levels of data associated with each event. In order to keep aligned with the other events presented in this chapter, results concerning the degree to which the system met the training goals at IOC and system usability for novice trainees will be presented.
coordination, terminal control of aircraft, communication, and equipment use, with significant increases over those who had not received training on DVTECAN for both terminal control of aircraft and equipment use. At each event, the marines who used DVTE-CAN rated their overall experience with the system and its tools favorably. The use of the tools provided and the visualizations were consistently ranked high across each event. All three events presented a variety of usability areas for improvement. After analysis, these areas were prototyped and then incorporated in improved versions of DVTE-CAN for testing at subsequent events in efforts to improve usability and training effectiveness. Regardless of the training event, one of the most frequently recorded usability issues was the students’ initial confusion with the operating the system. This highlights the need for a comprehensive and standardized training package that allows enough time for familiarization on the system prior to classroom instruction and scenario execution. Overall, free response comments from the trainees were encouraging. Many of the students shared the opinion that experiencing the terrain, targets, and impact of ammunition from the first-person point of view was beneficial and that it provided a level of realism not available on the table-mounted terrain replica. Additionally, the marines indicated that the ability to experience the temporally valid scenarios provided by DVTE-CAN increased their understanding of the timing, coordination, and communications required for successful FiST operations. 1st Battalion, 23rd Marines After using the DVTE-CAN system, marines reported that it had utility in terms of promoting individual, team, and communication skills related to key FiST tasks. Overall, participants felt that the practical application provided an effective training tool for live-fire exercises, that it enhanced classroom instruction, and that it was moderately challenging. However, perhaps most insightful were the free-form comments provided by the reservists with regard to how the system was used for their particular purpose. Specifically, the marines of 1/23 reported that system components dealing with hands-on practice, call for fire, the map, system tools, and feedback were particularly useful and noted as key system components. For example, with regard to hands-on practice, marines reported the realistic simulation and hands-on experience of what occurs within a FiST to be especially useful. Marines also reported that the system had strengths with regard to call for fire, specifically helped with the work-up of CFF, assisted in learning CFF in the classroom environment, provided instruction of verbal commands, and covered step-by-step instruction given the mission. Marines also highlighted features of the map, such as quick access to the map and protractor, map reading techniques, and help with plotting targets on the map. Similarly, in terms of system tools, marines reported that the tools used for calculations were especially useful. Finally, marines reported liking aspects that could be categorized as related to feedback, including the ability for the program to tell users to “say again” if incorrect, the process of getting used to making adjustments, and the point system in showing faults.
While several system components were specifically mentioned as standing out, a few areas were also noted for improvement. Specifically, it was noted that certain aspects of the map (for example, thick lines on markers) made it difficult to read; a need to practice application with the field radio was noted, and it was suggested that not all the tools presented within the system are available to all marines (so there was some unfamiliarity, for example, the Viper). Finally, a note was made of a need for more training on the system.
Expeditionary Warfare School

After using the system, marines rated the usefulness of the system in terms of its ability to support several critical knowledge, skills, and abilities related to fire support; their confidence in its ability to prepare them for live fire; its accurate simulation of FiST operations; and its added value over traditional practical applications. Overall, the system received very favorable results in all rated areas.

Similar to the quantitative findings, the overall qualitative data were also favorable. Those aspects of the simulation that received the most consistent praise can be categorized into three areas: visualization, practicing communication/coordination skills, and active participation. Many comments reflected the importance of the visualization capabilities of the system for these marines; as a few illustrative examples, the program visuals were useful and assisted in (a) timing, deconfliction, and finding the enemy; (b) graphic depiction of feedback; (c) the ability to see aircraft and marks as specified by the mission timeline; and (d) the depiction of planning/execution shortfalls. Overall, comments reflected the utility of the system in allowing visualization of how things come together and play out in "real" time. The ability to actually practice communication and coordination in real time was also noted as a valued system component. For example, marines commented that the system (a) enabled participants to work together as a team and communicate in real time, as they would in a real situation, and (b) provided the capability to practice communication and coordination and to realize errors in a safe environment. Finally, with regard to active participation, marines valued the fact that it "puts students in the position to do versus merely watching." A few other notable comments indicating system benefits as seen by the marines at EWS include (a) it was extremely helpful in pulling all aspects of running a FiST together, (b) it provided the capability to apply skills learned in class, (c) timeline integration, and (d) real time effects on target.

While most comments regarding the use of DVTE-CAN as configured at EWS were positive, a few areas of improvement were suggested. Many dealt with the need for participants to have a better orientation to the tool and more time to run through the simulated missions. In many instances observers noted that, in light of time constraints, instructors would "simulate" a key aspect of execution, thereby constraining some of the benefits of using simulation. A few notable suggested system improvements included the inability to plot units on the map, difficulty in plotting and drawing the fixed-wing initial point on the map, and a
general suggestion for more map interactivity. Also noted were a lack of fixed- and rotary-wing attacks on the battle board timeline within CAPT and the lack of a FiST-specific common operational picture map.

3rd Battalion, 4th Marines

Overall, DVTE-CAN was able to meet 3/4's specific training goals, and, similar to 1/23's results, the marines reported that the DVTE-CAN system had utility in terms of promoting individual, team, and communication skills related to key FiST tasks. Additionally, the members of 3/4's FiST reported that DVTE-CAN was exceptionally useful for plotting asset and target locations, for planning suppression of enemy air defense missions, and for observing the impact of rounds for effectiveness, adjustment, and/or battle damage assessment. On more than one occasion, the researchers observed individual marines recognizing procedural slips through the visualization provided by DVTE-CAN, self-correcting, and then sharing those slips with the team, thus creating a team learning experience. While team learning behavior was not being directly investigated, the potential of DVTE-CAN to strengthen shared mental models and to provide an improved team training experience cannot be overstated. Overall, the marines of the 3/4 FiST felt that the practical application provided an effective training tool and that they were, in fact, better prepared for a live-fire training exercise.

The marines who used the forward observer PC simulator rated their overall experience with the program and its tools favorably. It is important to emphasize that several aspects and tools were specifically identified as being exceptional. There were a few areas in which the marines made suggestions for improvement. These suggestions were directly related to their prior experience in live-fire and/or combat situations. Specifically, it was noted that map icons should be in operational terms in order to match what the users would experience in the actual domain. Additionally, in reference to the compass tool, the marines suggested that the virtual M2 and lensatic compasses needed minor modifications to match their live counterparts. One of the most positive ratings came from a free-form comment stating that even in high complexity missions, the system allowed the user to easily perform his operations.

JSAF received an overall high user experience rating from the instructor's point of view. Some of the more notable items were the usefulness and accessibility of the system tools, the display of virtual units and other elements, the correct and consistent use of domain relevant terminology, and the ease of interaction. These high ratings are encouraging since the operator was also instructing and evaluating the FiST marines while playing the role of the supporting agencies. There were areas in which JSAF received average ratings. One was a navigational issue resulting in difficulties setting a time on target for all participating entities. Additionally, the lack of availability of different information formats presented difficulties for the user. These items were directly related to the execution of the scenarios and may have been the result of the lack of comprehensive JSAF functionality training.
OVERALL LESSONS LEARNED

The uniqueness of each data collection effort provided a robust opportunity to assess the perceived utility and flexibility of DVTE-CAN, as well as the effectiveness of utilizing a spiral development process. Based on usability data collected throughout the IOC events and other opportunities, usability issues were identified, and problem-solution tables were provided to the system designers, who implemented system improvements. Comparing version 1 to version 4 of DVTE-CAN, usability gains resulted in over a 40 percent decrease in heuristic violations.

While each marine population indicated that DVTE-CAN had value over traditional classroom training, each group valued different system components based on experience and stated training goals (see Table 28.2). For example, insertions at IOC served to illustrate how the system could be used during indoctrination training in FiST tasks. At IOC, DVTE-CAN was seen as being able to provide marines with a realistic understanding of FiST communications and timelines that is often not available through traditional classroom methods. Marines also reported that DVTE-CAN provided a visual aspect not available on wall-mounted maps, promoting a better understanding of synchronization.

Similar to IOC, marines at EWS saw the system as having utility, with nearly all officers recommending it for training while on deployment and indicating that it added value over traditional classroom training. Implementation at EWS illustrated DVTE-CAN's ability to add value for an experienced set of marines who had been deployed but had varying levels of familiarity with FiST operations and a diverse set of occupational specialties. Instructors also noted its usefulness, expressing interest in its ability to train teams of teams.

A marked contrast to the previous environments was DVTE-CAN's use by the 1st Battalion, 23rd Marines. This case demonstrated DVTE-CAN's portability, its adaptation for field use, and its versatility in meeting a unit's training goals. With 1/23, DVTE-CAN was primarily used to augment the standard lecture format by allowing the marines to practice the actions required of an FO within a FiST. Finally, it illustrated the perceived utility of DVTE-CAN with a set of reservists with minimal knowledge of FiST-related tasks.

Moving to the other end of the spectrum, the 3rd Battalion, 4th Marines served to illustrate how the system could be used and valued with an experienced FiST whose members had deployment and FiST-related combat experience. With this population several firsts were noted. Specifically, the marines heavily weighted the value of the planning component as compared to execution, deployed and brought the system online quickly, and demonstrated the ability of a designated marine to operate JSAF and execute training scenarios. Ultimately the marines saw the system as having utility and tended to rate it highly; both the officers and the enlisted marines recommended DVTE-CAN for training in garrison, in the field, and while on deployment.
Table 28.2. Combined Lessons Learned

Infantry Officer Course
Training Group: • Active • Infantry MOS • Primarily LTs • No prior experience in FiST roles
System Use: • Indoctrination training • Schoolhouse • Four-person ad hoc FiST • Instructor led
Unit Lessons: • System seen as offering utility over traditional classroom training by both instructors and students. • System able to be used for both planning and execution portions. • Visualization aspect highly valued; of particular value are identification of targets on deck in terrain and watching timelines unfold. • Instructor value indicated by use of system beyond original implementation population when live fire was canceled.

EWS
Training Group: • Active • Wide variety of MOSs • Captain to Lt. Col. • Some prior experience in FiST roles
System Use: • Refresher training • Schoolhouse • Four-person ad hoc FiST + CO
Unit Lessons: • Perceived utility for refresher training on FiST operations among marines who had deployed, across a wide variety of MOSs. • Instructors expressed interest in DVTE-CAN's ability to train teams of teams. • Additional training on system was requested. • Suggested a need for more time on system to gain full benefits (average time on system approximately 40 minutes).

1st Battalion, 23rd Marines
Training Group: • Reservists • MOS • Rank
System Use: • Refresher training • Field environment • Instructor led
Unit Lessons: • Portability and successful use of DVTE-CAN in the field, even with generators as a power source. • Incorporated into lectures and simulation training. • Hands-on experience, calculation tools, practicing adjustments, plotting targets, working up transmissions and calls for fire, and hands-on practice of what goes on in a FiST were valued system components. • Used primarily as an FO trainer and for how this role relates to the larger FiST.

3rd Battalion, 4th Marines
Training Group: • Active • Intact FiST, FAC role simulated • Operational experience in FiST roles
System Use: • Sustainment training • Classroom • CO-led training • JSAF operated by CO
Unit Lessons: • Ability of a designated marine to operate JSAF and execute training scenarios. • Learning curve on system reduced. • Ability of marines to easily deploy and bring system online quickly. • System was perceived to meet primary learning goals within an experienced intact FiST team.
conducting field research with marines. A representative sample of these appears below.

Lesson 1. Do not forget the importance of prepackaged tutorials to coincide with implementation of new systems. At each data collection event, a day was dedicated to train-the-trainer sessions. However, when the trainers later introduced the system to FiST members, it was done in many different ways due to differences in instructor style, resource constraints, and training schedules. Repeatedly the experimental team heard comments suggesting the need for additional time for user familiarization. While constraints on training time are unavoidable, there needs to be a prepackaged tutorial that accompanies newly implemented systems. A short tutorial can be designed not only to deliver systematic training, but to ensure an appropriate baseline understanding of the instructional equipment, so as not to hinder the learning of FiST procedures.

Lesson 2. A spiral development process incorporating a close partnership between learning specialists, programmers, and user groups is essential to move past the prototype stage. Overall, DVTE-CAN has received very positive feedback from the marines. The key to moving from a prototype system to a transitional product has been the close partnership between subject matter experts, users, learning specialists, and programmers. The subject matter experts understand where the trainees need to be in terms of proficiency, the learning specialists help determine what features need to exist from a human learning standpoint, and the programmers develop the capabilities to meet these needs. Based on system or resource constraints, alternative designs are proposed, which in turn create new discussions on a way forward. Finally, if flexibility in terms of user population is warranted, then users of various experience levels should be interviewed.

Lesson 3. Prepare a secondary (and third) experimental plan that anticipates changes dictated once in the field environment. When working within field environments, adaptability is key. Of the six data collections represented within this chapter, not one went according to the original plan. Most often, due to weather conditions or training time constraints, experimental opportunities changed once the reality of Marine Corps training unfolded. As a result, the experimental team typically prepared a three-tiered experimental plan specifying the ideal design, an alternative moderate design, and a minimum design. This allowed flexibility once on site, because it had been determined a priori what sacrifices in terms of design would be acceptable, based on the questions being asked.

Lesson 4. Having a champion within the organization is essential to creating the relationships needed for access and success. Within a relatively short amount of time DVTE-CAN was partially implemented or demonstrated across a wide marine population. In all, data were collected at six locations, with some data collection done more informally than others. The key to gaining access within each population was having a champion within the Marine Corps who could assist in explaining the purpose and the benefit to the
marines and promote the system through word of mouth. This, in turn, allowed the experimental and programming teams to ultimately provide a flexible system based on feedback gathered across a variety of marine populations. Access was further promoted by instructors recognizing that their feedback was valued and, when possible, was implemented into the system.

Lesson 5. Unobtrusiveness is crucial when working with operational marines. Within most field environments the researcher collecting data is often in the middle of actual training or work in progress. The importance of being unobtrusive in such environments cannot be emphasized enough, from both an experimental and a practical standpoint. The more obtrusive the methods and data collection instruments are, the more the researcher may be seen as a burden and a drain on already limited training time. In collecting team data, it is difficult to be unobtrusive, but several mechanisms can be used to mitigate the perception of obtrusiveness. First, coordinate with instructors and/or the point(s) of contact well in advance of arriving so they are aware of the data collection strategy and can provide feedback. Gaining instructor feedback early can also serve to ensure instruments will make sense to the population of interest. Second, collect only the information that is truly needed; the "nice to have" data may have to wait. Third, attempt to keep questionnaires to a minimum in terms of number and length. Finally, part of being unobtrusive is blending in with the environment to the best of the researcher's ability, so dress accordingly.

CONCLUDING COMMENTS

DVTE-CAN was developed through a close partnership among learning specialists, software engineers, subject matter experts, and the operational customer. As such, the resulting product scored high in perceived utility and generally engendered high levels of positive affect among the marines. Due to the logistical constraints often encountered in field settings, in many cases true experimental and control groups could not be created. This, in turn, places some constraints on the ability to make a definitive statement regarding DVTE-CAN's impact as compared to traditional methods across the variety of marine populations that used the tool. However, the use of experimental and control groups at the basic training level (IOC) and the use of pre-/post-test quasi-experimental groups at many of the other locations gives us a fair amount of confidence in the findings presented with regard to utility and the perceived ability of the system to meet the training goals for a diverse set of marines. By presenting a glimpse of the evaluation process for a system that has been successfully received by the user, it is our hope that the methods used and corresponding lessons learned can assist others in traversing the challenging environment of field research. Specifically, we hope that this chapter will assist those charged with the development and evaluation of training systems in developing systems that can easily transition into operational products that are not only scientifically based, but used by the operational community.
Training Effectiveness Experimentation with the USMC DVTE
323
Chapter 29
ASSESSING COLLECTIVE TRAINING

Thomas Mastaglio and Phillip Jones

Assess: to estimate or determine the significance, importance, or value of; evaluate.
—Webster's New World College Dictionary, 4th Edition

we can know more than we can tell . . .
—Michael Polanyi, The Tacit Dimension
THE UNIQUE CHALLENGES TO ASSESSING COLLECTIVE TRAINING SYSTEMS

Virtual simulations supporting collective training—the training of teams and teams of teams—present a unique challenge when it comes to determining their effectiveness and value to the intended user community. Virtual simulations designed to support collective training at some level further complicate that challenge; but because their usage is more pervasive and homogeneous at the event or trainee level, and because they offer controlled access to the training audience, it is possible to collect data to support subjective assessments. The methodology described in this section uses inherently subjective data, but it plans for and conducts data collection, followed by a structured analysis of that data, in an objective manner to ensure the process focuses on the goals the assessment is designed to accomplish.

The Challenges of Empirically Based Evaluations or Testing

A complete training effectiveness analysis of any system requires controlled-use scenarios, access to and control of subject units throughout their training life cycle, significant data collection, and analysis (Boldovici, Bessemer, & Bolton, 2002). Some argue that it must include a comparison to alternative methods (for example, a baseline training approach, such as a field training exercise) to truly evaluate whether the system is worth the investment in development and the commitment of time required from the training audience who will serve as subjects. Such an effort would be cost-prohibitive and time-consuming (Burnside, 1991).
The significant issues for virtual simulations are what value their users perceive they attain: whether users are satisfied with the technology and implementation approach and whether they use it to their advantage to improve performance. This type of assessment can be more properly termed a training utilization assessment or study. In an empirical sense, a system cannot be training effective unless it is used properly; training utilization is a necessary condition for training effectiveness. Therefore, studying the training utility of a virtual simulation provides valuable insight, from a customer perspective, into the technology and the context in which it is being used. The results can also help determine where technology enhancements warrant investments in preplanned product improvements and how to improve training strategies or site operational policies.

Another challenge arises from virtual simulation's role as an enabler for operational performance. Although virtual simulations are independent systems, they are in reality a substitute, either for operational performance or for some other simulation system—live, virtual, or constructive. Thus, while improvement in individual or team performance from using a virtual simulation is a valid measure, the more useful measure is performance improvement within an overall training program and relative to other intervention methods. Relative improvement should be measured in terms of total improvement, speed of improvement, and resources expended to achieve that improvement.

Controlling Dependent versus Independent Variables in Collective Training Assessments

Measuring the resources invested to achieve virtual simulation results is another challenge. It is difficult to identify and measure the total simulation investment, which includes research, development, production, and maintenance costs, plus the management and creation costs of individual simulation-supported events. Finally, it should also include the post-event costs of transforming virtual simulation results into individual and organizational skills, knowledge, and competencies.

Impact of Assessments on the Training Events Supported

It is a challenge to assess virtual simulations while minimizing disruption to those using them. Users turn to virtual simulations in order to obtain efficiencies; the more desirable methods are too resource intensive to use effectively or safely. It is difficult to justify to users the disruptive costs of an extensive system assessment.

An Alternative Approach to Assessment Is Required

This chapter describes an alternative method for assessing virtual simulations, one that provides an analysis with the depth, breadth, granularity, and rigor to support evaluation. The methodology leverages the human inclination to assess our
environment and experiences innately and continuously. Those responsible for virtual simulation programs should plan to take advantage of the abundance of organic assessment already taking place: users innately assess their training tools and events in terms of their utility and value. The challenge is recognizing and harvesting the results of the organic assessments users are already doing.

LEVERAGING CUSTOMER RELATIONSHIP MANAGEMENT PRINCIPLES

Previous interest in and efforts to assess the effectiveness of training systems have focused on the technology per se. We suggest that effective use of a virtual simulation, as a system, is akin to a customer relationship management (CRM) challenge (Hurwitz Group, 2002). CRM, as the term is used in industry, focuses on meeting the needs of a particular customer or class of customers. Effective training utilization can be viewed as a CRM challenge for users of virtual environment technology. Regardless of the quality of the technology or the investment of development funds, the system will be of value to its ultimate customers only if they are able and motivated to use it effectively.

Identifying Appropriate Stakeholders—Customers

A critical step in the assessment process is to identify the user-level stakeholders. Many complex virtual simulations will have multiple stakeholders, from those who direct or approve investment in them to the contractors who implement the technology. All have a role and are important to a successful program, but the focus of assessment has to be on the direct users of the simulation. From a CRM perspective, these users are the customers who must embrace both the technology per se and the capabilities it provides.

Diagnosing Customer Commitment to Fielded Technology

Cost savings are often cited as valid proof that the use of war games or constructive simulations to support collective training is a wise investment of resources. It is appropriate to determine the value that those who train with virtual simulations have realized from the investment in them. We are interested in how committed those customers are to the product—the virtual simulation—being evaluated. The assessment will likely focus on the following high level issues:

• Is the device being used as it was designed?
• Do its users perceive the device as having value to them?
• Are the results of training events that use the device integrated with and used to plan for other training events using the same simulation, another simulation, or an alternative training modality (for example, a live exercise)?
Importance of Customer Input to Programs

Assessment based on the approach described here is highly dependent on customer input, which is the rationale for clearly identifying the end-user customer (Goodwin & Mastaglio, 1994). It follows that feedback on both the technology design solution and the training program within which the simulation is embedded is needed to deliver a successful and meaningful assessment.

Technology Design Solutions

Virtual simulations, like many information technology based products, are too often developed based on what the engineer believes is the requirement (Mastaglio, 1991). The decision to conduct an assessment is an opportunity to collect end-user feedback on the implemented design. During the development of the assessment process, it is important to identify the key technical features, those that impact user acceptance and facilitate achieving the training goals. Operational-environment detail and realism, scene resolution, simulator controls, and fidelity to actual operational systems are examples of such technology design features.
A METHODOLOGY FOR PLANNING AND EXECUTING ASSESSMENTS

The first step in assessment is to determine its purpose. The study manager should advise the study sponsor as to what can reasonably be achieved. The tendency, once an organization becomes serious about conducting an assessment, is to expand its scope; the study manager must help the organization focus on its key goals. Assessment should focus on the level of performance achievable within the virtual simulation, the efficiency or speed with which those levels are achieved, the duration of results, the resources saved by the virtual simulation (including time and money), the opportunity costs of using the virtual simulation compared to another methodology, and any negative impact of using the virtual simulation. This last item is often overlooked. However, because virtual simulations cannot equal live execution, and because virtual simulation users are usually success oriented, there is a tendency to optimize performance within the virtual environment rather than within the target environment for the training.

We have developed and recommend the use of an approach called the study of organizational opinion (SO3). It comprises eight steps.
Map the Organization

The sponsoring organization is both the source of the data supporting the assessment and the consumer of the assessment results. Understanding the
sponsoring organization is essential to organizing the assessment, gathering the required data, and providing actionable results or findings. Mapping the organization is an important step. The assessment team should start with the organization's objectives; the team may have to assist the organization in identifying its goals. Organizational aspects to consider during mapping may include the following:

• What is the organizational structure? Who makes decisions? What are those decisions? What information do the decision makers need?
• What are the existing assessment/decision processes? How important is continuity versus evolution?
• What management processes must the information support?
• What are the various organizational agendas? How do you work around them?
Continuity with previous assessment efforts will be important to the organization, and there will be resistance to new assessment methodologies. It should be emphasized that continuity lies in the information and insight that come from the assessment, not in the particular assessment methodology.

Map the Respondent

Concurrent with mapping the organization, the assessment team must examine the respondent population that possesses the required knowledge and will be the focus of the collection effort. The team works with the organization to identify those demographic factors—age, gender, rank, specialty, and so forth—that are important to understanding population responses. The team should divide the respondents into groups based on significant or critical demographic factors.

Disaggregate Goals into Questions

Organizational assessment goals are usually broad, high level queries. The assessment team must disaggregate these broad goals into deliverable questions that respondents can be expected to understand and answer. The process disaggregates goals through three levels—issues, subissues, and questions—all in the form of a query. Each level supports the one above, so the questions support subissues, and the subissues support the issues. Questions should be phrased for best understanding by each respondent group while keeping the information requirement constant. As a simple example, if the information requirement is to determine the operational environment-specific realism of a training program, the question should be written in the future tense for those who have not yet experienced the operational environment and in the present tense for those who have.

Question form should follow from the required information. Question form refers to the type of question asked—yes/no, true/false, multiple choice, rank order, fill in the blank, essay, and so forth (Schuman & Presser, 1996; DeVellis,
2003). Effective assessments are open to all forms of questions and may include open and closed questions, as well as qualifiable and quantifiable questions. Open, qualifiable questions add to the analysis burden of the assessment team, as well as to the burden of the respondent, because they are more difficult to answer. In studies of complex organizations, with different echelons of command or management and with various channels of authority, the disaggregation should be tied to those echelons, such that issues are executive-level information requirements, subissues are management-level requirements, and questions are targeted toward the assessment population.
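To make the disaggregation concrete, the sketch below models the goal, issue, subissue, and question hierarchy as a small data structure. This is purely illustrative: the chapter describes SO3 in prose, so every class and field name here (including the priority field used later for question ordering) is our own assumption, not part of the methodology's specification.

    # Illustrative sketch of the SO3 disaggregation hierarchy (names assumed).
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Question:
        text: str                      # phrased for a specific respondent group
        form: str                      # e.g., "yes/no", "multiple choice", "essay"
        respondent_groups: List[str]   # which groups should receive this question
        priority: int = 0              # lower value = more critical (assumed field)

    @dataclass
    class SubIssue:                    # management-level information requirement
        statement: str
        questions: List[Question] = field(default_factory=list)

    @dataclass
    class Issue:                       # executive-level information requirement
        statement: str
        subissues: List[SubIssue] = field(default_factory=list)

    @dataclass
    class Goal:                        # broad organizational assessment goal
        statement: str
        issues: List[Issue] = field(default_factory=list)

Because each level holds references to the level below it, the same structure can be walked downward to generate questionnaires and upward again during analysis, mirroring the aggregation described later in the chapter.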
Link Questions to Respondents

The final step in preparation is to link questions to respondents to generate questionnaires. More questions will be prepared than are appropriate for delivery to the entire population, so this step requires deciding which questions NOT to deliver to a respondent group. While every situation is different, our experience shows that a single questionnaire should be limited to approximately 50 questions. In addition to the questions prepared through the disaggregation process, a questionnaire includes "filtering" questions, designed to identify biases or other critical aspects of an individual respondent. Filtering questions must be subtle so respondents do not perceive that they are being screened.

Question sequencing on questionnaires is important (Schuman & Presser, 1996). The most critical questions should be front-loaded within the questionnaire, as there is a natural drop-off after respondents answer approximately 20 questions. This drop-off can be physical—respondents abandoning the questionnaire—or cognitive—respondents not putting sincere effort into their answers. Questionnaires must be rehearsed by having sample respondents complete the entire questionnaire as delivered in order to obtain internal feedback on ease of completion and understandability. In addition, each questionnaire should be delivered to an exemplar respondent via an interview, regardless of the intended deployment means. Our experience is that the best way to vet a questionnaire is by verbalizing its questions to a sample respondent.
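As a sketch of this linking step (continuing the illustrative structures above), the function below selects a group's questions, front-loads the most critical ones, enforces the approximately-50-question cap, and spaces filtering questions through the body of the questionnaire so they are not conspicuous. The cap, the front-loading, and the subtle placement of filtering questions come from the chapter; the function name, the priority ordering, and the specific spacing interval are our assumptions.

    MAX_QUESTIONS = 50

    def build_questionnaire(questions, group, filtering_questions, cap=MAX_QUESTIONS):
        """Assemble one questionnaire for one respondent group."""
        # Deliver only the questions linked to this respondent group.
        eligible = [q for q in questions if group in q.respondent_groups]
        # Front-load critical questions: drop-off begins after roughly 20 answers.
        eligible.sort(key=lambda q: q.priority)
        selected = eligible[: cap - len(filtering_questions)]
        # Scatter filtering questions through the questionnaire rather than
        # clustering them, so respondents do not sense they are being screened.
        for i, fq in enumerate(filtering_questions):
            selected.insert(min(len(selected), (i + 1) * 7), fq)
        return selected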
Deploy, Track, and Receive Questionnaires

Questionnaires are deployed via interviews, focus groups, written surveys, or the Web. The objective of deployment, tracking, and reception is to maximize questionnaire response and throughput. "Response" is the number of respondents who initiate the questionnaire. "Throughput" is the number of respondents who finish the questionnaire and provide valid data. The most important metric is response quality, which we define as a combination of total response, response by respondent group, quality of responses, and quantity of data.
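The response and throughput definitions above translate directly into simple counts; the sketch below assumes a hypothetical per-respondent session record, since the chapter does not prescribe a data format.

    def response_metrics(sessions):
        """sessions: list of dicts such as
        {"started": True, "finished": True, "valid": True, "group": "platoon leader"}"""
        response = sum(1 for s in sessions if s["started"])
        throughput = sum(1 for s in sessions if s["finished"] and s["valid"])
        return response, throughput

Response quality, as the chapter notes, also folds in response by group and the quality and quantity of the data, so it cannot be reduced to a single count.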
Store Data Points/Answers

As data, in the form of respondent answers, are received, the assessment team must ensure they are properly stored in an appropriate "facility" with adequate backup. The design of the storage facility, most often a database, is critical. During this step the results are loaded into that data store so that they can be readily retrieved during the analysis process.

Analyze Data Points/Answers

Analysis within SO3 should be considered a discovery process as the analyst develops insight from the data. Analysis is done in three sweeps.

1. The first sweep occurs during data collection, using results of the closed, quantifiable questions, including demographic questions. This sweep provides an initial assessment of the data and informs later analysis. It also allows the assessment team to provide immediate feedback to the sponsoring organization.

2. The second sweep is the primary sweep. It mirrors the disaggregation conducted in step 3, aggregating the data into consensus answers for each question, then subissue findings, then issue findings. For closed-end answers, this is a straightforward aggregation of responses. For open-end answers, the analyst must identify the main points in each respondent's answer; a respondent could provide multiple recommendations or bits of knowledge. These points are combined across the respondent group to provide weighted answers, showing both the emphasis and the totality of the respondent group's opinion or knowledge. Respondent group consensus answers are aggregated into by-respondent-group subissue findings. These are then aggregated into consensus subissue findings, which are aggregated into issue findings. Multiple analysts can be used to support each aggregation and to obtain differing perspectives, but a single analyst should be responsible for the aggregation chain in order to maintain consistency and maximize insight.

3. The third sweep is a final inspection of the data and consists of unplanned or informal data analysis. SO3 analyses frequently yield some unexpected results, and the third sweep accommodates that; it can include more sophisticated methods, such as cluster analysis.
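A minimal sketch of the second sweep's question-level aggregation appears below. It assumes the analyst has already coded each open-end answer into one or more "main points"; the function names and the choice to count each point once per respondent are illustrative, not prescribed by SO3.

    from collections import Counter

    def consensus_closed(answers):
        """Closed-end questions: a straightforward aggregation of responses."""
        return Counter(answers)                 # e.g., Counter({"yes": 14, "no": 3})

    def weighted_open(coded_answers):
        """Open-end questions: each respondent may contribute several main
        points; combining them across the group yields weighted answers that
        show both emphasis and the totality of the group's opinion."""
        weights = Counter()
        for points in coded_answers:            # one list of coded points per respondent
            weights.update(set(points))         # count each point once per respondent
        return weights.most_common()            # heaviest-weighted points first

The same pattern repeats up the chain: question-level consensus answers feed subissue findings, which in turn feed issue findings.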
Report Results

The value of the assessment is only as good as the organization's ability to understand and act on it. To formulate decisions and drive change, results must be reported in a fashion that supports use by the sponsoring organization. Reports must be presented with clarity while still allowing organizational users to drill down into the details of the data.

Relationship of SO3 to Other Methodologies and Approaches

The SO3 methodology is a superb tool to integrate into existing assessment processes. The thoroughness and flexibility of this methodology will provide both the data and the analysis to support Kirkpatrick's four-level assessment
(Kirkpatrick, 1994), the formative and summative assessments that are part of the Dick and Carey systems approach model (Dick, Carey, & Carey, 2005), or other methods. The SO3 methodology provides a means to extract, organize, and comprehend the information required to support these other processes. However, SO3 also has the ability to go beyond these processes to capture or access tacit knowledge. Tacit knowledge is first-person knowledge—it is knowledge of the world as the world impacts the individual. By extracting and analyzing tacit knowledge, the SO3 methodology leverages the organic expertise encapsulated in respondents' tacit understanding. Thus, the SO3 methodology serves to assess not only the targeted virtual simulation training, but also the training embedded within it. This includes such aspects as the overall training program, the relationship to other training and training systems, and the conditions of the user. Other processes, which approach assessment from a third-person perspective, may not yield this level of detail or insight.

METHODOLOGY APPLIED

The authors have applied the SO3 approach in a series of training effectiveness analyses of networked virtual simulation technologies for the U.S. Army, including the close combat tactical trainer (CCTT; Callahan & Mastaglio, 1995) and the simulation network (SIMNET; Alluisi, 1991). CCTT is the virtual simulation of the U.S. Army's primary heavy fighting vehicles, the M1A1 Abrams main battle tank and the M2/3 Bradley infantry fighting vehicle. SIMNET is the technological predecessor to CCTT. To further the reader's understanding of the process, the following section describes one of those studies.

Overview of CCTT and Challenges of Assessing Its Effectiveness

In January 2004, the Army Research Institute and the TRADOC (U.S. Army Training and Doctrine Command) Program Integration Office–Virtual contracted a study to assess the effectiveness of the CCTT (Jones & Mastaglio, 2006). This study assessed the general effectiveness of the CCTT (Goldberg, Johnson, & Mastaglio, 1994) via a process of interviewing and surveying users, then consolidating their opinions to develop general findings. The process provided sufficient insight and validated the approach; therefore, a second study was contracted to evaluate the contribution of virtual simulations to combat effectiveness. We discuss this second study here. For more details on the first study and the fundamental research to develop the methodology, refer to Mastaglio, Peterson, and Williams (2004) or Mastaglio, Goldberg, and McCluskey (2003).

Virtual Simulations in Preparing Army Units for Combat Operations in Iraq

This project involved two closely related but separate studies: one examined how active U.S. Army units used the CCTT during preparation for deployment to an
anticipated high intensity close combat environment; the other was an assessment of National Guard use of available virtual simulations to prepare their units to deploy to that environment. The initial goals were as follows:

• To determine whether virtual training impacts combat effectiveness, and
• To evaluate whether changes should be made to the CCTT simulation or site operations to better meet pre-deployment training needs.
The study methodology consisted of collecting and consolidating user opinions. The respondents were from eight units across the United States that had returned from Operation Iraqi Freedom (OIF). The study scope was extended to include assessing the use of mobile virtual simulations in the U.S. Army National Guard (ARNG). The ARNG uses a mix of virtual simulation systems to train close combat tasks, including mobile and fixed CCTT and SIMNET systems; these systems are referred to collectively as virtual maneuver trainers (VMT).

Methodology

Both studies were conducted in three phases.

Preparatory Phase

For each study, a formal research plan that incorporated the eight SO3 steps was prepared for U.S. Army review. Study goals were disaggregated through two levels: from the study goals, the team developed several issues, and each issue was further separated into subissues for which questions were developed. This preparation process and decomposition are shown in Figure 29.1 ("Study Goal Disaggregation"). Simultaneously, the team mapped the organization, identifying a desired sample of unit types and respondents to whom the questions would be delivered.

For each study, a list of respondents was created. These were personnel who would likely have the knowledge being sought: both officer and noncommissioned officer small unit leaders, from platoon to battalion level. For the OIF study, respondent lists were matched to the same list of respondents as in the earlier CCTT study: battalion commanders, battalion command sergeants major, battalion executive officers, battalion operations officers, company commanders, platoon leaders, and platoon sergeants. For the ARNG study, it was decided not to interview executive officers and to collect input from battalion master gunners rather than command sergeants major. The team also collected input from full-time support staff in the National Guard's distributed battle simulation program, from active U.S. Army advisors to the ARNG, and from site staff supporting training.

In the OIF study, there was one identified respondent demographic factor: duty position. The ARNG study included the following seven demographic factors to support a more detailed analysis:

• Duty position,
• Months in position,
• Primary VMT,
• Level of VMT experience,
• Home state,
• OIF veteran, and
• Under orders for OIF.
Questions were cross-referenced to the class of respondents. Physical questionnaires were prepared for each respondent within each sample unit. MySQL databases were developed as a repository for the results of all questionnaires. A set of data analysis interface tools was developed to support analysis of the large amount of data anticipated.
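The studies stored results in MySQL; the self-contained sketch below uses SQLite purely so it runs without a server, and its table and column names are our guesses at a reasonable repository design rather than the schema the study teams actually built. The demographic columns mirror the factors listed above.

    import sqlite3

    conn = sqlite3.connect("so3_results.db")
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS respondent (
        id              INTEGER PRIMARY KEY,
        duty_position   TEXT NOT NULL,   -- the OIF study's single factor
        months_in_pos   INTEGER,         -- remaining ARNG-study factors
        primary_vmt     TEXT,
        vmt_experience  TEXT,
        home_state      TEXT,
        oif_veteran     INTEGER,
        under_orders    INTEGER
    );
    CREATE TABLE IF NOT EXISTS question (
        id       INTEGER PRIMARY KEY,
        subissue TEXT NOT NULL,          -- links back up the disaggregation chain
        form     TEXT NOT NULL,          -- yes/no, multiple choice, essay, ...
        text     TEXT NOT NULL
    );
    CREATE TABLE IF NOT EXISTS answer (
        respondent_id INTEGER REFERENCES respondent(id),
        question_id   INTEGER REFERENCES question(id),
        value         TEXT                -- raw answer; coded during analysis
    );
    """)
    conn.commit()

Keeping answers raw in storage and coding them during analysis matches the chapter's separation between the storage step and the three analysis sweeps.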
Data Collection Phase

Teams visited each unit within the unit samples, first providing an in-briefing to battalion leadership and then conducting interviews. The interviews consisted of the prepared questionnaires, but included follow-up questions and time for discussion. Frequently the follow-up provided insights that were of use during the analysis phase.

Both studies also used the Internet for delivering questionnaires via a Web site. Respondents were solicited on the Army Knowledge Online home page and via e-mails to visit a Web site to complete questionnaires. Respondents on the Internet went through an informal, multilayer screening process. The initial announcement served as the first screening layer. The next screenings were self-screenings: respondents first had to decide whether they fit the survey criteria, and once on the site, they had to complete a demographic questionnaire targeted either to their duty position or to "other respondent." The final screening was performed after the fact by a subject matter expert, who reviewed each answer to determine its validity. Answers that did not follow from the question were not included.
Data Analysis Phase

The final phase consisted of aggregating the data in reverse order of the earlier disaggregation process. Individual answers were combined into consensus answers for each question, by position. These by-position consensus answers were then used to determine by-position subissue findings. By-position subissue findings were combined into general subissue findings, and these, finally, were combined into issue findings. The final step was reporting results via briefings and in writing.

Use of the SO3 methodology permitted the identification and reporting of insights not related exclusively to the use of CCTT and SIMNET, but gleaned from the respondents' perspectives on virtual simulations and their opinions on needed changes to the training environment, based on the operational environment experienced in fighting the Global War on Terror. Some examples are as follows:

• The shift from virtual simulations to live training because of the greater emphasis on individual tasks,
• A need for instruction on how to integrate virtual simulations into an overarching training program, and
• The need for virtual simulation systems to present more degraded mode conditions, that is, to model unexpected equipment failures.
PERSISTENT TRAINING ASSESSMENT—THE NEXT EVOLUTION AND RECOMMENDATIONS

Virtual simulations will improve and become more readily accessible, more common, more realistic, more distributed, and more integrated. As the types and capabilities of virtual simulations expand, assessing the training for which they are used will become more critical. Execution of assessment, however, may become more difficult. Heretofore, virtual simulation systems have been centrally located and controlled, and their users have been relatively easy to identify. As virtual simulations become embedded systems or are distributed via Web-like solutions, identifying the value and impact of their usage will become more difficult. We make several recommendations for conducting assessments in the future, primarily efforts to expand and integrate assessment.

• Integrate assessment into the system or program design—make it organic. Generally, assessment is external to system design and development, and assessing a simulation requires extra effort. Software and hardware should include the ability to automatically capture, store, and report data to support assessment (a minimal sketch of such a capture hook follows this list).
• Maximize the collection of data by capturing it whenever the user operates the virtual simulation. Major marketers now use "point of sale" collection: instead of extracting a large amount of data from relatively few customers, they extract a small amount of data from many customers. Each time a product passes a scanner, a customer accesses a Web page or calls the company, or other contact between the customer and the company occurs, that information is recorded. Often, the customer is not aware that he or she is supporting an assessment. The increase in the amount of data available over time can aid in providing a much more complete assessment.

• View virtual simulations as a system within a system of systems and expand the assessment across the breadth of that system of systems. Virtual simulations are normally a component within a training process that uses a variety of live, virtual, and constructive simulation tools. The performance of a single system is best viewed in the context of the entire environment; assessment should be done across that environment.
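As promised above, here is a minimal sketch of an organic, point-of-sale-style assessment hook: a lightweight logger that records a small usage event every time the simulation is operated. The event fields, file format, and example values are all illustrative assumptions; nothing here is prescribed by the chapter.

    import json, time

    def log_event(stream, user_id, event_type, detail=None):
        """Append one small, timestamped usage record; many small records
        from many users replace a single large after-the-fact extraction."""
        record = {
            "t": time.time(),
            "user": user_id,
            "event": event_type,          # e.g., "session_start", "task_complete"
            "detail": detail or {},
        }
        stream.write(json.dumps(record) + "\n")

    # Example: instrumenting a (hypothetical) training session.
    with open("usage_log.jsonl", "a") as log:
        log_event(log, "crew_07", "session_start", {"scenario": "convoy_defense"})
        log_event(log, "crew_07", "task_complete", {"task": "call_for_fire", "score": 0.82})

Because each record is tiny and written at the point of use, the trainee never leaves the training event to support the assessment, which preserves the unobtrusiveness emphasized earlier.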
CONCLUSION

Assessment is not a simple process and will not—particularly for the virtual simulations supporting team and collective training—yield to traditional comparative studies of normative use. Assessments will be inherently based on an analysis of qualitative data. However, the collection of that data can and should be planned and organized to focus on accumulating information that will best support an analytic effort addressing the sponsoring organization's goals or specific needs. Assessments performed using SO3 can provide the insight needed to help designers and developers of virtual simulations and training programs deliver more effective training.

REFERENCES

Alluisi, E. A. (1991). The development of technology for collective training: SIMNET, a case history. Human Factors, 33(3), 343–362.

Boldovici, J. A., Bessemer, D. W., & Bolton, A. E. (2002). The elements of training effectiveness. Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

Burnside, B. L. (1991). Assessing the capabilities of training simulations: A method and Simulation Networking (SIMNET) application (Tech. Rep. No. 1565). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

Callahan, R., & Mastaglio, T. (1995). A large-scale complex virtual environment for team training. IEEE Computer, 28(7), 49–55.

DeVellis, R. F. (2003). Scale development: Theory and applications (2nd ed.). Thousand Oaks, CA: Sage Publications.

Dick, W., Carey, L., & Carey, J. (2005). The systematic design of instruction (6th ed.). New York: Allyn & Bacon.

Goldberg, S., Johnson, W., & Mastaglio, T. (1994). Training in the Close Combat Tactical Trainer. In R. Seidel (Ed.), Learning without boundaries. New York: Plenum Press.
Goodwin, E., & Mastaglio, T. (1994, December). Integrating users into systems development: User evaluations in CCTT. Paper presented at the 16th Interservice/Industry Training Systems Conference, Alexandria, VA.

Hurwitz Group. (2002). Customer life cycle management [White paper written for J. D. Edwards]. Framingham, MA: Hurwitz Group, Inc.

Jones, P., & Mastaglio, T. (2006). Evaluating the contributions of virtual simulations to combat effectiveness (Study Rep. No. 2006-04). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

Kirkpatrick, D. (1994). Evaluating training programs: The four levels. San Francisco: Berrett-Koehler Publishers.

Mastaglio, T. (1991, April). Designing simulation systems to support collective training: Lessons learned developing military training systems. Paper presented at the ACM Conference on Human Factors in Computing Systems, CHI 91 Workshop on Advances in Computer-Human Interaction in Complex Systems, New Orleans, LA.

Mastaglio, T., Goldberg, S., & McCluskey, M. (2003, December). Assessing the effectiveness of a networked virtual training simulation: Evaluation of the Close Combat Tactical Trainer. Paper presented at the 2003 Interservice/Industry Training, Simulation and Education Conference, Orlando, FL.

Mastaglio, T., Peterson, P., & Williams, S. (2004). Assessing the effectiveness of the Close Combat Tactical Trainer (Research Rep. No. 1920). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.

Schuman, H., & Presser, S. (1996). Questions & answers in attitude surveys. Thousand Oaks, CA: Sage Publications.
SECTION 3
FUTURE DIRECTIONS

SECTION PERSPECTIVE

Rudolph Darken and Dylan Schmorrow

Did anyone ever think that virtual environments would not be useful for training? Certainly they were crude and extremely limited at the start, but their usefulness seems like an obvious conclusion to draw from the set of technologies we call "virtual environments." How long after the Wright brothers took their first flight at Kitty Hawk did someone think that someday that invention could be used for transportation? The airplane went from flights of 100 feet to 24 miles in less than 18 months. Virtual environments (VEs) did not mature quite so fast, but even the wireframe images the NASA (National Aeronautics and Space Administration) Ames laboratory was producing early on had obvious practical uses—training being one of these.

As we read "In the Uncanny Valley" by Judith and Alexander Singer, we see a vision of a symbiotic relationship between the trainee and his VE. In fact, Craig gets confused and cannot tell what is the VE and what is real. At one point he says, "When is this happening? Or is this some kind of replay . . . exercise?" His sensory system was enclosed in the VE in a closed loop; there was no clear separation between Craig's VE and the real world. How similar is this to Orson Scott Card's Ender's Game (1991), which is the focus of Jack Thorpe's "Trends in Modeling, Simulation, Gaming, and Everything Else"? Ender's life is consumed by "training" for a mission he thinks he is preparing for in the future. Only later does he come to realize that he was doing the mission when he thought he was only training. There is no separation for him.

Thorpe takes this further by suggesting that a host of activities might someday converge through what we call a virtual environment. Personnel selection, recruitment, socialization and initial training, specialization training, crew and unit training (the military would call this "collective training"), planning, rehearsal, reconstruction, and review all meld into one activity within the VE. Think this is purely a military phenomenon? Think again. What NFL (National Football League) team would not pay dearly for a system that helps it field a superior team? What airline would not pay for a system to assure the highest performing and safest aircrew? This idea is pervasive and universal.
But this is a philosophical argument. Is it what we really want, and, if so, what will it take to get there? Randall Shumaker fuels the discussion in "Technological Prospects for a Personal Virtual Environment" with a thorough discussion of where we are today and where we might be in 2020, given a reasonable set of assumptions. What might we expect in terms of visual and aural displays, cognitive computation, and human behavior modeling? Is the "personal VE" that Shumaker describes capable of being the symbiotic environment that Singer and Singer as well as Thorpe describe? Furthermore, how well does this vision match what potential users of such a system think they will want?

Alfred Harms Jr. describes in "The Future of Navy Training" where the U.S. Navy is headed, with a strong belief that VEs will play a critical role in training. He believes VEs will supply a unique learning experience that allows for "discriminating realism, contextual fidelity, dynamic and interactive participation, confirming repetition and controlled assessment of cognitive, affective and psychomotor skills." This is desperately needed, as he sees the missions of the U.S. Navy becoming less predictable and being activated on short notice. William Yates, Gerald Mersten, and James McDonough carry this theme into the Marine Corps in "The Future of Marine Corps Training"; the Marine Corps is arguably becoming less predictable than the U.S. Navy. They describe a scenario where simulation-based training using a VE is utilized to achieve a desired end. They make the case for immersive mixed reality that is neither purely real nor purely virtual. But they also call for increased sensory realism as it fits the training requirements of a mission. Are you ready for snakes falling from trees?!

The U.S. Army has long taken a leadership role in the development of VE technologies for training. In fact, a large portion of the innovations that we regard as virtual environment technologies were initially funded by the Department of Defense, some specific to certain branches, but many shared across the services. Roger Smith tells us in "The Future of Virtual Environment Training in the Army" that this has been changing. We are past the tipping point: the government is no longer the prime catalyst for technology development; it is industry (often in service to the military) that is innovating. Most importantly, the video game industry has been a key motivator for the improvement of computing hardware, especially graphics computing and display. Smith takes us into U.S. Army missions and how VE applications might serve them in the future.

The U.S. Air Force recognizes the need for improved performance assessment and continuous learning. Will there be a day when each airman has his or her own "learning VE" for all training and education needs? In "Future Air Force Training," Daniel Walker and Kevin Geiss relate these ideas to existing trends in live-virtual-constructive simulation technology, which really starts to sound a lot like Ender's Game if you take it to its logical conclusion.

Admittedly, the mission profiles of all the services are extremely large—so large that the generalization we are about to make can never be absolutely accurate, but in general it holds. The natural interface between soldiers or marines and their working environment is their own bodies. There is no abstraction beyond their physical bodies. This is not generally true for the U.S. Navy or the U.S.
Air Force, where a ship, an aircraft, or a display of some sort is the abstraction between the operator and his or her world. This is important because if we think Ender's game or the uncanny valley might never happen, it will happen there first, because it is easier to mimic an existing abstraction than to mimic the real world. Simply put, the breadth of stimuli is far larger if we have to simulate physics than if we can abstract it to a small subset.

Medical training is another area where VEs will play a large role. The extraordinary precision needed for many medical training tasks has somewhat delayed the development of medical VE trainers, but recently this has begun to change. James Dunne and Claudia McDonald discuss how this might happen in "Factors Driving Three-Dimensional Virtual Medical Education." Their "perfect storm" refers to the enormous shortfall in medical capabilities combined with the increased performance and VE capabilities offered by the technology. Put together, we are going to see progress in the medical field happening at a much faster rate than we have seen previously.

Last, in industry, we see an increased interest in the use of VEs for design and production. The automotive industry has long been a participant in advancing VE technology, but what about training? Dirk Reiners discusses how industry views the use of VEs for training in "Virtual Training for Industrial Applications." The business case for the use of VEs is still problematic; he refers to this as the cost of entry. If you cannot afford to invest enough to do something useful, you cannot do anything. Software issues are as much a problem for industry as they have been for the military. Yet decreased costs, increased capabilities, and the need for a competitive edge are driving industry to seek new ways to exploit VEs for training. Robert Gehorsam gets more specific in terms of training applications based on game technologies that focus on the identity, social, and interpersonal aspects of the VE in "Corporate Training in Virtual Environments." He explains how avatar-based applications are better able to deal with the human-to-human interactions so important to many corporate training problems. Projecting this into the future, will there be "always-on" training environments that we all will use on an "on demand" basis to meet our training needs?

Do the military, medical, and industrial user communities agree with the vision of convergence described in Ender's Game and "In the Uncanny Valley"? Yes and no. There still seems to be a focus on increased fidelity. We need more visual fidelity, realism, haptic feedback, or, as Yates et al. suggest, a "smack in the face" where appropriate. Imagine that we could construct VEs that were indistinguishable from the real world. Would that solve our training problems? In fact, we can do this now. What is the difference to a radar operator deep inside a ship's combat information center between a real mission where he sees blips on a screen that are stimulated by sensors and a simulation that virtually stimulates his display in exactly the same way? What he sees and hears is exactly the same. That radar operator already knows something about Ender's game. If we had a perfect place to practice, what we would have is a perfect place to practice, not necessarily perfect performers. In other words, even if we could build a virtual experience that
was indistinguishable from the real experience, we would still need more. What would we need? Several of the chapters in this section touch on this. We would need a model of learning. How does human and/or team performance improve? What techniques accelerate learning or performance improvement? Anyone who has played golf knows what a performance plateau is. You improve to a point and then you do not get better no matter what you do. Play more golf? No improvement. Take more lessons? No improvement. Then, for what appears to be no reason whatsoever, you get better. There are lots of theories as to why this occurs, but until we understand it well enough to build mechanisms into our VEs that help us overcome performance barriers, VEs will just be another place to practice. Why replicate the real world when the real world is so limited? Cannot our VEs for training do better? They should.

Roger Smith has it right when he says, "VEs are not a training technology." They are not, in and of themselves. We need to remain committed to improved human performance—whatever the mechanism that achieves it. VEs are a collection of technologies that are part of the answer. Let us not assume they are the whole answer any more than are more pixels, better audio, or vibrotactile clothing.

What does the future hold for VEs for training? Who can know for sure, but we know what trajectory we are on now and where it might take us. Clearly, stimulus fidelity is going to improve. There are too many demands (warranted or unwarranted) for more fidelity for this not to be the case. What is less talked about but more critical is the science of learning—cognitive modeling, human behavior modeling, scaffolding, models for learning, real time monitoring of physical, mental, and emotional feedback, and virtual instruction. These, combined with high fidelity, affordable VEs, clearly head in the direction of Ender's Game and "In the Uncanny Valley." Is there a day when each of us owns our own personal learning machine where we can enter a virtual world that knows us, what we can do, what we need to be able to do, how others have done it, and how to help us improve? Or maybe doing and practicing really are one and the same.

The final chapters of this section discuss the future of virtual environment displays, virtual reality that can be used in both cognitive and physical rehabilitation, personal learning associates that offer a dialogue with the virtual world, and virtual reality displays in the museums of the future.

REFERENCES

Card, O. S. (1991). Ender's game. New York: Tor.
Part VIII: Future Visions
Chapter 30
IN THE UNCANNY VALLEY¹

Judith Singer and Alexander Singer
INTRODUCTION BY KATHLEEN BARTLETT

In this chapter, the reader will experience a textual representation of the futuristic fusion of man and machine. Join the ultimate digital narrator, Douglas Craig, a "human-computer dyad" (part human/part avatar), as he interacts with a simulated environment (a cataclysmic event of the past). Experience Craig's confusion when his "world" is "reconfigured." Follow along as he searches for the "Control Center" to restore his missing imbedded memory to the "errant search engine of his mind." Puzzle over the meaning of his inevitable conclusion.

This story—the "pseudo-life" of Douglas Craig—challenges traditional definitions of life, of self, of memory, of time, and, ultimately, of reality. By controlling the information available to the narrator, the authors force the reader to identify the essential environmental constructs that contribute to our cognitive perception of (and our subsequent reaction to) the world (our environment) as defined by our easily manipulated senses and our often insufficient memory (experience). Reality in this chapter is only partially constructed, and by the time the narrator (and the reader) put the pieces together, the experience is over (sort of like life).

A renowned Chinese novelist² observed hundreds of years ago that "where illusion becomes reality, reality becomes illusion," and that comment describes the most profound paradox embodied in this story and, indeed, in the continued improvements in the fidelity of simulated experience and the envisioned achievements in augmented cognition (via enhanced memory/experience). As the quality of the simulation (the illusion) improves, users lose all control over perceptions, experiences, even memory, resulting sometimes in unreliable, unpredictable outcomes. Thus, the "Relativity" of human experience (like the title of the M. C. Escher etching) suddenly becomes questionable and conflicting (surreal or hyperreal). As this story suggests, when the perceived external environment becomes suspect or nonexistent, the internal self is all that remains.

¹ The term originated in robotics but migrated to the world of computer graphics. It refers to the confrontation with entities whose resemblance to humans, either physically or behaviorally, is close enough to engender an "uncanny" sense of disquiet and fascination.
² Ts'ao Hsueh-ch'in, The Dream of the Red Chamber (1715?–1763?).
* * * *
Douglas Craig knew his virtual world was being reconfigured. The process in this advanced system was distinctly different from any previous experience. The visual field had gone dark: the dark you perceive when you close your eyes and shapeless afterimages flicker. Once again, phenomena inside his own sensory system were mirrored in the VE system, creating a closed feedback loop, blurring clear-cut separation. The phrase “mind games” had taken on new meaning. First, he heard the wind, muffled as if through a protective shield. Then he “felt” the folds, pressures, and weight over his entire body of a coverall garment of some kind, responding to the sound of the wind’s rise and fall. His audible breathing seemed to come from inside an enveloping helmet; his gloved hands confirmed the shape of the enclosure. In the after action review (AAR) he was unable to explain why he was certain, before he could see, that he was in a very cold place, but suggested it may have been something that triggered both smell and taste in his sensorium. He was right, of course. The faceplate gently bounced his body-warmed breath back at his skin. Now, the dark lightened, and as this other world took visual form, he had to grasp wildly at a nearby railing. Far below the metal catwalk he stood upon was a vista of rippling steel-blue water strewn with jagged ice floes and an oppressively leaden sky merging into a boundless sea, a polar ocean. From this point on, the small cloud of his warm breath emerging from the helmet’s mouth screen reinforced his sense of the “palpable” presence of lethal cold outside his warmsuit. His exact phrase was that this awareness would now “shadow his reasoning” the way that the “patient’s memory fragments” had done. As he regained his balance he saw that the structure around him was perhaps hundreds of feet above the sea. He tried to make sense of the vast tangle of girders and crossbeams, cables, shafts, and signal lights cutting through the arctic twilight. Mammoth cranes were fed huge structural components by giant choppers; hulking atomic-powered freighters built to grind through rare ice floes clustered at floating docks. Wind muffled the soundscape. A faint vibration under his feet brought his gaze up to see four figures moving toward him along the catwalk. His helmet’s sonic enhancers picked up the faint scramble of foreign speech, argumentative, oddly slurred. In the few moments he had to prepare for the encounter he glanced down at the forearms of his warmsuit and was relieved to recognize the universal icons of a long obsolete audible language translator and a GPS (global positioning system) locator with full-zoom outboard imaging. The only reason he had an inkling of their operation was the subcourse he had taken years ago in the evolution of technology controls. He grinned at the memory of sophomores threatening self-immolation if they had to unscramble another screen remote from the early 2000s. As the four figures came close enough to evaluate, he felt the beginning of unease. A quality in their gait and gestures suggested harsh living, alcohol, and
an argument in a babble of dialects he thought were Russian and Norwegian. As the gap between them closed, he backed against the railing so they could pass easily. As he clumsily fumbled with the translator, he was interrupted. One of the four, a Mongolian woman, whose translated voice and facial movements were out of sync, had turned to him and said bluntly, “Control Central need you . . . (linguistic babble) . . . now.” Craig wanted to put this world on pause, giving him time to think through its underlying logic, beginning with these four: two men, two women. Their body language suggested an impatient split between their ongoing argument and what might be an increasing curiosity about himself. Behind transparent faceplates their work-hardened faces suggested a life on North Sea oil rigs, decades ago. Varying individually, their warmsuits all had multiple geometric patches of odd symbols and colors, both enigmatic yet vaguely familiar. The four did not resemble entities he had ever encountered. In his second AAR, Craig recalled how much he wanted to ask them about their pseudo lives as avatar projections of advanced students much like himself. Had not he experienced a lifetime of such encounters, in c-games, in years of VE learning? Inevitably, he had to consider the possibility of a system-induced lucid dream state. Although the pace of events made it difficult to pursue the threads of this conjecture, we know that subsequent events unfolded against the background resonance of these developing thoughts. Now the four started to move away and he blurted out, “I’m new here. Please. Tell me, where is Control Central?” The woman who had spoken to him paused as they were moving away; the others slowed, and she glanced back at him as if he might be dim-witted. In English, heavily accented, “You walk there . . . ,” gesturing in the direction from which they had come. She seemed to be deliberately exercising her language skills. “Turn left at . . . post. Follow arm picture after turn.” For a beat he looked puzzled. She came over, vigorously tapped the locator display on his warmsuit’s forearm, turned impatiently, and rejoined the others. The haptic physicality of the woman’s touch through the warmsuit stayed in his memory as he started away. The illusory cold outside the protective clothing somehow impelled him to move quickly. As he turned a corner the vast construction in progress revealed itself from another viewpoint. While one part of Craig’s mind tried to make sense of the project before him, another part would continue to search for the underlying rationale of the entire exercise. He was moving toward some sort of installation in progress on a platform just above the catwalk a short distance ahead. Moving toward the action he continued the internal argument: he was not going to be a neuroscientist guiding surgibots through a patient’s brain in some future scenario or some kind of engineer at the pole, decades in the past. As an advanced playerist he felt treated like some Generation D kid on his first imbed high.
Alongside his resentment, the errant search engine of his mind replayed a fragment out of the interior chaos. Craig flashed back to a moment with his team moving through the patient’s hippocampus region: one of the fleeting images that seemed to flicker through the translucent neural holoscape, a girl near a wind farm, playing with an actual live dog. The animal looked like the mutt he had had in childhood, just before his sector excluded organic pets. The remembrance of his loss suddenly stung him, now, on the polar sea. He gasped, surprised at the impact, and shook his head to clear the fluid welling in his eyes. It is useful to speculate on the causes that disrupted Craig’s access to his own, quite good memory of the historic events a quarter century in his past. Remember, he was fully aware that in the final physical preparatory stage his imbed search engine would be temporarily neutralized. In his time, this could be, in itself, a somewhat physically traumatic process. Consider that Craig’s natural world of 2058 and the arctic period of 2033 not only span a gulf of a quarter century, but also encapsulate one of the most turbulent and transformative periods in recorded history. His own time was no more able to organically integrate the range of hierarchical shocks and dizzying potentialities than the generations before and after his own cohort. Just past the installation rising above the platform he was headed for, he checked the diagrammatic overlay on his locator screen, made a sharp right turn, and almost stopped short in his tracks. Before him was yet another angle of the ongoing construction, but this one set off a chain of synaptic sparks. There were temporary metal stairways climbing and connecting giant pylons; the whole structure was a display of materials and technologies long obsolete. A handful of workmen hurried up and down the stairs, some carrying gear he could not identify. The associative process had kicked in and he just hung on, letting it take him wherever it would: a drawing . . . Essen . . . Eckar . . . Escher! M. C. Escher! Dutch! master draftsman; early twentieth century; clinically precise etchings, surreal or hyperreal, often convoluted geometric forms; somehow projecting a disarmingly naive sensibility. The picture he recalled showed humanoid, faceless automata, trudging up and down impossible stone stairways in a timeless, senseless, labyrinthine maze; all perpendicular surfaces hopelessly out of whack with each other. The whole, a mockery of the viewer’s instinctual need to locate a fixed position in space/time, aptly entitled “Relativity.” For many of his classmates in the cross-cultural interdisciplinary courses, particularly the scientists, discovering Escher’s graphics had whetted their curiosity to explore the two-dimensional fixed visuals of the past; goal achieved. He smiled, remembering the man behind the stratagem: Professor Wellman. Tricky old guy. Using the portal of past art forms, we were guided to explore the evolving discipline of psychohistory, at which point Wellman could repeat his favorite adage to us: George Santayana’s “Those who cannot remember the past are
condemned to repeat it” (1905, p. 284). And then Craig’s jaw dropped. He checked his forearm screen, fumbled till the date came up: 10/08/2033. Then he let loose a stream of profanity and shouted into the air, “If they hadn’t taken out my damn imbed, I might have a clue what the hell I’m supposed to be doing. You set me up to fail and then fail again!” After another stream of invective, he took a deep breath and grew quiet. He was not going to pull the rip cord, not yet. He checked his locator: 40 meters from Control Central. He turned and started running as fast as his bulky gear permitted. His racing mind barely considered the immensity of the passing holoscape totally integrated with his sensorium. He focused on possible choices if they had access to full signal contact even in those benighted times. The thought arose, unbidden; my god, in 2033, they were still dying of cancer. This was the first time Craig had consciously allowed himself entry onto the uncertain terrain of “empathy.” The voice of pragmatism argued that all this was “virtual” and why care beyond a final grade rating. Just below full awareness another argument persisted. There was the possibility that this entire experience, including the sense of failure, was somehow more than the sum of its parts. His full-zoom outboard imaging system signaled, vibrating against his forearm. Checking his screen he was able to unscramble the semiotics: 60 meters away, around a bend, Control Central would be a bulge thrusting into space from the catwalk. His pace slowed as he tried to sort out his chaotic thoughts and begin to focus on an effort to evoke the zeitgeist of 2003, 30 years ago. Craig had learned of the pandemics tormenting populations in the first three decades of the twenty-first century. Now that he was immersed in the last third of that period, he drove his perspective and historic memory into this very mindset. Massive quantum computation was still relegated to the status of a laboratory stunt. The Time of Terrors, limited as it was, left national borders convulsed and recovery stunned for another kind of onslaught. It was a time of genetic life extension, full solar/tidal power, and hydrogen-powered locomotion. Profound leaps of understanding in neuroscience and biophysics burst into the late 2040s and 2050s, his own time, with incandescent impact. He braced himself to bring his sensibilities to that other time, beset by physical trauma and a near frenzy of anxiety: the perceived decline of conventional energy sources and possibly irreversible environmental degradation. He was certain that it had been a large-scale event, somehow related to man-made structures; the same calendar year, 2033, his present screen had indicated. He hoped when he finally entered this Control place, he would find something that sparked the evasive key memory. Craig now faced the door of Control Central and saw his own image reflected in the WeatherPlex window of the portal. Even with his warmsuit helmet and the glassy surface reflectance, the face was unmistakably himself. Where was the avatar he had carefully built and refined over the years? We know the full range of his body and brain sensors registered heightened awareness and tension. From later reports we learned that he felt unequal to the task but determinedly rejected the disengage option.
Looking past his own reflection he could make out figures and displays. Before he opened the door he thought, “Nothing will change. The past is immutable and, good as it is, this is a simulation.” Then he entered Control Central. On his left he saw, against the sidewall, an open section for hanging cold weather clothing. Deliberately turning his back to the group present, he moved to the clothing area to remove his warmsuit as unobtrusively as possible. Past the awkwardness of strange fastenings, he could not help briefly indulging in the unfamiliar textural and haptic signals in removing the outer garment. Before he could question his actions he was relieved to find he was wearing one-piece work-site coveralls, like the others present: elastic body-fitted material, sealable pockets; attachment points for gear; basic computer interface stiffened the forearm sleeve; solid brown tone. The room’s air felt overly warm and a bit stale even though the grid-covered vents were audibly working. This was a bare-bones construction site work center. Eight people of mixed gender and race moved about in front of the large, spatially projected holoscreens still in use in that time. He thought he must have seen those screens last as a child, at perhaps a techno museum display. At this point Craig was only minutes from one of the key moments in the exercise, an emotional explosion we will analyze. Consider the critical weight of the cognitive dissonance generated by the unfolding events. After a moment Craig decided there was an air of latent anxiety in the room. He approached a man off to one side studying a digital clipboard suspended in space, chest high in front of him. A name tag on his coveralls read “Lou Benar.” While Craig was still moving and without taking his attention from the clipboard Benar said offhandedly, “How’s it going, Doug?” Craig realized everyone wore name tags, but he still had to mask his surprise and try to sound both casual and involved. “Came over soon as I could, Lou. Bring me up to speed.” On the verge of responding, Benar reacted to a verbal signal Craig could not discern from a man at the center of the group. The level of agitated exchange had increased among the central group of five people—mostly English in several dialects. Combined with the gestural and voice controls used in their time, a dynamic, relatively integrated interaction between humans and computers was achieved. Craig was strikingly reminded of the gulf between what played out before him and his own time a quarter century later: neural imbeds and (first-stage) quantum computing finally delivered the symbiosis of the human-computer dyad. As he moved toward the others, Benar said matter-of-factly to Craig, in a tone not requiring response, “Staff was supposed to get the HQ bulletin. Where were you?” He passed a Nordic-looking woman who picked up his head gesture and positioned herself to split attention between the rapidly changing displays and Craig. Her name tag was “Nora Lavrans.” Craig moved just a little closer to her to be heard over the stir around them. Lowering his voice he said, “Please tell me, Nora, what this bulletin is about. Seems to have everybody’s attention.”
She had the same kind of projected digital clipboard that he had seen before; her attention was now darting between the clipboard and the curved wall pulsing with displays of engineering and geological diagrams, graphic dynamic semiotics. She spoke to one of her colleagues in bursts of monosyllabic techspeak; others were in audio or visually supported contact with remote points. Then flashing him a glance, she responded rapidly in Norwegian-accented English. “Shortly, a red line will connect three regions. That line indicates the borders of the continental shelf that is part of the polar tectonic plate.” Craig would not be able to entirely resolve the conflicting demands for attention, but at least two elements could be sorted out. First, some historic catastrophe was impending and his own organic memory, unsupported by imbed retrieval, had so far been inadequate. Second, the VE experience itself was being distorted by his interaction with participant entities, evincing disturbingly “human” characteristics. In a few seconds he was parsing those two elements with surgical concentration. His choice of the sequence in which to pursue the two themes was a critical neurological insight, made under temporal and situational constraint. Craig knew he was distracted momentarily from the display Nora had indicated. His concentration was on the figures around him. Since entering the room he had been making a mental checklist of attributes exhibited by the inhabitants. He ran down the list: the rendering of the clothing’s elasticity, fold formation, following every subtle body movement; part of the overall conviction of a solid body inside, subject to gravity; the small imperfections, variations, and asymmetries in limbs, torsos, and hands; the vagaries of hair, skin color, texture, and reflectance; the endless subtle expressivity of faces and gestures, each with its unique rhythm; speech not improved by technology, with its slurs and losses, the product of vocal cords bouncing around the bone and tissue of the human skull. A third form was present: neither avatar nor human. On entering, he had noticed the debris of food and drink containers and dispensers and a restroom near the open closet wall. Avatars do not use restrooms. He was reminded that his own physiology was temporarily modified by biochemical procedure. Before he could toy with the idea of an improvised Turing test, he became aware that everyone was quietly riveted on the display Nora Lavrans had told him would appear. Enhanced satellite images of three polar regions, an expanse over 1,600 kilometers, were made contiguous by a glowing red line: the continental shelf bordering the polar tectonic plate. Geographic information hung holographically above the space perspective: WANDEL SEA, north of Greenland; SPITSBERGEN, north of Norway; SEVERNAYA ZEMLYA; north of Siberia. All of them were arrayed along the curve between 80° and 85° North Latitude, south of the vast Angara Basin. The search engine of Craig’s unconscious had been digging inexorably in the dusty files of memory for the triggering stream of images. As he was framing a question, a rapid series of drone camera angles flashed across the screens, close enough to make out details. The viewpoints revealed a wide causeway, marching to the horizons, great carbon/alloy towers holding it
above the ocean. Along one side of the causeway ran a wide maglev system. As the camera angles zoomed closer, he could see structural irregularities and, finally, extensive cracks progressively forming. The tiny forms of construction crews, like insects at this remove, were scattering in clusters; it was the proverbial “train wreck” in slow motion. The agitation in the group around him was palpable; voices straining to repress panic, moving into emergency mode. The display produced a new graphic: scattered clusters of pulsing, luminous, concentric white rings along parts of the tectonic plate outline. Craig leaned closer to Nora and whispered, “The white circles, they indicate seismic shock waves out there, right?” Nora whispered back without turning her head, “Yes. Look at the numbers!” On the screen alongside the concentric rings the numbers climbed: 6.7, 6.9; and climbed: 7.2, 7.8. Finally Craig could not restrain himself. Later, he was able to describe his own voice as hoarse, unrecognizable, and sounding like a man waking into a nightmare. He said, “When is this happening? Or is this some kind of replay . . . exercise?” A compact, graying man in the group, named Charles, spun about and yelled in rage, “Where the hell did you come from, you damn fool? You’re watching hundreds of people die before your eyes and the biggest engineering project in the world go down the drain and you’re asking about replays!” Three of the people near Charles warily moved closer to him. Craig lost it and erupted with a vengeance, lots of pieces having fallen into place. “Listen to me, damn it! Tacoma Narrows, 1940s, they don’t get aerodynamics; Minneapolis, 2000s, they don’t get the importance of redundancy; Jintang Strait, 2020s, they don’t get prototype c-modeling transition to full size! You’ve had half a century of c-modeling and seismic analysis and god only knows how many lives and . . . .” Charles exploded and went for Craig. The three who had been edging close moved fast enough to hold on to him. Douglas Craig recalled the sequence of events perfectly. He backed away from the struggling figures. He shut his eyes and shook his head as if to clear away the mind-wrenching scene he still hears: the raging voice, the others placating, reasoning, over the scuffling sounds. It lasts only a few seconds. And then, there is the unforgettable sound: a low, low frequency roar from the bowels of the earth, making his body vibrate. His eyes snap open. The whole room shakes so hard he fights to keep from falling. People and gear are thrown about. He is slammed back against the row of warmsuits. The lights go out, and they are all shadows against flickering emergency light points. An alarm screams and from somewhere a red light strobes. It lasts forever—32 seconds. Then it is over; the silence and dark are absolute. His breathing and pulse settle. The dark is calming. Selfness is intact.
REFERENCES
Santayana, G. (1905). The Life of Reason (Vol. 1). New York: Scribner’s Sons.
Chapter 31
TRENDS IN MODELING, SIMULATION, GAMING, AND EVERYTHING ELSE Jack Thorpe Every so often it is useful to take a step back from the daily grind of scientific investigation and technology development to look for trends that help trace where our field has been and where it might be going. This is my take. It appears to me that the major trend in modeling, simulation, war games, and large-scale distributed games, which for the purposes of this chapter I will lump together, is one of convergence. This was detectable in the early 1980s following successes in computer networking sponsored by the Defense Advanced Research Projects Agency (DARPA). Networking standards allowed distinctly different computing machines to exchange data at healthy speeds and volume, thus allowing observers to look inside applications running on these different machines and compare and contrast their similarities and differences: We could see inside the application stovepipes1 and find commonalities, though they often were disguised with stovepipe-specific jargon and iconography. It was not easy, but it was possible. For the U.S. military, some of the more obvious stovepipes concerned (1) personnel selection and recruitment, (2) initial socialization and basic training, (3) specialty skill training, (4) crew and unit training, (5) doctrine development (operational, strategic, and tactical), (6) structured speculation and analysis about possible future wars, (7) the design of force structure to meet these future threats, including the military systems needed to equip the force, (8) planning (short and long term), (9) mission rehearsal, (10) mission execution (operations), (11) after action review, and (12) construction of the historical record for noteworthy events. Smaller stovepipes sometimes existed within each of these larger stovepipes, for example, the branches and specialties within our military services (infantry, aviation, artillery, surface and subsurface ships, intelligence, and so forth). 1
By “stovepipes” I am referring to the tendency of different functionally oriented groups to talk among themselves with unique terms and concepts, organizations, funding, careers, and so forth, that do not easily lend themselves to communication and collaboration with groups from other “stovepipes.” The stovepipe metaphor is often used to suggest parallel but noninteractive enterprises.
Each of these stovepipes tended to live in its own space, had its own keepers and nurturers (“professionals” in these disciplines with professional support associations, trade conferences, and so forth), had its own descriptive concepts and languages, and even had its own funding streams and legislative advocates. For the list above, these might include recruiters and aptitude test makers, sociologists, instructional systems designers, training device manufacturers, instrumented range designers and white hat controllers, operations research and mission route planners, command and control systems designers, performance measurement experts, historians, analysts, and accountants, to name a few. As information and communications technologies became more prevalent within these various stovepipes, we started to see cracks that allowed us to peek inside to see how someone else’s application worked. This was instructive. Most people were expert in just one discipline, so to understand a little about how another discipline was wired together was revealing. Many examples come to mind. Here are four.
DISTRIBUTED SIMULATION AS A COMMAND AND CONTROL (C2) SYSTEM
Early in the development and testing of large-scale distributed simulations, circa the mid-1980s, an operationally experienced senior officer referring to a network of simulators observed, “What you have here is a command and control system.” This was incomprehensible to the simulation engineers working on the project. They knew little of the C2 stovepipe. But after some investigation, it was obvious that a simulation network did many of the same things, and used much of the same technology, as a C2 system: both were essentially a large number of nodes connected in real time sharing significant quantities of multimedia data and interacting in a specific way to achieve a desired end state. In some cases, it appeared that a simulation network actually achieved the end state better and at a lower cost than an iron-built2 and fielded C2 system. Further, any large exercise (lots of people, units, command elements, vehicles, and other) conducted in a large-scale distributed simulation system had to have command and control lest chaos result. This could be an external, existing C2 system that the participants brought to the simulation (that is, their own unit’s C2 system), or it could be a C2 replication built into the simulation (that is, the essential simulated C2 functions needed to command the forces in the exercise).
ROUTE PLANNING SYSTEMS AND FLIGHT SIMULATORS
A second example resulted from the examination of route planning systems, the computer applications used to create waypoints (x-y coordinates, altitude, time, speed, configuration, and other) for combat aircraft entering and exiting a
2
By iron-built I mean a system rigorously specified by exhaustive requirements documents and constructed by large teams of contractors, often over long periods of time and at significant cost, with the resulting product sometimes underperforming real world needs.
battlefield. As engineers and scientists from the flight simulator stovepipe studied the components of the route-planning stovepipe, it was clear that both used similar algorithms. Both had to have aerodynamic models, terrain models, weather models, and so on. Further, though not designed that way, it was possible to have each interact across the other’s stovepipe: a mission could be planned in a routeplanning system and flown in a simulator (or groups of networked simulators) for verification and refinement (for example, accommodating pilot preferences, tactics, techniques, timing, and emergency options). Likewise, a mission flown in a simulator could generate an initial set of waypoints that, when converted into a specific format by the route-planning software, could conceivably be entered directly into an aircraft’s flight director for route execution.
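To make the shared currency of the two stovepipes concrete, here is a minimal sketch in Python; the record layout, field names, and units are hypothetical, invented for illustration, and do not correspond to any actual mission-planning or simulator format.

from dataclasses import dataclass

@dataclass
class Waypoint:
    # One leg of a planned route; units are noted per field.
    x_m: float         # x coordinate in exercise/terrain coordinates, meters
    y_m: float         # y coordinate, meters
    altitude_m: float  # altitude above mean sea level, meters
    time_s: float      # planned time at this point, seconds from mission start
    speed_mps: float   # commanded speed, meters per second
    config: str        # aircraft configuration, e.g. "cruise" or "ingress"

def route_to_simulator(route):
    # Planner -> simulator: hand the planned legs to a networked simulator
    # session for verification and refinement by the aircrew.
    return [(wp.time_s, wp.x_m, wp.y_m, wp.altitude_m) for wp in route]

def track_to_route(track, leg_s=60.0):
    # Simulator -> planner: thin a flown track of (t, x, y, alt, speed)
    # samples back into waypoints the route-planning software can ingest.
    route, next_t = [], 0.0
    for t, x, y, alt, speed in track:
        if t >= next_t:
            route.append(Waypoint(x, y, alt, t, speed, "as-flown"))
            next_t = t + leg_s
    return route

The only point of the sketch is that both applications traffic in time-stamped positions, which is why data can flow in either direction across the stovepipe boundary.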
RECONSTRUCTION OF COMBAT OPERATIONS AND LARGE-SCALE DISTRIBUTED SIMULATIONS
A third example can be found in the DARPA-sponsored reconstruction of the Battle of 73 Easting from the Gulf War.3 In conjunction with the U.S. Army, DARPA attempted to use simulation tools from the simulator networking (SIMNET) program to re-create the engagement battle of the 2nd Armored Cavalry Regiment against elements of Iraq’s armored Tawakalna Division. The idea was to re-create the battle as if each combatant vehicle had been an entity on a simulator network generating second-by-second vehicle status messages. Since none of the actual vehicles had been instrumented or had been on a network, the paths and behaviors of every vehicle (U.S. and Iraqi) had to be determined, coded into data packets (as if from a simulator update), and then entered into a simulation data stream of the type automatically generated and captured during a SIMNET exercise. This process took 12 months, partly because no single participant in the battle had a complete, detailed picture of what had actually occurred. Once the data stream had been assembled, carefully studied, and verified for accuracy by the commanders who had fought the battle, at least on the U.S. side, the SIMNET system could play it back, allowing observers to view the action from any vantage point, including within U.S. and Iraqi vehicles (for example, gunner and commander reticle views). The lesson for the scientists and engineers involved was as follows: if combat vehicles and individuals could be instrumented and networked as part of their future C2 infrastructure, simulator networking technology could converge with the C2 systems to capture live combat.4
3 An overview of the battle can be found under “Battle of 73 Easting” in Wikipedia, http://en.wikipedia.org/wiki/Battle_of_73_Easting
4 A second reconstruction has been completed by a team led by George Lukes at the Institute for Defense Analysis focusing on the defeat of the Taliban in the Mazar-e Sharif region of Afghanistan in the fall of 2001. Neale Cosby, formerly the director of the Simulation Laboratory at the Institute for Defense Analysis in Alexandria, Virginia, has been a spokesman and advocate for this idea and was instrumental in both reconstructions as well.
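The data stream idea at the heart of that reconstruction can be suggested in a few lines of Python. This is a hypothetical sketch, not the actual SIMNET packet format, and the identifiers and field names are invented for illustration: each vehicle’s reconstructed path becomes a time-ordered sequence of state updates, and the merged stream drives a playback viewer exactly as if the updates had arrived over a live simulator network.

import heapq
from dataclasses import dataclass

@dataclass(frozen=True)
class EntityState:
    time_s: float       # seconds from the start of the engagement
    vehicle_id: str     # illustrative identifier, e.g. "US-2ACR-066"
    position: tuple     # (x, y, z) in exercise coordinates, meters
    heading_deg: float  # direction of travel
    status: str         # "moving", "firing", "destroyed", ...

def merge_streams(*per_vehicle):
    # Interleave each vehicle's (already time-ordered) updates into one
    # exercise stream, the form a network recorder would capture live.
    return list(heapq.merge(*per_vehicle, key=lambda s: s.time_s))

def play_back(stream, render):
    # Drive any viewer from the merged stream; 'render' stands in for the
    # graphics system drawing whatever vantage point the observer chooses.
    for update in stream:
        render(update)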
REAL WAR AND COMMERCIAL GAMES
A fourth example comes from reconstruction of battles from the current Iraq war, not by military laboratories, but by a commercial gaming company.5 Kuma Reality Games has produced reconstructions of approximately 100 U.S. operations in Iraq that are sent to subscribers as games they can play, much like any other computer game. The games are run on regular personal computers (PCs). The company selects a well-publicized battle or operation, creates the terrain/feature data base from a variety of commercially available sources, collects details about the incident, often from the troops who participated (according to one news report), and produces and distributes the episode in a short time (sometimes a week). Using modern graphics technology, the rendering of the battlefield environment and combat effects is good. Of interest, it has been reported that these installments can be found being played by Iraqi teenagers in Iraq shortly after they are published. The convergence here: commercially motivated gaming and real world events.
Implications
So these and other examples suggest that many of the stovepipes, because they are increasingly based upon the same or similar information technologies, are converging. Further, this homogenization seems to have accelerated as information and communications technologies themselves moved from “special purpose” to “general purpose,” from early computing machines designed to perform narrowly defined tasks (for example, computer-image generators) operated by experts with narrowly specialized skills to today’s powerful, mostly consumer level machines and operating systems with robust, built-in networking capability, often browser based user interfaces, capable of being operated by everyday users.
Why Is This Important?
This dissolving of stovepipes by the interventions of common technologies enables new applications and new concepts of operations that in the past were hard to imagine. As an example, with combat platforms and individual combatants now instrumented with accurate position location information technology6 and connected via automatically reconfigurable networks, we have a means of capturing some of the details of live combat at an improved level of fidelity and realism. The format of these data, similar to that captured during distributed simulation sessions, allows us to think of both in the same terms. We can use the visualization systems from military simulations or today’s commercial gaming systems (for example, PC based graphics; game consoles like the Xbox or PlayStation) to show the details of actual combat operations. Further, we can document combat operations as interactive data streams, living histories rather than written documents, and
5 See www.kumawar.com
6 This is sometimes referred to as blue force tracking or automated vehicle location.
feed these back into the training of young military personnel via simulation. It is reasonable to imagine a time in the near future when we will have a fully interactive digital encyclopedia of every battle ever fought, available to every person in uniform, as well as officials in leadership positions . . . and perhaps others. Such data would be valuable to researchers in a number of areas, such as in “machine learning and reasoning” where one goal is to build personal digital assistants and other tools for commanders dealing with “wicked problems.”7 This brings to mind the environment that Orson Scott Card painted in his science fiction classic Ender’s Game (1991). The protagonist, Ender Wiggin, a teenager selected using sophisticated profiling tools for his potential as a battle commander, undergoes fully immersive training and education using a wide variety of tools to develop his tactical skills, and flexible physical exercise and gaming environments to develop his command skills (that is, his development of strong interpersonal relationships with his teammates and/or subordinates). For Card, Ender’s ascent to becoming a master tactician and effective commander is an all-consuming, continuous effort, without respite. For Ender, any differences among the spectra of representations of reality are nonexistent: the convergence is complete. The information and communications technology that he interfaces with blurs the distinction between the real world and its many representations, all of which he can interact with and manipulate. It provides him the fluid command environment for collaborating with his teammates. In the end, he puts in another day in simulated battle without realizing he is actually commanding a real battle. But because technology has allowed this complete convergence, it does not matter. It is all the same. It seems to me we are on that path, fueled by the emergence of common information and communications technologies, and as we look at trends in modeling, simulation, and gaming, we can trace a path that takes us into a completely different landscape of futuristic, revolutionary applications. Here is one view.
BATTLE SCHOOL (UNDERGRADUATE AND GRADUATE) AND THE BATTLEPLEX
What would Orson Scott Card’s battle school notion look like with near- and mid-term scientific and technological advances? I like to think of three interrelated thrusts. Many of the specific components are already here today or shortly achievable, but others are farther off.
Battle School (Undergraduate)
The core curriculum of the battle school (undergraduate) is focused on leader development. Its goal is the maturing of leaders and leader teams (multidimensional/multiorganizational) for dealing with future conflicts and very large
7
This means ill-defined, complex, dynamic problems of the type routinely faced by commanders in chaotic, nonlinear situations.
operations (for example, humanitarian assistance/disaster relief). It has five technology components, many of which require scientific advances:
a. A strategic syllabus for lifelong learning—adaptive learning tailored to each learner;
b. Commander’s personal reference library—100,000 volumes;
c. Interactive encyclopedia of all battles and major operations;
d. Worldwide access to online mentors: anytime, anywhere;
e. Families of decision exercises.
Battle school “students” all are equipped with the battle board, à la Ender’s Game, a tool that accompanies them throughout their careers. It is the main way they port into their battle school worlds.
Battle School (Graduate Level)
This is the part of battle school for advanced studies. It also has five components:
a. The designer’s workbench—where new technologies and concepts are evaluated against past, present, and future worlds;
b. Extreme scenarios—a laboratory for thinking the unthinkable using unconventional, creative techniques;
c. Reconstruction lab—development of advanced tools and skills for reconstructing past battles and distributing them into the interactive encyclopedia;
d. Interactive history—developing the capability to interdict the historical record, manipulate events and decisions, and create alternative futures;
e. Cultural immersion—virtual travel to all points of the world for immersion into current, past, and future histories.
The BattlePlex
The BattlePlex is a complex of practice fields, coaches, trainers, “sports medicine,” labs, media, and stadiums, where initial learning and skill acquisition is practiced, specific missions are rehearsed, real world operations are conducted (via the BattlePlex), and senior leaders (especially political) can observe, learn, and understand. All three of these thrusts are interconnected. Advanced studies develop tools and content for core leader development, skills are practiced and honed in the BattlePlex, and real operations conducted there, captured live, are fed back into the core and advanced studies activities. Can we do things like this? Given the continuing trend toward convergence, I think the answer is yes.
REFERENCES
Card, O. S. (1991). Ender’s game. New York: Tor.
Chapter 32
TECHNOLOGICAL PROSPECTS FOR A PERSONAL VIRTUAL ENVIRONMENT Randall Shumaker
BACKGROUND
Short-term predictions for the kinds of technology important for virtual environments (VEs), say two to five years into the future, are fairly safe, but not really all that interesting. Longer-term predictions of technological advances, perhaps 10 to 20 years into the future, have a somewhat spotty, generally humorous history. Some examples of this can be found in the nice little book Yesterday’s Tomorrows (Corn & Horrigan, 1984), which has collected predictions in many domains: flying cars, all kinds of cities of the future, and even intelligent robots are shown. Now that we are in that future time, or long past it, we can enjoy the naiveté of those predictions. I am still hopeful on the flying car, though. One of my goals is to try to avoid bringing too much pleasure to future readers by using a reasonable methodology and by avoiding specific technical delivery predictions. My focus will be probable capabilities, however implemented. So, how can we make predictions reasonably far into the future that are not doomed to being either completely wrong, or otherwise useless, and why try? The “why” is relatively easy to answer. While short-term predictions are safer, and can be of some value in tactical planning, they are not especially useful in creating a long-term strategic vision. There is also some reason to believe that a reasonable trend prediction may actually be self-fulfilling by creating a viable vision and time scale. The key is the reasonableness based on known laws of physics and economics. Consider Moore’s law for growth in the number of logic elements on a microcircuit for an excellent example of this principle in action. Making reasonable projections that are qualitatively accurate and do not violate the aforementioned laws of nature and economics is my goal here. My objective is to define the sort of features that could be expected to be available in multimodal virtual environments for home use, a personal VE. Another lesson I have learned from a long career in information technology is that even in cases where long-term predictions are reasonably correct
technologically, assumptions about how these advances will be used and their long-term impact have not proven too useful. The discussion here will be confined largely to considering the technologies for implementing personal VEs, with only modest speculation about the most obvious applications that this development will enable. There will be some discussion of what I hope will be less obvious applications.
APPROACH AND BASIC PREMISES
A great deal of VE literature appropriately focuses on human requirements and the degree to which technology can meet them now and in the near future, perhaps five years ahead. Stanney and Zyda (2002) provide a fine example of this useful literature. Their chapter covers the many technical, psychological, and evaluation issues inherent in VEs in their current and near-term forms. Typical of many VE publications, it expresses great confidence in the continued growth of technologies. Unfortunately, due to those previously mentioned laws of nature and economics, this trust may not be completely justified, particularly when the horizon may be 10, 15, or 20 years ahead. For this time frame it may be most useful to reverse the question: Can we specify just how much capability might ultimately be required to stimulate the senses of a human being in all necessary modalities at the maximum rates that can be processed by a human? Given this information, estimates can be made concerning if, and when, that might be possible for each modality. This information will also put us in a position to decide whether we really need, would want, or can afford to provide maximum stimulation capability. While stand-alone applications for entertainment and education are potentially important applications for a personal VE, many really interesting applications will involve interacting with other virtual environments and the individuals or groups within them. In order to bound the problem a bit, the focus will be on applications where interaction among people and avatars within the personal VEs will be limited to perhaps a few tens of individuals. This assumption allows us to reasonably safely predict the inter-VE communication requirements that might be needed in a home environment. Another important issue is cost. While laboratories and shared facilities will invest substantial resources to achieve effective capability, most individuals cannot or will not. We have an effective example to guide us in deciding how much the majority of individuals will spend to acquire a high end interface technology that they desire, but do not require: large-screen, high definition television (HDTV) with home theater audio. My goal is to make reasonable estimates of when, if ever, an individual might be able to acquire a personal VE at a cost of $2,500 in current year dollars, about the median price for a high quality, large-screen HDTV with an associated high end audio system today. My short-term prediction follows: within a very few years home entertainment and personal computing will be one commodity or at least will be so tightly coupled that the distinction will not be important. Given this, the fiducial point for projections that follow will be in multiples of “PC-
2008” capability, the computing system that anyone can buy today for about $1,200. Such a typical machine from a high end vendor would contain a dualcore processor running at 2.2 Ghz (gigahertz) clock rate, 3 gigabytes of memory, a 500 gigabyte disk drive, a high end graphics card, wireless LAN 802.11g (nominally 54 megabits per second), at least 100 megabits per second wired networking, and possibly gigabit Ethernet capability. I will have more to say about external connectivity later. This middle to high end PC will also include a 22 inch LCD (liquid crystal display) monitor, capable of full HDTV video display, and a reasonably good printer. For a few more hundred dollars a quad-core processor, as much as a terabyte of disk space, and an HDTV tuner could be added. All in all, this very capable machine might have cost more than $100,000 ten years ago. Note some calibrating information: the qualification about a maker of high quality computers is important because good design and good components are necessary to harvest the theoretical power of such computers. Note also that the clock rate is not really a good indicator of computer performance, but most PC manufacturers are silent about computing power that users might actually be able to harvest. In fairness, computer performance benchmarking is very workload dependent, so it is hard to provide accurate figures. For comparison, many very high end scientific computers use similar processors with a typical per-processor throughput of perhaps 10 to 20 billion floating-point operations per second (gigaflops). This is a lot of computation, but personal PC mileage will vary significantly, very much to the downside. And finally, almost no generally available software for personal computers can take advantage of the second core in our current machine, let alone use the four in the quad-core machine. This is expected to change though, and I am counting on this for building my personal VE.
CAPABILITIES THAT MIGHT BE POSSIBLE FOR A TOTALLY IMMERSIVE PERSONAL VE Wonderful data for understanding the human as an information processing entity exists (McBride, 2005); however, it is in a form that requires some interpretation to derive appropriate technical parameters for fully “exciting” the human sensor systems. For purposes of this discussion, I will attempt to provide rough order of magnitude (ROM) estimates of the engineering parameters, not specific values. Also necessary in shaping this analysis are a number of assumptions about how the personal VE will be implemented and what it might cost to be a practical possibility. For ethical and practical reasons I have assumed that the user interface will be entirely external to the body in the time frame of interest. This may not always be the case; cochlear implants have been very successfully applied in dealing with some hearing impairments, and various other implantable interface devices have been investigated with some success. For pragmatic reasons though, unless we are attempting to overcome a specific personal disability, I do not believe the average person will consider implantable interface devices in the time frame I wish to consider.
VISION
Vision is our highest bandwidth communication channel and the channel we think of first when planning a VE. It is also our most heavily overused channel in creating such environments, in many cases the only one. There are several aspects of importance here: data delivery to the human visual system, bandwidth needed to supply the delivery system, and processing necessary to generate visual content. McBride (2005) says that the human eye binocular field of view is 200° wide by 135° high from central fixation, and we can infer that perhaps a worst case equivalent resolution would be 80–100 megapixels. The issue is clearly more complex than this number would imply. For example, the fovea is perhaps only 1 megapixel, with peripheral areas much less. If we could track the fovea and provide high resolution imagery only there, the total number of pixels needed would be very substantially less, but this requires a lot more complexity for high performance eye tracking, communication, and dynamic image generation. We will still need tracking, even if we provide full-image resolution for everything in the immediate visual field, but at much lower resolution and rates. I do not know exactly how we might supply those 80–100 million pixels. Such technologies as direct retinal writing and advanced head-mounted displays may be able to achieve the equivalent relatively sooner and much less expensively than full high resolution displays. Whether we would actually want or need to build such full visual capability everywhere in the visual field is questionable. In any event, my hypothetical personal VE user will likely opt for less, but still pretty spectacular, capability. For reference, a 1080P (1,920 × 1,080 pixels, progressive scan) HDTV, the highest resolution expected to be commercially available in the United States, has 2,073,600 pixels nominally refreshed 60 times a second. The bandwidth needed to achieve this in raw format would be about 3 gigabits per second. In practice, using currently approved compression technology and broadcast bandwidth, 720P (1,280 × 720 pixels) is the standard, providing 921,600 pixels and requiring an uncompressed bandwidth of 143.18 megabits per second. The compression algorithms used allow this to be transmitted at 37–40 megabits per second. Commercial broadcast experience has shown that most users are quite happy with compressed data at 22 megabits per second and are generally unable to notice the difference. By an interesting coincidence, if we had a way to directly deliver the signal appropriately divided between the fovea and the peripheral vision areas of the eye, 720P HDTV would come fairly close to fully exciting human vision. A high performance eye tracking system coupled with a very large field of view display or direct retinal write system would be required to make this possible. There is much more that could be said about the differences between progressive (P) and interlace scan, various available frame rates, and the human ability to effectively use them. Lacking direct retinal excitation, we will be more interested in the trade-off between visual resolution and field of view, required computing power, local storage needs, and communication capability for creating a personal VE. The performance numbers for 720P HDTV are adequate for this discussion.
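For readers who want to reproduce the arithmetic, the raw (uncompressed) rate is simply pixels per frame × bits per pixel × frames per second. Assuming 24 bit color, 1080P at 60 frames per second gives

1,920 × 1,080 pixels × 24 bits × 60 frames/s ≈ 2.99 × 10⁹ bits/s ≈ 3 gigabits per second,

the figure quoted above; at 30 frames per second the same formula gives about half that.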
Obviously outstanding video is a key capability for a personal VE; however, a lot of research over the past 10 years has shown that adding additional sensory channels can greatly improve the sense of reality, immersion, and enjoyment of the experience. Moreover, these additional channels can heighten the perceived visual effects or reduce the required visual capability for creating an effective experience. This will be a consideration when we inevitably have to compromise on capability for cost or other practical reasons.
AUDIO
McBride (2005) also provided outstanding detailed specifications for the human auditory system for those who wish to delve more deeply. For creating a personal VE we can take a less comprehensive view and say that we will consider audio compact disc (CD) standards as adequate to provide the sense of audio presence that is required from a transducer standpoint. Separate excitation of the two ears in a range of 20 to 20,000 Hz (hertz) is accomplished by using a sampling rate of 44.1 kHz (kilohertz) and 16 bits of resolution. This yields a data rate of 705.6 kilobits per second per channel. In round numbers, a standard CD can record 700 megabytes, yielding roughly 80 minutes of stereo audio uncompressed (an audio format disc actually holds somewhat more than the 700 megabytes available in data mode, because audio mode dispenses with a layer of error correction overhead).1 A high quality audio headset or a more costly speaker system should work well as the transducer. Note that a high performance speaker system will also involve additional audio channels. Simply transmitting stereo audio from some real space or a fixed virtual space into a personal VE could be accomplished without additional processing other than perhaps dealing with relative changes in position. Generating an audio “soundscape” for a virtual space, the kind of rich audio environment we encounter in normal life, may involve quite a bit of preliminary computation and significant dynamic calculation that is very dependent on the complexity of the audio situation to be portrayed. In any event, our canonical reasonably high performance personal computer can accomplish this task now. For this channel we will also assume the ability to generate speech. This will require additional computation, but no additional bandwidth.
TACTILE, CHEMICAL, AND VESTIBULAR SENSES
I am including discussion of these capabilities, but as a practical matter these are less critical information channels to be provided in a personal VE for the kinds of applications I will describe later. Moreover, for tactile, in particular, while the total bandwidth involved is relatively small, the number, spacing, and types of tactile and thermal sensors that humans possess are likely to confound practical total synthesis for a very long time. Limited but useful capability may come from progress in direct stimulation for dealing with nerve injuries;
1
Note that communications information is usually expressed in bits, and computer information in bytes. There are 8 bits in a byte; however, there are usually additional bits needed in communication systems for framing, error correction, and other overhead functions, so the throughput relationship is not just a simple ratio.
however, I do not believe that most unimpaired people will opt for direct stimulation. Where needed, tactile capability for a personal VE is probably limited to low fidelity physical devices until direct nerve stimulation becomes feasible. This may not be as important a limitation as it appears. Experience with some low resolution stimulation devices has shown promise for some special applications and should prove useful for dealing with some visual, auditory, and vestibular handicaps as well. Pioneering work in sensory substitution by Bach-y-Rita and Kercel (2003) shows that auditory and tactile analogs for vision and vestibular sensing can be effective and may be rapidly incorporated into the communication channel repertoire of an individual. Even the tongue has been shown to be a viable channel for providing reasonably high resolution information to people with vestibular problems (Tyler, Danilov, & Bach-y-Rita, 2003). These and other novel interface concepts will be of interest for some applications of personal VE discussed later. For olfactory stimulation, it is known that the sense of smell can add significantly to the sense of presence (Brewster, McGookin, & Miller, 2006); fairly simple technology to implement it is available, but I do not believe this will be an important element of a personal VE except perhaps for some kinds of entertainment. Similarly, I believe that most people will want to stimulate their senses of taste with real rather than virtual stimuli, and it is not clear what sort of application might require this capability. This said, perhaps some limited ambient olfactory capability might be an interesting adjunct to improve a sense of presence. Vestibular and proprioceptive senses provide our sense of balance and postural position. Providing effective stimulation of these senses is important for some applications, such as flight simulation, locomotion in a VE, and for practicing physical skills. For widespread personal use, however, I believe that the main issue will be finding ways to avoid significant sensory discrepancies among the visual, auditory, and vestibular scenes; otherwise, our system will have to have simulator sickness placards prominently displayed.
HUMAN COMPUTATIONAL CAPABILITY
Among the things that would be nice to know is how much computational capability the human brain has to apply. This is of prime interest to psychologists of course; they are interested in precisely how the brain operates. I would like to know for a different reason: when, if ever, might computers become able to interact with humans as a reasoning peer or, more realistically, a reasonably responsive avatar. Such an avatar would be really nice to have in a personal VE for such applications as games, every kind of educational use, and as a personal companion or assistant. I have no idea at the moment how to build the software for this, or even if it is theoretically possible, but understanding the magnitude of computational power that might be required may still be useful. Not too many years ago the answer would have been very sobering—we were very far away. How far away are we now? McBride (2005) has some numbers, most of which are implementation specific to the human brain, a “wetware” computer. This text provides some guidance, but is a bit difficult to convert to computer-oriented
Fortunately, someone has done this kind of analysis for us. Hans Moravec (2003) considered this very problem and provided numbers from which we can work. He projects that monkey level intellectual performance might require 10 million MIPS.² Moravec also projects that it would require 300 million MIPS (300 tera operations per second, or 300 TOPS) for human level performance. The 30 to 1 ratio of human to monkey capability is arbitrary, but we will work with these numbers, maybe having to settle for a monkey level assistant, but one with good verbal skills.
In any event, if the main goal is to produce a pleasing, reasonably responsive, socially adequate, automated avatar, the amount of deep intelligence required might not be that significant. After all, monkeys function well in a rich social environment and can solve many interesting problems in finding food and dealing with threats. However, they are not too good at calculus and other high level reasoning functions. There is also a positive for our prospects of creating socially competent avatars: human and monkey brains have a lot of responsibilities and concerns that an intellect-only artificial colleague does not need. For example, maintenance of life support functions, concerns about finding food, finding a mate, reproduction, coordination of motion, and most kinds of peripheral sensory processing will not be necessary. Similarly, the reliability of the individual computing elements of computers is better than that of their biological equivalents, particularly when we consider a working life before replacement of about three to five years, so we can expect to get by with a bit less raw power and redundancy than the brain has, giving us a cushion in our estimates.
The largest supercomputer listed in November 2007 was an IBM Blue Gene capable of a peak speed of about 600 TeraFLOPS (trillions of floating point operations per second), with a memory of 73,728 gigabytes. We may have the raw computing power to achieve Moravec’s vision, except, of course, for the huge cost and knowing how to actually build the required software. There is another element of Moravec’s prediction: how many MIPS might be purchased for $1,000? His projections were based on personal computers available in 2000, extrapolated from 1995 trends. My reading of his chart shows an estimated 10⁹ MIPS for $1,000 by 2025; this is 1,000 TeraFLOPS, not too far from today’s fastest computer. There have been some significant processor implementation changes for personal computers since 2000 that warrant another look at this issue.
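The unit bookkeeping behind these figures is easy to lose; a minimal check (Python, using only the numbers quoted above and treating instructions and operations as loosely interchangeable, per the footnote) is:

```python
# Unit bookkeeping for the Moravec figures quoted above.
# 1 MIPS = 1e6 instructions/s; 1 TOPS = 1e12 operations/s. Per the chapter's
# footnote, instructions and operations are treated as loosely interchangeable.

MONKEY_MIPS = 10_000_000    # 10 million MIPS
HUMAN_MIPS = 300_000_000    # 300 million MIPS

monkey_tops = MONKEY_MIPS * 1e6 / 1e12   # -> 10 TOPS
human_tops = HUMAN_MIPS * 1e6 / 1e12     # -> 300 TOPS

print(monkey_tops, human_tops)           # 10.0 300.0
print(human_tops / monkey_tops)          # 30.0, the human-to-monkey ratio used
```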
CAPABILITIES THAT WE WILL WANT TO PROVIDE
Everything so far has been a prelude to better understanding what might constitute the requirements and capabilities for a VE that an individual might purchase for personal use.
² MIPS is a million instructions per second, a common unit of computer performance. This measure includes all computer instructions, whereas MFLOPS, another common measure, refers only to floating point arithmetic instructions. Unfortunately, MIPS and MFLOPS are not easily interchangeable, and the relationship varies depending on the computer architecture. Since I do not know what mix of instruction types might be needed to build a high performing avatar, I will use these values as loosely interchangeable to suit my rough order of magnitude (ROM) arguments.
For practical reasons, and to keep these speculations from becoming too much like science fiction, the year 2020 is my target time frame. This is far enough ahead to be interesting and to require a certain amount of guesswork, but near enough to allow reasonable technical speculation.
High performance vision and audio are the primary VE modalities we must have, but how much is reasonable and usable? Chemical and tactile capability may be useful, but tactile, in particular, will be hard to implement in a totally virtual environment. Tracking of body motion, gaze, and facial expression will probably be useful or even necessary, but how much physical mobility will be needed? Having high performance avatars would be very nice, but how much should be expected? Which elements of these are likely to be worthwhile or even technically feasible? And finally, what are we likely to be able to afford within our proposed budget and time frame?
NOTES ON MY APPROACH TO PREDICTION
Physics, other laws of nature, and the known laws of economics provide useful guidance in making reasonable predictions. Some needed technologies are within sight of fundamental limits that shape our expectations. Fortunately, these limits are in many cases orders of magnitude beyond our present capability, and at performance levels high enough that they are probably more than enough to keep us very happy in 2020.
I will be making use of a nice tool for qualitative physics and mathematics, the ogee curve, coupled with and driven by the laws of nature and economics just mentioned. Taken together, these have historically proven useful for explanation and prediction in many fields. An ogee, or S-shaped, curve appears very frequently in nature, social systems, mathematics, architecture, and art. It represents growth, possibly geometric or exponential, that is ultimately limited by a lack of some resource or a limitation in some crucial feature. In nature, the limiting factor might be the availability of water or some nutrient, competition for some resource, or perhaps a physical limit, such as bone strength. In economics, it might be a shortage of a commodity or a limitation in some other resource, such as transportation, or saturation of a market. In technology, such limitations are often due to the physical properties of available materials, heat dissipation capability, or size constraints in manufacturing technology. In any event, this phenomenon is so ubiquitous that it is fairly safe to choose it as a prediction paradigm.
The key feature is that the phenomenon of interest grows at an increasing rate initially, then at a decreasing rate, until it becomes asymptotic to some ultimate level. For our purposes, consider three regions of the curve: early, mid-life, and mature. In the early phase the growth rate is actually increasing over time, in mid-life the rate of growth changes from increasing to decreasing, and in the mature phase the growth rate is asymptotic to zero. For a medium- to long-range prognosticator, this is a useful tool in that with relatively few data points we may be able to determine which part of the curve is in effect. For technologies where we may know something about the ultimate limiting factors, we may still be able to make useful predictions about how far away the limits are without knowing specifically how we might get there.
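To make the ogee tool concrete, here is a minimal sketch in its standard logistic form (Python; the ceiling, rate, and midpoint parameters are arbitrary illustrations, not fitted data):

```python
import math

# The ogee (S-shaped) curve in its standard logistic form: growth accelerates
# early, then decelerates toward a ceiling set by some limiting resource.
# All parameters here are arbitrary illustrations, not fitted data.

def ogee(t: float, ceiling: float = 1.0, rate: float = 1.0, midpoint: float = 0.0) -> float:
    """Early phase: t << midpoint; mid-life: t near midpoint; mature: t >> midpoint."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))

for t in (-4, -2, 0, 2, 4):
    print(t, round(ogee(t), 3))  # 0.018, 0.119, 0.5, 0.881, 0.982
```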
WHAT I CAN HAVE IN MY 2020 PERSONAL VE: THE GOOD, THE BAD, AND THE DISAPPOINTING
The Good
This discussion will lay out expected capabilities, modulated by technological and economic reality. Cave-like immersive virtual environments are nice for some purposes. Because of space and cost considerations, and considering the likely available sources of content, I do not think this is the way a personal VE primarily used for recreational and social purposes will be implemented. The most likely format will be a wraparound screen for individual or small group use or, more economically, a head-mounted display with head and perhaps eye tracking. Either should also provide hand, eye, body joint, and facial expression tracking. Looking at costs for LCD computer displays in 2007 (about $300 for a 22 inch 1,680 × 1,050 pixel display), and noting that we are probably in the early part of mid-life for these technologies, we can reasonably predict that we can have an 8,000 × 2,000 pixel display arranged to subtend perhaps 160° of horizontal field for about the same cost or less. This might be physically large, covering a curved wall, or closer for a more personal display, such as looking out the large windscreen of our personal VE navigation machine. (The vehicle metaphor helps considerably with visual coverage requirements, locomotion issues, and expectations for haptic interaction with the environment.) With a head-mounted display and appropriate tracking, we could achieve 360° capability with about the same effective field of view for the same cost. The high quality graphics in our hypothetical PC-2008 can already drive two 22 inch displays. It is not much of a reach to project the roughly eight times capability improvement at constant cost in 13 years. (The simple pixel arithmetic behind this projection is sketched below.)
I expect postural tracking, hand position tracking, and facial expression and eye tracking to be available to our personal user as inexpensive peripherals by 2020. Some of this capability exists now; for example, the Nintendo Wii interface can detect acceleration in three dimensions and has proven effective for simulating physical interaction in sports games. Facial expression detection and posture tracking are available in the laboratory now, with performance improving rapidly and costs dropping dramatically. Projecting a cost of perhaps $200 for such an interface a decade or so out is fairly safe. I do not anticipate the need for wide-area physical locomotion technology in a typical personal VE, both for cost reasons and because of the observation that most people’s entertainment and social activity is generally conducted comfortably seated, or perhaps standing within a small area. This is not likely to change.
High quality spatialized audio will clearly be available, the economical version headphone driven, the higher end area surround version costing significantly more. This will be integral to any combined computer and entertainment appliances available in 2020, the grand unification of entertainment and computing already being well under way.
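As flagged above, the display projection rests on simple pixel-count arithmetic. A minimal sketch (Python, using only the figures quoted in the text) shows the jump is in the neighborhood of the eightfold improvement projected, depending on whether one or two 2007 displays are taken as the baseline:

```python
# Pixel-count arithmetic behind the 2020 display projection, using the
# chapter's own figures.

baseline = 1680 * 1050              # one 22 inch LCD, about $300 in 2007
projected = 8000 * 2000             # hypothetical 160-degree wraparound display

print(baseline, projected)                   # 1,764,000 vs. 16,000,000 pixels
print(round(projected / baseline, 1))        # ~9.1x versus one display
print(round(projected / (2 * baseline), 1))  # ~4.5x versus the two displays PC-2008 drives
```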
The end of growth in computing power has long been predicted, but for the past 50 years, just as the rate of performance growth started to flatten out, a new technology has arrived to keep the curve moving. This cannot go on forever, of course, and basic physical rather than technological limits are appearing on the horizon. Computer technology still has a way to go, both in individual processor performance and, more recently, through physical replication (multicore technology and multiprocessor computers) and better processor utilization (multithreading architectures). For example, the IBM Blue Gene mentioned earlier uses more than 100,000 processors. Dual- and quad-core and multiprocessor personal computers are available today, and 8, 16, 32, and more core processors are clearly on the horizon. An experimental 80 core processor was demonstrated during 2007, and thousand-core processors have been discussed.
Various sources claim that computing hardware has demonstrated more than a 10¹⁴ improvement in what $1,000 in constant dollars will buy in computing capability. (This is not a factor of 14, but 14 orders of magnitude, an almost unbelievable number—unduplicated in any other human technological endeavor.) By my estimate this capability is in the latter part of the middle of the maturity curve, but still has 5 orders of magnitude of potential growth left in the next 20 years or so. If true, these 5 orders of magnitude are enough to buy a 600 TeraFLOP, human-throughput-capable computer for $1,000, inflation adjusted, in 2027. By 2020 we should easily be able to afford high grade multi-“monkey level” computing power for the same price.
Storage actually seems to be a bit further from its limits than computing power, perhaps mid-curve in growth. Currently a gigabyte of RAM (random access memory) costs about $30, and a terabyte of disc about $250. Assuming a very conservative 2 orders of magnitude increase at constant cost by 2020, this would give you 0.1 terabyte of RAM and 100 terabytes of disc for under $300. What we could possibly need this to hold, and whether such information would best be held locally, are other questions. On the other hand, limitations in the availability of very high bandwidth connectivity for individual users may justify local caching of large amounts of information. More on the reason for this follows.
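A quick check of the arithmetic in these extrapolations (Python; the figures and horizons are the chapter's own):

```python
# Quick check of the constant-dollar extrapolations quoted above.

# Computing: 5 more orders of magnitude over roughly 20 years implies a
# sustained annual improvement factor of about 1.78x.
print(round(10 ** (5 / 20), 2))      # 1.78

# Storage: 2 orders of magnitude at constant cost applied to 2007 prices
# ($30 per GB of RAM, $250 per TB of disc).
print(1 * 100 / 1000, "TB of RAM for about $30")   # 0.1
print(1 * 100, "TB of disc for about $250")        # 100
```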
The Bad, or at Least not as Good
Even modest-size college campuses and businesses have access to optical backbone networks rated at 10 gigabits per second today, and many personal computers have gigabit Ethernet capability. Optical networks have been demonstrated that provide as much as 1.6 terabits per second of connectivity using wave division multiplexing on a single fiber pair, and of course fiber links have many such pairs available. All seems very promising for essentially unlimited communication capability, except perhaps for a personal VE user, for individual users on large networks, and especially for mobile users. A typical small business currently might have a T1 connection, nominally rated at 1.5 megabits per second (Mbps), and a home user with “high speed” connectivity might have a nominal 7 Mbps; most have significantly less. This is both an economic and pragmatic issue involving the time and cost of changing basic infrastructure.
Current home and small business connectivity is not particularly inexpensive either, at least by computing capability standards. A T1 line ranges from $250 to $1,000 per month, while the “high speed” home connection is typically $40 per month. There are plans afoot to provide fiber to individual houses, but, of course, it will take a long time to completely rewire the nation and the world, and cost and delivered bandwidth have not been clarified either. I am less hopeful for multiple order of magnitude bandwidth improvements for individual home users and for widespread use in businesses, although a factor of 10 or so is very reasonable by 2020. Part of this is due to the “last mile” problem, or delivery to the end user, and part is due to the shared nature of network distribution, where multiple users share each channel.
Wired connectivity within a building or house is now generally 100 megabit Ethernet, or perhaps gigabit Ethernet. These networks are shared, of course, and the number of other users and what they are doing can significantly reduce available bandwidth; nor can these networks achieve their full rated speeds, for technical reasons. Even so, for text, still images, Web browsing, and low resolution video, these data rates are satisfactory most of the time. Fiber bandwidth is, of course, significantly higher, but it is not too widely available and is unlikely to reach average home users for 15 or 20 years. For one-way distribution of a few hundred HDTV channels to many tens of thousands of users, current entertainment cable and fiber connectivity is fine. For hundreds of thousands of sessions of a few tens of users interacting, the situation is not so promising, particularly over that last mile. I hope to be surprised by how fast universally available, genuinely high bandwidth connectivity arrives; realistically, however, we will probably have to rely on workarounds, such as local caching, high levels of compression, and avatars that mirror or mimic human body language rather than live video, for personal use for some time. This latter may not be a bad thing for some applications that I will discuss later, where photo realistic imagery is not required and may even be undesirable.
Wireless networking is even more problematic. There are really two issues of interest here: connectivity on the go and untethered connectivity indoors. Radio frequency bandwidth is a limited commodity, and we cannot make more of it. While some more frequencies are becoming available with the move from analog to digital television broadcast, there is tremendous competition for them from cellular telephony, entertainment delivery, and all kinds of business, government, and military applications. For these reasons I do not foresee a great increase, perhaps a factor of 10, in the bandwidth that an individual outdoor mobile user might hope to have available. Indoors, and untethered, the situation is a bit better, but there are still a number of laws-of-nature limitations. Frequency congestion is still a critical issue, but for short-range line of sight communication, such as in a house or building, it is much easier to spatially share the available channels. Current WiFi wireless uses a protocol called, not very descriptively, 802.11 (pronounced eight oh two dot eleven). There are various flavors of this, labeled with a suffix letter, for example 802.11a, 802.11b, 802.11g, 802.11n, and so on.
These range in nominal data rate from 11 megabits per second (802.11b) to 248 megabits per second (802.11n), but, of course, the typically delivered data rate is realistically about a quarter of the nominal rate; and since the media are shared among local users, individual user data rates are further reduced. These rates are actually pretty good, though, and the highest of them could allow untethered, multichannel, two-way video for a modest number of local VE users. The bad news here is that there probably will not be much more radio based bandwidth ever available, for frequency allocation and user health reasons. Also, many of the frequencies are shared with other services, so interference can be a problem. Perhaps local optical line of sight methods will become available that could significantly increase this, but I am unaware of any current development. In any event, do not count on many orders of magnitude increase in connectivity in the time frame considered here.
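Applying the rule of thumb above (delivered throughput around a quarter of nominal, shared among local users), a quick sketch of per-user rates:

```python
# Per-user throughput under the rule of thumb above: delivered rate is roughly
# a quarter of nominal, split among the local users sharing the medium.

def per_user_mbps(nominal_mbps: float, users: int, efficiency: float = 0.25) -> float:
    return nominal_mbps * efficiency / users

# 802.11n at the 248 Mbps nominal rate quoted in the text:
for users in (1, 4, 8):
    print(users, per_user_mbps(248, users), "Mbps")  # 62.0, 15.5, 7.75
```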
The Disappointing: Software
Conventional software, the kind that constitutes operating systems, device drivers, office suites, and the many applications in conventional use, has come a long way, but painfully slowly compared to hardware advances, and not too efficiently either. Entertainment software, such as games, is often even more complex and costly to produce on a per line of code basis because of heavy demands for fine graphics and complex motion. A large piece of conventional software, such as an operating system,³ may constitute 50 to 100 million source lines of code (SLOC), handcrafted by teams of hundreds of individuals. Currently the operating systems for the personal computers that will be the backbone of future VEs cannot take full advantage of the multiple core processors available now, although techniques for doing so exist and are in use in high performance computers today.
For our personal VE, I do not envision the need for fundamentally better operating systems and office suites; however, I do expect this software to disappear from direct user view. The days of everyone needing to be a system administrator, or having to know what the “C drive” does, need to pass into history. This process is under way already, as the computer becomes the brains behind multimedia home entertainment. Think of the iPod as an early example of how to hide underlying complexity while providing sophisticated behavior.
What else is there? Well, building the software that will permit the computers underlying our personal VE to become collaborators with which users can seamlessly interact to solve problems or seek information, and with which they can deal in the same general manner as in human-human social interactions, is a major challenge. We are still in the early days of making this a reality. A key part of this is being able to use verbal and nonverbal communication channels effectively in both directions.
³ Apple’s Mac OS X 10.4 is estimated to be 86 million SLOC, and Windows Vista is estimated to be about 50 million SLOC. This is among the reasons why memory growth has been driven to such large numbers. Their sheer size and complexity alone explain the reliability issues so often seen. Fortunately this is improving nicely; by 2020 who knows how stable commercial operating systems will become.
Good speech recognizers and excellent speech synthesizers already exist and will continue to improve. If a user were to have the gesture, posture, and facial feature peripherals I expect a personal VE to have, more effective and more natural dialog between humans linked by VEs would be possible,⁴ and these peripherals would also provide excellent cues for moving human-computer interaction to the more abstract level of communication that I envision. Note that recognizing and generating speech is not the same thing as understanding language and generating cogent responses. This latter level of software is the kind to which we need to devote some significant energy.
MODELING HUMAN BEHAVIOR AND CREATING ADAPTIVE, RESPONSIVE AGENTS
The field of artificial intelligence (AI) has had its heady times of great expectations, its lows of disappointment at how hard the undertaking proved to be, and a resurgence with more modest goals and much better computing and information resources upon which to draw. We are beginning to see good progress in cognitive modeling and the simulation of certain individual and group behaviors. Maybe one day we will have a workable, biologically plausible model of the human mind that could drive a companion avatar, though probably not by 2020. If and when we do, it might well take a petaFLOP (10¹⁵ operations per second) computer or so to run the billions of SLOC needed to implement it. Do we really need such a full-up model to be useful? I believe that most human-human social, business, and recreational interactions do not require anywhere near full human cognitive capability. This is a huge advantage to us in normal life; who wants to have to use all of their cognitive capability all the time? It also means that we can set objectives for social avatars and more natural human-computer dialog at much more reasonable levels. I do not know precisely how to create this social software yet, but there is very promising work in the literature, and some really important supporting technologies already exist.
One of the problems with which early AI researchers had to deal was how to compile the huge amount of information that humans draw upon in decision making and in daily life. A great deal of research went into capturing expert knowledge, and many of the techniques developed then have become standard practice in computer science today. An important misapprehension at the time was the view that general human intellectual performance could be captured using logic, or some variant of it, as a general model. What was not as successful was capturing “common sense,” the many pieces of important and not so important data that humans make use of in ordinary discourse, and the development of good methods for piecing these together. Well, today the compilation of a lot of this general discourse-supporting information has been done; it continues to improve and expand, and it is available rapidly and free.
⁴ I am assuming here that live video may not be the best way to interact in a VE, for bandwidth and other practical reasons. I believe that avatars reflecting the body language and facial expressions of real humans, or computer-generated avatars with reasonable body language and verbal skills, will be the most effective way to build such systems.
Consider Wikipedia (wikipedia.org) as an example of a repository of huge amounts of reasonably vetted knowledge, mostly in compact narrative form. Consider Google (www.google.com) as a link to huge amounts of information ordered by likelihood of being of interest. Both are constantly updated and expanded by people—learning and adaptation are intrinsic and free to end users. These even support multiple languages. If we are interested in less factual and structured data to support discourse, opinions, and viewpoints on almost every subject of interest to humans, we have blogs. At least in theory, these could provide a dialog avatar with a rich source of considered analysis, opinion, ignorance, prejudice, and idle musings. The key, of course, is sorting through and applying this information. This is where I believe some more research should be focused.
So, it is not too risky to predict that by 2020 users will be able to conduct an effective and satisfying social dialog with an avatar for tens of minutes on such topics as the weather, current events, travel planning, and information access. Just do not expect too much intellectual depth;⁵ nor is this really necessary for many of the applications I envision as the most widespread. The hypothetical high end monkey computational capability mentioned before will be more than adequate to implement this level of interaction, and others have provided the storage for the huge information base required to carry on a responsive and believable dialog.⁶
WHAT COULD BE DONE WITH A PERSONAL VE?
Aside from the obvious applications for VEs, such as entertainment, virtual travel, education, and, above all, games, I believe that whole new classes of augmentation, assistive, and quality of life applications will become feasible and perhaps, with an aging population, very desirable or even necessary. First let us consider some obvious uses, then the less obvious, but potentially more interesting, ones.
Second Life (http://secondlife.com) is a current example of an online virtual world that is not a game, but a venue for social interaction and even real world commerce. Even in its current form, with primitive avatar motion and text interaction, a great many experiments are being done on what such environments might have to offer in the future. Enhanced avatars, and the use of a personal VE as the user window into a virtual world, should provide a highly effective means for learning, social interaction, and commerce. Virtual worlds have even been proposed as venues for research in social science, as realistic microcosms of human experience and decision making (Bainbridge, 2007). A recent article in the Washington Post (Stein, 2007) discussed a number of other applications, including the use of a virtual world in therapy and as an accommodation medium for physical and emotional limitations. Some businesses have built private virtual meeting rooms, claiming better results than for audio or even video conferencing, and there are even commercial showrooms for products and off-site college campuses in Second Life.
⁵ I have been in many meetings and social gatherings where none of the human participants appeared to employ even a modest level of cognitive prowess, yet the events were considered to be successful.
⁶ Some early examples of such discourse agents already exist; visit Ramona! at www.kurzweilai.net.
A recent pilot study using a virtual classroom for teacher selection (Dieker, Hynes, Stapleton, & Hughes, 2007) showed high user acceptance of intelligent avatars in a VE and the potential of a personal VE as a future tool for delivering immersive experiences. This system had multiple avatars that displayed many of the high level social interaction features that I have proposed to automate. The jury is still out on how effective these limited VE experiences really are for commercial and educational purposes, but a number of companies now provide excellent tools for developing business- and training-oriented virtual worlds.⁷ However, the widespread acceptance and use of virtual worlds for purely social interaction, even in their present limited form, are indicators that there may be significant unexploited potential for other useful purposes. A personal VE as the user portal into such systems could enable many new uses, enrich existing ones, and remove some of the technical impediments for individuals, facilitating broader usage. High performing social avatars and broader human participation would improve virtual world capabilities immensely.
An emergent application has been the use of virtual worlds in therapy and as a means of mitigating social barriers for people with disabilities, mobility limitations, and some types of psychological impediments to full social interaction. New Scientist magazine produced a series of articles during 2007 showing examples of Second Life as a mechanism for dealing with some of these issues. Two of these articles are of particular interest: the June 27 issue focused on Second Life as a vehicle for people with Asperger’s spectrum disorder to interact more effectively with others, and the August 22 issue dealt with physically handicapped individuals participating in virtual world activities on an equal basis with nondisabled individuals. While still anecdotal, there is growing evidence that virtual worlds have an important role to play in providing social and business access to people across space and in mitigating various physical and social limitations. In the case of Asperger’s, the simplified social signals and even the inherent delays are part of what make these environments effective. Similarly, physical handicaps need not be apparent, or an inhibitor, in a virtual world. A personal VE that allows a better sense of immersion, multisensory interaction, or even discourse involving synthetic entities can only enhance this capability.
One key application I foresee is social accessibility for the elderly, who may have mobility issues but still wish to engage in a rich social life with friends, shop, consult with physicians, or engage in mentally stimulating activities. Conversational avatars may also serve as companions who never get bored hearing repeated stories or discussing politics. Companion animals have served some of this function in the past, but have generally not been able to hold up their side of the conversation.
⁷ During a recent demonstration of a virtual world for training decision making in a high impact environment, in this case disaster response, it was noted that an avatar of one of the participants was recognizable as representing him, but his avatar was much slimmer and more agile.
The participant noted that while he did not have time to exercise, he had his avatar run five miles a day.
Such conversational avatars might even be able to conduct ongoing cognitive assessments and provide reminders to patients that may be helpful in health maintenance. The current generation of elderly has not generally embraced the Web and e-mail as fully as might be hoped, although this is changing. I do not expect that to be the case with the next generation, who have been users of personal computers, cell phones, ATMs, and MP3 players, and purchasers of large-screen HDTVs. In particular, the disappearance of the computer itself as a separate entity that has to be learned, debugged, and generally tolerated for the benefits it provides should facilitate the transition to multimodal entertainment, communication, and information appliances. This process is already well under way.
REFERENCES
Bach-y-Rita, P., & Kercel, S. (2003). Sensory substitution and the human-machine interface. Trends in Cognitive Sciences, 7(12), 541–546.
Bainbridge, W. (2007, July 27). The scientific research potential of virtual worlds. Science, 317(5837), 472–476.
Brewster, S., McGookin, D., & Miller, C. (2006). Olfoto: Designing a smell-based interaction. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 653–662).
Corn, J., & Horrigan, B. (1984). Yesterday’s tomorrows: Past visions of the American future. Baltimore, MD: Johns Hopkins University Press.
Dieker, M., Hynes, C., Stapleton, C., & Hughes, C. (2007, February). Virtual classrooms: STAR Simulator. New Learning Technologies 2007, Orlando, FL.
McBride, D. K. (2005). The quantification of human information processing. In D. K. McBride & D. D. Schmorrow (Eds.), Quantification of human information processing (pp. 1–41). New York: Lexington.
Moravec, H. (2003, October). Robots, after all. Communications of the ACM, 46(10), 90–97.
Stanney, K., & Zyda, M. (2002). Virtual environments in the 21st century. In K. Stanney (Ed.), Handbook of virtual environments: Design, implementation, and applications (pp. 1–14). Mahwah, NJ: Lawrence Erlbaum.
Stein, R. (2007, October 6). Real hope in a virtual world. Washington Post, p. A1.
Tyler, M., Danilov, Y., & Bach-y-Rita, P. (2003). Closing an open-loop control system: Vestibular substitution through the tongue. Journal of Integrative Neuroscience, 2(2), 159–164.
Part IX: Military and Industry Perspectives
Chapter 33
THE FUTURE OF NAVY TRAINING
Alfred Harms Jr.
BACKGROUND
With the U.S. Navy investing billions of dollars annually to educate and train its broadly skilled and widely dispersed forces, it is essential that relevant learning content, captivating presentation methods, ubiquitous delivery options, and responsive support systems be efficiently employed to ensure that our sailors have near-continuous, ready access to the most effective learning experiences possible. This mandate becomes even more compelling as the navy continues to reduce in size to an active duty force of 320,000 or fewer, with minimally manned crews and increasing requirements for multiskilled, cross-trained professionals.
Today, one area of focus should be virtual environment learning scenarios, as they offer uniquely powerful, creative, and adaptive opportunities for best preparing sailors to serve successfully in ever-changing, increasingly challenging, and potentially risky settings. From high end, full-scale, multiuser, immersive applications to low cost, individual, part-task trainers, virtual environment learning experiences are generally more exciting, impactful, fun, and effective than traditional education and training methods. However, the use of virtual world environments should not be viewed as a panacea for all learning requirements. Rather, they are but one tool in what must be an engaging, tailorable, and incredibly flexible set of blended learning solutions to satisfy the needs of the twenty-first-century sailor.
When determining how best to provide any learning opportunity, it is instructive to review the crucial role of education and training in the long-term success of the U.S. Navy (and, similarly, all of our armed forces). Once education and training imperatives are understood as cornerstones of individual excellence, successful team development, and operational force primacy, one can smartly consider how best to use virtual world environments to enhance future learning experiences.
The U.S. Navy has been the world’s preeminent maritime force for well over half a century, and whereas some may debate the reasons why, there are three certainties that highlight the U.S. Navy’s unquestioned supremacy on the high seas and around the world. First and foremost, today’s navy is the best in the world due to the wonderful people who fill its ranks, from the most junior seaman recruit to the highest ranking admiral.
These marvelous volunteers are remarkable in many ways, but most notable are their far-reaching talents, courageous dedication to mission, steadfast trustworthiness, selfless teamwork, and unparalleled commitment to a life of service. Motivated, disciplined, loyal, and hardworking, these amazing professionals are and always will be our nation’s most significant “asymmetric advantage” in any military action, and they will long be our “surest guarantor” of military success if called to fight. No matter what the situation or circumstance, our servicemen and servicewomen routinely provide the pivotal difference between success and failure, whether operating in seemingly simple, incredibly complex, or unimaginably demanding scenarios around the globe. In the end, it is the human, and not the machine or technology, who provides the winning margin.
That is not to say that the numbers, availability, and capability of our nation’s military and other security equipment are not exceedingly important. In fact, most experts would agree that the second principal reason for America’s military excellence is that our fighting forces are generally well outfitted with modern, world-class equipment. Furthermore, they are adequately resourced to consistently maintain and regularly upgrade these substantial capabilities. Multiple times in our history, the enormous manufacturing capacity of America’s industrial complex has enabled our forces (and various allies) to continue fighting when our enemies simply could not keep up. Ultimately, we outproduced and overwhelmed the enemy with virtually unlimited force structure. Likewise, our nation has for many years aggressively fielded top-of-the-line technological capabilities for its fighting forces, further ensuring consistent and predictable success on the battlefield. Generally speaking, our fighting forces have the necessary tools to fight early, win decisively, and return home safely. This fact does not imply that combat or daily operations are now or ever will be risk-free, as has been so graphically highlighted by recent years’ casualties in Iraq and Afghanistan. However, no opposing force in the world can match the quality of the people and the equipment routinely fielded by American forces, and together this combination has been key to our military’s many successes over the years.
The third and perhaps most dominant factor ensuring the exceptional performance of our military forces has been the nation’s unparalleled commitment to educate and train all personnel, officers and enlisted alike. The finest people, outfitted with the finest equipment, will not alone produce an effective fighting force; extensive and continuing education and training are absolutely necessary to fully develop and exploit the enormous human skills and equipment capabilities resident in our forces. It is this third factor that truly distinguishes the U.S. Navy (and sister services) from other military forces around the world.
CHANGING REQUIREMENTS
Given that the U.S. Navy is the world’s best and that all three dimensions of excellence—personnel, equipment, and education and training—play a key role in end-game success, observers might proffer the broken logic that questions “if it ain’t broke, why change a thing?”
One would think that when living in a world with almost incomprehensible and still accelerating rates of change, the answer would be readily apparent. Frankly, with so many things changing at unprecedented rates, and an institutional proclivity toward rapid change in most operational scenarios, it is hard to fathom why such inertia opposes change in learning approaches. It is also disappointing that some leaders “just don’t get it,” seemingly undervaluing (or ignoring) education and tolerating suboptimal training regimens designed generations ago. Are our ranks filled with such talented and adaptive people—great patriots committed to success at any cost—that we have become too comfortable with the status quo and have not been forced to change? I hope not, for just as status quo operational tactics will eventually get our sailors killed as the enemy continuously adapts for advantage, so will status quo learning approaches fail (or unnecessarily falter) as global realities and mission requirements rapidly change.
The navy’s legacy, mass-production approach to learning was extremely well suited to last century’s “force on force,” “fixed enemy” mindset (World War I through the Gulf wars); however, the twenty-first century will demand a broader, more complex mission set than ever before experienced. From large naval platforms with large crews having significant individual specialization and rating-specific tasks, to smaller mission-configurable platforms with smaller crews of multitalented individuals requiring extensive cross-training, it is imperative that today’s learning environment be precisely relevant, highly tailorable, readily adaptable, and pervasively available to meet ever-changing threats and mission requirements. Whereas last century’s navy faced well-defined threats generally dominated by our superior forces, technologies, and supporting infrastructure, today’s navy operates in a more complex operational environment where the collective threat of unpredictable and often unknown situations can be metaphorically described as a “thousand daggers” versus a “single, savage thrust”!
This changing environment, coupled with the realities of increasing competition for talent and increasing costs for manpower, makes it imperative that navy education and training efforts be creatively adopted, wisely managed, and predictably effective in producing sailors who can consistently “outthink,” “outlearn,” and “out-adapt” the enemy. This imperative becomes even clearer with time, as virtually no enemy is likely to revert to a traditional force attrition warfare model in the near or foreseeable future. Rather, our great navy will continually face a pervasive, shadowy mix of professional warriors, hard-core criminals, determined extremists, and zealous bunglers, all capable of delivering unmistakable horror in unimaginable places with unthinkable consequences. We must prepare for this new world and not wallow in the relative comfort of the past . . . a past that we essentially dominated, but one that has become increasingly less relevant.
FUTURE POSSIBILITIES AND DIRECTION
So, how does the navy ensure that the education and training pillar of operational excellence continues to provide the winning margin of performance in the future?
I contend that adequate numbers of wonderfully talented and dedicated men and women will continue to serve their country as members of the armed forces, taking their turn in helping preserve the blessings of freedom and opportunity for all. Likewise, I am convinced that our nation will continue to provide our forces with the finest equipment in the world and adequately resource the outfitting, upgrading, and modernization of our equipment inventory. Finally, I am convinced that we can achieve, enhance, and sustain a world-class learning environment that will interest, challenge, and richly prepare our sailors for their demanding responsibilities in the future. Although the task is daunting in many respects, there is less mystery than many imagine in crafting a successful way ahead. Boldly executing necessary changes to existing policies and programs will be neither painless nor cost-free; however, failing to understand, value, and embrace needed change in the ways we educate and train naval personnel will assuredly result in the diminished growth, development, and accomplishment of our most important “force multiplier”—our people!
Transitioning from legacy learning programs to an approach that fully leverages proven science-of-learning principles and human performance considerations will help ensure that our people pursue and master relevant content efficiently and effectively. More specifically, we have the capability today to accurately analyze and define actual performance requirements with high specificity and then build learning opportunities tailored to relevant performance objectives. To neglect this fundamental step in ensuring that our education and training efforts are properly targeted on specifically known and thoughtfully projected performance requirements would be both foolish and costly in today’s constrained-resources, high stakes environment. A more serious misstep would be a conscious decision to endorse a “good enough” mentality toward navy learning as it exists today, and worst of all would be the utterly disingenuous, “head in the sand” approach used by those who neither value the importance of learning nor understand its direct linkage to our readiness status. We would never approach the people and equipment pillars of military readiness in this manner; rather, we always assess needs and capabilities with laboratory precision, valiantly maneuver to attract and retain the very best people, and procure and maintain the very best equipment. Similarly, we should never shortchange the science, support, and pursuit of what is arguably the most differentiating pillar of the readiness equation—educating and training our people.
The use of virtual world environments and other supporting simulations will enhance the learning experience of all our sailors by providing discriminating realism, contextual fidelity, dynamic and interactive participation, confirming repetition, and controlled assessment of cognitive, affective, and psychomotor skills, all in a cost-effective manner. This type of learning experience will become more and more critical with expanded multimission tasking, increased cross-training requirements, and a growing likelihood of short-notice, unpredictable, and unconventional mission scenarios. The expanded use of virtual world environments will enable more meaningful learning experiences whether pursuing (1) simple “exposure” events for improved awareness, (2) repetitive “practice” events for enhanced competency, or (3) one-of-a-kind, “final rehearsal” events designed to both test and assess the learner’s ability to outthink, outlearn, and out-adapt the enemy.
In addition to enhancing individual or team performance, these special learning environments can also provide a reliable, dynamic, and affordable methodology for determining attainment of qualification standards or certifications with high levels of specificity, standardization, fairness, and repeatability. Finally, without virtual world learning environments, there may be no practical or affordable way to expose, educate, and train our people to operate successfully in those infrequent, remotely located, potentially unsafe, and highly volatile or overtly dangerous situations we know our forces will encounter. The benefit to our forces of even minimal exposure to these types of scenarios before actually encountering them will prove invaluable in terms of individual confidence and performance, thereby enhancing the likelihood of mission success and personnel safety. Whatever the mission, having the benefit of repeatable and varied learning experiences will unarguably enhance the end-game performance of most individuals, again measurably increasing the likelihood of mission success and personnel safety.
Legacy learning methods alone are simply no longer adequate to prepare our forces to engage, counter, and defeat a determined enemy. Furthermore, legacy learning approaches typically do not facilitate real time, in-depth assessment of performance for feedback and evaluation purposes, thereby limiting learning and performance improvement for most individuals. Both of these shortcomings are unacceptable given the upside potential of today’s learning options, especially when considering the pervasive threat and unforgiving scenarios that many of our forces will face with increasing frequency in the coming years.
Beyond providing more relevant, better tailored, and more easily assessed learning experiences for our people, virtual environments can clearly be more attention grabbing, more engaging, more stimulating, and, in some cases, more pedagogically sound than traditional learning approaches. Anything done to spark, broaden, and sustain an individual’s desire and passion for learning is critically important, especially in an era where self-driven, independent, lifelong learning will become more and more the norm. Yes, self-administered, self-assessed learning will become commonplace in the years ahead, and these experiences must be relevant and rewarding in order to motivate our people to learn the knowledge, skills, and abilities necessary for survival and success in the twenty-first century.
The use of virtual environment experiences will also enhance important “self-discovery” aspects of learning. Although sailors are expected to essentially master the content of mandatory topic areas, virtual environment learning experiences are easily designed to offer additional, positive aspects of learning. These learning skills include widespread use of information-seeking, information-analysis, and information-synthesis tools, real time exercise of adaptive problem solving skills, and routine implementation of collaborative planning and decision-making skills. Finally, whereas traditional learning approaches tend to be individualistic by nature and often rooted in either the textbook author’s or the classroom instructor’s perspective, virtual world experiences are typically more inclusive, offering the learner opportunities to gain familiarity with multiple and diverse perspectives.
Then, using his or her own knowledge base and personal experiences, the learner can better differentiate between fact and opinion, draw balanced conclusions, and more successfully engage both our allies and our enemies around the world.
CONCLUSION
Virtual learning environments can provide relevant and rewarding learning experiences and, in many cases, offer far more innovative and exciting learning scenarios than any traditional setting. If we are to be fully responsive to the needs, interests, and strengths of our sailors, and if we are truly focused on maximizing their growth and performance, it is hard to imagine that the U.S. Navy would not exploit the many advantages of virtual world learning environments throughout its education and training programs. As stated in the opening paragraph, however, there is no “one size fits all” solution; rather, a judicious blend of traditional and virtual learning environments will be needed in the future. In that future, though, virtual environment learning experiences can and should play a large role in helping prepare our sailors to maintain the competitive edge necessary to remain the world’s greatest navy and successfully fulfill their incredibly important role as guardians of our grand republic.
Chapter 34
THE FUTURE OF MARINE CORPS TRAINING
William Yates, Gerald Mersten, and James McDonough
What ought one to say as each hardship comes? I was practicing for this, I was training for this.
—Epictetus
ROAD MAP FOR TRAINING
The U.S. Marine Corps Training Modeling and Simulation Master Plan (TM&SMP) was signed in January 2007 (Trabun, 2007) and provides a road map for developing the training technology requirements for fully implementing training modeling and simulation (M&S) in support of the Marine Air-Ground Task Force (MAGTF). In broad terms, the Marine Corps relies on simulation to enhance training across the continuum that begins with classroom instruction and culminates in live-fire and live-maneuver exercises. The long-term vision for Marine Corps training is that simulation will be a transparent window from the live environment to a virtual tactical environment, providing marines from the rifleman to the MAGTF commander with a faithful representation of the battle space in which to hone their skills.
Simulation training in the MAGTF of the future will begin with entry level training for marines and officers in their basic skills and primary military occupational specialty, for example, marksmanship training and vehicle operator training. After graduating from the entry level training pipeline, marines in operating units will use simulations to train for the performance of collective tasks within their organic capabilities and also with other units. One example of such collective training would be exercising virtual close air support, linking a forward air controller or joint terminal attack controller (JTAC) to a pilot in the cockpit of a high fidelity flight simulator. As a culmination of the training continuum, the many components of the MAGTF will employ live and virtual simulations together in a common scenario conducted inside the wrapper of a constructive simulation prior to deployment. The capstone pre-deployment exercise includes elements of intelligence and a geographic frame of reference that are tailored to prepare the deploying MAGTF for the specific mission for which it is to deploy.
While a MAGTF is deployed, simulation will be embarked with it and connected via distributed networks to the higher headquarters and a tactical exercise control group that will support the MAGTF commander’s requirement for rapid scenario generation for mission planning and rehearsal while embarked and under way.
USE CASE FOR A SIMULATION-ENABLED INTEGRATED TRAINING EXERCISE
Consider the case of a future training exercise integrated across the domains of live, virtual, and constructive training and spanning the echelons from the individual marine on the battlefield to the MAGTF commander’s staff. A typical series of events might begin with a squad of marines in the 1st Marine Expeditionary Force Battle Simulation Center at Camp Pendleton, California, training in a desktop or laptop based virtual tactical decision simulation (TDS) on a patrol in a geospecific representation of a real town in a foreign country. In the course of the patrol, the marines encounter a citizen of the country in which they are operating and converse with him in his native language while using appropriate cultural behaviors. The citizen of this foreign country would be represented by an artificially intelligent avatar of a human being that is indistinguishable from other human avatars controlled by the marines. This artificial intelligence (AI) person would speak the native dialect and react in a culturally appropriate manner to the actions (kinetic, verbal, and nonverbal) of the marines.
In the course of the conversation with this AI entity, the marines learn of a potential high value individual (HVI) who is thought to be located in a compound outside the town. This information is relayed via a virtual tactical radio link from the marines, who are physically located at the simulation center, to a regimental combat operations center (COC) in the field at Camp Pendleton. The staff in the COC receives the information on the location of the potential HVI. The suspected location of the individual is correlated from the map location in the virtual area of operations to a grid location in the training area of the continental United States training base where the COC is set up in the field. A decision is made to launch an unmanned aerial vehicle (UAV) to reconnoiter the location. Marines using a Raven-B laptop control station, simulated by a virtual and constructive simulation, fly a virtual UAV sortie over the area of interest and gather real time video that substantiates the presence of an HVI at the location gleaned from the conversation with the civilian in the TDS.
A decision is made to attack the compound where the HVI is located with tactical air and then follow up with a live team of marines to conduct sensitive site exploitation. The sortie to attack the target will be launched from a simulated ship offshore, but flown virtually from flight simulators at Marine Corps Air Station Yuma (Arizona). Terminal control will be provided by a JTAC on the desert floor at Twentynine Palms (California). The voice communication between the JTAC in the field and the pilot in the simulator would be via a radio frequency to voice over Internet protocol (VoIP) bridge.
The JTAC would designate the target with his laser designator, and the intelligent targetry would sense the laser and relay the location of the laser designation wirelessly through the range telemetry system into the simulation. As the pilot of the virtual close air support sortie comes within visual range of the target in the simulation, he will see a representation of the laser on the target that is actually taking place 200 miles away at Twentynine Palms. The pilot transmits to the JTAC “wings level,” and the JTAC turns his head to the sky to visually acquire the aircraft. Through his lightweight, mini-head-mounted, mixed-reality display (receiving simulation-injected position-location information, or PLI, data for the aircraft), the JTAC sees the virtual representation of a Joint Strike Fighter against the blue background of the sky and announces “cleared hot.” The pilot in the flight simulator releases virtual precision-guided bombs that impact the location of the target. The simulation tracks the bombs to the target and, upon impact, stimulates the range targetry to activate pyrotechnics visible to the JTAC. Alternatively, if the JTAC is wearing a lightweight, see-through, mini-head-mounted, mixed-reality display, there would be an augmented reality explosion visible to him. The JTAC reports the bomb damage assessment, and the virtual sortie returns to the ship. The constructive simulation that serves as the “wrapper” for this exercise adjudicates the effects on the target and “kills” the target entities affected by the attack.
Meanwhile, a live squad of marines isolates the target area and checks for dead and survivors. They discover a badly wounded individual represented by an anatomically correct dummy that displays wounds consistent with the concussion and fragmentation of being in proximity to a bomb blast. The corpsman employs lifesaving measures, and a request for casualty evacuation is sent via radio to the COC. Documents and other relevant information are collected from the site and conveyed to the COC, and the exercise continues.
The hypothetical case described here is used only to illustrate how the elements of live, augmented, virtual, and constructive simulation might be integrated in a scenario that provides highly realistic and valuable training for marines from the corpsman, to the JTAC, to the pilot, to the staff in the COC, to the marines operating in the TDS at the simulation center. Whether orchestrating such a training scenario is worth the overhead in planning and synchronization is a decision made by the commander, but creating the training infrastructure necessary to facilitate such an event is the intent of the TM&SMP.
INVESTMENTS AND BENEFITS

The justification for M&S infrastructure is tied directly to MAGTF capability lists and MAGTF requirement lists. The decision to make an investment in a simulation for training is based on an analysis to determine whether a simulation offers an economy of risk, time, or consumable resources (fuel, ammo, and sweat) compared to alternative venues for training, for example, live fire and
maneuver. After a simulation application is identified that offers the potential for efficiently training marines, the candidate system must be vetted to ensure that it is truly effective at imparting the desired skill to trainees. At the same time, the system is evaluated to ensure that it does not impart negative training. One of the most important potential efficiencies of simulation training is realized by distributed training over a network that connects marines at geographically distant locations. Operational tempo combined with the cost of transporting troops and equipment makes it very expensive to conduct live training exercises that span the entire MAGTF. Distributed simulation allows marines to remain at their home stations while interacting in a virtual environment (VE) with their counterparts across the command element, ground combat element, air combat element, and combat service support element. Robust distributed simulation exercises require a relatively high bandwidth network backbone dedicated to training. The Marine Corps presently participates in distributed training conducted over the Joint Training and Experimentation Network (JTEN) administered by the Joint Forces Command Joint Warfighting Center. Training conducted using the JTEN must be of a joint nature and be under the umbrella of the Joint National Training Capability. The Marine Corps is also beginning the task of federating some of its training simulations to operate in the Navy Continuous Training Environment (NCTE), which is similar to the JTEN but was created specifically to conduct training of U.S. Navy and U.S. Marine Corps mission essential tasks. While training in a joint and naval service context is essential, the JTEN and the NCTE do not provide a readily available distributed training network on which the Marine Corps can conduct Title 10 training. For this reason the Marine Corps is studying the requirement for a training network to enable training exercises between dozens of Marine Corps installations that span the force, including the reserve component. At the present time this future capability is being referred to as the Marine Corps Training and Experimentation Network. A high bandwidth training network is a key enabler for the Marine Corps' vision of the future training capability.
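How much backbone capacity such a network needs can be estimated from the entity count, the state-update rate, and the packet size. The figures in the sketch below are illustrative assumptions for a large distributed exercise, not measured Marine Corps requirements; they simply show how the estimate is assembled.

```python
# Back-of-envelope bandwidth estimate for a distributed exercise.
# All figures below are illustrative assumptions, not measured values.
entities = 2000          # live, virtual, and constructive entities in play
updates_per_sec = 5      # state updates per entity per second (dead reckoning
                         # between updates keeps this rate modest)
bytes_per_update = 144   # roughly the size of a DIS Entity State PDU
overhead = 1.3           # margin for protocol headers, voice, and video

bits_per_sec = entities * updates_per_sec * bytes_per_update * 8 * overhead
print(f"Approximate backbone load: {bits_per_sec / 1e6:.1f} Mbit/s")
# -> Approximate backbone load: 15.0 Mbit/s
```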
CAPABILITY GAPS

The TM&SMP identifies seven science and technology (S&T) "long poles in the tent," aka S&T long poles, for achieving the objective capabilities in training simulation. These S&T challenges are identified gaps that exist in the current training capabilities of the Marine Corps. Research into potential solutions for these technological challenges is of keen interest to Training and Education Command (TECOM) Technology Division, as well as such organizations as the Office of Naval Research and the Defense Advanced Research Projects Agency. Proposed solutions to these challenges coming from industry or academia must take an open source approach to both software and hardware. The Marine Corps will not invest in proprietary technology that is not interoperable or for which full interface design documentation is not available. The following is a list of the
S&T long poles paraphrased and quoted from the TM&SMP:

1. Rapid generation of high fidelity three-dimensional (3-D) terrain databases to include contour, vegetation, hydrography, and man-made structures and equipment.

2. Targets for the Live, Virtual, and Constructive Training Environment that are present and interactive from the perspective of all participants in a training environment, whether they are participating in a live, virtual, or constructive capacity. Targetry on a live-fire range must "come alive" by a stimulus from a virtual or constructive simulation. When a virtual representation of a target is engaged in a simulation, the effects on the target must be visible to observers in the live realm. For example, marines in the field instrumented with position-location information and target sensor telemetry would be notified and react appropriately if the location at which they stand is engaged by a fire for effect in the constructive simulation. Tactical command, control, communications, computers, and intelligence (C4I) systems will be stimulated by simulations that communicate via simulation-to-tactical gateways providing data feeds to the commander and his staff that are indistinguishable from operational data feeds.

3. Representations of man-made structures, particularly those in an urban environment, must be faithful replications of actual buildings and objects. These virtual structures must incorporate realistic physics models of their construction so that they respond appropriately to kinetic actions, such as breaching of a wall by a vehicle or destruction of the building by ordnance. To be useful in the context of mission planning and mission rehearsal, it is a requirement that these virtual urban structures be created rapidly by the training unit.

4. Live forces must be tracked via position-location information and those tracks represented in virtual and constructive simulations both indoors and outdoors. The position-location information systems cannot be dependent on line-of-sight telemetry. Virtual representations of humans, both computer generated and those representing position-location information feeds from live troops, must fully implement human anatomy motion-tracking and display. Physical movements, such as hand and arm signals, must be fully articulated in the virtual environment. Human models must exhibit appropriate cultural, emotional, religious, and ethnic responses to the stimulus from the training environment.

5. Accelerated learning science is a requirement to assist marines in rapidly assimilating knowledge on how to perform complex tasks. Specific S&T gaps include the following:
   a. Foundations of learning applied to complex tasks. Cognitive load theory and instructional efficiency must be extended to complex tasks.
   b. Training interventions triggered by neurophysical markers of learning and cognition.
   c. Principles of expertise development and strategies tailored to continual proficiency models, beyond today's simple novice to expert techniques. Artificially intelligent opposing and friendly forces should sense the proficiency level of the trainee and scale the difficulty of the training to a level appropriate and optimal for learning and skill acquisition (a minimal sketch of such difficulty scaling follows this list).

6. Experiential learning technologies that deliver, to forces at their home stations or deployed, a virtual environment experience similar to completing the live Mojave Viper pre-deployment capstone exercise conducted at Twentynine Palms. The content of the learning experience must be tailored for the individual marine's learning aptitude and base of knowledge.

7. The injection of political, military, economic, social, infrastructure, and information nonkinetic effects into operational level staff exercises is needed to support all elements of national power, future operations, and long-term assessments. This capability must not add to the number of support personnel for a training event. Currently this aspect is war-gamed in a seminar format combined with the simulation effects of a kinetic exercise.
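Item 5c above implies a simple closed loop: measure trainee performance, compare it to a target band, and adjust opposing-force difficulty. The sketch below shows one minimal form of that loop; the target band, step size, and success-rate metric are illustrative assumptions, not validated training parameters.

```python
def scale_difficulty(current: float, performance: float,
                     target_band: tuple = (0.6, 0.8),
                     step: float = 0.1) -> float:
    """Nudge opposing-force difficulty so trainee success stays in a band
    that keeps training challenging but achievable (band and step size
    are illustrative assumptions)."""
    low, high = target_band
    if performance > high:        # trainee dominating: raise the challenge
        current = min(1.0, current + step)
    elif performance < low:       # trainee overwhelmed: ease off
        current = max(0.0, current - step)
    return current

# After each engagement the exercise wrapper re-tunes the AI forces:
difficulty = 0.5
for success_rate in (0.9, 0.85, 0.7, 0.4):
    difficulty = scale_difficulty(difficulty, success_rate)
    # difficulty steps through 0.6 -> 0.7 -> 0.7 -> 0.6
```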
IMMERSIVE TRAINING

Recent experience of marines in combat, training in preparation for combat, and the considered opinion of senior operating force leadership are the basis for the Marine Corps' investment in VE training technology. The short "dwell times" spent at home-station bases between deployments compel the Marine Corps to develop immersive training to augment and improve upon live training capabilities. Virtual training at a less than immersive level facilitated by the deployable virtual training environment will continue to be an important component of training. What follows is a detailed discussion of a research initiative into the higher end of virtual training environments. The Marine Corps is investigating a fully immersive infantry training environment that blends live participants using real weapons and equipment with augmented/virtual reality in order to provide an experience of sufficient realism that it results in a suspension of disbelief. The infantry immersive training environment seeks to (1) replicate as closely as possible the effects and conditions of the battlefield and (2) allow a dismounted infantry squad, platoon, or battalion to effectively train in the wide spectrum of tasks necessary to execute the full range of military objectives. To sufficiently stimulate the senses of trainees to the point of immersion, this trainer must synthesize the "fog of war" through a cluttered, confusing, combatlike environment. As marines assimilate into the environment they will experience stress and fatigue and encounter more chaos and randomness (fewer linear events). The increased realism and interactivity of the training will begin to synthesize the exposure that comes from a first firefight and the nonkinetic events leading up to the baptism by fire. This training environment must stimulate all the senses and overwhelm marines into believing they are experiencing a "real, life-like" firefight. The mixed reality training experience should include such elements of realism as the following:

• Getting smacked in the face by branches and getting a face full of spider webs while on a security patrol;
• Conducting an extended security halt on the snow north of the Arctic Circle;
• Encountering a snake that falls out of a tree on the patrol;
• Finding the lost members of a squad-sized patrol at night in heavy vegetation;
• Having the enemy detect and open fire on a patrol if the marines do not exercise noise and movement discipline.
Current tactical training environments are in general too sterile. By having wounded who scream, dead who smell and decay, civilians, and so forth, we can make the experience more challenging and realistic. All who are about to go into a firefight wonder how they will act and how they will handle the fear or initial panic of combat. The immersive training environment must answer those questions in the mind of the marine being trained.

PAYING IN SWEAT VERSUS BLOOD

In early air-to-air combat over Vietnam, navy pilots achieved a kill ratio against North Vietnamese MiG jets of only two to one. A careful study showed a dramatic seasoning curve for pilots after combat: 40 percent of all pilot losses occurred in their first three engagements; however, 90 percent of those who survived three engagements went on to complete a combat tour. In 1969 the navy began a program that sought to provide a pilot his first three missions risk-free. Top Gun pitted novice airmen against a mock aggressor skilled in North Vietnamese aerial tactics. Combat was bloodless yet relatively unfettered. Uncompromising instructors recorded and played back every maneuver and action. The results were dramatic. From 1969 until the end of the air war, the navy's kill ratio increased sixfold. The Marine Corps is investigating immersive training for the ground infantry equivalent to the navy's Top Gun for pilots. Some of the impetus behind developing the technologies for the creation of home-station and/or deployed immersive training environments is to significantly enhance that first combat experience and provide a realistic, life-like experience as close to a first firefight as possible. Although the focus is on surviving the first kinetic engagement with the enemy, the simulation environment will also encompass nonkinetic experiences of interaction with host nation civilians and local government officials. Training for the escalation of force will be a critical element of the objective virtual training capability. While realism is paramount, the objective training capability must also be scalable to accommodate training for the entire force. The most sophisticated and realistic virtual training environment is not a solution to the Marine Corps' requirements if it is not affordable in sufficient capacity to train every rifle squad in the Marine Corps. The training environment must be interoperable with joint training simulations, especially in the exchange of C4I data feeds. The objective environment must replicate not only the employment of organic equipment, but also supporting arms and sensors.

WHY IMMERSIVE MIXED REALITY?

The goal of virtual and mixed reality training is to make the experience at home station more like a Mojave Viper experience. Immersive and mixed reality
does not provide the entire answer, as we cannot fully replicate such emotions as fear and panic; however, the more senses we can stimulate in a virtual training environment to deliver the required suspension of disbelief, the more completely we can achieve the goal of learning by "living" the experience and the closer we can get to that first firefight experience and the emotions of fear and panic on the battlefield. The true benefits that we see from immersive mixed reality are as follows:

1. Provide a more realistic and engaging environment in order to allow leaders to make decisions in a near-combat environment, experiencing those stressors and stimuli that can be re-created in simulation. This consequence-free environment will provide these leaders with a reservoir of experience on which to base their future decisions in combat.

2. Create a revolutionary training environment where marines can interact with not only live players, vehicles, and aircraft (real when available and virtual/augmented otherwise), but also accurately generated virtual entities in both an urban environment and in open terrain. The potential cost savings and flexibility are tremendous. One example where this capability is clearly seen is close air support conducted in an urban environment. In current areas of operations (AOs), leaders are forced to make targeting decisions based on the threat, as well as the surrounding environment. Deciding whether to engage a target, and with what weapon systems, in order to eliminate the threat while minimizing collateral damage can be a challenge. We must present marines with this situation in a dynamic, realistic training environment. With augmented reality, we could now have the observer look down on a real village populated with both live and virtual (augmented) role-players, both civilian and hostile, and based on what he sees he could then direct either a real (if available) or virtual aircraft to deliver virtual precision-guided munitions onto the target and observe the augmented effects.

3. Another potential benefit of this technology is the ability to quickly change not only a scenario, but also the environment. In an interior building this could mean a literal change of the climate, but more importantly it means a change of the people that the marines will encounter in this environment. Instead of having to go out and hire a completely new set of role-players from a different part of the world, it simply requires loading a different set of entities and modifying their behaviors to match the current threat seen in the new AO. In addition, as the system is used, the interactions of the virtual players can be updated based on current intelligence.
Fully immersive mixed reality training will enable the Marine Corps to quickly adapt training to prepare marines for any emerging threat. Virtual (augmented) environments blended with live training offer the potential of both improved realism and cost savings compared to a live-only approach in which all of the space and structures must physically exist. As the live elements of training remain constrained by location, the virtual (augmented) elements can change the context and tailor the experience to the demands of the mission.
REFERENCES

Trabun, M. A. (2007, January 18). U.S. Marine Corps Training Modeling and Simulation Master Plan. Quantico, VA: U.S. Marine Corps, Training and Education Command, Technology Division.
Chapter 35
THE FUTURE OF VIRTUAL ENVIRONMENT TRAINING IN THE ARMY

Roger Smith
VE GROWTH STIMULANTS

There is a rich history of researching and developing virtual environments (VEs) within the military. The simulator networking (SIMNET) program of the late 1980s and early 1990s demonstrated the deep value of virtual environment applications (Miller & Thorpe, 1995; Davis, 1995; Singhal & Zyda, 1999). Twenty-five years later there have been significant advances in this area, but there remains vast unexplored potential in this field. There are potentially hundreds of valuable applications to real military operations in logistics, command and control, situation understanding, and information fusion. In both the commercial and the military worlds, the power of VEs is significantly enhanced by the growing availability of digital data in every industrial and government domain. In a world where reconnaissance photos are captured on physical film, there is little that computation and VE can do to enhance this information. Once those photos become digital, it is possible to analyze, fuse, integrate, and morph them so that they become the visible skin of a VE. As most information about the world becomes digital, it creates opportunities to generate higher levels of understanding and new advantages over competitors. As the world has become networked, digital data have also become globally accessible so that digital photos from every continent can be viewed in real time anywhere in the world. As network bandwidth, computational power, and VE algorithms advance, there will be a point at which these images can be stitched together into a seamless three-dimensional (3-D) map of the entire world and navigated in real time. These data will include digital images, sound waves, weather patterns, population densities, personal locations, radio frequency spectrum, financial transactions, and dozens of other specializations. From a military perspective, most situations of interest are geographically based. In the past, our technologies have limited our ability to construct information into a geographic form similar to the world from which it was collected. The
VE is a new and powerful alternative to textual, graphic, and other paper-oriented representations that have dominated our decision making for centuries. Today’s leaders, managers, and engineers are very comfortable communicating information that has been structured in the form of graphs and tables. The next generation will be just as comfortable structuring information into unique VEs and exploring those collaboratively as a means of understanding and manipulating the world.
COMMERCIAL LEADERSHIP

Sometime in the late 1980s there was a tipping point (Gladwell, 2002) at which commercial industry took the lead from government laboratories in advancing computer technologies. The explosion of consumer-grade computing power led to a corresponding explosion in software applications that could exploit this power. One of these growth areas was the computer gaming world, which created such products as Quake and Unreal and an annual harvest of new competitors presenting the best VE rendering available at consumer price levels. This civilian market will continue to drive research and development into VEs and the creation of ever-more beautiful and immersive worlds in which to interact with information and other people (Smith, 2006; Dodsworth, 1998). Just as e-mail and instant messaging have replaced the telephone as the leading medium for personal communication, and the Web has replaced the library as the leading repository of information, VEs will replace the textual Web page as the primary medium for shopping, socialization, and exploration. VEs can capture both the contextual relationships of hyperlinks and the proximity relationships of geographic collocation. Some form of VE will become the context within which online digital information is organized, significantly extending the linked, flat Web pages that convey this information today. People who are browsing through data will be able to discover related items that are geographically close to each other just as they do when browsing in a physical library or bookstore. Such applications as Google Earth, Second Life, and World Wind are beginning to illustrate this future. Imagine a World Wide Web in which all personal information is tied together in a single context. For example, a social network of friends lives as 3-D avatars in a VE apartment where favorite video clips stream on one wall and the contents of an online encyclopedia lie on a coffee table. Further, in a VE there is no reason that the apartment has to look anything like a physical habitat; it could be a giant garden, forest, cloud city, or ant colony. The information that people need and enjoy may grow like flowers in the garden all around them, their colors and sizes representing currency, importance, source, or other key attributes. Most commercial VE expressions are uniquely personal, playful, and civilian, but the technologies behind them are seriously powerful. Like the radio and the semiconductor before them, these technologies are not limited to entertainment, business, or national defense, but can be applied equally to each domain. The commercial world will be the source from which advanced VE technologies spring and the foundation from which military applications are built.
Though computer-generated VEs are primarily visual, there may be other alternatives for loading information into the human mind. Direct neural stimulation may allow information to enter without going through the eyes. Technology that enables a blind person's mind to "see" is similar to that required to generate a VE directly in the mind. The advantages of this approach are beyond current understanding. A neural image may be superior to a standard visual scene. It may create a new sense of the data that are contained in the world, effectively enhancing the human ability to perceive rich mixtures of data within a VE. Further afield is the possibility of creating or enhancing the VE through the use of chemicals. It may be possible to chemically stimulate the brain to construct useful representations of information. The 1960s experiments with LSD (lysergic acid diethylamide) cast a dark shadow over these kinds of experiments, but new research into chemically enhancing athletic and soldier performance is bringing these ideas back into vogue. Just as caffeine can enhance alertness and reaction time, other chemicals may improve understanding of information that is part of combat operations or that drives training for life-threatening missions.
VE APPLICATIONS

The term "serious games" is often used to describe the application of game technologies to military or industrial problems. This has been a useful term, but it will become archaic as the distinction between game technology and nongame applications fades away. Computer chips and graphics cards are not referred to as "entertainment chips" or "serious graphics cards." They are just tools for constructing useful applications. The same will occur with serious games. All industries will have VEs that meet their needs, just as they have specialized computing and communications devices today (Bergeron, 2006; Lenoir, 2003). Since 1992, the military has identified its simulation tools as live, virtual, or constructive. This delineation has highlighted the computational and conceptual limitations in representing both breadth and depth in a VE. "Virtual" refers to the use of simulated objects by real humans, and these systems have typically represented small areas with few objects at relatively higher levels of detail. "Constructive" refers to the use of simulated objects by simulated people, and these systems have represented very large areas and many objects with relatively less detail. In the years since these definitions were standardized, advances in computation have enabled the creation of many systems that combine two or more of these domains. Further advances in computation, communication, and conceptualization will allow us to stretch the boundaries of these domains so that there is little difference between them. In the future, constructive and virtual will refer only to the view that is being presented to the human or to an artificial intelligence, not to any inherent limitation of the models that are driving the virtual world. There have been three distinct generations of "constructive simulation," and perhaps future VEs will create a fourth. The first was the use of sand tables and miniature figures, essentially a scaled representation of the real battlefield. The
second was the paper board game that allowed greater abstraction and additional rigor in the rules and mechanics of behavior. The third was the computerization of the war game that extended the algorithms to the limits of the computer rather than the limits of a human player (Allen, 1989; Perla, 1990; Dunnigan, 1993). Advances in VEs will enable the creation of a constructive simulation that is just as detailed as any virtual simulator if so desired. It will employ aggregation and abstraction as a useful metaphor rather than as a core design limitation driven by limited computational power. In the "live" domain there will be VEs embedded in real equipment just as two-dimensional map displays exist in equipment today. These VEs will be integrated into the control screens and head-mounted displays that are currently portals into flat, disassociated, two-dimensional data. Rather than seeing the battlefield from a top-down, two-dimensional view, the operators will be able to see it in three dimensions from any angle that they find useful. This is a hugely powerful paradigm and carries so many potential options that the challenge will be in determining where the valuable views lie, not in rendering and animating them for the operator. In this world, there will be little difference between the objects that come from a simulation and those that exist in the physical world. All of them will be seamlessly integrated into a VE.
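The claim that "virtual" and "constructive" become view-level distinctions can be made concrete: a single entity record in the shared world model can back both a high-detail simulator view and an aggregated staff-display view. The sketch below is a minimal illustration of that separation; the class and field names are assumptions for this example, not any fielded architecture.

```python
from dataclasses import dataclass

@dataclass
class Entity:
    """A single battlefield object in the shared world model. The same
    record backs every presentation; nothing about it is inherently
    'virtual' or 'constructive'."""
    entity_id: str
    kind: str                          # e.g., "tank", "rifle_squad"
    position: tuple                    # (lat, lon, elevation)
    controlled_by: str                 # "human" or "ai"

def virtual_view(e: Entity) -> str:
    """High-detail, first-person style presentation for a human in a simulator."""
    return f"render 3-D model of {e.kind} {e.entity_id} at {e.position}"

def constructive_view(e: Entity) -> str:
    """Aggregated, map-style presentation for a staff display or an AI."""
    return f"plot icon for {e.kind} {e.entity_id}"

tank = Entity("A-11", "tank", (51.1, 12.7, 0.0), controlled_by="ai")
# The distinction lives in which view is requested, not in the entity itself:
print(virtual_view(tank))
print(constructive_view(tank))
```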
ARMY MISSIONS

VEs are supplemented with physical and cognitive models, software management and control tools, and external interfaces to operational devices to create simulation based training systems. As the nature of the army mission has changed, simulations and VEs have been challenged to represent new missions, new threats, and new tactics that capture the essential elements of the real world and can be used to teach this reality to humans. We have emerged from four decades of a Cold War in which most military training focused on large combat operations that occurred on specified battlefields where all participants were expected to be combatants. More recent missions have focused on small units in an urban environment where they must perform humanitarian operations, search and reconnaissance, facility defense, and combat operations all on the same day. This has created a situation in which our VEs and simulations are expected to represent a much more diverse set of objects and interactions. These can no longer be "combat only" models of the world. The focus of current and future missions appears to be on much smaller areas, making it both possible and desirable to deliver very high levels of detail in the area of operations. This detail calls for a VE that can re-create combat operations in a single city block, but also allow personal communications with the populace to build an understanding of the societal factors surrounding the military operations. These factors will trigger important actions and reactions as the simulation progresses. Many current simulation models focus on immediate action and immediate consequences. In most cases, these actions and/or consequences are discrete and do not influence actions between objects in the future.
While the military simulation community has been wrestling with models of information processing and human reaction, it has just begun to explore the richness of person-to-person relationships and their influence over different groups within a population. There is a great deal of "soft social science" that needs to be incorporated into VEs in the future. Accurate physics models of weapon penetration and aircraft lift remain important, but a useful understanding of the urban battlefield is driven by human interactions, motives, and group dynamics. In the past, military simulation systems have been able to focus on the universal and verifiable behavior of the physical world. But models of personal relationships and group behavior are highly cultural, social, and geographical. Huntington (1996) has suggested that all future competitions will be based on seven unique cultures that have emerged in the world: Western, Orthodox, Latin American, Muslim, Hindu, Sinic, and Japanese. Rather than a bipolar world threatened with traditional combat, we live in a more complex world in which the confrontations may be focused in the political, military, economic, social, infrastructure, or information domains and involve seven different and powerful cultures. Building VEs that are able to represent such a diverse world accurately and effectively will be a significant challenge and a significant focus in the future.

ADVANTAGES

VEs that are created electronically, biologically, or chemically all present significant advantages for military operations and training. They create an improved space for accessing, absorbing, understanding, and applying information. These are all information based terms that create a pattern very similar to the observe, orient, decide, and act loop that was first proposed by Col. John Boyd (Coram, 2004). The advantages to be gained are so significant that VEs will continue to grow in importance and in the breadth of their application. Specialized versions of VEs will be used for hundreds of different applications, each with a unique focus, but built on a core set of technologies. As the limitations of computer and communication technology fall away and our level of expertise in creating and manipulating these environments increases, VEs will appear in all types of consumer and military systems to aid people in making better decisions and taking more appropriate actions. VEs combine technologies that have been maturing in the training, entertainment, computer science, and communications domains for several years and have reached a point at which they can be adopted by hundreds of commercial and government organizations.

REFERENCES

Allen, T. (1989). War games. Berkeley, CA: Berkeley Publishing Group.
Bergeron, B. (2006). Developing serious games. Boston: Charles River Media.
Coram, R. (2004). Boyd: The fighter pilot who changed the art of war. New York: Little, Brown, & Co.
Davis, P. K. (1995). Distributed interactive simulation in the evolution of DoD warfare modeling and simulation. Proceedings of the IEEE, 83(8), 1138–1155.
Dodsworth, C. (1998). Digital illusion: Entertaining the future with high technology. New York: ACM Press.
Dunnigan, J. (1993). The complete wargames handbook: How to play, design, and find them. New York: William Morrow.
Gladwell, M. (2002). The tipping point: How little things can make a big difference. New York: Little, Brown, & Co.
Huntington, S. P. (1996). The clash of civilizations and the remaking of world order. New York: Touchstone Press.
Lenoir, T. (2003). Programming theatres of war: Gamemakers as soldiers. In R. Latham (Ed.), Bombs and bandwidth: The emerging relationship between information technology and security (pp. 175–198). New York: The New Press.
Miller, D. C., & Thorpe, J. A. (1995). SIMNET: The advent of simulator networking. Proceedings of the IEEE, 83(8), 1114–1123.
Perla, P. (1990). The art of wargaming. Annapolis, MD: Naval Institute Press.
Singhal, S., & Zyda, M. (1999). Networked virtual environments: Design and implementation. New York: ACM Press.
Smith, R. (2006). Technology disruption in the simulation industry. Journal of Defense Modeling and Simulation, 3(1), 3–10.
Chapter 36
FUTURE AIR FORCE TRAINING¹

Daniel Walker and Kevin Geiss

¹Disclaimer: This manuscript reflects independent views of the authors and is not an official opinion of the U.S. Air Force. Approved for public release WPAFB 08-0027.

Aviation pioneer Wilbur Wright (1900) stated, "It is possible to fly without motors, but not without knowledge and skill." Our vision for the future emphasizes the capabilities of operators, not simply the hardware that confines them. Future air force training will be driven by expected operational requirements involving personnel extensively connected to their weapon systems, other operators, and coalition forces in a global environment. From a research perspective, we observe weapon systems that are increasingly capable and complex. Reflecting these advances, the future of air force training is live, virtual, and constructive (LVC): "live" personnel and equipment, "virtual" simulated adversaries and environments, and "constructive" computer-generated entities.

OPERATIONAL CONTEXT

Operational Roles, Policy, and Doctrine

Transformation, evolution, adaptation—the operational roles of air forces are adjusting along with the nature of conflict. The Department of Defense (2006) Quadrennial Defense Review reemphasized the necessity and value of transforming training to account for the shifts from conventional or symmetric conflicts to asymmetric and unconventional engagements that go beyond traditional kinetics based operations and now focus on such areas as cyber warfare and humanitarian operations. Robust training systems must accommodate future weapon systems along with the makeup and tactics of future adversaries in diverse global operational contexts. The exact makeup of adversaries 20 years hence is unknown, but we do know that technology will advance the capabilities of our forces, as well as those of our adversaries. To enable effective operations, training methodologies require incorporation of advances in both technology and doctrine. In this context, air force personnel participate in military operations through a variety of weapon systems beyond
inhabited aircraft to include autonomous and semi-autonomous aircraft, space, missile, and ground systems. These systems are further functionally integrated with special operations, stability operations, and information operations. Distributed mission operations are discussed elsewhere in Volume 3 (see Andrews and Bell, Volume 3, Section 1, Chapter 8) and reflect a key evolution in the operational framework.

System Attributes and Capabilities

Military weapon systems continue to separate individual operators from ultimate mechanical events. Pilots no longer push a stick connected to a wire for manipulating a wing aileron, but rather they manipulate electronic interfaces sending digital commands to control uninhabited vehicle systems. The essential competencies required for such tasks may differ. However, for some systems, physical separation is mirrored by cognitive integration that embeds humans in technological systems. The manner by which work is divided between human and machine is increasingly complex. A recent National Research Council report (2008, p. 30) asserts that for "today's aircraft" it is now impossible to precisely assign "the percentage of responsibility to humans or machines." Thus, careful analysis is required to determine for which tasks the human must be trained and how the human is integrated into the virtual environment (see Barnett, Volume 3, Section 1, Chapter 3). Air force weapon system technologies are becoming so diverse and powerful that training, testing, and skill maintenance will increase demands on training and simulation systems. Ackerman (2006) describes one such weapon system, the F-35 joint strike fighter (JSF). JSF targeting capabilities utilize substantial sensor and information fusion, including electro-optical targeting and scanned array radar. The JSF tracks all aircraft within a 10-mile radius and integrates information from 1,000 independent scanning radar arrays, which may be tracking unmanned aerial vehicles (UAVs), missiles, or moving ground targets. To support the currency needs of the JSF operator, training systems must provide innumerable variants on key dimensions (for example, weather, adversaries, and weapon systems). The JSF ultimately requires pilots to take on the additional duty of "chief information officer." Through interacting with other JSFs, one aircraft has the ability to perform a mission by relying substantially on information provided by a second aircraft (Ackerman, 2006).

Resource Constraints

Two main resource constraints are driving greater implementation of virtual and constructive simulations for training. First, with weapon system capabilities such as the JSF's, it is difficult to put enough real assets in play to fully train a pilot. There is neither sufficient airspace nor enough capable live adversaries routinely available to enable training operations for the pilot of such aircraft, and certainly not for a whole squadron. Second, military operations are costly and fuels are
precious commodities. Using funds or fuel for training rather than operations becomes a difficult decision. The cost-benefit/effectiveness analysis presented by Moor, Andrews, and Burright (2000) indicates that simulator based aircrew training is a valid alternative for the development of training strategies and requirements in light of these resource constraints.

APPROACH

Creating the training tools and strategies to improve warfighter performance using LVC operations demands development in five areas: competency based assessment, performance measurement, continuous learning, cognitive modeling, and immersive environments.

Competency Based Assessment

Consistent with the other armed services, air force personnel receive primary training via traditional formal instruction courses. Advanced and continuation training is often administered using different approaches. For example, in the Ready Aircrew Program (RAP) discussed by Colegrove and Bennett (2006), aircrews train with a frequency and event based system to maintain proficiency through specified numbers and types of events. One consequence of a RAP for performance improvement or maintenance is its limited assessment capabilities. The assessment is conducted in two often uncorrelated parts. The primary assessment is conducted by tracking event numbers and frequency. Personnel could be deemed not mission ready by virtue of completing too few events or by exceeding a predetermined period between events. The second part of the assessment is a subjective evaluation of crew member mission competency. If the required events are not performed well, or the crew member appears incapable of succeeding in a designated mission, a supervisor could disqualify him or her. Simply performing the required events in the appropriate time period may be recorded as satisfactory training. A crew member might be deemed mission ready without any link to qualitative assessment, since poor performance is not tracked by this method. In practice, these subjective assessments are not regularly conducted, and other methods, such as supervisor observation or self-reporting, are required to validate a need for further training. An alternative for aircrews is to use a competency based system versus simply accomplishing a required number of events. Competency based assessment requires detailed mission essential competency² (MEC) evaluation, which is being instituted for many aircrew combat specialties. The MEC process determines the knowledge and skills, not just tasks, required for proficiency in a mission. Research presented by Colegrove and Bennett (2006) showed that MEC based training produces favorable results. For example, one aerial defense scenario study of MEC based training effectiveness showed that 63 percent fewer enemy bombers reached their target, 24 percent more enemy fighter aircraft were killed, and friendly aircrews suffered 68 percent fewer simulated mortalities. Ensuring aircrews can perform such skills requires improving the measurement system.

²The phrase mission essential competency, mission essential competencies, and associated acronyms have been service marked. Air Combat Command, Air Force Research Laboratory, The Group for Organizational Effectiveness, Inc., & Aptima, Inc. are the joint owners of the service mark.

Performance Measurement

Technological advances, fielded and under development, show promise for capturing the objective metrics that enable meaningful evaluation and tailored training. One advancement is evident in simulator based training; high fidelity simulation testbeds collect over 750 different performance parameters every 50 milliseconds. This dense data environment provides one component of a performance evaluation and tracking system. Schreiber, Watz, Neubauer, McCall, and Bennett (2007) describe this system as an emerging set of performance measurement strategies and tools to support competency based continuous learning. It includes subject matter expert observer assessments using behaviorally anchored grade sheets and objective measurements based on data from simulation or live operations. Robust and extensively instrumented live training would enable data collection similar to simulator environments. Important parts of this future environment for collecting objective performance data are available within the weapon system, but not generally transmitted on instrumented training ranges. Efforts are under way to develop live, virtual, and constructive techniques to capture those data, such as internal cockpit switch positions. Gathering comparable performance data from virtual and live experiences will enable seamless training for aircrews irrespective of domain. When procedures are in place to gather detailed performance data in all training events, it will be possible to more efficiently tailor training to specific individuals rather than follow the "one size fits all" approach of many continuation training regimens. For further discussion of aviation training, see Schnell and Macuda, Volume 3, Section 1, Chapter 12.

Continuous Learning

A recent Defense Science Board report (Department of Defense, 2003) recommended that traditional schoolhouse training be replaced with continuous training employed on-site with the individual. LVC environments introduce the concept of transparent venues, with the added opportunity that such tools could support both training and operations, allowing personnel to take advantage of nonmission time for training. The continuous learning strategies we foresee go beyond simple "on-the-job training" and should become a standard feature of military systems. Conventional job based training reflects learning during the course of normal duties rather than a situation where the operator is unable to discern training events from normal mission events. Admittedly, even laboratory experiments
have not achieved completely seamless integration of simulation and operations for complex weapon systems, yet the value of continuous training and performance assessment is apparent. Hancock and Hart (2002) discuss one simple example of the integration of training, competency assessment, and operations. The Transportation Security Administration uses the Threat Image Projection software program, in which the performance of individual screeners in detecting weapons and explosives by X-ray imaging is evaluated continuously. This approach also allows the system to integrate up-to-date intelligence on specific threats. Likewise, the power of constructive simulations would allow training system designers to incorporate the latest information (for example, threats or terrain data) into training.
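The Threat Image Projection approach reduces to a simple pattern that generalizes to LVC training: occasionally inject a synthetic threat into the normal work stream, score each response, and track a rolling detection rate against a proficiency threshold. The sketch below illustrates the pattern; the injection rate, window size, and retraining threshold are illustrative assumptions, not actual Transportation Security Administration parameters.

```python
import random
from collections import deque

class ContinuousAssessor:
    """Rolling assessment in the spirit of Threat Image Projection:
    synthetic threats are seeded into the normal work stream and the
    operator's rolling detection rate is tracked. Injection rate, window,
    and threshold below are illustrative assumptions."""

    def __init__(self, window: int = 50, retrain_below: float = 0.8):
        self.results = deque(maxlen=window)   # 1 = detected, 0 = missed
        self.retrain_below = retrain_below

    def should_inject(self, p: float = 0.05) -> bool:
        """Decide whether to seed a synthetic threat into the next item."""
        return random.random() < p

    def record(self, detected: bool) -> None:
        """Score the operator's response to an injected threat."""
        self.results.append(1 if detected else 0)

    def needs_retraining(self) -> bool:
        """Flag the operator only once the rolling window is full of evidence."""
        if len(self.results) < self.results.maxlen:
            return False
        return sum(self.results) / len(self.results) < self.retrain_below
```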
Cognitive Models

Cognitive models for replicates and embedded tutors are additional elements for enhancing mission-effective performance training. Cognitive model products are projected to shape service training. One approach is to develop models for performance prediction. Research models can account for the effect of training frequency on memory and may allow commanders to predict performance for specific training regimens. Jastrzembski, Gluck, and Gunzelmann (2006) propose that these predictions could then be used to determine effective application of limited training resources while having the greatest impact on improving individual crew member performance. Ball and Gluck (2003) present one pathfinder effort for advancing computational replicates: the development of a Predator UAV pilot computational model. The researchers first created a synthetic task environment (STE) tool to simulate operation of the Predator aircraft. The STE includes aircraft performance simulation and three synthetic tasks: basic maneuvering, landing, and a reconnaissance problem requiring sensor positioning over a target within given constraints (for example, wind, cloud cover, and flight path restrictions). In this STE, various cognitive models were developed in an effort to replicate human performance in dynamic and complex tasks. As this foundational work is expanded, future training strategies will include models of synthetic adversaries and allies. Well-developed models will provide a richer training experience than current rule based constructive simulations. Synthetic adversaries and allies will continue to be an important part of air force training for a number of reasons. As discussed above, modern weapon systems need large, complex scenarios to fully exercise their capabilities. Live adversaries and allies are expensive and less available due to shrinking force structure. Also, peacetime training restrictions (for example, range, space, and speed) decrease the effectiveness of live adversaries when matched against our most advanced systems. As Gluck, Ball, and Krusmark (2007) contend, computational replicates, when fully developed and deployed, offer greater flexibility, as they can be modified more cheaply and perhaps more effectively than hardware-intensive live weapon systems.
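The memory models this line of research builds on make the practice-scheduling intuition computable. A common ACT-R style quantity is base-level activation, B = ln(sum over j of t_j^(-d)), where t_j is the time since the j-th practice event and d is a decay rate (0.5 is the conventional ACT-R default). The sketch below uses that formula to compare two training schedules; treating the raw activation value as a readiness score is an illustrative simplification of the cited prediction work, not its actual method.

```python
import math

def base_level_activation(practice_times_s: list,
                          now_s: float, decay: float = 0.5) -> float:
    """ACT-R style base-level learning: each past practice event contributes
    t^(-d), so skills decay with time but are stabilized by practice. The
    decay value 0.5 is the conventional ACT-R default; using activation
    to schedule refresher training is an illustrative assumption."""
    return math.log(sum((now_s - t) ** -decay for t in practice_times_s))

day = 86400.0
massed = [0, 1 * day, 2 * day, 3 * day]    # four sessions on consecutive days
spaced = [0, 7 * day, 14 * day, 21 * day]  # four sessions a week apart

# Predicted activation 30 days after the first session:
print(base_level_activation(massed, 30 * day))  # lower: early practice has decayed
print(base_level_activation(spaced, 30 * day))  # higher: the model favors spacing
```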
Figure 36.1. This graphic depicts three elements of future air force training technology systems—live, virtual, and constructive.
Immersive Environments

As weapon systems continue to diminish the barriers between human and machine, training systems must follow suit. Continued advancements in training technology toward LVC environments (see Figure 36.1) can enhance immersion through sensory fidelity. Maximizing this fidelity by using more operational equipment may obscure the perception of an active training environment versus an actual mission. Mixing live and simulated entities in the same domain can challenge the situational awareness of participants, although preliminary research has discovered effective mitigation techniques. For instance, Hughes, Jerome, Hughes, and Smith (Volume 3, Section 2, Chapter 25) discuss aspects of integrating terrain data in simulations. Other concerns relate to operating with differing security levels, simulation hardware, and fidelity requirements. Governments and industry will have to continue to work toward standards for data protocols and multilevel security in order to realize an effective coalition immersive environment.
CONCLUSION

We have seen increasingly immersive operational environments, such as the merged information stream for the JSF pilot. Because of the data sharing capabilities of the JSF (sensor data and information provided from one platform to another), the pilot may never actually see the target before or after weapon deployment. Thus, in a training scenario, a virtual adversary may be inserted that cannot be distinguished from a live asset. Conceptually, what we are describing is a convergence of perceived experience: a training environment that increasingly incorporates both constructive and real entities, and likewise, a real world activity that is integrated with simulation and training-specific tasks. The future of air force training will be enabled by continued advancements in live, virtual, and constructive environments.
REFERENCES

Ackerman, N. D. (2006, October). Strike fighter partners with pilot. SIGNAL Magazine. Retrieved November 11, 2007, from http://www.afcea.org/signal/articles
Ball, J. T., & Gluck, K. A. (2003). Interfacing ACT-R 5.0 to an Uninhabited Air Vehicle (UAV) Synthetic Task Environment (STE). Proceedings of the Tenth Annual ACT-R Workshop and Summer School. Pittsburgh, PA.
Colegrove, C. M., & Bennett, W., Jr. (2006). Competency-based training: Adapting to warfighter needs (U.S. Air Force Research Laboratory Publication No. AFRL-HE-AZ-TR-2006-0014). Retrieved October 30, 2007, from http://handle.dtic.mil/100.2/ADA469472
Department of Defense. (2003). Defense Science Board Task Force on Training for Future Conflicts (Final report).
Department of Defense. (2006). Quadrennial Defense Review report. Retrieved October 30, 2007, from http://www.defenselink.mil/qdr/
Gluck, K. A., Ball, J. T., & Krusmark, M. A. (2007). Cognitive control in a computational model of the Predator pilot. In W. Gray (Ed.), Integrated models of cognitive systems (pp. 13–28). New York: Oxford University Press.
Hancock, P. A., & Hart, S. G. (2002). Defeating terrorism: What can human factors/ergonomics offer? Ergonomics in Design, 10, 6–16.
Jastrzembski, T. S., Gluck, K. A., & Gunzelmann, G. (2006). Knowledge tracing and prediction of future trainee performance. Paper presented at the 2006 Interservice/Industry Training, Simulation, and Education Conference, Orlando, FL.
Moor, W. C., Andrews, D. H., & Burright, B. (2000). Benefit-cost and cost effectiveness systems analysis for evaluating simulation for aircrew training systems. In H. F. O'Neil & D. H. Andrews (Eds.), Aircrew training and assessment (pp. 291–310). Mahwah, NJ: Lawrence Erlbaum.
National Research Council of the National Academies. (2008). Human behavior in military contexts. Washington, DC: National Academies Press.
Schreiber, B. T., Watz, E., Neubauer, P. J., McCall, J. M., & Bennett, W., Jr. (2007). Performance evaluation tracking system. (Available from AFRL/HEA, 6030 South Kent Street, Mesa, AZ 85212)
Wright, W. (1900, May 13). Personal letter to Octave Chanute.
Chapter 37
FACTORS DRIVING THREE-DIMENSIONAL VIRTUAL MEDICAL EDUCATION

James Dunne and Claudia McDonald
A PERFECT STORM

A perfect storm of adverse factors compels the search for innovative means to provide experiential learning rooted in critical thinking, not only for degree based education, but also as continuing education for medical practitioners. Research and development of three-dimensional (3-D) virtual learning platforms coordinates an interdisciplinary response to these adverse factors, which include widespread medical error, imminent shortages of medical personnel, a shift driven by demographics in the nature of medical care, rapid evolution in military medicine, and challenges posed by mass-casualty terrorism. The Institute of Medicine of the National Academy of Sciences estimated in 1999 that 44,000 to 98,000 people die annually from avoidable medical errors and that costs associated with these deaths ranged from $17 billion to $29 billion (Kohn, Corrigan, & Donaldson, 1999, pp. 1–2). The report distinguished between latent errors, whole classes of mistakes waiting to happen due to defects in the complex system of medical care, and active errors, individual miscues, most of which stem from system defects. The report calls for systemic reform in order to solve latent errors rather than enhanced vigilance to reduce errors at the point of contact. In essence, according to the report, fixing the complex system of medical education in America will reduce the overall number of errors (Kohn et al., p. 146). The institute's call for broader and deeper training and continuing education carries with it a degree of difficulty compounded not only by "a broadening array of topics," but also by other factors, including shorter hospital stays and residents' workweeks, which are reducing clinical training opportunities and expertise development (for example, Verrier, 2004, p. 1237); and baby-boom retirements from academic faculties and other demographic factors, which are creating
looming shortages of medical personnel, especially physicians and nurses (Rasch, 2006, p. 3). Modeling and simulation are generally recognized in the medical community as remedies for the shortcomings of degree based health-care education curricula that do not provide sufficient clinical experience. A U.S. Food and Drug Administration panel, for example, "has recommended the use of virtual reality simulation as an integral component of a training package for carotid artery stenting," according to the New England Journal of Medicine (Reznick & MacRae, 2006, p. 2667). Reznick and MacRae conclude, "Given the advances in technology and the accruing evidence of their effectiveness, now is the time to take stock of the changes we can and must make to improve the assessment and training of surgeons in the future" (p. 2668).
SHIFTING SANDS: WARFARE AND TERRORISM

The shape and nature of military conflict has changed dramatically, from traditional "theater" engagement to what Bilski et al. (2003, p. 814) call "the ever-changing nonlinear battlefront." A new warfighting tactic developed since the mid-1990s, expeditionary maneuver warfare, has driven military medicine to develop the forward resuscitative surgical system (FRSS): mobile, highly trained teams of surgeons and paramedical personnel who establish trauma treatment facilities within 10 miles of enemy contact (Chambers et al., 2005, pp. 27–30). Such forward-area medical intervention is necessary because expeditionary tactics and warfighting often increase evacuation times to major medical facilities; moreover, surgical practice in the FRSS differs significantly from that of institutional settings, wherein surgical care seeks not only to keep patients alive through "damage control," but also to effect definitive treatment of injuries. Terrorism has evolved to become a military tactic, as well as a means of attacking and destabilizing civilian populations. The use and development of improvised explosive devices and other "weapons of opportunity" (Ciraulo & Frykberg, 2006, p. 943) create rapidly evolving generations of blast wounds, including soft-tissue damage and amputations, for which most military and civilian physicians are not prepared by clinical experience. Terrorism also has struck the United States. Studies since the 1995 bombing of the Alfred P. Murrah Federal Building in Oklahoma City, Oklahoma, and the 2001 jetliner hijackings with devastating effects in New York City and Washington, D.C., show that most physicians are not well prepared for mass-casualty incidents (for example, Treat et al., 2001; Galante, Jacoby, & Anderson, 2006). There have been significant expenditures since 9/11 on training first responders, but there has not been a corresponding provision for training physicians to deal with mass casualties stemming from a terrorist attack. Major training issues for civilian and military medicine flow from these developments as global conflict rooted in terrorist attacks looms on the horizon. Routine training required for effective FRSS operations, for example, is simply not available through most military medical institutions (Schreiber et al., 2002,
p. 8). Mass-casualty incidents are rare enough in the United States that few physicians have clinical experience in dealing with the triage, let alone treatment, of complex blast injuries (Ciraulo & Frykberg, 2006, p. 948). Few civilian medical facilities have either the equipment or clinically trained staff to handle a major bioterroristic attack, the victims of which may not begin appearing at hospitals and clinics for weeks after their exposure (Treat et al., 2001, p. 563).

Developing clinical education in virtual space is consistent with the military's long-standing commitment to simulation as an effective part of its training mix. Zimet, Armstrong, Daniel, and Mait (2003) observe:

With considerable assistance from the electronic game and entertainment industry, coupled with virtual reality environmental trainers, training systems correspond with actual combat to an unprecedented degree. Training software now is embedded in actual equipment, allowing continuous training on station. In addition, warfare itself has moved from the mostly physical to the mostly mental demands of information management and decision-making; thus, virtual training particularly approaches operational conditions in information age warfare. (¶36)
A NEW LEARNING PARADIGM: VIRTUAL MEDICAL EDUCATION

Virtual medical education research and development draws on expertise in the fields of medicine, medical education, computer science, software engineering, physics, computer animation, art, and architecture, collaborating with the commercial gaming industry to produce research based virtual learning platforms built on cutting-edge computer gaming technology. Virtual medical education is an initiative for improving the assessment and training of future medical and health-care personnel. Three-dimensional virtual environments can be used for clinical learning by all health-care disciplines to supplement traditional didactic materials and methods by providing iterative clinical training that poses no threat to patients, even as it enhances critical thinking. Such learning platforms also take into account the remarkable development of computer technologies as tools for teaching the "Net Generation" born since 1982, of which 89.5 percent are computer literate, 63 percent are Internet users, and 14.3 percent have been using the Internet since age four, according to the U.S. Department of Commerce (2002, p. 43).

Virtual medical education also can hone clinical skills for medical cases not usually encountered in actual environments, for example, avian flu epidemics and attacks of bioterrorism. Clinical experience for these events must be simulated to be learned; in theory, 3-D virtual space provides the most effective means for delivering asynchronous, iterative, clinical training, anytime and anywhere, as an in-depth complement to traditional didactic materials.

Three-dimensional virtual learning platforms, armed with intelligent tutors utilizing artificial intelligence to monitor user performance, can provide immediate feedback in the form of instructional material simultaneous with the user's
encounter with a virtual case. Case-editing systems with user-friendly interfaces can provide instructors with the flexibility to design cases consistent with an institutional setting and with educational goals rooted in problem based learning and the procedural requirements of credentialing agencies. Virtual environments, moreover, are customizable and can be authored, as cases are, to be consistent with site specifics.

The development of three-dimensional virtual simulation comes at a moment in the history of American health care when the paradigm is shifting from cure to care as the population ages. The current system is built on the concept of cure, but that concept does not reflect current U.S. demographic data and a society dominated by aging baby boomers living longer than any previous generation due to pharmaceutical intervention, surgical advances, and the successful treatment of chronic diseases. U.S. medicine is evolving a new concept of care that has implications for medical education and underscores the need for virtual learning space. Medical practice in the future will become more complex as the concept of care for older patients matures and technology continues to advance. Virtual simulation of such complexity makes possible the kind of education and training that will rise to these challenges.

Coalescing adverse factors in health-care education—fewer clinical opportunities, less time for clinical training, and declining medical and nursing school admissions—compel the search for alternative means not only of degree based education, but also of continuing education for practitioners. Virtual medical learning platforms may provide the cross-disciplinary expertise and resources required to meet these looming health-care crises. Virtual medical education research may have implications for disseminating cutting-edge medical knowledge to economies and cultures throughout the world that have not developed sufficient infrastructure to provide adequate clinical experience in traditional curricular formats. Virtual medical learning platforms are conveniently deliverable by various electronic media, playable on sufficiently powered and configured computers, and quickly utilized through the application of user-friendly tutorials and training routines.
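As a concrete illustration of the intelligent-tutor idea described earlier in this section, the following sketch monitors a student's actions against an expected clinical protocol and returns immediate feedback. It is a minimal, hypothetical example: the case structure, step names, and feedback messages are invented for illustration and are not those of any platform discussed in this chapter; a real system would couple such logic to physiological simulation and a case-authoring tool.

```python
# Minimal sketch of a rule based intelligent tutor for a virtual case.
# All step names and messages here are hypothetical.

EXPECTED_PROTOCOL = [
    "check_airway",
    "control_bleeding",
    "start_iv_fluids",
    "order_imaging",
]

class CaseTutor:
    def __init__(self, protocol):
        self.protocol = protocol
        self.next_step = 0
        self.log = []

    def observe(self, action):
        """Compare a student action against the expected protocol and
        return immediate feedback, simultaneous with the encounter."""
        self.log.append(action)
        expected = self.protocol[self.next_step] if self.next_step < len(self.protocol) else None
        if action == expected:
            self.next_step += 1
            return f"Correct: '{action}' is the appropriate next step."
        if action in self.protocol[self.next_step:]:
            return f"Out of order: '{action}' comes later; address '{expected}' first."
        return f"'{action}' is not part of this protocol; review the case materials."

tutor = CaseTutor(EXPECTED_PROTOCOL)
print(tutor.observe("check_airway"))   # correct first step
print(tutor.observe("order_imaging"))  # premature step triggers coaching
```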
THE WAY FORWARD: CENTERS FOR VIRTUAL MEDICAL EDUCATION

Sophisticated learning in virtual space is, so far, just a theory. To be credible, virtual medical education must be based on rigorous research and testing to establish its validity and reliability. Centers for virtual medical education, such as one proposed at Texas A&M University–Corpus Christi, would provide cross-disciplinary expertise and resources to educational, governmental, and business entities engaged in meeting looming health-care crises with three-dimensional virtual learning platforms that are iterative, providing unlimited, repeatable clinical experience without risk to patients; portable, for training anywhere there is a computer; asynchronous, for training anytime; and immersive, providing first-person experience leading to critical thinking and practical knowledge.
Virtual learning platforms must be grounded in research findings and equipped with tools and generators that enable clients to author their own cases and create their own scenarios within a variety of virtual environments. The platform in development at Texas A&M University–Corpus Christi, for example, is being rigorously researched and developed and extensively tested for reliability and validity, which may be expected to yield a product for delivering curricula with confidence for medical and other health professions.

Virtual medical education research must be continuous as products are refined with successive generations of computer electronics, especially game-development technologies. Sophisticated learning platforms currently are pushing commercial developers to produce true-to-life images not previously achieved for entertainment purposes. Total fidelity in replicating physiological and pathophysiological states in virtual space is key to the success of these technologies as pedagogical and training tools, for only then can students' clinical experiences in the learning platform be truly immersive and true to life.

Virtual medical education also must develop an entrepreneurial dimension through collaboration with other entities that will generate revenue to support continuing research. Developing three-dimensional virtual learning platforms is expensive, requiring from-scratch funding of no less than $30 million. The good news is that, as sophisticated learning platforms are developed, subsequent refinement will not require reinventing the wheel. Virtual medical education researchers will license their proprietary software to others for far less than it would cost to stake development from scratch. Over the long term, case-development costs may be expected to decrease due to economies of scale, even as clinical cases become more complex and the demand for sophisticated visual fidelity pushes beyond current gaming industry standards; in the short term, however, case-development costs are likely to remain relatively high.

In the beginning—and we are at the very beginning of developing these learning tools—the U.S. military may be key in funding research programs leading to the development of valid, reliable, sophisticated learning platforms. The research and development field of virtual medical education would provide a pool of training resources for military medical training, professional certification and credentialing, professional development, graduate medical education, and improved joint-force military deployment. By their very nature, such training products will produce efficiencies of operations and economies of scale for joint military medical activities, serving as a source and distribution point for clinical training materials transmitted electronically to joint operating forces anywhere in the world.
CONCLUSION

Three-dimensional virtual learning platforms are the right thing at the right time in medical education—technological development meeting critical needs and generating pioneering research. Virtual medical education will coordinate a diverse array of academic disciplines in an interface with government and
business interests toward understanding and validating the dynamics of learning in virtual space, which will benefit not only medical education, but higher education in general. Students will learn, but more than that, new opportunities and career paths will evolve as the theoretical benefits of learning in virtual space are better understood and its capabilities—and limitations—become clear. The gaming industry recognizes that so-called "serious games" are on the growing edge of this dynamic sector, but it will take collaboration with the medical academic community to substantiate claims that learning platforms in virtual space can provide valid and reliable educational strategies for high level critical thinking and clinical skills.

Contributor: Ron George, Texas A&M University–Corpus Christi

REFERENCES

Bilski, T. R., Baker, B. C., Grove, J. R., Hinks, R. P., Harrison, M. J., Sabra, J. P., et al. (2003). Battle casualties treated at Camp Rhino, Afghanistan: Lessons learned. The Journal of Trauma, Injury, Infection and Critical Care, 54(5), 814–822.

Chambers, L. W., Rhee, P., Baker, B. C., Perciballi, J., Cubano, M., Compeggie, M., et al. (2005). Initial experience of U.S. Marine Corps forward resuscitative surgical system during Operation Iraqi Freedom. Archives of Surgery, 140(1), 26–32.

Ciraulo, D. L., & Frykberg, E. R. (2006). The surgeon and acts of civilian terrorism: Blast injuries. Journal of the American College of Surgeons, 203(6), 942–950.

Galante, J. M., Jacoby, R. C., & Anderson, J. T. (2006). Are surgical residents prepared for mass casualty incidents? Journal of Surgical Research, 132, 85–91.

Kohn, L., Corrigan, J., & Donaldson, M. (Eds.). (1999). To err is human: Building a safer health system. Committee on Quality of Health Care in America. Washington, DC: National Academy Press.

Rasch, R. F. R. (2006). Teaching opens new doors. Men in Nursing, 1(5), 29–35.

Reznick, R. K., & MacRae, H. (2006). Teaching surgical skills: Changes in the wind. New England Journal of Medicine, 355(25), 2664–2669.

Schreiber, M. A., Holcomb, J. B., Conaway, C. W., Campbell, K. D., Wall, M., & Mattox, K. L. (2002). Military trauma training performed in a civilian trauma center. Journal of Surgical Research, 104, 8–14.

Treat, K. N., Williams, J. M., Furbee, P. M., Manley, W. G., Russell, F. K., & Stamper, C. D. (2001). Hospital preparedness for weapons of mass destruction incidents: An initial assessment. Annals of Emergency Medicine, 38(5), 562–565.

U.S. Department of Commerce, Economics and Statistics Administration, National Telecommunications and Information Administration. (2002). A nation online: How Americans are expanding their use of the internet. Retrieved May 11, 2007, from http://www.ntia.doc.gov/ntiahome/dn/anationonline2.pdf

Verrier, E. D. (2004). Who moved my heart? Adaptive responses to disruptive challenges. Journal of Thoracic and Cardiovascular Surgery, 127(5), 1235–1244. Retrieved May 17, 2007, from http://dx.doi.org/10.1016/j.jtcvs.2003.10.016

Zimet, E., Armstrong, R. E., Daniel, D. C., & Mait, J. N. (2003). Technology, transformation, and new operational concepts. Defense Horizons, 31. Retrieved May 4, 2007, from http://www.ndu.edu/inss/DefHor/DH31/DH_31.htm
Chapter 38
VIRTUAL TRAINING FOR INDUSTRIAL APPLICATIONS

Dirk Reiners

The goal of this chapter is to provide the reader with an overview of industrial applications of virtual reality in general and virtual training specifically, and to discuss the issues industry faces in employing virtual training and how these can be overcome in the future.

Even though the initially envisioned application areas for virtual reality (VR) were in the entertainment and scientific, as well as military and medical, realms, industrial users were quick to explore how to take advantage of the potential of the technology. Ressler (1994) lists a number of early prototype and research applications focused on industrial uses of VR technologies. While design review and visualization have been the focus areas for industrial applications, training has always been a core area of interest for industrial users. However, actual productive acceptance of virtual reality has been rather limited. This chapter looks at the reasons for this phenomenon up until now and at how the landscape for training in VR will change in the future.

SPECIAL CASE: DRIVING SIMULATORS

Industry has been an avid and successful user of specific kinds of virtual environments for training for a very long time, in fact, long before the term was even created. These systems were just not called virtual environments (VEs), but simply simulators, primarily for such vehicles as cars, planes, and boats. Many different kinds of driving and flight simulators have been and are in productive, daily use, both by manufacturers, for example, driving simulators used to evaluate and train on such driver support systems as antilock brakes, and by users, such as airlines for pilot training. They share some of the characteristics of other VE training systems, so some of the issues described in the later parts of this chapter apply to them, too, but because of their wide availability and their development somewhat independent from the rest of the virtual environment continuum, they are not a part of this chapter.
APPLICATION AREAS

There have been many prototypes and developments for employing VE technology in an industrial context. Virtual worlds have significant advantages for industrial training in several areas. A major advantage is that training can be done without actual access to the physical facility for which the training is taking place. This is useful for facilities that have not been built yet, allowing a trained team to be ready by the time the facility becomes operational, for example, training an assembly crew before the factory in which assembly will take place has been built. It is also useful for facilities that cannot be taken out of production for training, for example, training a painter without having to shut down a full paint booth; for cases where training needs to be done in a separate geographical location from the final facility; or for cases where the object of the training is not available in the training location, for example, for expense or space reasons. Examples include maintenance for vehicles and aircraft (Kaewkuekool et al., 2002; Wenzel, Castillo, & Baker, 2002). Other advantages of virtual training can be reduced resource use and faster reconfiguration to provide training for different scenarios.

Even if the facility is available, using it for training could endanger the trainee, for example, in the operation of large machinery (Sanders & Rolfe, 2002), chemical plants (Nasios, 2001), or nuclear reactors (Kashiwa, Mitani, Tezuka, & Yoshikawa, 1995; Mark, 2004). Other scenarios that cannot effectively be trained in real life are realistic emergency situations, as by their very nature they can threaten the trainee (Nasios, 2001).

DEVELOPMENT OF INDUSTRIAL USE OF VIRTUAL REALITY

There are a number of different reasons for the slow adoption of VR into productive use in industry.

Cost of Entry

In the early days of VR (before 2000), projection and computing systems were a major investment, up to and above the million-dollar mark. High end graphics supercomputers from Silicon Graphics, Inc. (SGI) were the only practical option for driving a VR setup. Head-mounted displays were either of low quality, such as 640 × 480 pixels with less than 24 bit color resolution, or of very high cost, while low quality projection systems capable of displaying stereo images were not available at all, leaving only high end, expensive projectors.

An additional, very important cost factor was the size of the installations. Computer systems were the size of a large refrigerator, with corresponding noise and air conditioning requirements. Cathode ray tube based projectors were not much smaller and needed regular calibration by specialized staff to maintain good quality images. Many of these installations would not fit on a regular office building floor and needed special construction, which posed a significant extra expense for anyone looking at introducing them.
This effectively prevented anybody but very large companies from acquiring a system for experimentation and prototyping applications. Unless there was a very clear and immediate return on the substantial cost involved, it was next to impossible to argue for buying a virtual reality system. As a consequence, only large companies with correspondingly large research budgets and large prospective benefits considered and invested in virtual reality. For most of the 1990s, especially in Europe, this meant car and other vehicle companies. They were and still are in a business situation that has large potential benefits to be gained from the use of virtual environments, both for training and for more general design and planning purposes. Competition in the field is fierce, and time to market is an important factor for success. Being able to use a virtual model of a product or a virtual environment simulation of a scenario can significantly reduce the number of real models and prototypes (major time and cost factors) that need to be built, which can pay for even a high end virtual reality system fairly quickly.

Software Availability

A major hurdle for industrial adoption of virtual environments was and still is the availability of adequate and effective application software. The potential industrial users of VE technology are not software developers, and systems designed for software developers that are used very successfully in a computer science centric environment are not immediately useful in industrial applications. Larger companies worked with universities and hosted researchers or Ph.D. students to develop specialized systems for them, which also helped alleviate the need to have their own hardware setup. Some of these developers became part of the companies and continued to do specialized development, but overall, having to develop software is a major deterrent.

Thus for a long time VE usage was limited to large companies that could afford to fund specialized software development in an innovative field with a small number of available developers, mostly from the research community. Most companies did not want to fund the development of general software suites that could support various application areas of VE technology; very narrowly focused solutions were much less expensive and more effective. Actual uses were mainly limited to research prototypes or applications with a very limited scope.

Productive deployment was and is hindered by support issues. Research organizations and universities are great partners for developing prototypes and new technologies, but productive use needs constantly available support, user service, documentation, and continued development. Research organizations are not set up to provide these services, and commercial software vendors were not able to see enough business due to the high cost of entry and the small number of customers.
Data Creation and/or Conversion

The data conversion and processing needs of applications vary widely. At one end of the spectrum are applications that have only one scenario, like most driving simulators. In that case, a lot of manual work can be put into model preparation, as it has to be done only once; therefore, the cost can be amortized over many uses of the system.

Especially in the early days, preparing a model for running in a virtual environment was a major effort. Graphics systems could handle only fairly small models, on the order of 100,000 triangles or less, and still maintain the update rates and latency necessary for achieving immersion. Even with many optimizations, such as level of detail, that focus this triangle budget on the visible portion of the screen, that is a severe limitation. Creating a convincing virtual world within this limitation required strong simplification, especially if the original data came from a computer-aided design (CAD) system and was designed for constructing real objects, thus containing a large amount of detail that is irrelevant for a virtual environment, such as the threads of the screws. Typical CAD data can easily exceed millions of triangles, requiring a reduction by orders of magnitude to provide satisfactory performance. It was also not a trivial endeavor to export data from CAD systems, as there were no standardized formats that could be used for exchange. Every CAD system supported only its proprietary format, requiring a specialized exporter or converter to get access to the data.

Another important aspect of model preparation is assigning material characteristics, colors, and textures in an effort to simulate the natural appearance of the object. The graphics hardware could process only very simple lighting calculations fast enough, so creative approximation was needed to generate convincing effects, and a lot of experience in assigning these parameters was necessary to get the best results.

This data preparation effort is a major problem for applications at the other end of the spectrum, those that need to work with up-to-the-minute data, such as reviews of the current design state or assembly/disassembly training and simulation. For these applications any manual intervention beyond model selection is a problem, and even automatic conversions that take longer than a few minutes limit the usefulness severely.
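To make the triangle-budget discussion above concrete, the following sketch shows one common way a level of detail mechanism spends a fixed budget: each object is first assigned the finest mesh appropriate for its viewing distance, and distant objects are coarsened further if the total still exceeds the budget. This is a minimal illustration of the general technique, not the algorithm of any particular system; the object list, triangle counts, and switching distances are invented for the example.

```python
# Illustrative level of detail (LOD) selection under a fixed triangle budget.
# Hypothetical data: each object offers precomputed LOD meshes, listed from
# finest to coarsest as (triangle_count, max_usable_distance) pairs.

TRIANGLE_BUDGET = 100_000  # roughly what early 1990s hardware could sustain

objects = {
    "car_body":  {"distance": 5.0,  "lods": [(60_000, 10.0), (15_000, 30.0), (3_000, 1e9)]},
    "engine":    {"distance": 12.0, "lods": [(80_000, 8.0),  (20_000, 25.0), (4_000, 1e9)]},
    "workbench": {"distance": 40.0, "lods": [(10_000, 15.0), (2_500, 50.0),  (500, 1e9)]},
}

def pick_lod(lods, distance):
    """Return the finest LOD whose usable range covers the viewer distance."""
    for triangles, max_dist in lods:
        if distance <= max_dist:
            return triangles
    return lods[-1][0]  # fall back to the coarsest mesh

def select_frame_set(objects, budget):
    chosen = {name: pick_lod(o["lods"], o["distance"]) for name, o in objects.items()}
    # If the distance based choice still exceeds the budget, coarsen the
    # farthest objects first until the frame fits.
    for name in sorted(objects, key=lambda n: -objects[n]["distance"]):
        while sum(chosen.values()) > budget:
            lods = objects[name]["lods"]
            idx = [t for t, _ in lods].index(chosen[name])
            if idx + 1 >= len(lods):
                break  # already at the coarsest mesh for this object
            chosen[name] = lods[idx + 1][0]
        if sum(chosen.values()) <= budget:
            break
    return chosen

print(select_frame_set(objects, TRIANGLE_BUDGET))
```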
CURRENT STATUS

While virtual reality is still a rather small area compared to the size of the computer industry, VR systems benefit significantly from developments in other areas, changing the playing field, especially for interested parties from an industrial background.

Cost of Entry

The times of Silicon Graphics machines are long gone. The competitive drive in the race for a better gaming personal computer (PC) has led such manufacturers as nVIDIA and ATI (now AMD) to develop highly integrated, extremely powerful graphics systems that fit on a PC card and cost a fraction of an SGI machine. Such standard manufacturers as Dell, Inc. and Hewlett-Packard Company can deliver a machine capable of displaying scenes consisting of several million triangles with sophisticated lighting and shading models at interactive and/or immersive rates. The cost of such a system is only slightly higher than a regular desktop PC, and, in fact, many CAD designers use just such a system in their daily work. This makes it possible to run an immersive VE system from standard components that integrate well into a standard information technology (IT) infrastructure and that do not need major investments to be acquired. They also do not have unusual requirements as far as power and air conditioning are concerned, making their installation very easy.

The display side has not evolved quite as much. High end stereo-capable projectors and screens are still large and expensive, but it is now possible to use low end boardroom projectors with simple passive filters to create entry level systems that for most practical purposes look very similar to a standard meeting room. This allows even small companies to set up a VE-capable environment without unduly large effort. The hardware side of the equation is ready for widespread adoption.
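To give a rough sense of what "immersive rates" demand of this hardware, a back-of-the-envelope calculation helps; the figures below are illustrative assumptions, not measurements of any particular card.

```python
# Back-of-the-envelope throughput implied by "several million triangles
# at immersive rates." All values are assumptions chosen for illustration.

triangles_per_frame = 2_000_000  # "several million triangles" per scene
frame_rate_hz = 60               # a common target for immersive display
eyes = 2                         # stereo display renders each frame twice

throughput = triangles_per_frame * frame_rate_hz * eyes
print(f"{throughput / 1e6:.0f} million triangles per second sustained")
# -> 240 million triangles per second, the kind of sustained load that
#    gaming driven PC graphics cards made affordable.
```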
Software Availability

In the long run, in-house software teams are not a sustainable solution for most companies, given the speed of change of the software and hardware environment and the growing user requirements. The amount of effort required to keep an internal software system for virtual environments up-to-date is just too high for companies for which software is not the core business.

Higher level software systems that hide the complexity of programming applications behind graphical user interfaces promise to reduce the barrier of entry into application development. Many systems have a node- and route based structure that provides basic building blocks that can be connected using graphical tools. However, for most of the industrial users, especially the many smaller companies that have not had contact with the technology at all, the learning curve even of those systems is too steep to be a viable option. Therefore, the availability of off-the-shelf applications or low cost configuration and/or specialization is a necessity for a much wider adoption of VE technology in a larger variety of industries. It is an obvious chicken-and-egg problem, as the substantial efforts involved in creating specialized applications need to be offset either by high prices or by larger numbers of customers to be a viable business. A number of small- and medium-sized companies are competing in this market with reasonable success, offering turn-key solutions for a number of problem areas. A compromise that has been attempted with good success by some companies is to create very specialized applications with very limited but valuable functionalities, either as configurations for a high level system or as special programs based on existing libraries. This could be a successful path for wider spread use in industry, but it depends on the availability of development manpower at affordable pricing. At this time, software availability and usability remain a critical shortcoming in the widespread adoption of VE technology in industry.

Data Creation and/or Conversion

Thanks to the rapid developments in graphics hardware capabilities, the requirements for models to be suitable for virtual environment systems have been relaxed quite significantly. The precision of CAD models has increased from the early days of VR, but not at the same rate as the performance. Therefore, current CAD models for small- to medium-scale objects can be used pretty much as they are in a VE system, reducing or eliminating the need for simplification. For large objects, such as airplanes or whole factories, or for mechanically complicated objects, such as an engine with all construction details, nonsimplified models are still too complex. But thanks to advances in automatic software simplifiers that remove visually irrelevant components, usually only a small amount of manual work is necessary.

It has also become easier to get data out of other systems. Exporters for such standard file formats as VRML/X3D or Collada are common in many construction and/or simulation systems. If those are not available, a common fallback is STL, the stereolithography format, a trivial triangle format that loses a lot of structural information but gives access to the geometry and is supported in virtually every geometric construction system.
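To illustrate just how trivial the STL fallback is, the sketch below reads the ASCII variant of the format, which stores nothing but bare triangles (a normal and three vertices per facet). It is a minimal reader for illustration, not a production importer; it ignores the binary STL variant, and the file name in the usage line is hypothetical.

```python
# Minimal reader for ASCII STL, illustrating why the format is called
# trivial: it carries bare triangles and nothing else (no assembly
# structure, no part names, no materials).

def read_ascii_stl(path):
    triangles, vertices = [], []
    with open(path) as f:
        for line in f:
            tokens = line.split()
            if tokens and tokens[0] == "vertex":
                vertices.append(tuple(float(v) for v in tokens[1:4]))
                if len(vertices) == 3:  # every facet has exactly 3 vertices
                    triangles.append(tuple(vertices))
                    vertices = []
    return triangles

# Hypothetical usage: check whether a converted CAD part needs
# simplification before being loaded into a VE system.
tris = read_ascii_stl("bracket.stl")
print(f"{len(tris)} triangles; structure and materials are lost in STL")
```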
Given the growing need for high quality images coupled with the wider availability of high speed graphics hardware, more and more construction systems support the specification of high quality surface characteristics. These are not necessarily used by every designer, but having them available helps increase awareness. In addition, the grown hardware capabilities allow the direct use of realistic lighting models, alleviating the need for a large amount of experimentation to achieve a desired look. Instead, an automated or semi-automated assignment of surface characteristics to objects can be done.

In conclusion, while model conversion and availability for VE systems is not a totally obvious and automated process yet, it has been significantly simplified and no longer poses a serious deterrent to introducing VE systems into an industrial context.

FUTURE DEVELOPMENTS

A lot has happened since VR was first prototyped and used in industrial applications. Cost and other barriers of entry have been significantly lowered, and things are only going to get better in the future.

Cost of Entry

There does not seem to be an end in sight for the evolution of graphics card performance. Graphics cards keep getting faster and more powerful at a stable price level, supporting VE-style interactive/immersive three-dimensional (3-D) graphics applications on almost every available computer system. New developments, such as the PCI-Express standard for expansion cards, will support putting more graphics cards into one system, allowing larger, multiscreen or very high resolution projection systems to be driven from a single off-the-shelf PC.

Three-dimensional-capable displays are just now entering the mainstream. Three-dimensional movie theaters are becoming ubiquitous, and Texas Instruments together with Samsung is pushing 3-D-capable TVs into the market at prices that compete very well with regular TVs. This noticeably reduces the cost of entry into 3-D displays and introduces a wider audience to them, increasing acceptance and, after that, demand for them.

Software Availability

The problems of providing software are getting smaller due to increasing numbers of customers and a growing market, but software availability is still a major issue and a deterrent to really wide adoption. The existing companies will slowly but surely expand their offerings, but there is a market for new companies that can provide solutions for industries that see potential at the now reduced price points.

CONCLUSION

Industry has looked at virtual environments for a long time, as there are a number of application scenarios that can clearly benefit from VE technology and provide significant savings in both time and cost compared to traditional methods. In the beginning, many technical limitations were barriers to exploratory or even productive entry into the field. Thanks to many developments in hardware and software, many of these barriers have been lowered or will be lowered in the near future. A limiting factor that still exists is the availability of turn-key software solutions for quick and seamless introduction into new businesses. There is a need for more providers of specialized know-how that can help companies quickly create practical, working solutions for new application scenarios and industries.

REFERENCES

Kaewkuekool, S., Khasawneh, M. T., Bowling, S. R., Gramopadhye, A. K., Duchowski, A. T., & Melloy, B. J. (2002, May). Using virtual reality technology to support job aiding and training. Paper presented at the Industrial Engineering Research Conference, Orlando, FL.

Kashiwa, K., Mitani, T., Tezuka, T., & Yoshikawa, H. (1995). Development of machine-maintenance training system in virtual environment. Proceedings of the 4th IEEE International Workshop on Robot and Human Communication (pp. 295–300). Piscataway, NJ: Institute of Electrical and Electronics Engineers.
Mark, N. K. (2004, November 25–26). VR-system for procedural training and simulation of safety critical operations in relation to the refuelling at Leningrad NPP. Presentation at the NKS Seminar on Nordic Safety Improvement Programmes, Halden, Norway.

Nasios, K. (2001). Improving chemical plant safety training using virtual reality. Unpublished doctoral dissertation, University of Nottingham, Nottingham, United Kingdom.

Ressler, S. (1994, June). Applying virtual environments to manufacturing (Rep. No. NISTIR 5343). Gaithersburg, MD: National Institute of Standards and Technology.

Sanders, S., & Rolfe, A. C. (2002). The use of virtual reality for preparation and implementation of JET remote handling operations. Fusion Engineering and Design, 69, 157–161.

Wenzel, B. M., Castillo, A. R., & Baker, G. (2002). Assessment of the Virtual Environment Safe-for-Maintenance Trainer—VEST (Rep. No. AFRL-HE-AZ-TP-2002-0011). Mesa, AZ: Air Force Research Laboratory.
Chapter 39
CORPORATE TRAINING IN VIRTUAL ENVIRONMENTS

Robert Gehorsam

Interactions in virtual training worlds can be much richer, deeper and more realistic than with existing computer based techniques. In addition, using virtual training worlds for corporate training has such advantages as decreasing training cost by lowering trainees' travel and lodging expenses, providing a wide range of flexibility for training schedules, and improving the motivation of trainees.

—Accenture Technology Labs
Historically, virtual environments have been most useful for training and practicing procedurally oriented skills when the risk of failure in the operational environment is high and alternative methods are either prohibitively expensive or, at the other end of the spectrum, ineffective in providing appropriately immersive experiences to the trainee. Aviation, disaster response, hazardous materials handling, and military training have typically been the sweet spots for the application of virtual environments. Furthermore, these applications have tended to target the single user (such as fighter pilots) and rely on expensive, specialized, location-specific hardware and software systems. Finally, these training applications focus on human interactions with complex physical or instrumentation systems—the environment itself is a key antagonist in the training scenario.

The rise of massively multiplayer online games and social virtual worlds has transformed this paradigm, providing an opportunity for new cohorts of professionals to derive benefits from virtual environments that were unforeseen by traditional virtual training developers, as well as by game designers. Fundamentally, these consumer technologies use game design along with graphic, artificial intelligence, and networking technologies to focus on the identity, social, and interpersonal aspects of virtual environments. In essence, rather than focusing on the human-environment interaction, they focus on the human-to-human interactions within the environment. The avatar—the virtual representation of an individual in a virtual environment—is the focus. Rich, multiuser virtual environments are now available to anyone with a standard personal computer (PC) and an Internet connection, and this availability will ultimately spread to mobile devices.
Beyond these technology trends, it is an often-repeated truism that a generation of employees is now entering the workforce whose first experience of software is not the spreadsheet, but the video game. America's Army, albeit a military application, is a key example of how contemporary organizations believe they must communicate with their prospective workforces. It stands to reason, then, that corporations are now looking at how games and virtual environments might play a productive role in the workplace. And when they do, they see training and e-learning as areas of primary interest. The global knowledge management market is informally estimated by Claire Schooley, Senior Analyst for e-Learning at Forrester Research, to be $195 billion per year, which includes spending on technology, course development, and formal and informal learning (personal communication, April 2008).

Today's global corporations have a seemingly endless demand for training and e-learning that can be satisfied only with a range of learning modalities: internally, there are sales, management, leadership, and technical training needs; externally, there are customers and partners to be educated and trained. The global corporation is thus faced with several challenges in developing effective learning strategies: how to overcome the financial and time costs of bringing people to learning centers and how to foster cohesion when the workforce is distributed, mobile, time-shifted, and drawn from diverse cultural backgrounds.

Virtual environments—and specifically persistent online virtual worlds—are essentially a model. Dr. Byron Reeves, the Paul C. Edwards Professor of Communication at Stanford University and Faculty Director of the Stanford Media X Partners Program, notes, "MMORPGs [massively multiplayer online role-playing games] mirror the business context more than you would assume. They presage one possible future for business—one that is open, virtual, knowledge-driven, and comprised of a largely volunteer or at least transient workforce" (IBM Corporation & Seriosity Inc., 2007, p. 7).

As a model, then, virtual worlds provide a potentially limitless environment for learning and training. The premise for the successful deployment of virtual worlds in enterprise-oriented training is that these worlds can (a) replicate the full range of everyday and extraordinary situations for employees, (b) provide the necessary support for various modalities of learning, (c) do so in a manner that is easy to deploy, operate, and learn, and (d) provide either superior training or more cost-effective training . . . or both.

However, because of the traditional resistance of corporate information technology (IT) departments to nonstandard desktop applications, and an equally traditional "cultural" bias against any software that seems "game-like" in the serious workplace, the use of virtual environments is still in its earliest stages in corporate settings. Some of the early work explores the use of virtual environments for collaboration, some for marketing, and some for training. At this point in time, studies and pilots are under way in a number of different industries, notably health care, energy, and, not surprisingly, technology. What is most striking is the broad range of applicability suggested by avatar-enabled virtual environments. From general purpose management training through specific industrial uses to individual professional development, these immersive environments show the
potential to deliver training and learning through a range of modalities, including team based training, mentoring, formal curricula adapted to virtual environments, and even individualized training. This chapter provides a series of snapshots of how some of the more innovative learning-oriented companies are utilizing virtual worlds to develop new training and learning capabilities and what these early efforts might presage for the future.

In 2003, as massively multiplayer online gaming was becoming a mainstream phenomenon and Linden Lab and There.com were launching Second Life and There, respectively, Accenture, through its Technology Labs, began exploring a general purpose, horizontal application of virtual worlds—the use of distributed immersive environments for management and leadership training to solve critical problems in a collaborative manner. In a 2003 paper, "Using Virtual Worlds for Corporate Training," published as part of the Proceedings of the Third IEEE International Conference on Advanced Learning Technologies (ICALT'03), the authors describe how virtual worlds are well suited to solving the problem of delivering "synchronous interactions among distributed trainees" (Nebolksky, Yee, Petrushin, & Gershman, 2003, p. 412). The authors describe how the virtual training world provides for three distinct categories of users: facilitators (the observer/controller equivalent in military exercises), subject domain experts, and the students themselves. While the first two categories of users provide context and content, the students, who may or may not be co-located, are represented as three-dimensional (3-D) avatars on computer screens, which are the communications interface between the participants. Students may be assigned to play their actual real-life role (for example, vice president of manufacturing) or be asked to switch roles to provide them with new perspectives. For example, the vice president of marketing may be asked to take on the head manufacturing role in order to learn about the operating constraints that manufacturing experiences as it tries to deliver to marketing specifications.

Accenture identified 10 skills necessary for leadership development, ranging from such basic skills as communications and planning through more demanding skills, such as envisioning the future and intelligent risk taking. They designed a fictional scenario well suited to the immersive, visceral qualities of 3-D environments, based on the polar expeditions of Sir Ernest Shackleton, a scenario that demanded—and challenged—both team formation and leadership skills. The participants then proceeded through a series of episodes with a "narrative" arc that involved scenario description, plan, plan execution, complication, and debriefing. Relative to live role-playing exercises, which previously required collocation of participants and offered a paucity of immersive experience, Accenture concluded that using virtual worlds for corporate training "had such advantages as decreasing training cost . . . giving the flexibility for training schedule, and improving the motivation of trainees" (Nebolksky et al., 2003, p. 413).

Interest in corporate uses of virtual worlds for organizational training was further echoed by SRI Consulting in a major 2007 study of virtual worlds and serious gaming, in which the company noted that "training and education are likely
to be the most mature virtual worlds applications outside games and social world" (Edmonds, 2007, p. 50). Citing the long history of custom-built virtual environments as training tools for military and government organizations, it concluded that organizations would "continue to demand specific environments that they are free to control themselves and host on their own intranets" as opposed to using public virtual worlds for organizational purposes (Edmonds, p. 50).

Nevertheless, robust, publicly disclosed deployments of private virtual worlds for corporate training have been few and far between, for the reasons cited above. One early adopter has been pharmaceutical giant Johnson & Johnson, which in 2005 began using virtual environments to address the onboarding and familiarization challenges of new employees. Johnson & Johnson (J&J) is emblematic of today's large, globally distributed companies. It has over 119,000 employees, distributed among 250 operating companies in 57 countries, and generates over $53 billion in annual sales. It has both enormous consumer brand recognition and a technology research and development–centric culture. According to surveys conducted by Fortune, Forbes, and other media, it is one of the world's most admired—and diverse—companies. Furthermore, it has achieved consistent sales growth for 75 consecutive years.

How does a company of this scale maintain such a high level of performance? One critical component of success is J&J's e-University, an online, traditional Web based portal comprising over 75 distinct schools organized by region, function, and operating company. The schools offer thousands of individual courses, integrated into a learning management system. However, one functional unit of J&J, the Pharmaceutical Research & Development (J&JPRD) unit, concluded that a more immersive virtual environment was required for certain training tasks. J&JPRD consists of 10 research centers located in North America, Europe, India, and China engaged in the development of new pharmaceutical products and the conduct of clinical trials. These efforts require high levels of collaboration between diverse groups.

J&JPRD has developed an extension of its e-University presence, known as 3DU. As reported in a case study prepared by Brandon Hall Research, the unit felt the need to "shrink the physical world to allow for engaging interactive learning techniques . . . to enhance retention and knowledge transfer among the global employees of J&JRPD" (McKerlich, 2007, p. 17). The environment, based on Proton Media's Protosphere tool, re-creates physical classroom environments and provides individual employees opportunities to interact via avatars (with speech and text), exchanging documents, conducting classes, and engaging in informal learning opportunities. Unlike typical synchronous teleconference environments, 3DU is "always on," meaning that any employee anywhere in the world can log in anytime to interact with others. Furthermore, the integration with J&J's learning management system provides self-paced learning within the virtual environment as well.

As noted earlier, one barrier to adoption in the enterprise is the cultural resistance of IT departments or even management to enabling game-like applications on corporate intranets. Even when this barrier is overcome, other cultural
barriers can exist within the workforce. Technology apprehension by older, less tech-savvy employees came to the forefront of J&J's implementation challenge, and, thus, a technology-training curriculum had to be introduced to enable culturally and demographically diverse populations of employees not only to use the technology itself, but also to understand how to communicate and interact with other employees in an avatar-mediated environment (McKerlich, 2007, p. 18).

Private e-learning-oriented virtual worlds, created from commercial off-the-shelf platforms, deployed behind firewalls, and integrated into corporate learning management systems, are an important and perhaps central trend in creating the milieu for the widespread adoption of virtual worlds for training, but not the only one. Members of today's global, mobile workforce have individual professional development needs that exist separately from the specific training regimens of individual enterprises. Almost certainly, one of the most in-demand skills the workforce needs is language proficiency. In particular, learning English as a foreign language (EFL) is a multi-billion-dollar industry, with demand expected to rise to 2 billion English language learners worldwide by 2010 (Graddol, 2006, p. 101). According to informal research conducted by Paideia Computing, a technology based EFL instructional company, today the vast majority of English learning occurs in live classroom or private tutor settings, with less than 20 percent technology enabled, mostly through CD-ROM based media. The efficacy of using multimedia and game based language training tools is increasingly accepted, and the U.S. Department of Defense's Tactical Iraqi program has, in fact, spawned a commercial endeavor to extend the reach of the capability beyond U.S. forces. Now, however, some companies are looking at how to employ virtual worlds to address the insatiable commercial demand for English as a second language in the Asian market.

Paideia Computing utilizes Forterra Inc.'s OLIVE virtual world platform to develop English language curricula for the Asian market. The solution is not intended to be delivered behind a company's firewall for training purposes, but rather as a public, subscription based service for individual and small-group training. In addition to the new form of delivery (a public, special-purpose virtual world), Paideia introduces several other innovations relative to the use of virtual worlds for enterprise-oriented training.

Language acquisition is a skill best learned through practice in context, and Paideia quickly understood that a virtual world provides the optimal blend of both: an on-demand virtual environment that can re-create the physical settings most language students are going to need in both acquiring and practicing conversational and reading capabilities. Unlike a classroom setting or a textbook, the virtual environment provides the student with the actual experiences of, for example, being at an airport, in a restaurant, and in a work meeting. And unlike CD-ROM based instruction, the virtual environments could be populated not just by artificial characters, but by other role-players, students, and teachers.

The distributed nature of a virtual world platform also enables native-speaking or otherwise qualified teachers to be available from any location in the world to any student located anywhere in the world, which in turn provides a capability
not available to other single-user technologies, such as CD-ROM voice interaction. While there have been considerable advances in voice recognition technology to enable a synthetic instructor to understand and respond to a human, the integrated voice over Internet protocol capabilities of a virtual world platform enable high fidelity multipoint communication between a teacher and any number of students, no matter where they are located. Furthermore, the integrated recording and replay capabilities of the platform enable the teacher to review key lessons and performance assessments with any number of students over the network.

That said, with over one billion students studying EFL, it is easy to imagine the significant business scalability problems in providing personal tutors for every student online. So while virtual worlds excel and differentiate themselves from other game-like environments, Paideia has recognized that a full language training solution involves not just virtual environments that provide teacher-to-user training, but also integration with learning management systems, the development of appropriate synthetic characters that can either provide instruction or be "background" characters in a scenario, and full data analytics to support the optimization and evolution of the system. The result is a learning environment that is part virtual world, part social network, part game, and part classroom.

In conclusion, we see that while the uses of virtual environments for training in the enterprise are still at their earliest stages, a range of applications, learning modalities, and deployment methodologies are available. It seems clear that there is no "one size fits all." Unlike government programs, where requirements frequently result in technology being built from the ground up, corporations will rely on commercial off-the-shelf platforms, subsequently modified for specific organizational needs, to satisfy their requirements. Nevertheless, some barriers will continue to exist and will need to be overcome by internal evangelism and the as-yet-to-be-published success stories of a range of lighthouse deployments. Technologically, the limited graphical capabilities of desktop PCs and the limited capacities of internal networks to handle the increased traffic requirements will challenge widespread adoption in the near term. Culturally, the potential resistance of IT departments and management to "game" deployments on networks will play an inhibiting role, and the not-uncommon phenomenon of user apprehension toward a new, dynamic technology can be anticipated and answered. However, the potential benefits and cost savings relative to travel and other overhead expenses show high promise. Most importantly, the entry into the workforce of a young, technically sophisticated population, growing up on games and virtual worlds at home, presages a demand for a similarly high quality immersive experience at work.

REFERENCES

Edmonds, R. (2007). Virtual worlds. Menlo Park, CA: SRI Consulting Business Intelligence.
Graddol, D. (2006). English next: Why global English may mean the end of 'English as a foreign language.' London: The British Council. Available from http://www.britishcouncil.org/learning-research-english-next.pdf

IBM Corporation & Seriosity Inc. (2007). Virtual worlds, real leaders: Online games put the future of business leadership on display (A Global Innovation 2.0 Rep.). Available from http://www.seriosity.com/downloads/GIO_PDF_web.pdf

McKerlich, R. (2007). Virtual worlds for learning: How four leading organizations are using virtual environments for training (Analysis Report). Sunnyvale, CA: Brandon Hall Research.

Nebolksky, C., Yee, N., Petrushin, V., & Gershman, A. (2003). Using virtual worlds for corporate training. Proceedings of the 3rd IEEE Conference on Advanced Learning Technologies—ICALT'03 (pp. 412–413). Los Alamitos, CA: IEEE Computer Society. Available from http://csdl2.computer.org/comp/proceedings/icalt/2003/1967/00/19670412.pdf
Part X: Next Generation Concepts and Technologies
Chapter 40
VIRTUAL ENVIRONMENT DISPLAYS

Carolina Cruz-Neira and Dirk Reiners

The display has always been a critical component of virtual environment (VE) systems. This is not surprising, as humans take in 80 percent of the information about their environment through their eyes; therefore, presenting a convincing version of the virtual environment to the eyes is a necessary step for full immersion. Displays have also shaped the public image of virtual reality (VR) to a large extent; people wearing weird contraptions on their heads or, at the minimum, funny glasses are a staple element of many movies, and that is not actually an inaccurate description of the reality in most labs.

A LITTLE BIT OF HISTORY

The initial idea for VE displays originated from Ivan E. Sutherland's (1965) vision for "The Ultimate Display," which would enable users to enter and control a computer-generated world. Sutherland's (1968) Sword of Damocles was the first head-mounted display (HMD) and, therefore, marked the beginning of a new field. Fisher, McGreevy, Humphries, and Robinett (1986) at NASA (National Aeronautics and Space Administration) went a step further, integrating HMDs with three-dimensional (3-D) sound, voice recognition, voice synthesis, and a DataGlove (Zimmerman, Lanier, Blanchard, Bryson, & Harvill, 1987). A few years later, the first commercial HMD, VPL Research Inc.'s Eyephone, became available.

The early 1990s clearly marked the acceptance of VEs, with the introduction of the CAVE (cave automatic virtual environment) by Cruz-Neira, Defanti, Sandin, Hart, and Kenyon (1992), the virtual portal by Michael Deering (1993), and the responsive workbench by Krueger and Fröhlich (1994). These systems helped establish the field of virtual reality by introducing the novel, yet pragmatic, use of proven, familiar projection systems to create immersive displays. The real validation of projection technology as an accepted form of VR came when General Motors Corporation (GM) installed the first CAVE in industry in late 1994. GM pioneered applications of VR in the area of vehicle design and virtual prototyping. A few years later, the oil and gas industry "discovered" VR, installing a significant number of projection based immersive systems across its different branches and groups. By 1997 there were over 50 projection based VR systems operating worldwide in academia, research, and industry.
Projection systems solved several limitations of HMDs by providing a physical space that could be shared by multiple users (although only one could be tracked) and that allowed the blending of virtual and real space, including the user's own body. But they also introduced their own limitations, in particular, issues related to the use of projectors, such as multiple projector calibration (color, convergence, blending, and so on) and the need for large and dark spaces. Thus displays have always been and will continue to be an active area of research and an important aspect of future growth and development for VE systems.

To reach high levels of acceptance, VE displays need to have high quality, in the best case the quality of the human visual system, and they need to be affordable. Given that these are conflicting goals, there are many sweet spots in the continuum that can be exploited, and current developments in commercial off-the-shelf (COTS) components enable future devices to raise the quality bar without raising the price. Given the enormous flexibility and the wide range of capabilities of the human visual system, quality can have many aspects in the context of displays. Volume 2, Section 1 in this handbook provides a detailed discussion of the human visual system and its relationship to display design. For the purposes of the discussion in this chapter, resolution, field of view, stereo presentation, and color and brightness precision all can play important roles in whether a display is merely good enough or able to get users to suspend their disbelief and become immersed in the virtual environment.

NEED COTS COMPONENTS TO GROW

A recurrent problem with virtual reality equipment, and especially displays, is that they tend to be very expensive, as they are usually built in very small numbers and have high quality and precision requirements. As a consequence, only a few people can afford them, which drives the price even higher, to the point that nobody is able or willing to afford them, and the company disappears. The most promising solution to this problem is to use COTS components as much as possible, such as projectors, liquid crystal display (LCD) panels, or interactive devices from gaming. To some extent that has always been the case, as it is economically unfeasible to develop such specialized components as LCD panels specifically for VE displays. The disadvantage is that this limits the possible capabilities to whatever the rest of the commercial market needs at the time. These capabilities might not match the needs of VE displays exactly, opening a space for ingenious engineers to find ways to push the components far beyond that for which they were designed.

HEAD-MOUNTED DISPLAYS

Head-mounted displays have been one of the defining components of virtual environments since the very beginning. Making them work well and deliver high quality visuals is a very challenging problem, which is why early HMDs had very bad quality and/or usability.
An HMD combines a large number of challenges: as it is rigidly attached to the head, it needs to be very light to allow comfortable use over time. It also needs to cover a large field of view to avoid distracting tunnel-view effects. Because the displays are close to the eye, the pixels need to be very small to avoid blocky-looking images. The wide field of view especially, in combination with the limited resolution of LCD panels, has been a significant problem in HMD design, and for a number of years little development or advancement could be seen. In the early years the focus was on wide field of view systems (50° to 60° horizontally) with low resolutions (on the order of 640 × 480 pixels), which led to displays on which every individual pixel (or subpixel) could be identified, severely limiting the realism of the displayed images. Later the focus shifted to higher resolutions (up to 1,280 × 1,024 pixels), but partially at the cost of field of view (30° to 40° horizontally), leading to many HMDs that left the impression of wearing black blinders at all times.

In the recent past this has changed, and two new developments give hope for a resurgence of HMD display systems. The HMD designed and developed by Bolas and McDowell (2006) targets the field of view problem through the use of two LCD panels per eye and widespread optics. The result is a display that provides a good field of view, but with noticeably limited resolution outside of the direct forward view direction. The benefit is a fairly light design that can be used very comfortably and built at a reasonable cost. The second approach, introduced by Sensics (2008), tries to cover all the bases by combining a large number (up to 24) of LCD panels to cover the whole field of view. This approach has the potential to achieve very high resolution everywhere in the field of view, allowing the user to look around freely. The cost is a fairly high weight, requiring physically strong users or an intelligent counterbalance scheme. Designing optics that hide the geometric, color, and brightness discontinuities between all the LCD panels and make them appear seamless to the user is also very challenging. Solutions exist, but they require fairly precise calibration for and by the user.

A critical component of all current HMDs is small, high resolution LCD panels. These used to be very specialized components with very limited use, which led to few options and high prices. With the rise of ever higher resolution cell phones and such portable devices as ultralight PCs (for example, Sony's Vaio UX50), the need for high resolution in small displays is growing, adding incentive for display manufacturers to offer more products in this market. This increased availability and quality will translate fairly directly into increased quality for HMDs. Both systems with few panels and widespread optics and multipanel tiled systems will benefit, and there are market opportunities for both designs. The resulting expectation is that we will see a resurgence of HMD developments and technology, reviving a market that had become rather stale.
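As a back-of-the-envelope check on the field of view versus resolution trade-off described above, angular pixel density can be estimated by dividing horizontal pixel count by horizontal field of view. The short calculation below is a sketch using the example figures from this section; the 60 pixels-per-degree benchmark for the limit of human visual acuity is a commonly used approximation introduced here for comparison, not a number from this chapter.

import math  # not strictly needed; shown for completeness

# Angular pixel density for the two HMD generations discussed above.
# The 60 px/deg "eye limit" is an assumed approximation of 20/20 acuity.
EYE_LIMIT_PPD = 60.0

hmds = {
    "early wide-FOV HMD": (640, 60.0),   # ~640 x 480 panel, ~60 deg horizontal
    "later high-res HMD": (1280, 35.0),  # ~1,280 x 1,024 panel, ~35 deg horizontal
}

for name, (h_pixels, h_fov_deg) in hmds.items():
    ppd = h_pixels / h_fov_deg
    print(f"{name}: {ppd:.1f} px/deg "
          f"({ppd / EYE_LIMIT_PPD:.0%} of the approximate eye limit)")

The early design delivers roughly 11 pixels per degree and the later one roughly 37, which makes concrete both why the first generation looked blocky and why even the narrower, higher resolution displays still fall well short of the eye.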
MONITORS
In the beginning, monitor based systems, also known as fish tank VR, such as the one described by Ware, Arthur, and Booth (1993), were quite common. They provided an inexpensive entryway into virtual environments, as common cathode ray tube (CRT) monitors were capable of displaying active stereo images directly, eliminating the need to buy specialized display systems. This changed dramatically when LCD panels became the standard display, to the point that they have essentially replaced CRT monitors completely. LCDs are not fast enough for active stereo, removing the ability to display stereo images directly and all but eliminating monitor based VR systems. New developments in stereoscopic displays for regular TVs may open new avenues here: Samsung has introduced plasma displays with refresh rates high enough to drive active stereo glasses. These are designed to be used as home TVs and are therefore too big for desktops, but they can serve as an alternative to large monitors or small projection screens.

On the LCD side, the quickly increasing resolution of current panels, routinely reaching 2,560 × 1,600 pixels or more, has made it possible to put to practical use technologies that have been around for a long time but need very high resolution displays for high quality results. These technologies are lenticular and parallax-barrier screens for autostereoscopic display. Both approaches trade resolution for stereoscopic display by redirecting subsets of the available pixels into different directions away from the screen. By placing the eyes into the right position, so that one eye sees a subset of pixels that is not visible to the other eye, stereo display is possible. The difference between the two methods is that lenticular displays use a lens sheet to redirect the pixels into different directions, while parallax-barrier displays use an array of black stripes that hides parts of the screen from certain locations in space. The quality of these displays depends on the resolution and on the number of separate images that can be redirected, which influences how large the area is from which correct stereo can be seen, as well as how many users can see individual views of the 3-D scene. Autostereoscopic displays add a new quality to the display space, as they allow immersive display without special glasses of any kind, which also allows multiple viewers. This can help in gaining acceptance among a larger audience and make stereoscopic and/or immersive displays more widely used in regular office settings.

An alternative to old-style active stereo, which LCD panels do not support and will not support in the near future, is passive stereo using polarized glasses. Through the use of two coupled LCD panels it is possible to create polarized stereo images directly, without having to create two images separately, polarize them, and overlay them. Such displays, for example the iZ3D monitor (2008), are now becoming commercially available at competitive prices, as they are targeted at the game-playing public.
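The pixel redirection behind the lenticular and parallax-barrier screens described above amounts to interleaving several rendered views across the panel's pixel columns. The sketch below shows that interleaving logic in its simplest form; the cyclic column-to-view mapping and the two-view example are simplifying assumptions, since shipping products calibrate against the exact lens or barrier geometry, often with slanted subpixel layouts.

import numpy as np

def interleave_views(views):
    """Interleave N rendered views column-by-column for an
    autostereoscopic panel. views: list of (H, W, 3) arrays rendered
    from slightly different eye positions. Assumes a simple cyclic
    column-to-view assignment; real panels use a calibrated mapping."""
    n = len(views)
    h, w, c = views[0].shape
    out = np.empty((h, w, c), dtype=views[0].dtype)
    for col in range(w):
        # The lens sheet (or barrier) sends column `col` toward
        # viewing zone `col % n`, so take that column from that view.
        out[:, col, :] = views[col % n][:, col, :]
    return out

# Two-view example: each eye ends up seeing half the horizontal
# resolution, the resolution-for-stereo trade noted in the text.
left = np.zeros((480, 640, 3), dtype=np.uint8)
right = np.full((480, 640, 3), 255, dtype=np.uint8)
panel = interleave_views([left, right])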
Somewhere between monitors and projection systems lies an alternative to the previously mentioned stereoscopic plasma displays: stereoscopic digital light processing (DLP) displays. These are projection TVs that have been put on the market by different manufacturers at price points very close to those of regular TVs. They are projection systems, but at a small scale, and they support regular active stereo signals. They also form a basic component of larger-scale projection displays (see below).

Research and development on monitors for VE displays had been fairly dormant for some time. New commercial developments in stereoscopic displays for regular TV and game audiences are making them more available and affordable, and also a more common sight. This can help remove the stigma of being something special, which can hinder adoption, especially in an industrial context. Taking advantage of the steady growth in resolution also opens up new ways to display stereoscopic and immersive content, reaching venues that have so far shied away from glasses or other head based hardware.

LARGE SCREEN/PROJECTION SYSTEMS
Large screen systems, mostly based on projection (see below for new developments), have become very popular since their inception. They can provide the same 360° fully immersed feeling that HMDs support, but they require much smaller and less intrusive glasses than an HMD; when used with the correct interaction devices, they can be made completely untethered and wireless. Other benefits include the ability for users to see their own real bodies and for multiple people to see the virtual environment (although in a head-tracked scenario only one of them gets correct stereo images, they can all get an impression of what is displayed). The last aspect is especially interesting in a corporate setting, where many VEs are used for discussions and evaluations. The disadvantages are the need for large screens to obtain a large field of view and the need to cover those screens with high resolution pixels. HMDs move with the user's head and can focus their pixel output where he or she is looking; large screen displays need to have pixels everywhere, in case the user looks there. This makes the resolution of the display a larger issue than in HMDs.

There has been a lot of effort toward higher resolution displays. The two possible approaches are either to combine multiple low resolution projectors or to use higher resolution components in the projector itself. The first approach is attractive due to the ability to use low cost COTS components to create very high resolution display systems (Kresse, Reiners, & Knöpfle, 2003). This is offset by the challenge of making the multiple components match up correctly, both geometrically and in color and brightness. Automated solutions for these problems have been found, and commercial companies are starting to provide them as turn-key solutions. One interesting approach, developed by Jaynes, Seales, Calvert, Fei, and Griffioen (2003) and available from Mersive Technologies (2008), uses the DLP TVs mentioned above as the projection base unit. These are attractive because they are comparatively cheap. In addition, they feature high definition (HD) resolution and, due to their construction, require only very limited depth behind the screen, which addresses one of the major problems of most projection displays.
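Tiling projectors or projection TVs as described above requires, at a minimum, a geometric warp per unit and an intensity ramp across the overlap regions so adjacent images blend seamlessly. The sketch below shows the intensity-blending half of that calibration in its simplest form; the linear ramp and single horizontal overlap are illustrative assumptions, and production systems, such as the color-consistency work cited above, also apply measured gamma and per-channel color correction.

import numpy as np

def blend_mask(width, overlap, side):
    """Per-pixel intensity mask for one projector in a two-projector
    horizontal tile. `overlap` is the width in pixels of the shared
    region; intensities ramp linearly so the summed brightness across
    the seam stays constant."""
    mask = np.ones(width)
    ramp = np.linspace(0.0, 1.0, overlap)
    if side == "left":
        mask[width - overlap:] = ramp[::-1]  # fade out toward the seam
    else:
        mask[:overlap] = ramp                # fade in from the seam
    return mask

# Apply to an image: scale every row by the mask. Where the left
# projector fades from 1 to 0, the right fades from 0 to 1, so the
# combined output stays at full brightness across the overlap.
img = np.full((768, 1024, 3), 200, dtype=np.float32)
m = blend_mask(1024, overlap=128, side="left")
blended = (img * m[None, :, None]).astype(np.uint8)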
Joining the stereo capabilities of latest-generation TVs with intelligent color and geometric correction in a tiled fashion forms an interesting and affordable option in the projective display space.

An alternative to tiling small units is to design systems with a high intrinsic resolution. This requires a very large engineering effort, from the main projector units down to the actual LCD panels or DLP chips, and the VE market is not big enough to sustain this kind of development. However, VE systems can benefit from developments in the general projection system market. Movie theaters are moving to a digital distribution and display scheme, and they need very high resolution displays. As a consequence, such projector manufacturers as Sony and JVC are producing projectors capable of displaying images at resolutions of 4,096 × 2,160 pixels (also known as 4K). This is a significant step over older systems and, at normal projection sizes and viewing distances, can reach the resolution of the human eye. While not exactly cheap, these systems allow a fairly painless entry into very high resolution, out-of-the-box displays. In addition to increasing resolution, cinemas are also moving toward stereoscopic displays. In the near future, combinations of very high resolution and stereoscopic displays can be expected to come out of the commercial cinema domain and be directly applicable in a VE context.

Another interesting new avenue is the combination of the autostereoscopic display idea described above with projective displays. Using a holographic screen and an array of standard projectors, it is possible to provide large-screen autostereoscopy, which allows multiple users to have their individual views of the 3-D scene at the same time (Holografika). These displays provide very interesting opportunities, but at a very high cost, as a large number of pixels needs to be calculated and generated.

Large-screen systems continue to be a very active area of development. In the recent past, commercial developments have stepped up to reach resolution and stereo capabilities that used to be a specialized topic in VE research and development and to bring them into the COTS space, which will help bring costs down and make these systems available to a wider variety of application domains. There are still open research avenues, especially in going beyond single projector systems and into autostereoscopic systems, and many other quality aspects, such as brightness, color accuracy, and dynamic range, are not fully addressed by current systems.

CONCLUSION
Displays have been a defining component of VEs since the very beginning. There was a bit of a lull in the field when detailed refinements were made to existing technology but no real breakthroughs occurred. In the recent past, interest has picked up, and new developments have opened new avenues in very high resolution displays and made autostereoscopic displays more available and approachable. There are many interesting avenues for research to follow, toward higher resolution, easier use, and multiuser systems, and toward new aspects of display technologies, such as higher dynamic ranges and better color precision, continuing the quest to create the ultimate display: one that will allow us to present a virtual environment indistinguishable from reality.
REFERENCES
Bolas, M., & McDowell, I. (2006). The Wide5 HMD. Retrieved April 26, 2008, from http://www.fakespacelabs.com/Wide5.html
Cruz-Neira, C., Defanti, T. A., Sandin, D. J., Hart, J., & Kenyon, R. (1992). The CAVE: Audio visual experience automatic virtual environment. Communications of the ACM, 35(6), 64–72.
Deering, M. (1993). Making virtual reality more real: Experience with the virtual portal. Proceedings of Graphics Interface '93 (pp. 219–226). New York: ACM.
Fisher, S. S., McGreevy, M., Humphries, J., & Robinett, W. (1986, October). Virtual environment display system. Paper presented at the ACM Workshop on Interactive 3D Graphics, Chapel Hill, NC.
iZ3D Monitor. (2008). Stereoscopic 3D and iZ3D perception. Retrieved March 12, 2008, from http://www.iz3d.com/download/iZ3D_Whitepaper.pdf
Jaynes, C., Seales, B., Calvert, K., Fei, Z., & Griffioen, J. (2003, May). The Metaverse—A collection of inexpensive, self-configuring, immersive environments. Paper presented at the 7th International Workshop on Immersive Projection Technology/Eurographics Workshop on Virtual Environments, Zurich, Switzerland.
Kresse, W., Reiners, D., & Knöpfle, C. (2003, May). Color consistency for digital multiprojector stereo display systems: The HEyeWall and the digital cave. Paper presented at the 7th International Workshop on Immersive Projection Technology/Eurographics Workshop on Virtual Environments, Zurich, Switzerland.
Krueger, W., & Fröhlich, B. (1994). The responsive workbench. IEEE Computer Graphics and Applications, 14(3), 12–15.
Mersive Technologies. (2008). The m-Series displays. Retrieved April 15, 2008, from http://www.mersive.com/mSeries_About.html
Sensics. (2008). The Sensics piSight HMD. Retrieved February 23, 2008, from http://www.sensics.com/technology
Sutherland, I. E. (1965). The ultimate display. Proceedings of IFIP Congress, 2, 506–508.
Sutherland, I. E. (1968). Head-mounted three-dimensional display. Proceedings of the Fall Joint Computer Conference, 33, 757–764.
Ware, C., Arthur, K., & Booth, K. S. (1993). Fish tank virtual reality. Proceedings of the INTERACT '93 and CHI '93 Conference on Human Factors in Computing Systems (pp. 37–42). New York: ACM.
Zimmerman, T. G., Lanier, J., Blanchard, C., Bryson, S., & Harvill, Y. (1987). A hand gesture interface device. Proceedings of the ACM Conference on Human Factors in Computing Systems and Graphics Interface (pp. 189–192). New York: ACM.
Chapter 41
MINDSCAPE RETUNING AND BRAIN REORGANIZATION WITH HYBRID UNIVERSES: THE FUTURE OF VIRTUAL REHABILITATION
Cali Fidopiastis and Mark Wiederhold

The hallmark of virtual reality (VR) technologies is the capacity to deliver real time simulations of real world contexts that allow for user interaction through multimodal sensory stimulation (Burdea & Coiffet, 2004, p. 3). As a rehabilitation tool, VR augments the therapist's capability to provide such essential therapeutic elements as programmable systematic practice, engaging interaction, immediate feedback, safe nondistracting patient environments, simulations of objects or events difficult to replicate in real life, and environmental manipulations not possible in the real world (Holden, 2005; Riva, 2005; Rose, Brooks, & Rizzo, 2005). More importantly, the ability to create personalized environments that match the disability level of the patient is a major step forward in designing successful rehabilitation protocols (Wilson, Foreman, & Stanton, 1997).

The capability of VR to support a broad range of therapies has driven over 15 years of research, encompassing both cognitive and physical rehabilitation. Researchers have successfully applied advanced VR simulations to improve motor and sensorimotor functions, to assess the extent of disability, to treat anxiety disorders, and to offer clinician-assisted telerehabilitation (Rizzo, Brooks, Sharkey, & Merrick, 2006; Wiederhold, 2006). While effective therapy is the goal of VR based rehabilitation, we predict that the more enduring and empowering legacy of this approach will be to advance our understanding of how the brain functions when persons perform real world tasks (Tarr & Warren, 2002). While postulating future technologies that can diagnose and deliver exacting therapies is more colorful, the unanswered questions of how the brain performs everyday tasks leave these fanciful ideas as pale as the promise of affordable individual flying transporters. Thus, our more pragmatic reflection on the future of simulation based rehabilitation is that such technologies, coupled with portable, noninvasive, unobtrusive neurosensing devices (for example, electroencephalography, EEG), will provide a unique window into the realistic workings of the behaving brain.
In the context of rehabilitation, understanding the retuning and reorganizing capabilities of the human brain is necessary for determining therapeutic efficacy as well as for understanding prognoses and long-term care needs. The aim of this chapter is to highlight the emerging capabilities of virtual worlds and couple them with advances in our understanding of brain function and recovery.

EXTENSIBLE REALITIES AND THE THERAPEUTIC ADVANTAGE
As available technologies for presenting three-dimensional (3-D) graphics continue to improve and expand, virtual reality has come to mean a particular method of simulating virtual worlds. Milgram and Kishino (1994) describe the virtuality continuum as a range with the real environment on one end and purely virtual environments on the other. Mixed reality (MR) falls between the two extremes and merges the physical real world with the virtual world in an interactive setting within the same visual display environment, either with the real environment augmented by virtual objects (augmented reality) or the converse (augmented virtuality). This flexibility allows for the creation of hybrid therapies that employ a mixture of virtual elements and traditional protocols. For example, exposure therapy, in which the patient actively confronts the feared object (for example, spiders) or situation (for example, public speaking), is recognized as a necessity for successful phobia treatment; however, there are many situations (for example, a car accident) for which direct or graded confrontation is impractical and unacceptable (Wiederhold & Wiederhold, 2005). VR based therapy in these cases fills a necessary void. In fact, comparison studies between traditional and virtual reality exposure therapy show that VR therapy is equally effective for treating social phobia (Klinger et al., 2005) and fear of flying (Rothbaum et al., 2006). There is also evidence that persons with specific types of phobias (for example, arachnophobia, or fear of spiders) are more willing to seek VR based treatment than to engage in traditional exposure therapy (Garcia-Palacios, Botella, Hoffman, & Fabregat, 2007).

While advantageous in phobia therapy, VR is limited by the technological challenges of re-creating the realistic feedback afforded by the physical properties of natural objects (for example, the solidness of a cup). MR, however, incorporates the inherent advantages of immediate multisensory feedback from the physical world while retaining the advantages of VR. The real strength of this approach is its ability to re-create the human experience in a manner that potentially facilitates the cognitive processing capabilities of the patient. This capability of MR is especially important for physical and cognitive neurorehabilitation applications.

Fidopiastis et al. (2006) demonstrated the importance of MR based therapy for facilitating activities of daily living training. In their single case study, the researchers replicated the kitchen of a patient with severe memory impairment in MR and demonstrated that this contextualized environment could assist in training breakfast-making skills, such as locating breakfast items. An important result of this study was that, after training, the participant was able to prepare breakfast in his own kitchen without cuing, which he was unable to do before training. The skills learned in this contextualized (ecologically valid), controlled, and safe environment, therefore, transferred to the patient's own home.
Thus, the MR based setting provided the patient with an interactive and flexible environment in which he could "safely explore his functional capabilities" (Fidopiastis et al., 2006, p. 186).

Both therapy examples above illustrate that simulated real world environments are effective rehabilitation tools in their respective application areas. While anxiety therapies provide a real world corollary for VR based treatment comparison, the heterogeneity of treatment methods for both simulation based and traditional cognitive rehabilitation precludes cross-comparisons (Cicerone et al., 2005; Fidopiastis, 2006). Judging treatment suitability for clinical adoption is an ongoing concern, especially for cognitive rehabilitation (Carney et al., 1999). Standard therapeutic goals and outcome measures remain clinically undefined for both traditional and virtual cognitive rehabilitation treatments (Weiss, Kedar, & Shahar, 2006; Whyte & Hart, 2003). The estimated total lifetime expense for persons with traumatic brain injury (TBI) in the United States is approximately $60 billion (Finkelstein, Corso, Miller, & Associates, 2006). Thus, there is an urgent need to develop the methods and metrics to soundly assess the impact of rehabilitation therapies and assistive devices today (Whyte, 2006).

Virtual rehabilitation technologies provide a therapeutic advantage in the development of methods and metrics for evaluating the effectiveness of rehabilitation treatments (Rizzo & Kim, 2005). First, virtual systems can be designed as a testbed where multiple technologies (for example, haptic devices and 3-D sound) and therapies (for example, functional training and neurofeedback) can be tested simultaneously or separately (Fidopiastis et al., 2005). Second, psychophysiological measures such as EEG can be integrated into the data output for real time data analysis (Fidopiastis, Hughes, Smith, & Nicholson, 2007). These two features allow for replicable systems that can support cross-facility testing, quantify the level of therapeutic benefit, characterize brain changes (both temporal and spatial), and lead to modular setups more appropriate for private use. More importantly, a single system can provide physical, cognitive, and psychological therapies, allowing for programmatic level assessment. These advanced protocols will ultimately provide candid guidance for the development of future clinical technologies, such as brain-driven interfaces for the computer or other remote devices.
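To illustrate the kind of real time EEG analysis just mentioned, the sketch below computes relative band power from a short window of samples, the sort of feature a virtual rehabilitation testbed could log alongside task events. The band definitions, sampling rate, and window length here are illustrative assumptions, not details of the cited systems.

import numpy as np
from scipy.signal import welch

# Illustrative EEG bands in Hz; a clinical protocol would fix the
# actual bands, channels, and window lengths.
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(window, fs=256.0):
    """Relative power per band for one EEG channel window.
    window: 1-D array of raw samples; fs: sampling rate in Hz."""
    freqs, psd = welch(window, fs=fs, nperseg=min(len(window), 512))
    total = np.trapz(psd, freqs)
    powers = {}
    for name, (lo, hi) in BANDS.items():
        sel = (freqs >= lo) & (freqs < hi)
        powers[name] = np.trapz(psd[sel], freqs[sel]) / total
    return powers

# Two seconds of synthetic data standing in for a live stream; a
# 10 Hz oscillation should make the alpha band dominate.
fs = 256.0
t = np.arange(0, 2.0, 1.0 / fs)
signal = np.sin(2 * np.pi * 10 * t) + 0.3 * np.random.randn(t.size)
print(band_powers(signal, fs))

Features like these, computed continuously, are what would let a replicable testbed quantify therapeutic benefit and characterize temporal brain changes across facilities.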
MINDSCAPE RETUNING AND BRAIN REORGANIZATION USING VIRTUAL REHABILITATION
Brain plasticity involves the capacity of the nervous system to modify its organization either structurally (for example, changes in neural connections) or functionally (for example, changes in neural patterns) as a result of experience, maturation, or injury (Ormerod & Galea, 2001). The essence of rehabilitation for both anxiety disorders and recovery from brain injury is to positively affect these neural changes and subsequently to enhance the patient's long-term quality of life (Bremner, Elzinga, Schmahl, & Vermetten, 2008; Kelly, Foxe, & Garavan, 2006). Identifying the types of therapies that most successfully meet these aims is the central question of the rehabilitation sciences. An inherent problem in determining the efficacy of any rehabilitation approach is the issue of individual differences. We propose that virtual rehabilitation tools coupled with state-of-the-art biosensing technologies are a means not only to characterize individual responses to therapy, but also to extend our understanding of how persons may perform physically, cognitively, and socially during real world interactions, such as in vocational settings (Carney et al., 1999).

Neuroimaging studies using such devices as functional magnetic resonance imaging (fMRI) may suggest structural and functional brain differences that underlie the various psychiatric disorders, as well as brain changes due to injury. For example, Rauch and Shin (2002) reviewed the neuroimaging literature across the spectrum of anxiety and stress disorders. The authors contend that while there are brain similarities among persons presenting with these disorders, there are features unique to each disorder phenotype. These results support the idea that persistent cognitive impairments (for example, memory deficits) due to post-traumatic stress disorder (PTSD) may be attributable to maladaptive changes in the hippocampus and associated circuitry (for example, the amygdala; Nemeroff et al., 2006). Although hyperarousal of the hippocampal circuitry is also seen in persons presenting with social phobias, these heightened activations are usually present only when the patient is faced with fear-evoking stimuli (Mataix-Cols & Phillips, 2007). Persons diagnosed with either disorder type, regardless of these brain differences, are most likely treated with similar psychotherapy (that is, exposure therapy). Yet the specifics of the treatment protocol (for example, duration and type of medicines) are unsettled questions. This lack of specificity may contribute to the chronicity or persistence of the disorder, especially for patients with PTSD. Issues such as these also affect neurorehabilitation treatment selection.

Neuroimaging technologies are also emerging methods for evaluating neurorehabilitation therapies that may lead to beneficial brain plasticity: neuroplasticity that results in cognitive or motoric improvement (Boyd, Vidoni, & Daly, 2007; Kelly et al., 2006). Thus, the emphasis in neurorehabilitation is to apply treatments that facilitate functional neural reorganization or regeneration (for example, Chen, Abrams, & D'Esposito, 2006). In contrast, modifying fear-related memory structures, what we call "mindscape retuning," is a primary goal of exposure therapies (for example, Foa & Kozak, 1986). The emphasis of these treatments may change as more information about the interconnectivity of brain areas and their contributions to cognitive processing (for example, learning) is elucidated. Regional cerebral blood flow as measured by positron emission tomography does show attenuation, or retuning, of the neural network associated with fear responding (hippocampus, amygdala, and medial prefrontal cortex) after traditional psychotherapy (De Raedt, 2006). Similar studies are in progress using fMRI for VR based exposure therapy to better understand its underlying neural mechanisms (Hoffman, Richards, Coda, Richards, & Sharar, 2003). There is also fMRI evidence that VR based physical therapy leads to practice-induced cortical reorganization in the primary sensorimotor cortex and subsequent improvement in motor behavior (You et al., 2005).
The utility of specifying the neural responses that underlie any treatment is the capacity to predict therapeutic outcomes on an individual basis (Paulus, 2008; Chen et al., 2006). As the current review suggests, informing treatment specificity and outcomes with neuroimaging is a nascent field of research. Evidence based analyses of treatments currently depend upon traditional experimental designs (for example, specified control groups and outcome metrics). Mahncke, Bronstone, and Merzenich (2006) provide an example of an evidence based therapeutic framework, Posit Science's Brain Fitness training program, for remediating plasticity processes with negative consequences (for example, weakened neuromodulatory control and negative learning). Outcomes of such training correlate with improved neuropsychological measures of memory in healthy older adults (Mahncke, Connor, et al., 2006). Extensive training and practice involving these processes have immense potential for improving higher order cognitive functions, such as memory. More importantly, the treatment offers a means to measure individual gains.

Kelly et al. (2006) and Mahncke, Connor, et al. (2006) both utilized computer based programs to target practice-induced plasticity within cognitive processing networks to achieve positive functional results. Such computer based programs, however, do not engage senses beyond vision and hearing. These types of therapies may not benefit brain areas with more complex functions, such as the prefrontal cortex (PFC). Chen et al. (2006) suggest that restoring function to the PFC and its associated networks should be a priority when considering cognitive rehabilitation treatments for persons with TBI. The PFC is thought to exert executive control over integrated cognitive processes, such as attention and working memory, which are necessary to perform goal-directed tasks. This brain area and its related network are particularly susceptible to damage from head injury. Thus, training tasks that target and engage the PFC networks may enhance PFC functioning and the resulting emergent executive processing.

Virtual rehabilitation therapy, using mixed reality in particular, enables the multimodal stimulation and programmable virtual environments that meet the criteria for stimulating, engaging, and training the integrated neural processes underlying higher cognitive functions. The opportunity for MR based therapy is to integrate portable, unobtrusive neurosensing devices that can be comfortably worn in the 3-D virtual rehabilitation environment.
NEW SCOPE AND FOCUS
Under the Virtual Technologies and Environments initiative sponsored by the Office of Naval Research, the Media Convergence Laboratory (MCL) and the Institute for Simulation and Training at the University of Central Florida created the human experience modeler (HEM) testbed to evaluate technologies of virtual, augmented, and mixed reality that may enhance cognitive rehabilitation effectiveness (Fidopiastis et al., 2006). This reconfigurable, portable MR based training solution was extended to provide therapy for persons experiencing PTSD, TBI, and loss of limb. Currently, the MCL team has partnered with The Virtual Reality Medical Center to produce an MR system that assists physical rehabilitation for stroke-disabled patients with upper-extremity hemiplegia.

The field of augmented cognition has developed technologies and protocols that can improve the information processing capabilities of learners within their operational environment. The use of biosensing devices, including functional near-infrared imaging, to determine real time changes in the cognitive state of the learner is one innovation of the field. Another is the coupling of cognitive state measures to adaptive system changes within the simulation based training environment. Such a training system flexibly modifies the information exchange between the learner and the training material so that the learning state of the operator is optimized. Incorporating these features within the HEM will allow for real time assessment of the patient's behavioral and cognitive changes within a contextualized environment. More importantly, this merger offers greater potential for determining an effective rehabilitation strategy that not only shows promise in the clinic, but also transfers such successes to the home. The potential to demonstrate improved quality of life and overall functional outcomes for persons with anxiety disorders or impairments due to trauma is a true advance for the rehabilitation sciences.
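The coupling of cognitive state measures to adaptive system changes described above is, at its core, a closed loop: sense the learner's state, compare it against a target envelope, and adjust the task. Below is a minimal sketch of one iteration of that loop; the workload estimate, thresholds, and single difficulty parameter are illustrative assumptions, not details of the HEM or any cited system.

def adapt_difficulty(difficulty, workload, low=0.3, high=0.7, step=0.1):
    """One iteration of a cognitive-state-driven adaptation loop.
    workload: estimated cognitive load in [0, 1] from a biosensor
    pipeline (for example, EEG or fNIR features); difficulty: current
    task difficulty in [0, 1]. Keeps the learner inside a target
    workload band by nudging difficulty up or down."""
    if workload > high:        # overloaded: simplify the task
        difficulty = max(0.0, difficulty - step)
    elif workload < low:       # underloaded: add challenge
        difficulty = min(1.0, difficulty + step)
    return difficulty

# Example run over a stream of hypothetical workload estimates.
difficulty = 0.5
for workload in [0.2, 0.25, 0.8, 0.5, 0.9]:
    difficulty = adapt_difficulty(difficulty, workload)
    print(f"workload={workload:.2f} -> difficulty={difficulty:.2f}")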
REFERENCES
Bremner, J. D., Elzinga, B., Schmahl, C., & Vermetten, E. (2008). Structural and functional plasticity of the human brain in posttraumatic stress disorder. Progress in Brain Research, 167, 171–186.
Boyd, L. A., Vidoni, E. D., & Daly, J. J. (2007). Answering the call: The influence of neuroimaging and electrophysiological evidence on rehabilitation. Physical Therapy, 87(6), 684–703.
Burdea, G. C., & Coiffet, P. (2004). Virtual reality technology (2nd ed.). New York: John Wiley & Sons.
Carney, N., Chestnut, R. M., Maynard, H., Mann, N. C., Paterson, P., & Helfand, M. (1999). Effect of cognitive rehabilitation on outcomes for persons with traumatic brain injury: A systematic review. Journal of Head Trauma Rehabilitation, 14, 277–307.
Chen, A., Abrams, G. M., & D'Esposito, M. (2006). Functional reintegration of prefrontal neural networks for enhancing recovery after brain injury. Journal of Head Trauma Rehabilitation, 21(2), 107–118.
Cicerone, K. D., Dahlberg, C., Malec, J. F., Langenbahn, D. M., Felicetti, T., Kneipp, S., et al. (2005). Evidence-based cognitive rehabilitation: Updated review of the literature from 1998 through 2002. Archives of Physical Medicine and Rehabilitation, 86(8), 1681–1692.
De Raedt, R. (2006). Does neuroscience hold promise for the further development of behavior therapy? The case of emotional change after exposure in anxiety and depression. Scandinavian Journal of Psychology, 47(3), 225–236.
Fidopiastis, C. M. (2006). User-centered virtual environment assessment and design for cognitive rehabilitation applications. Ph.D. dissertation, University of Central Florida, Orlando, FL. Retrieved March 5, 2008, from Dissertations & Theses database. (Publication No. AAT 3233649)
Fidopiastis, C. M., Hughes, C. E., Smith, E. M., & Nicholson, D. M. (2007, September). Assessing virtual rehabilitation design with biophysiological metrics [Electronic version]. Proceedings of Virtual Rehabilitation 2007, 89.
Fidopiastis, C. M., Stapleton, C. B., Whiteside, J. D., Hughes, C. E., Fiore, S. M., Martin, G. M., et al. (2005, September). Human experience modeler: Context-driven cognitive retraining to facilitate transfer of learning. Paper presented at the 4th International Workshop on Virtual Rehabilitation (IWVR), Catalina Island, CA.
Fidopiastis, C. M., Stapleton, C. B., Whiteside, J. D., Hughes, C. E., Fiore, S. M., Martin, G. M., et al. (2006). Human experience modeler: Context-driven cognitive retraining to facilitate transfer of learning. CyberPsychology & Behavior, 9(2), 183–187.
Finkelstein, E. A., Corso, P. S., Miller, T. R., & Associates. (2006). The incidence and economic burden of injuries in the United States. New York: Oxford University Press.
Foa, E. B., & Kozak, M. J. (1986). Emotional processing of fear: Exposure to corrective information. Psychological Bulletin, 99, 20–35.
Garcia-Palacios, A., Botella, C., Hoffman, H., & Fabregat, S. (2007). Comparing acceptance and refusal rates of virtual reality exposure vs. in vivo exposure by patients with specific phobias. CyberPsychology & Behavior, 10(5), 722–724.
Hoffman, H. G., Richards, T., Coda, B., Richards, A., & Sharar, S. R. (2003). The illusion of presence in immersive virtual reality during an fMRI brain scan. CyberPsychology & Behavior, 6(2), 127–131.
Holden, M. K. (2005). Virtual environments for motor rehabilitation: Review. CyberPsychology & Behavior, 8(3), 187–211.
Kelly, C., Foxe, J. J., & Garavan, H. (2006). Patterns of normal human brain plasticity after practice and their implications for neurorehabilitation. Archives of Physical Medicine and Rehabilitation, 87(2), S20–S29.
Klinger, E., Bouchard, S., Legeron, P., Roy, S., Lauer, F., Chemin, I., et al. (2005). Virtual reality therapy versus cognitive behavior therapy for social phobia: A preliminary controlled study. CyberPsychology & Behavior, 8(1), 76–88.
Mahncke, H. W., Bronstone, A., & Merzenich, M. M. (2006). Brain plasticity and functional losses in the aged: Scientific bases for a novel intervention. Progress in Brain Research, 157, 81–109.
Mahncke, H. W., Connor, B. B., Appelman, J., Ahsanuddin, O. N., Hardy, J. L., Wood, R., et al. (2006). Memory enhancement in healthy older adults using a brain plasticity-based training program: A randomized, controlled study. Proceedings of the National Academy of Sciences, 103(33), 12523–12528.
Mataix-Cols, D., & Phillips, M. L. (2007). Psychophysiological and functional neuroimaging techniques in the study of anxiety disorders. Psychiatry, 6(4), 156–160.
Milgram, P., & Kishino, F. (1994). A taxonomy of mixed reality visual displays. IEICE Transactions on Information and Systems, E77-D(12), 1321–1329.
Nemeroff, C. B., Bremner, J. D., Foa, E. B., Mayberg, H. S., North, C. S., & Stein, M. B. (2006). Posttraumatic stress disorder: A state-of-the-science review. Journal of Psychiatric Research, 40(1), 1–21.
Ormerod, B. K., & Galea, L. (2001). Mechanisms and function of adult neurogenesis. In C. A. Shaw & J. C. McEachern (Eds.), Toward a theory of neuroplasticity (pp. 85–100). Lillington, NC: Taylor & Francis.
Paulus, M. P. (2008). The role of neuroimaging for the diagnosis and treatment of anxiety disorders. Depression and Anxiety, 25, 348–356.
Rauch, S. L., & Shin, L. M. (2002). Structural and functional imaging of anxiety and stress disorders. In K. L. Davis, D. Charney, J. T. Coyle, & C. Nemeroff (Eds.), Neuropsychopharmacology: The fifth generation of progress (pp. 953–966). Philadelphia: Lippincott Williams & Wilkins.
Riva, G. (2005). Virtual reality in psychotherapy: Review. CyberPsychology & Behavior, 8(3), 220–230.
Rizzo, A. A., & Kim, G. J. (2005). A SWOT analysis of the field of virtual rehabilitation and therapy. Presence: Teleoperators and Virtual Environments, 14(2), 119–146.
Rizzo, A. S., Brooks, T., Sharkey, P. M., & Merrick, J. (2006). Advances in virtual reality therapy and rehabilitation. International Journal on Disability and Human Development, 5(3), 203–204.
Rose, F. D., Brooks, B. M., & Rizzo, A. A. (2005). Virtual reality in brain damage rehabilitation: Review. CyberPsychology & Behavior, 8(3), 241–262.
Rothbaum, B. O., Anderson, P., Zimand, E., Hodges, L., Lang, D., & Wilson, J. (2006). Virtual reality exposure therapy and standard (in vivo) exposure therapy in the treatment of fear of flying. Behavior Therapy, 37, 80–90.
Tarr, M. J., & Warren, W. H. (2002). Virtual reality in behavioral neuroscience and beyond. Nature Neuroscience, 5(Suppl.), 1089–1092.
Weiss, P. L., Kedar, R., & Shahar, M. (2006). Ties that bind: An introduction to domain mapping as a visualization tool for virtual rehabilitation. CyberPsychology & Behavior, 9(2), 114–122.
Whyte, J. (2006). Using treatment theories to refine the designs of brain injury rehabilitation treatment effectiveness studies. Journal of Head Trauma Rehabilitation, 21(2), 99–106.
Whyte, J., & Hart, T. (2003). It's more than a black box, it's a Russian doll: Defining rehabilitation treatments. American Journal of Physical Medicine and Rehabilitation, 82(8), 639–652.
Wiederhold, B. (2006). CyberTherapy 2006. CyberPsychology & Behavior, 9(6), 651–652.
Wiederhold, B. K., & Wiederhold, M. D. (2005). Virtual reality therapy for anxiety disorders: Advances in evaluation and treatment. Washington, DC: American Psychological Association.
Wilson, P. N., Foreman, N., & Stanton, D. (1997). Virtual reality, disability and rehabilitation. Disability and Rehabilitation, 19(6), 213–220.
You, S. H., Jang, S. H., Kim, Y. H., Hallett, M., Ahn, S. H., Kwon, Y. H., et al. (2005). Virtual reality-induced cortical reorganization and associated locomotor recovery in chronic stroke: An experimenter-blind randomized study. Stroke, 36, 1166–1171.
Chapter 42
PERSONAL LEARNING ASSOCIATES AND THE NEW LEARNING ENVIRONMENT
J. D. Fletcher

Much education, training, problem solving, performance aiding, and decision aiding may, in the not-too-distant future, rely on dialogues or conversations with personalized computer based devices, which might be called personal learning associates (PLAs). It further seems likely that these devices will be used as portals into virtual worlds and virtual environments, where these dialogues will continue in combination with other experiences, contexts, and conditions. Functionally, such a PLA-inhabited world might rely on three key components (a sketch of how they could fit together follows the list):

1. A global information infrastructure, such as today's World Wide Web, populated by sharable digital objects. These objects could be content for display, such as text, video, virtual "islands," and avatars. They could also be nondisplay materials, such as algorithms, instructional strategies, software tools, and databases.

2. Servers to locate and retrieve these digital objects and assemble them to support interactions with users and learners.

3. Devices that serve as PLAs for users and learners. These could be handhelds and laptops, so that they are available on demand, anytime, anywhere. They could also be hosted on platforms ranging from integrated circuits to mainframes. PLAs could be linked for use by groups of geographically dispersed learners working collaboratively. They will be personal accessories, but they need not be limited to individual uses.
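To make the three components concrete, here is a sketch of how a PLA client might query a registry server for sharable digital objects and assemble the results for a learner. Every specific name in it (the registry URL, the query fields, the reply format, the LearningObject shape) is hypothetical and invented for illustration; none of it is an existing PLA, SCORM, or CORDRA interface.

import json
import urllib.parse
import urllib.request
from dataclasses import dataclass

# Hypothetical registry endpoint standing in for component 1 (the
# object-populated global information infrastructure).
REGISTRY_URL = "https://registry.example.org/objects"

@dataclass
class LearningObject:
    identifier: str
    title: str
    media_type: str  # for example "text", "video", "simulation"

def find_objects(topic, learner_level):
    """Component 2 in miniature: ask a server for digital objects
    matching a topic and a learner's level, assuming a JSON reply of
    the form [{"id": ..., "title": ..., "media": ...}, ...]."""
    query = urllib.parse.urlencode({"topic": topic, "level": learner_level})
    with urllib.request.urlopen(f"{REGISTRY_URL}?{query}") as resp:
        records = json.load(resp)
    return [LearningObject(r["id"], r["title"], r["media"]) for r in records]

def assemble_lesson(objects):
    """Component 3's job, trivially simplified: order the retrieved
    objects into a presentation for one learner (text before video)."""
    return sorted(objects, key=lambda o: o.media_type != "text")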
TRENDS
There are historical and technological trends in education, training, and elsewhere that point to the likelihood, if not inevitability, of PLA devices and capabilities. In discussing these trends we need a generic term for education, training, performance aiding, problem solving, decision aiding, and similar capabilities. For convenience they are lumped together here and called "learning."
SOME HISTORICAL TRENDS IN LEARNING
In the primordial beginnings, and for perhaps 100,000 years thereafter, learning involved direct, in-person interactions between learners and a sage. Seven thousand or so years ago we learned how to write, which effected a major revolution in learning. People with enough time and resources could study the words of sages without having to rely on face-to-face interaction or the vagaries of human memory. Learning began to move in an on-demand, anytime, anywhere direction.

The next step was the development of books (that is, something beyond mud or stone tablets). As discussed by Kilgour (1998), books were based on papyrus and parchment rolls until about 300 B.C., when the Romans began to sew sheets of parchment together into codices. These were cheaper to produce because they were based on locally available parchment made from animal skin, and they allowed content to be placed on both sides of the sheets. The use of paper, prepared from linen and cotton in about A.D. 100 (China) and A.D. 1200 (Europe), made books even less expensive. Their lowered costs made them more available to a literate and growing middle class, which, in turn, increased the demand for more cost reductions, more books, and more of the learning they provided. This demand led to the introduction of books printed from moveable type, first in China around A.D. 1000 and later in Europe in the mid-1400s (Kilgour, 1998). Learning then continued to become more widely and inexpensively available on demand, anytime, anywhere.

Next, after about 500 years, comes the computer. With its ability to adapt the sequence and type of operations based on conditions of the moment—or microsecond—computer technology may effect yet another revolution in learning. While preserving the capabilities of writing and books to present learning content on demand, it can also provide guidance and tutorial interactions as needed by individual learners. This combination of learning content and individually tailored interactivity is not something books, movies, television, or videotape technologies can do to any appreciable degree. It is a new and significant capability for learning.

In short, the progression of learning across human history appears to be toward increased on-demand, anytime, anywhere access to learning. Aided by computer technology, it seems likely to continue. At least that is the argument presented here.

TECHNOLOGY
Many technologies evolve in directions that no one foresees. We had steam engines before railways, wireless telegraph before radio, microwave transmitters before microwave ovens, the Internet before the Web, and so forth. Still, there may be value in trying to envision where our technologies may be taking us. Knowing in advance where we are going can help us get there—or avoid doing so, should that seem more prudent.

It has been suggested that the future is already here, but unrecognized and unevenly distributed. When it comes to learning and learning environments, we might ask what is currently unrecognized and unevenly distributed to see where these environments, and we, may be headed.
We might begin by hazarding a list of possibly relevant trends and capabilities that are already at hand. Such a list could include the following:

Moore's Law. In 1965 Gordon Moore, a co-founder of Intel Corporation, noted casually that engineers were doubling the number of electronic devices on chips every year. If we expand Moore's time estimate to 18 months, our expectations fit reality quite closely (Brenner, 1997). This pace of development seems likely to continue. Gorbis and Pescovitz (2006) found that about 70 percent of IEEE (Institute of Electrical and Electronic Engineers) Fellows expect Moore's Law to continue holding for at least 10 more years. About 35 percent of them expect it to continue beyond that, up to 20 years. The major consequence of Moore's Law for PLAs is that the technology needed to support them will become increasingly compact and affordable.

Computer Communications and Networking. The most dramatic and globally pervasive manifestations of computing in our daily lives are the Internet and the World Wide Web. Web use grew about 266 percent between 2000 and 2007, with more than 1.3 billion learners and users of all sorts worldwide as of December 2007 (Internet World Stats, 2007). The Web and the evolving global information infrastructure have made vast amounts of human information—and misinformation—globally accessible. Tens of thousands of people can participate in massively multiplayer online games, such as EverQuest, Final Fantasy, RuneScape, and World of Warcraft. Similar multitudes of globally dispersed learners may soon be participating in virtual environments through PLAs.

The Semantic Web. The Semantic Web (Berners-Lee, Hendler, & Lassila, 2001), which is being developed under the auspices of the World Wide Web Consortium, should improve cooperation between computers and human beings by imbuing Web information with meaning and ontological connections. These connections are expected to expose semantic linkages between disparate bodies of knowledge regardless of how different they may appear to be at first (for example, Chandrasekaran, Josephson, & Benjamins, 1999). They will make it possible to develop increasingly powerful, accurate, and comprehensive models of learners for use in tailoring learning environments and their interactions to individual needs and interests (Dodds & Fletcher, 2004). They may add substantially to the adaptability and realism of virtual environments.

Computer Graphics, Video, and Animation. The validity of the multimedia principle, which states that people can absorb more information from words and pictures presented together than from words alone, seems well established by research and ensuing cognitive theory (Fletcher & Tobias, 2005). Enhancements in multimedia capabilities (for example, graphics, video, and animation) now available in virtual environments, and therefore available to PLAs, increase the power, flexibility, and functional range of learning environments and, thanks to the multimedia principle, the retention and transfer of what is learned from them.

Learning Objects. Object-oriented applications are becoming ubiquitous. The development of specifications to make learning objects accessible, interoperable, reusable, and durable is an integral part of this trend. These specifications have been described elsewhere (for example, Fletcher, Tobias, & Wisher, 2007; Wiley, 2000).
The objects are packaged in metadata, which describe what is in the package, and are being made available on the global information infrastructure, allowing object-oriented applications, such as those we might find in PLAs, to identify, locate, and access them, thereby enhancing the flexibility, responsiveness, and adaptability of learning environments.

Natural Language Processing. The steadily growing capability of computer technology to participate in natural language conversations (for example, Graesser, Gernsbacher, & Goldman, 2003) will significantly enhance mixed-initiative dialogues, in which participants in learning environments, both computer generated and real, can initiate interactions. One can imagine turning an avatar loose on the global information infrastructure to find advice or to answer a question by locating relevant learning objects and/or engaging humans and other avatars in conversations, returning to report when it judges itself ready. Language barriers should diminish in virtual environments as avatars and human participants become increasingly able to interact using a variety of languages (for example, Chatham, in press). Given the economic windfall promised by reliable natural language understanding by computers, it seems likely that these capabilities will continue to develop.

Individualized, Computer-Assisted Learning. Major improvements over classroom instruction occur when education and training can be presented in tutorial, individualized interactions. The difference can amount to two standard deviations, as, for instance, Bloom (1984) found. However, we cannot afford a single human instructor for every learner, nor a single advisor for every problem solver. A solution to this problem may be found, as Fletcher (1992) and Corbett (2001) have suggested, by using computers to make affordable the substantial benefits of individualized, tutorial learning suggested by Bloom's research. Computer technology captured these benefits early on. Since the 1960s, computer based systems have tailored (a) the rate of progress for individual learners, (b) sequences of instructional content and interactions to match each learner's needs, (c) content itself, providing different learners with different content depending on what they have mastered, and (d) difficulty levels, to ensure that the tasks for the learner are not so easy as to be boring or so difficult as to seem impossible. These capabilities have been available and used in computer based instruction from its inception (for example, Coulson, 1962; Galanter, 1959; Suppes, Jerman, & Brian, 1968). By the early 1970s, the effectiveness of using computer technology to individualize learning was generally recognized (for example, Ford, Slough, & Hurlock, 1972; Vinsonhaler & Bass, 1972).

Findings from many studies comparing the use of computers in learning to standard classroom practice may be summarized, statistically, by a "rule of thirds." This rule suggests that the learning capabilities we would expect to find on computer based devices, such as PLAs, can reduce the cost of delivering instruction by about one-third and, beyond that, either reduce the instructional time needed to reach instructional goals by about one-third (holding learning constant) or increase the skills and knowledge acquired by about one-third (holding instructional time constant). As a statistical summary that is silent about cause, the rule of thirds is compatible with Clark's (1983) often-cited point that it is not technology itself, but what we do with it, that matters.
Still, the demonstrably attainable savings in time to learn that the rule of thirds reports can be expected from the use of PLAs and could reduce the costs of specialized skill training in the Department of Defense by as much as 25 percent (Fletcher, 2006). Similar cost savings are attainable through the use of PLAs as performance aids in equipment maintenance (Fletcher & Johnston, 2002).
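As a worked illustration of the rule of thirds, the figures below apply the one-third reductions to a hypothetical course; the baseline cost and hours are invented for the example, not drawn from the studies summarized above.

# Hypothetical baseline course, invented for illustration.
baseline_cost = 90_000.0   # dollars to deliver one offering
baseline_hours = 120.0     # instructional hours to reach the goals

# Rule of thirds: roughly one-third off delivery cost and, beyond
# that, either one-third off time (learning held constant) or
# one-third more learning (time held constant).
cost = baseline_cost * (1 - 1 / 3)    # -> 60,000 dollars
hours = baseline_hours * (1 - 1 / 3)  # -> 80 hours

print(f"delivery cost: ${baseline_cost:,.0f} -> ${cost:,.0f}")
print(f"time to goals: {baseline_hours:.0f} h -> {hours:.0f} h")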
Intelligent Tutoring Systems. The key historical difference between computer-assisted instruction and intelligent tutoring systems is a substantive matter, more than a marketing term. When intelligent tutoring was first introduced into computer-assisted instruction, it concerned quite specific goals first targeted in the 1960s (Carbonell, 1970; Fletcher & Rockway, 1986; Sleeman & Brown, 1982). Two defining capabilities were that intelligent tutoring systems should

• Allow either the system or the learner to ask open-ended questions and initiate a "mixed-initiative" dialogue as needed or desired for learning. Mixed-initiative dialogue requires a language that is shared by both the system and the learner. Natural language has been a frequent and continuing choice for this capability (for example, Brown, Burton, & DeKleer, 1982; Collins, Warnock, & Passafiume, 1974; Graesser, Person, & Magliano, 1995; Graesser, Gernsbacher, & Goldman, 2003), but the languages of mathematics, mathematical logic, electronics, and other well-structured communication systems have also been used (Barr, Beard, & Atkinson, 1975; Suppes, 1981; Sleeman & Brown, 1982; Psotka, Massey, & Mutter, 1988).

• Generate learning material and interactions on demand rather than require developers to foresee and prestore all such materials and interactions needed to meet all possible eventualities. This capability involves not just generating problems tailored to each learner's needs, but also providing coaching, hints, critiques of completed solutions, appropriate and effective teaching strategies, and, overall, the interactions and presentations characteristic of individualized, tutorial learning environments. Generative capability remains key to the full range of PLA capabilities envisioned here (a minimal sketch of generation appears below).
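The generative capability named in the second point above can be shown in miniature: rather than retrieving a prestored item, the tutor constructs a problem from the learner's current mastery estimate and produces a hint keyed to that particular item. The sketch below is a deliberately toy example (a subtraction drill with a single skill parameter) invented here, not a reconstruction of any of the systems cited in this section.

import random

def generate_problem(mastery):
    """Generate a subtraction item sized to the learner's mastery
    estimate in [0, 1]: weak learners get small items, strong learners
    get larger ones that are more likely to require borrowing."""
    top = 10 + int(mastery * 90)       # operand ceiling grows with mastery
    a = random.randint(top // 2, top)
    b = random.randint(0, a)
    return a, b

def hint(a, b):
    """A generated (not prestored) hint keyed to the item itself."""
    if b % 10 > a % 10:
        return (f"The ones digit of {b} is larger than that of {a}; "
                "borrow from the tens.")
    return f"Subtract the ones digits first: {a % 10} - {b % 10}."

mastery = 0.8
a, b = generate_problem(mastery)
print(f"What is {a} - {b}?  Hint: {hint(a, b)}")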
Early applications such as BIP in computer programming (Barr, Beard, & Atkinson, 1975), BUGGY in subtraction (Brown & Burton, 1978), EXCHECK in mathematical logic (Suppes, 1981), SOPHIE in electronic troubleshooting (Brown, Burton, & DeKleer, 1982), and others demonstrated that the capabilities needed to model subject matter, match it with models of the learner, and generate interactions on demand and in real time are within our technical grasp. Development of these capabilities has continued to improve their performance (for example, Luckin, Koedinger, & Greer, 2007; McCalla, Looi, Bredeweg, & Breuker, 2005; Polson & Richardson, 1988; Psotka, Massey, & Mutter, 1988).

PLA OPERATIONS, FUNCTIONALITIES, AND CAPABILITIES
What happens as we begin to combine the above technologies, among others, into learning applications? What might we expect a PLA to be and do?

A PLA might be carried in a pocket or on a belt, worn as a shirt, or even implanted. It will operate wirelessly, accessing the global information infrastructure. It will include all the eagerly sought and widely used functionalities found on today's mobile telephones: e-mail, games, instant messaging, and even voice communication between people. It will use natural language, speech and/or text, to communicate, although other modes, including the languages of science, mathematics, and engineering, will be available.
440
Integrated Systems, Training Evaluations, and Future Directions
media for interactions, including graphics, photographics, animation, video, and the like. An important feature of PLAs will be their ability to allow participation in virtual environments and simulations, which could be used as virtual laboratories, mimicking equipment, situations, markets, and so forth. Virtual laboratories will allow the learner to test different hypotheses concerning the subject matter, try out different problem solving strategies and solutions, participate in collaborative learning and problem solving, and examine the effects and implications of different decisions. Because PLAs will be able to link to other PLAs, they will be able to contact experts and engage with other learners in virtual environments using software tools (for example, Soller & Lesgold, 2003) that identify and assemble potential communities of interest and enhance communication and collaboration within them. PLAs will become intensely personal accessories. Through explicit and/or implicit means, they will develop, test, and modify models of the learner(s). These models will reflect each learner’s knowledge, skills, abilities, interests, values, objectives, and style of encoding information. By using this information to access the global information infrastructure, PLAs will be able to collect and assemble precisely the learning objects that an individual needs to learn, solve a problem, or make a decision. In effect, PLAs may provide a polymath in every pocket, accessing the whole of human knowledge and information, filtering and adapting it for relevance and accuracy, and supplying it, on demand, in a form and level of difficulty that an individual learner is prepared to understand and apply. By incorporating natural language understanding, PLAs may provide the goaldirected, on-demand, interactive conversations that have long been the goal of automated learning (for example, Uttal, 1962). The foundation of these interactions would be a mixed-initiative conversation between the learner and the PLA to achieve targeted objectives.
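A tiny sketch of such a learner model appears below. It is purely illustrative: the field names, the update rule, and the scoring heuristic are our assumptions, not part of any published PLA design.

```python
# Hedged sketch of a learner model that is updated from observed
# interactions and then used to rank candidate learning objects.
from dataclasses import dataclass, field

@dataclass
class LearnerModel:
    mastery: dict = field(default_factory=dict)   # topic -> estimate in [0, 1]
    interests: set = field(default_factory=set)

    def observe(self, topic, success, rate=0.2):
        """Update the mastery estimate for a topic from one observed
        interaction (a simple exponential moving average)."""
        old = self.mastery.get(topic, 0.5)
        self.mastery[topic] = old + rate * ((1.0 if success else 0.0) - old)

    def select(self, learning_objects):
        """Rank candidate learning objects: prefer topics of interest whose
        difficulty sits just above the learner's current mastery."""
        def score(obj):
            gap = obj["difficulty"] - self.mastery.get(obj["topic"], 0.5)
            interest = 1.0 if obj["topic"] in self.interests else 0.5
            return interest * (1.0 - abs(gap - 0.1))
        return sorted(learning_objects, key=score, reverse=True)

model = LearnerModel(interests={"acoustics"})
model.observe("acoustics", success=False)
ranked = model.select([
    {"topic": "acoustics", "difficulty": 0.6},
    {"topic": "history", "difficulty": 0.4},
])
print(ranked[0]["topic"])  # prefers the interest-matched object
```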
PLA INFRASTRUCTURE: PROGRESS

It seems reasonable to anticipate the ready availability of devices that can support PLA functions. Moore’s Law, computer communications, wireless infrastructure, and the development of handheld computing should all help ensure this outcome.

The sharable learning objects required by PLAs are achievable, but not so easily assumed. They require effort and agreement among developers more than scientific breakthroughs. The global information infrastructure, currently instantiated as the World Wide Web, is obviously in place. It needs to be complemented by capabilities that automatically and precisely locate digital objects that will operate on most, perhaps all, of the PLA platforms to which they might be delivered. Objects that meet these criteria have been specified by the Sharable Content Object Reference Model (SCORM) developed by the Advanced Distributed Learning (ADL) initiative (Dodds & Fletcher, 2004; Fletcher, Tobias, & Wisher, 2007). Learning objects developed in accord with SCORM’s specifications are interoperable across computing platforms of many types, durable across different versions of underlying system support software, and reusable across multiple environments and applications. SCORM has received global acceptance as a specification and is progressing through the steps needed to be certified as an international standard. Whether or not SCORM is the ultimate specification for supporting PLAs remains to be seen, but it is an essential beginning. It has demonstrated the feasibility and acceptability of sharable learning objects.

The issue of access remains. Even if the global information infrastructure is well populated with interoperable, reusable, and durable objects, the problem of finding precisely correct objects to meet PLA user requirements remains. The Content Object Registry/Repository Discovery and Resolution Architecture (CORDRA) and the accompanying ADL Registry infrastructure have made substantial advances toward this goal (Dodds & Fletcher, 2004; Fletcher, Tobias, & Wisher, 2007). CORDRA uses metadata packaging and ontologies to allow substantially more precise location of digital objects than the text crawling techniques of many current search engines. Its precision can be expected to continue improving with the development of the Semantic Web and other emerging capabilities. As with SCORM, the eventual tool used by PLAs may or may not be CORDRA based, but its functionalities are likely to remain quite similar.

SCORM and CORDRA give us the means to populate the global information infrastructure with PLA-usable objects. We have only to create them. That appears to be happening. A survey of learning materials developed for industry and government found that over 4 million SCORM objects had been produced (Rehak, 2006). More have been appearing steadily since that survey was made. The critical part of PLA functioning, then, remains the capability of servers to assemble learning material on demand, in real time, and in accord with learners’ needs. PLAs will implement the generative, dialogue based, information-structured capabilities called for by intelligent tutoring systems. In effect, PLAs must participate in the design and development of the learning environment—virtual and otherwise—in addition to presenting it. They can become more than just delivery systems. This goal has not yet been reached, but it appears achievable. Still, there is much that can be done in the interim.
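The sketch below illustrates, in miniature, the difference between text crawling and the metadata-and-ontology lookup that CORDRA provides. Everything here (the record format, the toy ontology, and the query function) is our own illustration; it is not the ADL Registry's actual schema or interface.

```python
# Illustrative contrast between free-text search and metadata lookup.
# The registry records, the ontology, and all field names are invented;
# they are not the actual SCORM or CORDRA schemas.
REGISTRY = [
    {"id": "obj-001", "subject": "sonar", "type": "simulation",
     "platforms": ["pc", "handheld"], "difficulty": "introductory"},
    {"id": "obj-002", "subject": "acoustics", "type": "lesson",
     "platforms": ["pc"], "difficulty": "advanced"},
]

# A toy ontology: broader terms map to the narrower terms they subsume,
# so a query for "acoustics" can also resolve objects tagged "sonar".
ONTOLOGY = {"acoustics": {"acoustics", "sonar", "sound propagation"}}

def discover(subject, platform):
    """Resolve a query against metadata rather than crawled text: expand
    the subject through the ontology, then filter on structured fields."""
    wanted = ONTOLOGY.get(subject, {subject})
    return [r["id"] for r in REGISTRY
            if r["subject"] in wanted and platform in r["platforms"]]

print(discover("acoustics", "handheld"))  # ['obj-001']
```

A text crawler would miss obj-001 entirely, since the word "acoustics" never appears in it; the ontology-backed lookup finds it and simultaneously enforces the platform constraint that interoperable delivery requires.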
442
Integrated Systems, Training Evaluations, and Future Directions
FINAL WORD

It seems likely that the technological trends and capabilities discussed here, carried forward by the ancient and continuing trend toward on-demand, anytime, anywhere learning, will lead to the appearance of something very much like PLA based learning environments. We may reasonably expect substantially increased, globally available learning opportunities through enhanced access to education, training, problem solving, performance aiding, and decision aiding—or learning—made possible by PLAs. We can expect learning to become more responsive and effective through the continuous assessment, learner modeling, and interactions tailored on demand to learner needs that our technologies are making feasible. Finally, we can expect PLAs to vastly enhance access to and use of virtual environments for learning.

REFERENCES

Barr, A., Beard, M., & Atkinson, R. C. (1975). A rationale and description of a CAI program to teach the BASIC programming language. Instructional Science, 4, 1–31.
Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The semantic web. Scientific American, 284, 34–43.
Bloom, B. S. (1984). The 2 sigma problem: The search for methods of group instruction as effective as one-to-one tutoring. Educational Researcher, 13, 4–16.
Brenner, A. E. (1997). Moore’s Law. Science, 275, 1551.
Brown, J. S., & Burton, R. R. (1978). Diagnostic models for procedural bugs in basic mathematical skills. Cognitive Science, 2(2), 155–192.
Brown, J. S., Burton, R. R., & DeKleer, J. (1982). Pedagogical, natural language and knowledge engineering in SOPHIE I, II, and III. In D. Sleeman & J. S. Brown (Eds.), Intelligent tutoring systems (pp. 227–282). New York: Academic Press.
Carbonell, J. R. (1970). AI in CAI: An artificial intelligence approach to computer-assisted instruction. IEEE Transactions on Man-Machine Systems, 11, 190–202.
Chandrasekaran, B., Josephson, J. R., & Benjamins, V. R. (1999). Ontologies: What are they? Why do we need them? IEEE Intelligent Systems and Their Applications, 14, 20–26.
Chatham, R. E. (in press). Toward a second training revolution: Promise and pitfalls of digital experiential training. In K. A. Ericsson (Ed.), Development of professional expertise: Toward measurement of expert performance and design of optimal learning environments. Cambridge, United Kingdom: Cambridge University Press.
Clark, R. E. (1983). Reconsidering research on learning from media. Review of Educational Research, 53, 445–459.
Collins, A., Warnock, E. H., & Passfiume, J. J. (1974). Analysis and synthesis of tutorial dialogues (BBN Rep. No. 2789). Cambridge, MA: Bolt Beranek and Newman. (ERIC ED 088 512)
Corbett, A. (2001). Cognitive computer tutors: Solving the two-sigma problem. In M. Bauer, P. J. Gmytrasiewicz, & J. Vassileva (Eds.), User modeling (pp. 137–147). Berlin: Springer-Verlag.
Coulson, J. E. (Ed.). (1962). Programmed learning and computer-based instruction. New York: John Wiley and Sons.
Dodds, P. V. W., & Fletcher, J. D. (2004). Opportunities for new “smart” learning environments enabled by next generation web capabilities. Journal of Educational Multimedia and Hypermedia, 13(4), 391–404.
Fletcher, J. D. (1992). Individualized systems of instruction. In M. C. Alkin (Ed.), Encyclopedia of educational research (6th ed., pp. 613–620). New York: Macmillan.
Fletcher, J. D. (2006). A polymath in every pocket. Educational Technology, 46, 7–18.
Fletcher, J. D., & Johnston, R. (2002). Effectiveness and cost benefits of computer-based aids for maintenance operations. Computers in Human Behavior, 18, 717–728.
Fletcher, J. D., & Rockway, M. R. (1986). Computer-based training in the military. In J. A. Ellis (Ed.), Military contributions to instructional technology (pp. 171–222). New York: Praeger Publishers.
Fletcher, J. D., & Tobias, S. (2005). The multimedia principle. In R. E. Mayer (Ed.), The Cambridge handbook of multimedia learning (pp. 117–133). New York: Cambridge University Press.
Fletcher, J. D., Tobias, S., & Wisher, R. L. (2007). Learning anytime, anywhere: Advanced distributed learning and the changing face of education. Educational Researcher, 36(2), 96–102.
Ford, J. D., Slough, D. A., & Hurlock, R. E. (1972). Computer assisted instruction in Navy technical training using a small dedicated computer system: Final report (Research Rep. No. SRR 73-13). San Diego, CA: Navy Personnel Research and Development Center.
Galanter, E. (Ed.). (1959). Automatic teaching: The state of the art. New York: John Wiley & Sons.
Gorbis, M., & Pescovitz, D. (2006). IEEE Fellows survey: Bursting tech bubbles before they balloon. IEEE Spectrum, 43(9), 50–55.
Graesser, A. C., Gernsbacher, M. A., & Goldman, S. (Eds.). (2003). Handbook of discourse processes. Mahwah, NJ: Lawrence Erlbaum.
Graesser, A. C., Person, N. K., & Magliano, J. P. (1995). Collaborative dialogue patterns in naturalistic one-on-one tutoring. Applied Cognitive Psychology, 9, 495–522.
Internet World Stats. (2007). Usage and population statistics. Retrieved January 30, 2008, from http://www.internetworldstats.com
Kilgour, F. G. (1998). The evolution of the book. New York: Oxford University Press.
Luckin, R., Koedinger, K. R., & Greer, J. (Eds.). (2007). Artificial intelligence in education. Amsterdam: IOS Press.
McCalla, G., Looi, C. K., Bredeweg, B., & Breuker, J. (Eds.). (2005). Artificial intelligence in education. Amsterdam: IOS Press.
Polson, M. C., & Richardson, J. J. (Eds.). (1988). Intelligent tutoring systems. Mahwah, NJ: Lawrence Erlbaum.
Psotka, J., Massey, L. D., & Mutter, S. A. (Eds.). (1988). Intelligent tutoring systems: Lessons learned. Hillsdale, NJ: Lawrence Erlbaum.
Rehak, D. R. (2006). Challenges for ubiquitous learning and learning technology. Educational Technology, 46, 43–49.
Sleeman, D., & Brown, J. S. (Eds.). (1982). Intelligent tutoring systems. New York: Academic Press.
Soller, A., & Lesgold, A. (2003). A computational approach to analyzing online knowledge sharing interaction. In U. Hoppe, F. Verdejo, & J. Kay (Eds.), Proceedings of Artificial Intelligence in Education 2003 (pp. 253–260). Amsterdam: IOS Press.
Suppes, P. (Ed.). (1981). University-level computer assisted instruction at Stanford: 1968–1980. Stanford, CA: Institute for Mathematical Studies in the Social Sciences.
Suppes, P., Jerman, M., & Brian, D. (1968). Computer-assisted instruction: The 1965–66 Stanford arithmetic program. New York: Academic Press.
Uttal, W. R. (1962). On conversational interaction. In J. E. Coulson (Ed.), Programmed learning and computer-based instruction (pp. 171–190). New York: John Wiley and Sons.
Vinsonhaler, J. F., & Bass, R. K. (1972). A summary of ten major studies on CAI drill and practice. Educational Technology, 12, 29–32.
Wiley, D. (2000). The instructional use of learning objects. Retrieved January 30, 2007, from http://www.reusability.org/read
Chapter 43
THE FUTURE OF MUSEUM EXPERIENCES
Lori Walters, Eileen Smith, and Charles Hughes
THE UBIQUITOUS MUSEUM

The evolution of the museum will be tightly woven with the desires of what noted educator Marc Prensky defines as the “digital native” generation (Prensky, 2001). It is a generation that has no direct link to the vacuum tube, the rotary phone, or a world before satellites. To these individuals born after 1990, the personal computer is as much a natural component of life as the television was for those born after 1960. Museums of the future have the incredible opportunity to embrace the interactivity and visualization that the digital generation enjoys while still maintaining one of the museum’s paramount missions—the intergenerational transfer of cultural memory.

Traditionally we have provided this intergenerational transfer as if free-choice learning centers were isolated points of interconnectivity. The links to exploration and learning should not weaken simply because an individual has left the brick and mortar confines of the museum. Future museums will have available to them a web of connectivity that provides for a seamless transfer of knowledge between visitor, facility, and beyond.

Current technology can permit learning data and experiences to travel with individual learners through time and space. Their museum experiences can be captured in a personal learning “journal” that follows them after they depart the facility. Radio frequency identification (RFID) technology, personal digital assistants (PDAs), and cell phones allow customizable exploration of the museum’s learning experiences. As visitors make choices during their exploration of priority community issues at a science center, or examine artifacts and artwork of interest at a historical or art museum, their decisions (implicit and explicit) are logged into their journal. Data gathered at the museum on field trips can follow students back into the classroom, giving educators a tool that can link the discovery learning at the museum with curricular learning at their students’ particular grade level.
Families can access data gathered at the museum once they return home, allowing deeper and longer exploration than is possible at a museum exhibit.

Museums are just beginning to explore the potential of the digital revolution. While most maintain a Web presence, these Web sites offer little more than reformatted versions of the marketing information that a potential visitor could acquire at any tourist bureau kiosk. Museums that have ventured beyond the brochure phase and have developed supplemental materials for their larger exhibits have, to a large extent, yet to tap into the Internet’s full interactive potential. Most sites offer generic experiences where everyone begins at the same point, and few let individuals drill down through layers to locate information of particular interest to them. Any interactivity comes not from how the visitor interacted with exhibits at the museum; it is derived from interaction with the Web site only—reinforcing the separation of the museum and its Web site. The next generation of the museum Web site can provide interactive adventures that are customized to the interests of individual visitors.

Museum experiences are a combination of personal, social, and physical contexts for learning, as noted by Falk and Dierking (2000). This multifaceted context allows for rich exploration and connection to everyday life. While technology continues to evolve and will ultimately provide new solutions, the Media Convergence Laboratory at the University of Central Florida is examining this concept with current RFID technology at three partner museums. At each facility a network of RFID transceivers will be positioned throughout a specific exhibit hall. Visitors receive a pre-encoded disposable wristband, and as they walk through the exhibit the wristband sends data to a transceiver, recording the length of time that a visitor interacts with a particular artifact/exhibit. Each exhibit is identified with a multitude of potential interests. The acquired data are used to create personalized online experiences that can be accessed by logging on to the facility’s Web site and entering the number printed on each visitor’s wristband. Another primary use of RFID is to customize any particular visitor’s learning while in the museum facility itself. Exhibits can allow visitors to answer polls, prioritize issues, explore “what if?” scenarios, and keep the decisions made and the simulations experienced in their digital “files” throughout their visits. The RFID experience is completely anonymous, as the numbered wristbands have no connection to individual names—thus there are no privacy issues to concern visitors. The personalized interactive Web site will allow further, deeper exploration of a topic of specific interest, stemming from the in-facility experiences that the individual or group had, and enhance the overall individual learning experience.

A significant secondary use for the captured data is that it provides museum exhibit and education professionals with a new tool for exhibit layout and usage analysis. The captured RFID data will be available for aggregation, analysis, and display for the museum’s development use. Animations of visitor paths can demonstrate how individuals travel through the on-site exhibit hall. Dwell time on specific exhibits can be revealed. The data can uncover general patron patterns that can be used to improve the current gallery in regard to flow and assist in the design of future learning experiences.
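A minimal sketch of the bookkeeping such a system implies appears below. The event format, the dwell-time rule, and the wristband identifiers are all assumptions made for illustration; they do not describe the Media Convergence Laboratory's actual implementation.

```python
# Hedged sketch: turn raw (wristband, exhibit, timestamp) pings from RFID
# transceivers into per-visitor dwell times, usable both for a personal
# "journal" and for aggregate exhibit-layout analysis.
from collections import defaultdict

def dwell_times(pings, gap=30.0):
    """pings: list of (wristband_id, exhibit_id, seconds) tuples, sorted by
    time. Consecutive pings from the same wristband at the same exhibit,
    closer together than `gap`, are merged into one continuous visit."""
    visits = defaultdict(float)          # (wristband, exhibit) -> seconds
    last = {}                            # wristband -> (exhibit, time)
    for band, exhibit, t in pings:
        prev = last.get(band)
        if prev and prev[0] == exhibit and t - prev[1] <= gap:
            visits[(band, exhibit)] += t - prev[1]
        last[band] = (exhibit, t)
    return visits

pings = [("w042", "triceratops", 0.0), ("w042", "triceratops", 10.0),
         ("w042", "mosasaur", 120.0), ("w042", "mosasaur", 145.0)]
visits = dwell_times(pings)
# Per-visitor journal entry, keyed by the number printed on the wristband:
print({k: v for k, v in visits.items() if k[0] == "w042"})
# Aggregate view for exhibit designers: total dwell per exhibit.
totals = defaultdict(float)
for (band, exhibit), seconds in visits.items():
    totals[exhibit] += seconds
print(dict(totals))  # {'triceratops': 10.0, 'mosasaur': 25.0}
```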
Today, museums are often resistant to displaying large portions of their exhibits online for fear that Web accessibility will reduce the number of on-site visitors. This parochial thought does a disservice to two of the basic tenets of a museum—dissemination of learning experiences and stimulation of curiosity to explore further. It also fails to address the desires of individuals who are simply unable to travel physically to that museum. The treasures entrusted to any historical or art museum, for example, are outside the physical reach of a large percentage of the world’s population. This is not an issue of not wishing to visit—it is a true issue of inability. Museums of the future should embrace the connectivity of the Internet—not only for sharing their exhibits and collections with everyone, but for their own economic survival.

The recent introduction of a 16 billion pixel digital image of Leonardo da Vinci’s “The Last Supper,” available on the Internet at www.haltadefinizione.com, has the potential to significantly alter the relationship between the museum and the Internet. Groups of 25 visitors are permitted to view “The Last Supper” every 15 minutes—a limit of roughly 320,000 visitors per year. With its placement on the Internet, anyone has the ability to view the masterpiece to a degree beyond that of even an in-person experience. Granted, it would be cost and time prohibitive to digitize all historical artifacts and works of art to such a great degree of detail as “The Last Supper,” but there are multiple possibilities for potential sponsorship and other funding opportunities.

As connectivity expands, museums will want to evolve how they serve their communities, and rich discussion will occur on how they might serve the global community. We can easily imagine the installation of a series of high definition cameras within museums through which virtual visitors could peruse a museum at any time of the day from any location. There are many operational issues to address, such as having live cameras in exhibit halls and what level of viewing would be possible for exhibits and floor programs, but those issues are beginning to be discussed by multiple industries as connectivity continues to expand.

The argument that offering Web based experiences reduces attendance figures can be continually debated. One could argue that an inviting Web portal would encourage many individuals to visit the facility in person. Others could claim that many will refrain from an in-person visit when an online experience can be had from the confines of their lounge chair. The fact is that the museum experience and the online experience, if they are well designed, could not be more different. The museum experience is inherently social: groups of people visit together, and their discussion, along with interaction with other museum visitors, makes a unique imprint when they leave. Online experiences are inherently personal and customized and serve to allow deeper exploration to a level not possible on a museum exhibit floor. The two experiences can indeed seed each other for ongoing learning and excitement.

In addition to connectivity for learners between learning environments, the future may bring increased networking between museums. Imagine if many of your museum visits from this moment on could be interconnected. Each exhibit that you find interesting is noted; where museum exhibits are networked, your PDA or cell phone alerts you to exhibits within a self-determined distance based on your interests. As you pass through an unfamiliar town you are notified that a small museum contains an artifact or artwork of interest about which you may never have known. Networking is critical to museums of the future—in particular, to smaller museums, where collections are often eclectic. Travelers may pick up a brochure that provides a broad overview of a facility and have no idea that within its walls is an artifact of particular interest to them. If smaller museums created a national artifact database, your PDA or phone could cross-check the database against your location, alert you to the artifact, and guide you directly to the front door of the facility. This could be achieved with automatic alerts that make you aware of objects of interest even as you are driving down the highway, or by explicit request, with the choice being up to the user. A sketch of such a proximity check follows.
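The sketch below shows the core of that cross-check: a great-circle distance test between the traveler's position and each entry in a hypothetical artifact database. The records, the field names, the alert radius, and the interest tags are all invented for illustration.

```python
# Hedged sketch of a proximity alert against a shared artifact database.
from math import radians, sin, cos, asin, sqrt

ARTIFACTS = [
    {"name": "1903 Wright propeller blade", "lat": 43.084, "lon": -77.674,
     "tags": {"aviation", "early flight"}},
    {"name": "Tiffany window fragment", "lat": 43.157, "lon": -77.608,
     "tags": {"stained glass", "art nouveau"}},
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def alerts(lat, lon, interests, radius_km=25.0):
    """Return nearby artifacts whose tags overlap the user's interests."""
    return [a["name"] for a in ARTIFACTS
            if a["tags"] & interests
            and haversine_km(lat, lon, a["lat"], a["lon"]) <= radius_km]

print(alerts(43.10, -77.65, {"aviation"}))  # ['1903 Wright propeller blade']
```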
THE EVOLVING MUSEUM

Museums have always been places with unique personalities. The uniqueness generally comes from a theme and a centerpiece that expresses that theme. Having a centerpiece can, however, be a mixed blessing. In general, the venue that has the greatest initial impact comes with a high price tag. The cost of the exhibit and the organization’s identification with its implicit message make it difficult to replace the exhibit, even years after its novelty and the public’s perception of its relevance have dissipated. The consequence is low attendance and minimal dwell time in a venue that typically consumes a large portion of the museum’s overall real estate. It is in this context of a highly valued but underused and largely static exhibit that the technology of mixed reality can and will make a positive difference.

By mixed reality (MR), we are referring to simulation based experiences in which the user is placed in an immersive setting that is either real with virtual asset augmentation or virtual with real world augmentation (Milgram & Kishino, 1994). Additionally, in the model proposed in Stapleton and Hughes (2003), the underlying story must draw on the user’s imagination. This latter requirement is needed if the experience is to leave a lasting impression, a clear objective of museums (Stapleton & Hughes, 2006).

It is important to differentiate mixed from virtual reality. In a purely virtual experience, a user’s visual system is dominated. This removes the experience from the current physical context. That is clearly undesirable when we want to maintain the venue’s existing context. It is also counter to one of a museum’s most important attributes, that of encouraging discussion and social interaction among visitors, especially among family members. In contrast, an MR experience does not exclude the current context; rather, it enhances that context. Visual contact with other humans still exists, and dialog is encouraged as the visitors are collectively surrounded by virtual and physical objects that seem to interact with each other.

As the above is vague, an example is in order. In its simplest form, mixed reality is a visual overlay, where virtual objects and virtual signage are placed in front of real objects. The mixed real/virtual information is seen either by wearing a head-mounted display (HMD) or by viewing the scene through the mediation of a monitor. In the case of an HMD, there are two choices—video see-through and optical see-through (Rolland & Fuchs, 2000; Uchiyama, Takemoto, Satoh, Yamamoto, & Tamura, 2002). In the former, the real world is captured by cameras mounted on the outside of the HMD, and the mixed reality is delivered on liquid crystal displays mounted on the inside and aligned with the user’s eyes. In the latter, the real world is seen through transparent lenses with the virtual content projected into the user’s field of view. Using a monitor is a less expensive approach and usually involves mounting a camera on the back (nondisplay) side of the monitor. The camera captures reality, that capture is merged with virtual content, and the merged scene is then rendered onto the display. This paradigm can be extended to use large screens, such as dome screens, so the experience feels immersive. Audio, with real world sounds being heard naturally and synthetic ones being introduced through carefully placed speakers or via headsets, can augment the experience. Speakers are more hygienic and currently more capable of delivering precisely placed three-dimensional sounds (Hughes, 2005).

We noted that the easy case is when virtual content overlays physical objects, but MR can also deliver experiences where the real and virtual are intertwined, with real objects partially occluding virtual ones and vice versa. This is the kind of MR that we believe can be used to reinvigorate wonderful, but stale, content pieces in museums.
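The video see-through pipeline just described can be summarized in a few lines of schematic Python. This is our own sketch, not the authors' system: capture_frame, render_virtual, and the depth-based occlusion test stand in for whatever camera, renderer, and registered three-dimensional model a real installation would use.

```python
import numpy as np

def composite(real_rgb, real_depth, virt_rgb, virt_depth):
    """One frame of video see-through mixed reality: keep the camera pixel
    wherever the real surface is nearer than the virtual one, so real
    objects correctly occlude virtual ones, and vice versa."""
    virtual_wins = virt_depth < real_depth          # per-pixel depth test
    return np.where(virtual_wins[..., None], virt_rgb, real_rgb)

def frame_loop(camera, renderer, registered_model, display):
    """Schematic per-frame loop; each argument is a hypothetical stand-in."""
    while True:
        real_rgb, pose = camera.capture_frame()     # image plus tracked pose
        # Depth of the real scene comes from the pre-scanned model of the
        # room, registered to its fixed objects and rendered from the same
        # pose as the camera.
        real_depth = renderer.depth_of(registered_model, pose)
        virt_rgb, virt_depth = renderer.render_virtual(pose)
        display.show(composite(real_rgb, real_depth, virt_rgb, virt_depth))
```

The design point is the registered depth model: without it, virtual creatures could only float in front of everything, and the interleaved occlusion that makes the scene convincing would be impossible.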
The goal of the MR community is to achieve the holodeck of Star Trek fame. Such a future is quite probable (see Bimber, 2006, for work already in use at museums), given advances in the field of optics. However, its availability in museums not directly associated with mixed reality researchers, and at a price affordable to smaller- and medium-sized venues, is still in the future, and the problem of unchanging exhibits is already being faced by museums that must compete in today’s media centric world.

In 2004, to test the concept of MR bringing an exhibit back to life, our lab developed the Sea Creatures experience at the Orlando Science Center’s DinoDigs exhibition hall (Hughes, Stapleton, Hughes, & Smith, 2005). This venue contains fossils of marine reptiles and fish in a clean, inviting environment. In the midst of this, we brought in a large dome screen, outfitted with a camera on the nonviewing side, which faced the main exhibits. Speakers were added above and around the viewing area. Prior to this, we had digitally scanned the area, acquiring a three-dimensional model that was perfectly registered with (overlaid on) the room’s fixed objects (artifacts, exhibit cases, fossils, and support columns). See Figures 43.1(a) and (b).

Figure 43.1. (a) The DinoDigs venue with the nonviewing side of the dome screen, which contains a mounted camera to capture the real world. (b) The viewing side of the dome screen with mixed real and virtual content.

When visitors walked around to the display side of the dome screen, they were greeted by a virtual guide who informed them of the kinds of sea life that they would have encountered in the Cretaceous Period. This virtual guide appears to be standing on the museum floor in front of all the real activities currently taking place in the hall. Typical young visitors look around to the other side, seeing real people who also show up on the screen, but no sign of our helpful guide. To the surprise of their family and/or friends, the curious youth is now part of the mixed reality. As the virtual guide leaves, the visitors hear rushing water, the entire venue appears to flood, and then the Cretaceous Period reptiles appear. They swim in front of and behind the columns and display cases in the museum, and yet a peek around the side of the dome screen reveals nothing but normal activity. The visitors are encouraged to explore this enhanced world and, through story, seduced to return to the physical space and look more closely once the experience ends.

MR has the ability to enhance and attract; its inclusion adds new information and activities to the exhibit. The attraction is that each subsequent visit to the venue has the potential to expose new knowledge. Thus, a museum’s long-term centerpiece experience can evolve and remain relevant, encouraging the repeat visits and memberships that are so central to a healthy museum.
Sea Creatures is just one example of how MR can reinvigorate museum experiences, allowing them to evolve with new scientific knowledge and new interaction paradigms. Other examples include Geller (2006); Liu, Fernando, Cheok, Wijesena, and Tan (2007); and the extensive work done by the Augmented Reality Group at Bauhaus-University Weimar (Bimber & Raskar, 2005).

REFERENCES

Bimber, O. (2006). Augmenting holograms. IEEE Computer Graphics and Applications, 26(5), 12–17.
Bimber, O., & Raskar, R. (2005). Spatial augmented reality: Merging real and virtual worlds. Wellesley, MA: A K Peters.
Falk, J., & Dierking, L. (2000). Learning from museums: Visitor experiences and the making of meaning. Lanham, MD: Altamira Press.
Geller, T. (2006). Interactive tabletop exhibits in museums and galleries. IEEE Computer Graphics and Applications, 26(5), 6–11.
Hughes, C. E., Stapleton, C. B., Hughes, D. E., & Smith, E. (2005). Mixed reality in education, entertainment and training: An interdisciplinary approach. IEEE Computer Graphics and Applications, 26(6), 24–30.
Hughes, D. E. (2005, July). Defining an audio pipeline for mixed reality. In Proceedings of Human Computer Interfaces International 2005-HCII’05 [CD-ROM]. Mahwah, NJ: Lawrence Erlbaum.
Kondo, T., Inaba, R., Arita-Kikutani, H., Shibasaki, J., Mizuki, A., & Minabi, M. (2007). Mixed reality technology at a natural history museum. In J. Trant & D. Bearman (Eds.), Museums and the Web 2007. Toronto, Canada: Archives & Museum Informatics. Available from http://www.archimuse.com/mw2007/papers/kondo/kondo.html
Liu, W., Fernando, O., Cheok, A., Wijesena, J., & Tan, R. (2007). Science museum mixed reality digital media exhibitions for children. In 2nd Workshop in Digital Media and Its Application in Museum & Heritage (pp. 389–394). Washington, DC: IEEE Computer Society.
Milgram, P., & Kishino, A. F. (1994). Taxonomy of mixed reality visual displays. IEICE Transactions on Information and Systems, E77-D(12), 1321–1329.
Prensky, M. (2001). Digital natives, digital immigrants. On the Horizon, 9(5), 1–2.
Rolland, J. P., & Fuchs, H. (2000). Optical versus video see-through head-mounted displays in medical visualization. Presence: Teleoperators and Virtual Environments, 9(3), 287–309.
Stapleton, C. B., & Hughes, C. E. (2003). Interactive imagination: Tapping the emotions through interactive story for compelling simulations. IEEE Computer Graphics and Applications, 24(5), 11–15.
Stapleton, C. B., & Hughes, C. E. (2006). Believing is seeing. IEEE Computer Graphics and Applications, 27(1), 80–85.
Uchiyama, S., Takemoto, K., Satoh, K., Yamamoto, H., & Tamura, H. (2002). MR platform: A basic body on which mixed reality applications are built. In International Symposium on Mixed and Augmented Reality 2002-ISMAR2002 (pp. 246–256). Washington, DC: IEEE Computer Society.
ACRONYMS

AA America’s Army
AAR after action review
AAV amphibious assault vehicle
ACT-R adaptive control of thought-Rational
ADL Advanced Distributed Learning
ADW air defense warfare
AFIST Abrams full crew interactive simulator trainer
A4I advanced active analysis adjunct for interactive multisensor analysis training
AI artificial intelligence
AMIRE authoring mixed reality
AO area of operation
APE auxiliary physics engine
API application programming interface
AQT advanced qualification training
AR augmented reality
ARCI acoustic rapid commercial off-the-shelf insertion
ARI Army Research Institute
ARNG Army National Guard
ASL advanced scripting language
ASRA Advanced Systems Research Aircraft
ASTD American Society for Training and Development
ASW antisubmarine warfare
ATELIER architecture and technologies for inspirational learning environments
ATM automated teller machine
AVs autonomous vehicles
AW aviation warfare
AWAVS aviation wide-angle visual system
BARS Battlefield Augmented Reality System
BCG Brogden-Cronbach-Gleser
BiLAT Bilateral Negotiation
BIP basic instructional program
BL blended learning
CAD computer-aided design
CAN Combined Arms Network
CAPT Combined Arms Planning Tool
CARP Computerized Airborne Research Platform
CAS close air support
CATS cognitive avionics tool set
CAVE cave automatic virtual environment
CCTT close combat tactical trainer
CD compact disc
CDMTS common distributed mission training system
CFF call for fire
CFn composable FORCEnet
C4I command, control, communications, computers, and intelligence
C4ISR command, control, communications, computers, intelligence, surveillance, and reconnaissance
CGF computer-generated forces
CGI computer-generated imagery
ChrAVE Chromakey Augmented Virtual Environment
CI confidence interval
CLS combat lifesaver
CO commanding officer
COC combat operations center
COD Cognitive Delfin
COMPTUEXs Composite Training Unit Exercises
CORDRA Content Object Registry/Repository Discovery and Resolution Architecture
COTS commercial off-the-shelf
COVE Conning Officer Virtual Environment
CRM customer relationship management
CRS4 Center for Advanced Studies, Research, and Development in Sardinia
CRT cathode ray tube
CTA cognitive task analysis
CTER cumulative transfer effectiveness ratio
CTPS combat trauma patient simulation
C2 command and control
C2V command and control vehicle
CVE collaborative virtual environment
DAGGERS distributed advanced graphics generator and embedded rehearsal system
DARPA Defense Advanced Research Projects Agency
DDKE data-driven knowledge engineering
Desdemona desoriëntatie demonstrator Amst
DESRON destroyer squadron
DI dismounted infantrymen
DIS distributed interactive simulation
DI-SAF Dismounted Infantry Semi-Automated Force
DLP digital light processing (trademark owned by Texas Instruments)
DMX digital multiplex
DoD Department of Defense
DoDD Department of Defense directive
DoDI Department of Defense instruction
DOF degrees of freedom
DMO distributed mission operations
DMT distributed mission training
DSP digital signal processing
DVTE Deployable Virtual Training Environment
DVTE-CAN Deployable Virtual Training Environment–Combined Arms Network
EAX environmental audio extensions
ECATT/MR embedded combined arms team training and mission rehearsal
ECG electrocardiogram
ECS emergency care simulator
EEG electroencephalogram
EFL English as a foreign language
ESN European Simulation Network
ET embedded training
ETDS embedded training for dismounted soldiers
EU European Union
EWS Expeditionary Warfare School (USMC)
FAC forward air controller
FATS firearms training system
FBW fly-by-wire
FCS future combat system
FFW future force warrior
FiST Fire Support Team
FMB full mission bridge
fMRI functional magnetic resonance imaging
FMT full mission trainer
FO forward observer
FOM federation object model
FOV field of view
FPS first-person shooter
FRS Fleet Replacement Squadron
FRSS forward resuscitative surgical system
FY fiscal year
GAO General Accounting Office
GFI goodness-of-fit index
GIST Gwangju Institute of Science and Technology
GM General Motors Corporation
GOTS government off-the-shelf
GPS global positioning system
GUI graphical user interface
HARDMAN hardware versus manpower
HCI human-computer interface
HD high definition
HDTV high definition television
HEM human experience modeler
HFM Human Factors and Medicine
HLA high level architecture
HMD head/helmet-mounted display
HMX-1 Marine Helicopter Squadron One, the Presidential Helicopter Squadron
HPSM human performance systems model
HSI human-systems integration/interaction
HVI high value individual
IAAPA International Association of Amusement Parks and Attractions
ICALT International Conference on Advanced Learning Technologies
ICS intercockpit communications system
ICV infantry carrier vehicle
IED improvised explosive device
IEEE Institute of Electrical and Electronics Engineers
IG image generator
I/ITSEC Interservice/Industry Training, Simulation, and Education Conference
ILUMA illumination under realistic weather conditions
IMAT Interactive Multisensor Analysis Training
INVEST intervehicle embedded simulation and training
IOC Infantry Officer’s Course (USMC)
IOS instructor operator station
IPTs integrated product teams
IR infrared
IRS internal referencing strategy
ISMAR International Symposium on Mixed and Augmented Reality
ISMT Indoor Simulated Marksmanship Trainer
ISNS integrated shipboard network systems
IT information technology
ITER incremental transfer effectiveness ratio
ITK Infantry Tool Kit
ITS intelligent tutoring systems
J&J Johnson & Johnson
J&JPRD Johnson & Johnson Pharmaceutical Research & Development
JFCOM Joint Forces Command
JO junior officer
JSAF Joint Semi-Automated Forces
JSF joint strike fighter
JTAC joint terminal air controller
JTEN Joint Training and Experimentation Network
KR knowledge of results
KSAs knowledge, skills, and abilities/attitudes
LAN local area network
LCAC landing craft, air cushion
LCD liquid crystal display
LCS littoral combat ship
LED light emitting diode
LRU line replaceable unit
LSP learning support package
LTC lieutenant colonel
LVC live, virtual, and constructive
LW land warrior
MAGTF Marine Air-Ground Task Force
MANPRINT manpower and personnel integration
MCL Media Convergence Laboratory
MGS mobile gun system
MIDI musical-instrument digital interface
MILES multiple integrated laser engagement system
MITAC map interpretation and terrain association course
MITAVES map interpretation and terrain association virtual environment system
MMOG massively multiplayer online game
MMORPGs massively multiplayer online role-playing games
MMP massively multiplayer
ModSAF modular semi-automated forces
MOS military occupational specialty
MOT2IVE Multi-Platform Operational Team Training Immersive Virtual Environment
MOUT military operations on urban terrain
MPT manpower, personnel, and training
MR mission rehearsal
MR mixed reality
MRC Marmara Research Center
MRMC Medical Research and Materiel Command
MSTC medical simulation training centers
MTC mission training center
MX mixed reality toolkit
NASA National Aeronautics and Space Administration
NATO North Atlantic Treaty Organization
NAVAIR Naval Air Systems Command
NCTE Navy Continuous Training Environment
NFL National Football League
NMCI Navy Marine Corps Internet
NOPF Naval Ocean Processing Facility
NPS Naval Postgraduate School
NRC-FRL National Research Council Flight Research Laboratory
NVG night vision goggle
OCONUS outside contiguous United States
OFET Objective Force Embedded Training
OIF Operation Iraqi Freedom
ONEnet OCONUS Navy Enterprise Network
OneSAF One Semi-Automated Forces
ONR Office of Naval Research
ONS operational needs statement
OOD officers of the deck
OPL Operator Performance Laboratory
ParaSim parachute simulator
PC personal computer
PCC pre-command course
PDA personal digital assistant
PFC prefrontal cortex
PLA personal learning associate
PTSD post-traumatic stress disorder
QTEA quality of training effectiveness assessment
radar radio detection and ranging
RAM random access memory
RAP Ready Aircrew Program
R&D research & development
RDECOM Research, Development, and Engineering Command
RDECOM STTC Research, Development and Engineering Command, Simulation and Training Technology Center
RFID radio frequency identification
RGB red, green, and blue
ROM rough order of magnitude
RTI run-time infrastructures
SAF semi-automated forces
SAGAT situation awareness global assessment technique
S&T science and technology
SAPS stand-alone patient simulator
SBIR Small Business Innovative Research
SCORM Sharable Content Object Reference Model
SCP School for Command Preparation
SE standard error
SE systems engineering
SET sonar employment trainer
SET-MR scalable ET and mission rehearsal
SGI Silicon Graphics, Inc.
SIGGRAPH Special Interest Group on Graphics and Intermixed Techniques
SIMILAR state, investigate, model, integrate, launch, assess, and reevaluate
SIMNET simulation network
6 DOF 6 degrees of freedom
SLEP service life extension program
SLOC source lines of code
SMART simulated mission and rehearsal training
SME subject matter expert
SMMTT submarine multimission team trainer
SORTS status of resources and training system
SO3 study of organizational opinion
SPAWAR space and naval warfare
SSP sound speed profile
STA sensory task analysis
STDA sonar tactical decision aid
STE synthetic task environment
STI Systems Technology, Inc.
STL stereolithography
STO science and technology objective
STOW synthetic theater of war
STTC Simulation and Training Technology Center
SVS soldier visualization station
SWAT special weapons and tactics
SWOS Surface Warfare Officers School
TacOpsMC Tactical Operations Marine Corps
TBI traumatic brain injury
TC3 tactical combat casualty care
TD training device
TDS tactical decision simulation
TECOM Training and Education Command (USMC)
TEE training effectiveness evaluation
TER training effectiveness ratio
3-D three-dimensional
3G third generation
TM&SMP Training Modeling and Simulation Master Plan
TNO Netherlands Organisation for Applied Scientific Research
TNA training needs analysis
TOPS tera operations per second
ToT transfer of training
TRADOC Training and Doctrine Command
TRPPM training planning process methodology
TTE tactical training equipment
UA utility analysis
UAS unmanned aerial system
UAV unmanned aerial vehicle
UCD user-centered training system design
UCF University of Central Florida
UCF-IST University of Central Florida Institute for Simulation and Training
UNREP underway replenishment
USACOM U.S. Atlantic Command (now JFCOM)
USAF United States Air Force
USE user scrutiny event
USMC U.S. Marine Corps
UV ultraviolet
UV unmanned vehicle
V&V verification and validation
VBS-1 Virtual Battlefield System 1
VCSA Vice Chief of Staff of the Army
VE virtual environment
VEAAAV Virtual Environment Advanced Amphibious Assault Vehicle
VEHELO Virtual Environment Helicopter
VEL Virtual Environment Laboratory
VELCAC Virtual Environment Landing Craft, Air Cushion
VESUB Virtual Environment Submarine
VIRTE Virtual Technologies and Environments
VLNET virtual life network
VMS voyage management system
VMT virtual maneuver trainers
VR virtual reality
VRLab Virtual Reality Lab
VRVis Virtual Reality and Visualization
XML Extensible Markup Language
INDEX
AA (America’s Army) game, 5, 95–96, 414 AAR (after action review) process, 128, 135 AAV (amphibious assault vehicle) crew trainer, 120–21, 123 Abrams tank, 33, 85, 86, 88–89, 331. See also Close combat tactical trainer (CCTT) Accenture Technology Labs, 413, 415 Ackerman, N. D., 393 Ackerman, R. L., 153 Acoustic laboratories, 63–67, 64f Acoustic rapid commercial off-the-shelf insertion (ARCI) combat system, 67, 69 Acquisition management process, 18–25 Active learning, 148 ACT-R (adaptive control of thought– rational) theory of learning, 198 Adaptability, 154, 198–200 Adaptive control of thought–rational (ACT-R) theory of learning, 198 Advanced Acoustic Analysis Adjunct for IMAT (A4I), 70 Advanced Distributed Learning (ADL) initiative, 440–41 Advanced Interactive Systems, Inc., 42, 235 Advanced technology development, described, 10 ADW (air defense warfare) scenarios, 198–99
Affective behavior, 150, 164 Afghanistan, 93, 132, 351 n.4 AFIST (Abrams full crew interactive simulator trainer), 86 A4I (Advanced Acoustic Analysis Adjunct for IMAT), 70 After action review (AAR) process, 128, 135 AI (artificial intelligence), 367–68, 378–79 Air Combat Command, 81 Air defense warfare (ADW) scenarios, 198–99 Air Force. See U.S. Air Force Air Force Research Laboratory, 78. See also Distributed mission operations (DMOs) Airport screeners, 396 Alertness, 388 Alfred P. Murrah Federal Building, 400–401 Alliger, G. M., 150 AMD (ATI), 409 American Society for Training and Development (ASTD), 243 America’s Army (AA) game, 5, 95–96, 414 Amphibious assault vehicle (AAV) crew trainer, 120–21, 123 Analog computational programs, xiv Analog-intensive vehicles, 87 Analysis of variance, 248 Ancient China, xiii–xiv, 436
Ancient Romans, 436 Anderson, John R., 198 Andrews, D. H., 394 Animatic, defined, 275 n.1 ANT-18 Basic Instrument Trainers, 1 Antisubmarine warfare (ASW), 62–63, 73. See also Interactive Multisensor Analysis Training (IMAT) program Anxiety therapies, 428–32 APEs (auxiliary physics engines), 261–62 API (application programming interface), 259 Apple’s Mac OS 10.4, 366 n.3 Application programming interface (API), 259 Applied research, described, 9–10 Aquila, 18 AR. See Augmented reality Arachnophobia, 428 Arcade games, 280 Architecture and technologies for inspirational learning environments (ATELIER), 143 ARCI (acoustic rapid commercial off-the-shelf insertion) combat system, 67, 69 Arizona Highway Patrol, 224 Armstrong, R. E., 401 Army. See U.S. Army ARNG (U.S. Army National Guard), 332–34 Arthur, K., 423 Artificial intelligence (AI), 367–68, 378–79 Artillery Training School, 120–21 ARToolkit, 257 Ashley project, 140 Aspergers’ spectrum disorder, 369 ASRA (Bell 412 Advanced Systems Research Aircraft), 113, 114 Assessing technology design solutions, 327 Assessment of training: analysis process, 330; of collective training, 324–35; of collective training, challenges to, 324– 26; of collective training, controlling dependent versus independent
variables, 325; of collective training, leveraging customer relationship management principles, 326–27; of collective training, persistent training assessment, 334–35; of collective training, methodology for planning and executing assessments, 327–34; of competency prior to, 394–95; databases, 330; defined, 324; deploy, track, and receive questionnaires, 329; empirically based evaluations or testing, 324–25; filtering questions, 329; impact of assessments on the training events supported, 325; overview, xv; point of sales data collection, 335; reporting results, 330. See also Measurement of performance; Training effectiveness evaluation (TEE) ASTD (American Society for Training and Development), 243 ASW (antisubmarine warfare), 62–63, 73. See also Interactive Multisensor Analysis Training (IMAT) program Asymmetric Warfare—Virtual Training Technologies, 131–37; avatars, 132; communications, 132–33; development of system prototypes, 134–35; performance standards, 133; system high level design, 133–34; technology requirements, 133; terrain development, 132; test and evaluation, 135–37 ATELIER (architecture and technologies for inspirational learning environments), 143 Structural equation modeling model, 274 ATI (AMD), 409 Atlas Intercontinental Ballistic Missile Program, 10 Attack Center Trainer, xiv Attia, A., 180 Attitudinal outcomes, 164–65 Audio, 259–60, 359, 360 Augmented reality (AR), 278–87; advanced concepts impacting AR, 284;
Index anomalies in the AR visualization, 284; application domains, 280–82; components, 282–83; deployment benefits, 286–87; described, xv, 278– 79, 428; human factors/mobility, 283; limitations, 280; technical requirements, 279; utility for training, 285–86; virtual reality (VR) and, 281, 286; virtual reality (VR) versus, 278, 282, 286 Augmented virtuality, described, 428 AuSIM 3D Goldminer system, 258–59 Austria, 141, 143 Automated vehicle location, 352 n.6 Automation: automation bias, 28; automation surprise, 27; failures in, 28; human-automation interface problems, 29; implicit knowledge lacking in, 29– 30; mode proliferation, 29; promise and challenges of, 28–29; that model humans, 30–31; trust and the complacency/distrust continuum, 28–29 Autonomous vehicles (AVs), 25–26 Autostereoscopic displays, 425 Auxiliary physics engines (APEs), 261–62 Avatars: America’s Army (AA) game, 95–96; conversational, 369–70; for corporate training, 339, 413; cultural and nonverbal communications, 133; described, 131, 132, 413; exercise by, 369 n.7; future of VE training and, 387, 435, 438; human-systems integration (HSI) for naval training systems, 30– 31; “In the Uncanny Valley,” 337, 339, 340, 341–48; personal VEs and, 356, 360–62, 365, 367, 368–70; simulation training in the MAGTF, 378. See also Games and gaming technology for training Avian flu epidemics, 401 Aviation warfare (AW) apprentice school, 63 Aviation wide-angle visual system (AWAVS) program, 176 AVs (autonomous vehicles), 25–26 Azuma, R. T., 278–79
463
Bach-y-Rita, P., 360 Bahil, A. T., 11 Bailey, Mike, 35–36 Baker, E. L., 208 Baldwin, T. T., 199, 201–2 Ball, J. T., 396 Balladares, L., 140 Bandwidth improvements, 364–66 Banker, W. P., 148 Barfufl, H., 280 Barnett, Scott, 4 BARS (Battlefield Augmented Reality System), 281–82 Bartz, D., 282 Battle Command 2010, 5 Battlefield Augmented Reality System (BARS), 281–82 Battle of 73 Easting, 351 BattlePlex, 354 Battle School (graduate level), 354 Battle School (undergraduate), 353–54 BCG (Brogden-Cronbach-Gleser) model, 179–80, 179f Beatty, W. F., 74 n.1 Behavioral measures, 150 Behringer, R., 283 Belcher, D., 281 Bell Carroll, M., 251 Bell 412 Advanced Systems Research Aircraft (ASRA), 113, 114 Bennet, E., 281 Bennett, W., Jr., 394–95 Bessmer, D., 303 Bewley, W. L., 208 Bilateral Negotiation (BiLAT), 125–26, 127–29 Billinghurst, M., 281 Bilski, T. R., 400 Bimber, O., 282 Biosensing devices, 432 Bioterrorism attacks, 401 Blade, R. A., 200 Blended fidelity, 161–62 Blended learning (BL), 7 Blindness, 388 Bloom, B. S., 150, 164, 438 Blue Box Trainers, 1
464
Index
Blue force tracking, 352 n.6 Blue screen technique, 217–19. See also Chromakey process Bluetooth cellular technology, 280 Bolas, M., 422 Boldovici, J., 303 Bolton, A., 303 Books, history of, 436 Booth, K. S., 423 Boredom, 28, 153 “Bottom line.” See Transfer effectiveness evaluations Bowers, C. A., 151 Bradley fighting vehicles, 85, 86, 88–89, 331. See also Close combat tactical trainer (CCTT) Brain plasticity, 429–30 Brandon Hall Research, 416 Brogden-Cronbach-Gleser (BCG) model, 179–80, 179f Bronstone, A., 431 Brown, D. G., 282 Buckwalter, J. G., 201 Budgets. See Costs Buffers, 259 Burke, C. S., 151, 153–54 Burn-out effect, 108 Burright, B., 394 Business training. See Corporate training in VEs Butz, A., 283 C2 (distributed simulation as a command and control) system, 350 CAD (computer-aided design) systems, 408, 409, 410 Caffeine, 388 Call for fire (CFF), 310, 315–16 Calvert, K., 424 Camp Pendleton, California, 378–79 Canada, 139, 142 Canadian Personal Weapons Test, 207 CAN (Combined Arms Network), 36–38 Card, Orson Scott (Ender’s Game), 337, 338, 339, 340, 353 Carey, J., 331 Carey, L., 331
Carnegie Mellon University, 198, 220 CARP (Computerized Airborne Research Platform), 113, 114 Carreto, C., 140 Carson, J. L., 209 CAS (close air support), 310 Cathode ray tube (CRT) monitors, 406, 423 CATS (cognitive avionics tool set), 107–8 Cavazza, M., 280 CAVE (cave automatic virtual environment), 363, 420 CCTT (close combat tactical trainer), 33, 331–34 CD (compact disc) standards, 359 CDMTS (common distributed mission training station), 108–10, 109f, 111– 12, 114 Cell phones, 444 Center for Advanced Studies, Research, and Development in Sardinia (CRS4), 141 CFF (call for fire), 310, 315–16 CFn (FORCEnet), 70–71 CGI (computer-generated imagery), 218–19 Chadwick Carreto Computer Research Center, 140 Chambers, W. S., 176 Champney, R. K., 251 Channels, 259–60, 262 Charles, F., 280 Chatelier, P. R., 82 Chemical capabilities, 362, 388 Chen, A., 431 Chen, G., 201 China (ancient), xiii–xiv, 436 Chromakey process, 217, 219–22, 221f, 222f; Chromakey Augmented Virtual Environment (ChrAVE), 220, 297– 305, 299f, 300f, 301f Chung, G. K., 208 Clark, E. V., 157–58 Clark, R. E., 438–39 Classrooms, virtual, 369 Clemson Shoot House. See Measurement
Index of performance: instrumentation for the recording of live building clearing exercises Close air support (CAS), 310 Close Combat Marines, 35, 38 Close combat tactical trainer (CCTT), 33, 331–34 Closed-loop communication, 153–54 Coalescent Technologies Corporation, 36 COD (Cognitive Delfin), 113 Cognitive avionics tool set (CATS), 107–8 Cognitive capabilities as transfer mechanism, 198 Cognitive Delfin (COD), 113 Cognitive impairments. See Rehabilitation Cognitive outcomes, 164–65 Cognitive task analysis (CTA), 128 Cohn, J., 243, 251 Cold War, 10 Colegrove, C. M., 394–95 Collaborative virtual environments (CVE), 140 Collada, 410 Collyer, S. C., 176 Colquitt, J. A., 198 Combat operations center (COC), Camp Pendleton, 378–79 Combat trauma patient simulation (CTPS), 100–101, 102–4. See also Medical simulations Combined Arms Network (CAN), 36–38. See also Deployable Virtual Training Environment (DVTE) Commercial off-the-shelf (COTS) games, 34–35, 36, 421, 424. See also Games and gaming technology for training Commercial VE expressions, 387–88 Common distributed mission training station (CDMTS), 108–10, 109f, 111– 12, 114 Complacency/distrust continuum, 28–29 Composite Training Unit Exercises (COMPTUEXs), 69 COMPTUEXs (Composite Training Unit Exercises), 69
465
Computational replicates, 396 Computer-aided design (CAD) systems, 408, 409, 410 Computer-generated imagery (CGI), 218–19 Computerized Airborne Research Platform (CARP), 113, 114 Computer packaging, 284 Computer technology, growth in, 361–66; costs, 357, 361, 364–65, 406–7; current competitive environment, 408–9; size of early computers, 406–7; software systems, 366–67. See also specific components and systems by name; Trends in VE Concurrent Versions System, 58 Conferencing systems, 281 “Congressional plus-up,” 35, 39 Conning Officer Virtual Environment (COVE), 56–58, 56f Connor, B. B., 431 Content Object Registry/Repository Discovery and Resolution Architecture (CORDRA), 441 Control Central. See “In the Uncanny Valley” Control streams, 262 Convergence. See Trends in VE Conversational avatars, 369–70 Corbett, A., 438 CORDRA (Content Object Registry/Repository Discovery and Resolution Architecture), 441 Corn, J., 355 Corporate training in VEs, 413–19; costs, 414; employees’ experience with video games, 414, 418; English as a foreign language (EFL), 417–18; focus on human-to-human interactions within the environment, 413; Johnson & Johnson (J&J), 416–17; resistance to, 414, 416–17, 418; technology trends, 413–14 Cosby, Neale, 351 n.4 Costs: assessment of investment in VEs, 326–27; of assessments, 325; of computer technology, 240–41, 357,
466
Index
361, 364–65, 406–7; of corporate training in VEs, 414; cost justification, 178–79; of entry for industrial applications, 406–7; of fuel, 78, 82, 173, 393–94; helicopter training, 290, 291; of immersive mixed reality, 384; LCD computer displays, 363; license fees, 119; of medical simulations, 403; of museum experiences, 447; of personal VEs, 356–57; time-cost models, 176–78; trade-off models, 174–76; utility analysis, 179–80 COVE (Conning Officer Virtual Environment), 56–58, 56f Cox, B. D., 202 Craig, Douglas. See “In the Uncanny Valley” Crawl, walk, run method, 309 Criterion contamination, 151 Criterion deficiency, 150–51 CRM (customer relationship management) challenge, 326–27 Cronbach, L. J., 179–80 CRS4 (Center for Advanced Studies, Research, and Development in Sardinia), 141 CRT (cathode ray tube) monitors, 406, 423 Cruz-Neira, C., 420 CTA (cognitive task analysis), described, 128 CTER (cumulative transfer effectiveness ratio), 168–69, 168f CTPS (combat trauma patient simulation), 100–101, 102–4. See also Medical simulations Cue fidelity, 162, 163–64 Cultural practices and awareness: as component of Battle School, 354; cultural bias versus game-like software, 416–17, 418; features within training programs, 38, 48, 52, 54, 82, 125–27, 132–33, 135, 390; future of VE training in the Army, 390; of game players, 5; global training market, 414; MAGTF training, 378; organizational, 201, 414; TM&SMP (Training
Modeling and Simulation Master Plan), 381; transfer of via museum experiences, 444 Cumming, R. W., 212–13 Cumulative transfer effectiveness ratio (CTER), 168–69, 168f Customer relationship management (CRM) challenge, 326–27 CVE (collaborative virtual environments), 140 DAGGERS (distributed advanced graphics generator and embedded rehearsal system) project, 92–98, 97f; architecture, 94–95; concept of operation, 94; early STO objectives, 93; future applications, 96–97; future of DAGGERS’ successors, 97–98; game based environments and applications, 95–96 Daily living training, 262, 269–70, 270f, 271, 428–29 Daniel, D. C., 401 Darken, R. P., 148 DARPA. See Defense Advanced Research Projects Agency Data collection, 322, 335 Data-driven knowledge engineering (DDKE), 57 DataGlove, 420 Da Vinci, Leonardo, 446 Davis, Bette, 7 Day, E. A., 198 DDKE (data-driven knowledge engineering), 57 Dead reckoning, 304 Dean, F. F., 11 Decision making via virtual worlds, 369 Declarative knowledge, 202 Deering, Michael, 420 Defeat and failure within training context, 108 Defense Advanced Research Projects Agency (DARPA): Battle of 73 Easting, 351; convergence following successes sponsored by, 349; modular semi-automated forces (ModSAF), 34;
SIMNET (simulation network), 33, 331, 332–34, 351, 386; synthetic theater of war (STOW), 59; Tactical Iraqi, 38 Defense Authorization Act for Fiscal Year (FY) 2001, 35 Defense Modeling and Simulation Office, 59 De Florez, Luis, xiv Delacruz, G. C., 208 Deliberate practice, 304 Dell, Inc., 409 DELTA3D, 119 Denmark, 142 Deployable Virtual Training Environment—Combined Arms Network (DVTE-CAN), 308–22; corresponding instruments, 311–14; described, 309; DVTE Application Variants, 312–13t; experimental design and procedure, 310–11; lessons learned, 318–22, 319–20t; methods, 309–10; results, 314–17 Deployable Virtual Training Environment (DVTE), 33–39, 119–20, 121–22, 123; commercial off-the-shelf (COTS) games in the U.S. Marine Corps, 34–35; demonstrations and transitions, 39; described, 33; development of system prototypes, 36, 38–39; DoD distributed simulation, history of, 33–34; operational or user considerations, 37–38; requirements specification for, 35–36; shipboard simulators for Marine Corps operations, 34–35 Desdemona (DESoriëntatie DEMONstrator Amst), 140 Design based transfer mechanisms, 197, 200–201 Design process, overview, 18–25 DESRON (destroyer squadron) commodore, 69 De Vries, L. F., 208 Dick, W., 331 Dierking, L., 445 Di Giacomo, T., 139 Digital computers, history of development, xiv
Digital light processing (DLP) displays, 423–25 Digital multiplex (DMX) effects, 254–56, 260–61 “Digital native” generation, 444 Digital signal processing (DSP) effects, 259 DinoDigs exhibition hall, 265–66, 266f Disabilities. See Rehabilitation Dismounted combatant simulation training systems, 232–41; assessment results, 235–38; Combined Ratings of Simulator Capability (2002–2005), 236–37t; Combined Training Effectiveness Questionnaire Results, 238t; described, 232, 263; direct interaction with the simulated physical environment, 233; direct person-to-person interaction unmediated by equipment, 233; Dismounted Infantry Semi-Automated Forces Operator Station, 235; Dismounted Infantry Virtual After Action Review System, 235; early history, 233–34; emphasis on physical activity, 233; environment, 232–33; recent history, 234–35; recommendations, 240–41; soldier task performance, 239–40; training effectiveness, 239; Training Effectiveness Questionnaire Results, 238t Disneyland, 6 Distributed interactive simulation (DIS) architecture, xiv Distributed mission operations (DMOs), 77–83; benefits of, 82–83; evaluation of, 80–81; future of, 82–83; impact of DMO virtual environments, 81–82; methods and issues, 79–80; technologies, 78–79; U.S. Air Force training and operational challenges, 77–78. See also Marine Air-Ground Task Force (MAGTF) Distributed mission training (DMT), 78 Distributed practice, described, 148 Distributed simulation as a command and control (C2) system, 350
Dizziness. See Simulator sickness DLP (digital light processing) displays, 423–25 DMOs. See Distributed mission operations DMX (digital multiplex) effects, 254–56, 260–61 Doom (computer game), 4, 34 Drab, S., 257 The Dream of the Red Chamber (Ts’ao Hsueh-ch’in), 341 n.2 Driving simulations, 224–30, 227f, 228f, 230f, 405 Drugge, M., 284 DSP (digital signal processing) effects, 259 Dual reality portals, 230 Dugdale, B., 153 DVTE. See Deployable Virtual Training Environment DVTE-CAN. See Deployable Virtual Training Environment—Combined Arms Network EAX (environmental audio extension) 2.0, 259 ECATT/MR (embedded combined arms team training and mission rehearsal), 88–89 ECS (Emergency Care Simulator), 101–2 Education. See MR Sea Creatures EEG (electroencephalogram), 111, 112–13, 114, 427–28, 429 E! Entertainment, 6 Effectiveness of training systems. See Training effectiveness evaluation (TEE) Effectiveness of weapons systems, 21, 43–48, 46t Effects of exposure to VE systems. See Simulator sickness EFL (English as a foreign language), 417–18 Ehnes, J., 282 Elderly, 369–70 Electrocardiogram (ECG), 111, 112–13, 114
Elms, Inspector General LTC, 2 Embedded combined arms team training and mission rehearsal (ECATT/MR), 88–89 Embedded training (ET): appended, 86–88; army combat systems, 85–90; defined, 23, 85; Embedded training for dismounted soldiers (ETDS) science and technology objective (STO), 92–93, 95–96; FCS requirements, 85–86; fully embedded, described, 86; overview, 3–4; umbilical, 86–87 Emergency Care Simulator (ECS), 101–2 Emotional fidelity, 201 Empirically based evaluations or testing, 324–25 Ender’s Game (Card), 337, 338, 339, 340, 353 Endoscopy, 141 Endsley, M. R., 154 Engagement Skills Trainer 2000, 42 English, N., 209 English as a foreign language (EFL), 417–18 Entertainment industry: augmented reality (AR), 280; film industry, 6–7, 217–19. See also Museum experiences Environmental audio extension (EAX) 2.0, 259 Epictetus, 377 Epidemics, 401 Ericsson, K., 304 Escher, M. C., 341, 344 ESN (European Simulation Network), 142 Espenant, M., 283 ET. See Embedded training ETDS (embedded training for dismounted soldiers), 92–93, 95–96 Etherington, T., 111 Ethernet, 364–65 European Aeronautic Defence and Space Company, 283 European Simulation Network (ESN), 142 European Union (EU), 139, 143. See also specific countries by name
Evaluation of training. See Training effectiveness evaluation (TEE) EverQuest, 437 Expeditionary Warfighter Training Group Pacific, 316–17, 318. See also USMC Deployable Virtual Training Environment—Combined Arms Network (DVTE-CAN) Exposure therapy, 428–32 Extensible markup language (XML) scripting language, 261, 263 Eye binocular field of view, 358–59 Eye displacement, 300 Eyephone, 420 Facial expression detection, 363 Failure within training context, 28, 108 Falk, J., 445 FATS, Inc., 36, 42 FBM Facility, xiv FCS (future combat system), 85–86 FDA (U.S. Food and Drug Administration), 400 Fear treatment, 428–32 Feedback: after action review (AAR) process, 128, 135; customer input, 327; knowledge of results (KR), 108, 113, 149; lack of, 149 Fei, Z., 424 FFW (future force warrior), 82–83 Fidelity: blended, 161–62; cue, 162, 163–64; described, 152, 200; determining fidelity requirements for simulators, 211–12; in DMOs, 78–79; emotional, 201; functional, 200–201; identical elements theory, 200–202; IMAT project, 72–73; limitations of rational and replication approach, 212–13; in marksmanship simulation, 210–13; in medical simulation, 105, 403; physical, 200; psychological, 200–201; sensory, 397; trade-off models, 174–76 Fidopiastis, C. M., 428 Fielded navy virtual environment training systems. See U.S. Navy Field of view (FOV), 358–59
Film industry, 6–7, 217–19. See also Entertainment industry Filtering questions, 329 Final Fantasy, 437 1st Marine Expeditionary Force Battle Simulation Center, 378–79 First-person shooter (FPS) application, 36 Fischer, J., 282 Fisher, S. S., 420 Fish tank VR, 423 FiSTs (Fire Support Teams), 308, 309. See also USMC Deployable Virtual Training Environment—Combined Arms Network (DVTE-CAN) Fletcher, J. D., 438 Flight simulators: fuel-cost savings, 78, 82, 173, 393–94; history of, xiv; trends in, 350–51. See also Helicopter training; U.S. Air Force Flying, fear of, 428 FMB (full mission bridge), 58 FMRI (functional magnetic resonance imaging), 430–31, 432 Forbes (magazine), 416 FORCEnet (CFn), 70–71 Ford, J. K., 164, 198–99, 201–2 Ford, K., 150 Forrester Research, 414 Forterra Inc., 417 Fortune (magazine), 416 Forward resuscitative surgical system (FRSS), 399–400 Foushee, C. H., 22 FOV (field of view), 358–59 Foxlin, E., 283 FPS (first-person shooter) application, 36 Franklin, M., 282 Freeze strategy, 79, 80 Friedman tests, 248 Fröhlich, B., 420 FRS (H-46 Fleet Replacement Squadron)—HMM(T)-164, 304–5, 305f FRSS (forward resuscitative surgical system), 399–400 Fuel costs, 78, 82, 173, 393–94 Full mission bridge (FMB), 58
Fully embedded training, described, 86. See also Embedded training Functional fidelity, 200–201 Functional magnetic resonance imaging (fMRI), 430–31, 432 Fused reality, 217–31; chromakey process, 220–22, 221f, 222f; demonstration, 222–29; described, 217–18; future features using fused reality, 230; history of chromakey, 218–20; virtual deployment of physical objects, 229–30, 229f, 230f; visual system, 222 Future combat system (FCS), 85–86 Future force warrior (FFW), 82–83. See also DAGGERS Games and gaming technology for training, 125–30; assessing experience level of instructors, 129; assessing prior experience of trainees, 128–29; challenges to utilizing, overview of, 5; congressional funding for, 35–36, 39; constructive simulation, 388–89; for corporate training, 413–15; DAGGERS, 95–96; described, 125; developing PC technology, 408–9; establish learning objectives, 127; “knobology,” 129; license fees, 119; linking game activities to learning objectives, 128; massively multiplayer online games (MMOGs), 131–37, 413–15, 437; performance assessments and feedback, 128; reconstruction of battles from Iraq War, 352; for social interaction, 368–70; for therapy, 368–70; trends in, 4–5, 352–53; U.S. Army Research, Development and Engineering Command, Simulation and Training Technology Center (RDECOM STTC) recommendations, 92–93, 100, 103–4, 105, 126; for U.S. Marine Corps (USMC), 34–35. See also Avatars; specific games by name Gandy, M., 280 General Accounting Office (GAO), 21 General Motors Corporation (GM), 420
Geneva, Switzerland, 139 Germany, 142 Gestaltists, 196–97 GFI (goodness-of-fit index), 273 GIST (Gwangju Institute of Science and Technology), 141 Gleser, G. C., 179–80 Global positioning system (GPS), 281 Gluck, K. A., 396 GM (General Motors Corporation), 420 God’s eye view, 82 Goldiez, B. F., 279, 281 Goodale, M. A., 212 Goodness-of-fit index (GFI), 273 Google, 368 Google Earth, 387 Gorbis, M., 437 Gorman, P. F., 233 GOTS (government off-the-shelf) simulations, 36, 38 GPS (global positioning system), 281 Grant, S. C., 209 Graphical user interface (GUI) protocol, 263, 409 Grasset, R., 281 Green screens. See Chromakey process Greiner, G., 280 Griffioen, J., 424 GUI (graphical user interface) protocol, 263, 409 Gulf wars, 351–53. See also Iraq War Gunzelmann, G., 396 Gupta, A., 281 Gustatory displays, 360 Gwangju Institute of Science and Technology (GIST), 141 Habit-forming mistakes, 108, 149 Haccoun, R. R., 166 Hagman, J. D., 209, 210 Half-Life game engine, 89–90 Haller, M., 257 Hamtiaux, T., 166 Hancock, P. A., 291–93, 396 Hand position tracking, 363 Hand-to-eye coordination, 300 Haptic vests, 260–61
HARDMAN, 22 Hart, J., 420 Hart, S. G., 396 Hartmann, W., 257 Hays, R. T., 125, 126, 127, 128, 177 HCI (human computer interface), 53 HDTV (high definition television), 356–57, 365 Headaches. See Simulator sickness Head-mounted displays (HMDs): challenges and limitations of, 98, 257, 421–22; field of view (FOV), 302; history of development, 420–21; museum experiences, 448; overview, 421–22; Sword of Damocles, 420 Heart-rate variability, 191–94, 194f Helicopter Antisubmarine Squadron Ten (HS-10), 294–96 Helicopter training, 290–306; chromakey augmented VE, 297–305, 298f, 299f, 300f, 301f, 303f, 305f; map interpretation and terrain association VE, 291–97, 292f, 293f, 294f, 295f, 297f HEM (human experience modeler) testbed, 431–32 Hewlett-Packard Company, 409 HFM (Human Factors and Medicine) Panel, 142 High definition television (HDTV), 356–57, 365 High level architecture (HLA), 34, 36, 89–90, 122–23 High Performance Computing, 141 High value individual (HVI), 378–79 Hirose, M., 282 Hirota, K., 282 Hitchcock, Alfred, 217 HLA (high level architecture), 34, 36, 89–90, 122–23 HMD. See Head-mounted displays (HMDs) HMX-1 (Marine Helicopter Squadron One), 296–97, 297f, 298f Ho, Y. S., 141 Hollywood. See Film industry Holodeck, 6, 448
Holografika image, 425 Honeycutt, E., 180 Horrigan, B., 355 Horvath, M., 153 HPSM (human performance systems model), 13–16, 13f, 14t HSI. See Human-systems integration Hue saturation value, 220–21 Hughes, C. E., 447 Hughes Research Laboratories, 34 Human auditory system, 359 Human-automation teams, 29 Human computational capability, 360–61 Human computer interface (HCI), 53 Human experience modeler (HEM) testbed, 431–32 Human eye binocular field of view, 358–59 Human Factors and Medicine (HFM) Panel, 142 Human-in-the-loop entities, 79–80 Human performance systems model (HPSM), 13–16, 13f, 14t Human-systems engineering, 27–32 Human-systems integration (HSI) for naval training systems, 18–25; defined, 19–20; future developments, 24–25; history of, 20–24 Humphries, J., 420 Huntington, S. P., 390 Hybrid therapies. See Mindscape retuning and brain reorganization with hybrid universes Hybrid vision-inertial self-trackers, 283 H-46 Fleet Replacement Squadron (FRS)—HMM(T)-164, 304–5, 305f IBM Blue Gene, 361, 364 ICALT’03 (International Conference on Advanced Learning Technologies), 415 ICS (intercockpit communications systems), 304 Identical elements theory, 149, 196–203; design based transfer mechanisms, 200–201; level of analysis, 201–2;
person based transfer mechanisms, 197–200; situation based transfer mechanisms, 201; theory described, 196–97, 202 IEEE (Institute for Electrical and Electronics Engineers Standard for Distributed Interactive Simulation), 135, 416 I/ITSEC (Interservice/Industry Training, Simulation, and Education Conference), 2, 114, 117, 123 Ilities, 21 Illness. See Simulator sickness ILUMA (illumination under realistic weather conditions) model, 59 IMAT. See Interactive Multisensor Analysis Training program IMAT.Explore, 71–73 Immersive mixed reality, 382–84. See also Mixed reality (MR) Implicit knowledge (tacit knowledge), 29–30 Incremental transfer effectiveness ratio (ITER), 167–69, 168f, 174–75, 174f, 175f, 246 Individualized, computer-assisted learning, 438 Indoor Simulated Marksmanship Trainer (ISMT), 42 Industrial applications of VEs, 405–11; cost of entry, 406–7, 408–11; current status, 408–10; data creation and/or conversion, 408, 410; driving simulations, 224–30, 227f, 228f, 230f, 405; examples of, 406; future developments, 410–11; software availability, 407, 409–10, 411 Infantry and marksmanship training systems, 41–48; challenges of, 214; Diversity of Locomotion Interfaces for Virtual Environments, 47f; future developments, 48; immersive systems, 42–43; marksmanship systems, 41–42; Partial Summary of Recent VE Effectiveness Evaluations, 46t; Partial Summary of VE Infantry and Marksmanship Trainers, 44–45t;
research on training effectiveness, 43–48. See also Small arms simulation Infantry Officers Course (IOC), 309–10, 314–15, 318. See also USMC Deployable Virtual Training Environment—Combined Arms Network (DVTE-CAN) Infantry Tool Kit (ITK), 36. See also Deployable Virtual Training Environment (DVTE) Infrared (IR) reflection, 230 Institute for Creative Technologies (ICT), 258 Institute for Defense Analysis, 351 n.4 Institute for Electrical and Electronics Engineers Standard for Distributed Interactive Simulation (IEEE), 135, 416 Institute for Simulation and Training at the University of Central Florida, 431–32 Institute of Medicine of the National Academy of Sciences, 399 Integrated shipboard network systems (ISNS), 67 Integrated training systems, overview, 1–9; history of, 1–2; recent past, 2–3; trends: embedded training, 3–4; trends: Hollywood as partner, 6–7; trends in games: individual, team, massively multiplayer and mobile, 4–5 Intel Corporation, 437 Intelligent tutoring systems (ITS), 291, 439 Interactive Multisensor Analysis Training (IMAT) program, 62–74; deployable sonar operations training, 66–69, 68f; described, 62–63; evaluations of training effectiveness, 66; integrated IMAT training and performance support for theater level ASW Operations, 72–73; learning and performance support systems, described, 63; lessons learned from, 73; shore school based IMAT training, 63–66, 64f; surface platform,
strike-group (battle group), and network level training, 69–72 Interactive rendering, 141 Intercockpit communications systems (ICS), 304 Interdevice transport delays, 81 Internal referencing strategy (IRS), 166 International Conference on Advanced Learning Technologies (ICALT’03), 415 International Council on Systems Engineering, 11 International virtual environment research and development, 138–43; Center for Advanced Studies, Research, and Development in Sardinia (CRS4), 141; Chadwick Carreto Computer Research Center, 140; European Union (EU), 143; Gwangju Institute of Science and Technology (GIST), 141; Marmara Research Center (MRC), 142; MIRALab, 139; North Atlantic Treaty Organization (NATO) countries, 142; TNO defense security and safety, 140; Virtual Environment Laboratory (VEL), 139–40; Virtual Reality and Visualization (VRVis) Research Center, 141; Virtual Reality Lab (VRLab), 139 Internet: complemented by PLA platforms, 440–41; current navy requirements for development and transition of online performance aiding or learning systems, 71–72; IMAT.Explore, 71–72; massively multiplayer online games (MMOGs), 131–37, 413–15, 437; on-site visitors to museums, 446; replaced by VEs, 387; search engines, 368; usage, 401, 437; Web based CVE, 140 Inter-Sense inertial-acoustic trackers, 42, 263 Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC), 2, 114, 117, 123
Inter-vehicle embedded simulation and training (INVEST), 87–89 “In the Uncanny Valley” (Singer and Singer), 337, 339, 340, 341–48 INVEST (inter-vehicle embedded simulation and training), 87–89 IOC (Infantry Officers Course), 309–10, 314–15, 318. See also USMC Deployable Virtual Training Environment—Combined Arms Network (DVTE-CAN) iPods, 366 IPTs (program integrated product teams), 22 Iraq War, 77, 93, 132, 332–34, 351–52 IR (infrared) reflection, 230 Irrelevant behaviors and results, 151 IRS (internal referencing strategy), 166 ISMT (Indoor Simulated Marksmanship Trainer), 42 ISNS (integrated shipboard network systems), 67 Isoperformance approach, 175, 175f Italy, 143 ITER (incremental transfer effectiveness ratio), 167–69, 168f, 174–75, 174f, 175f, 246 ITK (Infantry Tool Kit), 36 ITS (intelligent tutoring systems), 291, 439 iZ3D Monitor, 423 Janak, E. A., 150 Jastrzembski, T. S., 396 Java Media Framework, 259 Jaynes, C., 424 Jerome, C. J., 271, 273, 274 JFCOM [Joint Forces Command] (U.S. Atlantic Command), 33 Johnson & Johnson (J&J), 416–17 Joint Forces Command [JFCOM] (U.S. Atlantic Command), 33 Joint live virtual constructive data translator, 118–19 Joint Semi-Automated Forces (JSAF), 34, 35–36, 38, 317, 318 Joint strike fighter (JSF), 393
Joint terminal air controllers (JTAC), 38, 377–79 Joint Training and Experimentation Network (JTEN), 380 Joslin, C., 139 JSAF (Joint Semi-Automated Forces), 34, 36, 38, 317, 318 JSF (joint strike fighter), 393 JTAC (joint terminal air controllers), 38, 377–79 JTEN (Joint Training and Experimentation Network), 380 Julier, S. J., 282 Jumping simulations. See ParaSim JVC, 425 Kanbara, M., 283 Kanfer, R., 153 Karande, K., 180 Keil, A., 280–81 Keller, M., 108, 114 Kelly, C., 431 Kenyon, R., 420 Kercel, S., 360 Khamene, A., 280–81 Kiley, Kevin C., 104 Kilgour, F. G., 436 Kim, S. Y., 141–42 Kirkpatrick, D. L., 157, 163, 164, 166, 167, 169t, 330–31 Kishino, A. F., 428 Kitchens. See Daily living training Kitty Hawk, 337 Kiyokawa, K., 281 Knapp, J. R., 168 Knerr, B. W., 240–41 “Knobology,” 129 Knöpfle, C., 424 Knowledge of results (KR), 108, 113, 149. See also Feedback Knowledge, skills, and abilities (KSAs), 41, 159–61, 164–66, 244–45, 246–48, 271–73 Kocaeli, Turkey, 142 Korea, 141 Kraiger, K., 150, 164 Krampe, R., 304
Kresse, W., 424 KR (knowledge of results), 108, 113, 149. See also Feedback Krueger, W., 420 Krulak, Charles C., 4, 34 Krusmark, M. A., 396 KSAs (knowledge, skills, and abilities), 41, 159–61, 164–66, 244–45, 246–48, 271–73 Kuma Reality Games, 352 Kwon, J., 111 Lamb, P., 281 Land navigation. See Helicopter training Land warrior (LW), 82–83. See also DAGGERS Lane, J. C., 212–13 Language proficiency, 417–18, 438, 439, 440 Laptop systems, 240–41 Large screen/projection systems, 424–25 Laser insert devices, 208, 209 “The Last Supper” (da Vinci), 446 Lathan, C. E., 200–201 Lausanne, Switzerland, 139 Lawrence, J. S., 6 LCD computer displays, 363, 423. See also Monitors LCS (littoral combat ship), 58 Learning curve methodology, 250–51 Learning measures, categories of, 150 LEDs (light emitting diodes), 283, 305 Lee, W., 282 Leonardo, da Vinci, 446 Lessig, C., 282 Levy, P. E., 148, 150 LHD-7, 37 Liarokapis, F., 280, 281, 283, 286 License fees, 119 Light emitting diodes (LEDs), 283, 305 Liljedahl, U., 284 Lind, J. H., 234 Linden Lab, 415 Link, Ed, xiv, 1 Link Aviation Devices Inc., 1 Link Trainer (“Blue Box”), xiv Lintern, G., 211, 213
Linux applications, 122 Littoral combat ship (LCS), 58 Live building clearing exercises. See Measurement of performance: instrumentation for the recording of live building clearing exercises Live-fire training. See USMC Deployable Virtual Training Environment—Combined Arms Network (DVTE-CAN) Livingston, M. A., 282 Locomotion interfaces, 43–48, 46t, 47f LOCUS project, 281 LPD-17 San Antonio class, 23–24 LSD (lysergic acid diethylamide), 388 Lukes, George, 351 n.4 LW (land warrior), 82–83 M2/3 Bradley infantry fighting vehicle, 85, 86, 88–89, 331. See also Close combat tactical trainer (CCTT) Mac OS 10.4, 366 n.3 MacRae, H., 400 Macuda, T., 108, 114 Magnenat-Thalmann, N., 139 MAGTF XXI (Marine Air-Ground Task Force XXI), 34–35, 36, 38 Mahncke, H. W., 431 Mait, J. N., 401 MAK Technologies, 34 Manpower. See Human-systems engineering MANPRINT program, 22 Map interpretation and terrain association course (MITAC), 291, 296 Map interpretation and terrain association virtual environment system (MITAVES), 293–97, 293f, 294f, 295f Marichal, X., 280 Marine Air-Ground Task Force (MAGTF), 377–84; capability gaps, 380–82; described, 377–78; investments and benefits, 379–80; use case for a simulation-enabled integrated training exercise, 378–79 Marine Air-Ground Task Force XXI (MAGTF XXI), 34, 36, 38
Marine Corps Air Station Yuma (Arizona), 378–79 Marine Doom, 34 Marine Helicopter Squadron One (HMX-1), 296–97, 297f, 298f Marines. See U.S. Marine Corps (USMC) Marksmanship training. See Infantry and marksmanship training systems Marmara Research Center (MRC), 142 Marsden, J., 209 Martin, O., 280 Massed practice, described, 148 Massively multiplayer online games (MMOGs), 131–37, 413–15, 437 Maurer, S., 180 Mazar-e Sharif region, Afghanistan, 351 n.4 McBride, D. K., 358, 359, 360–61 McCall, J. M., 395 McDowell, I., 422 McGreevy, M., 420 MCL (Media Convergence Laboratory), 254, 274, 431–32, 445 Mead, S. J., 280 Measurement of performance: instrumentation for the recording of live building clearing exercises, 184–95; clearing exercises, described, 184–85; examples of training facilities, 185; position tracking, 188–90, 188f; walls and facility infrastructure, 186–88, 186f, 187f; weapons and body instrumentation, 190–95, 192–93f. See also Assessment of training; Training effectiveness evaluation (TEE) MEC (mission essential competency) evaluation, 394–95 Medal of Honor: Allied Forces, 35 Media Convergence Laboratory (MCL), 254, 274, 431–32, 445 Medical Education Technologies, Inc., 103 Medical simulations, 99–105, 399–404; augmented reality (AR), 280–81; benefits and need, 99; centers for virtual medical education, 402–3; challenges faced by medical field,
399–400, 402; challenges of using, 104–5; civilian use of, 105; Combat Trauma Patient Simulation System Cycle of Care, 101f; costs, 403; deaths from avoidable medical errors, 399; fidelity, 105, 403; forward resuscitative surgical system (FRSS), 399–400; history of development, 99–100; medical simulation training centers (MSTC), 104; medical visualization, 141; overview, xiv; a partial task (tourniquet) trainer, 103f; required breakthroughs for the growing use of, 105; shift from cure to care paradigm, 402; stand-alone versus networked training systems, 100–103, 101f, 102f, 103f; STISIM Drive, 224; success stories, 103–4; tactile, chemical, and vestibular senses personal VEs, 359–60, 388. See also Rehabilitation Meeting rooms, 369 Memory (human), 271, 430 Menchaca, R., 140 Mental discipline model, 196, 197 Mental model assessments, 153–54 Merchant, S., 111 Mersive Technologies, 424 Merzenich, M. M., 431 Metacognition, 199 Mexico City, Mexico, 140 Microsoft Visual Studio.NET, 58 MIDI (musical-instrument digital interface) triggers, 258 MILES (multiple integrated laser engagement system) gear, 185–86, 191, 194 Milgram, P., 278, 428 Milham, L., 251 Military operations on urban terrain (MOUT), 244, 245, 246–51, 249f, 263–65, 264f, 271–73 Miller, R. B., 174 Mindscape retuning and brain reorganization with hybrid universes, 427–32; described, 430; extensible realities and the therapeutic advantage, 428–29; new scope and focus, 431–32
MIRALab, 139 Mission essential competency (MEC) evaluation, 394–95 Mission training centers (MTCs), 81 MITAC (map interpretation and terrain association course), 291, 296 MITAVES (map interpretation and terrain association virtual environment system), 293–97, 293f, 294f, 295f Mixed-initiative dialogue, 439 Mixed reality (MR), 254–75, 256f; auditory component, 257–60; Chromakey Augmented Virtual Environment (ChrAVE), 220, 297–305, 299f, 300f, 301f; described, 254, 428, 447; effectiveness of, 270; haptics and physical effects, 260–61; immersive mixed reality, 382–84; mixed reality domes, 257, 265–66, 266f; MR infrastructure, 254–56, 255f; MR Kitchen, 262, 269–70, 270f, 271; MR MOUT (military operations in urban terrain) testbed, 263–65, 264f, 271–73; MR Sea Creatures, 257, 265–67, 267f, 271, 448–50, 449f, 450f; MR Time Portal, 267–69, 268f; museum experiences, 447–50; presence, 273–74; scripting, 261–63; simulator sickness, 274; training objectives and/or outcomes, 270–73; visual displays, 256–57 MMOGs (massively multiplayer online games), 131–37, 413–15, 437 Mobile navigation, 281 Modeling and Simulation Coordination, 3, 123 Mode proliferation, 29 ModSAF (modular semi-automated forces), 34 Mohring, M., 282 Mojave Viper pre-deployment capstone exercise, 381–82, 383 M1A1 Abrams main battle tank, 33, 85, 86, 88–89, 331. See also Close combat tactical trainer (CCTT) Monitors, 423–24; cathode ray tube (CRT) monitors, 406, 423; digital light
processing (DLP) displays, 423–25; large screen/projection systems, 424–25; research and development, 424; stereoscopic displays, 424. See also LCD computer displays Monkey level intellectual performance, 361 Moor, W. C., 394 Moore, G. E. (Moore’s Law), 1, 210, 355, 437, 440 Moravec, Hans, 361 Morie, J. F., 201 MOT2IVE (Multi-Platform Operational Team Training Immersive Virtual Environment), 37, 169, 309 Motion sickness–like symptoms. See Simulator sickness Motivation, 108, 153. See also Boredom Mouloua, M., 271 MOUT (military operations on urban terrain), 244, 245, 246–51, 249f, 263–65, 264f, 271–73 MR. See Mixed reality (MR) MRC (Marmara Research Center), 142 MR Kitchen, 262, 269–70, 270f, 271 MR MOUT (military operations in urban terrain) testbed, 263–65, 264f, 271–73. See also Military operations on urban terrain (MOUT) MR Sea Creatures, 257, 265–67, 267f, 271, 448–50, 449f, 450f MR StoryEngine, 260–61, 261–63, 267 MR Time Portal, 267–69, 268f MSTC (medical simulation training centers), 104 MTCs (mission training centers), 81 Mullen, E., 165–66 Mullen, Mike, 71 Multifunctional vision blocks, 87 Multihost automated remote command and instrumentation application, 118 Multimedia principle, 437 Multi-Platform Operational Team Training Immersive Virtual Environment (MOT2IVE), 37, 169, 309 Multiple integrated laser engagement system (MILES) gear, 185–86, 191, 194
Multipurpose operational team training immersive virtual environment system, 121–22, 123 Murtha, T. C., 153 Museum experiences, 444–50, 449f, 450f; costs of exhibits, 447; evolving museums, 447–50; intergenerational transfer via, 444; mixed reality (MR), 447–50; MR Sea Creatures, 257, 265–67, 267f, 271, 448–50, 449f, 450f; on-site visitors, 446. See also Entertainment industry Music. See Audio Musical-instrument digital interface (MIDI) triggers, 258 Mutual trust, 154 Naimark, L., 283 NASA (National Aeronautics and Space Administration), 337, 420 National Academy of Sciences, 66 National Center for Simulation, 103 National Defense Authorization Act, 2001, 35–36 National Football League (NFL), 337 National Institutes of Health, 224–25, 227f, 228f National Research Council: 2008 report, 393; Flight Research Laboratory (NRC-FRL), 108, 111, 113 National Training Systems Association, 3 NATO (North Atlantic Treaty Organization) countries, 142 Natural language processing, 438, 439, 440 Nausea. See Simulator sickness Naval Air Station North Island, 294–96 Naval Air Systems Command, 36 Naval Air Warfare Center, 136–37. See also U.S. Navy Naval Mine and ASW Command, 70 Naval Postgraduate School, 38, 119, 220 Naval Research and Development (SPAWAR [Space and Naval Warfare] Systems Center, San Diego, California), 34, 62, 71 Naval Research Laboratory, 281–82
Naval Sea Systems Command, 59–60, 67 Naval Studies Board, National Academy of Sciences, 66 Naval Submarine School, 68 Naval Surface Warfare Center, 36, 59, 60, 62, 74 n.2 Navigation. See Helicopter training Navy. See U.S. Navy Navy Continuous Training Environment (NCTE), 380 Navy Enterprise Network (ONEnet), 67 Navy Marine Corps Intranet (NMCI), 67 NCTE (Navy Continuous Training Environment), 380 Nelson, L., 153 Nerve injuries, 359–60 Net Generation, 401 Netherlands, 140, 142 NetImmerse, 58–59 Networking: bandwidth improvements, 364–66; security concerns, 81 Neubauer, P. J., 395 Neujahr, H., 283 Neuroimaging technologies, 430. See also EEG (electroencephalogram) Neurorehabilitation. See Rehabilitation Neurosensing devices, 427–28, 429 New England Journal of Medicine, 400 New Scientist (magazine), 369 NFL (National Football League), 337 Niemann, H., 280–81 Night vision goggle (NVG) usage, 302 Nilsson, M., 284 9/11 Terror attacks, 400 Nintendo Wii, 143, 363 NMCI (Navy Marine Corps Intranet), 67 Noninteractive video, 291, 296 North Atlantic Treaty Organization (NATO) countries, 142 North Vietnamese MiG jets, 383 NRC-FRL (National Research Council Flight Research Laboratory), 108, 111, 113 Nuclear-powered ballistic missile submarine (SSBN) Trident variant, 68 Nuclear-powered cruise missile submarine (SSGN) Trident variant, 68
Numerical Design Limited, 58–59 NVG (night vision goggle) usage, 302 NVIDIA, 409 Object-oriented applications, 437–38 Observational learning, 148 Obtrusive methods and data collection instruments, 322 Ocean-environment models, 67–68 OCONUS (outside contiguous United States), 67 Oda, K., 220 Office of Naval Research (ONR): modifying COTS games, 35; Multiplatform Operational Team Training Immersive Virtual Environment, 309; Virtual Technologies and Environments (VIRTE), xiv, 37–38, 43, 119–23, 302–5, 309, 431–32. See also U.S. Navy Off-site college campuses, 369 Ogee curve (S-shaped curve), 362–64 OIF (Operation Iraqi Freedom), 77, 93, 132, 332–34, 351–52 Oklahoma City, Oklahoma, 400 Old Man and the Sea (film), 219 Olfactory capability, 360 OLIVE virtual world platform, 417 ONEnet (Navy Enterprise Network), 67 Online gaming. See Massively multiplayer online games (MMOGs) Online performance aiding or learning systems, 71–72. See also Internet Open source gaming applications, 119 Operational costs. See Costs Operational functionalities and coordination requirements, 162 Operation Enduring Freedom, 77 Operation Flashpoint, 36 Operation Hollywood (Robb), 6 Operation Iraqi Freedom (OIF), 77, 93, 132, 332–34, 351–52 Operator Performance Laboratory (OPL), 107–8, 111–12, 113 Physiological measures. See EEG (electroencephalogram) Organizational assessment goals, 328–29
Orlando, Florida, 265–66, 266f, 273. See also MR Sea Creatures Orlansky, J., 82, 173, 178–79 Outcome measures, described, 163 Outside contiguous United States (OCONUS), 67 Overlearning, 149 Ownership considerations, 21 Ownship simulation, 87 Padgett, M. L., 200 Paideia Computing, 417–18 Pair, J., 201 Paper, history of, 436 ParaSim, 223–24, 224f, 225f, 226f Park, J., 282, 283 Parnes, P., 284 Part learning, 148 PC based games. See Games and gaming technology for training PCI-Express standard, 411 PC-IMAT, 66–67, 69–71 PC-2008, 356–57 PDAs (personal digital assistants), 444 Performance evaluations. See Training effectiveness evaluation (TEE) Personal digital assistants (PDAs), 444 Personal learning associates (PLAs), 435–42; cost savings via, 439; historical trends in learning, 436; infrastructure: progress, 441–42; key components for use, 435; operations, functionalities, and capabilities, 439–41; technology, 436–39; trends, 435 Personal VEs, 355–70; approach and basic premises, 356–57; audio, 359, 360; background, 355–56; human computational capability, 360–61; modeling human behavior and creating adaptive, responsive agents, 367–68; notes on prediction approach, 362–63; rough order of magnitude (ROM) estimates, 357; suggested capabilities, 361–62, 363–67; tactile, chemical, and vestibular senses, 359–60, 388; totally immersive personal VE capabilities,
357; uses for, 368–70; vision, 358–59, 360, 388 Person based transfer mechanisms, 197–200 Pescovitz, D., 437 Phobia treatment, 428–32 Physical fidelity, 200 Physiological sensors. See Quality of training effectiveness assessment (QTEA) tool Pituitary surgery, 141 PLAs. See Personal learning associates Plateau analysis, 246, 248–51, 249f Point-source channels, 260 Polanyi, Michael, 324 Polygon templates, 220 Poolman, P., 114 Portal generation, 230 Posit Science’s Brain Fitness training program, 431 Post–Cold War period deployments, 83 Post-exercise learning, 152, 201. See also Feedback Post-traumatic stress disorder (PTSD), 430 Postural tracking, 363 Power supply and distribution, 121–22, 123, 284 Predator UAV pilot computational model, 396 Prensky, Marc, 444 Prepackaged tutorials, 321 Primacy, 108 Procedural knowledge, 202 Process measures, described, 163 A Producer’s Guide to U.S. Army Cooperation with the Entertainment Industry (U.S. Army), 6 Program integrated product teams (IPTs), 22 Propagation loss laboratory, 64–65, 65f Proprietary software solutions, 119 Proton Media’s Protosphere tool, 416 Pro Tools, 258 Psychological fidelity, 200–201 Psychomotor, skill based outcomes, 164–65
Psychotherapy, 430. See also Exposure therapy PTSD (post-traumatic stress disorder), 430 QinetiQ, 282 Quake, 387 Quality of training effectiveness assessment (QTEA) tool, 107–15, 109f, 112f; development of QTEA system prototypes, 113; instrumented training systems requirements, 108–11; iterative integration and transition, 114; operational considerations of instrumented aviation training systems, 113–14; system high level design, 111–13; test and evaluation, 114–15 Quintero, R., 140 Radio frequency bandwidth, 365, 366 Radio frequency identification (RFID) based systems, 190, 283, 285, 286, 444, 445 Ramona!, 368 n.6 RAP (Ready Aircrew Program), 394 Rauch, S. L., 430 Raven-B laptop control stations, 378–79 Raydon, 36 Reaction measures, 150 Ready Aircrew Program (RAP), 394 Realistic broadcasting, 141 Reality By Design (Advanced Interactive Systems), 42 Real time kill removal capability, 79–80 Recognition of Combat Vehicle series, 38 Reconstruction of combat operations and large-scale distributed simulations, trends in, 351 Recordings of training sessions, 152 Red Cross sponsored swimming program, 177–78, 178t Red, green, and blue (RGB) components, 220–21, 221f Reeves, Byron, 414 Regenbrecht, H., 281 Regenerate zones, 79–80 Rehabilitation: assessment of treatment,
430–31; brain plasticity, described, 429–30; costs of rehabilitation for TBIs, 429; human experience modeler (HEM) testbed, 431–32; mindscape retuning and brain reorganization with hybrid universes, 427–32; MR Kitchen, 262, 269–70, 270f, 271; Posit Science’s Brain Fitness training program, 431; via virtual worlds, 369. See also Medical simulations Reiners, D., 424 Remote system interfaces, 263 Repetition, 176–77 Required capabilities analyses, 164 Research, Development and Engineering Command (RDECOM) Simulation and Training Technology Center (STTC), 92–93, 100, 103–4, 105, 126 Ressler, S., 405 Results measures, 150–51 Rezk-Salama, C., 280 Reznick, R. K., 400 RFID (radio frequency identification) based systems, 190, 283, 285, 286, 444, 445 Rifle ranges. See Small arms simulation Rizzo, A., 201 Robb, David L., 6 Robinett, W., 420 Robots, xiii–xiv Rogue Spear, 35 Romans (ancient), 436 Roscoe, S. N., 166–67, 169, 174–75, 245, 246 Roughead, Gary, 70–71 Route planning systems, 350–51 Rule of thirds, 438–39 RuneScape, 437 Ryerson University, 139 S&T (science and technology) long poles, 380–81 Sackett, P., 165–66 Safety/security considerations, 87, 140 SAGAT procedure, 154 SAIC, 235 Salas, E., 150, 151, 152, 153–54, 164
Salter, M. S., 234 Samsung, 411 Sandin, D. J., 420 Sandl, P., 283 Santayana, George, 344–45 SAPS (Stand-Alone Patient Simulator), 102, 102f, 104 Sardinia, 140–41 Sauer, F., 280–81 Sauser, B., 13 SBIR (Small Business Innovative Research), 34 Scaffolding techniques, 291 Scalable ET and mission rehearsal (SET-MR), 88–90 Schaffer, R., 58 Scheuering, M., 280 Schmidt, G. S., 282 Schneider, A., 280 Schnell, T., 108, 111, 114 Schoolchildren. See MR Sea Creatures Schooley, Claire, 414 School for Command Preparation (SCP), 127 Schreiber, B. T., 395 Science and technology fundamentals, 9–10 Science and technology objective (STO), 92–93, 95–96 Science and technology (S&T) long poles, 380–81 SCORM (Sharable Content Object Reference Model), 440–41 SCP (U.S. Army Command and General Staff College, School for Command Preparation), 127 Scripting, 261–63 SE. See Systems engineering approaches Sea Creatures, 257, 265–67, 267f, 271, 448–50, 449f, 450f Seaman’s eye, 50–56, 52t, 54f, 55f Search engines, 368 Seasickness. See Simulator sickness Sea World, 6 Second Life, 131, 368–69, 387, 415 Self-efficacy: defined, 198; DVTE-CAN,
311, 314–15; self-efficacy questionnaires, 165; of slower-acquisition trainees, 153; transfer performance and, 198–99 Self-regulation, 165, 199 The Semantic Web, 437, 441 Sensics Inc., 422 Sensor deployment, 111, 112–13, 114 Sensor servers, 262 Sensory fidelity, 397 Sensory substitution, 359–60 Sensory task analysis (STA), 161–62 September 11 terror attacks, 400 SET-MR (scalable ET and mission rehearsal), 88–90 SET (sonar employment trainer), 67–68, 68f SFX engine, 261–62, 265 SGI (Silicon Graphics, Inc.), 54, 406, 408–9 Sharable Content Object Reference Model (SCORM), 440–41 Shared mental models, 153–54 Shields up, 80 Shin, L. M., 430 Shipboard Marine Corps operational simulator technology, 35–36 Silicon Graphics, Inc. (SGI), 54, 406, 408–9 SIMILAR, 11–12, 13–16, 14t SIMNET (simulation network), 33, 331, 332–34, 351, 386 Sims, D. E., 153–54 Simulated Mission and Rehearsal Training (SMART) initiative, 143 Simulation and Training Technology Center (STTC), 92–93, 100, 103–4, 105, 126 Simulation Laboratory at the Institute for Defense Analysis, 351 n.4 Simulation network (SIMNET), 33, 331, 332–34, 351, 386 Simulation programs, overview: analog computational programs, xiv; augmented cognition, xv; distributed interactive simulation, xiv; fundamental components, xiv; history
of development, xiv; medical simulation, xv Simulation, Training and Instrumentation Command’s technology base, 100 Simulator sickness, 273 Situation based transfer mechanisms, 197, 201 Six degrees of freedom (6 DOF) tracking, 282–83 Skills. See Knowledge, skills, and abilities (KSAs) SLOC (source lines of code), 366–67 Small arms simulation, 206–14; assessing transfer of training, 207–8; challenges in assessing simulations for marksmanship training, 210; fidelity in marksmanship simulation, 210–13; prediction, 208–9; transfer of training, 209. See also Infantry and marksmanship training systems Small Business Innovative Research (SBIR), 34 SMART (Simulated Mission and Rehearsal Training) initiative, 143 Smell, 360 SMEs (subject matter experts), 162 SMMTT (submarine multimission team trainer), 68 Smokejumpers, 223 SO3 (study of organizational opinion), 327–31, 332–34 Social interactions, 368–70 Social phobia, 428 Soesterberg, the Netherlands, 140 Soldier Visualization Station (SVS), 42, 235, 240 SolidWorks, 229 Sonar employment trainer (SET), 67–68, 68f Sonar tactical decision aid (STDA), 67, 69 Sony, 422, 425 SORTS (status of resources and training system), 167–69 Sound. See Audio SoundDesigner, 259–60 Sound propagation, 64–65, 65f
Sound speed, 64–65, 65f Source lines of code (SLOC), 366–67 Sources (audio), 259–60 Space and Naval Warfare Systems Center (SPAWAR), 34, 62, 71 Spatialized channels, 260 Spatial relations, 281 SPAWAR (Space and Naval Warfare) Systems Center, 34, 62, 71 Speakers (audio), 259–60 Special Devices Desk at the Bureau of Aeronautics, xiv Spiral approach, 11–12, 14t, 15–16, 121, 158–59, 314, 321 SRI Consulting, 415–16 SSBN (nuclear-powered ballistic missile submarine) Trident variant, 68 SSGN (nuclear-powered cruise missile submarine) Trident variant, 68 S-shaped curve (Ogee curve), 362–64 Stakeholders, identifying, 326 Stand-Alone Patient Simulator (SAPS), 102, 102f, 104 Standing training devices (TDs), 50 Stanford University, 414 Stanney, K. M., 251, 356 Stapleton, C. B., 447 Star Trek, 6, 448 STA (sensory task analysis), 161–62 Static matte, 218 Status of resources and training system (SORTS), 167–69 STDA (sonar tactical decision aid), 67, 69 STE (synthetic task environment) tools, 396 Stevens, B., 281 STISIM Drive, 224–28, 227–28f STI (Systems Technology, Inc.), 217–18. See also ParaSim; STISIM Drive STL, 410 Stone, Oliver, 6 STO (science and technology objective), 92–93, 95–96 Stovepipes, 78, 349–50. See also Trends in VE STOW (synthetic theater of war), 33, 59 Straßer, W., 282
Stress, 201 String, J., 173, 178–79 Stryker ET system, 85, 89–90 STTC (Simulation and Training Technology Center), 92–93, 100, 103–4, 105, 126 Study of organizational opinion (SO3), 327–31, 332–34 Subject matter experts (SMEs), 162 Submarine Development Squadron Twelve, 67 Submarine multimission team trainer (SMMTT), 68 Submarines: antisubmarine warfare (ASW), 62–63, 73; COVE training system, 56–58, 56f; “seaman’s eye,” 50–56, 52t, 54f, 55f; Virtual Environment Situation Awareness Review System (VESARS), 50–56, 52t, 54f. See also Interactive Multisensor Analysis Training (IMAT) program Sundareswaran, V., 283 Supercomputers, 361 Surface Warfare Officers School (SWOS), 56–58, 56f Surgeons and surgery. See Medical simulations Sutherland, Ivan E., 420 SVS (Soldier Visualization Station), 42, 235, 240 Sweden, 142, 143 Swimming training, 177–78, 178t Swiss Federal Institute of Technology, 139 Switzerland, 139 Sword of Damocles, 420 SWOS (Surface Warfare Officers School), 56–58, 56f Synnes, K., 284 Synthetic entities, 79–80 Synthetic task environment (STE) tools, 396 Synthetic theater of war (STOW), 33, 59 Systems engineering approaches, 9–17, 14t; engineering approaches, 11–12; engineering fundamentals, 10–11; HPSM: integrating the scientific method and systems engineering, 13–
16, 13f, 14t; human-systems engineering, 27–32; science and technology fundamentals, 9–10; SIMILAR, 11–12, 13–16, 14t Systems Technology, Inc. (STI), 217–18. See also ParaSim; STISIM Drive System utility, 122, 311, 314 Tacit knowledge, 331 Tactical decision making under stress research program, 22–23 Tactical decision simulations (TDSs), 34, 378–79 Tactical Iraqi program, 38, 417 Tactical Operations Marine Corps (TacOpsMC), 35, 38 Tactical training equipment (TTE), 50 Tactile capabilities, 362 Taliban, 351 n.4 Tappert, C. C., 283 Targeted emotional responses, 165 Task analysis, 161–64, 246–48 Taxonomy of educational objectives, 150 TDs (standing training devices), 50 TDSs (tactical decision simulations), 34, 378–79 Technology Labs, 415 Technology trends: history of development, 436; Moore’s Law, 1, 210, 355, 437, 440; overview, 436–38; rule of thirds, 438–39 TECOM (Training and Education Command) Technology Division, 380–82 TEE. See Training effectiveness evaluation (TEE) Tekamah Corporation, 103 Televisions: digital light processing (DLP) displays, 423–25; HDTV (high definition television), 356–57, 365; stereoscopic displays, 424; 3-D-capable TVs, 411 Tenmoku, R., 283 TER (transfer effectiveness ratio), 167–69, 168f, 245–46, 251. See also Transfer effectiveness evaluations Tesch-Römer, C., 304
Texas A&M University–Corpus Christi, 402–3 Texas Instruments, 411 Theater and Force-Level Anti-Submarine Warfare, 73 Therapy. See Rehabilitation There.com, 415 There (MMOG), 415 3rd Battalion, 4th Marines, 310, 317, 318 Thorndike, E. L., 149, 196, 197, 198, 202, 203 Thorpe, Jack, 33 Threat Image Projection software program, 396 3-D-capable TVs, 411 TNA (training needs analysis), 158, 159 TNO defense security and safety, 140 Tongues, 360 Top Gun, 383 Toronto, Ontario, 139 Torre, J. P., 210 Trackers/tracking for training, 282–83 TRADOC (U.S. Army Training and Doctrine Command), 3–4, 85–86, 331 Training, overview: active learning, 148; adaptability, 154; distributed practice, described, 148; learning context, 148; massed practice, described, 148; observational learning, 148; overlearning, 149; part learning, 148; repetition, 176–77; self-efficacy, 153, 165, 198, 198–99, 311, 314–15; training management, described, 162; transfer performance, 166–67; whole learning, 148. See also Assessment of training; specific programs and components by name; Training effectiveness evaluation (TEE) Training and Doctrine Command (TRADOC), 3–4, 85–86, 331 Training and Education Command (TECOM) Technology Division, 380–82 Training effectiveness evaluation (TEE), 147–55; behavioral measures, 150; challenges and considerations, 151–53, 157–58; cognitive outcomes, 164–65;
complete recordings of training sessions, 152; criterion contamination, 151; criterion deficiency, 150–51; described, 50, 157, 243; design of evaluation process, 151; fidelity, 152; guidelines for use of simulations, 151; key principles in learning and skill acquisition, 148–49; Kirkpatrick’s four level model of training evaluation, 157, 163, 164, 166, 167, 169t, 330–31; learning measures, 150; operational functionalities and coordination requirements, 162; psychomotor, skill based outcomes, 164–65; quasi-experimental methods, 165–66; reaction measures, 150; results measures, 150–51; training critical team variables, 153–54; training evaluation, overview, 149–51; training motivation and VEs, 153. See also Assessment of training; Measurement of performance; Quality of training effectiveness assessment (QTEA) tool; Transfer effectiveness evaluations Training effectiveness evaluation (TEE), lifecycle approach, 157–70, 160f, 169t; described, 158–59; task analysis, 159–61; TEE Evaluation Metrics, 169t; theoretical TEE, 163–64; trainee performance evaluation, 164–67; training needs analysis (TNA), 158, 159; training systems design, 161–63; training systems evaluation, 163; training transfer evaluation, 166–67; transfer efficacy, 167–69, 168f. See also Transfer effectiveness evaluations Training needs analysis (TNA), 158, 159 Transfer effectiveness evaluations, 166–67, 173–81, 243–52; assessing small arms simulations, 207–8, 209; “bottom line,” 149; cost justification, 178–79; cumulative transfer effectiveness ratio (CTER), 168–69, 168f; curriculum, 247–48; defined, 243; design based transfer mechanisms, 197, 200–201;
efficacy, 167–69; incremental transfer effectiveness ratio (ITER), 167–69, 168f, 174–75, 174f, 175f, 246; learning curve methodology, 250–51; methodologies, 245–51; parameters, 249–50; person based transfer mechanisms, 197–200; roadblocks to conducting training effectiveness evaluations, 244–46; situation based transfer mechanisms, 197, 201; task analysis and training objective identification, 247; test populations, 244–47; time-cost models, 176–78; trade-off models, 174–76; transfer effectiveness ratio (TER), 167–69, 168f, 245–46, 251; utility analysis, 179–80. See also Identical elements theory; Training effectiveness evaluation (TEE) Transportation Security Administration (TSA), 396 Trauma patient simulations. See Medical simulations Traumatic brain injury (TBI), 429. See also Rehabilitation Trends in VE, 349–54; BattlePlex, 354; Battle School (graduate level), 354; Battle School (undergraduate), 353–54; commercial games, trends in, 352–53; distributed simulation as a command and control (C2) system, 350; flight simulations, 350–51; real war and commercial games, trends in, 352–53; reconstruction of combat operations and large-scale distributed simulations, trends in, 351; route planning systems, 350–51. See also Computer technology, growth in TRPPM (training planning process methodology), 22 Trust, 28–29, 154 Ts’ao Hsueh-ch’in, 341 n.2 TSA (Transportation Security Administration), 396 TTE (tactical training equipment), 50 Trainer, 123 Turkey, 142
Twentynine Palms (California), 378–79, 382 263rd Army National Guard, Air and Missile Defense Command site, 186–90. See also Measurement of performance: instrumentation for the recording of live building clearing exercises Tysons Corner, Virginia, 296–97, 297f, 298f UAS (unmanned aerial system), 79, 82 UA (utility analysis), 179–80, 179f, 180f UAVs (uninhabited aerial vehicles), 393–94 UCD (user-centered training system design), 161–64, 166 “The Ultimate Display” (Sutherland), 420 Ultraviolet (UV) reflection, 230 Underway replenishment (UNREP), 57 Uninhabited aerial vehicles (UAVs), 393–94 United Kingdom, 142 University of Central Florida, 103, 235, 254, 431–32, 445 University of Geneva, 139 University of Southern California Institute for Creative Technology, 6 University of Texas Applied Research Lab, 70 Unmanned aerial system (UAS), 79, 82 Unmanned vehicles (UVs), 27–32; automation surprise, 27; autonomous vehicles (AVs), 25–26; benefits of, 27–29; challenges of, 28–29; examples of, 27; improving human-system interaction, 30–31; increased demand for, 25–26; information processing: software versus “bioware,” 29–30 Unreal Tournament 2003, 95–96, 387 UNREP (underway replenishment), 57 Usability analyses, 162, 314 U.S. Air Force: cognitive models, 396–97; continuous learning, 395–96; distributed mission operations (DMOs), 77–83; embedded training, 4; immersive environments, 397; joint
strike fighter (JSF), 393; live, virtual, and constructive (LVC), 392–97, 397f; operational roles, policy, and doctrine, 392–93; resource constraints, 393–94; Security Police weapons training, 209; SIMNET program, 33; training and operations with other organizations, 78 U.S. Army: Army Game Project, 5; Army Investment Plan, 2–3; Battle Command 2010, 5; close combat tactical trainer (CCTT), 33, 331; combat systems, 85–90; combat systems, demonstrations, 89–90; combat systems, embedded training research, 87–88; combat systems, prototypes, 89; combat systems, system high level design, 86–87; combat systems, task analysis, 85–86; combat systems, user considerations, 88–89; commercial VE expressions used by, 387–88; DAGGERS (distributed advanced graphics generator and embedded rehearsal system) project, 92–98, 97f; functional area assessment on modeling and simulation management, 2–3; future of VE training, 389–90; games and gaming technology for training, 5, 126; history of dismounted combatant simulation training systems, 234; Hollywood as partner, 6–7, 217–19; ILUMA (illumination under realistic weather conditions) model, 59; MANPRINT program, 22; Recognition of Combat Vehicle series, 38; recommendations for trainers using PC based computer games for training, 126–29; recruitment programs, 5; SIMNET (simulation network), 33, 331, 332–34, 351, 386; “soft social science,” 390; Topographic Engineering Center’s Camp Lejeune terrain database, 59; TRADOC (U.S. Army Training and Doctrine Command), 3–4, 85–86, 331 U.S. Army Command and General Staff College, School for Command Preparation (SCP), 127
U.S. Army Medical Research and Materiel Command (MRMC), 103, 105 U.S. Army National Guard (ARNG), 332–34 U.S. Army Research, Development and Engineering Command (RDECOM), 92–93, 100, 126 U.S. Army Research Institute Simulator Systems and Infantry Forces Research Units, 129, 135, 136, 234–35, 331 U.S. Army Research Laboratory Computational and Information Sciences Directorate, 234–35 U.S. Army Research Laboratory Human Research and Engineering Directorate, 234–35 U.S. Army Simulation Training and Instrumentation Command, 5, 234–35 U.S. Atlantic Command (now Joint Forces Command [JFCOM]), 33 U.S. Department of Commerce, 401 U.S. Department of Defense: on costs of specialized skill training, 438–39; Defense Modeling and Simulation Office (Modeling and Simulation Coordination), 122–23; Defense Science Board report (2003), 395; demonstrations of, 135; Department of Defense Directives (DoDD), 21–22; distributed simulation technologies, 33–34; on game based trainers, 125; GAO report on (1981), 21; high level architecture standard, xiv, 34, 36, 122–23; human-systems integration (HSI) for naval training systems, 18–25; Institute for Electrical and Electronics Engineers Standard for Distributed Interactive Simulation (IEEE), 135, 415; Quadrennial Defense Review (2006), 392; Tactical Iraqi, 417 User-centered training system design (UCD), 161–64, 166 User interfaces, 122 User scrutiny event (USE), 36 USE (user scrutiny event), 36 U.S. Food and Drug Administration, 400 U.S. Forest Service, 223
“Using Virtual Worlds for Corporate Training” (ICALT’03), 415 U.S. Marine Corps (USMC): DVTE (see Deployable Virtual Training Environment); Fire Support Teams (FiSTs), 308, 309; games and gaming technology for training, 4–5, 34–35; H-46 Fleet Replacement Squadron (FRS)—HMM(T)-164, 304–5, 305f; immersive training, 382–83; Joint Training and Experimentation Network (JTEN), 380; LPD-17 training program, 23–24; Marine Air-Ground Task Force (MAGTF), 377–84; Marine Air-Ground Task Force XXI (MAGTF XXI), 34–35, 36, 38; Marine Combat Development Command, 4–5; Marine Corps Air Station Yuma (Arizona), 378–79; Marine Corps Modeling and Simulation Office, 34; Marine Corps Semi-Automated Forces development, 34; Marine Helicopter Squadron One (HMX-1), 296–97, 297f, 298f; paying in sweat versus blood, 383; shipboard Marine Corps operational simulator technology, 35–36; Virtual Environments and Technologies program, 302–5; virtual fire support trainer, 119–21, 123. See also Helicopter training U.S. Marine Corps (USMC): Training Modeling and Simulation Master Plan (TM&SMP), 377–84; capability gaps, 380–82; described, 377–78; investments and benefits, 379–80; use case for a simulation-enabled integrated training exercise, 378–79 USMC Deployable Virtual Training Environment—Combined Arms Network (DVTE-CAN), 308–22; corresponding instruments, 311–14; described, 309; DVTE Application Variants, 312–13t; experimental design and procedure, 310–11; lessons learned, 318–22, 319–20t; methods, 309–10; results, 314–17 U.S. Navy, 50–60; air defense warfare (ADW) scenarios, 198–99; aviation
wide-angle visual system (AWAVS) program, 176; changing requirements, 372–73; Chromakey Augmented Virtual Environment (ChrAVE), 220, 297–305, 299f, 300f, 301f; Conning Officer Virtual Environment (COVE), 56–58, 56f; embedded training, 4; future possibilities and direction, 373–76; HARDMAN, 22; human performance systems model (HPSM), 13; human-systems integration (HSI) for naval training systems, 18–25; LPD-17 San Antonio class, 23–24; mass-production approach to learning, 373; Naval Air Station North Island, 294–96; Naval Air Systems Command, 36; Naval Air Warfare Center, 136–37; Naval Mine and ASW Command, 70; Naval Postgraduate School, 38, 119, 220; Naval Research and Development (SPAWAR [Space and Naval Warfare] Systems Center, San Diego, California), 34, 62, 71; Naval Research Laboratory, 281–82; Naval Sea Systems Command, 59–60, 67; Naval Studies Board, National Academy of Sciences, 66; Naval Submarine School, 68; Naval Surface Warfare Center, 36, 59, 60, 62, 74 n.2; Navy Continuous Training Environment (NCTE), 380; Navy Enterprise Network (ONEnet), 67; Navy Marine Corps Intranet (NMCI), 67; requirements for development and transition of online performance aiding or learning systems, 71–72; strengths and advantages of, 371–72; team tactical engagement simulator, 233–34; training planning process methodology (TRPPM), 22; Virtual Environment Landing Craft, Air Cushion (VELCAC), 51, 58–60, 60f, 302; Virtual Environments and Technologies program, 302–4; Virtual Environment Submarine (VESUB), 51–57. See also Helicopter training; Office of Naval Research USS Iwo Jima, 37
USS Vincennes, 22–23
Utility analysis (UA), 179–80, 179f, 180f
Utility reactions, 150
Utility servers, 262
UVs. See Unmanned vehicles
UV (ultraviolet) reflection, 230
V&V (verification and validation), xv, 104–5, 212–13
Vaio UX50, 422
Validation. See Verification and validation (V&V)
Van de Velde, Walter, 285
VBS-1 (Virtual Battlefield System 1), 36
VEAAAV (Virtual Environment Advanced Amphibious Assault Vehicle), 302
Vee model, 11, 13, 14t, 15, 16
VEHELO (Virtual Environment Helicopter), 302–5, 303f, 305f
VELCAC (Virtual Environment Landing Craft, Air Cushion), 51, 58–60, 60f, 302
VEL (Virtual Environment Laboratory), 139–40
Verification and validation (V&V), xv, 104–5, 212–13
Vertigo. See Simulator sickness
Vertigo (film), 217
VESUB (Virtual Environment Submarine), 51–57, 52t, 54f, 55f
Video conferencing, 369
Video games. See Games and gaming technology for training
Vienna, Austria, 141
Vietnam era, 21, 383
Vincenzi, D. A., 177
VIRTE VELCAC prototype trainer, 58–60
VIRTE (Virtual Technologies and Environments), 37–38, 43, 119–23, 309, 431–32
Virtual at sea trainer, 119–20
Virtual Battlefield System 1 (VBS-1), 36
Virtual classrooms, 369
Virtual Environment Advanced Amphibious Assault Vehicle (VEAAAV), 302
Virtual environment displays, overview, 420–26; COTS components, 421; head-mounted displays (HMDs), 421–22; history of development, 420–21; large screen/projection systems, 424–25; monitors, 423–24
Virtual Environment Helicopter (VEHELO), 302–5, 303f, 305f
Virtual Environment Laboratory (VEL), 139–40
Virtual Environment Landing Craft, Air Cushion (VELCAC), 51, 58–60, 60f, 302
Virtual Environments for Intuitive Human-System Interaction study, 142
Virtual Environment Submarine (VESUB), 51–57, 52t, 54f, 55f
Virtual environments (VEs), overview: described, 63; development of prototypes, 120–23; as enablers for operational performance, 325; history of development, 337; lessons learned, 117–24; system high level design, 119–20; task analysis, 117–19. See also Virtual environment displays, overview
Virtual fire support trainer, 119–21, 123
Virtual life network (VLNET), 139
Virtual maneuver trainers (VMT), 332–34
Virtual meeting rooms, 368
Virtual Reality and Visualization (VRVis) Research Center, 141
Virtual Reality Lab (VRLab), 139
The Virtual Reality Medical Center, 432
Virtual reality sickness. See Simulator sickness
Virtual reality (VR): augmented reality (AR) collaborative tasks, 281, 286; augmented reality (AR) versus, 278, 282, 286; described, 278, 447
Virtual retinal displays, 280
Virtual simulation training toolkit, 119–20, 121–22, 123
Virtual Technologies and Environments (VIRTE), 37–38, 43, 119–23, 309, 431–32
Vision: direct neural stimulation, 388; personal VEs, 358–59, 360
Vlahos, Petro, 219
VLNET (virtual life network), 139
VMT (virtual maneuver trainers), 332–34
Vogelmeier, L., 283
Vogl, T., 111
Vogt, S., 280–81
Voice recognition technology, 417–18
Vomiting. See Simulator sickness
VPL Research Inc., 420
VR. See Virtual reality
VRLab (Virtual Reality Lab), 139
VRML/X3D, 410
VRVis (Virtual Reality and Visualization) Research Center, 141
VShip, 57
Ward, P., 291–93
Ware, C., 423
Warfighter Human Immersive Simulation Laboratory’s Infantry Immersive Trainer, 118
Washington Post, 368–69
Waterfall method, 11–12, 14t, 15, 16
Watz, E., 395
Web. See Internet
Westwood, D. A., 212
Wetware computer, 360–61
Wetzel-Smith, S. K., 74 n.1
White, C. R., 209
Whole learning, 148
Widmer, Arthur, 219
WiFi wireless, 365–66
Wiggin, Ender. See Ender’s Game (Card)
Wii, 143, 363
Wikipedia, 368
Wilbourn, J. M., 209
Wildcat graphics cards, 293
Williams, A., 291–93
Williams, J., 201
Williges, B. H., 166–67, 169, 174–75
Wilson, K. A., 151
Windows applications, 122
Windows Vista, 366 n.3
Swiss Federal Institute of Technology, 139
Witmer, B., 271, 273, 274
Wizard of Oz, 280
Woodworth, R. S., 196, 197, 203
Workload reduction via automation, 28
World of Warcraft, 131, 437
World Wide Web, 387. See also Internet
World Wide Web Consortium, 437
World Wind, 387
Wright brothers, 337, 390
XML (Extensible Markup Language) scripting language, 261, 263
X-ray imaging, 396
Yates, W. W., 209
Yesterday’s Tomorrows (Corn & Horrigan), 355
Yokoya, N., 283
Yoon, S. U., 141
Zimet, E., 401
Zyda, M., 356
ABOUT THE EDITORS AND CONTRIBUTORS
THE EDITORS
JOSEPH COHN, Ph.D., is a Lieutenant Commander in the U.S. Navy, a full member of the Human Factors and Ergonomics Society, the American Psychological Association, and the Aerospace Medical Association. Selected as the Potomac Institute for Policy Studies’ 2006 Lewis and Clark Fellow, Cohn has more than 60 publications in scientific journals, edited books, and conference proceedings and has given numerous invited lectures and presentations.
DENISE NICHOLSON, Ph.D., is Director of Applied Cognition and Training in the Immersive Virtual Environments Laboratory at the University of Central Florida’s Institute for Simulation and Training. She holds joint appointments in UCF’s Modeling and Simulation Graduate Program, Industrial Engineering and Management Department, and the College of Optics and Photonics. In recognition of her contributions to the field of Virtual Environments, Nicholson received the Innovation Award in Science and Technology from the Naval Air Warfare Center and has served as an appointed member of the international NATO Panel on “Advances of Virtual Environments for Human Systems Interaction.” She joined UCF in 2005, with more than 18 years of government experience ranging from bench level research at the Air Force Research Lab to leadership as Deputy Director for Science and Technology at NAVAIR Training Systems Division.
DYLAN SCHMORROW, Ph.D., is an international leader in advancing virtual environment science and technology for training and education applications. He has received both the Human Factors and Ergonomics Society Leland S. Kollmorgen Spirit of Innovation Award for his contributions to the field of Augmented Cognition, and the Society of United States Naval Flight Surgeons Sonny Carter Memorial Award in recognition of his career improving the health, safety, and welfare of military operational forces. Schmorrow is a Commander in the U.S. Navy and has served at the Office of the Secretary of Defense, the Office of Naval Research, the Defense Advanced Research Projects Agency, the Naval Research Laboratory, the Naval Air Systems Command, and the Naval Postgraduate School. He is the only naval officer to have received the Navy’s Top Scientist and Engineers Award.
THE CONTRIBUTORS
ALI AHMAD is a Lead Researcher at Design Interactive, Inc. He holds a Ph.D. in Industrial Engineering from the University of Central Florida. His research interests include multimodal interaction design, audio interfaces, and advanced application of statistical techniques. Ali is a Certified Simulation Analyst and a Six Sigma Black Belt.
G. VINCENT AMICO, Ph.D., is one of the pioneers of simulation, with over 50 years of involvement in the industry. He is one of the principal agents behind the growth of the simulation industry, both in Central Florida and nationwide. He began his simulation career in 1948 as a project engineer in the flight trainers branch of the Special Devices Center, a facility now known as NAVAIR Orlando. During this time, he made significant contributions to simulation science. He was one of the first to use commercial digital computers for simulation, and in 1966, he chaired the first I/ITSEC Conference, the now well-established annual simulation, training, and education meeting. By the time he retired in 1981, he had held both the Director of Engineering and the Director of Research positions within NAVAIR Orlando. Amico has been the recipient of many professional honors, including the I/ITSEC Lifetime Achievement Award, the Society for Computer Simulation Presidential Award, and an honorary Ph.D. in Modeling and Simulation from the University of Central Florida. The National Center for Simulation (NCS) created “The Vince Amico Scholarship” for deserving high school seniors interested in pursuing study in simulation, and in 2001, in recognition of his unselfish commitment to simulation technology and training, Orlando mayor Glenda Hood designated December 12, 2001, as “Vince Amico Day.”
DEE ANDREWS, Ph.D., is a Senior Scientist with the Human Effectiveness Directorate of the Air Force Research Laboratory. His Ph.D. is in Instructional Systems from Florida State University. His research interests include distributed simulation training, aircrew training, and cyberoperations training.
RICHARD ARNOLD is President of Human Performance Architects, a human factors consulting firm based in Orlando, Florida, specializing in military personnel, training, and safety research. Prior to establishing the company, he served in the U.S. Navy as a designated Aerospace Experimental Psychologist.
ED BACHELDER received his Ph.D. from the Massachusetts Institute of Technology subsequent to flying the SH-60B as a Naval Aviator. His areas of research at Systems Technology, Inc., include (1) augmented reality, (2) optimized control guidance for helicopter autorotation, (3) system identification, and (4) 3-D helicopter cueing for precision hover and nap-of-earth flight.
JOHN BARNETT is a Research Psychologist with the U.S. Army, whose research interests include human-automation interaction, aviation, training, and human performance in extreme environments. He holds a Ph.D. in Applied Experimental and Human Factors Psychology from the University of Central Florida and is a former U.S. Air Force officer.
KATHLEEN BARTLETT, Technical Writer, earned her MA in English at the University of Central Florida (UCF). After teaching for Orange County Public Schools, Bartlett held several instructional and administrative positions at UCF. In addition to writing for UCF’s Institute for Simulation and Training, she teaches at Florida Institute of Technology.
WILLIAM BECKER, Ph.D., is a research faculty member in the MOVES Institute at the Naval Postgraduate School. His specialty is the development of hardware and software to support advanced training for military personnel. He is currently working with the Marine Corps.
HERBERT BELL, Ph.D., is Technical Advisor for the Warfighter Readiness Division, Human Effectiveness Directorate of the Air Force Research Laboratory. His research interests include distributed simulation, training effectiveness, and research methodology. He received a Ph.D. in experimental psychology from Vanderbilt University.
NOAH BRICKMAN graduated from the University of California at Santa Cruz (UCSC) in 1995 with a bachelor’s degree in Computer Science. He has 12 years of experience writing virtual reality, aerospace simulation, AI, and gaming software. He is currently working for Systems Technology, Inc., and pursuing a computer science master’s degree at UCSC.
C. SHAWN BURKE is a Research Scientist at the Institute for Simulation and Training, University of Central Florida. She is currently investigating team adaptability, multicultural team performance, multiteam systems, and the leadership, measurement, and training of such teams. Dr. Burke received her doctorate in Industrial/Organizational Psychology from George Mason University in 2000.
MEREDITH BELL CARROLL is a Senior Research Associate at Design Interactive, Inc. She is currently a Doctoral Candidate in Human Factors and Applied Experimental Psychology at the University of Central Florida. Her research interests include human/team performance and training in complex systems, with a focus on performance measurement and virtual training technology.
ROBERTO CHAMPNEY is a Senior Research Associate at Design Interactive, Inc. He is currently a Doctoral Candidate in Industrial Engineering at the University of Central Florida. His research interests include the design, development, and evaluation of human-interactive systems and emotions in user experience.
NICOLE COEYMAN is a Science and Technology Manager for the Asymmetric Warfare–Virtual Training Technologies (AW-VTT) program at the U.S. Army RDECOM STTC. She received a B.S. degree in Computer Engineering from the University of Central Florida (UCF) and is currently pursuing a master’s degree in Industrial Engineering at UCF.
JOSEPH COHN received his Ph.D. in Neuroscience from Brandeis University’s Ashton Graybiel Spatial Orientation Laboratory and continued his postdoctoral studies with Dr. J. A. Scott Kelso. His research interests focus on maintaining human performance/human effectiveness in real world environments by optimizing the symbiosis of humans and machines.
CAROLINA CRUZ-NEIRA is the Executive Director and Chief Scientist of the Louisiana Immersive Technologies Enterprise (LITE). Her interests are in interdisciplinary applications for immersive Virtual Environments in combination with supercomputing and high speed networking. She was the technical developer of the original CAVE and is the holder of the IEEE VGTC Virtual Reality Technical Achievement Award 2007.
RUDOLPH DARKEN is Professor of Computer Science at the Naval Postgraduate School. He is also the Director of Research for the Center for Homeland Defense and Security and is the former Director of the MOVES Institute for modeling and simulation.
DAVID DORSEY is employed by the National Security Agency. Dr. Dorsey holds a Ph.D. in Industrial-Organizational Psychology and a graduate minor in Computer Science from the University of South Florida. His professional interests include performance measurement, testing and assessment, training and training technologies, and computational modeling.
JAMES DUNNE, CDR, M.D., is chief of trauma and surgical critical care and surgical director of intensive care at the National Naval Medical Center in Bethesda, Maryland. He has completed two postdoctoral research fellowships focused on blood transfusion and its effect on morbidity and mortality in trauma.
CALI FIDOPIASTIS, Ph.D., is the Associate Director for Applied Cognition in the ACTIVE Lab at the Institute for Simulation and Training at the University of Central Florida. Cali studies human brain plasticity in naturalistic environments employing biosensing devices, such as fNIR and EEG, along with biomathematical modeling techniques.
NEAL FINKELSTEIN, Ph.D., is a graduate of Florida Atlantic University with a degree in Electrical Engineering. He also holds a Doctorate in Industrial Engineering from the University of Central Florida. At the STTC (Simulation and Training Technology Center) in Orlando, Florida, he serves as the technical advisor on technical, programmatic, and organizational issues that cut across the organization.
J. D. FLETCHER is a member of the senior research staff at the Institute for Defense Analyses, where he specializes in personnel and human performance issues. His research interests include design and evaluation of education and training using technology, cost-effectiveness analysis, and the development of human performance and expertise.
GEORGE GALANIS is head of the Training and Preparedness group with the Defence Science and Technology Organisation in Australia. His research includes investigating the effectiveness of simulation for individual and collective training, and learning at the organizational level. He holds a Ph.D. in Engineering and Human Factors from the Royal Melbourne Institute of Technology (RMIT).
PAT GARRITY is a principal investigator at U.S. Army Research, Development and Engineering Command (RDECOM), Simulation and Training Technology Center (STTC). He currently works in Dismounted Simulation, conducting R&D in the area of dismounted soldier training and simulation, where he was the Army’s Science and Technology Objective Manager for the Embedded Training for Dismounted Soldiers program.
ROBERT GEHORSAM is President of Forterra Systems, a provider of enterprise virtual world solutions. He has more than 25 years of management experience in the online games and digital media world and has held senior positions at Sony Online Entertainment, Viacom, Scholastic, and Prodigy Services Company.
KEVIN GEISS, Ph.D., is a principal scientist in the Human Effectiveness Directorate of the Air Force Research Laboratory. Dr. Geiss was the Associate Chief Scientist for the Directorate from 2003 to 2006. Dr. Geiss holds a Ph.D. in Zoology and an M.S. in chemistry, both from Miami University, Ohio.
STEPHEN GOLDBERG, Ph.D., is the Chief of the Orlando Research Unit of the U.S. Army Research Institute. He received a doctorate in Cognitive Psychology from the State University of New York at Buffalo. He supervises a research program focused on feedback processes and training in virtual simulations and games.
BRIAN GOLDIEZ holds a Ph.D. in Modeling & Simulation and an M.S. in Computer Engineering from the University of Central Florida (UCF). He is Deputy Director at UCF’s Institute for Simulation and Training and has a joint appointment in Industrial Engineering and Management Systems. He has 30 years of experience in M&S.
STUART GRANT is a Defence Scientist at Defence Research and Development Canada, where he leads the Learning and Training Group. His research addresses interfaces to virtual environments for dismounted combatants, simulation for direct fire training, and distributed simulation. He received a Ph.D. in Cognitive Psychology from the University of Toronto.
GARY GREEN was a Principal Investigator for IST’s Embedded Simulation Technology Lab, which conducts research in support of the U.S. Army Research, Development and Engineering Command (RDECOM) Simulation and Training Technology Center (STTC). He has an M.S. in Operations Research from the U.S. Naval Postgraduate School.
MATT GUIBERT has focused on human/machine interfacing problems at STI. Mr. Guibert was responsible for the creation of the system architecture and software development of STI’s Driver Assessment and Training System (DATS), which serves as a training and feedback module for STI’s commercially available driving simulator, STISIM Drive.
ALFRED HARMS JR., Vice Admiral, U.S. Navy (Ret.), is currently serving on the staff of the University of Central Florida (UCF). Admiral Harms joined UCF following his final active-duty assignment as Commander, Naval Education and Training Command, where he was responsible for accession, professional, and warfare training for all naval personnel.
JOHN HART is the Chief of the Creative Learning Technologies Division and the Learning with Adaptive Simulation and Training Army Technology Objective Manager at U.S. Army RDECOM-STTC. He manages research and development of immersive technologies to create effective learning environments for military training, including the use of game based technologies.
CARL HOBSON is the founder and President of Oasis Advanced Engineering Inc. and has been involved in the simulation industry since 1985. Oasis has led the U.S. Army’s Embedded Training R&D for ground combat systems for the past decade. Prior to forming Oasis, Mr. Hobson managed the General Dynamics Land Systems engineering labs for eight years.
ADAM HOOVER, Ph.D., earned his B.S. (1992) and M.S. (1993) in Computer Engineering and his Ph.D. (1996) in Computer Science and Engineering from the University of South Florida. Adam is currently an Associate Professor in Electrical and Computer Engineering at Clemson University. His research focuses on tracking, embedded systems, and physiological monitoring.
CHARLES HUGHES is Professor and Associate Director of the School of Electrical Engineering and Computer Science at the University of Central Florida. He is also Director of the Media Convergence Laboratory at the Institute for Simulation and Training. His research interests are in mixed reality and interactive computer graphics.
DARIN HUGHES is a research faculty member at the Media Convergence Laboratory within the Institute for Simulation and Training, University of Central Florida. His research interests include sound in simulation, auditory perception, and audio engines.
CHRISTIAN JEROME is a research psychologist at the U.S. Army Research Institute. He received his Ph.D. in applied experimental psychology from the University of Central Florida in 2006. He has performed research on attention, situation awareness, decision making, presence, human performance cognitive modeling, driver distraction, training using virtual/augmented reality, and game based simulation.
DAVID JONES is a Senior Research Associate at Design Interactive, Inc. He received his M.S. degree in Industrial Engineering from the University of Central Florida, where he focused on multimodal design science. At Design Interactive, he has performed training evaluations on a number of government funded training systems.
PHILLIP JONES has over 22 years of professional experience as an Army combat-arms officer. He has extensive army and joint operational and training experience. More recently, Mr. Jones has led a series of studies on training effectiveness in army and joint units.
ROBERT C. KENNEDY is a doctoral candidate in I/O Psychology at the University of Central Florida. His research experience includes NASA Space Adaptation, NSF Cognitive Effects of Stress, and DoD Sensory/Perceptual Performance, training transfer, and criterion development and measurement.
ROBERT S. KENNEDY, Ph.D., has been a Human Factors Psychologist for over 48 years and has conducted projects with numerous agencies, including DoD, NASA, NSF, DOT, and NIH, on training and adaptation, human performance, and motion/VE sickness. He is also an Adjunct Professor at the University of Central Florida.
BRUCE KNERR, Ph.D., is a team leader at the U.S. Army Research Institute, where he conducts research on the use of virtual simulations for soldier training. He received a B.S. in Psychology from The Pennsylvania State University and M.S. and Ph.D. degrees in Engineering Psychology from the University of Maryland.
STEPHANIE LACKEY, Ph.D., is the Deputy Director of the Concept Development and Integration Laboratory at the Naval Air Warfare Center Training Systems Division. Stephanie earned a B.S. in Mathematics from Methodist University and M.S. and Ph.D. degrees from the Industrial Engineering and Management Systems Department, University of Central Florida.
FOTIS LIAROKAPIS, Ph.D., holds a Ph.D. in Computer Engineering (University of Sussex), an MSc in Computer Graphics and Virtual Environments (University of Hull), and a BEng in Computer Systems Engineering (University of Sussex). He is employed by Coventry University as a Senior Lecturer and by the University of Sussex as a Visiting Lecturer.
RODNEY LONG is the program lead for research using Massively Multiplayer Online Games (MMOG) at the SFC Paul Ray Smith Simulation and Training Technology Center. Mr. Long has a breadth of simulation and training experience that spans more than 20 years in the Department of Defense.
TODD MACUDA, Ph.D., is currently Vice President of Business Development and Operations at Gladstone Aerospace Consulting. Dr. Macuda is a graduate of the University of Western Ontario and holds undergraduate and master’s degrees in Psychology and a Ph.D. in Neuroscience. He is an adjunct professor at several universities and a qualified instructor of human factors and related aerospace medicine courses.
HENRY MARSHALL’s 25-plus years with the government have been spent primarily in leading-edge simulation technologies, such as embedded training technology development, semi-automated forces (SAFs), and software acquisition. He received a BSE in Electrical Engineering and an M.S. in Systems Simulation from the University of Central Florida.
THOMAS MASTAGLIO, Ph.D., has had a career that includes 22 years of service as a U.S. Army officer and positions as a Senior Engineer and Program Manager in industry, an independent consultant, a university faculty member, and a business owner. His educational, research, and industry technical background includes computer and cognitive science, simulation design and development, training development and technology, and program development.
MICHELLE MAYO is currently a Science and Technology Manager for the Tactical Digital Holograms project for RDECOM-STTC. Previously the manager of the Combat Trauma Patient Simulation Program, Ms. Mayo has over eight years of experience in military simulation and training programs. She has a B.S. degree in Computer Engineering.
CLAUDIA MCDONALD, Ph.D., leads the Center for Virtual Medical Education at Texas A&M University–Corpus Christi, specializing in research and development of sophisticated learning platforms utilizing virtual-world technologies. McDonald originated Pulse!! The Virtual Clinical Learning Lab, which has received more than $12 million in federal funding since March 2005.
JAMES MCDONOUGH, MAJ, is an artillery officer and a graduate of the Naval Postgraduate School (NPS) in Modeling, Virtual Environments, and Simulations. Upon graduation from NPS he served as the Modeling and Simulation Officer for the Training and Education Command Technology Division at Quantico, Virginia. He is presently assigned to 3d Battalion, 12th Marine Regiment, 3d Marine Division, Okinawa, Japan.
GERALD MERSTEN, Ph.D., is the director of the Technology Division of Marine Corps Training and Education Command at Quantico, Virginia. He has extensive executive experience in the aerospace industry.
LAURA MILHAM received her doctorate from the Applied Experimental and Human Factors Psychology program at the University of Central Florida. At Design Interactive, she is the Training Systems Director and Principal Investigator of numerous projects in support of the development and assessment of the effectiveness of training systems and training management systems.
JEFFREY MOSS is a retired U.S. Army warrant officer and UH-60 Black Hawk instructor pilot with nearly 2,000 hours of flight time as an instructor. After completing his army career, he began work in the civilian sector, participating in joint, PC based simulation exercises and leading PC based simulation systems integration.
PETE MULLER is President of Potomac Training Corporation and the Systems Engineer for ONR’s Human Performance, Training & Education (HPT&E) thrust area. He served the same role for ONR’s Virtual Environments and Technologies (VIRTE) program. He has worked in the aerospace industry in systems engineering and program management for both large and small companies.
ERIC MUTH, Ph.D., earned a B.A. from Hartwick College in 1991. He earned his M.S. and Ph.D. degrees in Psychology from The Pennsylvania State University in 1993 and 1997, respectively. Eric is currently a Professor of Psychology at Clemson University, where his work focuses on performance in high stress/workload environments.
LONG NGUYEN is an Electronics Engineer at the Naval Air Warfare Center Training Systems Division (NAWCTSD). He manages NAWCTSD’s Applied Modeling and Simulation Branch. He holds an M.S. in Electrical Engineering from the University of Central Florida and is pursuing a Ph.D. in Industrial Engineering at the same university.
DENISE NICHOLSON, Ph.D., is the Director of the Applied Cognition and Training in Immersive Virtual Environments Laboratory at the University of Central Florida’s Institute for Simulation and Training (IST). Her additional UCF appointments include the Modeling and Simulation Graduate Program, the Department of Industrial Engineering and Management Systems, and the College of Optics and Photonics/CREOL.
JACK NORFLEET heads the Medical Simulation Technologies group at the Army’s RDECOM-STTC. He has 24 years of experience in military simulation, with experience in medical simulations, live training, and instrumentation systems. He has a BSEE from UCF and an MBA from Webster University. He has also trained as an EMT.
JOHN OWEN serves as Head of the Weapon Systems HSI Branch at the Naval Air Warfare Center Training Systems Division. He holds a B.S. in Electrical Engineering. Mr. Owen previously served as the Training Program Manager for LPD 17 and is the former manager for SEAPRINT.
DANIEL PATTON is the Deputy Director for Surface and Expeditionary Warfare Projects at the Naval Air Warfare Center, Training Systems Division (NAWC-TSD) in Orlando, Florida. He is a retired Naval Officer and holds an M.A. in Instructional Systems Technology.
M. BETH PETTITT is the Division Chief for Soldier Simulation Environments (SSE), Simulation and Training Technology Center, RDECOM. Prior to this position, she was instrumental in establishing STRICOM’s CTPS and Advanced Trauma Patient Simulation (ATPS) DTO programs. Ms. Pettitt has over 19 years of experience in military modeling and simulation.
JAMES PHARMER, Ph.D., works as a Senior Research Psychologist at the Naval Air Warfare Center Training Systems Division (NAWCTSD), applying human systems integration (HSI) principles to navy system acquisition programs. He holds a Ph.D. in Applied Experimental Human Factors Psychology and an M.S. in Engineering Psychology.
WILLIAM PIKE is a Science and Technology Manager at U.S. Army RDECOM-STTC, where he leads research and development efforts in medical simulations to train combat medics. He has also led research efforts on PC game based simulations to determine their most effective use to support military training.
DIRK REINERS is an Assistant Professor at the University of Louisiana, Lafayette. His interests are in interactive 3-D graphics for complex scenes and software systems for real time rendering applications, as well as computer game design and development. He is the project lead for the OpenSG Open Source scenegraph project (http://opensg.vrsource.org/trac).
KATRINA RICCI, Ph.D., is a Senior Research Psychologist with the Naval Air Warfare Center Training Systems Division in Orlando, Florida. Dr. Ricci earned an M.S. in Industrial Organizational Psychology and a Ph.D. in Human Factors Psychology from the University of Central Florida. Dr. Ricci has over 20 years of experience in training, human performance, and HSI.
DAVID ROLSTON is Chairman and CEO of Forterra Systems and has over 35 years of experience in the high technology industry, spanning a broad spectrum of industries, applications, and technologies, including extensive involvement in simulation and training, graphics applications, imagery, gaming, artificial intelligence, entertainment, and early versions of the Internet.
STEVEN RUSSELL is a Research Scientist at Personnel Decisions Research Institutes in Arlington, Virginia. He holds a Ph.D. in Industrial-Organizational Psychology from Bowling Green State University. His professional interests include the design and evaluation of training programs, criterion measurement, and test development and validation, including item response theory (IRT) techniques.
RICHARD SCHAFFER is a Principal Investigator at Lockheed Martin’s Advanced Simulation Center in Burlington, Massachusetts. He has been a simulation technology researcher since joining the DARPA SIMNET team in 1985 and has served as PI or Lead Simulation Integrator for numerous defense modeling and simulation programs. Richard is a Lockheed Martin Fellow.
DYLAN SCHMORROW, Ph.D., is an international leader in advancing virtual environment science and technology for training and education applications. Dr. Schmorrow is a Commander in the U.S. Navy and has served at the Office of the Secretary of Defense, the Office of Naval Research, the Defense Advanced Research Projects Agency, and the Naval Research Laboratory.
TOM SCHNELL is an Associate Professor in Industrial Engineering at the University of Iowa. He is the Director of the Operator Performance Laboratory (OPL). Tom has degrees in electrical engineering (BSEE) and industrial engineering (M.S., Ph.D.). He is a commercial pilot, flight instructor, and helicopter and glider pilot with jet-type ratings.
LEE SCIARINI is a doctoral candidate at the University of Central Florida. His research interests include training system development and effectiveness, human performance, human systems integration, team performance, unmanned systems, neuroergonomics, augmented cognition, and how all of these areas can be leveraged to enhance future systems.
RANDALL SHUMAKER, Ph.D., is the Director of the Institute for Simulation and Training (IST). Previous assignments include Superintendent for Information Technology at the U.S. Naval Research Laboratory and Director of the Navy Center for Applied Research in Artificial Intelligence. His research interests include artificial intelligence, biomorphic computing, and human-agent collaboration.
ALEXANDER SINGER has been a Motion Picture Director for 40 years, directing over 280 TV shows in all forms and genres, five feature films, and a short film for DARPA. Three projects with the NRC led to an award as Lifetime National Associate of the National Academies (of Science).
JUDITH SINGER has published two novels, written a Columbia Pictures feature screenplay and various daytime and prime-time TV screenplays, contracted treatments for TV and feature films, and shared conceptualizing with her husband’s science-driven explorations, including this project. For a decade she has been a professional Film Script Supervisor.
EILEEN SMITH is the Associate Director of the Media Convergence Laboratory and an Instructor in the Digital Media Department at the University of Central Florida. Her research interests are in using mixed reality to enhance learning experiences.
ROGER SMITH, Ph.D., is the Chief Technology Officer for U.S. Army Simulation, Training and Instrumentation. He holds degrees in computer science (Ph.D.), statistics (M.S.), mathematics (B.S.), and management (MBA and M.S.).
ROBERT SOTTILARE is the Deputy Director for the U.S. Army Research, Development and Engineering Command’s Simulation and Training Technology Center in Orlando, Florida. He has an M.S. in Simulation and is currently a Ph.D. candidate in the Modeling & Simulation program at the University of Central Florida.
KAY STANNEY is President of Design Interactive, Inc. She received her Ph.D. in Industrial Engineering from Purdue University, after which she spent 15 years as a professor at the University of Central Florida. She has over 15 years of experience in the design, development, and evaluation of human-interactive systems.
ROY STRIPLING, Ph.D., is the program manager for the Office of Naval Research Human Performance, Training, and Education thrust area. He previously served as the head of the Warfighter Human-Systems Integration Laboratory at the Naval Research Laboratory. Dr. Stripling received his Ph.D. in neuroscience from the University of Illinois.
JOSEPH SULLIVAN, CDR, is a Permanent Military Professor in the Computer Science Department at the Naval Postgraduate School (NPS). Prior to assignment to NPS, CDR Sullivan completed numerous operational tours as a helicopter pilot.
FRED SWITZER III, Ph.D., earned a B.A. from the University of Texas at Austin in 1975. He earned his M.S. and Ph.D. degrees in Industrial/Organizational Psychology from Lamar University in 1982 and the University of Illinois at Urbana-Champaign in 1988, respectively. Fred is currently a Professor of Psychology at Clemson University.
JACK THORPE, Ph.D., is a consultant involved in the definition and planning of advanced technology development projects. His expertise is in Distributed Simulation, and he was the program manager at the Defense Advanced Research Projects Agency who created SIMNET, micro-travel, video arcade trainers, the electronic sand table, and seamless simulation.
JUAN VAQUERIZO has been an industry leader in VST for the past 25 years. He founded two high technology visual simulation companies: Soft Reality and Advanced Simulation Research. He has been at the forefront of the design, development, and delivery of hundreds of training systems, including the U.S. Army’s RDECOM DAGGERS.
DENNIS VINCENZI, Ph.D., is a Research Psychologist for the NAWCTSD in Orlando, Florida. He earned his Ph.D. in Human Factors Psychology from the University of Central Florida in 1998. Dr. Vincenzi is currently the lead researcher on an ONR-funded research project involving Next Generation Helmet-Mounted Display Systems.
DANIEL WALKER, COL, is Chief, Warfighter Readiness Research Division, Air Force Research Laboratory. He holds a B.S. from the USAF Academy and four master’s degrees and is pursuing a Ph.D. He is a Master Navigator with experience in the B-1 and the B-52, including commanding the B-1 Division of the USAF Weapons School.
WILLIAM WALKER is a Visual Systems Engineer with HMD, collimated, dome, and large cylindrical display system experience. He has a BSEE from Auburn University and is a Navy Surface Warfare Officer. Prior work includes C-130H, F-14A & D, and E-2C aviation visual systems. He is the visual engineer on the COVE and VESUB projects.
LORI WALTERS, Ph.D., is joint faculty with the Institute for Simulation and Training and the Department of History at the University of Central Florida. Her research interests are in the use of virtual reality to enhance the story of history and technology in the museum.
TIMOTHY WANSBURY is a Technology Transition Officer at U.S. Army RDECOM-STTC, where he leads efforts in transitioning tools, technologies, and prototypes developed through a variety of research and development projects. He has led research efforts focused on developing a better understanding of how to design, develop, and use PC game based simulations to support military training.
SANDRA WETZEL-SMITH is a Senior Research Psychologist at the Space and Naval Warfare Systems Center in San Diego, California. She also serves as Director, Tactical Systems, at the Naval Mine and ASW Command, San Diego. Her most recent awards include the SSC-SD Lauritzen-Bennett Award for Scientific Excellence and the Federal Computer Week Federal 100.
MICHAEL WHITE is a Certified Modeling and Simulation Professional (CMSP) with Alion Science and Technology and has over 10 years of experience in Modeling and Simulation. Mr. White holds a B.S. in Professional Aeronautics and an MBA/A from Embry-Riddle Aeronautical University and is pursuing a Ph.D. at Old Dominion University.
SUSAN WHITE is an independent consultant in the Washington, D.C., area. Dr. White holds a Ph.D. in Industrial-Organizational Psychology from the University of Maryland. Her professional interests include performance appraisal, training, services marketing and management, and climate and culture.
MARK WIEDERHOLD, Ph.D., is President and Director of the Virtual Reality Medical Center in San Diego. Dr. Wiederhold and his team have been treating patients with VR therapy for the past 12 years. He has served on several advisory, editorial, and technical boards, and he has more than 150 scientific publications.
WALLACE WULFECK is a Senior Research Psychologist at the Space and Naval Warfare Systems Center, where he serves as Co-Principal Investigator and Project Scientist on the Interactive Multisensor Analysis Training (IMAT) project. He previously directed the Instructional Simulations Division and the Training Research Computing Facility and served at the Office of Naval Research.
WILLIAM YATES, LT COL, is an artillery officer and a graduate of the Naval Postgraduate School in Modeling, Virtual Environments, and Simulations. He was the director of the Battle Simulation Center at MAGTF Training Command, Twentynine Palms, prior to being assigned as the M&S officer for the Program Manager for Training Systems.