Lecture Notes in Artificial Intelligence Edited by R. Goebel, J. Siekmann, and W. Wahlster
Subseries of Lecture Notes in Computer Science
5639
Don Harris (Ed.)
Engineering Psychology and Cognitive Ergonomics
8th International Conference, EPCE 2009
Held as Part of HCI International 2009
San Diego, CA, USA, July 19-24, 2009
Proceedings
Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany

Volume Editor
Don Harris
Cranfield University, School of Engineering
Department of Systems Engineering and Human Factors
Cranfield, Bedford MK43 0AL, UK
E-mail: d.harris@cranfield.ac.uk
Library of Congress Control Number: 2009928834
CR Subject Classification (1998): I.2.0, I.2, H.5, H.1.2, H.3, H.4.2, I.6, J.2-3
LNCS Sublibrary: SL 7 – Artificial Intelligence
ISSN 0302-9743
ISBN-10 3-642-02727-X Springer Berlin Heidelberg New York
ISBN-13 978-3-642-02727-7 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.
springer.com
© Springer-Verlag Berlin Heidelberg 2009
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
SPIN: 12706648 06/3180 543210
Foreword
The 13th International Conference on Human–Computer Interaction, HCI International 2009, was held in San Diego, California, USA, July 19–24, 2009, jointly with the Symposium on Human Interface (Japan) 2009, the 8th International Conference on Engineering Psychology and Cognitive Ergonomics, the 5th International Conference on Universal Access in Human-Computer Interaction, the Third International Conference on Virtual and Mixed Reality, the Third International Conference on Internationalization, Design and Global Development, the Third International Conference on Online Communities and Social Computing, the 5th International Conference on Augmented Cognition, the Second International Conference on Digital Human Modeling, and the First International Conference on Human Centered Design.
A total of 4,348 individuals from academia, research institutes, industry and governmental agencies from 73 countries submitted contributions, and 1,397 papers that were judged to be of high scientific quality were included in the program. These papers address the latest research and development efforts and highlight the human aspects of the design and use of computing systems. The papers accepted for presentation thoroughly cover the entire field of human-computer interaction, addressing major advances in knowledge and effective use of computers in a variety of application areas.
This volume, edited by Don Harris, contains papers in the thematic area of Engineering Psychology and Cognitive Ergonomics, addressing the following major topics:

• Cognitive Approaches in HCI Design
• Interaction and Cognition
• Driving Safety and Support
• Aviation and Transport
The remaining volumes of the HCI International 2009 proceedings are:

• Volume 1, LNCS 5610, Human–Computer Interaction––New Trends (Part I), edited by Julie A. Jacko
• Volume 2, LNCS 5611, Human–Computer Interaction––Novel Interaction Methods and Techniques (Part II), edited by Julie A. Jacko
• Volume 3, LNCS 5612, Human–Computer Interaction––Ambient, Ubiquitous and Intelligent Interaction (Part III), edited by Julie A. Jacko
• Volume 4, LNCS 5613, Human–Computer Interaction––Interacting in Various Application Domains (Part IV), edited by Julie A. Jacko
• Volume 5, LNCS 5614, Universal Access in Human–Computer Interaction––Addressing Diversity (Part I), edited by Constantine Stephanidis
• Volume 6, LNCS 5615, Universal Access in Human–Computer Interaction––Intelligent and Ubiquitous Interaction Environments (Part II), edited by Constantine Stephanidis
• Volume 7, LNCS 5616, Universal Access in Human–Computer Interaction––Applications and Services (Part III), edited by Constantine Stephanidis
• Volume 8, LNCS 5617, Human Interface and the Management of Information––Designing Information Environments (Part I), edited by Michael J. Smith and Gavriel Salvendy
• Volume 9, LNCS 5618, Human Interface and the Management of Information––Information and Interaction (Part II), edited by Gavriel Salvendy and Michael J. Smith
• Volume 10, LNCS 5619, Human Centered Design, edited by Masaaki Kurosu
• Volume 11, LNCS 5620, Digital Human Modeling, edited by Vincent G. Duffy
• Volume 12, LNCS 5621, Online Communities and Social Computing, edited by A. Ant Ozok and Panayiotis Zaphiris
• Volume 13, LNCS 5622, Virtual and Mixed Reality, edited by Randall Shumaker
• Volume 14, LNCS 5623, Internationalization, Design and Global Development, edited by Nuray Aykin
• Volume 15, LNCS 5624, Ergonomics and Health Aspects of Work with Computers, edited by Ben-Tzion Karsh
• Volume 16, LNAI 5638, The Foundations of Augmented Cognition: Neuroergonomics and Operational Neuroscience, edited by Dylan Schmorrow, Ivy Estabrooke and Marc Grootjen
I would like to thank the Program Chairs and the members of the Program Boards of all thematic areas, listed below, for their contribution to the highest scientific quality and the overall success of HCI International 2009.
Ergonomics and Health Aspects of Work with Computers Program Chair: Ben-Tzion Karsh Arne Aarås, Norway Pascale Carayon, USA Barbara G.F. Cohen, USA Wolfgang Friesdorf, Germany John Gosbee, USA Martin Helander, Singapore Ed Israelski, USA Waldemar Karwowski, USA Peter Kern, Germany Danuta Koradecka, Poland Kari Lindström, Finland
Holger Luczak, Germany Aura C. Matias, Philippines Kyung (Ken) Park, Korea Michelle M. Robertson, USA Michelle L. Rogers, USA Steven L. Sauter, USA Dominique L. Scapin, France Naomi Swanson, USA Peter Vink, The Netherlands John Wilson, UK Teresa Zayas-Cabán, USA
Human Interface and the Management of Information Program Chair: Michael J. Smith Gunilla Bradley, Sweden Hans-Jörg Bullinger, Germany Alan Chan, Hong Kong Klaus-Peter Fähnrich, Germany Michitaka Hirose, Japan Jhilmil Jain, USA Yasufumi Kume, Japan Mark Lehto, USA Fiona Fui-Hoon Nah, USA Shogo Nishida, Japan Robert Proctor, USA Youngho Rhee, Korea
Anxo Cereijo Roibás, UK Katsunori Shimohara, Japan Dieter Spath, Germany Tsutomu Tabe, Japan Alvaro D. Taveira, USA Kim-Phuong L. Vu, USA Tomio Watanabe, Japan Sakae Yamamoto, Japan Hidekazu Yoshikawa, Japan Li Zheng, P.R. China Bernhard Zimolong, Germany
Human–Computer Interaction Program Chair: Julie A. Jacko Sebastiano Bagnara, Italy Sherry Y. Chen, UK Marvin J. Dainoff, USA Jianming Dong, USA John Eklund, Australia Xiaowen Fang, USA Ayse Gurses, USA Vicki L. Hanson, UK Sheue-Ling Hwang, Taiwan Wonil Hwang, Korea Yong Gu Ji, Korea Steven Landry, USA
Gitte Lindgaard, Canada Chen Ling, USA Yan Liu, USA Chang S. Nam, USA Celestine A. Ntuen, USA Philippe Palanque, France P.L. Patrick Rau, P.R. China Ling Rothrock, USA Guangfeng Song, USA Steffen Staab, Germany Wan Chul Yoon, Korea Wenli Zhu, P.R. China
Engineering Psychology and Cognitive Ergonomics Program Chair: Don Harris Guy A. Boy, USA John Huddlestone, UK Kenji Itoh, Japan Hung-Sying Jing, Taiwan Ron Laughery, USA Wen-Chin Li, Taiwan James T. Luxhøj, USA
Nicolas Marmaras, Greece Sundaram Narayanan, USA Mark A. Neerincx, The Netherlands Jan M. Noyes, UK Kjell Ohlsson, Sweden Axel Schulte, Germany Sarah C. Sharples, UK
Neville A. Stanton, UK Xianghong Sun, P.R. China Andrew Thatcher, South Africa
Matthew J.W. Thomas, Australia Mark Young, UK
Universal Access in Human–Computer Interaction Program Chair: Constantine Stephanidis Julio Abascal, Spain Ray Adams, UK Elisabeth André, Germany Margherita Antona, Greece Chieko Asakawa, Japan Christian Bühler, Germany Noelle Carbonell, France Jerzy Charytonowicz, Poland Pier Luigi Emiliani, Italy Michael Fairhurst, UK Dimitris Grammenos, Greece Andreas Holzinger, Austria Arthur I. Karshmer, USA Simeon Keates, Denmark Georgios Kouroupetroglou, Greece Sri Kurniawan, USA
Patrick M. Langdon, UK Seongil Lee, Korea Zhengjie Liu, P.R. China Klaus Miesenberger, Austria Helen Petrie, UK Michael Pieper, Germany Anthony Savidis, Greece Andrew Sears, USA Christian Stary, Austria Hirotada Ueda, Japan Jean Vanderdonckt, Belgium Gregg C. Vanderheiden, USA Gerhard Weber, Germany Harald Weber, Germany Toshiki Yamaoka, Japan Panayiotis Zaphiris, UK
Virtual and Mixed Reality Program Chair: Randall Shumaker Pat Banerjee, USA Mark Billinghurst, New Zealand Charles E. Hughes, USA David Kaber, USA Hirokazu Kato, Japan Robert S. Kennedy, USA Young J. Kim, Korea Ben Lawson, USA
Gordon M. Mair, UK Miguel A. Otaduy, Switzerland David Pratt, UK Albert “Skip” Rizzo, USA Lawrence Rosenblum, USA Dieter Schmalstieg, Austria Dylan Schmorrow, USA Mark Wiederhold, USA
Internationalization, Design and Global Development Program Chair: Nuray Aykin Michael L. Best, USA Ram Bishu, USA Alan Chan, Hong Kong Andy M. Dearden, UK
Susan M. Dray, USA Vanessa Evers, The Netherlands Paul Fu, USA Emilie Gould, USA
Sung H. Han, Korea Veikko Ikonen, Finland Esin Kiris, USA Masaaki Kurosu, Japan Apala Lahiri Chavan, USA James R. Lewis, USA Ann Light, UK James J.W. Lin, USA Rungtai Lin, Taiwan Zhengjie Liu, P.R. China Aaron Marcus, USA Allen E. Milewski, USA
Elizabeth D. Mynatt, USA Oguzhan Ozcan, Turkey Girish Prabhu, India Kerstin Röse, Germany Eunice Ratna Sari, Indonesia Supriya Singh, Australia Christian Sturm, Spain Adi Tedjasaputra, Singapore Kentaro Toyama, India Alvin W. Yeo, Malaysia Chen Zhao, P.R. China Wei Zhou, P.R. China
Online Communities and Social Computing Program Chairs: A. Ant Ozok, Panayiotis Zaphiris Chadia N. Abras, USA Chee Siang Ang, UK Amy Bruckman, USA Peter Day, UK Fiorella De Cindio, Italy Michael Gurstein, Canada Tom Horan, USA Anita Komlodi, USA Piet A.M. Kommers, The Netherlands Jonathan Lazar, USA Stefanie Lindstaedt, Austria
Gabriele Meiselwitz, USA Hideyuki Nakanishi, Japan Anthony F. Norcio, USA Jennifer Preece, USA Elaine M. Raybourn, USA Douglas Schuler, USA Gilson Schwartz, Brazil Sergei Stafeev, Russia Charalambos Vrasidas, Cyprus Cheng-Yen Wang, Taiwan
Augmented Cognition Program Chair: Dylan D. Schmorrow Andy Bellenkes, USA Andrew Belyavin, UK Joseph Cohn, USA Martha E. Crosby, USA Tjerk de Greef, The Netherlands Blair Dickson, UK Traci Downs, USA Julie Drexler, USA Ivy Estabrooke, USA Cali Fidopiastis, USA Chris Forsythe, USA Wai Tat Fu, USA Henry Girolamo, USA
Marc Grootjen, The Netherlands Taro Kanno, Japan Wilhelm E. Kincses, Germany David Kobus, USA Santosh Mathan, USA Rob Matthews, Australia Dennis McBride, USA Robert McCann, USA Jeff Morrison, USA Eric Muth, USA Mark A. Neerincx, The Netherlands Denise Nicholson, USA Glenn Osga, USA
Dennis Proffitt, USA Leah Reeves, USA Mike Russo, USA Kay Stanney, USA Roy Stripling, USA Mike Swetnam, USA Rob Taylor, UK
Maria L. Thomas, USA Peter-Paul van Maanen, The Netherlands Karl van Orden, USA Roman Vilimek, Germany Glenn Wilson, USA Thorsten Zander, Germany
Digital Human Modeling Program Chair: Vincent G. Duffy Karim Abdel-Malek, USA Thomas J. Armstrong, USA Norm Badler, USA Kathryn Cormican, Ireland Afzal Godil, USA Ravindra Goonetilleke, Hong Kong Anand Gramopadhye, USA Sung H. Han, Korea Lars Hanson, Sweden Pheng Ann Heng, Hong Kong Tianzi Jiang, P.R. China
Kang Li, USA Zhizhong Li, P.R. China Timo J. Määttä, Finland Woojin Park, USA Matthew Parkinson, USA Jim Potvin, Canada Rajesh Subramanian, USA Xuguang Wang, France John F. Wiechel, USA Jingzhou (James) Yang, USA Xiu-gan Yuan, P.R. China
Human Centered Design Program Chair: Masaaki Kurosu Gerhard Fischer, USA Tom Gross, Germany Naotake Hirasawa, Japan Yasuhiro Horibe, Japan Minna Isomursu, Finland Mitsuhiko Karashima, Japan Tadashi Kobayashi, Japan
Kun-Pyo Lee, Korea Loïc Martínez-Normand, Spain Dominique L. Scapin, France Haruhiko Urokohara, Japan Gerrit C. van der Veer, The Netherlands Kazuhiko Yamazaki, Japan
In addition to the members of the Program Boards above, I also wish to thank the following volunteer external reviewers: Gavin Lew from the USA, Daniel Su from the UK, and Ilia Adami, Ioannis Basdekis, Yannis Georgalis, Panagiotis Karampelas, Iosif Klironomos, Alexandros Mourouzis, and Stavroula Ntoa from Greece. This conference would not have been possible without the continuous support and advice of the Conference Scientific Advisor, Prof. Gavriel Salvendy, as well as the dedicated work and outstanding efforts of the Communications Chair and Editor of HCI International News, Abbas Moallem.
I would also like to thank the members of the Human–Computer Interaction Laboratory of ICS-FORTH, and in particular Margherita Antona, George Paparoulis, Maria Pitsoulaki, Stavroula Ntoa, and Maria Bouhli, for their contribution toward the organization of the HCI International 2009 conference.

Constantine Stephanidis
Table of Contents
Part I: Cognitive Approaches in HCI Design Towards Cognitive-Aware Multimodal Presentation: The Modality Effects in High-Load HCI . . . . . . . . . . Yujia Cao, Mariët Theune, and Anton Nijholt Supporting Situation Awareness in Demanding Operating Environments through Wearable User Interfaces . . . . . . . . . . Jari Laarni, Juhani Heinilä, Jukka Häkkinen, Virpi Kalakoski, Kari Kallinen, Kristian Lukander, Paula Löppönen, Tapio Palomäki, Niklas Ravaja, Paula Savioja, and Antti Väätänen Development of a Technique for Predicting the Human Response to an Emergency Situation . . . . . . . . . . Glyn Lawson, Sarah Sharples, David Clarke, and Sue Cobb A Dynamic Task Representation Method for a Virtual Reality Application . . . . . . . . . . Maria Chiara Leva, Alison Margaret Kay, Fabio Mattei, Tom Kontogiannis, Massimiliano De Ambroggi, and Sam Cromie An Investigation of Function Based Design Considering Affordances in Conceptual Design of Mechanical Movement . . . . . . . . . . Ying-Chieh Liu and Su-Ju Lu CWE: Assistance Environment for the Evaluation Operating a Set of Variations of the Cognitive Walkthrough Ergonomic Inspection Method . . . . . . . . . . Thomas Mahatody, Christophe Kolski, and Mouldi Sagar
3
13
22
32
43
52
The Use of Multimodal Representation in Icon Interpretation . . . . . . . . . . Siné McDougall, Alexandra Forsythe, Sarah Isherwood, Agnes Petocz, Irene Reppa, and Catherine Stevens
62
Beyond Emoticons: Combining Affect and Cognition in Icon Design . . . . Siné McDougall, Irene Reppa, Gary Smith, and David Playfoot
71
Agency Attribution in Human-Computer Interaction . . . . . . . . . . . . . . . . . John E. McEneaney
81
Human-UAV Co-operation Based on Artificial Cognition . . . . . . . . . . . . . . Claudia Meitinger and Axel Schulte
91
Development of an Evaluation Method for Office Work Productivity . . . . Kazune Miyagi, Hiroshi Shimoda, Hirotake Ishii, Kenji Enomoto, Mikio Iwakawa, and Masaaki Terano
101
Supporting Cognitive Collage Creation for Pedestrian Navigation . . . . . . Augustinus H.J. Oomes, Miroslav Bojic, and Gideon Bazen
111
Development of a Novel Platform for Greater Situational Awareness in the Urban Military Terrain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stephen D. Prior, Siu-Tsen Shen, Anthony S. White, Siddharth Odedra, Mehmet Karamanoglu, Mehmet Ali Erbil, and Tom Foran The User Knows: Considering the Cognitive Contribution of the User in the Design of Auditory Warnings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Catherine Stevens and Agnes Petocz
120
126
Part II: Interaction and Cognition The Influence of Gender and Age on the Visual Codes Working Memory and the Display Duration – A Case Study of Fencers . . . . . . . . . . . . . . . . . Chih-Lin Chang, Kai-Way Li, Yung-Tsan Jou, Hsu-Chang Pan, and Tai-Yen Hsu Comparison of Mobile Device Navigation Information Display Alternatives from the Cognitive Load Perspective . . . . . . . . . . . . . . . . . . . . Murat Can Cobanoglu, Ahmet Alp Kindiroglu, and Selim Balcisoy
139
149
Visual Complexity: Is That All There Is? . . . . . . . . . . . . . . . . . . . . . . . . . . . Alexandra Forsythe
158
Operational Decision Making in Aluminium Smelters . . . . . . . . . . . . . . . . . Yashuang Gao, Mark P. Taylor, John J.J. Chen, and Michael J. Hautus
167
Designers of Different Cognitive Styles Editing E-Learning Materials Studied by Monitoring Physiological and Other Data Simultaneously . . . . . . . . . . Károly Hercegfi, Olga Csillik, Éva Bodnár, Judit Sass, and Lajos Izsó Analyzing Control-Display Movement Compatibility: A Neuroimaging Study . . . . . . . . . . S.M. Hadi Hosseini, Maryam Rostami, Makoto Takahashi, Naoki Miura, Motoaki Sugiura, and Ryuta Kawashima Graphics and Semantics: The Relationship between What Is Seen and What Is Meant in Icon Design . . . . . . . . . . Sarah Isherwood
179
187
197
The Effect of Object Features on Multiple Object Tracking and Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tianwei Liu, Wenfeng Chen, Yuming Xuan, and Xiaolan Fu Organizing Smart Networks and Humans into Augmented Teams . . . . . . Martijn Neef, Martin van Rijn, Danielle Keus, and Jan-Willem Marck Quantitative Evaluation of Mental Workload by Using Model of Involuntary Eye Movement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Goro Obinata, Satoru Tokuda, Katsuyuki Fukuda, and Hiroto Hamada Spatial Tasks on a Large, High-Resolution Tiled Display: Females Mentally Rotate Large Objects Faster Than Men . . . . . . . . . . . . . . . . . . . . Bernt Ivar Olsen, Bruno Laeng, Kari-Ann Kristiansen, and Gunnar Hartvigsen Neurocognitive Workload Assessment Using the Virtual Reality Cognitive Performance Assessment Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Thomas D. Parsons, Louise Cosand, Christopher Courtney, Arvind Iyer, and Albert A. Rizzo
206 213
223
233
243
Sensing Directionality in Tangential Haptic Stimulation . . . . . . . . . . . . . . . Greg Placencia, Mansour Rahimi, and Behrokh Khoshnevis
253
Effects of Design Elements in Magazine Advertisements . . . . . . . . . . . . . . Young Sam Ryu, Taewon Suh, and Sean Dozier
262
The Influence of Shared-Representation on Shared Mental Models in Virtual Teams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rose Saikayasit and Sarah Sharples
269
Harnessing the Power of Multiple Tools to Predict and Mitigate Mental Overload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Charneta Samms, David Jones, Kelly Hale, and Diane Mitchell
279
Acceptance of E-Invoicing in SMEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Karl W. Sandberg, Olof Wahlberg, and Yan Pan Mental Models in Process Visualization - Could They Indicate the Effectiveness of an Operator’s Training? . . . . . . . . . . . . . . . . . . . . . . . . . . . . Karin Schweizer, Denise Gramß, Susi M¨ uhlhausen, and Birgit Vogel-Heuser Effects of Report Order on Identification on Multidimensional Stimulus: Color and Shape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . I-Hsuan Shen and Kong-King Shieh
289
297
307
Confidence Bias in Situation Awareness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ketut Sulistyawati and Yoon Ping Chui
317
Tactical Reconnaissance Using Groups of Partly Autonomous UGVs . . . . Peter Svenmarck, Dennis Andersson, Björn Lindahl, Johan Hedström, and Patrik Lif
326
Part III: Driving Safety and Support Use of High-Fidelity Simulation to Evaluate Driver Performance with Vehicle Automation Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Timothy Brown, Jane Moeckli, and Dawn Marshall
339
Applying the “Team Player” Approach on Car Design . . . . . . . . . . . . . . . . Staffan Davidsson and Håkan Alm
349
New HMI Concept for Motorcycles–The Saferider Approach . . . . . . . . . . . J.P. Frederik Diederichs, Marco Fontana, Giacomo Bencini, Stella Nikolaou, Roberto Montanari, Andrea Spadoni, Harald Widlroither, and Niccolò Baldanzini
358
Night Vision - Reduced Driver Distraction, Improved Safety and Satisfaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Klaus Fuchs, Bettina Abendroth, and Ralph Bruder Measurement of Depth Attention of Driver in Frontal Scene . . . . . . . . . . . Mamiko Fukuoka, Shun’ichi Doi, Takahiko Kimura, and Toshiaki Miura Understanding the Opinion Forming Processes of Experts and Customers During Evaluations of Automotive Sounds . . . . . . . . . . . . . . . . Louise Humphreys, Sebastiano Giudice, Paul Jennings, Rebecca Cain, Garry Dunne, and Mark Allman-Ward HR Changes in Driving Scenes with Danger and Difficulties Using Driving Simulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yukiyo Kuriyagawa, Mieko Ohsuga, and Ichiro Kageyama Driver Measurement: Methods and Applications . . . . . . . . . . . . . . . . . . . . . Shane McLaughlin, Jonathan Hankey, and Thomas Dingus The Assessment of Driver’s Arousal States from the Classification of Eye-Blink Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yoshihiro Noguchi, Keiji Shimada, Mieko Ohsuga, Yoshiyuki Kamakura, and Yumiko Inoue Guiding a Driver’s Visual Attention Using Graphical and Auditory Animations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tony Poitschke, Florian Laquai, and Gerhard Rigoll
367 376
386
396 404
414
424
Fundamental Study for Relationship between Cognitive Task and Brain Activity During Car Driving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shunji Shimizu, Nobuhide Hirai, Fumikazu Miwakeichi, Senichiro Kikuchi, Yasuhito Yoshizawa, Masanao Sato, Hiroshi Murata, Eiju Watanabe, and Satoshi Kato A Study on a Method to Call Drivers’ Attention to Hazard . . . . . . . . . . . . Hiroshi Takahashi
434
441
An Analysis of Saccadic Eye Movements and Facial Images for Assessing Vigilance Levels During Simulated Driving . . . . . . . . . . . . . . . . . Akinori Ueno, Shoyo Tei, Tomohide Nonomura, and Yuichi Inoue
451
Implementing Human Factors within the Design Process of Advanced Driver Assistance Systems (ADAS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Boris van Waterschoot and Mascha van der Voort
461
A Survey Study of Chinese Drivers’ Inconsistent Risk Perception . . . . . . . Pei Wang, Pei-Luen Patrick Rau, and Gavriel Salvendy
471
Design for Smart Driving: A Tale of Two Interfaces . . . . . . . . . . . . . . . . . . Mark S. Young, Stewart A. Birrell, and Neville A. Stanton
477
Part IV: Aviation and Transport Supervision of Autonomous Vehicles: Mutual Modeling and Interaction Management . . . . . . . . . . Gilles Coppin, François Legras, and Sylvie Saget Conflicts in Human Operator – Unmanned Vehicles Interactions . . . . . . . Frédéric Dehais, Stephane Mercier, and Catherine Tessier
489 498
508
Behavior Model Based Recognition of Critical Pilot Workload as Trigger for Cognitive Operator Assistance . . . . . . . . . . . . . . . . . . . . . . . . . . . Diana Donath and Axel Schulte
518
A Design and Training Agenda for the Next Generation of Commercial Aircraft Flight Deck . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Don Harris
529
Future Ability Requirements for Human Operators in Aviation . . . . . . . . Catrin Hasse, Carmen Bruder, Dietrich Grasshoff, and Hinnerk Eißfeldt
537
The Application of Human Error Template (HET) for Redesigning Standard Operational Procedures in Aviation Operations . . . . . . . . . . . . . Wen-Chin Li, Don Harris, Yueh-Ling Hsu, and Lon-Wen Li
547
Effect of Aircraft Datablock Complexity and Exposure Time on Performance of Change Detection Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chen Ling and Lesheng Hua
554
A Regulatory-Based Approach to Safety Analysis of Unmanned Aircraft Systems . . . . . . . . . . . . . . . . . . . . . . . . . . James T. Luxhøj and Ahmet Öztekin
564
Using Acoustic Sensor Technologies to Create a More Terrain Capable Unmanned Ground Vehicle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Siddharth Odedra, Stephen D. Prior, Mehmet Karamanoglu, Mehmet Ali Erbil, and Siu-Tsen Shen
574
Critical Interaction Analysis in the Flight Deck . . . . . . . . . . . . . . . . . . . . . . Chiara Santamaria Maurizio, Patrizia Marti, and Simone Pozzi
580
Understanding the Impact of Rail Automation . . . . . . . . . . . . . . . . . . . . . . . Sarah Sharples, Nora Balfe, David Golightly, and Laura Millen
590
Cognitive Workload as a Predictor of Student Pilot Performance . . . . . . . Nathan F. Tilton and Ronald Mellado Miller
600
Direct Perception Displays for Military Radar-Based Air Surveillance . . . Oliver Witt, Morten Grandt, and Heinz Küttelwesch
606
A Selection of Human Factors Tools: Measuring HCI Aspects of Flight Deck Technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rolf Zon and Henk van Dijk
616
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
625
Towards Cognitive-Aware Multimodal Presentation: The Modality Effects in High-Load HCI Yujia Cao, Mariët Theune, and Anton Nijholt Human Media Interaction Group, University of Twente, P.O. Box 217, 7500AE, Enschede, The Netherlands {y.cao,m.theune,a.nijholt}@utwente.nl
Abstract. In this study, we argue that multimodal presentations should be created in a cognitive-aware manner, especially in a high-load HCI situation where the user task challenges the full capacity of the human cognition. An experiment was conducted to investigate the cognitive effects of modalities, using a high-load task. The performance measurements and subjective reports consistently confirm a significant modality impact on cognitive workload, stress and performance. A relation between modality usage and physiological states was not found, due to the insufficient sensitivity and individual differences of the physiological measurements. The findings of this experiment can be well explained by several modality-related cognitive theories. We further integrate these theories into a suitability prediction model, which can systematically predict how suitable a certain modality usage is for this presentation task. The model demonstrates a possible approach towards cognitive-aware modality planning and can be modified for other applications.

Keywords: Cognitive-aware, multimodal presentation, modality planning, cognitive load, stress, performance, high-load HCI.
1 Introduction

Advanced human-computer interactions are often accomplished through multiple modalities, such as text, images, speech, and sound. Modality planning in HCI is often accomplished in a context-aware manner, i.e. the modalities to be used are selected according to communication goals, user profiles, environmental conditions and resource limitations [1]. However, as multimodal presentations are created for human users to perceive, process and act upon, computer systems should understand not only how to convey information, but also how human minds take in and process the information. When taking into account the modality-related knowledge of human cognition, multimodal presentations could be created in a cognitive-aware manner, so they can be more efficiently perceived and processed. We believe that the cognitive aspects of modality planning are particularly essential in a high-load HCI situation, where the interaction challenges the full capacity of the human cognition. A huge body of psychology studies provides modality-related cognitive theories and principles that are potentially useful for cognitive-aware modality planning.
According to Baddeley’s working memory model [4], the working memory has separated stores (perception channels) for visual information and auditory information, and each store has a limited capacity. Therefore, the capacity of working memory can be better used when both channels are used to perceive information. This theory is known as the dual-channel theory. Another modality-related finding of Baddeley is that the working memory relies on sub-vocal speech to rehearse information and maintain memory traces [3]. Furthermore, the dual-coding theory of Paivio [13] states that verbal and nonverbal information are represented and processed in separated mental systems. These two systems are interconnected through dynamic associative processes. Studies on multimedia learning have demonstrated that the associative processes between the verbal and nonverbal mental systems play a major role in knowledge comprehension and long-term memorization [8, 12]. In this study, we attempt to apply these modality-related cognitive theories as a foundation of cognitive-aware multimodal presentation. An experiment was conducted to investigate the cognitive effects of modalities, using a high-load HCI task. The results showed a significant modality impact on performance, cognitive workload and stress. Based on the experimental findings, we integrate the relevant theories into a prediction model that can systematically compare the suitability of different modality usages for this specific task.
2 Experiment

We created an earthquake rescue scenario, where the locations of wounded and dead people are continuously reported to the crisis response center and displayed on a computer screen. Based on these reports, a crisis manager directs a doctor to reach all wounded people and save their lives. In this experiment, the subject plays the role of the crisis manager and his/her task is to save as many wounded victims as possible.

2.1 Presentation Material

For each victim report, two types of information can be provided: basic information and additional aid. The basic information includes the type of the victim (wounded or dead) and its location. The additional aid reduces the searching area by indicating which half of the screen (left or right) contains this victim. To convey these two types of information, we selected four modalities based on their visual/auditory and verbal/nonverbal properties: text (visual, verbal), image (visual, nonverbal), speech (auditory, verbal) and sound (auditory, nonverbal). The basic information can be efficiently conveyed by locating a visual object on a map. We use text or image to present a victim (see Fig. 1), and the cell it occupies on a grid-based map indicates the location of the victim (see Fig. 2). The additional aid can be presented by text (‘left’ or ‘right’), image (a left arrow or a right arrow), speech (‘left’ or ‘right’) or sound (an ambulance sound coming from the left or the right speaker). Previous studies suggest that the categorization and understanding of concrete objects are slower when they are presented by text than by image [2, 7, 14]. Therefore, in order to better observe the benefit of the additional aid, text is used to present the basic information if additional aids are given. Finally, five experimental conditions were selected (see Table 1).
Table 1. Five experimental presentation conditions

Index  Basic Information  Additional aid  Modality properties
1      Text               None            Visual, verbal
2      Image              None            Visual, nonverbal
3      Text               Image           Visual + visual, verbal + nonverbal
4      Text               Speech          Visual + auditory, verbal + verbal
5      Text               Sound           Visual + auditory, verbal + nonverbal

Fig. 1. Text and image presentations of the victim type

Fig. 2. Sample of the grid-based map (partial)
2.2 Task

The subject played the role of the crisis manager. The task was to send the doctor to each patient by mouse-clicking on the presentation (text or image). A new patient appeared at a random interval between 2 and 5 seconds, usually at the same time as one or more dead victims. A patient had a life time of 10 seconds and would turn into a dead victim without timely treatment. A number above the presentation of a patient indicated his remaining life time. When timely treated, patients disappeared from the screen. In each trial, 100 patients were presented in about 5 minutes. The difficulty of the task could be regulated by the number of distracters (dead victims). At the beginning of a trial, there was no object on the grid map and the task was relatively easy. As the number of dead victims grew, it became more and more difficult to identify a patient in the crowded surroundings. The task difficulty reached the maximum (about 40% of the cells contained objects) after about 150 seconds and remained unchanged for the rest of the trial (see Fig. 3).
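The trial mechanics above are easy to express as a small simulation. The following Python sketch is ours, not part of the original experimental software; the reaction-time model passed in as react is purely an assumption for illustration.

import random

def simulate_trial(react, n_patients=100, life_time=10.0):
    # Count the patients lost in one trial: a patient dies when the
    # simulated reaction time exceeds its 10-second life time.
    return sum(1 for _ in range(n_patients) if react() > life_time)

# Patients appear at random 2-5 s intervals, so 100 patients span ~5 min:
trial_duration = sum(random.uniform(2.0, 5.0) for _ in range(100))

# A crude response model (assumed) for illustration:
losses = simulate_trial(lambda: max(0.5, random.gauss(3.0, 3.5)))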
2.3 Measurements

Three categories of measurements were applied, namely performance, subjective and physiological measurements. The performance in each trial was evaluated by three variables: 1) the average reaction time to click on a patient (in seconds), 2) the total number of patients that were not treated within 10 seconds and 3) the time stamp when the first patient died in a trial (in seconds). The NASA Task Load Index [9] was used to obtain subjective reports on cognitive workload and stress. A 20-level rating, from very low (1) to very high (20), was performed on cognitive workload and stress, respectively. In order to further assess cognitive workload from the physiological states, we recorded the electrocardiograms, galvanic skin conductance and respiration during the experiment. Scientific literature suggests that when the cognitive demand increases, the heart rate increases, the heart rate variability decreases, the skin conductivity increases and the respiration rate increases [5, 10, 15, 17].

2.4 Subjects and Procedure

20 university students (bachelor, master or Ph.D.) volunteered to participate in this experiment. After entering the lab, the experimenter first applied the physiological sensors to the participant, while he/she was listening to soothing music. When the sensors were set, an additional resting period of 5 minutes was given and then the baseline physiological state was recorded for 5 minutes. Afterwards, the participant received an introduction to the experiment and performed a short training session in order to get familiar with the task and presentation conditions. Finally, the participant performed the five experimental trials in a counterbalanced order. A 5-minute break was placed between successive trials. The subjective ratings on cognitive load and stress were conducted during the breaks. The whole experimental procedure lasted for about 80 minutes.
3 Results

Due to our experimental design, we applied repeated-measure ANOVAs on the dependent measurements, with modality as a within-subject factor. Results from the three categories of measurements are presented in this section.

3.1 Performance Measurements

ANOVA results showed a significant modality effect on all three performance measurements. First, modality had an effect on the average reaction time, F (2.87, 54.51) = 12.76, p < 0.001. Subjects reacted the fastest in the ‘text + speech aid’ condition, spending 1.95 seconds on average to rescue a patient. The average reaction time was the longest (3.05 seconds) in the ‘text + no aid’ condition. Second, the modality factor also had an effect on the number of dead patients, F (2.36, 44.84) = 16.81, p < 0.001. The fewest patients died in the ‘text + speech aid’ condition (3 on average), and the most died in the ‘text + no aid’ condition (11 on average). Third, modality had an effect on the time stamp of the first dead patient in a trial, F (4, 76) = 17.71, p < 0.001. The first dead patient occurred the earliest in the ‘text + no aid’ condition (at the 73rd second on average) and the latest in the ‘text + speech aid’ condition (at the 221st second on average). As mentioned above, the task became more and more difficult as the number of distracters (dead victims) increased. Therefore, the time difference of the first dead patient indicates that, due to the different modality usages, the performance had different levels of tolerance against the increase of the task difficulty (see Fig. 3). In general, the ‘text + no aid’ and ‘text + image aid’ conditions form a low performance group, while the other three conditions form a high performance group. Pair-wise comparisons (post-hoc tests) show significant differences between all pairs of conditions that are taken from different groups (see [6] for more details).
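For readers who wish to reproduce this style of analysis, a repeated-measures ANOVA can be run on long-format data as sketched below. This is our illustration, not the authors' analysis script; the file and column names (subject, condition, rt) are assumed.

import pandas as pd
from statsmodels.stats.anova import AnovaRM

# One mean reaction time per subject and modality condition (long format).
df = pd.read_csv('mean_rt.csv')  # columns: subject, condition, rt (assumed)

# Modality condition as a within-subject factor, as in the paper.
aov = AnovaRM(data=df, depvar='rt', subject='subject',
              within=['condition']).fit()
print(aov)  # F-statistic and p-value for the modality effect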
Fig. 3. The time stamps of the first dead patient
Fig. 4. Subjective rating on cognitive workload and stress
Moreover, results of these three measurements are strongly positively correlated, indicating that when modalities are more properly used, subjects react faster, fewer patients die and a good performance holds longer.

3.2 Subjective Measurements

The average subjective ratings on cognitive workload and stress are shown in Fig. 4. ANOVA results show a significant modality effect on both the subjective cognitive workload (F (4, 76) = 16.91, p < 0.001) and the subjective stress (F (4, 76) = 9.379, p < 0.001). There is a strong positive correlation between these two measurements (Cor. = 0.855), suggesting that subjects feel more stressed when they devote more cognitive effort to the task. The experienced stress and cognitive workload are the highest in the ‘text + no aid’ condition and the lowest in the ‘text + speech aid’ condition. Furthermore, the subjective measurements are also positively correlated with the performance measurements, indicating that when the task is more difficult due to an improper usage of modalities, subjects devote more cognitive effort, feel more stressed and perform worse. The subjective measurements also show a two-group pattern. Subjective stress and cognitive workload are significantly higher in the ‘text + no aid’ and the ‘text + image aid’ conditions than in the other three conditions.

3.3 Physiological Measurements

Table 2 shows the seven physiological measurements that are calculated for each trial (based on [11, 16]). In order to eliminate the individual differences in physiological activities, the measurements are normalized within each subject, using the baseline values. For example, the average HP value of trial n from subject 1 (s1) is normalized as follows:

HP_trial_n_s1_norm = (HP_trial_n_s1 − HP_baseline_s1) / (max{HP_trial_k_s1 | k = 1, 2, …, 5} − HP_baseline_s1),   n = 1, 2, …, 5
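In code, this per-subject normalization amounts to a few lines (a minimal sketch; the function and variable names are ours):

def normalize_trials(trial_means, baseline):
    # Scale one subject's per-trial means so that the baseline maps to 0
    # and the largest trial mean maps to 1.
    peak = max(trial_means)
    return [(m - baseline) / (peak - baseline) for m in trial_means]

# e.g. five trial means of heart period (HP) for one subject (made-up values):
hp_norm = normalize_trials([812, 798, 805, 820, 790], baseline=780)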
Table 2. Physiological measurements

Category                Measurement  Description
Heart period            HP           The time interval between two successive heart beats
Heart rate variability  LF           The total spectrum power of a heart period series at band 0.07~0.14Hz (HRV analysis at the frequency domain)
                        RMSSD        The square root of the mean squared differences of successive heart periods (HRV analysis at the time domain)
                        NN50         The number of internal differences of successive heart periods that are greater than 50ms (HRV at time domain)
Skin conductivity       GSRN         The number of event-related skin conductance responses
                        GSL          The tonic level of skin conductivity
Respiration             RP           The time interval between two successive respiration peaks
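To make the time-domain measures concrete, here is an illustrative computation (our sketch, assuming the input is a list of inter-beat intervals in milliseconds):

import math

def hrv_time_domain(ibi_ms):
    # Time-domain measures from a series of inter-beat intervals (ms).
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    hp = sum(ibi_ms) / len(ibi_ms)                      # mean heart period
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    nn50 = sum(1 for d in diffs if abs(d) > 50)         # diffs > 50 ms
    return hp, rmssd, nn50

hp, rmssd, nn50 = hrv_time_domain([812, 798, 805, 820, 790, 845])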
Using repeated-measure ANOVAs, a modality effect was only found on the LF measurement at the 90% confidence level, F (4, 76) = 2.52, p = 0.07. This result indicates that the variance in the task difficulty (due to different modality conditions) did not have a significant impact on the subjects’ physiological states.

3.4 Discussion

The modality effects on cognitive workload, stress and performance have been consistently confirmed by the performance and the subjective measurements. First, image is a better modality than text to present the basic information. In line with the literature, our results suggest that when presenting concrete objects, the understanding and categorization of images is easier and faster than that of text. Second, the two ‘visual + auditory’ modality combinations (see Table 1) significantly outperform the ‘visual + visual’ combination. Since the basic information already imposes a high load on the visual perception channel, an additional visual aid can only further split up the attention and cause distractions, instead of actually aiding the performance. On the other hand, an auditory aid can be of real help, because it effectively provides extra information without imposing any extra load on the visual channel. This finding is in line with the dual-channel theory of Baddeley [4]. Third, the verbal additional aids significantly outperform the nonverbal additional aids. In this task, it often happens that new patients arrive when one or more earlier presented patients are still un-rescued. When additional aids are given, subjects try to maintain a queue of ‘left’s and ‘right’s in their minds while searching for the earlier presented patients, because their lifetimes are decreasing as they wait for treatment. Since the working memory relies on verbal codes to maintain memory traces, verbal aids can be directly rehearsed and maintained. However, nonverbal aids require an extra translation through the associative processes between the two mental systems, resulting in a higher cognitive demand. This finding is consistent with the working memory theory of Baddeley and the dual-coding theory of Paivio [13]. Finally, the ‘text + speech aid’ condition, as a ‘visual + auditory’ combination with verbal aids, proved to be the best modality condition among the five. Although ‘text + no aid’ is the worst condition, when the additional aid is presented by a proper modality, the combination significantly improves the performance and reduces the cognitive load and stress.
Table 3. Individual differences in the sensitivity of physiological measurements

Subject index  No. of good p.p.  No. of bad p.p.  Significant t-test at 95% cl.  Significant t-test at 90% cl.
1              55                55               none                           none
2              27                27               GSL                            HP
3              37                37               RMSSD                          NN50
4              39                39               HP                             GSL
5              43                43               RP                             HP
A modality effect on physiological states was not found. Further analyses provide two explanations for this result. First, for each measurement, we applied a t-test, comparing the mean values of the 5 experimental trials to the baseline values. Significant differences were found in all measurements except GSRN. For HP, LF, GSL and RP, the differences were in the expected direction for all 20 subjects. This shows that the physiological measurements did pick up the major changes in cognitive demand between the baseline and the 5 experimental conditions. However, it seems that they were not sensitive enough to reflect the relatively small variances in the cognitive demand between the 5 experimental trials. Second, the level of sensitivity of the physiological measurements might be different for each subject, i.e. a measurement might be sensitive for some subjects, but not all. If so, statistical analyses of the data from 20 subjects would not reveal robust patterns. To test this explanation, we selected a number of good and bad performance periods from the performance data of each of 5 randomly selected subjects. Bad performance periods were taken by selecting a 10-second window centered at all time stamps when a patient died. The same number of good performance periods (also 10 seconds long) was taken from the beginning of the five trials when the task was relatively easy. We assume that the cognitive load level was higher in the bad performance periods than in the good performance periods. Six measurements (all except LF, which is not applicable with a 10-second window: about 300 data points, roughly 5 minutes, are normally required to resolve frequencies down to the 0.01 Hz level [16]) were re-calculated in each period. Then, t-tests were conducted between these two conditions, for each measurement respectively. The results (see Table 3) indeed show individual differences in the sensitivity of physiological measurements. For example, the heart rate measurement (HP) is sensitive for subjects 2, 4 and 5. Heart rate variability (RMSSD, NN50) is sensitive only for subject 3. Skin conductivity (GSL) is sensitive for subjects 2 and 4. Respiration (RP) is sensitive only for subject 5. This finding indicates that the physiological assessment of cognitive workload should take the individual differences between subjects into account, especially when the variances to be detected are relatively minor.
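The per-subject sensitivity check is a plain two-sample t-test over the windowed values. The sketch below is ours and assumes the windowed measurement values have already been extracted into two lists per subject:

from scipy import stats

def sensitive(good_values, bad_values, alpha=0.10):
    # Does this physiological measure separate good from bad 10-second
    # performance periods for a single subject?
    t, p = stats.ttest_ind(good_values, bad_values)
    return t, p, p < alpha  # significance at the 90% confidence level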
4 A Suitability Prediction Model

The modality-related cognitive theories are in line with our experimental findings, suggesting that they could be applied as a theoretical foundation for cognitive-aware modality planning. In this section, we demonstrate a possible way of integrating these
cognitive theories into a model that can systematically predict the suitability of a certain modality usage for our presentation task. In this task, a modality usage is considered more suitable if processing the information presented with it imposes a lower cognitive load on the users. Based on the same set of modalities as used in our experiment, the model is able to predict the suitability of all modality combinations, including those that were not investigated in the experiment. A linear utility function is constructed that takes modalities as inputs and outputs a value describing the suitability level. The higher the output value is, the more suitable the input modality usage is. The function contains three attributes: 1) the representative property of the modality that presents the basic information (B). In our scenario, auditory modalities (speech and sound) cannot be used to present the locations of objects (see [6] for details). Therefore, possible options are text and image. Based on the literature and our experimental findings, image is more suitable than text, thus a 2 is assigned to image and a 1 to text; 2) the perception property of the modality that presents the additional aids (P). Possible options are visual, auditory and none. Based on the dual-channel theory, when a visual modality is used for B, a visual aid causes distraction and harms the performance, while an auditory aid assists the performance. Therefore, a -1 is assigned to the visual modalities and a 1 to the auditory modalities; 3) the mental system the assisting modality belongs to (M). Possible options are verbal, nonverbal and none. According to the working memory theory and the dual-coding theory, verbal aids are more beneficial than nonverbal aids, thus a 2 is assigned to verbal modalities and a 1 to nonverbal modalities. Furthermore, a weight (f) is assigned to each attribute, determining how much the attribute contributes to the final suitability score. The three weights sum to 1. Finally, the suitability prediction model is as follows:

Suitability = f_B × B + f_P × P + f_M × M
The choice of weights can be influenced by the task condition. For example, the factor M is less important in a low-load condition than in a high-load condition, because there is less or no pending information to be maintained. The factor P can be very important in an extremely low-load condition (e.g. one patient per hour), because the visual vigilance could be low but the auditory signals have an alerting function. For a high-load situation as in our experiment, the weights are set to 0.5, 0.3 and 0.2 for B, P and M, respectively. The suitability predictions for 10 possible modality usages are shown in Table 4. The outcomes for the five experimental conditions are consistent with the performance results and the subjective rating, indicating the validity of this model. The ‘image + speech aid’ combination is predicted to be the best modality usage for this presentation task in a high-load condition. This suitability prediction model demonstrates the possibility to quantitatively evaluate the cognitive effects of modalities and systematically select the best modality usage for a specific presentation task. To generalize to other applications, the following aspects need to be considered: 1) the output: how to define suitability based on the presentation goal; 2) the attributes: what criteria to use to predict suitability based on related theories; 3) the weights: how important is each attribute based on task conditions. Moreover, when contextual aspects need to be taken into account for modality planning, they can be either treated as separate attributes or as conditional switches that selectively determine attribute values or weights.
Table 4. Predicted suitability of 10 possible modality usages

Index  Modality for basic info.  Modality for additional aid  B (0.5)  P (0.3)  M (0.2)  Suitability score
1*     text                      none                          1        0        0       0.5
2      text                      text                          1       -1        2       0.6
3*     text                      image                         1       -1        1       0.4
4*     text                      speech                        1        1        2       1.2
5*     text                      sound                         1        1        1       1.0
6*     image                     none                          2        0        0       1.0
7      image                     text                          2       -1        2       1.1
8      image                     image                         2       -1        1       0.9
9      image                     speech                        2        1        2       1.7
10     image                     sound                         2        1        1       1.5
*: experimental conditions
For instance, the B value is currently set to 2 for image and 1 for text. However, for the relatively small group of text-oriented users (1 subject out of 20 in our experiment), the B value should be 1 for image and 2 for text, assuming the user preferences are available.
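The linear model is small enough to implement directly. The snippet below is our reconstruction of the scoring behind Table 4 (the attribute values and weights come from the paper; the names and data structures are ours):

# Attribute scores as defined above:
B = {'text': 1, 'image': 2}                   # basic-information modality
P = {'none': 0, 'visual': -1, 'auditory': 1}  # perception channel of the aid
M = {'none': 0, 'verbal': 2, 'nonverbal': 1}  # mental system of the aid

CHANNEL = {'none': 'none', 'text': 'visual', 'image': 'visual',
           'speech': 'auditory', 'sound': 'auditory'}
SYSTEM = {'none': 'none', 'text': 'verbal', 'speech': 'verbal',
          'image': 'nonverbal', 'sound': 'nonverbal'}

def suitability(basic, aid, w=(0.5, 0.3, 0.2)):
    # Linear utility: higher scores indicate a more suitable modality usage.
    return w[0] * B[basic] + w[1] * P[CHANNEL[aid]] + w[2] * M[SYSTEM[aid]]

print(suitability('image', 'speech'))  # 1.7, the top score in Table 4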
5 Conclusion and Future Work

In this study, we conducted an experiment to investigate the cognitive effects of modality, using a high-load HCI task. The performance and subjective measurements consistently show a significant modality effect on cognitive workload, stress and performance, indicating the necessity of conducting modality planning in a cognitive-aware manner. When presenting information for a high-load perception task, such as the one in our experiment, the most suitable modality combination is the one that distributes the information load into two perception channels and provides verbal aids to assist the short-term maintenance of the pending tasks. Although no relation was found between modality usage and subjects’ physiological states, the analyses of physiological data brought an important implication for the physiological assessment of cognitive load: the individual differences in the sensitivity of physiological features must be taken into account, especially when the variances to be detected are relatively minor. The findings of this experiment are in line with several modality-related cognitive theories that can be applied as a theoretical background for cognitive-aware modality planning. We encoded these theories into a linear prediction model. The model quantitatively predicts the cognitive effects of a modality usage and thus determines how suitable it is for a given presentation task. The components of this model can be re-designed for other applications.

Future work is considered in several aspects. First, the complexity of the presented information can be increased in order to better explore the expressive power of the text modality. Second, the modality effects observed in this high-load perception task need to be further validated with higher level cognitive tasks, such as reasoning, comprehension and decision making. Finally, the generalization of the modality evaluation model needs further investigation.
Acknowledgments. This research is part of the Interactive Collaborative Information System (ICIS) Project. ICIS is sponsored by the Dutch Ministry of Economic Affairs, grant nr: BSIK03024. We thank C. Mühl and E. L. Abrahamse for their help with setting up the experiment. We also thank the 20 participants for their effort and time.
References

1. Andre, E.: The Generation of Multimedia Presentations. In: Dale, R., Somers, H.L., Moisl, H. (eds.) Handbook of Natural Language Processing. Marcel Dekker, Inc., USA (2000)
2. Bachvarova, Y., van Dijk, B., Nijholt, A.: Towards a Unified Knowledge-Based Approach to Modality Choice. In: Multimodal Output Generation (MOG), pp. 7–15 (2007)
3. Baddeley, A.D.: Essentials of Human Memory. Psychology Press, USA (1999)
4. Baddeley, A.D., Hitch, G.J.: Working Memory. The Psychology of Learning and Motivation: Advances in Research and Theory 8, 47–89 (1974)
5. Boucsein, W., Haarmann, A., Schaefer, F.: Combining Skin Conductance and Heart Rate Variability for Adaptive Automation During Simulated IFR Flight. In: Harris, D. (ed.) HCII 2007 and EPCE 2007. LNCS, vol. 4562, pp. 639–647. Springer, Heidelberg (2007)
6. Cao, Y., Theune, M., Nijholt, A.: Modality Effects on Cognitive Load and Performance in High-Load Information Presentation. In: Intelligent User Interface (IUI), pp. 335–344 (2009)
7. Carr, T.H., McCauley, C., Sperber, R.D., Parmelee, C.M.: Words, Pictures, and Priming: On Semantic Activation, Conscious Identification, and the Automaticity of Information Processing. J. Exp. Psychol. Hum. Percept. Perform. 8, 757–777 (1982)
8. Clark, J.M., Paivio, A.: Dual Coding Theory and Education. Educational Psychology Review 3, 149–210 (1991)
9. Hart, S.G., Staveland, L.E.: Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Human Mental Workload 1, 139–183 (1988)
10. Kramer, A.F.: Physiological Metrics of Mental Workload: A Review of Recent Progress. In: Damos, D.L. (ed.) Multiple-Task Performance. CRC Press, USA (1991)
11. Malik, M.: Heart Rate Variability - Standards of Measurement, Physiological Interpretation, and Clinical Use. Circulation 93, 1043–1065 (1996)
12. Mayer, R.E., Moreno, R.: Nine Ways to Reduce Cognitive Load in Multimedia Learning. Educational Psychologist 38, 45–52 (2003)
13. Paivio, A.: Mental Representations: A Dual Coding Approach. Oxford University Press, USA (1986)
14. Potter, M.C., Faulconer, B.A.: Time to Understand Pictures and Words. Nature 253, 437–438 (1975)
15. Scerbo, M.W., Freeman, F.G., Mikulka, P.J., Parasuraman, R., Di Nocero, F.: The Efficacy of Psychophysiological Measures for Implementing Adaptive Technology. TP-2001-211018, NASA Langley Research Center, Hampton (2001)
16. Stern, R.M., Ray, W.J., Quigley, K.S.: Psychophysiological Recording, 2nd edn. Oxford University Press, UK (2001)
17. Verwey, W.B., Veltman, H.A.: Detecting Short Periods of Elevated Workload: A Comparison of Nine Workload Assessment Techniques. Applied Experimental Psychology 2, 270–285 (1996)
Supporting Situation Awareness in Demanding Operating Environments through Wearable User Interfaces Jari Laarni1, Juhani Heinilä2, Jukka Häkkinen3, Virpi Kalakoski4, Kari Kallinen5, Kristian Lukander4, Paula Löppönen6, Tapio Palomäki7, Niklas Ravaja5, Paula Savioja1 and Antti Väätänen2 1
VTT Technical Research Centre of Finland, Vuorimiehentie 3, Espoo, P.O. Box 1000, FI-02044 VTT, Finland 2 VTT Technical Research Centre of Finland, Tekniikankatu 1, P.O. Box 1300, FI-331010 Tampere, Finland 3 Department of Psychology, P.O. Box 9, FI-00014 University of Helsinki, Finland 4 Finnish Institute of Occupational Health, Topeliuksenkatu 41 a A, FI-00250 Helsinki, Finland 5 Center for Knowledge and Innovation Research, Helsinki School of Economics, Fredrikinkatu 48 A, 9th Floor, P.O. Box 1210, FI-00101 Helsinki, Finland 6 Birger Kaipiaisen katu 4 c 31, FI-00560 Helsinki, Finland 7 Finnish Defence Forces Technical Research Centre, Kokonkatu 2 A, P.O. Box 10, FI-11311 Riihimäki, Finland
Abstract. The military environment is physically and mentally extremely stressful. Tasks in the operating environment are varied, demanding and hazardous. Due to these challenges, new user interfaces (UIs) are required that provide improved soldier protection and performance in both day-time and night-time conditions. The new UIs should, e.g., improve the soldier's situation awareness, i.e., the perception of information, the integration of pieces of information, the determination of their relevance to one's goals, and the projection of their status into the future. The aim of the Finnish project called "Supporting situation awareness in demanding operating environments through wearable interfaces" is to develop UIs for wearable computers that help the special force soldier carry out his/her main critical tasks, e.g., detection and identification of enemies and features of the surrounding environment, navigation and self-localization, development of tactics, and communication between and within military units. The main portions of the work are task and work analysis, and the conceptual design and evaluation of prototype systems. The present paper presents the project and the methods that are used in the functional analysis of military tasks.

Keywords: Situation Awareness, Cognitive Task Analysis, Military Domain, Wearable User Interface, Future Soldier, Future Warrior.
1 Introduction

New user interfaces (UIs) developed for the military environment are required that provide improved soldier protection and performance in both day-time and night-time
conditions. The new UIs should, e.g., improve the soldier's situation awareness, i.e., the perception of information, the integration of pieces of information, the determination of their relevance to one's goals, and the projection of their status into the future. The aim is to keep the soldier aware of what is going on around him/her and to make it possible for him/her to act and react quickly in accordance with the general goals and objectives. In the numerous studies introducing novel technologies and UIs for infantry soldiers, little attention has been paid to the fact that the physical and cognitive demands brought along by wearable technology should not increase the physical and mental workload of soldiers. Doubtless, this consideration should have a significant effect on the technology and UIs to be introduced for military use. Additionally, it places demands on the methods to be applied in the analysis of the complicated working environments of present and future infantry soldiers. The Finnish project SAWUI ("Supporting situation awareness in demanding operating environments through wearable interfaces") aims to research and develop novel UIs for wearable computers that help a special force soldier carry out his/her main critical tasks, i.e. detection and identification of enemies and features of the surrounding environment, navigation and self-localization, development of tactics, and communication between and within military units. The main stages of the development process are task and work analysis, conceptual design and evaluation of prototype systems. The project started in August 2008 and will end in July 2010. The project is carried out jointly by research institutes, companies and the Finnish Defence Forces. The aim of the present paper is to present the SAWUI project, focusing especially on the methods used in the functional analysis of military tasks. These methods have a basic role in the requirements-capture process for the new wearable military UIs. The solutions we study, design and present in this study are based on a human-centered design approach.
2 Background

Generally speaking, the focus of the development of future soldier systems is to improve the lethality and survivability of the individual soldier. The aim is that the performance of a soldier becomes faster and more effective, and that his/her ability to adapt to tactical changes improves. In order to reach this aim, the soldier should, for example, see better in dim light and at long range; have a direct connection to supporting systems; be able to utilize advanced weaponry; and use weapons that are reliable and accurate. Since the soldier is constantly integrated into a communication network, he/she is able to communicate with others all the time. However, the soldier is not a robot or a cyborg, since he/she should be able to adapt to the systems he/she is integrated with. Several countries have a program or programs for the development of future soldier systems underway; examples include LandWarrior and Future Force Warrior (USA), Félin (France), IdZ (Germany), FIST (UK) and the Integrated Soldier System Project (Canada). In many of these projects the aim is specifically to develop a system that improves soldiers' performance in urban operating environments.
According to the literature, the basic elements of a future soldier system are 1) an operating centre including a wearable user interface, a positioning and navigation system and a system supporting situation awareness; 2) a modular weapon system; 3) individual equipment including clothing, protection and a carrying system; 4) a communication system including a short-distance radio and a rescue radio; and 5) a training system [7]. In a similar way, according to the Finnish STAE report, the development of a system supporting a soldier's performance should be considered as an integrated whole including clothing, ballistic protection, different types of sensors, weaponry and all kinds of devices and equipment supporting the soldier's performance [5]. A future soldier equipped with these devices and systems can be thought of as a kind of node in a large information network: on the one hand, he/she is a passive sensor and node in a communication network; on the other hand, he/she is also an independent actor in the operational field. In other words, he/she is a subsystem in a system that is a part of a larger system of systems. Wearable multimodal user interfaces are supposed to improve the detection and identification of critical information by directing the soldier's attention in the right direction; they should provide sensory enhancement by improving the operator's ability to localize targets and him-/herself and to navigate in the environment; they should give better dynamic information by keeping the operator up to date on changes and situational factors in the operating environment; they should provide information sharing between members of the team and support planning and dynamic decision making; and finally they should foster distributed decision making by providing information across teams and between commanders and the team, and by supporting different ways to comprehend and integrate information [4]. In order to reach these goals, wearable systems should be context-sensitive, proactive, 'prosthetic' and user-friendly [6]. They should be context-sensitive so that they are able to identify the action possibilities and constraints of a situation, make these possibilities visible to the user, and help users become aware of the meaning of different activities. They should be proactive in a positive way, and help users engage in different activities. They should also be 'prosthetic' and augment performance on tasks in which the hands are busy. In addition, they should be unobtrusive and easy to use. To that end, it would be useful if novel visualization techniques, browsing methods and multimodality were applied.
3 Goals of the SAWUI Project

In order to be able to develop new technical systems for the military environment, the key task we have to perform in the project is to define and characterize soldiers' physical, perceptual and cognitive activities, analyze their task performance, and describe the environments in which soldiers perform (i.e. work domain and task analysis). Furthermore, and based on this, we have to define the physical, perceptual and cognitive requirements that have to be supported by the new UIs introduced for military use. As a result of these analyses, we will obtain an understanding of the user and system requirements and a view of the strengths and weaknesses of the existing systems and technologies.
Based on the results and recommendations received from the work domain and task analysis, the next step is to specify the user and system requirements of wearable UIs to be used for military-related purposes. At the same time, the user experience and the ability of wearable UIs to maintain situation awareness with minimal disturbing stress and cognitive load for the user will be studied. Finally, based on the requirements captured as described above, a prototype system will be implemented. Such a prototype can be used, e.g., for training and for gaining feedback from real end users. The evaluations will take place both in the laboratory (simulated contexts) and in real-life conditions. The focus of the evaluations will be on technology maturity and applicability, on system usability, and on the cognitive load and ergonomics of the wearable system implemented. The work will be supported by military experts from the Finnish Defence Forces, who will give their assistance in each step of the project.
4 Methods

The development of a wearable multimodal soldier system consists of several steps [3]. First, the aim is to characterize the information requirements of infantry soldiers for some representative operational tasks. Second, the requirements for a wearable multimodal soldier system are specified. This includes user requirements for information processing and specifications for applications supporting situation awareness, decision making, and communication and collaboration in demanding environments. Third, based on the studies mentioned above, a proof-of-concept wearable soldier system will be implemented. Finally, an evaluation of the prototypes in terms of their usability and functionality will be carried out.

4.1 Background Preparation and Domain Familiarization

The aim of this phase of the study is to determine the user group and the planned tasks and functions, together with the existing technologies. The purpose is to familiarize the researchers involved with the military domain, to define the scope of the task analysis, to select the most critical tasks for further analysis, to define the key aspects of the task and to determine the perceptual and cognitive skills that are to be supported. Furthermore, the aim is to become familiar with future user interface systems that are based on pervasive and ubiquitous computing technologies, and to prepare a literature review on the application of these technologies in the military domain.

4.2 Observations and Knowledge Elicitation

The key requirements are identified through interviews, literature reviews and observations of real soldier performance in the field context. Different tools will be used to break down activities into meaningful functions and tasks and to determine which of them are likely to be perceptually and cognitively challenging to soldiers. A modified Decision-Centered Design Method is applied, consisting of the following five stages: preparation, knowledge elicitation, analysis and representation, application design, and evaluation [2]. This phase includes the identification and understanding of the military domain, tasks and users, and the identification of cognitively
complex tasks. Knowledge elicitation methods are used to understand the information needs, attentional demands and critical decisions, and to identify team structure and communication. The analysis and representation stage is used to decompose data into discrete elements and to identify the users' decision requirements.

4.3 Study and Analysis of Wearable UI Technologies

Studies on commercially available wearable UI technologies will be carried out in this phase. This is done by providing expert evaluations of a set of the most promising wearable UIs and by studying their prospects and limitations in the military context of use. This task will provide a more in-depth understanding of the requirements of wearable UI technologies from the cognitive, operational, human factors and ergonomic points of view, and it will give more specific user and technical requirements for the wearable user interfaces of a future soldier for maintaining situation awareness in an increasing and simultaneous flow of information from different sources. This analysis will take into account special characteristics of the operating environment, such as climatic and time-of-day circumstances.

4.4 Work Domain and Task Analysis

The main aim of this phase is to gather and analyze information from different sources concerning the requirements that are critical to effective performance in the military domain. This will provide us with a deeper understanding of the tasks and contexts in which the wearable UI system will be used, and the aim is to characterize the key requirements of the battlefield domain. This is necessary in order to understand the physical, perceptual and cognitive activities and the perceptual and cognitive constraints presented by the environment, as well as to understand the knowledge, behavior and strategies of soldiers. The basic methods and techniques in the acquisition of these requirements are cognitive task analysis techniques and work domain analysis techniques, joined together with a profound understanding of the perceptual and cognitive mechanisms that underlie soldier performance [3]. The cognitive task analysis techniques applied here are a set of techniques to describe the perceptual and cognitive demands of the soldier's tasks, e.g., soldier information needs, decision-making strategies, critical decisions and how technology may support performance [3]. The aim is to determine and analyze which kinds of information are used in those critical tasks and how this information is received. Work domain analysis provides information about the constraints of military environments. It complements the cognitive task analyses by providing some of the constraints and possibilities of the environment that may have an impact on the soldiers' physical, perceptual and cognitive activities. The work domain analysis consists of a hierarchical representation of the domain, including a functional goal-means decomposition of the application domain. The aim is to understand and analyze the goals to be achieved and the functional means for achieving them. Different methods and techniques can be used to collect data for the work domain analysis [3]. Cognitive work requirements analysis includes the determination of the cognitive demands for the domain model [3]. Cognitive work
requirements (i.e. detection, recognition, attention control/focusing, monitoring, problem solving, decision making) are determined for each work domain concept. Information and relationship requirements analysis includes the identification of the information and actions required ("what is needed"), i.e. the set of information elements and action possibilities needed for settling the above-mentioned requirements [3]. The cognitive task analysis leads to the determination of the perceptual and cognitive demands of the soldiers' tasks. Cognitive demands are the difficult, challenging and frequent decisions and tasks within the military domain, including relevant information about why the activities are challenging and difficult, what strategies are used to carry out these tasks, what supporting information is needed, and what the common errors and difficulties in conducting the task are [2]. The most important aspect of this work task is to define the strategies used by the soldier to conduct a particular task and in what way new information technologies could support these strategies. A detailed specification of the requirements will provide input on what activities must be supported and what content must be provided to the soldier.

4.5 Research on Human Information Processing Capacity

A central aim of the experimental work at this stage is to clarify the limitations of human information processing in relation to the use of wearable UIs. Exceeding human limitations would result in reduced task performance, an increased number of errors, and unnecessary visual and/or cognitive load. Adequate ergonomic design of user interfaces reduces the perceptual and cognitive strain related to the use of wearable computers and therefore helps avoid potentially detrimental health effects of the devices. For example, in the use of wearable displays, various potential problems related to human information processing can be recognized. These include the small size of wearable and mobile displays, short viewing distances, detrimental interaction between the perception of displayed and external information, and light adaptation when a wearable display is used in low ambient light conditions [1]. The second aim is to study and model the integration of multimodal information and the allocation of selective attention in demanding operating environments. The wearable UI should support the timely and adequate allocation of attention between different sense modalities and provide resistance to unnecessary interruptions. The third aim is to study the critical dimensions of the soldier's psycho-physiological state (both psychological and psycho-physiological factors). This information is needed when evaluating the physiological and psychological condition of the soldier in order to assess his/her capacity to perform the task, when defining physical, perceptual and cognitive requirements, and when devising strategies for supporting design decisions.

4.6 Human Information Processing Requirements for Multimodal Adaptive Interfaces

The aim is to define specifications for a wearable multimodal computer system which supports the perception and comprehension of critical operational information. The ergonomics and usability of the displays and input devices must be optimized. The optimization of visual displays in terms of visual conditions is challenging. Special
challenges are caused by demanding usage conditions: dim light, dazzling sunlight and fast changes in lighting conditions. For example, changes in lighting conditions require that the system be adaptable. One of our aims is to recognize the problems caused by lighting conditions and to generate solutions to these problems. In addition to visual displays, the aim is also to investigate auditory and tactile interfaces that can provide supporting information. Specifically, we will study the integration of information through different sensory channels in wearable user interfaces, and guidelines will be generated for the presentation of multi-sensory information. The possibility of using peripheral visual information in these kinds of displays will also be studied. Our purpose is to optimize the input and output properties of the wearable user interface, taking into account the physiological and cognitive boundary conditions of the human information processing system. We will also study how to improve localization and navigation in demanding operating environments, and how to support communication and collaboration between soldiers and operative decision making. One of our key aims is to identify psycho-physiological parameters, or combinations of parameters, that measure mental and physical stress in demanding operating environments in an optimal way, and to study how acceleration sensors can be used in the measurement of the user's actions and posture. The target of this phase is to discover requirements for the presentation of information about the user's psycho-physiological state and specifications for an adaptive interface utilizing sensor information.
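To make the use of acceleration sensors concrete, here is a toy sketch (invented for these notes, not part of the SAWUI project) of how a torso-mounted 3-axis accelerometer could classify the wearer's posture from the tilt of the gravity vector; the tilt thresholds are assumptions:

```python
# Hypothetical posture estimation from a torso-mounted 3-axis accelerometer.
# Assumes the z axis points down the standing torso and readings are in g.
# The 25- and 65-degree thresholds are invented for illustration only.
import math

def estimate_posture(ax, ay, az):
    """Classify posture from the angle between gravity and the torso axis."""
    magnitude = math.sqrt(ax ** 2 + ay ** 2 + az ** 2)
    tilt = math.degrees(math.acos(az / magnitude))
    if tilt < 25:
        return "upright"
    if tilt < 65:
        return "crouching"
    return "prone"

print(estimate_posture(0.05, 0.10, 0.99))  # -> "upright"
```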
5 Development of the Prototype System

Based on the design requirements, the aim is to develop an integrated networked UI prototype including a wearable computer capable of gathering information from the operating environment and from the user through sensors. The system will provide context-sensitive services through multimodal user interfaces.

5.1 Wearable User Interfaces

In the development of user interfaces for wearable displays, the fact that novel technologies enable new types of multimodal displays must be taken into account. With these displays come completely new types of user interface requirements. For example, the interface must enable fluent performance of the task; it has to provide the user with information about critical incidents without interfering with the current task, i.e., situation awareness must remain intact. One possibility is to utilize ambient visual, auditory or tactile signaling. When visual signals are used, interface features could slowly change their color, saturation or size; when auditory signals are used, the interface can provide a peripheral awareness of people and events. In order to provide the ability to interact with the information display while performing an external task, new interaction techniques are needed. Also, there is a need for visualizations that support task performance. Input/output devices for use with the wearable display should be carefully designed. Overall, wearable UIs present several challenges for input due to small
display volume and resolution and difficulties in managing information within a small screen space. Reducing the visual size of elements allows larger quantities of elements to be displayed, but also makes it harder for the user to select those tiny elements. Display and input could be unattached and isolated, so that control of the device is separated from getting information from within the display. Mapping issues, visual feedback on actions (integrating tactile and auditory feedback) and the fact that the visual space moves with the head have to be taken into account. The input surface is also different, and the buttons are generally not visible to the user. Possible solutions to these problems have been suggested [1]. For example, the virtual targets presented in user interfaces can be "active", that is, the user interface can modify its appearance and functionality based on the user's actions. Haptic and/or auditory feedback can also be used.

5.2 Wearable Sensors

Context sensitivity can be defined as a system's ability to adapt its behavior or properties to changes in its environment. A user's context may be related to physical attributes (e.g., time, temperature and the physiological state of the body) or to the social environment of the user (e.g., recognition and identification of nearby persons and objects). Context sensitivity is based on the fact that the environment can recognize the user and the usage situation. Knowledge of time and location are typical examples, but sensors can also collect information about other features of the user and of the environment. The challenge in the development of context-sensitive systems is to recognize the user correctly and to be able to predict his/her needs, or to behave according to pre-specified instructions in a situationally correct manner. The aim is to collect data from different sources and processes. Different types of sensors can be used, integrated into the clothing or into a separate wearable sensor platform for pre-processing. In addition, different physiological parameters can be measured, e.g., heart rate, respiration, body temperature, posture and movement. Information about the operating environment can be collected on physical location, temperature, humidity and chemical concentrations; in addition, video and sound information can be collected and transmitted. The system should also support communication within a group and between different organizational levels. In addition to speech communication, the system should be able to support the transmission of information through head-up displays or tactile displays.
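As a hypothetical illustration of such context sensitivity, the following sketch shows how sensor-derived context might drive the choice of output modality for a non-critical alert; the rules and thresholds are invented and are not the project's design:

```python
# Toy context-sensitive modality selection for a wearable UI alert.
# All rules and thresholds below are invented for illustration.
def choose_output_modality(ambient_light_lux, noise_level_db, moving):
    """Pick how to present a non-critical alert given the usage context."""
    if moving:
        return "tactile"        # hands and eyes busy: vibrate instead
    if ambient_light_lux < 10:
        return "dimmed visual"  # night conditions: preserve dark adaptation
    if noise_level_db > 85:
        return "visual"         # too loud for reliable auditory cues
    return "auditory"

print(choose_output_modality(ambient_light_lux=5, noise_level_db=60, moving=False))
```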
6 Conclusions

One of the main lessons that we have learned from other future soldier system projects is that it is very difficult to improve an infantry soldier's situation awareness, decision-making ability, performance efficiency and performance accuracy. Therefore, we are also gathering user requirements from other demanding operating environments such as fire fighting, police operations and extreme sports. Some examples of basic requirements for a wearable system in these environments are that we should design and develop a system that does not overload cognition but
supports it and provides redundancy. The system should not prevent information acquisition from the environment, and it must be ensured that critical information is always received. We should also strive for a simple and well-integrated system that is easy to use. In addition, we should carefully ensure that the weight and energy consumption of the system are minimized. Another lesson we have learned concerns the methods for collecting and analyzing data: cognitive task analysis methods, if applied rigidly, do not necessarily provide the answers we are searching for. Therefore, these methods have to be tailored to the characteristics of the application domain and to the characteristics of the design task.
References

1. Baber, C.: Wearable Computers: A Human Factors Review. Int. J. Hum.-Comput. Int. 13, 123–145 (2001)
2. Crandall, B., Klein, G., Hoffman, R.R.: Working Minds: A Practitioner's Guide to Cognitive Task Analysis. Bradford Books, Cambridge (2006)
3. Elm, W.C., Potter, S.S., Gualtieri, J.W., Roth, E.M.: Applied Cognitive Work Analysis: A Pragmatic Methodology for Designing Revolutionary Cognitive Affordances. In: Hollnagel, E. (ed.) Handbook of Cognitive Task Design, pp. 357–382. Lawrence Erlbaum, Mahwah (2003)
4. Kaasinen, E., Norros, L.: Älykkäiden ympäristöjen suunnittelu: kohti ekologista systeemiajattelua (English transl.: Design of Smart Environments: Towards the Ecological System Approach). Teknologiainfo Tecnova Oy, Helsinki (2007)
5. Puolustusjärjestelmien kehitys: Sotatekninen arvio ja ennuste 2020 (STAE 2020, osa 2). Pääesikunnan sotatalousosasto (2004) (in Finnish)
6. Riva, G.: The Psychology of Ambient Intelligence: Activity, Situation and Presence. In: Riva, G., Vatalaro, F., Davide, F., Alcañiz, M. (eds.) Ambient Intelligence. IOS Press, Amsterdam (2005)
7. Saarelainen, T.: Taistelija 2020 – jalkaväen kärkitaistelija. Tekniikan julkaisusarja, Tutkimuksia 1/2007. Maasotakoulu, Lappeenranta (2007) (in Finnish)
Development of a Technique for Predicting the Human Response to an Emergency Situation

Glyn Lawson1, Sarah Sharples1, David Clarke2, and Sue Cobb1

1 Human Factors Research Group, Faculty of Engineering, The University of Nottingham, United Kingdom, NG7 2RD
2 School of Psychology, The University of Nottingham, United Kingdom, NG7 2RD
{glyn.lawson,sarah.sharples,david.clarke,sue.cobb}@nottingham.ac.uk
Abstract. This paper presents development work on a new approach for predicting the human response to an emergency situation. The study builds upon an initial investigation in which 20 participants were asked to predict what actions they would take in the event of a domestic fire [1]. The development work involved a retest with an additional 20 participants to investigate the reliability of the approach. Furthermore, the analysis procedure was improved such that the results represented more accurately those which could be obtained from practical application of the approach. As found in the initial investigation, the frequencies and sequences of the reported acts had significant relationships with a study of behavior in real fires [2] (Spearman’s rho: 0.323, N=55, p<0.05) and (Spearman’s rho: 0.340, N=37, p<0.05), respectively. Further development work is required, but the results indicate that the approach may have use for predicting human behavior in emergencies. Keywords: human response, behavior, emergency, reliability, predict.
1 Introduction

Predictions of human behavior can be used by Human Factors (HF) professionals to analyze interaction with a system at an early stage in the development process. Predictions typically involve abstracting complex human behavior into models which are analyzed against the tasks the user will have to perform to achieve their desired goal on the planned system. This process can identify mismatches between human capabilities and system requirements, or quantify the predicted human performance. It can reduce the cost associated with identifying HF issues at a later stage in the development process, when physical prototypes become available for user trials, as design decisions become fixed and investment will have been made into specific lines of engineering [3]. Publications in the HF literature regarding predictions of human performance are typically concerned with nominal operating conditions, e.g. [4]. However, methods are also required for predicting behavior in emergency situations. These would have several applications: understanding the human response to an industrial incident
during factory layout planning [5]; developing training plans for emergency response teams [6]; and improving the design of evacuation routes and guidance in buildings, aircraft and boats [7-9]. Predicting behavior in emergencies is required for reasons other than reducing development costs. Ethical considerations prevent putting participants in distress, and therefore the relevant behaviors cannot be studied in a laboratory, even if a physical prototype or the real system is available [10]. Basing predictions on previous events is not always possible, as gaining access to incident reports can be difficult. Furthermore, emergency events are unique to their particular circumstances; the validity of a behavioral prediction derived from a different scenario may therefore be questioned [11]. This paper presents development work on a new approach for predicting human behavior in emergency situations. The approach is based on a talk-through method in which participants are asked to describe their actions in response to a scenario [12]. This low-cost approach means that a dangerous situation can be described without putting participants at any physical risk. The approach also draws upon sequential analysis [13], an established technique for studying dynamic aspects of behavior. Sequential analysis requires the assessor to record behaviors according to a taxonomy. Sequences of behavior can then be analyzed, for example by recording the number of times behavior X follows behavior Y. These methods were chosen in the initial development of this predictive approach [1] to enable comparability with sequential analysis data from a study of real behavior in domestic fires [2]. The comparison study [2] involved interviews with 41 people who had been involved in domestic fires, who were asked to describe exactly what they did from the time they noticed something abnormal was happening until they reached a safe area. The descriptions were transcribed and coded using a taxonomy of behaviors. The frequencies of acts, and the act sequences, were analyzed and published. The initial predictive study [1] demonstrated comparability of the frequency and sequence of a selection of reported actions with the study of behavior of people involved in real fires [2]. However, reliability is an important criterion for the success of a method or technique; it essentially requires that the same results are achieved upon repeated use of the method [14]. This investigation aimed to examine the reliability of the predictive approach through a test-retest design, in which the methodology was replicated with different participants. It was also decided to investigate the validity of the predictive approach in further detail by removing only two acts from the analysis. These were related to technology which has become more prevalent since 1980. This differs from the analysis used in Lawson et al. [1], in which any categories not expected due to the example scenario or experimental protocol were removed. This change simplifies the administration of the approach as a predictive tool, as users are likely to find it difficult to identify which tasks to remove from the analysis unless the justification is obvious, as it is with the two acts relating to a change in technology over several decades.
2 Method

Adverts were placed around the University of Nottingham and were circulated by email. The adverts explained that research was being conducted into human behavior in emergencies. They told people not to apply if they had been involved (or had a
close friend or relative who had been involved) in a fire, or if they suffered from any mental ill health. These precautions were put in place to avoid causing participants any distress. Participants from the initial investigation [1] were also excluded, as the aim was to investigate whether the results drawn from a different sample would also demonstrate a significant relationship with the study of behavior in real fires [2]. 20 participants (14 male, 6 female; mean age=31.32, SD=5.47) met the application criteria and were each allocated a 1-hour appointment. Each participant was asked to sign a consent form which emphasized that they could withdraw at any point if they felt distressed. Thereafter, the methodology replicated that used by Lawson et al. [1]. Firstly, participants were asked to sketch a plan layout of all rooms on all floors of their house. This provided a visual reminder to the participants, who were required to consider the layout during the trial. It also familiarized the experimenter with the layout, which would help them understand comments made by participants. Then, participants were asked to imagine that it was the middle of the night and that they, and everyone else who typically sleeps in their house, were asleep in their beds. They were asked what actions they would take if they were woken by a faint crackling noise coming from the kitchen. They were told that this noise was caused by a fire if their anticipated action sequence led them to approach the kitchen. They were told to be reasonably explicit, and were probed for more detail if not enough was given. Every act reported was recorded, in order, until they were told to stop, which was typically when they had exited their house or reached a state that would remain unchanged until the fire brigade arrived. The reported acts were recorded on a laptop and displayed on a projection screen so that participants could see their predicted act sequences. The sketches of the floor plans were also recorded by the experimenter. As in the initial investigation into this predictive approach [1], the reported actions were coded against a common taxonomy of human behavior in fire, taken from Canter et al. [2]. Every effort was made to map the reported acts onto the taxonomy; if this was not possible, they were mapped onto categories generated from the Lawson et al. study [1], or new categories were created. The frequencies with which each action occurred were recorded. A matrix was then created in which the number of times each act followed every other act was recorded. The matrix was used to generate standardized residuals (observed frequency minus expected frequency, divided by the square root of the expected frequency, where the expected frequency is the row total multiplied by the column total, divided by the grand total) for each transition between groups of related acts. Canter et al. [2] reported the standardized residuals as "strength of association" values and used them to provide information on the relationships between acts.
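To make the analysis concrete, the following minimal Python sketch (not the authors' code; the act codes and example sequences are illustrative) builds the transition-count matrix and derives the standardized residuals as defined above:

```python
# Illustrative sketch of the sequential analysis described above: count how
# often each act follows each other act, then compute, per transition,
# (observed - expected) / sqrt(expected), with
# expected = row total * column total / grand total.
from collections import Counter

def standardized_residuals(sequences):
    transitions = Counter()
    for seq in sequences:
        for prev_act, next_act in zip(seq, seq[1:]):
            transitions[(prev_act, next_act)] += 1

    grand_total = sum(transitions.values())
    row_totals, col_totals = Counter(), Counter()
    for (prev_act, next_act), n in transitions.items():
        row_totals[prev_act] += n
        col_totals[next_act] += n

    residuals = {}
    for (prev_act, next_act), observed in transitions.items():
        expected = row_totals[prev_act] * col_totals[next_act] / grand_total
        residuals[(prev_act, next_act)] = (observed - expected) / expected ** 0.5
    return residuals

# Two hypothetical coded act sequences (codes as in Table 1), e.g.
# woken by noise -> investigate -> warn -> phone -> enter safe area:
example = [["2a", "12a", "22a", "22b", "25a"],
           ["2a", "12a", "18b", "25a"]]
print(standardized_residuals(example))
```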
3 Results

The taxonomy of acts, and the frequencies with which each act was reported, are shown in Table 1. This shows the frequency with which each of the acts from the taxonomy was reported in this study. The last column shows the values from the study of behavior in real fires [2]. Note that acts 1-25 were from the original taxonomy [2], acts 27a, 27c, 28a and 39a were found in the first predictive study [1], and acts 42a
and 43a were found in this study, but could not be mapped to any previous act. Only two acts were removed from the analysis: "collect mobile/cordless phone" and "turn burglar alarm off". These were removed as the prevalence of these technologies is likely to have increased since the original study in 1980, and therefore the comparison of these tasks would be unrepresentative.

Table 1. Taxonomy of acts against which the actions reported in this experiment were coded
Code   Action category                                              This study (N=20)   Canter et al. [2] (N=41)
1a     Pre-event actions                                            0                   42
2a     Perception of stimulus (ambiguous)                           5                   40
2b     Alerted/awoken (ambiguous)                                   2                   20
2c     Note behaviour of others (ambiguous)                         0                   16
3a     Perception of stimulus (associated with fire)                2                   12
3b     Note fire (development)                                      20                  31
3c     Encounter smoke                                              0                   35
4a     Interpretation (incorrect)                                   4                   23
4b     Disregard/ignore prior stimulus                              1                   15
5a     Receive warning/information/instruction                      3                   73
5b     Ask advice/request information                               5                   16
6a     Search for people (in smoke)                                 0                   18
6b     Encounter person in smoke                                    0                   12
6c     Check state of victim                                        0                   6
7a     Observe rescue attempt                                       0                   19
8a     Advise/instruct/reassure                                     29                  51
8b     Note agitated state of person                                0                   5
9a     Feel calm/unconcerned                                        0                   11
10a    Experience negative feelings                                 3                   16
10b    Experience uncertainty                                       1                   15
10c    Feel concern about occupants                                 0                   16
10d    Request assistance (urgent)                                  0                   2
11a    Fire equipment faulty/unable to work                         0                   8
11b    Struggle with environment                                    0                   5
12a    Seek information/investigate                                 52                  76
13a    Realize door to fire area open                               0                   6
13b    Prevent fire spread                                          13                  29
13c    Ensure accessibility                                         2                   10
14a    Indirect involvement in activity                             0                   27
14b    Wait for person/action to be completed                       2                   22
15a    Rescue                                                       0                   42
16a    Go/gain access to house with fire                            8                   40
16b    Go to neighbour's house                                      5                   12
17a    Dress, gather valuables                                      21                  20
18a    Fetch things to fight fire with                              23                  22
18b    Fight fire                                                   28                  25
19a    Evasive                                                      9                   33
19b    Leave immediate area                                         1                   51
20a    Forced back by/breathing difficulties/due to smoke/flames    0                   48
20b    Cope with smoke                                              1                   15
20c    Struggle through smoke                                       0                   8
20d    Injured                                                      0                   6
21a    Pass through/enter fire area (investigate etc.)              8                   35
22a    Warn                                                         5                   34
22b    Phone for assistance                                         18                  7
23a    Rescued/assisted                                             1                   13
23b    Rescued from window                                          0                   4
24a    Note/wait for fire brigade arrival                           16                  45
25a    Enter area of minimal risk                                   23                  52
27a    Take/carry pet                                               4                   0
27c    Call for pets                                                1                   0
28a    Return to bedrooms                                           16                  0
39a    Wake someone                                                 12                  0
42a    Take weapon/threaten/attempt to scare                        4                   0
43a    Move car                                                     1                   0
The frequency of predicted acts demonstrated a significant relationship with Canter et al.'s [2] study of behavior in real fires (Spearman's rho: 0.323, N=55, p<0.05). A significant relationship was also reported in Lawson et al. [1], albeit with a reduced set of tasks. These findings indicate that for this scenario, the predictive approach reliably demonstrates comparability for the frequency of acts with those reported by people who have been involved in real fires. The transitions analyzed as part of this investigation are shown in Figure 1, which includes all transitions reported for domestic fires by Canter et al. [2]. Each labeled node represents a group of actions from Table 1. The arrows represent transitions between the nodes, by pointing to subsequent acts. There is no meaning in the position of the act nodes. The standardized residuals for each of the labeled arrows from this study and from Canter et al. [2] are shown in Table 2. The standardized residuals demonstrated a significant relationship between this study and those reported in Canter et al. [2] (Spearman's rho: 0.340, N=37, p<0.05). A significant finding was also reported by Lawson et al. [1], again indicating repeatability of the approach and (in this instance) comparability with behavior in real fires. The findings were investigated further to identify any opportunities to improve the approach through the next stages of its development. A scatter plot in Figure 2 shows the standardized residuals from this study plotted against those from Canter et al. [2]. The dashed line indicates a theoretical perfect correlation. A visual inspection indicates that there were under-representations of the transitions B, a, and L in this study. These relate to: "evasive" to "leave house"; "pre-event activity" to "hear strange noises"; and "leave house" to "leave house". Possible reasons for these under-representations are given in the discussion.
[Figure omitted: a network of act groups (pre-event activity, hear strange noises, informed, misinterpret (ignore), investigate, instruct/reassure, wait for person to return, encounter smoke/fire, warn, fight fire, feel concern, close door, dress, enter room of fire origin, go to neighbours or return to house, leave house, encounter difficulties in smoke, rescue attempt, evasive, search for person in smoke, meet fire brigade on arrival, and end of involvement), with arrows labeled a to M pointing to subsequent acts.]

Fig. 1. Transitions investigated as part of this study

Table 2. Standardized residuals for the transitions

Transition   This study   Canter et al. [2]
a            0            9.84
b            0            1.34
c            3.64         6.26
d            -0.39        0.97
e            0.06         5.81
f            -1.23        2.12
g            1.38         1.34
h            0.95         0.89
i            0.79         0.00
j            -0.18        4.02
k            -0.22        0.89
l            1.97         2.46
m            0.36         1.34
n            0.89         0.97
o            -0.45        1.72
p            -0.10        0.45
q            5.59         8.88
r            -0.11        0.55
s            1.52         3.13
t            1.10         0.45
u            4.56         2.68
v            -0.11        0.89
w            0.00         1.79
x            0.00         0.45
y            1.88         1.79
z            5.66         3.58
A            2.13         1.79
B            1.75         14.30
C            0.00         0.89
D            -0.16        3.90
E            -0.06        3.58
F            3.30         2.24
G            -0.16        1.34
H (*)        -            0.89
I            -0.35        9.39
J            3.28         4.47
K            5.61         4.47
L            -1.23        4.47
M (*)        -            2.24

(*) "End of involvement" was not listed in the original taxonomy of acts, and therefore strength of association values were not calculated for acts leading to this one.
4 Discussion

The results demonstrated that, in this scenario, the approach used yielded a significant relationship between the frequencies and sequences of actions participants predicted they would take in a domestic fire and those from a study of interviews with survivors of real domestic fires. These significant relationships were also identified in the initial application of this approach [1]. The significant relationships for the frequency and sequence of acts in both predictive studies indicate that, in this
Note “End of involvement” was not listed in the original taxonomy of acts, and therefore strength of association values were not calculated for acts leading to this one.
[Figure omitted: scatter plot of the standardized residuals from this study (vertical axis, approximately -2 to 7) against those from Canter et al. [2] (horizontal axis, 0 to 16); the points for transitions B, a and L lie noticeably below the dashed diagonal.]

Fig. 2. Strength of association values for the transitions from each of the studies. A theoretical perfect correlation is indicated (dashed line).
instance, the approach demonstrated reliability, and comparability with one study of human behavior in real fires. These findings imply that, with development, the approach could be applied to reveal what actions people will take in an emergency situation. It must be emphasized that further work is required, for example an investigation into generalizability to a range of different scenarios. However, once developed, and if proven to meet all the criteria of a quality method, it could be used to predict behavior in novel situations as determined by the scenario presented to the participants. The resources required are low (a laptop computer, projector and whiteboard, and 20 participants) and it does not require any physical or virtual creation of an environment, only a brief written description of the scenario of interest. While this study demonstrated significant relationships for the sequence of actions, Fig. 2 revealed that some tasks were under-represented in this approach, and that enhancements may further improve the validity. "Evasive" to "leave house" (B) might have been infrequently observed due to differences in the interpretation of participants' responses compared with Canter et al. [2]. For example, participants in this study stated where they would go, e.g. "I would get out of the house". This was never interpreted as "I would put distance between me and the fire" followed by "I would get out of the house", although this sequence was identified more frequently in the Canter et al. study [2]. In the next stage of the development of this approach, the interviewing will probe for more detail on participants' predicted locations. The sequence related to "pre-event activity" (a) would also have been expected to be under-represented, as the participants were told that they heard a fire, and therefore their first reported action was never anything which could be classified as a "pre-event activity". For subsequent trials, it is recommended that the approach be altered such that pre-event activities (and "end of involvement") are recorded as acts.
The final sequence showing apparent under-representation is L, "leave house" to "leave house". This group actually includes the sub-category "enter area of minimal risk". Therefore, if a participant predicts that they would move out of their house and then move to another safe place, this would increase the occurrence of this transition. This, too, may be addressed through greater attention to reporting locations. It is worth noting that significant relationships were found for the sequence and frequency of acts despite a stricter analysis procedure than that used in Lawson et al. [1]. In this development study, only tasks relating to changes in technology since 1980 were removed, whereas in the initial predictive study several acts were removed which were not anticipated due to the experimental protocol or scenario. This change was made to reflect more accurately the anticipated end-use of the approach. A Human Factors practitioner would not know which tasks were unexpected due to the experimental protocol, and therefore this analysis is more representative of how the approach will be implemented to predict behaviors.
5 Conclusions

This study was conducted to investigate developments of an approach for predicting human behavior in an emergency situation, in which participants were asked what actions they would take if they experienced a domestic fire in their house at night. The frequency and sequence of the acts reported were compared to a study of human behavior in real fires. A significant relationship was found for both, as was the case in the initial study of the predictive approach, indicating reliability. This is despite a stricter analysis procedure, which excludes far fewer tasks. Recommendations have been made to improve the methodology, mainly to ask participants for more detail about their locations; these will be implemented in the next phase of the development of this approach. The approach continues to show promise as a low-resource method which (with development) could be used as part of the Human Factors professional's toolkit for predicting behavior in novel situations. It does not require a physical or virtual mock-up, and does not put participants in any danger.
Acknowledgements The authors acknowledge Professor David Canter for permission to use the adapted material from Domestic, Multiple Occupancy, and Hospital Fires [2].
References

1. Lawson, G., Sharples, S., Cobb, S., Clarke, D.: Predicting the Human Response to an Emergency. In: Bust, P.D. (ed.) Contemporary Ergonomics. Taylor and Francis, London (in press)
2. Canter, D., Breaux, J., Sime, J.: Domestic, Multiple Occupancy, and Hospital Fires. In: Canter, D. (ed.) Fires and Human Behaviour, pp. 117–136. John Wiley and Sons, New York (1980)
3. Laughery, R.: Simulation and Modelling as a Tool for Analysing Human Performance. In: Wilson, J.R., Corlett, N. (eds.) Evaluation of Human Work, 3rd edn., pp. 219–238. Taylor and Francis, London (2005)
4. Hamilton, W.I., Clarke, T.: Driver Performance Modelling and its Practical Application to Railway Safety. Applied Ergonomics 36, 661–670 (2005)
5. DiFac IST5-035079, http://difac.net
6. Lawson, G., D'Cruz, M., Bourguignon, D., Pentenrieder, K.: Training in the Digital Factory. In: The IFAC Workshop on Manufacturing, Modelling, Management and Control, Budapest (2007)
7. Purser, D.A., Bensilum, M.: Quantification of Behaviour for Engineering Design Standards and Escape Time Calculations. Safety Science 38, 157–182 (2001)
8. Brooks, C.J., Muir, H.C., Gibbs, P.N.G.: The Basis for the Development of a Fuselage Evacuation Time for a Ditched Helicopter. Aviation, Space and Environmental Medicine 72, 553–561 (2001)
9. Deere, S.J., Galea, E.R., Lawrence, P.J.: A Systematic Methodology to Assess the Impact of Human Factors in Ship Design. Applied Mathematical Modelling 33, 867–883 (2009)
10. Dane, F.C.: Research Methods. Brooks/Cole Publishing Co., Belmont (1990)
11. Silverman, B.G., Johns, M., Cornwell, J., O'Brien, K.: Human Behavior Models for Agents in Simulators and Games: Part I – Enabling Science with PMFserv. Presence: Teleoperators and Virtual Environments, 139–162 (2006)
12. Kirwan, B., Ainsworth, L.K.: A Guide to Task Analysis. Taylor and Francis, London (1992)
13. Bakeman, R., Gottman, J.M.: Observing Interaction: An Introduction to Sequential Analysis. Cambridge University Press, Cambridge (1986)
14. Wilson, J.R.: Methods in the Understanding of Human Factors. In: Wilson, J.R., Corlett, N. (eds.) Evaluation of Human Work, 3rd edn., pp. 1–31. Taylor and Francis, London (2005)
A Dynamic Task Representation Method for a Virtual Reality Application

Maria Chiara Leva1, Alison Margaret Kay1, Fabio Mattei1, Tom Kontogiannis2, Massimiliano De Ambroggi1, and Sam Cromie1
1 Aerospace Psychology Research Group, Trinity College, Dublin 2, Ireland
2 Department of Production Engineering and Management, Technical University of Crete, Chania, Greece
{levac,alison.kay,matteif}@tcd.ie,
[email protected],
[email protected],
[email protected]
Abstract. This paper introduces an approach and a tool for representing tasks in the workplace, as they emerged from research aimed at reproducing safety-critical tasks in Virtual Reality (VR) simulations in the context of an EU research project called Virthualis. The tool, developed to address the needs of three different user groups, supports data collection and provides a structure for the interviews and for the simultaneous graphical representation of the task. It can be used as a common means of communication between the technical personnel involved in the interviews, the human factors expert and the VR expert. This aids the common understanding of a task, its main objective, challenges and criticalities whilst performing the actual analysis.

Keywords: Task analysis, Human Factors, safety procedures, troubleshooting.
1 Introduction

A proper acquisition of relevant information about a task in a safety-critical environment is the foundation of every sound human factors analysis. The scope of the analysis may cover a Human Reliability Assessment, an evaluation of a human-machine system, the writing of a procedure or the preparation of a training program. More and more studies have highlighted that this critical first step of the analysis has often been neglected, leaving the design stage without structured information about the tasks and contexts to be addressed [1], [2], [3], [4], [5], [6]. This issue has become apparent in the course of a European project (called Virthualis) aimed at reproducing safety-critical tasks in Virtual Reality (VR) simulations. The main insight leading to the implementation of the project is that real plants can be modeled in VR, where experiments can be conducted, observations made and data collected. This gives the safety analyst a safe environment in which to form and test hypotheses, and to train operators using a trial-and-error approach, something that can seldom be done on a real plant. Within Virthualis the analysis of the task and its description is the first input required for the development of the Virtual Reality (VR) simulation. In this framework the Task Modeler is a tool used for guiding the preliminary phases of data collection about the task to be analyzed and then represented using the
Virtual Reality. Its scope is to provide a structure for the interviews and for the graphical representation of the task, so that it can be a common means of communication for all involved in the analysis.

1.1 End-User Needs for Task Modeling and Representation

In the preliminary phase of the project, end-user needs were elicited. The end-users can be subdivided into three different user groups: the HF analysts, the industrial end-users and the VR experts. The Human Factors analysts needed an approach and a tool to help them in providing:
- A template for the interview process of a task analysis, with the ability to structure the interview phase in order to highlight and examine deviations from standard practice. These deviations are fundamental to understanding what can and does go wrong in the field and should be an integral part of any safety-critical task representation.
- A graphical representation of the procedure map linked to the template, including:
  - A list of tasks and subtasks with their associated actor roles, mapping a given procedure.
  - Logical links between sub-tasks, events and actor roles.
  - A list of all the possible sequences of execution for a given procedure.
  - A list of possible success end states and failure end states.
  - A list of the possible error modes of the tasks and the Performance Shaping Factors (PSF) affecting activities.
  - A list of tools, documents, artifacts and information exchanges between the roles for each step of the task.
- A tabular representation of the above elements, derived from the graphical structure.
- A task analysis that can constitute the basis for performing a quantitative human reliability assessment of a task.
- The possibility of comparing the procedure with VR experiment results and of changing it if necessary.

The industrial end-users needed:
- A representation of the task that is easy to understand, so that they could verify that the information they provided was correctly translated.
- To be able to use the task analysis to rewrite existing standard procedures.
- To derive indications for possible ad hoc troubleshooting procedures.

Finally, the VR experts, tasked with providing a representation of the data, requested the following:
- A graphical description of a task and its objectives, including a logical and temporal description of the expected steps and of the actors carrying out those steps, and a formal description of the task sequences (XML schema).
- A list of the different actors to be involved in the procedure under analysis, and an indication of the relevant parts of the plant, tools and equipment the actors are likely to come into contact with during task execution.
- A list of the relevant performances to be observed, in respect to which meaningful interactions (warning messages etc.) should be provided. (A sketch of the kind of representation implied by these needs follows this list.)
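A minimal sketch of such a task representation is shown below; all step names, actor roles and the graph structure are invented for illustration. It captures steps with associated roles, logical successors including a deviation branch, and the enumeration of the possible sequences of execution down to success and failure end states:

```python
# Hypothetical task graph: each step has an actor role and logical successors.
# "report_anomaly" models a deviation branch leading to a failure end state.
TASK_GRAPH = {
    "isolate_pump":   {"role": "field operator", "next": ["check_pressure"]},
    "check_pressure": {"role": "control room",   "next": ["open_drain", "report_anomaly"]},
    "open_drain":     {"role": "field operator", "next": ["success"]},
    "report_anomaly": {"role": "control room",   "next": ["failure"]},
    "success": {"role": None, "next": []},   # success end state
    "failure": {"role": None, "next": []},   # failure end state
}

def execution_sequences(step, path=()):
    """Depth-first enumeration of all possible sequences of execution."""
    path = path + (step,)
    successors = TASK_GRAPH[step]["next"]
    if not successors:          # reached an end state
        yield path
    for nxt in successors:
        yield from execution_sequences(nxt, path)

for seq in execution_sequences("isolate_pump"):
    print(" -> ".join(seq))
```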
Some existing tools and methods in the areas of task analysis and workflow modeling were reviewed against these needs. Table 1 summarizes the review.

Table 1. Review of existing techniques/tools with respect to Virthualis user needs

Tools and methods reviewed in the task analysis domain:

- Hierarchical Task Analysis (HTA) [7] is a process for developing a description of tasks in terms of operations (things people do to reach goals) and plans (statements of conditions that tell when each operation is to be carried out). HTA has been mostly used for the design of training. Its merit is mainly its clear representation technique. HTA is currently supported by a software tool: www.taskarchitect.com [8].
  Issues with respect to Virthualis needs: the main problem encountered was primarily the limited capability to represent task deviation flows and critical decision points as logical gates.

- Goals, Operators, Methods, Selection rules (GOMS) [9] is one of the most influential task analysis techniques for the purpose of user interface design. It is intended to estimate user performance based on a description of the system before the system is actually built. Several attempts have been made over the years to develop graphical representation tools as well (ftp://www.eecs.umich.edu/people/kieras/GOMS/GOMSL_Guide.pdf).
  Issues: the model underpinning GOMS does not allow for any type of error analysis; it works under the assumption that a user will know what to do at any given point.

- Méthode Analytique de Description des tâches (MAD) [10] is the task modelling part of a larger method for designing interactive systems. In MAD, task models are similar to HTA models except that the plan has been replaced by so-called constructors in order to specify the time order in which tasks are executed. Templates are used to describe the task details, even if most of the fields are merely for description purposes and are not used in any automated evaluation or transformation. MAD is supported by a tool called KMADe (http://kmade.sourceforge.net/) [11].
  Issues: the tool has been developed primarily with a focus on the analysis of user interfaces. The task structure is therefore still configured around the notion of a hierarchical breakdown of a task tree, which does not logically represent the impact of task deviations on the nominal expected flow.

- Groupware Task Analysis (GTA) [3] essentially consists of a conceptual framework that specifies relevant aspects of the task world that need attention for design purposes. For extracting data, it uses techniques such as structured interviews, interaction analysis and observation. Task modeling is then performed as a cyclic activity in which models are created, evaluated and modified. GTA mainly uses hierarchical representations and is also supported by a software tool: http://www.cs.vu.nl/~martijn/gta/ [12].
  Issues: this tool has also been developed primarily with a focus on the analysis of user interfaces. The main issue encountered was therefore the difficulty of representing task deviation flows and critical decision points as logical gates.

Tools and methods reviewed in the workflow modeling domain:

- Microsaint is a commercial tool developed by Micro Analysis and Design Inc. It consists of a discrete-event network simulation software package for building models that simulate real-life processes. It is used to model and simulate manufacturing processes, such as production lines, to examine resource utilization, efficiency and cost. It can model service systems to optimize procedures, staffing and other logistical considerations, and it can model human operator performance and interaction under changing conditions [13].
  Issues with respect to Virthualis needs: the issues highlighted were the impossibility of obtaining and modifying the source code in order to integrate the tool within the Virthualis platform, and the fact that Microsaint does not really model human error and does not allow a classification of the possible errors for a given procedure.

- IBM Task Modeler in Eclipse [14] is an editor developed by IBM as part of the Eclipse software development platform. A usability practitioner can produce either classic HTA (Hierarchical Task Analysis) diagrams or RAG (Roles and Goals) diagrams, and a development manager can produce a use case model. A demo version of the product can be downloaded from http://www.alphaworks.ibm.com/tech/taskmodeler/download.
  Issues: the use of this UML modeling tool is not so immediate for Human Factors practitioners and safety analysts, because it is meant to be used primarily by software developers. Furthermore, no support is provided for guiding the task analysis interview process.
In reviewing the techniques for workflow modeling, it was noted that a notation currently in wide use is the Business Process Modeling Notation (BPMN) [15]. The "primary goal of BPMN is to provide a notation readily understandable by all business users, from the business analysts that create the initial drafts of the processes, to the technical developers responsible for implementing the technology that will perform those processes, and finally, to the business people who will manage and monitor those processes." BPMN provides a simple means of communicating process information to business users, process implementers, customers, and suppliers. The notation appears very suitable for representing a task graphically in an easily understandable way and for providing a sound logical structure serving the needs elicited among Virthualis users.
2 Task Modeling within Virthualis: The Approach Chosen

The Task Modeler developed within Virthualis provides a structured interview template for the human factors practitioner, so that they can reconstruct a model of an actual task execution that also considers deviations from the nominal task flow and their consequences. The structured interview template has been arranged in such a way that all the branches of the task, including those originating from deviations, are analyzed and represented until they reach their final outcomes (a stop logic was adopted that depends upon the purpose of the case study). Furthermore, the method for representing the chain of events in the Task Modeler needs a clear logical structure for task-mapping. The structure should be composed of the following main categories:

- Activities (tasks and sub-processes). An activity is a generic term for work that the operator performs. An activity can be atomic or non-atomic (compound).
- Events (start events, messages to be sent to the system or to the other actors, completion events, errors, deviations, tool-related events).
- Links and Gates (gateways for controlling the merging or splitting of actions according to "And" and "Or" logic, connectors between activities, gates, etc.). Gateways are used to control the divergence and convergence of the sequence flow and the critical decision points where deviations can occur; links are used to show the order in which activities are supposed to be performed in a process; links can also represent references to message flows and associations.
- Information connected to the task (performance shaping factors, etc.). This information does not have a direct effect on the sequence flow or message flow of the process, but provides useful indications about what data may be required to perform the activity, what influences the activity and/or what the activity can produce.
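To make these categories concrete, the following is a minimal sketch (Python; all class and field names are illustrative assumptions, since the paper does not publish the actual Virthualis data model) of how activities, events, gates and links could be represented:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class GateLogic(Enum):
    AND = "and"  # parallel split/merge of the sequence flow
    OR = "or"    # branching at critical decision points

@dataclass
class Node:
    node_id: str
    label: str

@dataclass
class Activity(Node):
    atomic: bool = True                              # atomic task vs. compound sub-process
    actors: List[str] = field(default_factory=list)  # job roles involved in the step
    psf: List[str] = field(default_factory=list)     # performance shaping factors

@dataclass
class Event(Node):
    kind: str = "start"  # start, message, completion, error, deviation, tool-related

@dataclass
class Gate(Node):
    logic: GateLogic = GateLogic.OR  # controls divergence/convergence of the flow

@dataclass
class Link:
    source: str                    # node_id of the predecessor
    target: str                    # node_id of the successor
    is_message_flow: bool = False  # sequence link vs. message flow

# Example: a check task branches at an OR gate into the nominal flow
# and a deviation event (all content invented for illustration).
check = Activity("a1", "Check separator level", actors=["Field operator"])
gate = Gate("g1", "Level within limits?")
nominal = Activity("a2", "Resume nominal procedure")
deviation = Event("e1", "Level alarm not noticed", kind="deviation")
flow = [Link("a1", "g1"), Link("g1", "a2"), Link("g1", "e1")]
```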
2.1 Description of the Tool Developed

The "Task Modeler" (TM) uses different Graphical User Interfaces (GUIs) for editing and visualizing the collected information in different formats. The representation of the task model can be input, following a guided structured interview, through two editors: a graphical one and a tabular one. The visual representation documents the relationships between the tasks and sub-tasks making up a procedure, highlighting the workflow, actor interactions, parallelisms, and possible deviations. Every change made in one editor has to be simultaneously reflected in the others. For the first version of the TM, five processes were considered:

1. A safety analyst / human factors (HF) specialist imports, exports or creates a new task model;
2. A safety analyst / HF specialist performs a task analysis interview session in real time by filling in the interview template or the tabular task view;
3. A safety analyst / HF specialist alters a given task representation;
4. An industrial end-user views and prints a task model;
5. Any user retrieves different kinds of reports about the task model.
Fig. 1. Use case diagram illustrating some uses of the tool by the three different user groups
The Graphical User Interface developed includes five main sections:

1. Section 1, the "Case Study Wizard", allows the creation of a new case study and the definition of its properties (general information about the plant under study, the interview sessions, the operators involved, etc.), and it introduces the template for the structured interview.
2. Section 2 provides a tabular editor of the information collected through the template of the structured interview.
3. Section 3 provides a graphical view of the sequence of events, actions and messages in the task model. The sequence is synchronized with the information input through the tabular or the wizard view and can also be edited directly. The graphical interface provides a palette containing the symbols used for representing the task model (Business Process Modeling Notation) and a map facilitating the navigation of extended graphs.
4. Section 4 enables the analyst to query the information inserted.
5. Section 5 is shared with the Virthualis Platform and enables the user to explore and edit existing task models and their connected properties.

2.2 Detailed Description of TM Functions

The first section of the GUI is the case study wizard. As previously mentioned, this section has a first step in which it allows the analyst to create a new case study and define its properties (general information about the plant under study, the interview sessions, the operators involved in the analysis, the PSFs related to the case study, etc.). There are three different screens for the first step of GUI Section 1, each with a specific purpose.
Fig. 2. General information about the Case Study
Fig. 3. List of all the Performance Shaping Factors associated with the Case Study
Screen 1: To create and edit a new case study. An example is reported in Fig. 2. The "Id" is automatically assigned by the software. The fields to be filled in are:

a. Name of the case study;
b. Goal and description, specifying the main goal of the case study;
c. Plant involved;
d. The status of the plant (start-up, shut-down, normal operation, emergency operation, other);
e. The estimated task frequency (every day, every week, once a year);
f. The estimated expected time, in terms of a distribution and its parameters.
Screen 2 (Fig. 3) is used to visualize the Performance Shaping Factors (PSF) affecting the task, while Screen 3 is used to visualize the information about the operators ("Job Roles", e.g., CR operator, CR assistant), tools (e.g., gas detector, spanner), plant areas (e.g., control room), and plant components (e.g., pressure valve) inserted for the task. This form is filled automatically by the software, which collects the information from the nodes generated during the specification of the task steps.
Fig. 4. Example of a screen for the Interview Template
Fig. 5. Properties that can be specified for each node of the Task Model
The second step of GUI Section 1 deals with the interview templates (Fig. 4). It includes the template for the structured interview, with which the interviewer is able to go through the definition of all the steps of the normal flow of the task and their connected attributes during an interview session. This part has different screens for inputting the actual outline of the task. Specifically, the inputs required are:

a) The information about the case study trigger (a case study trigger can be any of the possible task nodes: an action/task, an event, or a gate);
b) The information associated with each task node.

The form is organized into two tabs, one for the general description and one detailing the properties. In this form it is possible to define the gate or rule specification in which the logic of the sequence of actions and events can be explicitly identified; Figure 4 shows the form to be filled in to input a new event, and the detail of the pop-up window used to select the conditions.
The second tab allows detailing the properties of the nodes (Fig. 5), that is:

- Expected duration of the step: it is possible to specify a time distribution;
- Actors involved (Job Roles): all the actors involved in the step;
- Tools, plant areas, and plant components involved in the considered step.

The second section of the GUI provides a tabular view of the information collected through the template for the structured interview. It also enables the analyst to modify this information, and the changes are synchronized with every other section. The information collected in the tabular view is exportable in a file format readable by MS Word, MS Excel, and XML. The third section provides a graphical view of the sequence of events, actions and messages forming the task model. This graphical representation enables the safety analyst / Human Factors expert to edit the task model graphically. The graphical interface provides a palette containing the symbols used for representing the task model (the Business Process Modeling Notation) and a map facilitating the navigation of extended graphs. It is possible to drag and drop a symbol from the palette into the drawing area to create a new node of the task. The map is updated every time the drawing area or the tabular view is edited. The properties of every graphical node or connection can be edited by opening the guided interview template for the selected node through a right-click of the mouse. The tabular view and the graphical view are visible from a unified window, as shown in Fig. 6.
Fig. 6. Graphical and Tabular view of the task model
GUI Sections 1, 2, 3 and 4 use the same data model, in accordance with the chosen Model-View-Controller (MVC) architectural pattern.
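As an illustration of how the shared MVC data model, the synchronized editors and the XML export could hang together, here is a minimal sketch (Python; class, method and tag names are hypothetical, not taken from the Virthualis implementation):

```python
import xml.etree.ElementTree as ET

class TaskModel:
    """Shared data model observed by all GUI sections (MVC)."""

    def __init__(self):
        self.nodes = {}   # node_id -> dict of properties
        self._views = []  # wizard, tabular, graphical and query views

    def register(self, view):
        self._views.append(view)

    def update_node(self, node_id, **props):
        # An edit made in any one editor updates the model, which then
        # refreshes every registered view, keeping them synchronized.
        self.nodes.setdefault(node_id, {}).update(props)
        for view in self._views:
            view.refresh(node_id)

    def to_xml(self) -> bytes:
        # Sketch of the XML export mentioned for the tabular view.
        root = ET.Element("taskModel")
        for node_id, props in self.nodes.items():
            el = ET.SubElement(root, "node", id=node_id)
            for key, value in props.items():
                ET.SubElement(el, "property", name=str(key)).text = str(value)
        return ET.tostring(root)

class TabularView:
    def refresh(self, node_id):
        print(f"refresh row for {node_id}")

model = TaskModel()
model.register(TabularView())
model.update_node("a1", label="Check separator level", duration="5 min")
```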
3 The Task Model and the Design of VR Experiments

The task analysis provides a structured description of the activities that permits a detailed profiling of the task, the context of work and the users' needs.
Table 2. Link between PSFs, task deviations and suggestions for VR experiments

Task deviation: Does the operator spot the leakage in the first 5 minutes?

- PSF: Clearness of roles/responsibility. Suggestion for VR experiment: in one experiment no one is clearly assigned to the task of checking for oil leakages, the set of tasks being shared among everybody.
- PSF: Accessibility/location of the leakage. Suggestion for VR experiment: test different positions for the leakage (first floor and second floor).
- PSF: Number of simultaneous goals. Suggestion for VR experiment: in one experiment the operator assigned to do the checking is also assigned many other tasks; in another he is assigned only one or a few compatible tasks.
However, task analysis is more focused on "what" the operators are expected to do than on "how" they actually perform their activities from their own perspective. Another method, based on user scenarios, seems more suitable for examining user activities from the visual perspective (e.g., location, gaze direction, affordances and constraints on movement) and for generating functional and usability specifications for virtual environments (VE). This information, however, can build on the fact that for each node of the task model (especially critical decision points), the important performance shaping factors and the tools to be used are recorded. They are linked to specific task deviations, and in relation to those deviations and to the need to observe how they influence the task, suggestions for the Virthualis simulation experiments can be derived. An example is reported in Table 2.
References
1. Kirwan, B., Ainsworth, L.K. (eds.): A Guide to Task Analysis. Taylor & Francis, London (1992)
2. Diaper, D., Stanton, N.A.: The Handbook of Task Analysis for Human-Computer Interaction. Lawrence Erlbaum Associates, Mahwah (2004)
3. Van Welie, M., Van der Veer, G.C.: Groupware Task Analysis. In: Hollnagel, E. (ed.) Handbook of Cognitive Task Design, pp. 447–476. Lawrence Erlbaum Associates, New Jersey (2003)
4. Kontogiannis, T.: A Petri Net-based approach for ergonomic task analysis and modelling with emphasis on adaptation to system changes. Safety Science 41, 803–835 (2003)
5. Chitnis, M., Tiwari, P., Ananthamurthy, L.: UML Overview. Tutorial, http://www.developer.com/design/article.php/1553851
6. Jung, W.D., Yoon, W.C., Kim, J.W.: Structured information analysis for human reliability analysis of emergency tasks in nuclear power plants. Reliability Engineering and System Safety 71, 21–32 (2001)
7. Annett, J., Duncan, K.D.: Task analysis and training design. Journal of Occupational Psychology 41, 211–221 (1967)
8. TaskArchitect software, http://www.taskarchitect.com
9. Card, S.K., Moran, T.P., Newell, A.: The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates (1983)
10. Baron, M., Lucquiaud, V., Autard, D., Scapin, D.L.: K-MADe: un environnement pour le noyau du modèle de description de l'activité. In: Proceedings of the 18th French-speaking conference on Human-Computer Interaction, Montreal, Canada, April 18–21 (2006)
11. K-MADe, http://kmade.sourceforge.net/
12. Groupware Task Analysis, http://www.cs.vu.nl/~martijn/gta/
13. Micro Saint, http://www.maad.com/index.pl/micro_saint
14. IBM Task Modeler, http://www.alphaworks.ibm.com/tech/taskmodeler/download
15. White, S.A., Miers, D.: BPMN Modeling and Reference Guide. Future Strategies Inc. (2008)
An Investigation of Function Based Design Considering Affordances in Conceptual Design of Mechanical Movement

Ying-Chieh Liu1 and Su-Ju Lu2

1 Department of Industrial Design, Chang Gang University
[email protected]
2 Department of Digital Technology Design, National Taipei University of Education
[email protected]
Abstract. Using the concept of affordances can lead the designer to consider the user's possible actions during design activities, which is increasingly important in many design cases. This paper proposes a model that attempts to incorporate the concept of affordances into function based design in the conceptual design of mechanical movement. The role of affordances in the initial design process is to consider the user's possible actions on the solution in the environment. A simple example of door latch design is presented to show how affordances can support the divergent and convergent design activities. Keywords: function based design, affordances, mechanical movement, conceptual design, engineering design process.
1 Introduction

Functions and functional decomposition have been used widely for designing artifacts, particularly in the area of mechanical and product design [1, 2, 3]. The function of a product is what the product is expected to perform [3]. Function-based design is the generic term used by researchers and practitioners in this realm who emphasize that form should follow function: the 'shape-giving' of a product is derived after a series of function-related activities, such as developing function structures [1]. Function is identified from the design problem. Function itself has been defined in different ways; one common definition represents function as an isolated input-output transformation within a "black box", where a function structure is developed and then mapped or transformed into form by a series of approaches. However, no generally agreed definition of function has emerged so far. Activities following this type of design show potential for innovative design and for situations where design problems are ill-defined or only partially recognized [4]. However, functional requirements distilled to represent user requirements might often
ignore the user's potential reactions after the solution is embodied. To complement this device-centric perspective, the concept of affordances is considered here. The term affordances, originally derived from Gibson [5], has become popular within the design community. This is particularly true in the user-centric design community, mostly through the introduction given in the book "The Psychology of Everyday Things" [6]. One challenge of function-based design is that potential positive functions and negative functions might not be identified during the design process. This is addressed by Brown and Blessing [7], who note that the role of affordances in the design process is significant in complementing function based design. Therefore, this paper proposes a refined model that attempts to incorporate the concept of affordances into a function-based design model in the initial design process. An illustrative example is given to show how the concept of affordances is used in this model.

1.1 The Concept of Affordances

Gibson [5] defined affordance as follows: "the affordances of the environment are what it offers the animal, what it provides or furnishes, either for good or ill. … It implies the complementarity of the animal and the environment." According to Gibson, examples of the environment include terrestrial surfaces, other animals, air, water, and solids; one example is "air affords breathing". The concept of affordances is discussed as a complementarity of two things, such as "A" and "B". The complementarity covers a range of perspectives, such as design for X, recognition, etc. Examples explained for everyday object design by Norman [6] are "a chair affords for sitting" and "glass is for seeing through". The two things here are "the user" and "the object", and the complementarity focuses on the usability of the object. For the concept of affordances applied in artificial intelligence, one example is "how to design robots that recognize affordances in their environments" [8]. The two things in robot design are "the robot" and "the environment" (which may include many objects); this perspective of complementarity emphasizes recognition-related capability. The object, in some parts of the design community, is treated as an artifact, as in Maier and Fadel [9]. Good discussions of different definitions of affordances can be found in You and Chen [10], Torenvliet [11], and McGrenere and Ho [12].
2 The Design Model

The design model proposed in this paper is shown in Figure 1. Solution alternatives are consolidated from an abstract textual representation towards a concept sketch that fulfills a few qualitative functional requirements, following multiple divergent and convergent steps. This is a refinement of the model shown in Liu et al. [13]. The refined model has the following characteristics.

Separate Divergence and Convergence. Two major types of design activities are divergence and convergence. In the divergent step, a wide range of possible alternatives is generated. These alternatives meet one or some of the requirements, and can be refined and embodied later. In the convergent step, solution alternatives need to be screened, scored, and refined at the earliest possible moment; otherwise the number of proposals to consider will continue to grow. The purpose of separating divergence and convergence is to make each step explicit and manageable in a disciplined manner.
Fig. 1. The design model with the multiple divergent and convergent steps (the figure shows the alternation of divergence and convergence across function based design, affordance for divergence, affordance for convergence, shape-giving and shape-refinement)
Iteration. The design activities are seen as an iteration of divergent and convergent activities. This reduces the risk inherent in situations where the designer may not fully understand the design problem, the solution alternatives, the user's needs or the environment in which the solution will be used. To resolve this, a constant review of these is helpful.

Multiple Abstraction Levels of Concepts. It is important to generate a broad range of concepts (e.g., hundreds of concepts) so that better or optimal concepts are not overlooked. The challenge is that a sudden expansion of the number of solutions can be overwhelming. An alternative is to increase the number of solutions gradually by means of multiple abstraction levels of concepts. Concepts are represented from abstract towards detailed as more design parameters (e.g., energy, materials, orientations, direction, or others) are considered.

Reduced Number of Alternatives. The final selected concept is derived through different steps of synthesis, mixed with a series of narrowing-down processes, in which the total search space is noticeably reduced as the solutions become more and more concrete.

Tactics for Divergence. Tactics for generating solutions include decomposition, transformation, classification of the search space, the morphological matrix, etc. Decomposition subdivides a function into its respective sub-functions, and these further still, so as
to develop a hierarchical tree. Transformation links the function to possible perceived sketches, i.e., alternative concepts, through investigation of archives, consulting experts, searching existing products, conducting creative methods, etc. Classification of the search space provides a systematic way to explore the potential concepts in a specific domain. In a morphological matrix, the sub-functions are listed in a column of the matrix, and the alternative concepts for each sub-function are placed in the adjacent rows. The total number of theoretically possible combinations is equal to the product of the numbers of concepts for each sub-function. More detailed explanations of the relevant tactics can be found in Cross [14], Ullman [15], and Ulrich and Eppinger [2].

Tactics for Convergence. Tactics for converging on solutions include screening and scoring. Given a list of design criteria (e.g., functionality and manufacturability), one concept is selected as the datum or reference concept. For each criterion, each concept is marked as better, worse, or about the same as the datum; a broad range of concepts is thus quickly screened for further scoring. Importance weights are then given to each criterion, and each selected concept is rated as unsatisfactory, moderate, or good using an ordinal scale such as 0, 1, 2, 3, or 4. Concepts are thereby reviewed from a quantitative perspective, and consensus or a common understanding of the important design issues can also be reached. Again, detailed explanations of the relevant tactics can be found in Cross [14] and others. (A small worked sketch of these tactics follows Fig. 2 below.)

2.1 Affordances of the Model

This model currently considers only functionality; the use of affordances to support the divergent and convergent activities is discussed below.

Affordances for Convergence. In the form-giving activity, the solution alternatives are derived from one or a few requirements, and the rest of the requirements still need to be considered. Solution alternatives are selected or, otherwise, require refinement to the extent that they afford these requirements. This activity is carried out through the user's manipulation (i.e., possible actions). Some solution alternatives are refined and thus afford the requirements; the others are excluded from further refinement and embodiment.

Affordances for Divergence. Once form is given, the designer can explore two types of functions with the concept of affordance. The first type is to explore additional needed functions. The possible manipulation of the physical object can uncover functions that were not thought of beforehand, and this kind of discovery can add additional uses to the design. For example, a pair of scissors affords for cutting; however, through the user's physical interaction or reasoning, further possible uses of the scissors can be identified, as shown in Figure 2. The second type is to explore undesired functions or side effects from the perspective of "what could go wrong". This can be useful when considering design for error or design for safety. Taking the scissors example again, they afford for punching, which leads to the possibility of hurting the user or others.
Fig. 2. Examples of the affordances of a pair of scissors
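The convergence tactics described in Section 2 (counting morphological combinations and weighted concept scoring) can be illustrated with a minimal sketch (Python; the sub-functions, concepts, weights and ratings are invented for illustration, not taken from the paper):

```python
from math import prod

# Morphological matrix: each sub-function maps to its alternative concepts.
matrix = {
    "transmit force": ["lever", "gear", "cam"],
    "guide motion": ["slot", "rail"],
    "return to rest": ["spring", "counterweight"],
}
# Total combinations = product of the number of concepts per sub-function.
n_combinations = prod(len(alts) for alts in matrix.values())  # 3 * 2 * 2 = 12

# Weighted scoring: importance weights per criterion, ordinal ratings 0-4.
weights = {"functionality": 0.6, "manufacturability": 0.4}
ratings = {
    "concept A": {"functionality": 3, "manufacturability": 2},
    "concept B": {"functionality": 4, "manufacturability": 1},
}
scores = {
    name: sum(weights[c] * r for c, r in rating.items())
    for name, rating in ratings.items()
}
best = max(scores, key=scores.get)  # A scores 2.6, B scores 2.8 -> concept B
```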
3 The Simplified Illustrative Example

In our model, possible embodiments (Figure 3(b)) are generated from one (combined) spatial configuration of the topological solution (Figure 3(a)). The abstraction level of the possible embodiments here is that of generic physical embodiments. Solutions are generated through the levels of text, spatial configuration and, finally, embodiment. A detailed description of how solutions are generated and represented is given in Liu et al. [13]. To use the concept of affordance, the model considers three factors: user, solution, and environment. The user is the individual who will eventually use the designed device. The solution is derived as the designer comes up with a set of solution alternatives through various abstraction levels of representation; these are considered to be the preliminary form of a final designed device, which will exist in the world. The environment refers to the rest of the world that matters to the users. Four types of relations can be discussed with the concept of affordances: user-solution, user-solution-environment, solution-user, and environment-solution-user. Their definitions are given in Table 1.

Table 1. The types of relations considering the concept of affordances
User-Solution: the user's possible actions upon the solution.
User-Solution-Environment: the user's possible actions upon the solution when also considering the environment.
Solution-User: the affordances of the solution that could afford for the user.
Environment-Solution-User: the affordances of the solution, under the conditions of the environment, that could afford for the user.
Therefore, a refined description considering affordances adds the perspectives of the user and the environment, as shown in Table 2. One example of the "Solution-User" type is shown in Table 3. This table describes how the designer first follows function-based design to come up with the solution. This solution should have the affordances shown in the first part of the table. The designer, adopting the perspective of "what could go wrong", can then examine the statements described in the second part. Ideally, the solution is refined so that positive affordances are maximized and negative affordances are minimized. A further example is shown in Table 4.
Fig. 3. The possible physical embodiments of the solution: (a) one spatial configuration of the topological solution (lever, wedge and tie-rods); (b) four possible physical embodiments of (a)

Table 2. Description of a door latch example with functional reasoning considering affordances
Simplified design problem (function based design representation): the design problem, among other things, has the input pointing downward and requires a leftward output (in the original, this row is illustrated by a sketch showing the input and output offsets along the i, j and k axes).
User (considering affordances): the user has the goal of opening the door, with the intention that his/her hand is raised at an appropriate angle at a certain distance from the door.

Environment (considering affordances): the environment that matters is the door frame. The door affords for installing the device.
Table 3. Description of a door latch example with affordance in the convergent step
Designer's perspective, with uncertainty statements described with the concept of affordance:

Solution-User (what should go right):
• The handle affords for pushing towards the door and pulling towards the user.
• The handle affords for the releasing of the user's hand.
• The handle affords for reversing to the original position.
• The locking bar affords for returning to the original position.

Solution-User (what could go wrong):
• The door affords for blocking the user's hand.
• The handle affords for over-returning past the original position.
• The locking bar affords for over-returning past the original position.

Table 4. Description of the solution considering user-solution-environment
Designer's perspective: User-Solution-Environment. Desired behavior: to fulfill the user's initial plan, the user will carry out several steps. The first step is to push the latch downward at a proper angle to unlock the door bar, then hold the latch and pull it towards the user. The second is to move and rotate the door. Thirdly, the user pushes the door back to its original state. Finally, the door latch is released and the door lock moves back to the original position. (In the original, this description is accompanied by a graphical representation.)
4 Discussion and Assessment

The potential use of the concept of affordance is to ensure that the user's needs are fully grasped by the designer. Affordances can help deal with situations in which the designer may not fully understand the design problem, the solution alternatives, the user's goals and intentions, or the surrounding environment. To ensure that the designer's solution is what the user desires, and will thus fit the specific environment, we believe a constant review using the concept of affordances is necessary. Having generated a potential physical embodiment, the designer analyzes the current generic embodiment against the user's intentions and possible actions in the specific environment. Assessing the affordances of a design solution results in a search for the possible actions of the user with respect to the solution. Whether the user will perform a certain action on the solution (e.g., push downward on the handle) is a matter of possibility. The possibility can be low or high, depending on the user, the solution or the environment; an action with higher possibility has a higher chance of happening. The result of performing a certain action can be good or ill, with various impact levels. Each possible action can therefore be assessed, weighted, and prioritized. Possible actions linked to 'what might go wrong' and 'what could provide more' are listed for review. Quantifying the possibility of a certain action and its impact is useful for the designer, who can then focus on the actions with high possibility and high impact. Refinement of the solution should ensure that positive affordances are maximized and negative affordances are minimized, where possible. Preventive actions to mitigate or avoid problems can be considered before the design is completed. A recent approach, affordance based design, emphasizes analyzing concepts with respect to desired and undesired affordances early in the design process [9]. That work contributes a relational theory of design and the identification of six properties of affordance. Compared with that work, this paper attempts to use the concept of affordance to complement functional reasoning in design, rather than to develop a new design approach.
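A minimal sketch (Python; the actions and the 0–1 scales are illustrative assumptions, not taken from the paper) of how possible actions could be assessed, weighted and prioritized by possibility and impact:

```python
# Each possible user action is rated for possibility (how likely the user is
# to perform it) and impact (how good or ill the outcome is), both on 0-1.
actions = [
    {"action": "push handle towards door", "possibility": 0.9, "impact": 0.2},
    {"action": "use device for punching", "possibility": 0.2, "impact": 0.9},
    {"action": "over-return handle past rest position", "possibility": 0.5, "impact": 0.6},
]

# Priority focuses the designer on high-possibility, high-impact actions.
for a in actions:
    a["priority"] = a["possibility"] * a["impact"]

for a in sorted(actions, key=lambda a: a["priority"], reverse=True):
    print(f'{a["action"]}: priority={a["priority"]:.2f}')
```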
5 Conclusion and Future Work

This paper has proposed a model for the conceptual design of mechanical movement. The model attempts to incorporate the concept of affordances as a complement to function based design. A simple example of door latch design was presented to show how affordances can support the divergent and convergent design activities. In the initial design process of function based design, affordances play a substantial role by bringing in the user's possible actions in the environment. Using the concept of affordances can lead the designer to consider the user's possible actions, which is important in many design cases. Future work includes exploring more design examples to evaluate and consolidate the model. It is also necessary to develop steps for identifying positive and negative functions through the concept of affordance; these steps should include guidance or methods for designers to explore the potential functions in a reasonable and prioritized manner. We believe the focus on complementarity, as well as the relations of user, solution
and environment, may need to be established first, so as to explore potential functions in a directed manner.
Acknowledgement. The work has been funded under grant number NSC 97-2511-S-152-001-MY3 by the National Science Council, R.O.C.
References
1. Pahl, G., Beitz, W.: Engineering Design: A Systematic Approach, 2nd revised edn. Springer, Heidelberg (1999)
2. Ulrich, K.T., Eppinger, S.D.: Product Design and Development, 4th edn. McGraw-Hill, New York (2008)
3. Eggert, R.J.: Engineering Design. Pearson Prentice Hall, New Jersey (2005)
4. Chakrabarti, A., Bligh, T.P.: A Scheme for Functional Reasoning in Conceptual Design. Design Studies 21, 493–517 (2001)
5. Gibson, J.J.: The Theory of Affordances and the Design of the Environment. In: The Symposium on Perception in Architecture, American Society for Esthetics, Toronto (1976)
6. Norman, D.A.: The Design of Everyday Things. Currency Doubleday, New York (1998)
7. Brown, D.C., Blessing, L.: The Relationship Between Function and Affordance. In: ASME 2005 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, Paper no. DETC 2005-85017 (2005)
8. Murphy, R.R.: Case Studies of Applying Gibson's Ecological Approach to Mobile Robots. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans 20(1), 105–111 (1999)
9. Maier, J.R.A.: Rethinking Design Theory. Mechanical Engineering 130(9), 34–37 (2008)
10. You, H.C., Chen, K.S.: Applications of affordance and semantics in product design. Design Studies 28, 23–38 (2007)
11. Torenvliet, G.: We can't afford it! The devaluation of a usability term. Interactions 10(4), 12–17 (2003)
12. McGrenere, J., Ho, W.: Affordances: clarifying and evolving a concept. In: Proceedings of Graphics Interface 2000, Montreal, Quebec, Canada, pp. 179–186 (2000)
13. Liu, Y.C., Chakrabarti, A., Bligh, T.P.: Towards an Ideal Approach for Concept Generation. Design Studies 24, 341–355 (2003)
14. Cross, N.: Engineering Design Methods, 3rd edn. John Wiley & Sons, New York (2000)
15. Ullman, D.G.: The Mechanical Design Process, 3rd edn. McGraw-Hill, New York (2002)
CWE: Assistance Environment for the Evaluation Operating a Set of Variations of the Cognitive Walkthrough Ergonomic Inspection Method

Thomas Mahatody, Christophe Kolski, and Mouldi Sagar

LAMIH – UMR 8530, University of Valenciennes and Hainaut-Cambrésis, Le Mont-Houy, F-59313 Valenciennes Cedex 9, France
{thomas.mahatody,mouldi.sagar,christophe.kolski}@univ-valenciennes.fr
Abstract. In spite of the existence of several usability inspection methods, they are still insufficiently used because of various constraints, such as the absence of software environments to facilitate their use and the lack of guides for choosing the best method to use. This article describes the design of CWE (Cognitive Walkthrough Environment), an evaluation assistance environment exploiting the Cognitive Walkthrough inspection method. This environment is intended to facilitate the use of the CW method as well as several of its versions and extensions. Keywords: Cognitive Walkthrough, Inspection, Evaluation, Usability.
1 Introduction

Cognitive Walkthrough (CW) was first proposed in 1990 by Lewis and his colleagues to evaluate "walk up and use" interactive systems [12]. The method is based on a theory of learning by exploration. It interests many researchers and practitioners, and has been the subject of many studies and experiments. Indeed, we have found and analyzed a dozen versions and extensions of the method; a synthesis was proposed in [15,16]. The synthesis focuses on three aspects: theoretical, methodological, and technological. On the theoretical aspect, for example, we found various cognitive theories underlying the method; some are based on action theory [18]. According to this synthesis, CW evolutions and extensions do not have enough tools to assist evaluators. This confirms the criticism by some authors stating that the CW method is awkward and tedious to use [7,6]. Consequently, we directed our research towards the specification and prototyping of an assistance environment exploiting a set of versions and extensions of the Cognitive Walkthrough method. This environment is called CWE (Cognitive Walkthrough Environment). First, we describe the context of our research, devoted to the principle of the CW method and the various versions and extensions implemented in CWE. Then, we describe the design of CWE (use case diagram, software architecture, screen pages). Finally, a conclusion and perspectives close the article.
2 Research Context

Evaluation consists in detecting the problems involved in the learning and use of an interactive system and in proposing solutions to improve it. The evaluation must be based on the knowledge, behaviors and goals of the target users in relation to the system under evaluation. It is also necessary to take into account the environments from which the users interact with the system. There are several approaches to usability evaluation [17]: for example, (1) user-based evaluation; (2) model-based evaluation, which consists in modeling the users of the system to be evaluated in order to predict potential problems; and (3) inspection-based evaluation, where the evaluations are carried out by one or more evaluators of the system. Our research focuses on the CW method, which belongs to the family of inspection-based evaluation methods. Among such methods, heuristic-based ones (such as the heuristic evaluation method [17]) are distinguished from those based on cognitive models [21]. CW belongs to the latter category, which consists in simulating the cognitive process of the user's work according to a theoretical model. Indeed, CW is originally based on a theory of learning by exploration called CE+ [20], used to evaluate the ease of learning and use of an interactive system. CW is also interesting insofar as its basic principles can be combined with those of other user-based methods (as is the case with CW With Users [4]) or other heuristic-based methods such as Heuristic Walkthrough [24]. The CW principle, in its initial version, is the following: the task to be evaluated is broken down into a succession of actions which the user must carry out one after another. The evaluator then has a list of questions which help him or her focus on essential aspects of the interactive system.
3 CWE Design

Our background in analyzing and practicing several versions and extensions of CW over the last ten years (with hundreds of Master's-level subjects in ergonomics, computer science, multimedia and automation [5]) led us to specify CWE, which aims at facilitating the use of CW. The environment must assist the evaluators during the various evaluation stages. With reference to the CW evaluation process, the evaluation comprises three stages: the preparation of the evaluation, the inspection of the system, and the analysis of the results. Fig. 1 provides the UML use case diagram of the environment. We distinguish two types of actors: the supervisor and the evaluator. As the use case diagram suggests, the supervisor is the one who manages and supervises the evaluation process. He or she prepares the evaluation and analyzes the results, while the evaluators intervene during the inspection phase. They inspect the tasks chosen by the supervisor, with a method chosen by the supervisor from among the various CW versions and extensions proposed by CWE.
Fig. 1. UML Use Case Diagram
3.1 Phase 1: Preparation of the Evaluation

The preparation phase consists in defining the inputs of the evaluation and the contexts of use of the system. During this phase, the supervisor carries out the following activities: the specification of the system to be evaluated, the modeling of the end users of the system, the choice of the scenarios, and the analysis of the tasks to be evaluated. Fig. 2 gives a global view of the screen pages available for this.

Specification of the System to Be Evaluated. This activity consists in gathering information concerning the interactive system to be evaluated, according to a model. This model must provide the evaluators with information enabling them to become aware of the domain concepts and the field jargon. The activity is carried out using the "System" screen page (see Fig. 2), which has two tabs for specifying the system to be evaluated.
Fig. 2. Global view of the CWE screen pages
The first tab allows characterizing the system to be evaluated: entering the name and a textual description of the system. The domain, the motivation and the context of use of the system can also be described. This information can help the supervisor choose the evaluation method that the evaluators will have to use. In CWE, several types of motivation are proposed, such as motivations related to industrial products or to general-public products. Depending on the case, productivity and cost (industrial case) or design, esthetics and ease of use (general-public case) will be favored. This first tab also allows entering the features of the system and the aspects of each feature. A feature is a limited part of the system; an aspect of a feature is a way of making use of that particular feature [6]. The second tab allows compiling the specific terms of the domain. To describe them in the form of lexicons, we adopt the model suggested by [11]. This model describes a term by two types of description: the notion and the behavioural response. The notion (denotation) describes the symbol (WHAT), and the behavioural response (connotation) makes it possible to obtain information about the context (HOW, WHERE, WHEN, etc.). These lexicons, as well as the various pieces of information from the first tab, are at the disposal of the evaluators during the inspection phase (use case "Learn the entry of the evaluation").
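As a sketch of such a lexicon entry (Python; the example term is invented for illustration, following the notion/behavioural-response model of [11]):

```python
from dataclasses import dataclass

@dataclass
class LexiconTerm:
    symbol: str     # the domain term itself
    notion: str     # denotation: WHAT the symbol designates
    behaviour: str  # behavioural response (connotation): HOW/WHERE/WHEN it is used

# Hypothetical entry for a process-plant term.
term = LexiconTerm(
    symbol="blowdown valve",
    notion="Valve used to relieve pressure from a process vessel.",
    behaviour="Operated from the control room during an emergency shutdown.",
)
```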
Modeling of the Users. This activity is very important insofar as the CW method requires from the outset that the evaluators put themselves in the position of the real users of the system. The activity thus consists in gathering information concerning the real users, leading to a user model. This model must cover various aspects of the real users (as recommended in [2,23], for example): their knowledge and their background. This modeling uses the "User" screen page (see Fig. 2), which comprises two tabs. The first one is used to enter information concerning the users' general knowledge of computer science; this knowledge (Low, Medium or Good) is assessed from a general outlook, from the various platforms known (Personal Computer, Workstation, PDA, etc.) and from the various operating systems known. The second tab is used to enter the users' background with respect to the system to be evaluated: expertise (Beginner, Intermediate, or Expert), type of use (Discretionary or Mandatory), frequency of use (Low, Medium or High), etc.

Choice of the Scenarios. The choice of the scenarios must take into account the budget and the time available for the evaluations, as well as the importance of the various tasks supported by the system. It is thus necessary to choose scenarios that are as representative as possible, i.e., scenarios which cover as much as possible of the whole set of system functionalities. For that, the principle suggested by [6] is applied: a minimum number of scenarios covering the maximum of system features and aspects of these features has to be chosen. The "Scenario" screen page (see Fig. 2) is dedicated to this activity. It allows naming and describing each scenario, as well as the actors and resources concerned. Then, for each entered scenario, the features and aspects of features concerned are checked off. The supervisor can add or remove scenarios to determine those which need to be evaluated according to the principle mentioned above.

Task Analysis. Once the scenarios are chosen, their analysis is carried out: it is a question of describing them and breaking them down. The "Task" screen page (Fig. 2, and shown at full size in Fig. 3) allows describing a scenario by entering the information necessary for the description of an activity. According to [14], this information is that described in Fig. 3: goal, duration, frequency, etc. The "Task" screen page also makes it possible to break the scenarios down hierarchically in order to obtain a succession of actions. It is this succession of actions which will be evaluated by answering the questions of the method chosen by the supervisor.

The Choice of the Evaluation Method. This stage consists in choosing the evaluation method to be used from among the various versions and extensions available in CWE. It should be noted that CWE can contain several evaluation methods; the "Method" screen page (see Fig. 2) allows adding new ones. CWE supports the following methods: CW version 1 [12], CW version 2 [21], CW version 3 [27], Heuristic Walkthrough (HW) [24], Norman CW [22], streamlined CW [25], CW for the Web [1], CW With Users [4], Extended CW [9,10] and Distributed CW [3]. The "Choosing Method" screen page (see Fig. 2) proposes the list of the methods known to the environment. The supervisor can choose a method directly or ask the system to propose one adapted to the system to be evaluated, according to a set of characteristics. For example, for an industrial or commercial application, productivity is paramount and is often privileged over ease of use, while for a domestic application ease of use can be paramount for marketing the product. In the latter case, CWE can propose an evaluation method based on the CE+ theory of learning by exploration [20] (like CW in its first version).
Fig. 3. Task analysis screen page
3.2 Phase 2: Inspection

The task inspection phase comprises two main stages: evaluation handover and evaluation.

Evaluation Handover. This stage allows the evaluators to familiarize themselves with the domain of the system to be evaluated. For that, a "Learning" screen page (see Fig. 2) proposes four tabs, which allow exploring the information concerning the system and its users that the supervisor entered during the preparation phase. The first tab gives information on the system to be evaluated: its various characteristics and functionalities, as well as the various terms used in the domain of the system. This allows the evaluator to interact with the system while understanding the various messages it provides. The second tab gives access to information about the users (knowledge, background, etc.); evaluators who have this information in mind can more easily simulate the cognitive processes followed by the future users. The third and fourth tabs give information about the scenarios and the various actions to be evaluated. During this stage, the evaluators can also explore the tutorials proposed by CWE in the "Help" menu; these tutorials concern both the CW versions and extensions and CWE itself.

Evaluation. The second stage concerns the evaluation of each task using the selected evaluation method. In our approach, each method comprises a set of questions which can be divided into four categories: questions to answer before the execution of the action, questions to answer during the execution of the action, questions to answer after the execution of the action, and questions to answer if the task is of a co-operative kind (based on the CW extension called Groupware Walkthrough [19]).
Fig. 4. Confrontation of results obtained by several evaluators (extract)
This categorization will be useful for possible comparative studies of the evaluation methods in the results analysis phase. A CW extension can also be associated with other methods: for example, HW corresponds to the third CW version associated with a heuristic evaluation method, and CW With Users corresponds to the third CW version associated with a think-aloud evaluation method. The evaluation of a task consists in evaluating each action related to the task by answering each question suggested by CWE, using the "Inspection" screen page (see Fig. 2). According to our model, the evaluator must link the action to its context, then answer each question positively or negatively while justifying the answers. CWE records the answers and the justifications provided by the evaluator. A negative answer signals the existence of usability problems, which the evaluator must also record by providing the following information: the number of the problem, its description, a justification, the percentage of users who may encounter the problem, its frequency (low, often, very often, or always), its gravity (Tolerable, Moderate, Serious or Critical), and possible solutions to solve it.
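A minimal sketch (Python; names and signatures are hypothetical, not CWE's actual implementation) of the recording logic described above, in which a justified negative answer yields a usability problem record:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Answer:
    question: str
    positive: bool
    justification: str  # CWE requires every answer to be justified

@dataclass
class UsabilityProblem:
    number: int
    description: str
    justification: str
    users_affected_pct: float  # % of users who may encounter the problem
    frequency: str             # low / often / very often / always
    gravity: str               # Tolerable / Moderate / Serious / Critical
    solutions: List[str] = field(default_factory=list)

def record_answer(answer: Answer, problems: List[UsabilityProblem],
                  next_number: int) -> Optional[UsabilityProblem]:
    # A positive answer is simply recorded; a negative answer signals a
    # potential usability problem, to be completed by the evaluator.
    if answer.positive:
        return None
    problem = UsabilityProblem(
        number=next_number,
        description=f"Problem revealed by question: {answer.question}",
        justification=answer.justification,
        users_affected_pct=0.0,  # to be estimated by the evaluator
        frequency="low",
        gravity="Tolerable",
    )
    problems.append(problem)
    return problem
```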
3.3 Phase 3: Analyzing Inspection Results

The objective of the results analysis is to produce a usability problem description report. For that, the supervisor uses the data provided by the evaluators during the inspection phase. CWE produces in particular the following dashboards: summary of the results, confrontation of the results obtained by the evaluators, confrontation of the results obtained using different methods, synthesis of the usability problems, confrontation of the usability problems found per evaluator, and confrontation of the usability problems found using different methods. From these various dashboards and from the information relating to the system, the users, the tasks and the evaluators, the supervisor can carry out the following activities to produce relevant evaluation reports (according to the recommendations of [8]): detect the false alarms, detect the evaluators' errors, evaluate the evaluators' biases, evaluate the severity of each problem, detect conflicting recommendations, and examine the problems at several abstraction levels. The table in Fig. 4, for example, allows detecting some evaluators' errors by examining the justifications provided.
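For illustration, here is a minimal sketch (Python; the data and field names are invented) of how recorded problems could be grouped for the confrontation dashboards:

```python
from collections import defaultdict

def confront_by(problems, key):
    """Group recorded problem numbers by evaluator or by method."""
    groups = defaultdict(list)
    for p in problems:
        groups[p[key]].append(p["number"])
    return dict(groups)

# Hypothetical inspection results from two evaluators using two methods.
problems = [
    {"number": 1, "evaluator": "E1", "method": "CW v3", "gravity": "Serious"},
    {"number": 1, "evaluator": "E2", "method": "HW", "gravity": "Moderate"},
    {"number": 2, "evaluator": "E1", "method": "CW v3", "gravity": "Tolerable"},
]
by_evaluator = confront_by(problems, "evaluator")  # {'E1': [1, 2], 'E2': [1]}
by_method = confront_by(problems, "method")        # {'CW v3': [1, 2], 'HW': [1]}
```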
4 Conclusion and Prospects

We have described in this article the design of CWE, an assistance environment for the ergonomic evaluation of interactive systems. This environment exploits several versions and extensions of the Cognitive Walkthrough method, and covers the three main phases of an evaluation. CWE is under development and test. It allows modeling various versions and extensions, including versions and extensions with two passes, such as Heuristic Walkthrough [24] or Cognitive Walkthrough With Users [4]. It is an open system which allows adding other CW versions or extensions. CWE should, in the medium term, contribute to the improvement of the methodological and technological aspects of CW. On the methodological aspect, for example, our model requires each answer to be justified (in order to facilitate its later interpretation). Even if we do not yet have real results from evaluations, we can conclude that the preliminary objectives have been achieved, such as facilitating the use of CW versions and extensions, and enabling the analysis of evaluation results in light of various characteristics of the system, evaluator profiles and evaluation methods. We plan thereafter to carry out comparative studies of interactive systems from different domains, evaluated by evaluators with different profiles using the various methods implemented in CWE. This will enable us to improve progressively the knowledge about CW and its versions and variants, with the aim, for example, of predicting which method fits best each type of interactive system.

Acknowledgements. The present research work has been partially supported by the "Ministère de l'Education Nationale, de la Recherche et de la Technologie", the "Région Nord Pas-de-Calais", the FEDER (Fonds Européen de Développement Régional) and the GDR E HAMASYT. The authors gratefully acknowledge the support of these institutions.
References
1. Blackmon, M., Polson, P., Kitajima, M., Lewis, C.: Cognitive Walkthrough for the Web. In: Proceedings of CHI, pp. 463–470. ACM Press, New York (2002)
2. Brusilovsky, P., Millán, E.: User Models for Adaptive Hypermedia and Adaptive Educational Systems. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.) Adaptive Web 2007. LNCS, vol. 4321, pp. 3–53. Springer, Heidelberg (2007)
3. Eden, J.: Distributed cognitive walkthrough (DCW): a walkthrough-style usability evaluation method based on theories of distributed cognition. In: Proceedings of the 6th ACM SIGCHI conference on Creativity & Cognition, Washington, DC, USA, June 13-15 (2007)
4. Granollers, T., Lorés, J.: Cognitive Walkthrough With Users: an alternative dimension for usability methods. In: Proceedings HCI International, Las Vegas (2005)
5. Huart, J., Kolski, C., Sagar, M.: Evaluation of multimedia applications using inspection methods: The CW case. Interacting with Computers 16, 183–215 (2004)
6. Jacobsen, N.E., John, B.E.: Two Case Studies in Using Cognitive Walkthrough for Interface Evaluation. Carnegie Mellon University School of Computer Science Technical Report No. CMU-CS-00-132 (2000)
7. Jeffries, R., Miller, J.R., Wharton, C., Uyeda, K.M.: User interface evaluation in the real world: a comparison of four techniques. In: Proc. of the ACM CHI 1991 Conf. on Human Factors in Computing Systems, pp. 119–124. ACM Press, New York (1991)
8. Jeffries, R.: Usability problem reports: Helping evaluators communicate effectively with developers. In: Nielsen, J., Mack, R.L. (eds.) Usability Inspection Methods, pp. 273–294. John Wiley & Sons, New York (1994)
9. Kato, T., Hori, M.: Articulating the cognitive walkthrough based on an extended model of HCI. In: Proceedings HCI International, Las Vegas (2005)
10. Kato, T., Hori, M.: Beyond Perceivability: Critical Requirements for Universal Design of Information. In: 8th Annual ACM Conf. on Assistive Technologies, pp. 287–288. ACM Press, New York (2006)
11. Leite, J.C.S.D.P., Hadad, G.D.S., Doorn, J.H., Kaplan, G.N.: A Scenario Construction Process. In: Requirements Engineering, pp. 38–61. Springer, London (2002)
12. Lewis, C., Polson, P., Wharton, C., Rieman, J.: Testing a walkthrough methodology for theory-based design of walk-up-and-use interfaces. In: Proc. ACM CHI, pp. 235–242 (1990)
13. Lewis, C., Wharton, C.: Cognitive Walkthrough. In: Helander, M., Landauer, T.K., Prabhu, P. (eds.) Handbook of Human-Computer Interaction, pp. 717–732. Elsevier, Amsterdam (1997)
14. Lucquiaud, V.: Proposition d'un noyau et d'une structure pour les modèles de tâches orientés utilisateurs. In: Proceedings IHM 2005, Toulouse, France, pp. 83–90 (2005)
15. Mahatody, T., Sagar, M., Kolski, C.: Cognitive Walkthrough pour l'évaluation des IHM: synthèse des extensions et évolutions conceptuelles, méthodologiques et technologiques. In: Proceedings of IHM 2007, 19ème Conférence de l'Association Francophone d'Interaction Homme-Machine, Paris, France, November 13-15, 2007. International Conference Proceedings Series, pp. 143–150. ACM Press, New York (2007)
16. Mahatody, T., Sagar, M., Kolski, C.: Cognitive Walkthrough for HCI evaluation: basic concepts, evolutions and variants, research issues. In: Proceedings EAM 2007 European Annual Conference on Human Decision-Making and Manual Control, Technical University of Denmark, Lyngby (2007)
17. Nielsen, J., Molich, R.: Heuristic Evaluation of User Interfaces. In: Proceedings of the CHI Conference on Human Factors in Computing Systems, pp. 249–256 (1990)
18. Norman, D.A.: Cognitive engineering. In: Norman, D.A., Draper, S.W. (eds.) User Centered Systems Design: New Perspectives in Human-Computer Interaction, pp. 31–61. Lawrence Erlbaum Associates, Hillsdale (1986)
19. Pinelle, D., Gutwin, C.: Groupware walkthrough: Adding context to groupware usability. In: Proceedings of ACM CHI, pp. 455–462. ACM Press, New York (2002)
20. Polson, P.G., Lewis, C.H.: Theory-based design for easily learned interfaces. Human-Computer Interaction 5, 191–220 (1990)
21. Polson, P., Lewis, C., Rieman, J., Wharton, C.: Cognitive Walkthroughs: a method for theory-based evaluation of user interfaces. International Journal of Man-Machine Studies 36, 741–773 (1992)
22. Rizzo, A., Mandrigiani, E., Andreadis, A.: The AVANTI Project: Prototyping and Evaluation with a Cognitive Walkthrough Based on Norman's Model of Action. In: Proceedings of Designing Interactive Systems: Processes, Practices, Methods, & Techniques, pp. 305–309 (1997)
23. Robert, J.M.: Que faut-il savoir sur les utilisateurs pour réaliser des interfaces de qualité? In: Boy, G. (ed.) Ingénierie cognitive: IHM et cognition, pp. 249–283. Hermès Science Publications, Paris (2003)
24. Sears, A.: Heuristic Walkthroughs: Finding the problems without the noise. IJHCI 9, 213–234 (1997)
25. Spencer, R.: The Streamlined Cognitive Walkthrough method, working around social constraints encountered in a software development company. In: Proceedings of ACM CHI, pp. 353–359. ACM Press, New York (2000)
26. Spencer, R.: The streamlined cognitive walkthrough method, working around social constraints encountered in a software development company. In: Proceedings of the CHI 2000 Conference on Human Factors in Computing Systems, pp. 353–359 (2000)
27. Wharton, C., Rieman, J., Lewis, C., Polson, P.: The Cognitive Walkthrough Method: A Practitioner's Guide. In: Nielsen, J., Mack, R.L. (eds.) Usability Inspection Methods, pp. 105–140. John Wiley & Sons, New York (1994)
The Use of Multimodal Representation in Icon Interpretation

Siné McDougall1, Alexandra Forsythe2, Sarah Isherwood3, Agnes Petocz4, Irene Reppa5, and Catherine Stevens4

1 Bournemouth University, Fern Barrow, Poole, BH12 5BB, UK
2 Liverpool John Moores University, UK
3 School of Healthcare, University of Leeds, Leeds, LS2 9JT
4 University of Western Sydney, South Penrith DC, NSW 1797, Australia
5 Swansea University, Singleton Park, Swansea, SA2 8PP, UK
[email protected], [email protected], [email protected], {kj.stevens,a.petocz}@uws.edu.au, [email protected]
Abstract. Identifying icon functions differs from naming pictures in that strong semantic links between pictures and their names have been formed over a long period of time, whereas the meaning of icons often has to be learned. This paper examines the roles of icon characteristics such as complexity, concreteness, familiarity and aesthetic appeal in determining how easily icons can be learned and identified. The role of these characteristics is seen as dynamic, changing as the user learns the icon set. It is argued that the way in which users learn icon meanings is similar to the processes involved in language learning. Icon meanings are learned by drawing on rich multimodal representations which are the result of our world experience. This approach could lead to a better understanding of how multimodal information can be most usefully presented on interfaces.
1 Emerging Themes in Icon Research
Given that communication using icons is now commonplace, it is important to understand the processes involved in icon comprehension and the stimulus cues which individuals utilize to facilitate identification. Earlier research attempted to isolate individual key characteristics important in determining icon comprehension, such as concreteness [1, 2, 3] and visual complexity [4, 5]. However, recent work has uncovered a more multifaceted and dynamic picture in which (a) other characteristics such as familiarity, semantic distance and user appeal are important [6, 7, 8], (b) icon characteristics are closely inter-related [e.g. 6, 9], and (c) the importance of icon characteristics in determining identification changes as the icon set is learned [9]. At the root of effective iconic communication is the way in which the user constructs a relationship between the icon and the function it represents. This has received particular attention when auditory icons are used because, by their very nature, they are more ambiguous and creating an effective icon-referent mapping can be particularly difficult [10, 11, 12, 13].
2 The Effects of Learning on Predictors of Usability
One of the ways in which icons differ from pictures is that strong semantic links between pictures and their names have usually been forged over a long period, whereas the intended meaning of an icon often has to be learned. This means that the balance of characteristics which are important in determining usability, and the relationships between those characteristics, changes as users learn icon sets. Changes in the role of each icon characteristic, and its relationship to other characteristics, as learning takes place are now considered.

2.1 Concreteness
Concrete icons which use depictions of real objects and people allow individuals to use their knowledge of the everyday world in order to access meaning (cf. Fig. 1a and b with c and d) [2, 3, 14, 15, 16]. Stevens and Petocz [17] have shown that this holds true for auditory as well as visual icons. ‘Natural auditory indicators’ such as the sound of a fire engine siren to indicate ‘engine fire’ or the sound of coughing to indicate ‘dangerous levels of carbon monoxide’ allowed participants to use their knowledge of what these sounds normally indicate in order to infer their meaning. Inferring meaning was much more difficult when auditory icons were simple bursts of sound which bore no obvious relationship to the meaning they were indicating. Stevens and Petocz also found that the modality in which icons were presented was important. Individuals responded more accurately and effectively to concrete icons when they were visual than when they were auditory. This appeared to be because auditory icons are generally more ambiguous (e.g. the sound of coughing can indicate many things) and require more interpretation by the user. Until recently, concreteness was sometimes seen as an icon's most important property; however, research now suggests that the effects of concreteness on user performance are smaller than previously thought [9]. This is because only a limited number of functions can be represented concretely and getting a close fit between pictures and functions is not always easy. For example, naming the bottle in Figure 1e does not allow you to arrive at its intended meaning.
Fig. 1. Examples of different types of icons: (a) heliport, (b) slow processing, (c) zoom, (d) male, (e) blow moulding
Initial differences in performance between concrete and abstract icons are not long-lasting and diminish as users gain experience with icons [1, 7, 9]. Other icon properties therefore assume greater importance as learning progresses.

2.2 Semantic Distance
Semantic distance is a measure of the goodness-of-fit between the icon and its meaning. Isherwood and others [9, 18, 19] have shown that semantic distance is more important than concreteness in the initial stages of learning because it is an index of the closeness of icon-function relationships irrespective of whether the icons are concrete or abstract. This is illustrated in Figure 2. In a study carried out by McDougall, Curry & de Bruijn [20], three sets of icons were used to indicate functionality on a ‘general purpose vehicle’ designed to overcome obstacles in a road. The abstract and arbitrary icon sets are equally non-pictorial, using primarily shapes and arrows rather than real-world items to depict meaning. However, the abstract icons still allowed the user to draw inferences about what might be meant, while the arbitrary icons were randomly allocated to functions. Three groups of participants learned the icon sets. Those learning the concrete and abstract sets did equally well but those learning the arbitrary set lagged well behind. The latter finding mirrors the findings reported by Stevens & Petocz [17] for their ‘auditory symbolic’ icons, in which the meanings of abstract bursts of sound were only gradually learned by association.
Fig. 2. Examples of icons and functions (accelerate, dump load, push) for concrete, abstract and arbitrary icon sets
2.3 Familiarity
Superficially at least, it seems a truism that familiarity is an important factor in icon interpretation, since familiarity implies something which has already been learned. However, familiarity with icons can take a number of forms: our familiarity with what is depicted in the icon, with the relationship between the icon and its function, and with the function itself. For example, our familiarity with the icon-function
relationship allows us to identify the icon representing ‘male’ in Figure 1d more quickly and effectively than the icon representing ‘slow processing’ (Figure 1b), where we are familiar only with what is depicted. Thus, while it can be useful to exploit our familiarity with objects or environmental sounds, if doing so requires users to unlearn previously salient meanings it is likely to have much less utility [17]. Semantic distance is more important than icon familiarity in determining user performance initially, as we form relationships between icon and function. Once these have been learned, our familiarity with the icon and the icon-function relationship determines speed and accuracy of responding, since it is an index of ease of access to long-term memory [21].

2.4 Complexity
Visual complexity has been most closely associated with the search for, and perception of, icons in displays [4, 7]. The more complex the icons, the longer search times on an interface are likely to be. The effects of visual complexity appear to be long-lasting, with differences in performance between simple and complex icons remaining even after considerable training [7]. Because of its association with visual search, rather than icon identification, its effects on user performance were thought to be relatively independent of other icon characteristics. However, Forsythe's research [22] has shown that this is not the case. Visual complexity is particularly amenable to the use of metrics and a number of possible measures have been developed. These typically assess a combination of the number of lines and shapes out of which the stimulus is composed. However, Forsythe et al.'s research [22] has made it clear that these metrics are often biased by our familiarity with the stimulus (more familiar stimuli are perceived as being simpler). For example, in the metric developed by Garcia, Badre & Stasko [23], metric calculation is based on icon features including the number of closed and open figures as well as horizontal and vertical lines. Shapes we are familiar with are more likely to be seen as whole components within the figure, while unfamiliar shapes may be seen as a series of horizontal and vertical lines, so increasing the complexity value of the stimulus under scrutiny. This ‘hidden’ relationship between visual complexity and familiarity may account to some extent for the lack of robustness of visual complexity effects with respect to either pictures or icons [24, 25, 26, 27, 28, 29, 30, 31] and suggests that visual complexity may have more of a role to play in icon identification than was previously thought.
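The familiarity bias described above can be made concrete with a toy feature-count metric in the spirit of the measures discussed. The sketch below is illustrative only: the feature set and weights are assumptions chosen for demonstration, not the published Garcia, Badre & Stasko formula.

```python
from dataclasses import dataclass

@dataclass
class IconFeatures:
    closed_figures: int    # whole shapes perceived as single components
    open_figures: int      # arcs and unclosed outlines
    horizontal_lines: int
    vertical_lines: int

def complexity_score(f: IconFeatures) -> float:
    # Illustrative weighted feature count; the weights are assumptions.
    return (2.0 * f.closed_figures
            + 1.5 * f.open_figures
            + 1.0 * (f.horizontal_lines + f.vertical_lines))

# The same ink, parsed two ways: a familiar glyph seen as two whole
# shapes scores lower than an unfamiliar one seen as ten line segments.
print(complexity_score(IconFeatures(2, 0, 0, 0)))   # 4.0
print(complexity_score(IconFeatures(0, 0, 6, 4)))   # 10.0
```

Because the perceptual parse, rather than the raw ink, drives the counts, any such metric inherits the observer's familiarity with the component shapes.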
2.5 Aesthetic Appeal
Research to date for both pictures and icons has shown that there are close relationships between icon characteristics and perceived appeal [32]. Icons that are concrete, familiar and simple have greater appeal than those that are abstract, unfamiliar and complex [33, 34, 35, 36, 37, 38]. This evidence might suggest that appeal is simply an emergent property of ease of processing, i.e. when a stimulus is easily and fluently processed, this leads to a later positive attribution of this experience as ‘liking’ for that stimulus [39]. However, the research presented by McDougall, Reppa, Smith & Playfoot [40] suggests that aesthetic appeal has a more active part to play in determining user performance. When users are presented with appealing complex icons, performance is better than for unappealing complex icons. Although such effects were not apparent for simple icons, this shows that icon appeal can have a direct effect on user performance in a way that the user is unlikely to be conscious of. Such findings therefore go beyond the idea that, if users find an interface appealing, they are more likely to spend time learning it, so enhancing performance. Nor is appeal purely stimulus-driven. Users' underlying cognitive skills also appear to have subtle effects on aesthetic appeal: the direction in which users habitually read influences their preference patterns for icons [40, Study 3]. The trajectory of the effects of aesthetic appeal in determining usability as icons are learned is not yet known. What is clear, on the basis of the findings reported in this session, is that appeal is likely to affect usability both in our initial and later encounters with icon sets, and that its effects are likely to be closely linked to those of other icon characteristics such as concreteness, familiarity and visual complexity.
3 Utilizing Users' Multimodal Knowledge to Enhance Interpretation
While it is important to be able to describe the changing role of icon characteristics as users learn icon sets, contributions to this parallel session [17, 18] have made it clear that the key to enhancing usability is to enable the user to form meaningful connections between the icon and what it is referring to. The closer the icon-function connections formed, irrespective of modality, the more successful the icon is likely to be. What is less clear is the precise form that these connections take, the nature of the representations that underpin them, and the processes involved in learning them. Recent research in psycholinguistics and robotics has led to theoretical developments which may provide a key to understanding the semantic processing and representation underpinning multimodal displays and how they might be optimized. Grounded cognition theorists argue that semantic information is not stored and processed in an amodal system independent of our sensory systems [41] and that learning takes place by assimilating information multimodally [42]. Grounded cognition theories propose that memory contains many multimodal components from vision, audition, action, affect and language. Retrieving a memory therefore involves reactivating multimodal components together. Such approaches have their roots in early theories of situated action proposed by Gibson and others [43, 44] which have been influential in interface design practice. Barsalou [41] used the experience of sitting in a chair as an example of how multimodal information capture and subsequent retrieval might work. As we sit down in a chair, the brain captures information across the modalities and integrates it into a multimodal representation in memory. This representation will include not only how the chair looks, but also how it feels, the creaks it makes as one sits down, the action of sitting, and introspections of comfort and relaxation. Later, when knowledge is needed about a chair (e.g. a chair depicted in a concrete icon to indicate a seating area), our previous multimodal experiences are reactivated. Thus the perception of relevant objects or sounds in icons triggers affordances for action stored
in memory [45]. Therefore the extent to which icons are able to activate appropriate memory representations, with accompanying appropriate affordances, is an important determiner of usability. Plausible ways in which different types of icon-function relationships might be built are as follows:

(i) Direct relationships: These exploit existing memory representations very straightforwardly. Simply naming the content of the icon provides sufficient information to infer meaning (e.g. the use of a helicopter to indicate a heliport in Fig. 1a; the use of the sound of a fire engine siren to indicate fire).

(ii) Indirect relationships: In these cases the icon has an augmentative function, providing an indication of how the meaning might be inferred (e.g. slow processing in Fig. 1b, or the sound of coughing to indicate the build-up of noxious gases). Possible referents in memory, though rich in information, are less definite and more diffuse and therefore more prone to error when the icon-referent relationship is being learned. Abstract icons may benefit particularly if information about likely affordances is indicated in the icon (e.g. the abstract icons in Fig. 2).

(iii) Distant relationships: In these cases icon-referent relationships are learned by rote through convention. When initially encountered, the icon provides virtually no cues to helpful memory representations, except possibly that this is a type of symbolic information which must be learned and decoded (e.g. the arbitrary icons in Fig. 2; the use of tonal bursts of sound at set frequencies to provide warnings [17]). As the icon-referent relationships are learned they will engender situated memory representations, but these may never have the same depth or richness of information available to them as direct and indirect relationships.

(iv) Inappropriate relationships: When icon-referent relationships cue inappropriate and competing memory representations which cut across their intended meaning, they can be unhelpful. The extent to which this is the case will depend on the richness of the inappropriate memories triggered, and users will need to unlearn these associations (e.g. ‘blow moulding’ in Fig. 1e).

Using a grounded cognition approach, we would expect that the meaning of icons is learned in a similar way to the meaning of words and that the process of learning icon-function relationships could be characterized in the same manner. Given this assumption, it is not surprising that there appear to be strong parallels between children's language acquisition and our ease of learning different types of icon-referent relationships. Across languages, children's first words consist of object names relating to the items that surround them, such as food and toys. This is followed by verbs related to simple actions. Research has shown that the ease with which a word is learned depends upon its concreteness and the extent to which it refers to tangible real-world items [46]. Therefore, object names and action verbs are learned first because they are more directly observable and concrete. Later-learned words correspond to abstract concepts (e.g. method, justice) that are not directly grounded in experiences with our surroundings. These initial, imageable words directly grounded in physical and perceptual experiences serve as a foundation for the acquisition of abstract words and ideas that become indirectly grounded through their relations to those grounded words
(i.e. the use of metaphors with the real world). The capture of multiple stimulus attributes across different modalities to form rich memory representations also helps to explain the probabilistic and complex way in which multiple icon attributes contribute to icon usability and appeal. One of the difficulties in using multimodal interfaces has been our lack of understanding of precisely how each modality contributes to user experience and vice versa. Given that multimodal learning interfaces are now being built to examine the acquisition of language and meaning using mathematical algorithms [47], this may provide a rich source of information about just how multimodal information can be balanced to maximally enhance the information provided via interfaces to users.
References
1. Green, A.J.K., Barnard, P.J.: Iconic interfacing: The role of icon distinctiveness and fixed or variable screen locations. In: Diaper, D., et al. (eds.) Human-Computer Interaction – Interact 1990. Elsevier, North-Holland (1990)
2. Rogers, Y., Oborne, D.J.: Pictorial communication of abstract verbs in relation to human-computer interaction. British Journal of Psychology 78, 99–112 (1987)
3. Stotts, D.B.: The usefulness of icons on the computer interface: Effect of graphical abstraction and functional representation on experienced and novice users. In: Proceedings of the Human Factors and Ergonomics Society 42nd Annual Meeting, pp. 453–457. Human Factors and Ergonomics Society, Santa Monica (1988)
4. Byrne, M.D.: Using icons to find documents: Simplicity is critical. In: Proceedings of INTERCHI 1993, pp. 446–453 (1993)
5. Scott, D.: Visual search in modern human-computer interfaces. Behaviour and Information Technology 12, 174–189 (1993)
6. Forsythe, A., Mulhern, G., Sawey, M.: Confounds in pictorial sets: The role of complexity in basic-level picture processing. Behavior Research Methods 40, 116–129 (2008)
7. McDougall, S.J.P., de Bruijn, O., Curry, M.B.: Exploring the effects of icon characteristics on user performance: The role of icon concreteness, complexity, and distinctiveness. Journal of Experimental Psychology: Applied 6, 291–306 (2000)
8. Goonetilleke, R.S., Shih, H.M., On, H.K., Fritsch, J.: Effects of training and representational characteristics in icon design. International Journal of Human-Computer Studies 55, 741–760 (2001)
9. Isherwood, S., McDougall, S., Curry, M.: Icon identification in context: The role of icon characteristics and user experience. Human Factors 49, 465–476 (2007)
10. Reppa, I., McDougall, S.: Visual aesthetic appeal speeds processing of complex but not simple icons. In: Proceedings of the 52nd Annual Meeting of the Human Factors and Ergonomics Society 2008, pp. 1155–1159 (2008)
11. McKeown, D., Isherwood, S.: Mapping candidate within-vehicle auditory displays to their referents. Human Factors 49, 417–428 (2007)
12. Petocz, A., Keller, P., Stevens, C.: Auditory warnings, signal-referent relations and natural indicators: re-thinking theory and application. J. Exp. Psychol.: Applied 14, 165–178 (2008)
13. Stephan, K.L., Smith, S.E., Martin, R.L., Parker, S.P.A., McAnally, K.: Learning and retention of associations between auditory icons and denotative referents: implications for the design of auditory warnings. Human Factors 48, 288–299 (2006)
14. Arend, U., Muthig, K.-P., Wandmacher, J.: Evidence for global feature superiority in menu selection by icons. Behaviour and Information Technology 6, 411–426 (1987)
15. Stammers, R.B., George, D.A., Carey, M.S.: An evaluation of abstract and concrete icons for a CAD package. In: Megaw, E.D. (ed.) Contemporary Ergonomics 1989, pp. 416–421. Taylor & Francis, London (1989)
16. Stammers, R.B., Hoffman, J.: Transfer between icon sets and ratings of icon concreteness and appropriateness. In: Proceedings of the Human Factors Society 35th Annual Meeting, pp. 354–358. Human Factors and Ergonomics Society, Santa Monica (1991)
17. Stevens, K., Petocz, A.: The user knows: Considering the cognitive contribution of the user in the design of auditory warnings. In: Proceedings of HCI International 2009 (in press)
18. Isherwood, S.J.: Graphics and semantics: The relationship between what is seen and what is meant in icon design. In: Proceedings of HCI International 2009 (in press)
19. McDougall, S.J.P., Isherwood, S.J.: What's in a name? The role of graphics, functions, and their interrelationships in icon identification. Behavior Research Methods (in press)
20. McDougall, S.J.P., Curry, M.B., de Bruijn, O.: The effects of visual information on users' mental models: An evaluation of pathfinder analysis as a measure of icon usability. International Journal of Cognitive Ergonomics 5, 59–84 (2001)
21. Lambon-Ralph, M., Graham, K.S., Ellis, A.W., Hodges, J.R.: Naming in semantic dementia: What matters? Neuropsychologia 36, 775–784 (1998)
22. Forsythe, A.M.: Visual complexity: Is that all there is? In: Proceedings of HCI International 2009 (in press)
23. Garcia, M., Badre, A.N., Stasko, T.: Development and validation of icons varying in their abstractness. Interacting with Computers 6, 191–211 (1994)
24. Alario, F.-X., Ferrand, L., Laganaro, M., New, B., Frauenfelder, U.H., Segui, J.: Predictors of picture naming speed. Behavior Research Methods, Instruments & Computers 36, 140–155 (2004)
25. Humphreys, G.W., Riddoch, M.J., Quinlan, P.T.: Cascade processes in picture identification. Cognitive Neuropsychology 5, 67–103 (1988)
26. Ellis, A.W., Morrison, C.M.: Real age-of-acquisition effects in lexical retrieval. Journal of Experimental Psychology: Learning, Memory & Cognition 24, 515–523 (1998)
27. Barry, C., Morrison, C.M., Ellis, A.W.: Naming Snodgrass and Vanderwart pictures: Effects of age of acquisition, frequency, and name agreement. Quarterly Journal of Experimental Psychology 50A, 560–585 (1997)
28. Bonin, P., Peereman, R., Malardier, N., Méot, A., Chalard, M.: A new set of 299 pictures for psycholinguistic studies: French norms for name agreement, image agreement, conceptual familiarity, visual complexity, image variability, age of acquisition, and naming latencies. Behavior Research Methods, Instruments, & Computers 35, 158–167 (2003)
29. Cuetos, F., Ellis, A.W., Alvarez, B.: Naming times for the Snodgrass and Vanderwart pictures in Spanish. Behavior Research Methods, Instruments and Computers 31, 650–658 (1999)
30. Paivio, A., Clark, J.M., Digdon, N., Bons, T.: Referential processing: Reciprocity and correlates of naming and imaging. Memory & Cognition 17, 163–174 (1989)
31. Snodgrass, J.G., Yuditsky, T.: Naming times for the Snodgrass and Vanderwart pictures. Behavior Research Methods, Instruments, & Computers 28, 516–536 (1996)
32. McDougall, S., Reppa, I.: Why do I like it? The relationship between icon characteristics, user performance and aesthetic appeal. In: Proceedings of the 52nd Annual Meeting of the Human Factors and Ergonomics Society 2008, pp. 1257–1261 (2008)
33. Berlyne, D.E.: Studies in the New Experimental Aesthetics. Hemisphere, Washington (1974)
34. Jacobsen, T., Höfel, L.: Aesthetic judgments of novel graphic patterns: Analyses of individual judgments. Perceptual & Motor Skills 95, 755–766 (2002)
35. Bauerly, M., Liu, Y.: Computational modeling and experimental investigation of the effects of compositional elements on interface and design aesthetics. In: Proceedings of the 8th Annual International Conference on Industrial Engineering – Theory, Applications and Practice (2003)
36. Vartanian, O., Goel, V.: Neuroanatomical correlates of aesthetic preferences for paintings. Neuroreport 15, 893–897 (2004)
37. Kawabata, H., Zeki, S.: Neural correlates of beauty. Journal of Neurophysiology 91, 1699–1705 (2004)
38. Bornstein, R.F.: Exposure and affect: Overview and meta-analysis of research, 1968–1987. Psychological Bulletin 106, 265–289 (1989)
39. Reber, R., Schwarz, N., Winkielman, P.: Processing fluency and aesthetic pleasure: Is beauty in the perceiver's processing experience? Personality and Social Psychology Review 8, 364–382 (2004)
40. McDougall, S., Reppa, I., Smith, G., Playfoot, D.: Beyond emoticons: Combining affect and cognition in icon design. In: Proceedings of HCI International 2009 (in press)
41. Barsalou, L.W.: Grounded cognition. Annual Review of Psychology 59, 617–645 (2008)
42. Clark, A.: Being There: Putting Brain, Body, and World Together Again. MIT Press, Cambridge (1997)
43. Lakoff, G., Johnson, M.: Metaphors We Live By. University of Chicago Press, Chicago (1980)
44. Gibson, J.J.: The Ecological Approach to Visual Perception. Houghton Mifflin, New York (1979)
45. Glenberg, A.M., Schroeder, J.L., Robertson, D.A.: Averting the gaze disengages the environment and facilitates remembering. Mem. Cogn. 26, 651–658 (1998)
46. Gillette, J., Gleitman, H., Gleitman, L., Lederer, A.: Human simulations of vocabulary learning. Cognition 73, 135–176 (1999)
47. Yu, C., Ballard, D.H.: A multimodal learning interface for grounding spoken language in sensory perceptions. ACM Transactions on Applied Perception 1, 57–80 (2004)
Beyond Emoticons: Combining Affect and Cognition in Icon Design

Siné McDougall1, Irene Reppa2, Gary Smith1, and David Playfoot2

1 Bournemouth University, Fern Barrow, Poole, BH12 5BB, UK
2 Swansea University, Singleton Park, Swansea, SA2 8PP, UK
{smcdougall,i.reppa,gsmith}@bournemouth.ac.uk, [email protected]
Abstract. Recently there has been a shift in emphasis from interface usability to interface appeal. Very few studies, however, have examined the link between usability and appeal, and evidence regarding the direction of the relationship between the two remains equivocal. This paper examines the nature of the relationships between the usability and aesthetic appeal of icons. The findings from three studies presented here show evidence not only for the symbiotic relationship between aesthetic preference and performance, but also for possible causal links between the two. The implications of these findings for interface design and theoretical explanations of usability are discussed.

Keywords: Icon, Affective computing, Aesthetic preference, Performance, Usability.
1 Introduction
Hassenzahl & Tractinsky [1] recently pointed out that ‘user experience’, a term which previously would have referred exclusively to the usability of an interface, has broadened over the last decade to include our affective response to the interface. They note that ‘as technology matured, interactive products became not only more useful and usable, but also fashionable, fascinating things to desire’ (p. 91). As a result of this change in focus, research examining the ways in which the appeal of an interface can be enhanced has flourished [e.g. 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]. Despite the growing recognition that enhancing the aesthetic appeal of an interface may be just as important as improving its usability, the nature of the relationship between measures of usability (e.g. accuracy and response time) and appeal is poorly understood. In this paper we argue that knowing more about the nature of this relationship is important if we are to optimize the working relationship between usability and appeal in order to give added value and enhance interface design in a new way. Research has already shown that usability and appeal are inter-related in a general way. Kurosu & Kashimura [13] found that more aesthetically pleasing designs for an automated teller machine were perceived to be more usable. This finding has since been replicated by Tractinsky and his colleagues who, on the basis of their results, put forward the idea that ‘what is beautiful is usable’ [10, 14]. Hassenzahl's [6] research,
however, challenged this assumption. When he examined the perceived usability of MP3-player skins, he found that perceived usability became a strong determinant of goodness (i.e. satisfaction), particularly after the skins had been used for some time. This raises the possibility that our experience of a usable system affects our perceptions of its appeal – that ‘what is usable is beautiful’. However, it is worth noting that Hassenzahl also drew a distinction between ‘goodness’, ‘beauty’ and ‘hedonic attributes’, each of which had slightly different relationships with perceived usability. What these findings suggest is that the relationship between usability and aesthetics may be bi-directional. However, this hypothesis has not yet been tested. In studies examining aesthetic appeal the choice of stimulus is important. Some earlier researchers advocated the use of works of art or photographs since, they argued, this provided greater ecological validity [15]. Others advocated a much more experimental approach using novel stimuli with no pre-existing or obvious emotional valence, maintaining that the use of such stimuli ensured greater experimental validity [16]. More recently, research has attempted to combine both bottom-up and top-down approaches [17] and use stimuli which are experimentally tractable while being ecologically valid [e.g. 5, 18]. In the studies which we report here, icons were used as stimuli for two reasons. Firstly, icons form a communicative substrate for a wide variety of interfaces. Secondly, as experimental stimuli they are easily controlled and manipulated, not least because their characteristics have been identified and measured [19, 20]. Moreover, the effects of icon characteristics on performance are already well known [e.g. 21, 22]. The three studies reported below demonstrate the complex nature of the relationship between usability (performance) and appeal which is emerging from our laboratory. In the first study, the nature of the relationships between icon characteristics, user performance and aesthetic appeal is examined. The second shows that aesthetic appeal can influence performance (what is beautiful is usable?) and the third shows that our habitual ways of dealing with stimuli influence our perceptions of appeal (what is usable is beautiful?). The theoretical and practical implications of these findings are then discussed. The term ‘aesthetic appeal’ is used when participants' ratings of appeal are the data being considered, and ‘preference’ is used when dealing with participants' choices between stimuli.
2 Study 1: Examination of Parallels between Performance and Appeal

2.1 User Performance
The effects of visual complexity, concreteness and familiarity of icons on user performance are well documented and are as follows:

(i) Users are able to search more quickly for simple, rather than complex, icons in interface arrays (cf. Figure 1b and c with d and h) [22, 23, 24]. This is not surprising given what we know about visual search more generally [25].

(ii) Concrete icons can be interpreted more accurately and quickly, particularly when we are learning a new interface (cf. Figure 1a and b with c and d) [26, 27]. This is because we can use our knowledge of the everyday world and objects to
deduce what they might mean. However, the extent to which we can deduce icon functions on this basis is limited, since many functions cannot be represented concretely while maintaining a good fit between icon and function (cf. Figure 1a and b).
Fig. 1. Examples of icons: (a) heliport, (b) slow processing, (c) female, (d) rinse, (e) sound, (f) loudspeaker connection, (g) biohazard, (h) electric transmission
(iii) Our familiarity with what is depicted in the icon appears to be one of the most important determinants of user performance and encapsulates any effects observed for concreteness [21]. This is because even abstract icons can be understood quickly and effectively if we are familiar with the icon. For example, our familiarity with the icon representing ‘female’ in Figure 1c allows us to identify it more quickly and effectively compared with the icon representing ‘slow processing’ (Figure 1b). Recently, Forsythe, Mulhern & Sawey [28] have also shown that there is a correlation between ratings of icon familiarity and ratings of visual complexity, because familiar icons are perceived as being simpler.

Figure 2 summarises findings to date. Familiarity is an important determinant of user performance and encompasses the effects previously observed for concrete icons. Visual complexity also determines user performance via its role in visual search, but is also correlated to some extent with icon familiarity since icons appear simpler as we become more familiar with them.

2.2 Ratings of Aesthetic Appeal
Visual complexity, representativeness (concreteness) and familiarity also affect ratings of appeal as follows:

(i) Stimulus complexity has a significant influence on aesthetic appeal. This holds true for pictures [15], abstract shapes [29] and websites [5, 30].

(ii) Representational pictures (i.e. those which depict concrete objects) are preferred to abstract ones [31, 32].

(iii) Zajonc [33] and others have shown that even when we see objects only briefly, the appeal of these stimuli is increased relative to stimuli which have not previously been seen, and this appeal grows as we become more familiar with them [34].
Fig. 2. Relationships between icon characteristics (familiarity, concreteness, complexity) and user performance
Study 1 examined these remarkable parallels between the factors that determine performance and aesthetic appeal in more detail. Ratings of aesthetic appeal were obtained for a large corpus of 239 icons for which subjective ratings of complexity, concreteness and familiarity had already been obtained (thus providing a measure of each of these icon characteristics [19]). Forty participants (undergraduate or graduate students at the universities of Swansea and Bournemouth) were asked to rate the appeal of the icons on a 1–5 scale (1 = really dislike, 5 = really like). Our expectation was that, if appeal is truly related to performance, the relationships between icon characteristics and appeal would be the same as those observed for user performance in Figure 2 above. In accordance with previous findings, simple icons received higher appeal ratings than complex icons, concrete icons received higher ratings than abstract icons, and familiar icons were more appealing than unfamiliar icons. Regression analyses were carried out to examine predictive relationships between icon characteristics and appeal in more detail. These analyses revealed that familiarity, concreteness and visual complexity predict aesthetic appeal in the same intricate manner as that observed for user performance in Figure 2 above. Firstly, familiarity was the primary predictor of aesthetic appeal. Secondly, although icon concreteness was a strong predictor of appeal, it did not significantly predict variance in appeal once the effects of familiarity had been accounted for. Finally, visual complexity independently predicted significant amounts of the variance observed in aesthetic appeal, but its role in determining appeal also overlapped to some extent with familiarity.
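A stepwise (hierarchical) regression of this kind can be sketched as follows. The data are simulated so the snippet runs on its own; the coefficients used to generate them merely mimic the qualitative pattern described above and are not the study's results.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 239  # corpus size; the ratings themselves are simulated
fam = rng.uniform(1, 5, n)
conc = 0.7 * fam + rng.normal(0, 0.6, n)        # concreteness tracks familiarity
comp = 3.5 - 0.4 * fam + rng.normal(0, 0.7, n)  # familiar icons look simpler
appeal = 0.8 * fam - 0.3 * comp + rng.normal(0, 0.5, n)
icons = pd.DataFrame({"appeal": appeal, "familiarity": fam,
                      "concreteness": conc, "complexity": comp})

# Enter predictors stepwise to see what each adds beyond the previous step.
m1 = smf.ols("appeal ~ familiarity", data=icons).fit()
m2 = smf.ols("appeal ~ familiarity + concreteness", data=icons).fit()
m3 = smf.ols("appeal ~ familiarity + concreteness + complexity", data=icons).fit()
for label, m in [("familiarity", m1), ("+ concreteness", m2), ("+ complexity", m3)]:
    print(f"{label:>15}: R^2 = {m.rsquared:.3f}")
```

A negligible gain in R² when concreteness is entered after familiarity, together with a further gain from complexity, is the signature of the pattern reported.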
3 Study 2: Is It Usable Because It Is Beautiful?
Study 1 illustrated the complex nature of the symbiotic relationship between usability and appeal, expanding our knowledge from previous research which has suggested a
general correlation between appeal and usability. However, it did not establish a causal link between appeal and user performance. In Study 2 we set out to examine whether or not there was a causal relationship and, if so, what form it might take. In this study the visual complexity and aesthetic appeal of icons were varied orthogonally. Four sets of icons were created: simple icons which were either appealing or non-appealing (see Figure 1, e and f) and complex icons which were either appealing or non-appealing (see Figure 1, g and h). In addition, all four sets were matched on other icon characteristics (i.e. concreteness and familiarity). Fifteen participants were required to search for icons in an array in a task designed to mimic search for symbols on an interface, such as finding icons in cluttered computer desktop displays or selecting icons in software packages (see Figure 3).
Fig. 3. Example of display in visual search task
On the basis of previous research, we might have expected search times to be optimal when icons are both simple and appealing (i.e. additive effects). However, it was equally possible that the effect of appeal could be contingent on visual complexity (i.e. the beneficial effect of appeal on performance would only be apparent either when icons were simple or when they were complex, not both). Figure 4 shows that, in accordance with previous research, search times were faster for simple icons. However, search times for appealing and non-appealing simple icons did not differ. In contrast, response times to appealing complex icons were significantly faster than when they were not appealing. Performance was therefore contingent on the visual complexity of the icon. These findings are not entirely surprising given what we know of visual search. In visual search arrays, simple icons are more likely to ‘pop out’ in a visual display, as they are easier to process (i.e. due to the fewer features present in the stimulus). In this case any effect of aesthetic appeal is likely to be overridden by the ‘pop-out’ effect. In contrast, complex icons are less likely to ‘pop out’ in a search array (i.e. due to the greater number of visual features present in the stimulus). It is only in this case that aesthetic appeal has an effect on search performance, reducing search times for complex icons with high aesthetic appeal. This study shows that aesthetic appeal directly affects user performance (i.e. it is usable because it is beautiful) but that it is a contingent, rather than a simple, relationship.
Fig. 4. The effect of visual complexity and aesthetic appeal on performance (mean response time in ms for appealing and non-appealing icons, complex vs. simple)

Simple icons are usable, irrespective of appeal, at least where tasks require visual search. Complex icons are less usable, but the effects of visual complexity can be ameliorated if the icon is appealing. Again, this study serves to illustrate the need for a more detailed knowledge of the general relationships previously observed in order to be able to apply them effectively to interface design.
4 Study 3: Is It Beautiful Because It Is Usable?
This study also illustrates the subtle factors affecting the relationship between user performance and appeal. In contrast to Study 2, this study examined whether skills which are likely to affect usability affect our perceptions of appeal, i.e. what is usable is beautiful. Recent research has shown that aesthetic preferences can be affected by our habitual direction of reading, i.e. right-to-left for readers of Arabic and Hebrew or left-to-right for readers of English and other European languages. Chokron & De Agostini [35] recently found that French (left-to-right) readers preferred pictures which had most of their content on the right, while Israeli Hebrew (right-to-left) readers preferred pictures whose content was primarily on the left. Similar findings were reported by Heath et al. [36], although they also found that preferences were dependent on the extent to which individuals read in only one language: the tendency towards a rightward bias in English readers diminished with exposure to Arabic. The aim of Study 3 was to determine whether (a) the directional bias in appeal for picture content found in previous studies would be present when icon appeal was considered, and (b) the influence of icon characteristics differed depending on the orthography which individuals habitually used. Thirty-six right-handed participants took part in this study; half were right-to-left readers (Arabic) and the other half were left-to-right readers (English). Participants were presented with icons in pairs which were mirror-images of one another. One set of icons had a left-right emphasis and the other had a right-left emphasis. For example, Figure 1a shows a ‘heliport’ icon with right emphasis while Figure 1b shows a ‘slow processing’ icon with left emphasis. Each individual icon was shown paired with its mirror-image (e.g. the ‘heliport’ icon flipped to the left). Participants were asked to state which one they preferred. Our findings are
illustrated in Figure 5. The pattern of results was similar to that of Heath et al. [36]. English readers significantly preferred icons with right emphasis, in accordance with their reading direction. Arabic readers, however, did not show any significant preference for either left or right picture emphasis, apparently because our participants were bilingual Arabic-English speakers who habitually read in both directions. These findings suggest that the influences on aesthetic preference for interfaces are very subtle. While this study could be taken as evidence for the idea that ‘what is usable is beautiful’, some important caveats need to be borne in mind when reaching conclusions. The first is that reading is a skill and is not a direct measure of usability – we have assumed that for English readers icons with right emphasis are more usable, but this needs to be demonstrated directly. The second is that this paradigm emphasizes differences in appeal for stimuli which differ in left-right emphasis and are therefore asymmetrical. Many icons are symmetrical for good reason, since they are easier to process, represent man-made objects which are typically symmetrical, and are more likely to be found appealing [37, 38].
Fig. 5. Frequency of preference for left and right emphasis pictures in Arabic and English readers
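A preference split of this kind is typically tested within each reader group with a chi-square (or binomial) test against the 50/50 no-preference expectation. A minimal sketch; the counts below are invented for illustration and are not the study's data.

```python
from scipy.stats import chisquare

# Hypothetical counts of pairwise choices: (right-emphasis, left-emphasis).
groups = {"English readers": (470, 330), "Arabic readers": (410, 400)}

for group, observed in groups.items():
    # Default expected frequencies are uniform, i.e. no directional bias.
    stat, p = chisquare(list(observed))
    print(f"{group}: chi2 = {stat:.2f}, p = {p:.4f}")
```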
5 Discussion

5.1 Theoretical Explanations
The current studies demonstrate that usability-aesthetics relationships operate in both a feed-forward and a feed-backward manner, i.e. they are bi-directional. Is there a theoretical account which can bring these and other existing findings together? One possibility is the perceptual fluency hypothesis, which postulates that stimuli we have previously encountered are processed perceptually more effectively and quickly [39]. This perceptual fluency is subsequently (mis)attributed to greater liking. Recently, Reber et al. revised this account on the basis of cultural influences on taste and proposed a processing fluency account [40]. This proposed that (i) stimuli differ in the fluency with which they can be processed, (ii) fluency itself is hedonically marked, (iii) processing fluency feeds into judgments of aesthetic appeal, and (iv) the impact of fluency is moderated by expectations and attributions based on experience. This certainly provides a dynamic, rather than static, explanation of aesthetic appeal and takes account of experience and task demands as well as stimulus characteristics. However,
it assumes that judgments of appeal follow performance, while our research suggests that this relationship is bi-directional. We therefore propose that, while Reber et al.'s unidirectional framework helps to bring together many of the disparate findings reported to date, our findings suggest that ratings of appeal emerge out of ongoing processing and that the timeline for ‘hedonic marking’ may reflect a bi-directional relationship between performance and appeal. It is also important to take account of the possibility that under certain circumstances the close relationship between appeal and performance may break down. For example, where usability is paramount, appeal may not be so closely correlated with performance (e.g. in air traffic control displays, task demands may determine the closeness of the relationship between usability and appeal).

5.2 Practical Implications
At a practical level our findings have a number of implications:

1. Our findings challenge both researchers and interface designers to take account of the strong links observed between user performance and appeal, and of the fact that icon characteristics of which users may not be consciously aware may influence the appeal of the interface and its subsequent usability.

2. The findings of Study 2 suggest that if complex icons are necessary, as in some cockpit displays for example, one way to mitigate the increased search time for complex icons is to maximize aesthetic appeal.

3. Study 3 showed that users' skills and habitual modes of processing, such as the direction in which they read, can influence perceptions of appeal. This may be particularly important for website design.
References
1. Hassenzahl, M., Tractinsky, N.: User experience – a research agenda. Behaviour & Information Technology 25, 91–97 (2006)
2. Bauerly, M., Liu, Y.: Computational modeling and experimental investigation of the effects of compositional elements on interface and design aesthetics. In: Proceedings of the 8th Annual International Conference on Industrial Engineering – Theory, Applications and Practice (2008)
3. Bauerly, M., Liu, Y.: Development and validation of a symmetry metric for interface aesthetics. In: Proceedings of the 49th Annual Meeting of the Human Factors and Ergonomics Society (2005)
4. Bauerly, M., Liu, Y.: Computational modeling and experimental investigation of effects of compositional elements on interface and design aesthetics. International Journal of Human-Computer Studies 64, 670–682 (2006)
5. Bauerly, M., Liu, Y.: Effects of symmetry and compositional elements on interface design and aesthetics. International Journal of Human-Computer Interaction 24, 257–267 (2008)
6. Hassenzahl, M.: Emotions can be quite ephemeral: We cannot design them. Interactions 11, 46–48 (2004)
7. Lindgaard, G., Dudek, C.: What is this evasive beast we call user satisfaction? Interacting with Computers 15, 429–452 (2003)
8. Lindgaard, G., Fernandes, G., Dudek, C., Brown, J.: Attention web designers: You have 50 milliseconds to make a good first impression. Behaviour & Information Technology 25, 115–126 (2006)
9. Han, S.H., Hong, S.W.: A systematic approach for coupling user satisfaction with product design. Ergonomics 46, 1441–1461 (2003)
10. Tractinsky, N., Katz, A.S., Ikar, D.: What is beautiful is usable. Interacting with Computers 13, 127–145 (2000)
11. Tractinsky, N., Cokhavi, A., Kirschenbaum, M., Sharfi, T.: Evaluating the consistency of immediate perceptions of web pages. International Journal of Human-Computer Studies 64, 1071–1083 (2006)
12. Zhang, P., von Dran, G., Blake, P., Pipthsukunt, V.: Important design features in different website domains: An empirical study of user perceptions. e-Service Journal 1(1) (2001)
13. Kurosu, M., Kashimura, K.: Apparent usability vs. inherent usability. In: CHI 1995 Conference Companion, pp. 292–293 (1995)
14. Tractinsky, N.: A few notes on the study of beauty in HCI. Human-Computer Interaction 19, 351–357 (2004)
15. Berlyne, D.E.: Studies in the New Experimental Aesthetics. Hemisphere, Washington (1974)
16. Raymond, J.E., Fenske, M.J., Tavassoli, T.T.: Selective attention determines emotional responses to novel visual stimuli. Psychological Science 14, 537–542 (2000)
17. Zeki, S.: The Splendours and Miseries of the Brain: Love, Creativity and the Pursuit of Human Happiness. Wiley & Sons, Chichester (2009)
18. Fang, X., Singh, S., Ahluwalia, R.: An examination of different explanations for the mere exposure effect. Journal of Consumer Research 34, 97 (2007)
19. McDougall, S., Curry, M.B., de Bruijn, O.: Measuring symbol and icon characteristics: Norms for concreteness, complexity, meaningfulness, familiarity and semantic distance for 239 symbols. Behavior Research Methods 31, 487–519 (1999)
20. Forsythe, A., Sheehy, N., Sawey, M.: Measuring icon complexity: An automated analysis. Behavior Research Methods, Instruments and Computers 35, 334–342 (2003)
21. Isherwood, S., McDougall, S., Curry, M.: Icon identification in context: The changing role of icon characteristics with user experience. Human Factors 49, 465–476 (2007)
22. McDougall, S., de Bruijn, O., Curry, M.: Exploring the effects of icon characteristics on user performance: The role of concreteness, complexity and distinctiveness. Journal of Experimental Psychology: Applied 6, 291–306 (2000)
23. Byrne, M.D.: Using icons to find documents: Simplicity is critical. In: Proceedings of INTERCHI 1993, pp. 446–453 (1993)
24. Scott, D.: Visual search in modern human-computer interfaces. Behaviour and Information Technology 12, 174–189 (1993)
25. Quinlan, P.T.: Visual feature integration theory: Past, present, and future. Psychological Bulletin 129, 643–673 (2003)
26. Green, A.J.K., Barnard, P.J.: Iconic interfacing: The role of icon distinctiveness and fixed or variable screen locations. In: Diaper, D., et al. (eds.) Human-Computer Interaction – Interact 1990. Elsevier, North-Holland (1990)
27. Rogers, Y., Oborne, D.J.: Pictorial communication of abstract verbs in relation to human-computer interaction. British Journal of Psychology 78, 99–112 (1987)
28. Forsythe, A., Mulhern, G., Sawey, M.: Confounds in pictorial sets: the role of complexity and familiarity in basic-level picture processing. Behavior Research Methods 40, 116–129 (2008)
29. Jacobsen, T., Höfel, L.: Aesthetic judgments of novel graphic patterns: Analyses of individual judgments. Perceptual & Motor Skills 95, 755–766 (2002)
30. Lavie, T., Tractinsky, N.: Assessing dimensions of perceived visual aesthetics of web sites. International Journal of Human-Computer Studies 60, 269–298 (2004)
31. Vartanian, O., Goel, V.: Neuroanatomical correlates of aesthetic preferences for paintings. Neuroreport 15, 893–897 (2004)
32. Kawabata, H., Zeki, S.: Neural correlates of beauty. Journal of Neurophysiology 91, 1699–1705 (2004)
33. Zajonc, R.B.: Attitudinal effects of mere exposure. Journal of Personality & Social Psychology Monographs 9, 1–27 (1968)
34. Bornstein, R.F.: Exposure and affect: Overview and meta-analysis of research, 1968–1987. Psychological Bulletin 106, 265–289 (1989)
35. Chokron, S., De Agostini, M.: Reading habits influence aesthetic preference. Cognitive Brain Research 10, 45–49 (2000)
36. Heath, R., Mahmasanni, O., Rouhana, A., Nassif, N.: Comparison of aesthetic preferences among Roman and Arabic script readers. Laterality: Asymmetries of Body, Brain, and Cognition 10, 399–411 (2005)
37. Arnheim, R.: Art and Visual Perception: The New Version. University of California Press, Berkeley (1974)
38. Humphrey, D.: Preferences in symmetries and symmetries in drawings: Asymmetries between ages and sexes. Empirical Studies of the Arts 15, 41–60 (1997)
39. Reber, R., Winkielman, P., Schwarz, N.: Effects of perceptual fluency on affective judgments. Psychological Science 9, 45–48 (1998)
40. Reber, R., Schwarz, N., Winkielman, P.: Processing fluency and aesthetic pleasure: Is beauty in the perceiver's processing experience? Personality and Social Psychology Review 8, 364–382 (2004)
Agency Attribution in Human-Computer Interaction

John E. McEneaney

Department of Reading and Language Arts, Oakland University, Rochester, MI, USA
[email protected]
Abstract. Social psychologists have documented that people attribute a human-like agency to computers. Work in human motor cognition has identified a related effect known as “intentional binding” that may help explain this phenomenon. Briefly, intentional binding refers to an unconscious attribution of agency to sufficiently complex entities in our environments that influences how we perceive and interact with those entities. Two studies are presented that examine whether intentional binding, an agency effect observed when people interact with physical objects, also applies in virtual environments typical of human-computer interaction (HCI). The results indicate that agency effects are observed in human-computer interaction but that these effects differ from those reported in physical environments, suggesting that human perception and action may operate differently in virtual environments than in physical interactions.

Keywords: social interface theory, intentional binding, cognition, perception, agency attribution.
1 Introduction
A substantial body of work in social interface theory [1] has demonstrated that people often attribute a human-like agency to computers. Gender bias, for example, has been observed depending on whether the voice used by the computer is male or female [2]. Computer users are more likely to disclose personal information to a computer if that computer has disclosed “personal” information about itself [3]. Participants in HCI studies have also been observed to adopt a bias attributing credit or blame to a computer depending on whether the computer is perceived to share qualities with the experiment participants [4], and people using an online learning system that fails by design during an experiment tend to soften criticism of the failure when they report the problem directly to the computer rather than to an independent human experimenter [5]. Furthermore, results from work in affective computing suggest that the perception of agency can be influenced by interface factors [6]. Our understanding of agency attribution by computer users, however, is limited. This work examines the attribution of agency in tasks typical of HCI. Recent neurocognitive work exploring human agency has, perhaps not surprisingly, focused on motor behaviors. Motor cognition is better understood than higher-order cognitive phenomena because the behaviors studied are defined in more precise
ways and the mapping of brain and motor behavior is clearer and more direct. Correlations between specific motor behaviors and activation in the motor cortex are reliable enough for meaningful generalization and even simple motor behaviors such as moving a single finger still incorporate an essential feature of intentional behavior – the subjective experience of free action. Moreover, the history of work exploring connections between motor behavior and agency dates back to the early 1980s and during this time researchers have defined a variety of techniques to assess the subjective experience of agency and its neural correlates. Among the earliest of these studies was work by Libet and coauthors [7], who sought to establish a timeline relating central neural activity in the brain, peripheral neural activity of muscles, and the conscious subjective experience of action. Libet used scalp electrodes to chart the broadly distributed electrical activity of the brain known as the readiness potential or RP [8], a finger electrode to measure muscle-specific electrical activity, and a clock with a single rapidly moving hand so that subjects could report when they became aware of the decision to move their finger. Libet instructed his subjects to move a finger of their own free will, using the position of the clock hand to indicate when they had made the decision to move. Libet's goal was to determine the chronology of these three events and, from a neurological perspective, his results were unsurprising. Briefly, the RP associated with a finger movement usually emerged from a subject's neural baseline about 800 milliseconds before the finger movement, the subject became aware of the decision to move about 200 ms before the movement, and the increase of electrical activity in the finger muscle began about 50 ms before the subject's finger movement triggered an electrical switch. Libet's study, however, stirred a debate among philosophers of mind as it provided rather compelling evidence that the subjective experience of free will (at least with respect to motor volition) followed, rather than preceded, the broader brain activity (i.e., the RP) responsible for the motor action. Apparently, a brain has a mind of its own. More recent work, relying on Libet's clock method, explores social aspects of the experience of agency and has its origins in research with non-human primates. Recent neurocognitive work [9] suggests biologically plausible mechanisms to support agency-specialized neural structures (mirror neurons) that might account for cognitive correlates of agency attribution. In the mid-1990s researchers at the University of Parma were engaged in mapping brain function in macaque monkeys [10]. As a part of this research a specific neuron had been mapped that always fired when a monkey picked up a peanut that had been placed on a surface before it. Every time the monkey acted in this manner, the neuron fired, suggesting that the brain activity corresponded to "I (the monkey) am picking up this peanut." One day a monkey, fully wired and waiting to begin a study, watched as a researcher came and picked up a peanut from the table and the same "I am picking up the peanut" neuron fired. Apparently, the same neural circuit could mean "I am picking up the peanut," as well as "You are picking up the peanut". This neural circuit did not distinguish between these two rather different situations although the same neuron did not fire when the monkey watched a mechanical arm pick up a peanut [11].
It appears that this neural activity does not simply represent perception or motor response; it seems to represent agent-initiated action. Research since Rizzolatti's serendipitous discovery has confirmed both the presence of these so-called mirror neurons in people and Rizzolatti's initial hunch that mirror neurons seem to be related to a wide range of social behaviors [12].
Other work in motor cognition [13] provides independent support for the idea that the brain has specialized processes for perceiving the actions of other agents. Study participants were instructed to perform simple arm movements (rhythmic side-to-side or up-and-down arm motions) while watching another person or a robot perform similar or different movements. The hypothesis driving the study was that "actions are intrinsically linked to perception (p. 522)" and that, if the mirror system is activated, this should increase the likelihood of imitative behavior. Subjects who watched another person make incongruent movements (e.g., up-and-down when the subject was moving side-to-side) had significantly more variability in their movements than when the other person was moving congruently. Furthermore, arm movements by people in the study were not influenced by watching a robot make incongruent movements. This interference effect brought on by observation of another agent demonstrates that perception and action are linked and, more importantly, that perceived agency can influence action. Another extensive line of work exploring agency effects has shown that subjective timing of events depends on whether or not people attribute agency to the source of an observed action. Briefly, our perceptual experiences are subtly different when we watch ourselves or other people than when we watch non-intentional events. One expression of this agency effect is a perceptual shift that delays perception of an event relative to its actual time when the event is perceived as an intentional action. A second manifestation of this agency effect is a perceptual shift that anticipates perception of an event that is perceived to be a consequence of intentional action. When, for example, a buzzer sounds as a result of pressing a button, the perceived timing of the button press is delayed while the timing of the buzzer is reported as occurring earlier (see Fig. 1). These same effects are also observed when subjects watch other people press the button. Taken together, these two effects have been referred to as "intentional binding" [14]. Prior work on intentional binding, however, has focused on physical environments where people observe the movements of machines, physical objects, or human hands. In these studies, the subjective experience of actions and effects is reliably influenced by the agency effect illustrated in Figure 1. The subjective timing between intentional actions and their effects is consistently reduced and this effect is not observed in circumstances where subjects watch objects or even their own fingers when their movements are under external control.

Fig. 1. Intentional binding of an action and its effect

Although work by social psychologists indicates people attribute a kind of agency to computers, it is unclear whether we can expect to see the subtle perceptual effects of intentional binding when events and consequences are represented in virtual environments typical of HCI. The following studies were designed to address this question: Is intentional binding observed in the virtual environments typical of HCI?
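Before turning to the studies, it may help to see how Libet-clock reports convert into timing errors. A minimal sketch, assuming a 60-tick clock face and the classic 2,560 ms revolution period used in much of the Libet-style literature (the present studies' period is not stated here):

def timing_error_ms(reported_tick: float, actual_tick: float,
                    ticks: int = 60, period_ms: float = 2560.0) -> float:
    """Signed perceptual error: positive means the event was reported late."""
    diff = (reported_tick - actual_tick) % ticks
    if diff > ticks / 2:          # wrap around the dial, e.g. 58 vs 2 -> -4 ticks
        diff -= ticks
    return diff * period_ms / ticks

# intentional binding in physical settings: the action is judged late and
# its effect early, shrinking the perceived action-effect interval
print(timing_error_ms(31, 30))    # +42.7 ms: delayed perception of the action
print(timing_error_ms(35, 36))    # -42.7 ms: anticipated perception of the effect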
2 Experimental Procedures
Both studies relied on the same experimental procedures and stimuli (see Figure 2). The only difference between the two studies was the target of the timing task. In study 1, participants were asked to report the timing of a red flash that coincided (within an average of 8 milliseconds) with a self-, other-, or machine-initiated mouse click. In study 2 participants reported the timing of an auditory tone that followed a mouse click by 250 milliseconds (ms). Subjective timing relied on Libet's clock methodology [7, 15]. Experimental materials were developed in Python using PsychoPy [16], a collection of Python modules that provide access to hardware-level system functions for precision stimulus display and response timing. Participants reported the observed position of a rotating spot at the time of a mouse click, a visual stimulus, or an audible tone. Participants indicated the position of the spot by clicking the clock where the spot was at the time of the target event. For example, a participant who saw the rotating spot in the position shown in Fig. 2B when the clock flashed red would move the cursor and click the "28-minute" mark on the clock (Fig. 2C). In the other-click condition, the participant watches another person (a researcher) interact with the computer, again reporting the timing of a target event. Finally, in the two computer-click conditions the participant started the trial by clicking the "Next Trial" button but, once initiated, the computer controlled cursor movement and clicks. In the machine-simulated-click condition the movement of the cursor and timing of the click replicated the movement and timing of one of the participant's own trials randomly selected from data collected during training trials. In the machine-random-click condition the cursor was invisible so there was no cursor movement and the timing of the target event was random. The first set of trials in both studies was a spontaneous self-click condition requiring participants to report when they had clicked the mouse. This set of trials was designed to familiarize participants with the experimental protocol and to assess whether the mouse click and visual signal were subjectively perceived as simultaneous. Presentation of the four subsequent blocks of experimental trials employed a Latin square, with the order of conditions varied across participants so that each condition occurred once in each position.
A. When a trial begins, a rotating spot appears at a random position. The task is to report the position of the spot when a target event occurs.
B. The target in study 1 was a red flash when the mouse was clicked. Study 2 subjects reported the timing of a tone following the flash by 250 ms.
C. After the tone, the spot continues for a random interval then disappears. Subjects click to show where the spot was at the time of the target event.
Fig. 2. The Libet clock used in the studies at three points in a trial
All trials in both studies presented the same sequence of events. The only differences between blocks of trials and studies were the timing of events (which could be determined by the participant, the experimenter, or the computer) and the target of the timing task (i.e., a click, red flash, or audible tone). Each experimental trial was initiated by the participant or researcher clicking a "Next Trial" button, starting the clock rotating from a random starting position. Participants were asked to let the spot make one complete revolution, move the cursor to the center of the clock, and click the mouse at a time of their own choosing. When the clock was clicked, its center flashed red for 100 milliseconds, providing an immediate virtual "action signal" when the mouse was clicked. In addition, 250 ms after the click there was a 1000 Hz auditory tone that sounded for 100 milliseconds, providing an action effect that always followed the click and visual flash. After these events occurred, the spot continued to rotate for a random period of time that ranged from 1.5 to 2.5 seconds so that participants would not be aided by after-image effects. Both studies adopted one-way repeated measures designs with four levels of an agency factor. Study 1 focused on a visual cue associated with the mouse click. In the self-click (SC) condition, the participant reported the timing of the visual cue associated with a self-initiated mouse click. In the other-click (OC) condition, the participant observed an experimenter click the mouse and reported the visual cue associated with that action. In addition to these two "human" conditions, there were two "machine" conditions. In the machine-simulated-click (MSC) condition the participant observed a computer-controlled movement of the cursor and a mouse click that matched the timing and motion of trials randomly selected from the participant's data in the training block. Lastly, in the machine-random-click (MRC) condition, the timing of the mouse click was random and the cursor was not visible; the participant simply reported the visual cue when it appeared. Materials and conditions in Study 2 were identical to those used in Study 1. The only difference in Study 2 was that participants reported the timing of the auditory tone that followed the mouse click rather than the red flash.
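To make the trial structure concrete, here is a minimal PsychoPy [16] sketch of a single self-click trial with the flash and tone timings described above; the window geometry, revolution period, and variable names are illustrative rather than taken from the original materials.

import math, random
from psychopy import visual, core, sound, event

win = visual.Window(size=(800, 800), units="pix", color="white")
face = visual.Circle(win, radius=200, edges=120, lineColor="black")
center = visual.Circle(win, radius=20, lineColor="black", fillColor="grey")
spot = visual.Circle(win, radius=8, fillColor="black")
mouse = event.Mouse(win=win)
tone = sound.Sound(value=1000, secs=0.1)        # 1000 Hz, 100 ms action effect

PERIOD = 2.56                                   # s per revolution (assumed)
run_on = random.uniform(1.5, 2.5)               # post-event rotation, 1.5-2.5 s
phase0 = random.uniform(0, 2 * math.pi)         # random starting position
click_t, tone_played = None, False

clock = core.Clock()
while True:
    t = clock.getTime()
    ang = phase0 + 2 * math.pi * t / PERIOD
    spot.pos = (180 * math.sin(ang), 180 * math.cos(ang))
    # the clock centre flashes red for 100 ms as the virtual "action signal"
    center.fillColor = "red" if click_t is not None and t - click_t < 0.1 else "grey"
    face.draw(); center.draw(); spot.draw()
    win.flip()
    if click_t is None and mouse.getPressed()[0]:
        click_t = t                             # participant's freely timed click
    if click_t is not None and not tone_played and t - click_t >= 0.25:
        tone.play()                             # action effect 250 ms after the click
        tone_played = True
    if click_t is not None and t - click_t > run_on:
        break                                   # spot disappears; the report screen follows
win.close()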
3 Study 1 Results and Discussion
Results of a repeated-measures GLM ANOVA indicated an agency effect, with significant differences in both the general test (Hotelling's Trace F(3,32) = 5.62, p = .003, η2 = .345) and in the within-participants test corrected for possible violation of the sphericity assumption (Huynh-Feldt F(2.124,72.221) = 4.527, p = .013, η2 = .118), with observed power of .915 and .773, respectively. Mean error scores across the four conditions are illustrated in Figure 3. Follow-up analysis confirmed that the observed agency effect could be attributed to differences between the human and machine conditions. A paired t-test examined the data with self- and other-click conditions collapsed into a single "human" condition and the computer-controlled conditions treated as a single "machine" condition. Results of the paired t-test revealed a significant difference, t(34) = -2.511, p = .017. Further paired t-tests showed no significant differences within the two human and machine conditions. Results of Study 1 replicate prior findings that perceptual judgments of observed actions are influenced by whether or not participants attribute
agency to the source of those actions. Unlike prior work assessing intentional binding in physical environments, however, the effects observed showed a greater delay in perceiving machine-initiated action than that of human action, the opposite of findings in physical environments. Prior work has consistently documented relative delays in perception of self- and other-initiated (i.e., human) actions compared to machine-generated events. In study 1, however, this pattern of timing errors is reversed.

Fig 3. Mean error scores from Study 1 for self-click (SC), other-click (OC), machine-simulated-click (MSC), and machine-random-click (MRC) conditions

Table 1. Study 1 error scores for data collapsed across Human (SC & OC) and Machine (MSC & MRC) conditions

Collapsed Conditions   Mean     Std. Dev.   N
Human                  .2557    3.20063     35
Machine                1.5657   2.43723     35

One possible explanation for this reversal of the agency effect draws on the use of a virtual target (a red flash) that is associated with a mouse click rather than the direct observation of a physical movement. Perhaps a time interval as short as 8 ms is enough for participants to perceive the red flash as an effect of the mouse click rather than a simultaneous associated event. This is a perception that may be reinforced in everyday computer use, where mouse clicks are understood to cause computer events. In order to test this possibility, a paired t-test compared the timing of physical mouse clicks in the training blocks that preceded experimental trials with the self-click experimental trials in which participants were prompted to use the on-screen flash as the target in the timing task. Results of this analysis showed no significant difference (t(34) = 1.052, p = .300), suggesting that physical and virtual markers of action are functionally equivalent, at least with respect to the timing tasks used in study 1.
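For readers who want to reproduce this style of analysis, a minimal sketch in Python follows; the file and column names are invented, and note that statsmodels' AnovaRM reports the uncorrected univariate F, so the Huynh-Feldt correction and Hotelling's trace would have to be obtained separately.

import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

# long-format data, one mean error score per participant and condition
# (file and column names are invented for this sketch)
df = pd.read_csv("study1_errors.csv")   # columns: subject, condition, error

# omnibus repeated-measures ANOVA over the four agency conditions
print(AnovaRM(df, depvar="error", subject="subject", within=["condition"]).fit())

# collapse into "human" (SC, OC) and "machine" (MSC, MRC) means per subject
wide = df.pivot(index="subject", columns="condition", values="error")
human = wide[["SC", "OC"]].mean(axis=1)
machine = wide[["MSC", "MRC"]].mean(axis=1)
print(stats.ttest_rel(human, machine))  # paired t-test, cf. t(34) = -2.511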
4 Study 2 Results and Discussion
Analysis in study 2 also began with a repeated-measures GLM ANOVA. This analysis revealed an agency effect for the auditory tone following observed actions, with significant differences in both the general test (Hotelling's Trace F(3,33) = 3.117, p = .039, η2 = .221) and in the within-subjects test corrected for possible violation of the sphericity assumption (Huynh-Feldt F(2.304,80.653) = 4.461, p = .011, η2 = .113), with observed power of .673 and .792, respectively. Mean error scores across the four conditions are illustrated in Figure 4. Follow-up analysis showed that the observed agency effect could be attributed to differences between the human and machine conditions. A paired t-test examined the
data with conditions collapsed into human and machine conditions as in study 1. Results of a paired t-test revealed a significant difference, t(35) = 2.681, p = .011. Further paired t-tests showed no significant differences within the two human and machine conditions. Results of study 2 corroborate the findings of study 1, including the anomalous reversal of agency effects in a virtual environment. As before, there was a statistically significant agency effect and this effect could be attributed to differences between the human and machine conditions.

Fig 4. Mean error scores from Study 2 for self-click (SC), other-click (OC), machine-simulated-click (MSC), and machine-random-click (MRC) conditions

Table 2. Study 2 error scores for data collapsed across Human (SC & OC) and Machine (MSC & MRC) conditions

Collapsed Conditions   Mean     Std. Dev.   N
Human                  6.6442   3.20063     36
Machine                4.9450   2.43723     36

Results show that participants' perceptions of the timing of audible tones differed depending on whether those tones followed a human- or machine-initiated action, although the sequence and relative timing of the target stimuli were similar in all conditions. These results confirm the findings of study 1 indicating that intentional binding is observed in virtual environments typical of HCI. As before, however, results indicate that intentional binding operates differently in a virtual environment, where the observed effects are reversed compared to human-machine differences noted in physical environments.
5 General Discussion
Results of studies 1 and 2 indicate that agency effects influence users' subjective experiences of action and response in even very simple HCI tasks involving mouse manipulations with auditory and visual feedback. The results that have been observed, however, differ from results of prior work. Macro-interactive studies of people interacting with computers on social time scales (on the order of minutes) show that people tend to attribute agency to computers with which they interact. In the present studies, however, subjects attributed agency only when the source of an action or effect was human. Subjects did not attribute agency in machine-action conditions, suggesting that there may be an interaction threshold for social agency effects that was not attained by either machine condition in the present studies but was attained in the macro-interactive tasks used in prior work. There are also differences in the way agency effects are expressed in virtual environments. Micro-interactive tasks in previous work document an attraction between
action and effect when agency is attributed to the source of action. The present studies, however, show greater attraction between action and effect under non-agency machine conditions. Data across all four conditions and both studies are depicted in Figure 5. In the two human agency conditions (SC & OC) the subject or an experimenter manipulated the mouse. In the two machine conditions (MSC & MRC) the computer simulated agency or triggered events randomly without any semblance of agency. The relative attraction between click and tone is apparent: click and tone are more attracted in the machine conditions than in human conditions.

Fig. 5. Mean perceptual error scores from study 1 (squares: visual click cue) and study 2 (triangles: audible tone)
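The size of this reversal can be read directly off Tables 1 and 2. Treating the tone error minus the click error as the perceived click-tone interval (and assuming, as the matched procedures suggest, that the two studies' error scores are on a common scale), the interval is about 6.6442 − 0.2557 ≈ 6.39 units in the human conditions but only 4.9450 − 1.5657 ≈ 3.38 units in the machine conditions; the action and its effect are drawn roughly twice as close together precisely when no agency is attributed.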
6 Conclusions and Implications
Results of these studies support the claim that intentional binding operates in virtual environments typical of HCI. Human users perceive even simple actions and effects differently depending on whether agency is attributed to the source of the action. Given the significance and ubiquity of simple actions and effects in HCI, it may be important to examine the influence of intentional binding and other agency effects on software use and the user experience. Furthermore, the fact that unconscious agency effects also influence macro-interactive social behaviors suggests these effects operate across a broad spectrum of contexts. It cannot, however, be assumed that the agency effects expressed in micro-interactive contexts like those in the two studies described here are related to macro-interactive counterparts in simple ways. As demonstrated by the discrepancies between physical and virtual effects, even within micro-interactive contexts there are differences in the way agency effects are expressed. The significance of agency effects, however, is broader than suggested by the technical focus adopted in these studies. Personal agency is one of the most fundamental experiences we have in our interactions with others and the physical world. It is, however, becoming increasingly clear that the folk psychology of agency and the conscious experience on which it is based are not consistent with neurobiology and cognition. Our experience of intention follows rather than precedes the brain activity that drives it and there are circumstances where we may be mistaken about even the most basic aspects of personal agency. As we learn more about the neurocognition of agency it may be possible for complex interactive systems to unconsciously influence our experience of agency [17], particularly since there are well-documented examples of similar types of unconscious influence [18]. Semantic priming studies have consistently shown that subliminally displayed words influence the perception of subsequent semantically associated words [19]. Studies of blindsight [20] have shown that
human behavior may rely on visual information even when there is no conscious experience of vision. In addition, a series of related studies that involve HCI tasks similar to those in the present work [21] have shown that people can be induced to incorrectly attribute actions to themselves when those actions are actually the result of someone else's action. Finally, numerous recent studies suggest that our experience of personal agency relies on many of the same neural circuits that support the attribution of agency to others [22]. Learning about how we attribute agency to computers, therefore, may also help us better understand ourselves and our interactions with other people, as well as contribute to the neurocognitive foundations of HCI.
References
1. Reeves, B., Nass, C.: The media equation: How people treat computers, television, and new media like real people and places. Cambridge University Press, New York (1996)
2. Nass, C., Moon, Y., Green, M.: Are computers gender-neutral? Gender stereotypic responses to computers. Journal of Applied Social Psychology 27(10), 864–876 (1997)
3. Nass, C., Moon, Y.: Machines and mindlessness: Social responses to computers. Journal of Social Issues 56(1), 81–103 (2000)
4. Moon, Y., Nass, C.: Are computers scapegoats? Attributions of responsibility in human-computer interaction. International Journal of Human-Computer Studies 49(1), 79–94 (1998)
5. Sundar, S.S., Nass, C.: Source orientation in human-computer interaction. Communication Research 27(6), 683–703 (2000)
6. Sengers, P., Liesendahl, R., Magar, W., Seibert, C., Muller, B., Joachims, T., Geng, W., Martensson, P., Hook, K.: The Enigmatics of Affect. In: Proceedings of the Conference on Designing Interactive Systems: Processes, practices, methods, and techniques, London, pp. 87–98 (2002)
7. Libet, B., Gleason, C.A., Wright, E.W., Pearl, D.K.: Time of conscious intention to act in relation to onset of cerebral activity (readiness potential): The unconscious initiation of a freely voluntary act. Brain 106, 623–642 (1983)
8. Coles, M.G.H., Rugg, M.D.: Event-related brain potentials. In: Rugg, M.D., Coles, M.G.H. (eds.) Electrophysiology of Mind. Oxford University Press, Oxford (1996)
9. Society for Neuroscience: Mirror, mirror in the brain: Mirror neurons, self-understanding, and autism research. ScienceDaily (2007)
10. Gallese, V., Fadiga, L., Fogassi, L., Rizzolatti, G.: Action recognition in the premotor cortex. Brain 119, 593–609 (1996)
11. Rizzolatti, G., Fogassi, L., Gallese, V.: Neurophysiological mechanisms underlying the understanding and imitation of action. Nat. Rev. Neurosci. 2, 661–670 (2001)
12. Gallese, V., Keysers, C., Rizzolatti, G.: A unifying view of the basis of social cognition. Trends in Cognitive Sciences 8, 396–403 (2004)
13. Kilner, J.M., Paulignan, Y., Blakemore, S.J.: An interference effect of observed biological movement on action. Current Biology 13(6), 522–525 (2003)
14. Haggard, P., Clark, S., Kalogeras, J.: Voluntary action and conscious awareness. Nat. Neurosci. 5(4), 382–385 (2002)
15. Pockett, S., Miller, A.: The rotating spot method of timing subjective events. Consciousness and Cognition 16(2), 241–254 (2007)
16. Peirce, J.W.: PsychoPy - Psychophysics software in Python. Journal of Neuroscience Methods 162(1-2), 8–13 (2007)
17. Stetson, C., Cui, X., Montague, P.R., Eagleman, D.M.: Motor-sensory recalibration leads to an illusory reversal of action and sensation. Neuron 51, 651–659 (2006) 18. Dixon, N.F.: Subliminal perception: The nature of a controversy. McGraw Hill, New York (1971) 19. Marcel, A.J.: Conscious and unconscious perception: Experiments on visual masking and word recognition. Cognit. Psychol. 15, 197–237 (1983) 20. Weiskrantz, L.: Prime-sight and blindsight. Conscious. Cogn. 11, 568–581 (2002) 21. Wegner, D.M.: The illusion of conscious will. MIT Press, Cambridge (2002) 22. Jackson, P.L., Decety, J.: Motor cognition: A new paradigm to study self-other interactions. Current Opinion in Neurobiology 14, 259–263 (2004)
Human-UAV Co-operation Based on Artificial Cognition Claudia Meitinger and Axel Schulte Universität der Bundeswehr München (UBM), Department of Aerospace Engineering, Institute of Flight Systems (LRT-13), 85577 Neubiberg, Germany {claudia.meitinger,axel.schulte}@unibw.de
Abstract. In the future, Uninhabited Aerial Vehicles (UAVs) will be part of both civil and military aviation. This includes co-operative mission accomplishment by manned and unmanned assets with little manpower available for UAV guidance. UAVs therefore need to be able to accomplish tasks with a minimum of human intervention, possibly in co-operation with other UAVs or manned aircraft. This paper presents artificial cognition as an approach to the co-operative capabilities of UAVs. The UAVs are guided by so-called Artificial Cognitive Units (ACUs), which are capable of goal-directed behavior on the basis of understanding the current situation. Prototype evaluation results show the capability of such co-operative ACUs to yield human-like rationality and to act as peers in a human-ACU team. Keywords: cognitive automation, artificial cognition, multiple UAV guidance, human-machine co-operation, UAV co-operation.
1 Introduction
In manned aircraft, human pilots are responsible for safe mission accomplishment and have the authority to do whatever is necessary within their scope of allowed action alternatives. This includes managing the automation available onboard. Especially in situations which could not be foreseen during system design and thus cannot be considered when planning a mission, typical human abilities are essential to maximize mission success. Humans are capable of understanding what is going on, reflecting on what should be achieved next and planning the next steps. This human strength can even be assumed in situations the concrete configuration of which could not be anticipated or has never been encountered before. When considering UAVs which are supposed to fulfill whatever kind of mission, unforeseen situations are likely to occur and have to be handled appropriately. Therefore, human operators still have to be kept in the loop. In this case, a UAV operator acts similar to a remote pilot, being connected to the vehicle via data link rather than being located in the aircraft. However, this approach hits its limitations when e.g. time delays or data link losses occur. Moreover, the guidance of multiple UAVs with limited abilities by one human operator will probably exceed the available human resources in certain situations and result in excessive workload. Finally, for co-operative missions in which teams of unmanned and manned assets work together, the timely coordination can hardly be realized with remotely operated UAVs.
Our approach to enable multi-UAV guidance and manned-unmanned teaming is to introduce so-called Artificial Cognitive Units (ACUs) aboard the UAVs, which are capable of understanding mission objectives and the environmental situation, which have an explicit representation of goals that should be achieved in the course of a mission and which are able to plan appropriate action sequences. In short, these ACUs mimic certain aspects of human cognition, which are relevant for goal-directed, rational behavior. This is assumed to be a crucial competence to enable human-machine co-operation on peer level and to form manned-unmanned teams in airborne missions. This paper considers the development, implementation, and evaluation of ACUs for UAV guidance within such manned-unmanned teaming scenarios, in which human pilots have to work together with several co-operative ACUs onboard UAVs in order to successfully accomplish a mission. Within our work, a military air-to-ground attack mission served as an example (see figure 1).

Fig. 1. Generic Air-to-Ground-Attack Scenario for investigation of human-ACU co-operation (A/C: aircraft, SEAD: Suppression of Enemy Air Defense); the scenario comprises an Attack-A/C flown by a human pilot and SEAD-UAVs guided by ACUs operating against a target with known and pop-up threats beyond the Forward Line of Own Troops
To start with, a short introduction to artificial cognition will be given. This includes details on aspects of cognition that are being modeled and realized as technical systems. Afterwards, our concept of knowledge-based co-operation, the necessary knowledge and a prototype implementation will be described. Finally, the evaluation of the implemented ACU with respect to goal-oriented behavior and human-machine co-operation will be presented.
2 Artificial Cognition
As mentioned above, we introduce so-called Artificial Cognitive Units (ACUs) aboard UAVs which are capable of goal-directed planning of their behavior while considering the current situation. In order to perform well in as many situational configurations as possible, these ACUs have to be able to exhibit behavior on all levels of human performance as introduced by [1]:
− Skill-based behavior is characterized by highly automated and efficient execution of sensori-motor patterns without the need for conscious attention. Just like an experienced helicopter pilot can perform a hovering task on this level, a controller stabilizing an airborne platform would exhibit the equivalent of skill-based behavior.
− Rule-based behavior in contrast requires attentional resources from humans and can be observed in standard task situations. Then, a direct situation-task-mapping
takes place and the appropriate tasks can be executed by means of skill-based capabilities. Typical rule-based behavior in the aviation domain can be observed when processing check lists, e.g. before departure.
− Knowledge-based behavior is of importance in situations which have not been experienced before and for which it is not known what should be done next. For example, a pilot might not immediately have an appropriate solution to a situation in which certain mission relevant tasks have to be completed but at the same time unexpected events such as a change of the tactical situation or onboard available resources occur. In order to determine the next steps, at first the situation has to be understood, then currently relevant goals will be determined and finally appropriate actions have to be planned which are suited to achieve the desired state.
While skill-based and rule-based behavior can be implemented in technical systems quite straightforwardly, performance on the knowledge-based level is much harder to realize, because developers have to enable the system to understand the situation, to reason about superordinate goals, to decide what to achieve next and to plan appropriate action sequences.
Fig. 2. The Cognitive Process
A paradigm for the design of artificial cognitive systems with such capabilities is the Cognitive Process (CP) [2, 3, 4], which is depicted in figure 2. It is a model of human information processing and describes the a-priori knowledge models necessary for the implementation of especially knowledge-based behavior, as well as the transformation steps actually processing the knowledge. The transformer interpretation uses environment models to gather an understanding of the current situation on the basis of input data from the environment. This belief is the most important input for the determination of currently relevant goals to be achieved next. These are derived from desires, which describe all goals that can potentially be prosecuted by the ACU. The transformer planning assembles available action alternatives into a plan which is suited to achieve the goals. Finally, the plan is executed and instructions are sent to the output interface.
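As an illustration only, the cycle of figure 2 can be paraphrased in a few lines of Python; the class and attribute names below are invented for this sketch and do not reproduce COSA's actual interfaces.

from dataclasses import dataclass, field
from typing import Callable, Dict, List

Belief = Dict[str, object]

@dataclass
class Desire:                 # a-priori knowledge: when does a goal become active?
    goal: str
    is_activated: Callable[[Belief], bool]

@dataclass
class ActionAlternative:      # a-priori knowledge: what can be done, and what for?
    instruction: str
    achieves: Callable[[str, Belief], bool]

@dataclass
class CognitiveProcess:
    interpret: Callable[[Belief, object], None]   # environment models
    desires: List[Desire]
    actions: List[ActionAlternative]
    belief: Belief = field(default_factory=dict)

    def step(self, input_data) -> List[str]:
        self.interpret(self.belief, input_data)            # interpretation
        goals = [d.goal for d in self.desires
                 if d.is_activated(self.belief)]           # goal determination
        plan = [a for a in self.actions
                if any(a.achieves(g, self.belief) for g in goals)]  # planning
        return [a.instruction for a in plan]               # scheduling -> instructions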
For the realization of ACUs based on this paradigm, the cognitive system architecture COSA [2] has been developed, which provides a framework implementing the application-independent parts of the CP, i.e. the knowledge processing by the transformers. Moreover, the development of application-specific a-priori knowledge is supported by the Cognitive Programming Language CPL, which allows developers to formulate environment models, desires, action alternatives and instruction models.
3 Co-operative UAV Guidance
Based on this approach to the design and implementation of artificial cognitive units capable of knowledge-based behavior, a mission management system for UAVs has been developed which is capable of co-operative mission accomplishment in a manned-unmanned team. In concrete terms, a team consisting of a manned aircraft and several UAVs, each of which is guided by such an ACU, receives a mission order from an operator and has to co-ordinate itself in order to successfully accomplish the mission. This section first details the full range of capabilities the UAVs have to cover before our approach to realizing co-operative behavior and its modeling and implementation on the basis of the Cognitive Process and COSA are described.
3.1 Required Capabilities
According to [5], several capabilities have to be covered by a UAV guidance system in order to be able to successfully accomplish co-operative missions such as the one sketched in figure 1. These are:
− System management, i.e. controlling flight guidance systems such as an autopilot
− Safe flight, i.e. ensuring collision avoidance with e.g. other aircraft or terrain
− Single vehicle mission accomplishment, i.e. the ability to actually accomplish certain mission relevant tasks which have been assigned to a UAV
− Co-operative mission accomplishment, i.e. the ability to achieve the common mission objective together with other assets, which includes co-operation, co-ordination and communication (see section 3.2)
The scope of the work presented within this paper was mainly co-operative mission accomplishment on the knowledge-based level of performance, while the other capabilities were considered only as much as necessary for co-operation.
3.2 Co-operation
Within a human-ACU team, the ACUs involved have to co-operate with both human team members and other ACUs. Co-operation means that several agents, i.e. humans and/or ACUs, work together in order to achieve a common objective (cf. e.g. [6]). Different tasks assigned to the team members involved contribute to this common objective. Usually, these tasks are not independent of each other (cf. e.g. [7]). For example, one task has to be completed before another one can be started, or several team members need a common resource. Such interdependencies have to be resolved in order to ensure efficient teamwork. Therefore, the framework of commitments, conventions and social conventions described by Jennings [8] is used, which
allows reasoning about working together on a high level of abstraction. An important means for co-ordination is communication, i.e. the exchange of information among agents. For communication among the ACUs, the Agent Communication Language specified by the Foundation for Intelligent Physical Agents (FIPA, www.fipa.org) is used. It provides the description of a message format, performatives and content languages, as well as interaction protocols. The latter define what kind of messages have to be exchanged in order to achieve a certain objective. For example, a 'request interaction' can be used to ask another team member to complete a certain task.
3.3 Modeling and Implementation
In order to provide ACUs with the capabilities described in sections 3.1 and 3.2, appropriate knowledge has to be modeled, i.e. desires, action alternatives, instruction models and environment models of the a-priori knowledge of the Cognitive Process (see section 2) have to be identified. As co-operative behavior shall be exhibited on the knowledge-based performance level, above all appropriate desires have to be identified to implement co-operative capabilities [9]. These refer to the achievement of the common objective by committing to tasks, dropping commitments appropriately, and actually completing assigned tasks. Secondly, working together as a team is addressed by appropriate information exchange, distribution of tasks among team members and setting up a team structure. Thirdly, the co-ordination of interdependencies among the activities of agents is taken into account and finally, a desire producing communication within the team is modeled. Action alternatives mainly refer to the initiation of dialogs and communicative acts, i.e. sending messages. An instruction model describes the message protocol used. Finally, a comprehensive understanding of the current situation is necessary for knowledge-based behavior. This is gathered by the implementation of environment models comprising concepts such as actor, team, resource, task, dialog and commitment. The interaction of several models of the implemented a-priori knowledge shall be explained taking the distribution of workload within an ACU team as an example. Figure 3 shows several models of the a-priori knowledge of the developed ACU (continuous frame) as well as instances of these models (dotted frame). Here, two actors (actor-self and actor-1) exist as instances of the environment model actor within the situational knowledge of the considered ACU, which is associated with actor-self. Moreover, there are two instances of task-destroy (task-destroy-0 and task-destroy-1). While actor-1 is committed to both tasks (two instances of commitment), there is no commitment of actor-self to any task. This results in the conclusion that the workload of actor-self is low (cf. attribute is of instance workload-0 of workload) while the workload of actor-1 is high (cf. attribute is of instance workload-1 of workload). So far, the current situation has been interpreted, i.e. instances of environment models have been created and their attributes have been assigned values. Activation criteria of the desire balance-workload comprise the knowledge that balance-workload shall be pursued as an active goal in case the workload of one actor is low while the workload of another actor is high. As this is the case, an instance of balance-workload is created within the situational knowledge.
To achieve this goal, several actions are possible from the perspective of actor-self, namely to propose to take over task-destroy-0 or to propose to take over task-destroy-1. Both are instances of the action alternative propose. As there is more than one possible way to achieve
the active goal, appropriate selection knowledge has to be used to choose among the available action alternatives. Assuming that actor-self selects to propose taking over task-destroy-0 from actor-1, an appropriate dialog will be initiated and conducted, again leading to the instantiation of a-priori knowledge models including the activation of desires.

Fig. 3. Interplay of models (a-priori knowledge) and instances (situational knowledge) [9]
Fig. 4. Models and instances thereof after appropriate actions have been successfully carried out in order to achieve goal “balance workload”
This interaction results in a change in commitments, which is also communicated to the other team members and reflected within the situational knowledge (cf. figure 4). In concrete terms, there is no commitment of actor-1 to task-destroy-0 any more (previously commitment-0); instead, actor-self has committed to task-destroy-0 (commitment-2) and the workload of both actors is assumed to be "medium" now. Thus, the desired situation has been reached and there is no reason for an ongoing activation of the desire "balance-workload" any more. This example is based on assumptions about the workload of the team members involved. While for machine team members workload can quite easily be inferred from task load, for human team members more elaborate models have to be introduced here. In this way, appropriate knowledge has been implemented referring to all desires relevant to the co-operative capabilities mentioned above (see [9]). Moreover, application-specific capabilities have been considered which are necessary for the evaluation of the developed prototype that will be described in the next section.
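To make the example concrete, a toy sketch of the balance-workload logic in Python follows; the thresholds, names and the FIPA-style message dict are illustrative, and the actual CPL knowledge models in [9] are not reproduced here.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Commitment:
    actor: str
    task: str

def workload(actor: str, commitments: List[Commitment]) -> str:
    n = sum(c.actor == actor for c in commitments)
    return "low" if n == 0 else "medium" if n == 1 else "high"

def balance_workload(self_id: str, others: List[str],
                     commitments: List[Commitment]) -> Optional[dict]:
    """Activation criterion of the desire: self is idle while another actor is
    overloaded; the chosen action alternative is a FIPA-style 'propose' message."""
    if workload(self_id, commitments) != "low":
        return None
    overloaded = [a for a in others if workload(a, commitments) == "high"]
    if not overloaded:
        return None
    task = next(c.task for c in commitments if c.actor == overloaded[0])
    return {"performative": "propose", "to": overloaded[0],
            "content": ("take-over", task)}

cs = [Commitment("actor-1", "task-destroy-0"), Commitment("actor-1", "task-destroy-1")]
print(balance_workload("actor-self", ["actor-1"], cs))
# proposes taking over task-destroy-0; after acceptance both workloads are "medium"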
4 Evaluation
The evaluation of the ACUs capable of co-operatively accomplishing a multi-UAV mission was conducted using a simplified air-to-ground-attack mission as an example (cf. figure 1). The ACUs were evaluated from three perspectives, namely (1) achievement of the specified functionality in a pure ACU team, (2) capability to behave on the knowledge-based performance level, and (3) co-operation within a human-ACU team. As there are detailed reports on the achievements regarding item (1) (e.g. [9, 10]), this paper will focus on knowledge-based behavior and human-ACU co-operation.
4.1 Knowledge-Based Behavior
Section 2 gave an overview of the characteristics of knowledge-based behavior, which most notably comprises the orientation of behavior towards explicit goals in situations for which no direct situation-task mapping is available. For the purpose of clarity, knowledge-based performance shall here be illustrated using a single vehicle sub-problem as an example. Figure 5, top left, shows a UAV on its flight back to the home base. It is currently within the range of a temporarily shut-down SAM site, which can re-activate at any point in time. Relevant goals of the ACU guiding the UAV within this context are the avoidance of the (potential) threat of the SAM site as well as flight to the home base. The action alternatives it has at hand are to fly away from the SAM site or to plan and
activate a flight plan, which cannot take the potential threat into account. Therefore, no action alternative is suited to achieve both goals at the same time. Still, in the end the UAV flies a trajectory which in principle satisfies both goals mentioned above (see figure 5, bottom right). This trajectory results from a situation-adapted prioritization of active goals and selection of action alternatives. Based on an interpretation of the current situation, the ACU decides at some points in time that it is more important to escape the potential threat of the SAM site and chooses the appropriate action alternative (see figure 5, 7:12, 7:39, 8:15), while in other situations the goal to fly to the home base, which is achievable by planning and activating a route, is pursued (see figure 5, 6:56, 7:23, 7:47, 8:22, 9:37). Of course, the resulting trajectory is not as optimal as it could have been when using a highly specialized algorithm for trajectory generation. But although there was no such expert knowledge or capability within the system, behavior could be observed which follows explicitly represented superordinate goals and which is close to expected behavior [9].

Fig. 5. Knowledge-based balancing of threat exposure and route following [9]

4.2 Human-ACU Co-operation
In order to gain insight into problems arising in human-ACU co-operation and to derive requirements for future ACU development, an experiment has been conducted where human pilots had to control up to three UAVs equipped with ACUs to accomplish a mission as described in figure 1 [9, 11]. It was expected that humans would be able to manage the guidance of one UAV but that workload would exceed an acceptable level when increasing the number of UAVs.
Fig. 6. Average workload of subjects in the different configurations (values of the four bars: 31%, 44%, 44%, and 56%)
To investigate this, five experienced military air crew members were asked to work together with one or three UAVs in different roles (attack or SEAD, cf. figure 6) and scenarios. The manned aircraft could be controlled at the autopilot level. Co-operation within the human-ACU team was based on the exchange of information on capabilities, resources (i.e. weapons) and commitments via a graphical user interface. Moreover, task requests could be sent to either individual UAVs or the whole UAV team. After each run, workload was measured using the NASA-TLX method [12] and the subjects were interviewed. Figure 6 shows the average workload of subjects against the configuration. Interestingly, workload is on a medium level in all configurations that have been investigated
and there is only a small increase in workload when changing the number of UAVs within the team from one to three. Moreover, the increase in workload when changing the human role from attack to SEAD is as large as that when adding two UAVs, which could be related to the different performance of the UAVs in the SEAD and attack roles. These results show that human-ACU co-operation works well with the approach of cognitive and co-operative automation. Still, further improvement potential was identified in the following areas after analysis of the interview protocols:
− Team Structure. The introduction of a hierarchical team structure is encouraged, although team members shall be capable of situation-dependent deviation from leader input.
− Abstraction of Interaction. Interaction with ACUs shall be as abstract as possible (e.g. specification of tasks for the UAV team rather than detailed instructions for single UAVs). In particular, it shall be possible to give instructions on different abstraction levels.
− Level of Automation. ACUs shall be able to act on a high level of automation, but the actual level shall be adaptable or self-adapted to the current situation and task.
− Teamwork. Co-operation of humans and ACUs shall be based on a common agenda and anticipate future actions of team members.
− Task completion. The capability of ACUs to actually accomplish tasks has to be improved.
− Communication within team. Vocabulary shall be adapted for more intuitive understanding by humans. Moreover, the number of dialogs and messages shall be reduced to a minimum in order to avoid overload of humans.
− Assistant System. The human team member shall be supported by an electronic assistant providing situation awareness concerning team members, task assignment and information flow within the team, as well as accomplishment of the primary mission related task, i.e. aircraft guidance.
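For reference, the NASA-TLX score reported above is the standard weighted combination of six subscale ratings [12]; a minimal sketch of that computation, with made-up ratings and pairwise-comparison weights:

SCALES = ["mental", "physical", "temporal", "performance", "effort", "frustration"]

def nasa_tlx(ratings: dict, weights: dict) -> float:
    """Overall workload: subscale ratings (0-100) weighted by the counts (0-5)
    from the 15 pairwise comparisons; the weights sum to 15."""
    assert sum(weights[s] for s in SCALES) == 15
    return sum(ratings[s] * weights[s] for s in SCALES) / 15.0

ratings = {"mental": 70, "physical": 10, "temporal": 55,
           "performance": 40, "effort": 60, "frustration": 35}
weights = {"mental": 5, "physical": 0, "temporal": 3,
           "performance": 2, "effort": 4, "frustration": 1}
print(nasa_tlx(ratings, weights))   # 58.0 on the 0-100 scale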
5 Summary
While humans are capable of goal-directed behaviour even in situations they have not experienced before, conventional automation systems usually lack the capability to understand the situation, identify relevant goals and plan actions which can transform the current situation into the desired one. Cognitive automation, in contrast, is capable of such knowledge-based performance and can therefore be used to develop Artificial Cognitive Units as advanced UAV guidance systems for use in highly complex and unpredictable scenarios. The Cognitive Process has been used as the paradigm for the design of co-operative Artificial Cognitive Units and applied to co-operative UAV guidance. The concept and some implementation details of the co-operative capabilities have been described, and the resulting capabilities of the ACUs discussed. First, the capability of the developed ACUs to behave on the knowledge-based level of performance has been explained and the importance of explicitly represented goals within the situational context has been reported on. Finally, the promising results of a human-ACU teaming experiment have been presented, in which a human operator could work together with up to three UAVs very well. Future steps include the
application of these results in more realistic scenarios and the helicopter domain as well as field tests of ACUs using our UAV demonstrators [13]. Moreover, techniques concerning knowledge acquisition and modelling will be investigated in order to approach a system engineering process for development of cognitive automation.
References 1. Rasmussen, J.: Skills, Rules, and Knowledge; Signals, Signs, and Symbols, and other Distinctions in Human Performance Models. IEEE Transactions SMC 13(3) (1983) 2. Putzer, H., Onken, R.: COSA – A Generic Cognitive System Architecture Based on a Cognitive Model of Human Behavior. Cognition Technology and Work 5, 140–151 (2003) 3. Onken, R., Schulte, A.: System-ergonomic Design of Cognitive Automation – Dual-Mode Cognitive Design of Vehicle Guidance and Control Work Stations. Springer, Heidelberg (2009) 4. Schulte, A., Meitinger, C., Onken, R.: Human Factors in the Guidance of Autonomous Vehicles: Oxymoron or Tautology? The Potential of Cognitive and Co-operative Operator Assistant Systems. Cognition Technology and Work (2008) 5. Platts, J., Ögren, P., Fabiani, P., di Vito, V., Schmidt, R.: Final Report of GARTEUR FM AG14. GARTEUR/TP-157 (2007) 6. Ferber, J.: Multi-Agent Systems – An Introduction to Distributed Artificial Intelligence. Addison-Wesley, Harlow (1999) 7. Malone, T.W., Crowston, K.: The Interdisciplinary Study of Co-ordination. ACM Computing Surveys 26(1), 87–119 (1994) 8. Jennings, N.R.: Coordination Techniques for Distributed Artificial Intelligence. In: O’Hare, G.M.P., Jennings, N.R. (eds.) Foundations of Distributed Artificial Intelligence, pp. 187–210. Wiley, Chichester (1996) 9. Meitinger, C.: Kognitive Automation zur kooperativen UAV-Flugführung. Dissertation, Universität der Bundeswehr München (2008) 10. Meitinger, C., Schulte, A.: Onboard Artificial Cognition as Basis for Autonomous UAV Co-operation. In: Platts, J. (ed.) GARTEUR FM AG14 – Autonomy in UAVs Technical Proceedings. GARTEUR/TP-157 (2007) 11. Rauschert, A., Meitinger, C., Schulte, A.: Experimentally Discovered Operator Assistance Needs in the Guidance of Cognitive and Cooperative UAVs. In: Proceedings of the Conference of Humans Operating Unmanned Systems (HUMOUS 2008), Brest, France (2008) 12. Hart, S.G., Staveland, L.E.: Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research. In: Hancock, P.A., Meshkati, N. (eds.) Human Mental Workload, pp. 139–184. North Holland Press, Amsterdam (1988) 13. Kriegel, M., Schulte, A., Stütz, P.: Testbed environment for future UAV automation technologies. In: Proceedings of the 2nd International Workshop on Aircraft System Technologies (AST 2009), Hamburg, Germany (2009)
Development of an Evaluation Method for Office Work Productivity
Kazune Miyagi1, Hiroshi Shimoda1, Hirotake Ishii1, Kenji Enomoto1, Mikio Iwakawa2, and Masaaki Terano2
1 Kyoto University, Gokasho, Uji, Kyoto, Japan
2 Panasonic Electric Works Co., Ltd., Kadoma, Osaka, Japan
{Miyagi,shimoda,hirotake,enomoto}@energy.kyoto-u.ac.jp, {iwakawa,terano}@panasonic-denko.co.jp
Abstract. The authors have developed a performance test, CPTOP2 (Cognitive Performance Test of Productivity), which consists of four task tests that evaluate the cognitive abilities of office workers, in order to quantitatively and objectively evaluate how their productivity changes when the office environment is controlled. In addition, the testing time of CPTOP2 is shorter than that of the conventional CPTOP so that it can be introduced in the evaluation of actual office environments. In this study, two subject experiments were conducted to verify its function and accuracy. The function of CPTOP2 was verified by measuring brain activity with fNIRS while the CPTOP2 tests were being conducted. The accuracy of CPTOP2 was verified by comparing the improvement of the performance indexes of CPTOP2 with that of simulated office work. Keywords: office environment, performance test, fNIRS, cognitive ability.
1 Introduction
Energy saving is one of the countermeasures against the increase in greenhouse gas emissions caused by the recent worldwide growth in energy consumption. In Japan, the government has promoted setting the temperature of air-conditioning systems in the summer to 28 degrees Celsius, and office workers are encouraged to wear casual-style clothes. However, the drop in the productivity of office workers caused by such energy saving may extend their labor time, and this may consume more energy [1]. On the other hand, recent studies have revealed that improvement of the office environment may improve the productivity of office workers [2]. However, a method which evaluates office productivity objectively and quantitatively has not been established yet, because office work includes not only simple repetitive work but also creative and atypical work. If such a method were developed, it could be utilized for the design of office rooms and the evaluation of energy consumption based on office work productivity. As mentioned above, it is difficult to directly measure office work productivity because there are various kinds of office work. In this study, therefore, the following assumptions are made:
1. Office work is brain work, so it mainly employs human cognitive abilities.
2. The productivity of office work is determined by how much the cognitive abilities can be utilized.
3. The utilization of each cognitive ability can be measured by a corresponding performance test.
4. Total office work productivity can be evaluated from the utilization of each cognitive ability and a weight depending on the office worker's individual tasks.
The authors therefore developed an office work performance test, CPTOP (Cognitive Performance Test for Office Productivity), based on the above assumptions, aiming at establishing an evaluation method of office work productivity [3]. CPTOP evaluates how well cognitive abilities can be applied in an office environment by conducting performance tests. In order to reveal the abilities necessary for office work, the authors defined essential abilities based on the "Handbook of Human Abilities" written by E.A. Fleishman [4]. Among the human abilities, 21 cognitive abilities such as "oral comprehension" and "memorization" were picked as elementary abilities for office work. In addition, interviews and a questionnaire were conducted with workers who were general clerks, engineers, managers, researchers and so on. Based on the results of the interviews and questionnaire, 11 essential abilities were selected, and CPTOP consists of 11 task tests corresponding to these 11 essential abilities. In addition, the authors have conducted laboratory experiments in which office work productivity was measured by using CPTOP under two lighting conditions: normal lighting (fixed at 750 lux) and circadian rhythm lighting, where illumination was controlled from hour to hour in order to adjust the human circadian rhythm [5]. As the experimental result, it was found that the results of both CPTOP and simulated office work under the circadian rhythm lighting improved by approximately 4% compared with the normal lighting. The above experiments were conducted with subjects in a laboratory experiment room. Environmental conditions such as temperature, humidity, noise and air circulation were strictly controlled in the experiments; however, it takes 95 minutes to conduct the 11 performance tests of CPTOP. When office work productivity in an actual office is measured by using CPTOP in the future, a testing time of 95 minutes is too long because it disturbs office work for 95 minutes. It is therefore required to shorten the testing time in order to introduce the test into the evaluation of office work productivity in actual office environments.
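For concreteness, assumption 4 can be written as a weighted mean; a minimal Python sketch, in which the ability names, utilizations and job-specific weights are purely illustrative:

def office_productivity(utilization: dict, weights: dict) -> float:
    """Assumption 4: total productivity as the weighted mean of how well
    each cognitive ability can be utilized in the given environment."""
    total = sum(weights.values())
    return sum(utilization[a] * weights[a] for a in weights) / total

# illustrative values for a worker whose job leans on originality
utilization = {"memorization": 0.92, "number_facility": 0.85, "originality": 0.78}
weights = {"memorization": 2.0, "number_facility": 1.0, "originality": 3.0}
print(office_productivity(utilization, weights))   # about 0.838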
2 Development of CPTOP2
In this study, the 11 task tests for the 11 cognitive abilities were reconsidered in order to reduce the number of task tests and shorten the whole testing time. As a result, the authors developed a new performance test, CPTOP2, which consolidates the 11 task tests into only four. Table 1 shows the task tests and the corresponding abilities that each task test evaluates. In addition, CPTOP2 was developed as web-based software, so that the test can be conducted using only a PC connected to the Internet. The total time of CPTOP2 is approximately 34 minutes, which is much shorter than that of the conventional CPTOP. Figure 1 shows example screen shots of CPTOP2.
Table 1. Task tests of CPTOP2 and the cognitive abilities to be evaluated

Task test of CPTOP2: Cognitive abilities to be evaluated
1. Word reordering: Oral Comprehension, Written Comprehension, Oral Expression, Written Expression
2. Block assembling: Fluency of Ideas, Originality
3. Filling a blank of number series: Mathematical Reasoning, Number Facility
4. State transition memorization: Inductive Reasoning, Memorization
Total performance of the above: Selective Attention

Fig. 1. Screen shots of CPTOP2 (top left: word reordering; top right: block assembling; bottom left: filling a blank of number series; bottom right: state transition memorization)
The details of each task test are explained in the following sections.
2.1 Word Reordering
As shown in the top left of Figure 1, five words are displayed in random order in the window. In this task, the words must be reordered into a grammatically and semantically correct sentence by clicking them in the correct order. The performance indicator of
this task test is the number of correct sentences per unit time. This task test evaluates linguistic abilities such as oral comprehension, written comprehension, oral expression, and written expression.
2.2 Block Assembling
As shown in the top right of Figure 1, four black blocks must be rearranged by dragging the mouse to create a meaningful shape, which is then given a title. The example figure shows the shape of a "rabbit". The performance indicator of this task test is the number of assembled figures per unit time. This task test evaluates idea-related abilities such as fluency of ideas and originality.
2.3 Filling a Blank of Number Series
As shown in the bottom left of Figure 1, a number series with a blank is displayed in the window. In this task, the number that fits the blank must be chosen from the four choices below it. The number series is one of an arithmetic progression, a geometric progression, or a Fibonacci sequence. The performance indicator of this task test is the number of correct answers per unit time. This task test evaluates mathematical abilities such as mathematical reasoning and number facility.
2.4 State Transition Memorization
As shown in the bottom right of Figure 1, one of the figures "circle", "triangle", and "square" is displayed together with the buttons "1", "2", and "3". The figure changes when one of the buttons is pressed, and all of the change patterns must be memorized. After the "answer" button is pressed, the "1", "2", and "3" buttons can no longer be pressed, and the memorized change patterns are entered in the right part of the window. The performance indicator of this task test is the number of correctly memorized patterns per unit time. This task test evaluates inductive reasoning and memorization.
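To make the task logic concrete, the following minimal sketch implements the state transition memorization task described above. The random transition table and the scoring helper are our own illustrative assumptions; the paper does not specify the internals of the web-based implementation.

```python
import random

# Minimal sketch of the state transition memorization task. Assumed
# detail: the figure-button transition table is generated at random.
FIGURES = ["circle", "triangle", "square"]
BUTTONS = [1, 2, 3]

def make_transition_table(rng):
    # For each (figure, button) pair, pick the next figure at random.
    return {(f, b): rng.choice(FIGURES) for f in FIGURES for b in BUTTONS}

def score_answers(table, answers):
    # answers: list of ((figure, button), guessed_next_figure) pairs.
    # The performance indicator is the number of correctly memorized
    # transitions; dividing by elapsed time gives patterns per unit time.
    return sum(1 for key, guess in answers if table[key] == guess)

rng = random.Random(0)
table = make_transition_table(rng)
answers = [(("circle", 1), table[("circle", 1)]),   # correct
           (("square", 3), "triangle")]             # may be wrong
print(score_answers(table, answers))
```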
3 Verification of Evaluation Function of CPTOP2
In order to verify the evaluation functions of CPTOP2, a subject experiment was conducted using fNIRS (functional near-infrared spectroscopy). Specifically, based on the localization of brain functions, brain activity was measured with fNIRS while the subjects conducted each task test, in order to check whether the corresponding brain areas were activated. For the verification experiment, control tasks were prepared in advance which reproduced only the mouse movements of each task test without the cognitive components. The brain activity attributable to each task test was then calculated by subtracting the activity measured during the control task from that measured during the CPTOP2 task test. Six subjects (three males and three females) joined the experiment in total. As a result of the measurement, it was confirmed that each CPTOP2 task test activated the corresponding brain areas; for example, the state transition memorization test activated both sides of the frontal lobe, which correspond to the short-term memory function of human cognition, as shown in Figure 2.
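The subtraction logic described above can be sketched in a few lines. Representing the fNIRS channel signals as plain arrays and omitting preprocessing (filtering, baseline correction) are assumptions for illustration, since the paper does not detail them.

```python
import numpy as np

# Hedged sketch of the control-task subtraction: activity during the
# motor-only control task is removed from activity during the CPTOP2
# task test, channel by channel.
def task_related_activity(task_signal: np.ndarray,
                          control_signal: np.ndarray) -> np.ndarray:
    return task_signal - control_signal

task = np.array([0.8, 1.1, 0.9])      # e.g., oxy-Hb change per channel
control = np.array([0.2, 0.3, 0.1])   # mouse-movement-only control
print(task_related_activity(task, control))
```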
Fig. 2. An example fNIRS measurement recorded while a subject conducted the state transition memorization test (traces show oxy-Hb, deoxy-Hb, and total-Hb; highlighted channels indicate activations related to short-term memory (STM))
4 Verification of Evaluation Accuracy of CPTOP2
In order to verify the evaluation accuracy of CPTOP2, another subject experiment was conducted in which the illumination conditions were controlled. The details of the experiment are described in this section.
4.1 Purpose
The purpose of this experiment was to verify the evaluation accuracy and sensitivity of CPTOP2 by comparing the results of CPTOP2 under a normal lighting condition and a circadian rhythm lighting condition realized with task lights. Here, a task light means a luminaire set just above the desk in order to ensure sufficient illumination while saving energy.
4.2 Experimental Method
Experimental environment. This experiment was conducted in an experimental room where the environmental conditions could be controlled. Figure 3 shows the top view of the experimental room and a scene from the experiment. There were four desks and four PCs so that four subjects could join the experiment at the same time. An illumination-controllable fluorescent light, whose color temperature was 5000 K and whose color rendering index (Ra) was 84, was installed above each desk as a task light. In the experiment, two illumination conditions were applied: the "normal condition" and the "circadian condition". Under the normal condition, the illumination on the desk was fixed at 750 lux by the task light and other fluorescent lights on the ceiling. Under the circadian condition, the illumination on the desk was controlled as shown in Figure 4.
Fig. 3. Top view of the experimental room and a scene from the experiment (the room contains four desks with task lights and PCs for subjects A/E, B/F, C/G, and D/H, a window, the experimenter's desk, and a server PC)
Fig. 4. Illumination control sequence under the circadian condition (desktop illuminance over the working day from 9:00 to 17:00, varying between 2200 lux and 750 lux, with transitions around 11:45 and 13:00-14:00)
The human circadian rhythm can be adjusted by exposure to high illumination in the morning, which leads to deep sleep at night and high arousal during the daytime. By applying this illumination condition, the subjects' circadian rhythms were expected to be adjusted properly. In the experiment, intellectual productivity under these two conditions was compared.
Subjects. Eight subjects joined this experiment: three males and five females. Their ages ranged from 27 to 54, with an average of 40.9, and all of them had office work experience.
Experimental procedure. The experiment was conducted over eleven days in total, as shown in Figure 5. Figure 6 shows the experimental procedure on each day. The first day was a practice day, on which the task procedures of CPTOP2 and the simulated office task were explained by the experimenter and practiced by the subjects. During "normal condition I", from the 2nd day to the 4th day in Figure 5, the lighting condition of the experimental room was set to the "normal condition" described in Section 4.2 and the procedure in Figure 6 was conducted. During the "circadian condition", from the 5th day to the 7th day, the
lighting condition was set to the "circadian condition" and the procedure in Figure 6 was conducted in the same way. During "normal condition II", from the 8th day to the 10th day, the lighting condition was set back to the "normal condition" and the procedure was conducted again. On the last day, the same procedure was conducted as a dummy measurement in order to cancel the terminal effect.
Fig. 5. Schedule of the experiment (eleven days over two weeks, Monday to Sunday: a practice day, three days of Normal Condition I, three days of Circadian Condition, three days of Normal Condition II, and a final dummy measurement day)
9:10–9:15 Fatigue Questionnaire (1)
9:15–10:25 1st task set
10:30–11:40 2nd task set
11:40–11:45 Fatigue Questionnaire (2)
11:45–12:30 Lunch break
12:30–12:35 Fatigue Questionnaire (3)
12:35–13:30 3rd task set
13:35–14:30 4th task set
14:30–14:40 Break
14:40–15:35 5th task set
15:40–16:35 6th task set
16:35–16:40 Fatigue Questionnaire (4)
Contents of a task set: Word reordering (7 min), Block assembling (10 min), Filling a blank of number series (7 min), and State transition memorization (10 min), together constituting CPTOP2, followed by the Simulated office task (10 min)
Fig. 6. Experimental procedure in a day
On each day, the fatigue questionnaire was conducted first, and then two task sets were given to the subjects. Each task set consisted of the four task tests of CPTOP2 and a receipt classification task as a simulated office task. After the two task sets, the fatigue questionnaire was conducted again at the end of the morning. After the lunch break, the fatigue questionnaire was conducted once more and two task sets were given as before. After a short break, two further task sets were given, and the fatigue questionnaire was conducted at the end of the day. In total, six task sets and four fatigue questionnaires were given each day.
Measured Indexes. The performance indicators of the four task tests of CPTOP2 and the simulated office work were measured as performance indexes. The simulated office task was the receipt classification task mentioned above, in which a maximum of 200
receipts were to be classified into 27 categories according to their amount of money, payee, payment method, and account. The performance indicator of this task is the number of classified receipts per unit time. The fatigue questionnaire, on the other hand, examines the fatigue state of the subjects from five viewpoints: sleepiness, discomfort, haze, instability, and dullness. It consists of 25 questions, each answered on a five-grade scale.
4.3 Experimental Results and Discussions
Figure 7 shows the averages of the normalized performance indexes of each task set.
Fig. 7. Normalized performance indexes of CPTOP2 and the simulated office task under Normal Condition I, Circadian Condition, and Normal Condition II: (a) word reordering (+2.3%), (b) block assembling (+6.7%), (c) filling a blank of number series (−4.0%), (d) state transition memorization (+1.9%), (e) receipt classification, the simulated office task (+2.9%)
When calculating the normalized performance indexes, it is assumed that the performance indexes under "normal condition I" and "normal condition II" are the same; the learning curve of each task test is then deduced. The learning curve is assumed to be expressed by the following equation:
P_n = P_lim − (P_lim − P_1) · (1 − r)^(n−1)    (1)

where
P_n: performance index of the nth test
P_1: performance index of the first test
P_lim: performance index when learning is finished
r: improvement rate

The parameters P_lim and r can be determined by the method of least squares from the performance indexes of each test under "normal condition I" and "normal condition II". The normalized performance indexes are then calculated with the learning curve as the baseline. By this procedure, normalized performance indexes in which the learning effect has been compensated are obtained. In each graph of Figure 7, the horizontal axis shows the experimental term ("normal condition I", "circadian condition", and "normal condition II" from left to right), while the vertical axis shows the normalized performance indexes as bars, with their standard deviations as error bars. As the results show, the performance on (e) the simulated office task improved by 2.9% under the "circadian condition" compared with the "normal conditions". Because the circadian rhythm illumination adjusted the subjects' biorhythms, their daytime concentration improved. This result agrees well with the authors' previous studies [6]. The results of CPTOP2, namely the (a) word reordering, (b) block assembling, and (d) state transition memorization tests, improved by 2.3%, 6.7%, and 1.9%, respectively. The average of these results agreed well with that of the (e) simulated office task. The result of the (c) filling a blank of number series test, however, dropped by 4.0%. This was because the solution strategy for this test changed frequently, so its learning effect could not be compensated properly. This test should be improved in the future; in particular, the answering method should be changed. Most of the results of the fatigue questionnaire, on the other hand, showed no significant difference between the normal and circadian conditions. The only cases with a significant difference (p < 0.01) were that (1) sleepiness under the circadian condition was lower than under the normal condition, and (2) discomfort under the circadian condition was higher than under the normal condition. The reason for (1) is presumably that the subjects' biorhythms were adjusted by the circadian rhythm lighting, while the reason for (2) is presumably that the circadian rhythm lighting kept the subjects' arousal level high for a longer time.
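As an illustration of the normalization procedure, the following sketch fits equation (1) by least squares and divides measured indexes by the fitted baseline. The numeric data are invented for illustration, and the choice of scipy's curve_fit is ours; the paper only states that the method of least squares was used.

```python
import numpy as np
from scipy.optimize import curve_fit

# Learning curve P_n = P_lim - (P_lim - P_1) * (1 - r)**(n - 1), eq. (1).
def learning_curve(n, p_lim, p1, r):
    return p_lim - (p_lim - p1) * (1.0 - r) ** (n - 1)

# Illustrative test numbers and scores under Normal Conditions I and II
# (assumed: early and late tests bracket the circadian-condition tests).
n_normal = np.array([1, 2, 3, 4, 5, 6, 43, 44, 45, 46, 47, 48], dtype=float)
p_normal = np.array([10.0, 11.5, 12.4, 13.1, 13.6, 14.0,
                     17.8, 17.9, 17.9, 18.0, 18.0, 18.1])

(p_lim, p1, r), _ = curve_fit(learning_curve, n_normal, p_normal,
                              p0=[p_normal.max(), p_normal[0], 0.1])

# Normalized index: measured performance divided by the learning-curve
# baseline; values above 1 indicate improvement beyond learning effects.
n_circ = np.arange(19, 25, dtype=float)
p_circ = learning_curve(n_circ, p_lim, p1, r) * 1.03   # toy +3% effect
normalized = p_circ / learning_curve(n_circ, p_lim, p1, r)
print(normalized.round(3))
```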
5 Conclusion
In this study, an intellectual performance test, CPTOP2, has been developed from the viewpoint of human cognitive functions. The testing time of CPTOP2 was shortened
compared with the conventional CPTOP in order to allow its introduction into actual offices. In addition, subject experiments were conducted in order to verify its evaluation function and accuracy. The fNIRS measurements showed that each task test of CPTOP2 activated the corresponding brain areas. In the illumination experiment, the average improvement of every test except the filling a blank of number series test under the circadian rhythm illumination condition agreed well with that of the simulated office task.
References 1. Lomonaco, G., Miller, D.: Environmental Satisfaction, Personal Control and the Positive Correlation to Increased Productivity, Johnson Controls Inc. (1997) 2. Brill, M.: Using Office Design to Increase Productivity, vol. I. Buffalo Workspace Design and Productivity Inc. (1984) 3. Shimoda, H., Ito, K., Hattori, Y., Ishii, H., Yoshikawa, H., Obayashi, F., Terano, M.: Development of Productivity Evaluation Method to Improve Office Environment. In: 12th International Conference on Human-Computer Interaction, vol. 9(2), pp. 939–947 (2007) 4. Fleishman, E.A., Reilly, M.E.: Handbook of Human Abilities, pp. 1–37. Consulting Psychologists Press, Palo Alto (1992) 5. Enomoto, K., Kondo, Y., Obayashi, F., Iwakawa, M., Ishii, H., Shimoda, H., Terano, M.: An Experimental Study on Improvement of Office Work Productivity by Circadian Rhythm Light. In: The 12th World Multi-Conference on Systemics, Cybernetics and Informatics, WMSCI 2008, vol. VI, pp. 121–126 (2008) 6. Obayashi, F., Kawauchi, M., Terano, M., Tomita, K., Hattori, Y., Shimoda, H., Ishii, H., Yoshikawa, H.: Development of an Illumination Control Method to Improve Office Productivity. In: 12th International Conference on Human-Computer Interaction, vol. 9(2), pp. 939–947 (2007)
Supporting Cognitive Collage Creation for Pedestrian Navigation
Augustinus H.J. Oomes1, Miroslav Bojic1, and Gideon Bazen2
1 Delft University of Technology, Delft, The Netherlands
2 Logica Management Consulting, Amstelveen, The Netherlands
[email protected]
Abstract. How can we assist people in efficiently finding their way around a novel area? We tested a prototype navigation support system with 10 elderly pedestrians and found that adding landmark information considerably helped them in learning the structure of an unknown residential environment. We conclude that providing explicit landmark information in the learning phase seems beneficial for the creation of a rich “cognitive collage” that is fully functional in later phases when navigation support is not available. Keywords: human spatial navigation, cognitive map, cognitive collage, path integration, landmark recognition, reorientation, navigation support systems.
1 Introduction
Present-day navigational support systems can guide you to your destination almost perfectly. These devices are so effective and easy to operate that users may become entirely dependent on them. Our goal is to develop systems that assist users in getting to know their way around a novel area. The crucial difference with existing systems is that we want to make our users independent of the device as quickly as possible. We lean heavily on ideas developed by researchers of the psychology of human spatial navigation. The classic idea is that we somehow build up "cognitive maps" or "mental maps" of our environment. These maps have similar characteristics to the paper and electronic maps that we use today [4]; representations are enduring (fixed in memory), geo-centric (from a bird's eye perspective), and comprehensive (completely depicting the environment). Several researchers have shown that these ideas are untenable since the nature of the mental representation of the environment for navigational purposes is fundamentally different [5, 7]. The alternative view is that the human mind is lazy and opportunistic, and will only represent the necessary information for the task at hand. A more apt metaphor for mental representations for navigation is "cognitive collage" [5]; we create representations from experience, from maps, from descriptions, etcetera. The only aim is a functional representation, not a perfect one. In order for humans to navigate from a home location to a destination (and return home safely), we appear to have access to three different mechanisms: path integration, landmark recognition, and re-orientation (see [7] for an excellent review). The first one may surprise readers since we share the method of path integration (or "dead reckoning") with insects and birds. Path integration is the dynamic updating of the
present position and direction relative to the home or starting position. It is a simple method that requires limited memory, though it is sensitive to cumulative errors. In the method of landmark recognition, we use salient objects in the environment that are easily remembered and recognized to guide us from one location to the next. Finally, re-orientation is the method that guides us from one space to the next by recognizing the present environment by its global shape and orientation, and remembering the exits to guide us to the next space. For self-reliant navigation, humans use a combination of the methods above, depending on the structure of the environment and their personal experience. Another matter is communicating how to get from A to B to other people. Tversky and Lee [6] asked participants to give them descriptions and depictions of a certain route. They found that there are many similarities between verbal and pictorial directions, and that they consisted of common elements: landmarks (buildings, e.g. "the Old Church"), orientations (paths, e.g. "an alley"), and actions (turn, e.g. "take a right"). May, Ross, Bayer, and Tarkiainen [2] confirmed this finding and found that landmarks are the elements in route instructions with the highest occurrence. The focus in that study was, like in Tversky and Lee [6], on generating instructions, not on actually using them. Ross, May, and Thompson [3] gave pedestrians in real navigation situations explicit information on landmarks. Their findings show a reduction of errors and increased confidence. Finally, we turn to the issue of the effect of age on navigational skills. Wilkniss, Jones, Korol, Gold, and Manning [8] have found that the skill of route learning diminishes with age. For navigational aids, Goodman, Brewster, and Gray [1] found that the use of landmarks is particularly useful for elderly pedestrians. Part of the motivation for our research came from a business case developed with our project partners1. We wanted to find ways of assisting elderly people who are moving into eldercare and are learning their way around a new residential environment. Our main focus is on testing the effect of landmark information on the building up of a functional cognitive collage of a novel environment by elderly pedestrians.
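The path integration mechanism discussed above can be made concrete with a short sketch. Representing a walk as (heading, distance) steps and the example route are our own illustrative assumptions; the point is that small per-step errors accumulate, which is why the mechanism is described as sensitive to cumulative error.

```python
import math

# Hedged sketch of path integration ("dead reckoning"): accumulate each
# step's heading and length to keep a running estimate of position
# relative to home, then recover the bearing and distance back home.
def integrate_path(steps):
    """steps: iterable of (heading_deg, distance_m) pairs."""
    x = y = 0.0
    for heading_deg, distance_m in steps:
        x += distance_m * math.cos(math.radians(heading_deg))
        y += distance_m * math.sin(math.radians(heading_deg))
    home_bearing = math.degrees(math.atan2(-y, -x)) % 360
    home_distance = math.hypot(x, y)
    return home_bearing, home_distance

# Walk 100 m east, then 50 m north: home lies back to the south-west.
print(integrate_path([(0, 100), (90, 50)]))
```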
2 Prototype
We designed and developed a navigation support prototype. Our goal was to provide elderly pedestrians with a device that can guide them from A to B and has the option of showing additional information about landmarks in the immediate environment. We used an HTC Advantage PDA with a 624 MHz CPU and 128 MB of memory, with a 640 x 480 pixel touch screen display of size 12.5 cm. The PDA had a built-in GPS receiver. In addition, we used an external electronic compass of type Wintec WBT100 with its own GPS receiver that communicated with the PDA through Bluetooth. The PDA and compass were encased in a wooden frame as shown in Figure 1. The operating system of the PDA was Windows Mobile 6.0, and the application was written in C# 2.0 under the .NET Compact Framework. The 3D graphics protocol was DirectX, with the 3D objects stored as .obj files and the texture images as .jpg files. For the log and configuration files we used simple .txt files. During navigation,
1 Delft University of Technology, TNO Human Factors, and Logica Management Consulting collaborated within the MultimediaN project on developing innovations for multimodal interactive applications.
Fig. 1. Prototype of our navigation support device (left), and an example screenshot of the display during navigation in the experimental area in Soesterberg, the Netherlands (right)
the location data were measured and stored in the PDA, as well as in the memory of the electronic compass. The user could hold the device in his or her hands and walk normally while occasionally consulting the information provided on the display. Figure 1 shows a screenshot of the prototype in action. The display showed a map in perspective projection that was always oriented in the forward-up direction. The location of the user was indicated on the map by an abstract blue figure, and the heading direction was depicted by a red arrow. In addition to the red arrow that conveyed in which direction to walk, a waypoint was added to make it even clearer where the user would have to go next. Objects in the environment were indicated by small photographs on vertical "billboards" that were always facing the user. The billboards had a slender arrow coming out of the bottom that touched the map with its sharp end, indicating the location of the object. We distinguished two kinds of objects: waypoints and landmarks. Waypoints were easily recognizable objects (for example, fire hydrants or traffic signs) that lay on the path of the pedestrian; their photographs were shown on billboards with a green edge. The waypoints functioned as indicators of intermediate locations on the path to the destination. Landmarks were (large) easily identifiable structures that could be seen from the pedestrian's vantage point (for example, houses or bridges), shown on billboards with a blue edge. Depending on the condition in the experiment, landmarks were either present on the screen or not, as will be explained in detail in the next section.
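The billboard behaviour described above (photographs always rotated to face the user) is a standard 3D technique; the sketch below illustrates only the underlying yaw computation. The original application was written in C# with DirectX, so this NumPy version is an assumption-laden illustration, not the prototype's code.

```python
import numpy as np

# Hedged sketch: compute the yaw that turns a vertical billboard at
# object_xy about its axis so that it faces a user standing at user_xy.
def billboard_yaw(object_xy, user_xy):
    dx, dy = np.asarray(user_xy) - np.asarray(object_xy)
    return np.arctan2(dy, dx)   # radians, measured in the ground plane

print(billboard_yaw((10.0, 5.0), (0.0, 0.0)))  # faces back toward origin
```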
3 Experiment
We tested our navigation support prototype in an experiment with 10 elderly pedestrians who were asked to walk around a residential area in Soesterberg, the Netherlands. The goal of the experiment was to find out whether they could learn the environment with the help of the prototype to the extent that they could subsequently navigate efficiently without its assistance.
3.1 Participants
Ten elderly people, 60 to 72 years old (2 women and 8 men), were selected from a pool of volunteers on the basis of their age and the fact that they had never been in the residential area where the experiment took place. This area was close to the TNO Human Factors institute in Soesterberg, which manages this pool of experimental participants and graciously allowed us the use of their facilities. The participants were paid for their services.
3.2 Procedure
On arrival we informed the participants about the goal of the experiment and had them fill out a survey about their navigational aptitude. We asked them about their experience in using maps and electronic navigation systems, and asked them to self-assess their navigational abilities. We subsequently familiarized them with the prototype and escorted them out of the building to the starting point A, which was about 300 meters from the TNO building. The experimenter explained to each participant that the task was to follow the guidance of the navigation device and walk to the destination point B. At the destination point B we asked them to hand in the device and gave them the second task: they now had to walk back to the starting point A without the help of either the device or the experimenter. It is important to emphasize that they did not know this in advance; they could not have foreseen that they would be asked to find the starting point without any assistance. The experimenter had set a maximum duration of 30 minutes for the return walk; this limit was never actually reached. After arriving back at the starting point, the experimenter and the participant went back into the TNO building. During the debriefing session, the participants were asked to draw a map of the area they had just navigated through. Finally, they filled out a survey about their navigation experience during the experiment and were asked about the usability of the prototype. There was also space on the survey papers for suggestions and remarks.
3.3 Conditions
The experimental route in the Soesterberg residential area (Figure 3) was about 1 kilometer long and had 10 turns. We photographed 12 waypoints and 20 landmarks that could be shown on the display of the device. Examples of landmarks are a church, a bank, and a children's playing field (see also Figure 2). The most important independent variable in the experiment was the presence or absence of landmark information on the display during the walk from A to B. The reader should keep in mind that this information was not necessary for finding the destination B, since there was already a rich set of information about which route to follow. The crucial question is whether this additional information, showing landmarks along the route they were presently walking, made it easier for them to walk back to the starting point without assistance.
Fig. 2. Examples of typical landmarks (top row) and waypoints (bottom row)
Fig. 3. Satellite image of the experimental route (red line) in a residential area in Soesterberg, the Netherlands, with the TNO building on the bottom right
4 Results
First of all, each participant managed to find destination B with the guidance of our navigation support prototype. On average it took them about 17 minutes, ranging from 14 to 22 minutes. Interestingly, all participants were also successful in finding
the starting point A. It took them on average 13 minutes, within a range of 8 to 17 minutes. We found no significant difference in walking time between the two conditions, "landmarks present" and "no landmarks", either for the route from A to B or for the route back from B to A. There was, however, a difference in the routes that participants took back to the starting point A. Figure 4 shows the results of Participant 1, who could see the landmark information on the display ("landmarks present"). The green line indicates the route from A to B, and the red line shows the route back from B to A. As one can see, this person took a very similar route back to the starting point. Figure 5 shows the results for Participant 4 in the "no landmarks" condition. It is clear that the route back (red line) is quite different. The participant takes a detour but succeeds in getting back on track, eventually finding the "home" position.
Fig. 4. Results of Participant 1 in the “landmarks present” condition. The green line indicates the route from A to B, and the red line the route back from B to A. The routes are very similar.
We analyzed the routes of all 10 participants in detail and counted the number of waypoints and landmarks that each participant had passed on the way back to the starting point. Our criterion for similarity was that they should have encountered at least 66% of the waypoints and 66% of the landmarks; if both percentages were under 66% we called the route "different". Table 1 shows an overview: 4 out of 5 participants who were shown landmarks on the device took a similar route back, and 4 out of 5 participants who were not shown landmarks took a different route back.
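A minimal sketch of this similarity criterion follows. The handling of mixed cases (one fraction above and one below 66%) is not specified in the text, so this sketch conservatively labels any route failing the similarity test as "different".

```python
# Hedged sketch: a return route counts as "similar" if it passes at
# least 66% of the waypoints AND at least 66% of the landmarks that
# were encountered on the outbound route.
def classify_return_route(waypoints_seen: int, waypoints_total: int,
                          landmarks_seen: int, landmarks_total: int) -> str:
    wp_frac = waypoints_seen / waypoints_total
    lm_frac = landmarks_seen / landmarks_total
    return "similar" if wp_frac >= 0.66 and lm_frac >= 0.66 else "different"

# 12 waypoints and 20 landmarks were photographed along the route.
print(classify_return_route(10, 12, 15, 20))  # similar
print(classify_return_route(5, 12, 8, 20))    # different
```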
Fig. 5. Results of Participant 4 in the "no landmarks" condition. The green line indicates the route from A to B, and the red line the route back from B to A. The return route is very different.

Table 1. Overview of the number of participants taking a similar or a different return route given the conditions "landmarks present" or "no landmarks"

Return route: landmarks present / no landmarks
Similar: 4 / 1
Different: 1 / 4
Fisher's exact statistical test reveals that the probability of this distribution of results is 0.099, rather low but not low enough to be considered significant at p < 0.05. The participants in our experiment were also asked to sketch the route they had just taken. We found wide variability in the quality of these maps (see Figure 6): from just a few lines, to a complete and accurate drawing with an indication of the location of waypoints and landmarks, and an indication of the return route. There is, however, no correlation between the similarity of the return route and the quality of the sketches.
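The reported probability can be checked in a few lines. The sketch below, using scipy, suggests that 0.099 corresponds to the point probability of the observed table under the exact-test null, while the conventional one- and two-sided p-values are somewhat larger; which variant the authors computed is our inference, not stated in the text.

```python
from scipy.stats import fisher_exact, hypergeom

# Table 1 above: rows = similar/different, columns = landmarks/no landmarks.
table = [[4, 1], [1, 4]]
_, p_two_sided = fisher_exact(table)                          # ~0.206
_, p_one_sided = fisher_exact(table, alternative="greater")   # ~0.103

# Point probability of exactly this table under the hypergeometric null
# (5 similar routes among 10 participants, 5 of whom saw landmarks).
p_point = hypergeom.pmf(4, 10, 5, 5)                          # ~0.099
print(round(p_two_sided, 3), round(p_one_sided, 3), round(p_point, 3))
```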
Fig. 6. Sketches of Participant 1 (left) and Participant 3 (right)
The surveys revealed quite a large variability in self-assessed navigational skills, but that does not correlate with performance. The usability of the device was generally rated quite high, although it was commented that the photos of the landmarks could have been clearer. We are planning to use more iconic images instead of photographs in the future. Occasionally, weather conditions influenced the experiment. One time it rained during the initial phase of the experiment; that mainly affected the quality of the GPS signal, resulting in temporarily low accuracy of the localization. It took some time before the first waypoint emerged, but that was the only problem. Another time, the sun was shining brightly at a low angle, and this affected visibility because of glare on the screen. The visibility of the photographs of the waypoints and landmarks was especially affected.
5 Conclusion
We tested whether providing explicit landmark information during the exploration of a novel environment would help elderly pedestrians become independent of navigation support systems. We found strong indications that this may be the case, but we are reluctant to draw very strong conclusions given our small sample size (N = 10). For example, we cannot exclude the possibility that our experimental participants used the method of path integration to navigate back to the starting point; all participants managed to return within a reasonable time, even though some used an entirely different route. In future experiments we plan to probe their representation of their location relative to "home" by having them set a dial in the direction of the home location at multiple points along the route. If they are capable of giving accurate estimates of the direction of the home position from a location where the home is not visible, then the path integration system is fully functional.
Furthermore, the residential area in Soesterberg that we used had clearly visible boundaries, so it was very hard to get lost in this area. Participants may have used their knowledge about the structure and size of Dutch residential neighborhoods in making decisions about their route. That can be considered a part of the cognitive collage of our participants that we had no control over. Acknowledgments. We gratefully acknowledge the MultimediaN project for funding our collaboration in the work-package N2 Multimodal Interaction. We very much appreciate the support of the TNO Human Factors institute, especially Mark Neerincx for his advice and encouragement.
References 1. Goodman, J., Gray, P., Khammampad, K., Brewster, S.: Using Landmarks to Support Older People in Navigation. In: Dunlop, M.D. (ed.) Mobile HCI 2004. LNCS, vol. 3160, pp. 13–16. Springer, Heidelberg (2004) 2. May, A.J., Ross, T., Bayer, S.H., Tarkiainen, M.J.: Pedestrian navigation aids: information requirements and design implications. Personal and Ubiquitous Computing 7(6), 331–338 (2004) 3. Ross, T., May, A., Thompson, S.: The Use of Landmarks in Pedestrian Navigation Instructions and the Effects of Context. In: Dunlop, M.D. (ed.) Mobile HCI 2004. LNCS, vol. 3160, pp. 300–304. Springer, Heidelberg (2004) 4. Tolman, E.C.: Cognitive maps in rats and men. Psychological Review 55, 189–208 (1948) 5. Tversky, B.: Cognitive maps, cognitive collages, and spatial mental models. In: Spatial Information Theory: A Theoretical Basis for GIS, pp. 14–24. Springer, Heidelberg (1993) 6. Tversky, B., Lee, P.U.: Pictorial and Verbal Tools for Conveying Routes. In: Freksa, C., Mark, D.M. (eds.) COSIT 1999. LNCS, vol. 1661, p. 51. Springer, Heidelberg (1999) 7. Wang, F.W., Spelke, E.S.: Human spatial representation: insights from animals. Trends in Cognitive Sciences 6(9), 376–382 (2002) 8. Wilkniss, S.M., Jones, M.G., Korol, D.L., Gold, P.E., Manning, C.A.: Age-related differences in an ecologically based study of route learning. Psychology and Aging 12(2), 372–375 (1997)
Development of a Novel Platform for Greater Situational Awareness in the Urban Military Terrain
Stephen D. Prior1, Siu-Tsen Shen2, Anthony S. White3, Siddharth Odedra1, Mehmet Karamanoglu1, Mehmet Ali Erbil1, and Tom Foran1
1 Department of Product Design and Engineering, School of Engineering and Information Sciences, Middlesex University, London N14 4YZ, United Kingdom
[email protected]
2 Department of Multimedia Design, National Formosa University, 64 Wen-Hua Rd, Hu-Wei 63208, Taiwan
3 Department of Computing and Multimedia Technology, School of Engineering and Information Sciences, Middlesex University, London NW4 4BT, United Kingdom
Abstract. The conflicts in Afghanistan and Iraq and the more recent war in the Gaza Strip have emphasized the need for novel platforms which provide for greater situational awareness in the urban terrain. Without intelligent systems, which can accurately provide real-time information, collateral damage to property will result, together with unnecessary civilian deaths. This situation is exacerbated by the fact that within the next decade 75% of the world’s population will be living in urban areas. This paper outlines the current state of unmanned aerial vehicles throughout the world and presents a novel design of a multiple rotary wing platform which has great potential for both military and civilian application areas. Keywords: unmanned aerial vehicles, situational awareness, military operations, urban terrain.
1 Introduction
It has been reported that within the next decade 75% of the world's population will be living in urban areas [1, 5]. We can therefore extrapolate that future military operations will predominantly be fought out in these difficult combat zones. This has to some extent already been borne out by the conflicts in Iraq and Afghanistan, where the US and UK military have paid a high price for restoring freedom (see Table 1). The recent one-sided war in the Gaza Strip has further emphasized the need to prevent collateral damage and unnecessary loss of civilian life. Medical intervention in the combat zone during the so-called 'golden hour' has improved over time such that 9 out of 10 soldiers injured now survive. Any loss of life is regrettable, and the more that technology can do to remove personnel from the battlefield the better.
Table 1. US, UK and Civilian Casualties in Iraq and Afghanistan [6]

Country, Theatre: Dead / Wounded
US, Iraq: 4,201 / 43,993
US, Afghanistan: 627 / 4,400†
US, Total: 4,828 / 48,393†
UK, Iraq: 176 / 3,294∗
UK, Afghanistan: 125 / 1,970∗
UK, Total: 301 / 5,264∗
Civilian, Iraq: 97,094 / N/A
Civilian, Afghanistan: 10–30,000§ / N/A
Civilian, Total: 107–127,000 / N/A

† Estimated data based on known US casualty rates in Iraq.
∗ OP Telic and Herrick Casualty and Fatality Tables up to 15 Nov 2008 – MoD Factsheets.
§ Estimates of civilian deaths range from 10,000 to 30,000 for the period 2001 to 2008.

Conducting Military Operations in the Urban Terrain (MOUT) is clearly more dangerous than operating in open terrain, and therefore requires greater situational awareness if casualties are to be reduced.
2 The World of Unmanned Autonomous Systems
Autonomous systems now come in all shapes and sizes, and can be categorized into the following five main segments in relation to the vertical space plane:
• Space (Unmanned Space Vehicles, USVs)
• Aerial (Unmanned Aerial Vehicles, UAVs)
• Ground (Unmanned Ground Vehicles, UGVs)
• Surface (Unmanned Surface Vehicles, USVs) – typically sea-going vessels
• Sub-surface (Unmanned Undersea Vehicles, UUVs)
In this paper we focus mainly on the design and selection of small UAVs due to their inherent suitability for MOUT. The growth in UAVs over the last ten years has been impressive, driven in large part by the conflicts in Iraq and Afghanistan and the ongoing 'War on Terror'. According to estimates by Frost and Sullivan, the aggregate military UAV expenditure (2003-2012) for the US and Europe is expected to be £20bn [3], with the US DoD alone forecasting a FY09 UAS procurement spend of US$2bn [2]. Probably the most reliable and up-to-date source of information relating to international UAV usage is the Unmanned Vehicle Systems Website and Yearbook, which lists UAV activity across the international spectrum [7]. The latest data for 2008-09 list 974 Unmanned Aircraft Systems (UAS) being developed in 49 countries throughout the world, an increase of 104% over the last four-year period.
Of these 974 systems, 578 (60%) are classed as military, 115 (12%) are civil/commercial and 242 (25%) are dual purpose. Other categories are Developmental and Research. In terms of the 49 UAS producing countries, the US is in the lead with 341 (35%) systems, followed a long way behind by Israel 72 (7%), France 65 (7%), Russian Federation 53 (5%), UK 51 (5%) and Germany 36 (4%). The most common type of UAV remains the Fixed Wing system (71%), followed by Rotary Wing (18%), Shrouded Rotary Wing (Ducted Fan) (3%), Lighter-than-Air (3%), and then a series of other systems which include motorized parafoils, tilt rotors, flapping wings, etc. AeroVironment Inc, in the US, is at the forefront of this technology with their Wasp III fixed wing UAV system (See Fig 1 below).
Fig. 1. The WASP III Unmanned Aerial Vehicle (AeroVironment Inc., USA)
3 The Middlesex Co-Axial Tri-Rotor (HALO™)
In light of a greater understanding of the problems associated with the dismounted soldier, such as the mass of any system, its endurance and its performance, a Preliminary Design Specification (PDS) was constructed, the key points of which are given below. Design requirements (in no particular order):
• MTOW of 5 kg or less
• System shall be capable of being backpackable (0.35 x 0.45 x 0.3 m = 47 lt)
• Linear speed (0-3 m/s)
• Ability to hover and perch
• Endurance of 30-60 minutes
• Rate of climb in hover of 3.5 m/s
• Manoeuvrable in at least 4 DOF (X, Y, Z, and RZ)
• Ability to carry a payload of up to 2 kg (to include fuel/power source)
• Less than £5,000 (excluding the sensor payload)
• Quiet in operation (< 60 dB(A) @ 3 m)
• GPS waypoint autonomous control
• Autonomous vertical take-off and landing (VTOL)
• Set-up in less than 5 minutes
• Turnaround in less than 10 minutes
• Safe operation at all times
• Ability to detect, identify, locate and report the four main target types: IEDs, snipers, technicals (4x4 armed vehicles) and armed combatants
Having reviewed all the alternatives, we focused on both quadrotor and tilt-rotor designs due to their innovative principles and VTOL capability. Finally, we concluded that a multiple rotary-wing Co-Axial Tri-Rotor UAV with a VTOL capability could be a novel solution for MOUT. We named our UAV 'HALO™' due to its force protection operational role. Our proposed UAV system consists of a unique Co-Axial Tri-Rotor design (UK Patent Application No. 08 108 86.2; Design Registration 4008525) which incorporates six AXI 2217/20 brushless out-runner motors, each capable of producing approximately 5.6 N (570 g) of thrust at 7 Amps (6,200 RPM), connected to six GWS 1060 3-bladed props (see Fig. 2). The mass of this UAV is 3.25 kg, consisting of a main system mass of 3.05 kg and an interchangeable payload of 0.2 kg. The system has the capability to increase this payload up to 2 kg if necessary, depending on the required sensor package. The UAV is powered by two 8,000 mAh, 14.8 V Lithium-Polymer (Li-Po) batteries from MaxAmps™ in the US, with a nominal current of 7 A per motor, making a total current draw per battery of 21 A. This gives a predicted minimum endurance of 23 minutes, dependent on payload and environmental conditions. The gross dimensions of this UAV are ∅0.7 m (tip to tip) x 0.3 m. The system is capable of hover and perch (it can land and still rotate its camera sensors).
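The quoted endurance figure can be checked with a back-of-envelope calculation. The sketch below assumes each 8,000 mAh pack feeds three of the six motors at the nominal 7 A each and ignores discharge losses; it reproduces the predicted minimum endurance of roughly 23 minutes.

```python
# Hedged endurance check: capacity (Ah) divided by pack current (A),
# converted to minutes. Losses and payload effects are ignored.
CAPACITY_AH = 8.0            # one 8,000 mAh Li-Po pack
MOTORS_PER_BATTERY = 3       # assumed split of six motors over two packs
CURRENT_PER_MOTOR_A = 7.0    # nominal draw quoted above

draw_a = MOTORS_PER_BATTERY * CURRENT_PER_MOTOR_A   # 21 A per pack
endurance_min = CAPACITY_AH / draw_a * 60.0
print(f"predicted endurance: {endurance_min:.1f} min")  # ~22.9 min
```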
Fig. 2. The Middlesex University Co-Axial Tri-Rotor Unmanned Aerial Vehicle
3.1 The Co-axial Drive Principle
Fundamental to the success of our chosen design is the co-axial drive unit. This consists of two props, one mounted above the other, rotating about the same axis in opposite directions and powered by separate motors. This arrangement allows the torque outputs of both units to be balanced, thus negating the yaw moment, whilst providing considerable thrust for a small package size (see Fig. 3). Co-axial props have been used on a number of aircraft, including the British Supermarine Spitfire and the Russian Tupolev Tu-95, with great success. An excellent book describing the benefits of the co-axial arrangement, together with the momentum theory analysis, is given by J. Gordon Leishman [4].
Fig. 3. The Co-Axial Drive Configuration (annotated with prop type and size x pitch, angular velocity and direction of rotation, and the distance between the props)
After extensive testing with many different motor and propeller combinations, it was found that each co-axial unit could produce a maximum thrust of 19.6 N (2 kg) at 18 Amps (the current capacity of the AXI motor). Momentum theory states that the thrust T of a propeller is proportional to the square of its rotational speed n and the fourth power of its diameter D:

T = ρ · n^2 · D^4 · C_T    (1)

Also, the power P of a propeller is proportional to the cube of its rotational speed n and the fifth power of its diameter D:

P = ρ · n^3 · D^5 · C_P    (2)

where ρ = density of air (1.225 kg/m^3), C_T = thrust coefficient, and C_P = power coefficient for a given propeller.
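To illustrate how equations (1) and (2) are used, the sketch below evaluates them at the quoted 6,200 RPM operating point. The 0.254 m (10 in) diameter for the GWS 1060 prop and the coefficient values are our assumptions; with C_T ≈ 0.103 the model happens to reproduce the approximately 5.6 N per-motor thrust quoted above.

```python
# Hedged sketch of the momentum-theory scaling laws (1) and (2).
# C_T and C_P below are illustrative placeholders; the paper does not
# give measured coefficients for the GWS 1060 props.
RHO = 1.225                      # air density, kg/m^3

def thrust_n(n_rev_s: float, d_m: float, c_t: float) -> float:
    """Propeller thrust T = rho * n^2 * D^4 * C_T (newtons)."""
    return RHO * n_rev_s**2 * d_m**4 * c_t

def power_w(n_rev_s: float, d_m: float, c_p: float) -> float:
    """Propeller power P = rho * n^3 * D^5 * C_P (watts)."""
    return RHO * n_rev_s**3 * d_m**5 * c_p

n = 6200 / 60.0                  # 6,200 RPM in revolutions per second
d = 0.254                        # assumed 10-inch prop diameter, metres
print(f"T ~ {thrust_n(n, d, c_t=0.103):.2f} N")   # ~5.6 N per motor
print(f"P ~ {power_w(n, d, c_p=0.045):.1f} W")    # shaft power estimate
```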
4 Conclusion
The wars in Iraq and Afghanistan have been costly in both human and monetary terms; personnel and machines are wearing out, and the political fallout from injuries and deaths of civilians and military servicemen and women cannot be overstated. Unmanned Aircraft Systems typically cost 1% of manned systems and can provide ISTAR in places where manned systems cannot go. There is a requirement for small, lightweight and agile VTOL UAVs to be developed for use by section- or company-sized units in MOUT situations, a requirement which at the present time remains unfulfilled. Apart from the obvious benefits in the military context, it is the authors' belief that within the next decade we will begin to see more and more civilian applications of small unmanned aerial systems, operating semi-autonomously and eventually fully autonomously in areas such as energy conservation and monitoring, agriculture, farming, and emergency service operations. Acknowledgements. The authors would like to acknowledge support from the Emerald fund via a Phase II mini grant Ref: 2M – 034, which has enabled this research to progress.
References 1. Baybrook, R.: Urban Rush Hour. Armada International 30(4), 14 (2006) 2. Clapper, J., Cartwright, J., Young, J., Grimes, J.: Unmanned Systems Roadmap (2007-2032). Office of the Secretary of Defense, p. 188 (2007) 3. DeGarmo, M.T.: Issues Concerning Integration of Unmanned Aerial Vehicles in Civil Airspace. The MITRE Corporation, McLean, Virginia, p. 98 (2004) 4. Leishman, J.G.: Principles of Helicopter Aerodynamics. In: Shyy, W., Rycroft, M.J. (eds.), 2nd edn. Cambridge Aerospace Series, p. 817. Cambridge University Press, New York (2006) 5. McDonald, G.: Bullets in the Bricks. Military Training & Simulation News, pp. 1–8 (2004) 6. Iraq Coalition Casualty Count, http://www.icasualties.org/oif/ (accessed July 2008) 7. Van Blyenburgh, P.: International UAS Yearbook 2008-2009, 6th edn. Blyenburgh & Co., Paris (2008)
The User Knows: Considering the Cognitive Contribution of the User in the Design of Auditory Warnings
Catherine Stevens and Agnes Petocz
School of Psychology & MARCS Auditory Laboratories, University of Western Sydney, Locked Bag 1797, South Penrith DC, NSW, 1797, Australia
{kj.stevens,a.petocz}@uws.edu.au
Abstract. An experiment that investigated effects of modality, warning type, and task demand on warning recognition speed and accuracy is reported. Using the experiment as a specific example, we argue for the importance of considering the cognitive contribution of the user (viz. prior learned associations) in the warning design process. Drawing on semiotics and cognitive psychology, we highlight the indexical nature of so-called auditory icons or natural indicators and argue that the cogniser is an indispensable element in the tripartite nature of signification. Keywords: Auditory warnings, Workload, Modality, Icons, Semiotics.
1 Introduction
Caricatures of everyday sounds [1,2,3] have been considered auditory icons. However, as we have argued elsewhere [4], the term icon, meaning likeness or image, while having a straightforward application in the visual domain, cannot be applied in a straightforward manner in the auditory domain. In audition, there are few true auditory icons – that is, cases where one sound is used to stand for another sound by virtue of resemblance between the two sounds. In the language of semiotics [5], what has been termed an auditory icon is more correctly an index. For clarity, we refer to what have been called auditory icons as natural indicators, where these natural indicators have been adopted or adapted for purposes of conventional indication [4]. Abstract warnings, where there is no prior systematic relation between signal and event, are termed symbolic indicators (following Peirce), as their association is determined purely by convention. In operational environments where the visual display is often very complex, auditory natural indicators have the potential to be effective warnings because they can be short, are not easily masked, are distinct from speech signals, can be used where the visual display is at risk of visual information overload, and can be used when the critical event does not make a sound [3,6]. Reaction times are faster in response to auditory natural indicators compared with tonal and speech warnings [7] and auditory symbolic indicators [8,9].
There are two classes of natural indicators [4]. The first involves natural indicators that have been adopted to indicate a cause or a correlated object/event. These can include indicators made by humanly manufactured objects (e.g., the sound of a car) but which may nevertheless be considered part of the environment of natural indicators (i.e., the environment into which humans are born). For example, the sound of a car failing to start is correlated with running out of fuel. The second class consists of those natural indicators that have been adapted to exploit naturally occurring shared features (particularly similarity of form or function) between what they naturally indicate and the selected target. For example, the sound of a sink draining (a whirlpool form) may be used to signal "tornado" [3]. Drawing upon semiotics and psychology, we argued earlier [4] that, because signification is a tripartite relation between signal, referent, and person/cogniser, auditory warning designers cannot afford to neglect the cogniser as an indispensable element. Indeed, the cognitive contribution of the user (viz. prior learned associations) appears to be the most significant factor in determining the effectiveness of warnings. The cognitive processes involved in recognizing a natural indicator typically include recognition of the source of the sound or image and the activation of long-term memory. Symbolic indicators, on the other hand, need to be learned from the outset: for example, sound bursts with a unique set of frequencies and pause durations yielding a novel and unique pitch and temporal pattern. Features of such a sound or image need to be extracted and become associated with a particular critical event. The association can be strengthened with repeated exposure and directive training. In the context of warning design, both natural and symbolic forms of indication need to be learned, but natural indicators have been learned previously and/or exploit some causal, correlational, or similarity-of-form-or-function relation. Symbolic indicators, on the other hand, are abstract and learned within an experimental session or within the idiosyncrasies of a particular operational system. As we have noted [4], it is a truism that connections that have already been (at least partially) learned will be fully learned more easily than connections that have not been previously learned. The purpose of comparing natural and symbolic indicators here is to investigate whether this is the case when the warning set is small (i.e., within the capacity of adult working memory), and in both visual and auditory form. If warnings are natural indicators of the events that they signal, then, during training, accuracy should be greater and reaction time faster in response to natural indicators than to symbolic indicators. This should be the case for warnings presented in either the visual or the auditory modality. We would expect the advantage for natural indicators to hold not just for the learning phase, but also for a test phase in which there are additional and competing demands, as in a dual task.
1.1 Recognition of Warnings During High Workload
In studies of the effectiveness of different types of warning signals, task demand or workload is rarely investigated. Task demand is a crucial variable for the application and generalisability of results to real-world settings. An assumption, for example, that natural indicators are recognised more often and more quickly than symbolic
indicators when task demand is low and the operator is unstressed does not necessarily predict their efficacy in time-pressured and/or critical situations. When critical incidents occur, there are often many competing demands, including alarms, that need attention. To begin to investigate the recognition of natural indicators under more demanding conditions, in the present experiment participants performed dual tasks – concurrent arithmetic calculations and warning recognition – with a systematic increase in the difficulty of the arithmetic task. We thus investigate the effect of task demand on warning recognition speed and accuracy.
1.2 Aim, Design, and Hypotheses
The aim of the experiment was to investigate the effects of indicator type, modality, and task demand on warning recognition speed and accuracy. The experiment consisted of a 2x2x2 factorial design: modality (auditory, visual), indicator (symbolic, natural), and task demand (high, low), with repeated measures on the latter two factors. The dependent variables were warning recognition accuracy and reaction time during the learning and dual-task test phases. It was hypothesized that i) in both auditory and visual modalities, natural indicators compared with symbolic indicators elicit greater accuracy and faster reaction time during the learning phase; and ii) in both auditory and visual modalities, natural indicators compared with symbolic indicators elicit greater accuracy and faster reaction time during the low and high demand conditions of the dual-task test phase.
2 Method
2.1 Participants
Forty adult participants (35 females and 5 males) from the University of Western Sydney took part in the study, for which they received course credit. The mean age of the sample was 21.08 years (SD=5.27, range 18-42 years). Twenty participants were presented with auditory signals and 20 with visual signals. During the test phase, 10 participants completed the low demand dual task first followed by the high demand task, and another 10 completed the high demand dual task first; likewise, 10 participants were presented initially with blocks of natural indicators followed by blocks of symbolic indicators, and 10 completed the symbolic indicator blocks before the natural indicator blocks. All participants had normal hearing and normal or corrected-to-normal vision.
2.2 Stimuli
The auditory and visual natural indicators used in the experiment were rated in a separate stimulus selection task as being highly related (means of 3.77-4.33 out of 5) to specified critical events. Ratings of association between symbolic indicators and the events with which they were arbitrarily paired were low, with a mean association rating of 1.56 (SD=.72) on a scale from totally unrelated (1) to highly related (5).
Auditory Natural Indicators. The auditory natural indicators were obtained from the websites www.sounddogs.com and www.findsounds.com. The four natural indicators were from the first class, that is, adopted to indicate a cause or a correlated object/event. For example, the sound of coughing is correlated with an excess of a dangerous gas such as carbon monoxide. All sampled everyday sounds were 1 s, 16-bit mono, and standardized to a sample rate of 44.1 kHz, with normalized amplitude. Descriptions are given in Table 1.

Table 1. Visual and auditory natural indicators used as stimuli

Event: Visual Natural Indicator Description / Auditory Natural Indicator Description
Low fuel: Petrol pump / Car failing to start
Carbon monoxide: Skull and crossbones / Coughing
Ground proximity: Plane diving into mountain / Explosion
Engine fire: Fire extinguisher / Fire engine siren
Auditory Symbolic Indicators. The auditory symbolic indicators consisted of tones with normalized amplitude, a 44.1 kHz sample rate, 16-bit resolution, mono, and a duration of 1 s. The sounds were designed following guidelines set by Patterson [10], in which a burst of sound is first created and then repeatedly played over the duration of the signal (see the sketch following Table 2). Each sound burst had its own set of frequencies and pause durations, giving each warning a unique pitch and temporal pattern. The duration of the bursts varied from 0.19 s to 1 s. The tones and upper harmonics of periodic sounds were selected from frequencies in the range 150–3000 Hz. The auditory symbolic indicators stood in no obvious relation to the events with which they were paired. Visual Natural Indicators. The visual natural indicators were obtained from the website www.clipart.com and are depicted in Table 1. The visual natural indicators were designed to be similar to their auditory counterparts in the way they related to their targets via causal/correlational indication. For example, the image of a petrol pump was not used to indicate "petrol pump", which would have been a purely icon-based relation; instead, it was used to indicate something associated with a petrol pump (low fuel). All clipart images were shown in black and white. Visual Symbolic Indicators. To ensure that image complexity was comparable across the set of visual natural and symbolic indicators, the visual symbolic indicators were obtained by enlarging a small section of clipart images other than those being used as visual natural indicators. All symbolic indicators were selected on the basis that there was no obvious relationship between the indicator and the event with which it was paired. All images were shown in black and white. The visual symbolic indicators for each of the four events are shown in Table 2.
Table 2. Visual symbolic indicators used as stimuli (black-and-white images, not reproduced here, for each of the four events: low fuel, carbon monoxide, ground proximity, engine fire)
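The Patterson-style construction described above for the auditory symbolic indicators (a harmonic tone burst with its own frequencies and pause durations, repeated over a 1 s signal, with all components within 150-3000 Hz) can be sketched as follows. The specific fundamental, harmonics, and onset ramp are illustrative assumptions, not the authors' actual stimulus parameters:

import numpy as np

SR = 44100  # Hz, matching the stimulus sample rate

def patterson_style_warning(f0, harmonics, burst_dur, pause_dur, total_dur=1.0):
    # Build one harmonic tone burst.
    t = np.arange(int(SR * burst_dur)) / SR
    burst = sum(np.sin(2 * np.pi * f0 * h * t) / h for h in harmonics)
    # Short linear onset/offset ramps to avoid clicks (an implementation choice).
    ramp = int(0.01 * SR)
    env = np.ones_like(burst)
    env[:ramp] = np.linspace(0, 1, ramp)
    env[-ramp:] = np.linspace(1, 0, ramp)
    burst *= env
    # Repeat burst-plus-pause cycles over the full signal duration.
    pause = np.zeros(int(SR * pause_dur))
    cycle = np.concatenate([burst, pause])
    reps = int(np.ceil(total_dur * SR / len(cycle)))
    signal = np.tile(cycle, reps)[: int(total_dur * SR)]
    return signal / np.max(np.abs(signal))  # normalize amplitude

# One warning with an assumed unique pitch and temporal pattern
# (f0=300 Hz with harmonics up to 900 Hz stays within 150-3000 Hz):
tone = patterson_style_warning(f0=300, harmonics=[1, 2, 3],
                               burst_dur=0.19, pause_dur=0.06)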
Four critical aviation events that could potentially lead to an accident were selected, and symbolic auditory and visual indicators and natural auditory and visual indicators were designed for each. The critical events were presented as 'clickable' buttons on a computer screen, equidistant from one another. Warnings were presented either visually at the top of the computer screen for 1000 ms or auditorily through headphones, also for 1000 ms. A mathematical addition task, presented visually and concurrently on an adjacent computer, was constructed in low and high demand versions. The low demand version consisted of three numbers, all less than five (e.g., 1+2+3), presented in the middle of the computer screen. The high demand version consisted of two double-figure numbers (e.g., 26+49). Participants were required to add the numbers together mentally and then say the answer aloud. The addition task was displayed on the screen for 2000 ms. Both low and high demand conditions consisted of a total of 16 additions. Warnings were presented intermittently, with approximately four addition tasks presented for each occurrence of a warning.

2.3 Equipment

The experiment was programmed in PowerLaboratory version 1.0.3 and presented to participants on a Macintosh iBook G4. The concurrent addition task was programmed in SuperLab Pro 1.74 and presented to participants on a Macintosh PowerBook G4. Auditory warnings were played through Koss stereo headphones.

2.4 Procedure

The procedure was approved by the University of Western Sydney Human Research Ethics Committee. An information sheet was distributed and participants provided written consent before the experiment began. Participants were randomly assigned to either the visual or the auditory condition and were given some context for each of the warnings by reading a Critical Aviation Events Information sheet. For example, "Carbon monoxide is a colorless, odorless gas that can be produced through the burning of fossil fuels. If sufficient levels of carbon monoxide enter the cockpit, the pilot can be rendered unconscious". Participants were randomly assigned to the indicator type (symbolic or natural) to which they would be exposed and which they would learn first. They were trained on the relation between each of the warnings and the corresponding event, then tested on the warning-event relations, until they had received a total of 16 presentations (two presentations of each of the four signals). Corrective feedback was
given after each response. This design contrasts with previous studies in which participants were trained to a criterion level of performance, for example [3,6,12]. We adopted a human factors approach involving minimal training and a stimulus set that should not exceed the capacity of adult working memory. In the test phase, participants performed a visual addition task in which numbers appeared briefly on the screen and had to be added together as quickly and accurately as possible, while participants still responded to the warnings when they were presented. The visual addition task was presented on an adjacent computer, requiring participants to divide their attention between two computers. A second phase of the experiment (comprising both learning and test phases) involved the same procedure using a new set of signals of the warning type not yet tested. The experiment took 25-30 minutes.
3 Results

Descriptive statistics relating to arithmetic accuracy in the test phase are shown in Table 3. Warning recognition accuracy and reaction times on correct responses in the learning and test phases are displayed in Tables 4 and 5, respectively.

Table 3. Mean accuracy on the arithmetic task in the dual task test phase (max. = 16)

Indicator | Modality | Low Demand M (SD) | High Demand M (SD)
Natural   | Auditory | 13.75 (2.05)      | 5.60 (4.64)
Natural   | Visual   | 14.75 (1.07)      | 7.25 (4.34)
Natural   | Total    | 14.25 (1.69)      | 6.43 (4.51)
Symbolic  | Auditory | 14.30 (1.84)      | 6.00 (4.35)
Symbolic  | Visual   | 14.95 (1.05)      | 7.00 (3.99)
Symbolic  | Total    | 14.63 (1.51)      | 6.50 (4.15)
As a manipulation check, performance on the high and low demand versions of the concurrent arithmetic task was compared. There was a main effect of task demand, F(1,38)=200.58, p<.05, Cohen's d=2.44, with a significant difference between arithmetic scores in the high demand (M=6.46, SD=4.33) and low demand (M=14.44, SD=1.60) conditions. There was no main effect of modality or indicator on arithmetic scores, and no interactions between the modality, indicator, and task demand factors.

Table 4. Mean warning recognition accuracy in the learning phase (max. = 16) and low and high demand test phases (max. = 4)

Indicator | Modality | Learning Phase M (SD) | Low Demand M (SD) | High Demand M (SD)
Natural   | Auditory | 14.65 (1.79)          | 3.45 (0.83)       | 3.65 (0.59)
Natural   | Visual   | 15.65 (0.59)          | 3.45 (0.69)       | 3.30 (0.92)
Natural   | Total    | 15.15 (1.41)          | 3.45 (0.75)       | 3.48 (0.78)
Symbolic  | Auditory | 10.45 (2.84)          | 2.60 (1.23)       | 2.35 (1.04)
Symbolic  | Visual   | 11.65 (4.42)          | 2.40 (0.99)       | 2.75 (0.97)
Symbolic  | Total    | 11.05 (3.71)          | 2.50 (1.11)       | 2.55 (1.01)
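As a check on the reported effect size for the manipulation above, Cohen's d can be recovered from the means and SDs, assuming a pooled-SD definition of d (the paper does not state its formula):

import math

def cohens_d(m1, sd1, m2, sd2):
    # Pooled SD for two equal-sized groups of scores.
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd

# Low- vs. high-demand arithmetic scores reported in the text:
d = cohens_d(14.44, 1.60, 6.46, 4.33)
print(round(d, 2))  # 2.44, matching the reported value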
Table 5. Mean warning recognition reaction times (ms) in the learning phase and low and high demand test phases (correct responses only)

Indicator | Modality | Learning Phase M (SD) | Low Demand M (SD) | High Demand M (SD)
Natural   | Auditory | 4253 (357)            | 2686 (619)        | 2119 (332)
Natural   | Visual   | 4023 (407)            | 2167 (640)        | 2261 (743)
Natural   | Total    | 4138 (396)            | 2427 (675)        | 2190 (572)
Symbolic  | Auditory | 4683 (639)            | 2353 (674)        | 2736 (951)
Symbolic  | Visual   | 4406 (743)            | 2104 (722)        | 2398 (714)
Symbolic  | Total    | 4544 (698)            | 2228 (701)        | 2567 (848)
3.1 Learning Phase

It was hypothesized that natural indicators would elicit greater accuracy than symbolic indicators during the learning phase. Significantly greater accuracy was recorded during the learning phase in response to natural indicators (M=15.15, SD=1.41) than to symbolic indicators (M=11.05, SD=3.71), F(1,38)=47.04, p<.05, Cohen's d=1.46. There was no main effect of modality (auditory M=12.55, SD=2.32; visual M=13.65, SD=2.15) and no modality x indicator interaction.

It was also hypothesized that natural indicators would elicit faster reaction times than symbolic indicators during the learning phase. Reaction times (RTs) for correct responses in the learning phase were significantly faster in response to natural indicators (M=4138.10 ms, SD=395.80) than to symbolic indicators (M=4544.40 ms, SD=698.31), F(1,38)=16.36, p<.05, Cohen's d=0.72. There was no main effect of modality (auditory M=4469.08, SD=498.13; visual M=4214.42, SD=575.31) and no modality x indicator interaction.

3.2 Test Phase

The second hypothesis was that natural indicators would elicit greater accuracy than symbolic indicators during the high and low demand conditions of the dual-task test phase. In the high demand condition, accuracy was significantly greater in response to natural indicators than to symbolic indicators in both the auditory modality, F(1,19)=24.51, p<.05, Cohen's d=1.54, and the visual modality, F(1,19)=5.49, p<.05, Cohen's d=0.58. Similarly, in the low demand condition, accuracy was significantly greater in response to natural indicators than to symbolic indicators in both the auditory, F(1,19)=7.51, p<.05, d=0.81, and visual modalities, F(1,19)=12.72, p<.05, d=1.23. There was a significant interaction in recognition accuracy between modality, indicator, and task demand, F(1,38)=6.36, p<.05.

It was hypothesized that natural indicators would elicit faster reaction times than symbolic indicators during the high and low demand conditions of the dual-task test phase. In the high demand condition, RTs were significantly faster in response to natural indicators than to symbolic indicators in the auditory modality, F(1,19)=8.25, p<.05, Cohen's d=0.87, but there was no significant difference between natural and symbolic indicator RTs in the visual modality. In the low demand condition, and contrary to the directional hypothesis, RTs were significantly faster in response to symbolic indicators than to natural indicators in the auditory modality, F(1,19)=4.73,
p<.05, d=0.51. There was no difference between natural and symbolic indicators with respect to RT in the low demand, visual modality condition. There were three significant interactions involving RT: modality x task demand, F(1,38)=4.38, p<.05; indicator x task demand, F(1,38)=10.08, p<.05; and modality x indicator x task demand, F(1,38)=4.28, p<.05.
4 Discussion

This experiment investigated the effects of indicator type, modality of presentation, and task demand on warning recognition speed and accuracy during training and dual-task test phases. As hypothesized, during the learning phase natural indicators (caricatures of everyday sounds and objects) elicited greater recognition accuracy and faster RTs than symbolic indicators. This pattern was upheld in the test phase with respect to accuracy; for RT it was observed only in the high demand condition with the auditory modality.

Results from the learning phase support the general hypothesis that, in learning signal-event relations, there is an advantage for natural indicators whose associations have been previously learned and are now being exploited for indication [4]. This pattern occurs even with a set of just four warnings, corroborating the findings of others [6,8,9,11,12]. Natural indicators elicit recognition of the source of the sound or the referent of the image and activate associations in long-term memory, whereas symbolic signal-event relations need to be learned within an operational context. While RTs in the present experiment were relatively slow in response to warnings during training, they improved in the dual-task test phase.

The expected pattern of results was obtained in accuracy but not fully in the RT scores of the test phase. Natural indicators elicited significantly faster RTs in the high demand task with warnings in the auditory modality, but not in the visual modality and not in the low demand condition. One explanation derives from scrutiny of the concurrent arithmetic scores and an apparent weighting of tasks by participants. The mean arithmetic score in the low demand condition was approximately 14 out of 16, whereas in the high demand condition the maximum mean was around 7 out of 16. Participants performed well on simple addition tasks to the detriment of speed in recognizing auditory natural indicators. That this occurred only in the auditory modality may be attributable to the greater potential ambiguity of the auditory natural stimuli. For example, the sound of coughing was used as the natural auditory indicator for carbon monoxide, but it is a natural indicator of many other situations, such as a person with a cold, a person choking, a person smoking heavily, or (significantly for the present context) an engine fire. The same can be said for the sound of a car failing to start and the sound of an explosion. Of the four auditory natural indicators used in the experiment, the fire engine siren is perhaps the only one that is as closely connected to its target, and as unconfounded with other possible connections, as its visual counterpart (the image of a fire extinguisher). In contrast, the visual natural indicator of a petrol pump is a typical fuel indicator in motor vehicles, and the skull and crossbones is a familiar indicator of poison. Visual natural indicators may thus already be better learned at the start of the experiment than auditory natural indicators, and may also be more distinctive in
the sense of being less liable to confounding with other previously learned associations. Of course, several other factors may have contributed to the visual advantage (e.g., temporal differences in stimulus registration). However, it is clear from other research [4] that, in general, the cognitive contribution (viz., prior learning) of participants and users must also be considered in warning design.

In the high demand condition, participants performed poorly on the mental arithmetic, possibly guessing, leaving more cognitive resources for relatively fast warning recognition. The symbolic auditory indicators under low demand elicited the best overall performance, with good arithmetic accuracy and, relative to the natural indicators, faster RTs. Under high demand, however, arithmetic was again poor and auditory symbolic indicators were recognized slowly. The use of a demanding concurrent task has brought into relief the potentially complex and operationally important interaction between task load and indicator type. A further advantage of the auditory modality may manifest when coupled with a visually presented arithmetic task. However, the present results do not suggest interference from visually presented indicators when learning phase visual versus auditory accuracy scores are compared with test phase scores; similarly, there was no significant effect of modality on arithmetic scores.

The present experiment was designed specifically to contrast with the design used by Perry et al. [12] and, in the spirit of a human factors approach, to keep training to a minimum. Thus, rather than training all participants to a 100% criterion level of performance as we have done in the past, all participants were exposed to a set number of training trials. The present results suggest that training to a criterion level of performance may be important, especially in the case of symbolic indicators. While there was no main effect of task demand in the test phase, demand did interact with modality in both indicator recognition and RT scores. This demonstrates the need to examine the setting in which warnings will be used, not only from the perspective of ambient noise, potential maskers, and the complexity of existing auditory and visual displays, but also in terms of the nature of the operational task and the load it incurs. Effects of demand and indicator also need to be investigated in settings that approach the operational context, such as an Advanced Aviation Training Device (AATD).

The importance of context in warning design is underscored by the present results. The artificial environment is always embedded, both physically and psychologically, in the natural environment, which includes the learned associations of the user. Not only is it a truism that an already-learned association will be more easily and quickly learned than one which has not, but it is also true that if the referent is just one of a large set of equally salient associations, such that learning the connection requires the learner to "unlearn" or ignore those other meanings, then a natural indicator may actually be less effective than a newly designed symbolic indicator that is free of such "excess baggage". Thus, while exploiting natural indicators is a good idea, there are both advantages and disadvantages. On the other hand, the user is likely to bring some association even to symbolic connections.
In the present experiment, ratings of symbolic associations were typically higher than zero. These observations suggest that the cognitive contribution of the user is not just one among many equally salient and cumulatively contributing factors to be taken
into account in the design of warnings. Instead, it infiltrates other factors that are often treated independently, such as perceived stimulus complexity, meaning, semantic distance, perceived aesthetic appeal, and so on.
References

1. Ballas, J.A.: Common factors in the identification of an assortment of brief everyday sounds. J. Exp. Psychol.: Human Perception and Performance 19, 250–267 (1993)
2. Gaver, W.W.: The Sonic Finder: An interface using auditory icons. Human-Computer Interaction 4, 67–94 (1989)
3. Keller, P., Stevens, C.: Meaning from environmental sounds: Types of signal-referent relations and their effect on recognizing auditory icons. J. Exp. Psychol.: Applied 10, 3–12 (2004)
4. Petocz, A., Keller, P., Stevens, C.: Auditory warnings, signal-referent relations and natural indicators: re-thinking theory and application. J. Exp. Psychol.: Applied 14, 165–178 (2008)
5. Peirce, C.S.: In: Hartshorne, C., Weiss, P. (eds.) Collected Papers of Charles Sanders Peirce, vol. II. Harvard University Press, Cambridge (1932/1960)
6. Stephan, K.L., Smith, S.E., Martin, R.L., Parker, S.P.A., McAnally, K.: Learning and retention of associations between auditory icons and denotative referents: implications for the design of auditory warnings. Human Factors 48, 288–299 (2006)
7. Graham, R.: Use of auditory icons as emergency warnings: Evaluation within a vehicle collision avoidance application. Ergonomics 42, 1233–1248 (1999)
8. Belz, S.M., Robinson, G.S., Casali, J.G.: A new class of auditory warning signals for complex systems: Auditory icons. Human Factors 41, 608–618 (1999)
9. McKeown, D., Isherwood, S.: Mapping candidate within-vehicle auditory displays to their referents. Human Factors 49, 417–428 (2007)
10. Patterson, R.D.: Guidelines for Auditory Warning Systems on Civil Aircraft (Paper No. 82017). Civil Aviation Authority, London (1982)
11. Gaver, W.W., Smith, R.B., O'Shea, T.: Effective sounds in complex systems: The ARKola simulation. In: Conf. on Human Factors in Computing Systems (CHI 1991), pp. 85–90. Association for Computing Machinery, New York (1991)
12. Perry, N., Stevens, C., Wiggins, M., Howell, C.: Cough once for danger: An experimental investigation of auditory icons as informative warning signals in civil aviation. Human Factors 49, 1061–1071 (2007)
The Influence of Gender and Age on the Visual Codes Working Memory and the Display Duration – A Case Study of Fencers

Chih-Lin Chang 1, Kai-Way Li 2, Yung-Tsan Jou 3,*, Hsu-Chang Pan 4, and Tai-Yen Hsu 5

1 General Education Center, Hsiu Ping Institute of Technology, Taiwan; Institute of Technology Management, Chung Hua University
2 Department of Industrial Engineering & System Management, Chung Hua University, Hsin-Chu City, Taiwan
3 Department of Industrial and Systems Engineering, Chung Yuan Christian University, Chung-Li City, Taiwan
4 Hsiu Ping Institute of Technology, Taichung City, Taiwan
5 Physical Education, National Taichung University, Taichung City, Taiwan
[email protected], [email protected], [email protected], [email protected], [email protected]
Abstract. This research discusses the influence of code color and display duration on the correct rate of visual codes working memory and on the Critical Fusion Frequency (CFF) value for both genders and two age groups (high school and university). The results showed that gender has an effect on the correct rate of visual codes working memory and on CFF values: female fencers' CFF values were higher than male fencers', and the high school participants' CFF values were clearly higher than the college participants'. Display color has a significant effect on the correct rate of visual codes working memory, and display duration also affected the correct rate: a display duration of 0.3 second produced the highest correct rate of visual codes working memory.

Keywords: duration of display, visual codes working memory, Critical Fusion Frequency (CFF) value, color of display.
1 Introduction

Performance in most sports is closely related to speed, strength, agility, and hand-eye coordination, and speed, agility, and hand-eye coordination in turn depend heavily on the visual organ, the eyes. Among the important factors affecting athletes' performance, the ability to focus attention is key to victory or defeat [1]. Attention refers to full concentration on a definite target; in particular, sports that require accuracy and rapid response demand a high degree of attention [2]. Excellent attention can rule out unnecessary interference and
stimuli, allowing the athlete to focus quickly on specific objectives and bring speed and technique to bear. The literature shows that athletic performance is related to vision ability ([3] and [4]), since vision capacity affects both the reception of messages from the external environment and reaction time. In fast-moving sports, where it is necessary to track fast-moving objects, players have higher vision capacity, and first-class players are superior to players at other levels [5]. The quality of vision capacity has a large impact on sport performance [6]. Studies have shown that sports vision plays an important role in sport performance. Some studies have also pointed out that the vision skills of athletes are better than those of non-athletes [7], [8], [9], [10], [11]; it is an undisputed fact that good sport vision can help elite athletes reach peak condition, and that enhancing athletes' vision ability can improve sports performance. Sport-related vision capacities include the visual acuity to distinguish separate points and lines and to detect the structure of an object clearly, and contrast sensitivity, the ability to differentiate an object from the brightness of its background. The time from the appearance of an image, through visual reception and conduction, to the brain's formation of a percept is the visual reaction time, referred to as perceptual speed, which for humans averages about 0.2 second [12]. The literature shows that fencers cannot change the direction of a movement within 0.25 second [13], so a good fencer must have good predictability. Predictability is an ability of keen observation, formed by the immediate integration, analysis, and judgment of information; it is a kind of early response capacity based on distance perception, timing, the feel of the sword, and a wealth of game experience. Rapidness, accuracy, adaptability, and predictability are the characteristics of sport fencing [13], and the basic elements of predictability are sensing subtleties and seeing chances in advance, which depend on the most important human sense organ, the eyes. A spontaneous change of fencing action requires more than 0.25 second, while average perception takes at least 0.2 second, leaving only 0.05 second of judging time; how to shorten perception time and enhance the correct rate of prediction is therefore a very important question. In 1989, Japan used the "Acuvision 1000" system to train women's volleyball players in eye-hand coordination, eye movement, and visual concentration [5]. That study found that enhancing sport vision ability helps improve sport performance, and visual code recognition exercises are one component of sport vision training. According to athletic training theory, gender and age make a difference: male and female players differ in speed, muscle strength, cardiopulmonary endurance, and tolerable training load, and they experience different psychological pressure, which affects attention, enthusiasm, self-confidence, and perceptual efficiency; sport performance therefore differs as well.
This study uses self-developed "code working (instant) memory capacity detection software" as its research tool, exploring the correct rate with which fencers of different gender and age identify codes held in visual working memory, and then uses a visual flicker fusion threshold (CFF) detector to detect the change in CFF values after 15 continuous minutes of visual code working memory testing. It is expected that, by inspecting visual code working memory capacity and CFF values, the differences between
gender and age groups can be understood and provided to coaches and fencers as a reference for vision capacity training, helping them to enhance sports performance.
2 Research Methods

2.1 The Subjects

A total of eight subjects took part, with average height 169.5 ± 8.0 cm, weight 61.6 ± 10.8 kg, and age 19 ± 2 years; they had learned and practiced fencing for 3.4 ± 1.5 years. Four were high school students, aged 16.5 to 18.3 years (M = 17.2 years), and four were college players, aged 19.8 to 20.8 years (M = 20.4 years); four were male (M = 19.8 years) and four female (M = 17.8 years). Each player had received fencing training for at least 2 years, all were right-handed, none had eye diseases, and all had corrected visual acuity of more than 0.9.

2.2 Experiment Design

(1) Instrument and inspection software configuration. Visual flicker fusion thresholds were measured with the Model 12021 Flicker Fusion System made by Lafayette Co. Ltd. of the United States, using the descending method. The detection tool for code working memory capacity was software written in Microsoft PowerPoint 7.0 and installed on four host computers of the same brand and model, each with an identical 17-inch screen used as the inspection display. The main functions of the inspection software include control of the screen background color, font color, font size, font selection, number of codes, display duration, the number of test questions, and the total testing time. The parameter settings used in the experiment are given in Table 1. Environmental illuminance was measured with a digital meter at the focal point of the VDT, the position where the codes appear, and was set to 300 lx.

(2) The experimental variables. The experiment adopted a repeated measures design, with gender, age (divided into high school and college groups), code display duration (three levels: 0.1 s, 0.2 s, and 0.3 s), and display color (three colors: blue, green, and yellow) as independent variables. There were two dependent variables: the visual flicker fusion threshold (CFF) and the correct rate. The correct rate is the ratio of the number of correctly answered questions to the total number of questions in the code working memory inspection.

(3) Site design. The arrangement of the four experimental computers is shown in Figure 1. Because only one CFF detector was available, the subjects were randomly assigned to use it one after another, and the experiment was carried out in an orderly fashion. Illumination came from the ceiling fluorescent light source and four halogen lamps with independent dimmers. The four halogen lamps were placed at the corners, 3 meters behind the subjects and one meter above the ground (see Figure 1). The light sources were angled so as not to generate reflections on the VDTs, and the environmental illumination level was controlled at 300 lx.
Table 1. Configuration parameters for working memory inspection

VDT pixels: 1152×864
Background color: White
Code color: Black
Font: Arial
Font size: 24
Code number: 6
Display duration (s): 0.1, 0.2, 0.3
Display color: Blue, Green, Yellow
Inspection duration: 15 minutes
Fig. 1. Subjects’ experimental layout
(4) Experimental procedures. Before the experiment, the subjects were told the purpose of the experiment, the points requiring attention, and the operation method. Each subject completed two 10-question practice runs before inspection. In this experimental design, an average of 10 question items was presented per minute, with a total inspection duration of 15 minutes. The eight subjects were randomly assigned to the four computers; with 3 levels of display duration and 3 display colors, the design comprised 9 experimental combinations. For each combination, 3 CFF measurements were taken before and 3 after the inspection, and the values were averaged. During inspection, the total number of code memory questions and the number of correct answers were recorded simultaneously, and the correct rate was calculated. Each inspection took 15 minutes, followed by 5 minutes of rest with eyes closed; together with the 3 pre- and 3 post-inspection CFF measurements, which take about 10 minutes, each experimental combination took about 30 minutes on average, so a subject needed about 4 hours 30 minutes to complete all 9 combinations. The CFF and correct rate inspections for visual code working memory were as follows:
a. The correct rate inspection for working memory for visual codes. After setting the environmental illumination, the inspection software was entered and the settings for code working memory (screen resolution, background color, font color, font size, display duration, number of codes, font, and inspection duration) were completed in order, as listed in Table 1. The experiment took display duration, display color, gender, and age as independent variables, with age divided into college and high school groups according to enrollment. After the subject was ready in place, the "Enter" key was pressed to begin answering. The working memory task randomly displayed a 6-digit Arabic number at the focal point of the VDT, with a display duration of 0.1 s, 0.2 s, or 0.3 s. After the display ended, the subject entered the codes that had appeared on the VDT; a response was considered correct only when all 6 codes were reported correctly and in the same order. After completing an answer, the subject pressed "Enter" again, as prompted by the computer, to continue to the next item, until the inspection software turned off automatically at 15 minutes. The system recorded the total number of items the subject answered and the correct rate.

b. CFF threshold inspection. The flicker fusion instrument Model 12021 made by Lafayette of the United States was used to inspect the CFF threshold value. CFF was measured 3 times before and 3 times after each 15-minute visual code working memory inspection, and the values were averaged. The CFF threshold measurement used the descending method, with a flicker frequency range of 1-100 Hz and the flash frequency of the light source descending at 0.5 Hz per second. The subject looked straight into the continuous light source until flicker was perceived, then quickly pressed the recording switch button. The critical point from continuous light to blinking light is the flicker fusion threshold.
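Both inspection procedures are algorithmically simple; the Python sketches below are illustrative reconstructions under stated assumptions, not the software actually used. First, the all-or-none scoring of the code working memory task, with a simulated respond function standing in for a subject:

import random

def run_inspection(n_items: int, respond) -> float:
    # Simulate one inspection block and return the correct rate.
    correct = 0
    for _ in range(n_items):
        code = "".join(random.choice("0123456789") for _ in range(6))
        answer = respond(code)  # shown for 0.1/0.2/0.3 s, then recalled
        # All-or-none scoring: every digit correct and in the same order.
        if answer == code:
            correct += 1
    return correct / n_items

# ~10 items per minute for 15 minutes:
rate = run_inspection(150, respond=lambda code: code)  # perfect recall -> 1.0

Second, the descending CFF measurement, with the observer's threshold simulated by a fixed value in place of a real response button:

def descending_cff(felt_flicker_at: float, start_hz: float = 100.0,
                   step_hz_per_s: float = 0.5, tick_s: float = 0.1) -> float:
    # The flicker frequency falls at 0.5 Hz per second until the observer
    # first perceives flicker; with real hardware the loop would poll the
    # response button instead of comparing against a simulated threshold.
    freq = start_hz
    while freq > 1.0:
        if freq <= felt_flicker_at:      # observer presses the switch
            return freq                  # recorded as the CFF threshold
        freq -= step_hz_per_s * tick_s   # frequency keeps descending
    return freq

# Average of three measurements, as in the paper (values illustrative):
readings = [descending_cff(t) for t in (38.2, 37.9, 38.5)]
cff = sum(readings) / len(readings)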
3 Results and Discussion

This study reviewed domestic and international conclusions on VDT code color and background color combinations. The combination of code color and background color affects the performance of the user's computer operation, and an improper color combination prevents users from quickly understanding the information presented on the screen. According to Tinker and Paterson [14], black text on a white background gives the optimal reading speed on a VDT display, and the study by Zhu and Cao [15] of background and code color on VDT displays found worse recognition when the brightness contrast is too small. In this study, therefore, black codes on a white background were used.
(1) Effect of gender, age, display duration, and code color on the correct rate of working memory. The inspection results show that gender had a statistically significant effect on the correct rate of working memory (p < .05); see Table 2. Female students showed a higher correct rate
of working memory performance than male students, which contradicts the conclusion of Long and Johnson [16]. Long and Johnson's study showed that men's dynamic visual acuity is better than women's, but the results of this study show that women have a higher correct rate for recognizing codes than men. Good dynamic visual acuity allows moving codes to be identified more clearly, which should lead to better code working memory capacity and hence a higher identification correct rate. Whether the reason lies in the 15-minute experimental concentration period being too long, thereby affecting the subjects' identification correct rate, or in an optimal work time that differs by gender or individual, remains to be explored. Ishigaki and Miyao [17] reported that dynamic visual acuity for 6-20-year-old men and women showed no statistically significant difference, which is inconsistent with the significant gender difference found in this study. Here, the high school group was 17 to 18 years old (M = 17.2 years) and the college group 20-21 (M = 20.4 years), roughly within that 6-20-year range. The results show that age made no significant difference to the correct rate of identifying codes, in line with the Ishigaki and Miyao [17] finding of no significant age differences.

For code color, the effect reached a significant level (p < .05): with a white background, the working memory accuracy for yellow codes was lower than for blue and green codes. When fencers compete, they stand on average about 3 meters apart, a distance that increases or decreases slightly with the heights, speed, and response capacity of the two fencers. Fencers choose a safe distance at which they consider they can effectively defend against attacks, and an optimal distance from which they can effectively launch their own scoring attacks. In fencing, sudden attack or parry actions are bound to occur, so fencers must at all times maintain a high degree of attention on the opponent's blade tip, in order to foresee possible threats, improve the correct rate of response actions, and score. Fencing competition rules stipulate that, measured from the tip, the weapon must be coated with insulating tape over at least 12 cm (blue, green, or yellow is normally used in competition), and this is precisely the target on which the opponent's eyes focus. Whether this study's finding that yellow has the lowest code recognition accuracy can be applied (choosing yellow insulating tape for the 12 cm at the tip, against the background of the fencers' white clothing, so as to interfere with the opponent's visual recognition accuracy) is left to future experimental observation.

This study also shows that display duration had a significant effect on the correct rate of code working memory (p < .05). However, the interactions among the four independent variables of gender, age, code color, and display duration did not reach a significant level (p > .05); see Table 2. The accuracy for a code display duration of 0.3 second averaged 68.30%, compared with 53.97% and 60.8% for display durations of 0.1 second and 0.2 second, respectively.
These code identification accuracies indicate that 0.3 second, which is only the early part of the average in-game response time, already yields recognition accuracy of nearly 70% (68.30%); using that time effectively, and shortening the required recognition time to 0.2 second or even 0.1 second, would help fencers observe movements and implement techniques correctly and effectively [13].
Table 2. Analytical summary of factor differences for the correct rate of visual code working memory

Source of variation | Type III Sum of Squares | df | MSS     | F     | p    | Post hoc
Gender              | 7343.35                 | 1  | 7343.35 | 51.20 | .000 | female > male
Age                 | 476.15                  | 1  | 476.15  | 3.32  | .075 |
Display Color       | 2007.82                 | 2  | 1003.91 | 7.00  | .002 | blue, green > yellow
Display Duration    | 1742.13                 | 2  | 871.07  | 6.07  | .005 | 0.3 > 0.2, 0.1

*p < .05. Dependent variable: the correct rate (visual code working memory).
On the assumption that visual code working memory accuracy remains unchanged, the perceptual benefit should grow as display duration is reduced. As discussed above, a spontaneous movement change in fencing requires at least 0.25 second, while average perception requires 0.2 second, plus at least 0.05 second of judging time; how to shorten the perception time and enhance the accuracy of anticipation is therefore a very important issue. Strengthening visual code working memory capacity, so as to provide more adequate time for visual observation, will contribute to the success rate of tactical and technical implementation.
(2) Differences in visual code working memory CFF with regard to gender, age, and display duration. The study found that the visual flicker fusion threshold measured during visual code working memory inspection showed significant effects (p < .05). After the experiment, the average CFF values for female students were higher than those for male students. Is this because the 15-minute experimental concentration time is too long, as suggested above for working memory accuracy? Female subjects may be able to focus on an object for a long time, giving them higher recognition accuracy, but visual fatigue may also set in more easily for them, so that their CFF values declined more than the male subjects', reaching a significant difference; the reason remains to be clarified in future work. With respect to age, the average CFF value of the high school group was also higher than that of the college group, a significant difference (p < .05). However, the interaction among the four factors of gender, age, display color, and display duration did not reach a significant level (p > .05); see Table 3. Past studies have found that environmental illumination can affect visual fatigue, and Saito and Hosokawa [18] found that when environmental luminance increases, pupil diameter decreases significantly, easily leading to visual fatigue. ANSI/HFS 100-1998 [19] recommends that the environmental illuminance of a VDT workstation be between 200 lux and 500 lux; the environmental illumination in this experiment was set at 300 lux, in compliance with that recommendation.
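Analyses like those summarized in Tables 2 and 3 can be reproduced with standard statistical software; below is a sketch of the within-subject part (display color and duration) using Python's statsmodels, with synthetic data in place of the real logs. The paper does not state which package was used, and the between-subject factors (gender, age) would additionally require a mixed-design analysis, which AnovaRM does not handle:

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
rows = []
for subj in range(1, 9):                      # eight subjects, as in the paper
    for color in ("blue", "green", "yellow"):
        for dur in (0.1, 0.2, 0.3):
            # Synthetic correct rate; real values would come from the logs.
            rows.append((subj, color, dur, 60 + 30 * dur + rng.normal(0, 5)))
df = pd.DataFrame(rows, columns=["subject", "color", "duration", "correct_rate"])

res = AnovaRM(df, depvar="correct_rate", subject="subject",
              within=["color", "duration"]).fit()
print(res)  # F and p for color, duration, and their interaction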
Table 3. Analytical summary of factor differences for the visual code working memory visual flicker fusion threshold

Source of variation | Type III Sum of Squares | df | MSS     | F     | p    | Post hoc
Gender              | 1006.58                 | 1  | 1006.58 | 16.41 | .000 | female < male
Age                 | 1467.33                 | 1  | 1467.33 | 23.93 | .000 | high school > college
Code Color          | 1.64                    | 2  | .82     | .01   | .987 |
Display Duration    | 7.49                    | 2  | 3.75    | .06   | .941 |

*p < .05.
In this study, the results for both code color and display duration did not reach a significant level. The CFF value in the same experiment by Chang, Li, Jou, and Hsu [20] also did not reach a significant difference, in line with Oohira's [21] finding that CFF values change little during VDT work and rarely reach significance. Although the CFF value declines as operating time increases, that is, visual fatigue increases with operating time [22], the operation error rate in particular increases after 75 minutes, and visual fatigue becomes markedly more pronounced after 90 minutes [23]. Each inspection in this study lasted only 15 minutes and was followed by 5 minutes of rest; the main reason the CFF values for code color and display duration did not reach significance is therefore likely to be the short experiment duration. Iwasaki, Kurimoto and Noro [24] studied how computer operators' CFF values respond to different colors and found that the CFF values for green and yellow decrease significantly after 30 minutes of work, while the CFF value for red decreases significantly after 15 minutes. This study used blue, green, and yellow codes for 15 minutes of work, and none of the CFF values reached a significant level, which again points to the short, 15-minute experiment duration; since red was not among the code colors used here, the red result of Iwasaki et al. [24] unfortunately cannot be compared. The study also found that the age factor produced a significant difference in CFF values (p < .05): the average CFF value of high school students was higher than that of college students, which differs from the conclusion of Ge et al. [25] that age does not have a significant impact on the CFF value. However, the subjects in [25] ranged from 41 to 58 years old, very different from the subjects here (high school group M = 17.2 years, college group M = 20.4 years). Nomiyama et al. [26] found that CFF values did not differ significantly for ages under 40; the age gap between the groups in this study is not great, yet it reached significance. Whether, with only 15 minutes of inspection time, this reflects the finding of Marek and Noworol's [27] study of VDT operators' visual and mental fatigue, namely that the CFF value at the focal point of the retina may be affected by visual
fatigue, thus causing a significant difference, remains to be explored in the future.
4 Conclusion

Many studies have confirmed that dynamic visual acuity is trainable, that is, it can be improved through training. It is recommended that coaches arrange more vision training in the daily routine, especially eyesight training in watching fast-moving objects, which should enhance dynamic visual acuity. Male dynamic visual acuity is better than female, yet this study found that female code recognition accuracy is higher than male; attention is likely a very important factor, and the relationship between attention and vision capacity should be explored further in future work.

Acknowledgments. The authors thank the National Science Council, Taiwan, for sponsoring this research project (NSC 95-2221-E-164-006).
References

1. Zhang, H.L.: Anxiety, attention and exercise score. Chinese Sports Quarterly 11(3), 5–11 (1997) (in Chinese)
2. Qiu, X.X., Liu, C.G.: Exploration on application and importance of attention in sports context. Chinese Sports Quarterly 18(4), 74–80 (2004) (in Chinese)
3. Classe, J.G., Semes, L.P., Daum, K.M., Nowakowski, R.L., Alexander, J., Wisniewski, J., Beisel, J.A., Mann, K., Rutstein, R., Smith, M., Bartolucci, A.: Association between visual reaction time and batting, fielding, and earned run averages among players of the Southern Baseball League. Journal of the American Optometric Association 68(1), 43–49 (1997)
4. Fujishiro, H., Mashimo, I., Ishigaki, H., Edeagawa, H., Endoh, F., Nakazato, K., Nakajima, H.: Visual function of collegiate American football players in Japan. In: 13th Asian Games Scientific Congress (1998)
5. Tsai, T.B.: The faculty train of sport vision for volleyball athletes. Science of Volleyball Coaching, 25–30 (2003)
6. Loran, D.F., MacEwen, C.J.: Sport Vision. Butterworth-Heinemann, Boston (1997)
7. Olsen, E.: Relationships between psychological capacities and success in college athletics. Research Quarterly 27, 79–89 (1956)
8. Stroup, F.: Relationship between measurements of field of motion perception and basketball ability in college men. Research Quarterly 28, 72–75 (1957)
9. Ridini, L.: Relationships between psychological function, test and selected sports skills of boys in junior high school. Research Quarterly 39, 674–683 (1968)
10. Landers, D., Boutcher, C., Wang, Q.: A psychological study of archery performance. Research Quarterly for Exercise and Sport 57, 236–244 (1986)
11. Abernethy, B.: Visual characteristics of clay target shooters. Journal of Science and Medicine in Sport 2(1), 1–19 (1999)
12. Chang, H.L.: Athlete training on attentiveness: sports related methodology. Physical Education at School 31, 160–167 (1997)
13. Yuan, W.M., Chang, C., Hsiao, T.: Fencing (in Chinese). People of Physical Education Publication, Ltd., Beijing (1998)
14. Tinker, M.A., Paterson, D.G.: Studies of typographical factors influencing speed of reading: Variations in color of print and background. Journal of Applied Psychology 15, 471–479 (1931)
15. Zhu, Z.X., Cao, L.R.: The impact of object-background color matching on the color CRT display effectiveness. Psychology Journal (in Chinese) 26(2), 128–134 (1994)
16. Long, G.M., Johnson, D.M.: A comparison between methods for assessing the resolution of moving targets (dynamic visual acuity). Perception 25(12), 1389–1399 (1996)
17. Ishigaki, H., Miyao, M.: Implications for dynamic visual acuity with changes in age and sex. Perceptual and Motor Skills 78, 1049–1050 (1994)
18. Saito, S., Hosokawa, T.: Basic study of the VRT (visual reaction test): the effects of illumination and luminance. International Journal of Human-Computer Interaction 3(3), 311–316 (1991)
19. ANSI/HFS 100-1998: American National Standard for Human Factors Engineering of Visual Display Terminal Workstations. Human Factors Society, Inc., Santa Monica, California (1998)
20. Chang, C.L., Lin, F.T., Li, K.W., Jou, Y.T., Hsu, T.Y.: The study of the impact of environmental illuminance on the visual codes working memory during a fencing game. In: 2009 IEEE International Conference on Networking, Sensing and Control (2009) (accepted)
21. Oohira, A.: Eye strain and foveal CFF in VDT work. Japanese Ophthalmology 57(12), 1318–1319 (1986)
22. Aoki, K., Yamanoi, N., Aoki, M., Horie, Y.: A study on the change of visual function in CRT display task. In: Salvendy, G. (ed.) Human-Computer Interaction, pp. 465–468. Elsevier, Amsterdam (1984)
23. Nishiyama, K.: Ergonomic aspects of the health and safety of VDT work in Japan: a review. Ergonomics 33, 659–685 (1990)
24. Iwasaki, T., Kurimoto, S., Noro, K.: The changes in colour flicker fusion (CFF) values and accommodation times during experimental repetitive tasks with CRT display screens. Ergonomics 32(3), 293–305 (1989)
25. Ge, S.Q., Wu, G.C., Xu, X.H., Yao, Y.X., Du, X.Q., Jin, L.J.: The impact of flight fatigue on visual integration for civil flight personnel with different age groups. China Aviation & Aerospace Medical Journal (in Chinese) 16(3), 180–183 (2005)
26. Nomiyama, K., Okubo, T., Nomiyama, H.: Characteristics of fatigue tests by a long-term observation. Japanese Journal of Hygiene 39, 831–840 (1984)
27. Marek, T., Noworol, C.: Bi-point flicker research and self-ratings of mental and visual fatigue of VDT operators. In: Asfour, S.S. (ed.) Trends in Ergonomics/Human Factors IV, pp. 163–168. Elsevier, North-Holland (1987)
Comparison of Mobile Device Navigation Information Display Alternatives from the Cognitive Load Perspective

Murat Can Cobanoglu, Ahmet Alp Kindiroglu, and Selim Balcisoy
[email protected], [email protected], [email protected]
Abstract. In-vehicle information systems (IVIS) should minimize the cognitive load on drivers to reduce the risk of accidents. To that end we built an experiment in which two alternatives for information display are compared. One alternative is the traditional method of showing a map with the target route highlighted in red. This is compared against a proposed alternative in which, prior to a junction, a ground-level photo is displayed with a large red arrow pointing at the correct route the driver must take. The photo-enhanced display method required 39% more time spent gazing at the screen but provided a 10% reduction in the total number of headturns. Based on participant comments, 80% of which favored the non-photo-enhanced method, we concluded that the cognitive load brought on by the photo enhancement is not worth the return.

Keywords: Cognitive load, information display, navigation, time-based comparison.
1 Introduction

1.1 Foreword

The market for mobile devices that offer navigation help has been growing rapidly. Developments in hardware technology that yield more processing power and higher storage capacity allow experimentation with new interfaces and information display methods. We have identified a potentially helpful information display technique: providing ground-level photographs at complex junctions, with clear directives overlaid on these images to help the driver. However, the cognitive load of such new possibilities should be assessed carefully, above all for safety reasons; the impact of cognitive load on safety is fundamental, as emphasized in the research of Pompei et al. [1]. We therefore assessed the attention time and headturns required when using the photo-enhanced technique and the more traditional way of simply displaying the map of the area immediately around the driver. We found in our experiment that the ground-level display technique requires fewer headturns for the driver to find the correct way, showing that it causes less distraction. Be that as it may, the amount of time the driver spends gazing at the screen per headturn increases in the photo-enhanced method, suggesting that the driver spends
more effort trying to figure out what she is being shown. This might in turn point to increased cognitive load and worse cognitive ergonomics. The questionnaire given to the participants indicates that shorter but more frequent looks are cognitively more ergonomic than the other way around. The ground-level photos would be taken from vehicles travelling through the same junction and supplemented with clear, brightly colored arrows pointing at the turn the driver must take.

1.2 Background Information

The tasks that a driver performs while driving a car are usually categorized as follows: primary tasks, which are directly related to operating the car and travelling safely [4], and secondary tasks, which are performed by the driver while driving but are not directly related to the process of operating the vehicle [5]. Primary tasks are generally considered to be the rotation of the steering wheel, the observation of the surroundings, and the operation of the pedals. Secondary tasks are usually considered to be the operation of a cell phone, radio, or navigation equipment [6].

Cognitive load is measured in a number of different ways in the literature, which can broadly be distinguished into two categories: subjective and objective techniques. An important example of the subjective techniques is the NASA task load index (TLX), which computes an Overall Workload Index using several questions answered on a 1-100 scale [8]. Within the objective tests, two main classes can be defined: the first comprises tests in which the time it takes participants to complete certain tasks is measured, whereas research in the second category tracks biological indicators such as glances, headturns, or eye movements [7], [10].

While building a navigation device for use within this experiment, we surveyed current standards and design guidelines. We aimed to make the appearance of the software compliant with the ISO 15008 standard for determining the contrast, color, and characters on the display screen [11]. In positioning the device, we followed the guidelines of the European Statement of Principles [4] and placed the device so that it would not occlude any of the road or surroundings but would still be easy for the driver to view with a slight headturn. Another design principle that concerns the design of in-vehicle information systems (IVIS) is the 15-second rule, which states that any task, viewed as a continuous sequence of actions, should not exceed 15 seconds [12].

1.3 Motivation

It is stated by Klinker and Tonnis that arrows have reached the end of their informative capacities through overloading in information visualization for road vehicles [9]. Intuitively, the occlusion of different road levels in multi-leveled junctions does not help the informative quality of simple arrows in the navigation context. In this research we have aimed, with this motivation, to assess the potential of a candidate alternative for information display. To assess usability as accurately as possible, we used a combination of the techniques found in the literature. Headturn tracking and glance time-keeping were used as
objective indicators, supplemented by a questionnaire as a subjective indicator; similar combinations of techniques have been utilized in both [8] and [10]. To focus solely on the cognitive load of navigating, we excluded any manual input on the part of the participant driver and instead provided a completely automated system. As the user needs to provide no manual input, the 15-second rule is not applicable in that sense to this research: the only task that needs to be accomplished is identifying whether a particular turn must be taken, and that task took far less than 15 seconds in all the trials. The experimental setup can therefore be considered compliant with the 15-second rule.
2 Methodology

2.1 Participants

There were five participants, selected from a body of available volunteers. Due to limited funding we could not compensate participation and had to rely on volunteers. However, we imposed the restriction that each participant hold a valid driving license at the time of the experiment, as the experiment was conducted on public roads. To further minimize any risk of accidents, every participant had to affirm being comfortable with driving a car and using a GPS navigation device at the same time; otherwise they were not accepted into the experiment. The age range of the participants was 20-35 and two of the five participants were female.

2.2 Experimental Design

Cognitive load brought on by using a mobile navigation device while driving is measured by two metrics in our study: the number of headturns to look at the screen and the time spent looking at the screen (in milliseconds). Measuring the average time of attention is a technique that has been used in previous research on information display on mobile devices [2].

The first of the alternative information display methods is the traditional method of displaying a map of the area immediately around the user, with the target route highlighted in a color different from the others. This method could mislead the user in complex junctions - where we take complex to mean junctions with multiple levels, each at a different height - since the user could potentially fail to understand which level she was on. Such hindrances, coupled with the fact that traffic gets heavier at complex junctions, possibly drawing even more driver attention, could significantly increase cognitive load. To bolster usability, ground-level photos of the junction displayed on the mobile device screen could potentially be helpful. These photos show the driver exactly what she already sees outside the window, making it easier to decide whether to drive straight or to turn. A false turn in a complex junction could create a serious deviation from the intended route, resulting in large losses of time and effort at the least. This problem of the user deviating from the route, and possibly even getting lost, is so crucial that researchers such as Viitala-Kiss et al. [3] have produced sophisticated methods of correcting these deviations as effectively as possible.
Fig. 1. Experimental Setup
Fig. 2. Experiment Route and Positions of the Eight Turns with Photo-Enhancement
Each participant was instructed to complete a route following the directives displayed on the device screen. The participant took the driver seat, with two aides in the car: one recorded the session while the other held the device in a position the driver deemed comfortable. Figure 1 shows the experimental setup. The experiment was designed to compare the usability of two alternative methods of information display on mobile device screens in complex junctions; it took place on a route that included four turns in a complex junction as well as four turns on a normal road without any complex junctions.
Fig. 3. The Screenshot of the Device in Photo-Enhanced Information Display Mode, 50 Meters Before Turn 1
Fig. 4. The Screenshot of the Device in No-Photo-Enhancement Mode, 50 Meters Before Turn 1
In default mode, the screen displayed a map of the local area immediately around the vehicle at a scale of approximately 2.75 cm per 100 meters. In photo-enhanced mode, when the car came closer than 50 meters to a turn, the screen displayed a ground-level photo of the road ahead of the driver, with a bright red arrow pointing in the direction she must go (a screenshot is shown in Figure 3). In traditional mode, the device kept displaying the map, with a small auxiliary arrow at the bottom pointing in the direction of the turn (Figure 4). When calculating the results we performed the calculations twice: once over all measurements, and once excluding the time and glances that occurred while the car was at a halt, i.e., waiting at a red light.
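The display-switching behaviour just described can be summarized in a few lines. The sketch below is ours for illustration; the function and field names are hypothetical and do not come from the authors' implementation.

```python
PHOTO_TRIGGER_DISTANCE_M = 50  # photo-enhanced mode activates within 50 m of a turn

def choose_display(mode, distance_to_turn_m, turn):
    """Select what the device screen shows for the current trial mode."""
    if mode == "photo-enhanced" and distance_to_turn_m < PHOTO_TRIGGER_DISTANCE_M:
        # ground-level photo of the road ahead, with a bright red arrow
        return ("photo", turn["photo_with_arrow"])
    if mode == "traditional":
        # keep the map (about 2.75 cm per 100 m), small auxiliary arrow at the bottom
        return ("map", turn["direction_arrow"])
    return ("map", None)  # photo-enhanced mode, but still far from the turn
```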
Fig. 5. The Average Gaze Duration per Headturn When the Vehicle is Non-Stationary
Fig. 6. Average Number of Headturns to Look at the Device Screen When the Vehicle is Non-Stationary
3 Results

On average, the photo-enhanced method required 3.4 fewer headturns while driving - 10.06% fewer than the map view (Table 2) - but in turn demanded 170 milliseconds more attention per look (Table 3). The average time spent looking at the screen per headturn is 39% higher for the photo-enhanced method. The questionnaire given to the participants after the experiment shows that 80% of them found the classical map view more helpful than the photo-enhanced method.
Table 1. Average Time Spent Gazing at the Screen (in seconds)

                          TOTAL    WHILE DRIVING
  Photo-Enhanced:         21.85        18.65
  No Photo-Enhancement:   19.88        14.87
Table 2. Average Number of Headturns to Look at the Screen

                          TOTAL    WHILE DRIVING
  Photo-Enhanced:          31.2         30.4
  No Photo-Enhancement:    34.4         33.8
Table 3. Average Time Spent Gazing at the Screen per Headturn (in seconds)

                          TOTAL    WHILE DRIVING
  Photo-Enhanced:          0.70         0.61
  No Photo-Enhancement:    0.57         0.44
Table 4. Experimental Results for Photo-Enhanced Method

                  Headturn Count   Total Glance Time (s)   Driving Glance Time (s)   Preferred Method
  Participant 1        29                21.04                    21.04                    -
  Participant 2        44                28.41                    28.41                    -
  Participant 3        21                 8.65                     7.65                    -
  Participant 4        35                40.60                    25.60                    -
  Participant 5        27                10.54                    10.54                   YES
Table 5. Experimental Results for Non-Photo-Enhanced Method

                  Headturn Count   Total Glance Time (s)   Driving Glance Time (s)   Preferred Method
  Participant 1        24                25.92                    20.90                   YES
  Participant 2        52                31.30                    25.30                   YES
  Participant 3        25                 7.30                     6.30                   YES
  Participant 4        33                16.12                    16.12                   YES
  Participant 5        38                18.76                    11.76                    -
Note that we exclude gaze times and headturns recorded while the car was stationary, for two main reasons. First, when the car is stationary (for example, waiting at a red light) there is almost no cognitive load on the driver. Second, the traffic usually kept flowing, so the driver rarely had an opportunity to observe the device screen while not driving the car.
As a result of these factors, restricting the analysis to non-stationary measurements does not change the observations decisively but only adjusts them slightly; it is nevertheless the appropriate choice, since the main focus of this research is cognitive load while driving and navigating.
4 Conclusion

Photos provide more information per look, but at the cost of higher cognitive effort. Despite the reduction in headturn count, the extra attention demanded per headturn seems too expensive in terms of cognitive load. This extra cognitive effort spent parsing the photo - while driving and negotiating lane changes with other drivers - can increase the risk of cognitive capture and lead to accidents or fatalities. Our conclusion is that the photo-enhanced method demands higher cognitive effort from the driver. This conclusion is based on the observation that the photo-enhanced method shows, on average, a 39% (170 millisecond) increase in gaze duration per headturn. Although the amount of information supplied to the user per headturn is somewhat higher in photo-enhanced information display - as shown by the fact that it requires 10.06% fewer headturns overall (Tables 1, 2 and 3) - the subjective tests show that this slight benefit does not justify the 39% increase in attention time required per headturn. The subjective tests, in which participants were asked to name their preferred method, showed that 80% preferred the simpler arrow-based map view to the photo-enhanced method, further supporting this conclusion (Tables 4, 5). This conclusion becomes more pronounced in complex junctions, where simply driving the car requires heavy mental effort. In sum, the photo-enhanced method gives more information per look - as shown by the headturn count metric - but at the cost of longer attention time per headturn. The post-experiment evaluations show that this extra attention is too costly in very high cognitive load environments; to enhance cognitive ergonomics, reducing the attention time needed per look must therefore take priority over reducing the number of headturns.
References

1. Joseph Pompei, F., Sharon, T., Buckley, S.J., Kemp, J.: An Automobile-Integrated System for Assessing and Reacting to Driver Cognitive Load. In: Proceedings of Convergence 2002, pp. 411–416. IEEE SAE, Los Alamitos (2002)
2. Krum, D.M., Omoteso, O., Ribarsky, W., Starner, T., Hodges, L.F.: Evaluation of a Multimodal Interface for 3D Terrain Visualization. In: 13th IEEE Visualization 2002, VIS 2002 (2002)
3. Viitala-Kiss, T., Ikonen, J.: GPS Assisted Alternative Path Modeling and Guidance. In: 2006 International Conference on Software in Telecommunications and Computer Networks, pp. 167–171 (2006)
4. European Statement of Principles on Human Machine Interface for In-Vehicle Information and Communication Systems (1998), ftp://ftp.cordis.europa.eu/pub/telematics/docs/tap_transport/hmi.pdf (retrieved on 24-02-2009)
5. Blanco, M., Biever, W.J., Gallagher, J.P., Dingus, T.A.: The impact of secondary task cognitive processing demand on driving performance. Accident Analysis and Prevention 38, 895–906 (2006)
6. Biever, W.J.: Auditory based supplemental information processing demand effects on driving performance. Unpublished master's thesis, Virginia Polytechnic Institute and State University, Blacksburg, VA (1999)
7. Tönnis, M.: Time-Critical Supportive Augmented Reality - Issues on Cognitive Capture and Perceptional Tunnelling. In: The Fifth IEEE and ACM International Symposium on Mixed and Augmented Reality, Santa Barbara, CA, USA, October 22-25 (2006)
8. Hart, S.G., Staveland, L.E.: Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Human Mental Workload 1, 139–183 (1988)
9. Tönnis, M., Klinker, G.: Augmented 3D Arrows Reach Their Limits in Automotive Environments. In: Wang, X., Schnabel, M.A. (eds.) Mixed Reality in Architecture, Design and Construction, pp. 185–202. Springer, Heidelberg (2009)
10. Nowakowski, C., Green, P.: Map Design: An On-the-Road Evaluation of the Time to Read Electronic Navigation Displays (Technical Report UMTRI-98-4). The University of Michigan Transportation Research Institute, Ann Arbor, MI (1998)
11. ISO 15008:2009 Standard, http://www.iso.org
12. Green, P.: Estimating Compliance with the 15-Second Rule for Driver-Interface Usability and Safety. In: Proceedings of the 43rd Annual Meeting of the Human Factors and Ergonomics Society (1999)
Visual Complexity: Is That All There Is?

Alexandra Forsythe

Liverpool John Moores University, UK
a.m.forsythe@ljmu.ac.uk
Abstract. Visual complexity is conventionally defined as the level of detail or intricacy contained within an image. This paper evaluates different measures of complexity and the extent to which they may be compromised by a familiarity bias. It considers the implications with reference to measures of visual complexity based on users’ subjective judgments and explores other metrics which may provide a better basis for evaluating visual complexity in icons and displays. The interaction between shading and complexity is considered as a future direction for the empirical study of visual complexity. Keywords: Icons, Visual complexity, Familiarity, Metrics.
1 Complexity in Icon and Symbol Research

Visual complexity is a concept introduced by Snodgrass and Vanderwart [1] to refer to the amount of detail or intricacy in a picture. This concept has now been adopted in icon research and is frequently measured with reference to the number of lines within an icon or symbol [2, 3]. The amount of detail or intricacy within an icon influences the rate at which the icon is detected: very simple or very abstract icons are detected faster than those in the mid-range. Some studies reporting a negative effect of complexity on response latency [4] have used abstract and concrete stimuli - a mixture of symbols and icons - but many have based their findings on arbitrary stimuli such as symbols [5, 6] or lattices of random black and white quadrangles [7]. Images that are more concrete or real-world do not produce the same increase in response latency [8]. One explanation is that arbitrary stimuli are probably more semantically impoverished and are also less likely to have any predictable pattern. Only small pieces of information can be processed at any one time, and the visual system is possibly unable to draw semantic inferences to the same extent as it can with pictures [7]. Thus, increases in response latency and response errors are perhaps more likely to occur.

Whilst we know that complexity is possibly related to response efficacy, there is still little consensus as to what complexity is or how it should be defined and measured. For example, Feldman's definition of an 'absence of pattern' reflects an emphasis on randomness, impoverishment and a degree of perceptual difficulty [9]. In contrast, Garcia et al. [10] relate complexity to increasing 'real-worldness'. Whilst the latter is perhaps an oversimplification, it does allude to principles of higher organization and the human propensity to search for pattern - an effort after meaning, not a lack of it [11]. Different approaches to measuring complexity will now be reviewed.
2 Empirical Complexity

2.1 Early Approaches to Visual Complexity

The study of visual complexity emerged from the empiricist tradition. This tradition is based on the premise that people are poor intuitive judges in uncontrolled settings; understanding could only be advanced through quantification in controlled laboratory settings. When unusual, unexplainable results emerged, Gestalt psychology developed to explain them. The Gestaltists set out to understand the processes of perception not through the meticulous analysis of patches of light, shape and colour, but through an analysis of the whole, the configuration or form [12]. Their philosophy was that sensations are not elementary experiences; we "see" shape and form regardless of where the image falls on the retina or which neurons process the various image components. What was important was constancy. One law generated by the Gestalt movement was Prägnanz. The Prägnanz principle contends that the forms that are actually experienced take on the most parsimonious or 'best' arrangement possible in the given circumstances. In other words, of all the possible perceptual experiences to which a particular stimulus could give rise, the one most closely fitting the concept of 'good' will be experienced, where 'good' means symmetrical, simple, organised and regular [13]. This study of psychological organisation explained the tendency to create psychologically simple, ordered patterns from a wide range of perceptual stimuli.

This early study of 'simplicity' evolved into the study of 'complexity', with theorists attempting to re-write the Gestalt law of simplicity within a more formal framework [14, 15, 16]. Both Hochberg and Attneave acknowledged that shape is a multidimensional variable that varies with the complexity of an image. Attneave [14] developed a measure whereby an image could be measured in 'dots': an outline image was presented to observers, who were requested to place dots in important image areas (bends and curves); these dots would then be used to reproduce the image as accurately as possible. Simple images required fewer dots - 36 dots were enough to record the changes in contour for a picture of a sleeping cat - and Attneave was able to produce an abstraction of the image by connecting these points with straight edges (Figure 1).

Hochberg and Brooks [16] developed a semi-automated measure of image complexity. They argued that relying solely on human judgments would leave no way of predicting how complex a novel image might be judged. Hochberg's calculations demonstrated that it was possible to predict how viewers would 'see' an image: the more interior angles, different angles, and continuous lines in an image, the more likely it would be perceived in three dimensions (Figure 2). The number of interior angles, the average number of different angles and the average number of continuous lines can be combined to provide a measure of complexity. However, knowing how many dimensions were needed to explain a shape was not sufficient to judge its complexity, since some dimensions (e.g. reference axes or spaces) were more meaningful than others [15]. In other words, a metric based on increasing tri-dimensionality tells us very little about either the complexity of unfamiliar images or the learning processes that can influence the perception of form.
Fig. 1. Attneave’s cat
Fig. 2. Increasing tri-dimensionality
(A) Using a table of random numbers, place a set of scattered points on a piece of 100x100 graph paper. The number of points corresponds to the number of sides the shape will have.
(B) Connect the peripheral points to form a convex polygon; some concavity will be tolerated.
(C) The smallest subset of points with convex angles are then connected by drawing the line towards a point on the opposite angle.
(D) Lines should not cross one another.
(E) All points must be connected.
(F) Nonsense shape.

Fig. 3. Creating random polygons (Attneave & Arnoult, 1956)
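A loose sketch of this procedure is given below, assuming NumPy and SciPy are available. Step (C)'s 'opposite angle' rule is approximated here by splicing each interior point into the nearest hull edge, which keeps connections from crossing for typical point sets; it is not Attneave & Arnoult's exact construction.

```python
import numpy as np
from scipy.spatial import ConvexHull

def random_polygon(n_points, seed=None):
    # (A) scatter random points on 100x100 graph paper
    rng = np.random.default_rng(seed)
    pts = rng.integers(0, 101, size=(n_points, 2)).astype(float)
    # (B) connect the peripheral points into a convex polygon
    order = list(ConvexHull(pts).vertices)
    # (C)-(E) splice each interior point into the nearest edge, so that
    # lines do not cross and every point is connected (an approximation)
    interior = [i for i in range(n_points) if i not in order]
    for i in interior:
        n = len(order)
        j = min(range(n), key=lambda k: np.linalg.norm(
            (pts[order[k]] + pts[order[(k + 1) % n]]) / 2 - pts[i]))
        order.insert(j + 1, i)
    return pts[order]  # (F) vertices of the nonsense shape, in drawing order

print(random_polygon(8, seed=1))
```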
Attneave & Arnoult wanted to understand the degree to which size, contrast, method and familiarisation influenced the perception of form. They developed a system of calculations that could be used to generate nonsense shapes, the idea being that if a metric worked for images that had no meaningful relationship with 'real world' counterparts, then it could be generalized to other stimuli. This system is outlined in Figure 3.

2.2 Later Approaches to Visual Complexity

Complexity has received less attention in recent years, in part because of the absence of a universally accepted metric [17], and the measures that have been developed are not particularly well supported within a theoretical framework. For example, Geiselman et al. [18] developed an index of discriminability between graphic symbols and identified nine 'primitive' attributes, e.g. the numbers of straight lines, arcs, quasi-angles and blackened-in elements. Symbols selected for high discriminability using this metric were responded to faster than those with lower discriminability. Similarly, Garcia, Badre & Stasko [10] developed a complexity metric based on a calculation of several icon features, including the number of closed and open figures and of horizontal and vertical lines. This metric was developed primarily as a measure of the concreteness of an icon, and Garcia et al. reported that icons that are pictorially similar to their real-world counterparts are more likely to be judged as complex. This has been found not to be the case: icon complexity is more closely related to search efficacy [1, 4]. A more valid and reliable measure of complexity would enable researchers to determine more accurately the effects of extra detail and intricacy on performance.

Forsythe et al. [3] tested several automated measures of icon complexity based on measurements of the changes in image intensity. These measures were informed by arguments that coarse and fine lines are critical in providing information about a stimulus. The brain registers variations in an image as changes in intensity, and it is these coarse and fine changes that provide detail and local information about a stimulus [19, 20, 21, 22]. Coarse scales are thought to be treated by the brain as low-frequency components carrying global information, and fine scales as high-frequency components carrying local detail. This difference in processing speed would seem to be a function of image complexity: when an object is of a detailed nature, its global attributes are processed much faster than its local ones [23, 24]. Forsythe et al. [3] showed that these basic perceptual components (i.e. edges) are important in the measurement of complexity, in so far as the extent to which an image is measured as having edges correlated highly with subjective judgments of image complexity. For example, the 'Perimeter' detection metric correlated (rs = .64, p < .001) with a random set (n = 68) of the McDougall et al. [2] icons and symbols, and also correlated (rs = .66, p < .001) with the measures of Garcia et al. [10]. The Perimeter metric has reasonably good predictive validity when applied to other pictorial images [25]. Perimeter measures are described by Zhang and Lu [26] as contour-based, global measures of shape: they do not divide the shape into parts; rather, the whole shape contour is used to describe the shape. This makes this type of measure very straightforward for users to implement, and as such it tends to be a popular method of image measurement.
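A minimal sketch of an intensity-change ('Perimeter'-style) measure is shown below, assuming NumPy, SciPy and Pillow are available; the threshold value is arbitrary, and this is an approximation in the spirit of [3] rather than the published metric.

```python
import numpy as np
from PIL import Image
from scipy.stats import spearmanr

def perimeter_score(path, threshold=30.0):
    """Count strong intensity changes (edge pixels) as a proxy for contour length."""
    img = np.asarray(Image.open(path).convert("L"), dtype=float)
    gy, gx = np.gradient(img)     # intensity changes down rows / across columns
    magnitude = np.hypot(gx, gy)  # edge strength at each pixel
    return int((magnitude > threshold).sum())

# Hypothetical usage: rank-correlate scores with subjective ratings, as in [3]
# scores = [perimeter_score(p) for p in icon_paths]
# rho, p = spearmanr(scores, subjective_complexity_ratings)
```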
An alternative automated measure of complexity that is also very straightforward to implement is based on the size of the compressed image file [27, 28]. Image compression techniques take advantage of the fact that many images contain a lot of repetition; this information is removed or reduced to enable storage of the image in a compact form (it takes up less disc space).
A more complex picture will have more image elements, and these elements will be less predictable (there will be less repetition); the file string (an ordered sequence of stored variables) will be longer and will contain an increasing number of different sequences. Donderi [28] revisited information theory [29] as a possible framework that could explain the success of image compression techniques (such as Jpeg) as a determinant of complexity. Information theory treats a message as a series of components to be communicated, and the components of a complex image are its primitives. Donderi argues that when a picture is compressed, the string of numbers that represents the organisation of that picture is a measure of its information content. When the image contains few elements or is more homogeneous in design, there are few message alternatives, and the file string consists mostly of repeated numbers. This is consistent with a basic premise of information theory: the information content of a message is inversely proportional to its probability of occurrence.

Forsythe et al. [30] applied this theory in the evaluation of several compression measurement techniques. Compression scores were collected as the compressed file sizes of several published image sets. Gif compression provided a good approximation of human judgments of visual complexity across three image sets. (Jpeg reduces the size of an image file by removing redundant information but generally assumes that some loss of information is acceptable; Gif works on a similar principle except that no image loss occurs when the image is recovered.) The study also demonstrated that many complexity metrics based only on human judgments were biased by a familiarity effect: unfamiliar images were rated as more complex than they actually were according to the automated metrics (see Forsythe et al. for a list of studies). Forsythe et al. [30] demonstrated this effect by training participants on a group of nonsense shapes; when complexity ratings were subsequently collected, this group rated the shapes used in training as simpler than did a group of naïve participants. These results suggest that humans are not best placed to make judgments of the complexity or simplicity of an image. Compression techniques are a fast and user-friendly option for the measurement of visual complexity, and they are not so affected by judgments of familiarity. These metrics have some underlying theoretical basis, such as information theory, and produce good approximations of human judgments.
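A compression-based score of this kind is straightforward to sketch with Pillow; the choice of format and the palette conversion below are our assumptions, not the exact procedure of [30].

```python
import io
from PIL import Image

def compression_score(path, fmt="GIF"):
    """Compressed file size as a complexity proxy: repetitive (simple) images
    compress well, so a smaller file suggests lower visual complexity."""
    img = Image.open(path)
    img = img.convert("P") if fmt == "GIF" else img.convert("RGB")  # GIF is palette-based
    buf = io.BytesIO()
    img.save(buf, format=fmt)
    return buf.tell()  # bytes written to the in-memory file
```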
3 Nativist Complexity

The following section examines how our perception of visual complexity is overlaid by other factors. On this basis, I suggest that visual complexity should be considered in relation to other factors rather than alone.

3.1 Familiarity

Recent work has moved some way closer to developing a theoretically informed measure of visual complexity, but is that all there is?
The finding that familiarity is related to complexity resurrects an old argument that complexity is meaningless: it is the way in which a stimulus is perceived that is important, not the number of elements [31]. If complexity correlates negatively with familiarity, is it intrinsically bad? If familiarity is part of the construct of complexity, then this is what researchers and designers may need to take into consideration, rather than simply arriving at the best context-free measure of visual complexity. So removing familiarity effects is not an advantage in its own right; it depends on what one wants.

3.2 Novelty and Interest

In addition to familiarity, one important concept which appears to be linked to complexity is the degree to which stimuli are able to capture attention and interest. There are many interface environments where capturing an individual's attention, or relieving boredom by introducing interesting stimuli, is more important than simply ensuring fast processing of simple stimuli. Berlyne [32] argued that interest is a monotonic function of collative variables such as novelty, complexity, surprise and ambiguity, suggesting that icon detail and intricacy is likely to be closely related to how interesting an icon is. At present, however, technology cannot measure an 'interest factor'.

A separation of the symbol property 'complexity' from 'detail and intricacy' in icon research is warranted. 'Detail' is perhaps a more neutral description of the structural components within an icon. 'Complexity' implies difficulty; it suggests that an image will be more difficult to understand than a 'simple' image. This perhaps explains why observers rate familiar shapes (even nonsense shapes) as less complex than they actually are [30]. What this means is that when detailed icons are used, the most important property for the observer is that they are meaningful. Interest and meaningfulness help us focus our attention, retrieve salient information about the message, and reduce the interpretational burden.

3.3 Spatial Frequency Information

Forsythe et al. [3] found that observers are unlikely to judge a detailed icon as simpler if it contains a large amount of low spatial frequency information relative to high (i.e., ratings of complexity are reduced when shading is reduced). Queen [33] also found that responses were faster to icons and symbols that were of low spatial frequency, and whose frequency was distinct from the frequencies of the other icons in the set. Little attention, however, has been given to the interplay between visual complexity and shading.

The relationship between icon complexity and shading can be explained as follows. The visual system "knows" that an object will reflect different amounts of light, but this reflection does not depend on the properties of the stimulus [34]. Almost all the variation in light levels is due to the illuminant; the physical properties of the object account for only a small fluctuation in the waveform. Depending on the reflective surface and the time of day - a dark night or a snowy day - the light variation changes considerably. Encoding absolute light levels would therefore be an inefficient strategy for the neural system, and the brain has adapted ways to minimise the importance of absolute light levels. It does this by attenuating zero and near-zero spatial frequencies.
Fig. 4. High and low spatial filtering: (a) original; (b) low spatial filter; (c) high spatial filter
Low spatial frequency information is the most consistent information that the visual system receives; by attuning to it, the visual system is able to maximise the perception of the object and 'know' where edges occur [34]. Figure 4 illustrates the differences between high and low spatial information. This attenuation comes at a cost: small details are overlooked. Ginsburg and Evans [35] reported that pilots with high contrast sensitivity were able to detect a blocked runway at a greater distance than those with poor contrast sensitivity. Likewise, Harvey, Roberts & Gervais [36] reported that letters with common features (e.g. O, Q) were not confused when their spatial frequencies differed, whereas letters with no common features (e.g. A, S) but similar spatial frequencies were confused. These findings contradict feature integration theory [37, 38] and suggest that spatial frequency information is much more important to visual processing than the integration of image parts.
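The kind of decomposition shown in Figure 4 can be approximated with a Gaussian low-pass filter and its residual; the file name and sigma below are illustrative assumptions.

```python
import numpy as np
from PIL import Image
from scipy.ndimage import gaussian_filter

img = np.asarray(Image.open("icon.png").convert("L"), dtype=float)  # hypothetical file
low = gaussian_filter(img, sigma=4)  # low-pass: coarse, global structure (Fig. 4b)
high = img - low                     # high-pass residual: fine detail and edges (Fig. 4c)

# A crude high-to-low spatial frequency ratio of the kind discussed above
ratio = np.abs(high).sum() / np.abs(low).sum()
```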
4 Conclusions

Compression techniques such as Gif and Jpeg offer researchers the most reliable and user-friendly option for the quantification of visual complexity; they are also unbiased, in the sense that they are not affected by familiarity with an image set. These metrics have a strong theoretical basis (information theory) and produce good approximations of human judgments. However, visual complexity is in reality related to familiarity, and researchers should consider what they want from a measure of visual complexity and whether removing familiarity from the equation is warranted. Further work may explore the usefulness of familiarity in our understanding of what is perceived as complex or simple. Finally, spatial frequency offers a new direction for complexity research. It seems likely that understanding the ratio of high to low spatial frequency information in icon design can improve reaction times and performance.
References

1. Snodgrass, J.G., Vanderwart, M.: A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity and visual complexity. Journal of Experimental Psychology: Human Learning & Memory 6, 174–215 (1980)
2. McDougall, S.J.P., de Bruijn, O., Curry, M.B.: Measuring symbol and icon characteristics: Norms for concreteness, complexity, meaningfulness, familiarity and semantic distance for 239 symbols. Behavior Research Methods 31(3), 487–519 (1999)
3. Forsythe, A., Sheehy, N., Sawey, M.: Measuring icon complexity: An automated analysis. Behavior Research Methods, Instruments, and Computers 35, 334–342 (2003a)
4. McDougall, S.J.P., de Bruijn, O., Curry, M.B.: Exploring the effects of picture characteristics on user performance: The role of picture concreteness, complexity and distinctiveness. Journal of Experimental Psychology: Applied 6, 291–306 (2000)
5. Arend, U., Muthig, K.P., Wandmacher, J.: Evidence for global feature superiority in menu selection by pictures. Behavior and Information Technology 6, 411–426 (1987)
6. Byrne, M.D.: Using pictures to find documents: Simplicity is critical. In: Proceedings of the Conference on Human Factors in Computing Systems, INTERCHI 1993. Addison-Wesley, Reading (1993)
7. Brunel, N., Ninio, J.: Time to detect the difference between two images presented side by side. Cognitive Brain Research 5, 273–282 (1997)
8. Rossion, B., Pourtois, G.: Revisiting Snodgrass and Vanderwart's object set: The role of surface detail in basic-level object recognition. Perception 33, 217–236 (2004)
9. Feldman, J.: How surprising is a simple pattern? Quantifying "Eureka!". Cognition 93, 199–224 (2004)
10. Garcia, M., Badre, A.N., Stasko, J.T.: Development and validation of icons varying in their abstractness. Interacting with Computers 6(2), 191–211 (1994)
11. Bruner, J.S.: Beyond the Information Given: Studies in the Psychology of Knowing. Norton, London (1973)
12. Hochberg, J.E.: Perception, 2nd edn. Prentice-Hall, Englewood Cliffs (1986)
13. Koffka, K.: Principles of Gestalt Psychology. Lund Humphries, London (1935)
14. Attneave, F.: Some informational aspects of visual perception. Psychological Review 61, 183–193 (1954)
15. Attneave, F., Arnoult, M.D.: The quantitative study of shape and pattern perception. Psychological Bulletin 53, 452–471 (1956)
16. Hochberg, J.E., Brooks, V.: The psychophysics of form: Reversible perspective drawings of spatial objects. American Journal of Psychology 73, 337–354 (1960)
17. Johnson, C.J., Paivio, A., Clark, J.A.: Cognitive components of picture naming. Psychological Bulletin 120(1), 113–139 (1996)
18. Geiselman, R.E., Landee, B.M., Christen, F.G.: Perceptual discriminability as a basis for selecting graphic symbols. Human Factors 24, 329–337 (1982)
19. Beck, H., Graham, N., Sutter, A.: Lightness differences and the perceived segregation of regions and population. Perception and Psychophysics 49(3), 257–269 (1991)
20. Harwerth, R.S., Levi, D.M.: Reaction time as a measure of suprathreshold grating detection. Vision Research 18, 1579–1586 (1978)
21. Sutter, A., Beck, J., Graham, N.: Contrast and spatial variables in texture segregation: Testing a simple spatial-frequency channels model. Perception and Psychophysics 46(4), 312–332 (1989)
22. Vassilev, A., Mitov, D.: Perceptual time and spatial frequency. Vision Research 16, 89–92 (1976)
23. Hoeger, R.: Speed of processing and stimulus complexity in low-frequency and high-frequency channels. Perception 26, 1039–1045 (1997)
24. Parker, D.M., Lishman, J.R., Hughes, J.: Integration of spatial information in human vision is temporally anisotropic: Evidence from a spatiotemporal discrimination task. Perception 26, 1169–1180 (1997)
25. Forsythe, A., Sheehy, N., Sawey, M.: The automated measurement of pictorial image complexity: A feasibility study. In: Harris, D., Duffy, V., Smith, M., Stephanidis, C. (eds.) Human-Centred Computing: Cognitive, Social and Ergonomic Aspects, vol. 3, pp. 205–209. Lawrence Erlbaum, Hillsdale (2003b)
26. Zhang, D., Lu, G.: Review of shape representation and description techniques. Pattern Recognition 37, 1–19 (2004)
27. Vitevitch, M.S., Armbrüster, J., Chu, S.: Sublexical and lexical representations in speech production: Effects of phonotactic probability and onset density. Journal of Experimental Psychology: Learning, Memory, and Cognition 30(2), 514–529 (2004)
28. Donderi, D.: Visual complexity: A review. Psychological Bulletin 132, 73–97 (2006)
29. Shannon, C.E., Weaver, W.: The Mathematical Theory of Communication. University of Illinois Press, Urbana (1949)
30. Forsythe, A., Mulhern, G., Sawey, M.: Confounds in pictorial sets: The role of complexity and familiarity in basic-level picture processing. Behavior Research Methods 40(1), 116–129 (2008)
31. Rump, E.E.: Is there a general factor of preference for complexity? Perception & Psychophysics 3, 346–348 (1968)
32. Berlyne, D.E.: Novelty, complexity, and interestingness. In: Berlyne, D.E. (ed.) Studies in the New Experimental Aesthetics: Steps Toward an Objective Psychology of Aesthetic Appreciation, pp. 175–180. Hemisphere Publishing Corporation, Washington (1974)
33. Queen, M.: Icon analysis: Evaluating low spatial frequency compositions. Boxes and Arrows (2006), http://www.boxesandarrows.com/view/icon_analysis
34. De Valois, R., De Valois, K.: Spatial Vision. Oxford Series, vol. 14. Oxford University Press, Oxford (1990)
35. Ginsburg, A.P., Evans, D.W.: Contrast sensitivity predicts pilots' performance in aircraft simulators. American Journal of Optometry and Physiological Optics 59, 105–109 (1982)
36. Harvey, L.O., Roberts Jr., J.O., Gervais, M.J.: The spatial frequency basis of internal representations. In: Geissler, H.G., Buffart, H.F.J.M., Leeuwenberg, E.L.J., Sarris, V. (eds.) Modern Issues in Perception (1983)
37. Treisman, A.: Features and objects in visual processing. Scientific American, 106–115 (November 1986)
38. Treisman, A., Gelade, G.: A feature integration theory of attention. Cognitive Psychology 12, 97–136 (1980)
Operational Decision Making in Aluminium Smelters

Yashuang Gao 1,2, Mark P. Taylor 2, John J.J. Chen 1, and Michael J. Hautus 3

1 Chemical & Materials Engineering, 2 Light Metals Research Centre, 3 Department of Psychology, The University of Auckland, Private Bag 92019, Auckland 1142, New Zealand
[email protected]
Abstract. Many computer systems incorporating artificial intelligence have been introduced for use in industry to assist in making decisions and controlling processes. However, decision making in a complex industrial plant, such as an aluminium smelter, involves psychologically related factors such as intuitive reasoning, operator response characteristics, perception of risk, and the implications of rewards. While a significant body of work does exist on decision science, research concerning human interaction with process control systems is still at the development stage. The work reported here aims to meet the needs of the process industry by incorporating human factors and decision making strategies into computer programs such as a supervisory control system for aluminium smelters. A case study on the control of the level of the liquid electrolyte was carried out, firstly to facilitate an understanding of the variables, including human factors, that affect process control. It was found that the availability of crushed solidified electrolyte material had a significant impact on the level of the liquid electrolyte; the implementation of a supervisory control system also had an impact, and management and leadership styles had a significant influence as well. Keywords: Supervisory control system, decision making, human and system interaction, process control.
1 Introduction

1.1 Requirements in the Process Industry

In any organization, business case, research project, or engineering project, the people involved have to deal with large amounts of information. This information is then processed into a format suitable for use in making decisions and solving problems [1]. Many research projects have theoretically or empirically established that human brains have limited capacity for information processing [2, 3]; the approximate maximum number of variables that humans can process at any one time is four [2]. Apart from limited information processing capacity and heavy mental workload, insufficient knowledge is another issue in decision making and problem solving [4].
Therefore, scientists introduced computer systems and artificial intelligence - various types of decision support systems, supervisory control systems, and expert systems - to aid humans in information processing, decision making and problem solving [1]. Through the cooperation of psychologists and computer scientists, principles from human factors and psychological science were introduced into some of these computer systems to improve their functions [1, 5]; air traffic control systems are one example of the success of this approach [5, 6].

Process control is commonly understood as a statistics and engineering discipline that deals with architectures, mechanisms, and algorithms for controlling the output of a specific process [7]. Digital computer control, which is linked to the availability of computers, was first used in the early 1980s. The aims were, and still are, to maintain a process operating steadily under the designed specification conditions. The goal is to achieve the desired quality of the product and to reduce cost through improving the efficiency of the process, while minimizing the human and system errors that may occur in the process. However, the potential to integrate the power of human reasoning and decision making with the almost unlimited computational capacity now available has not been fully explored [8].

The fundamental premise of traditional process control was to remove deviations from the target condition of the process by measuring and acting on single variables - for example, liquid level to control flow, or temperature to control energy input [7]. Over time, many compensatory rather than corrective control strategies were applied to whatever cause arose or became embedded in the process. This tendency was reinforced by the drive to achieve automatic operation with a minimum of human intervention. However, compensatory actions do not by their nature address the root causes, or trigger subsequent human actions to remove them, so similar or even more serious variations will occur in the future. Furthermore, many processes are multivariate, with the variables interacting in a very complex manner. Human intervention, diagnosis and decision making are required to identify the root causes, take corrective actions, and thereby improve the process and product quality over time.

Decision making in a complex industrial plant involves psychologically related factors such as intuitive reasoning, operator response characteristics, perception of risk, and the implications of rewards. While a significant body of work does exist on decision science, research concerning human interaction with process control systems is still at the development stage.

1.2 Requirements in Aluminium Smelters

Aluminium smelting is a multivariate process involving highly complex mechanisms such as mass and energy balances, electrochemical reactions, the supply of reactants and the maintenance of the composition of the reaction mixture [9]. The large amount of information coming in from such a process - some in real time and some intermittently at varying frequency - is challenging for a human brain to process. Controlling this complex process to achieve high productivity and efficiency requires day-to-day (and sometimes minute-to-minute) monitoring of the variables, and a high level of deductive problem solving and decision making. Establishing and implementing a computer system for operational staff in a smelter to manage not only the process and maintenance functions, but also the work of people, has stimulated much interest.
Many technical solutions have been proposed, but usually with only small incremental improvements. A few supervisory systems have been built and tested in a number of smelters [10, 11, 12, 13, 14]. The common aim is to collect all the process information and present it in a visual format, and hence help the operational staff understand the process variations better. Some of the supervisory systems incorporate technologies such as fuzzy logic, neural networks, or even more advanced tools such as 3D control envelopes, to diagnose abnormalities in the process [13, 15, 17, 18, 19]. The common finding from evaluations of the efficacy of supervisory systems is that they help the operational staff achieve better process control [10, 11, 12, 13, 14, 17]. However, these claims are based on observations of the process control output; the real mechanism of the effectiveness of supervisory systems on process control remains unexplored. Furthermore, human operators are considered only as users, while the human reasoning aspect and the interaction with the systems, as well as other factors such as the impact of organization and management, are absent from these explanations.

1.3 Effect of Leadership and Management Style on Decision Making

The quality of the decisions made by managers reflects their effectiveness, and their leadership and management styles are among the factors that influence the decision making process. One decision making and problem solving model that is widely used in many organizations involves five basic components: problem recognition, information search, construction of alternatives, choice, and implementation [1, 20, 21]. Each step in the decision making process can be affected by environmental and organizational factors, as well as personal values, personality, and the perception of risk, amongst others [20, 22].

In any organization or any cultural work environment, motivation plays a significant role in governing behaviour and work performance [20, 22]. Job performance has been defined as a function of the capacity to perform, the opportunity to perform, and the willingness to perform [22]; willingness is effectively associated with motivation. Motivated employees are able to perform better and achieve set goals. The sources of motivation vary with different cultures [20, 23]. Organizations implement a variety of rewards to attract, retain, and motivate people to achieve both personal and organizational goals [22]; many organizations use money, in the form of salary increases or bonuses, as a reward to motivate employees [22, 24, 25]. Maslow's theory of human motivation classified the needs of human beings into five hierarchical levels: from physiological, to safety, to social, to esteem, and then to the highest level, self-actualization [26]. In certain circumstances money can help humans meet the needs of Maslow's first two levels, for example by providing enough to buy food or a house.

1.4 Objectives

The present research aims to study human reasoning, decision making, and behaviour with respect to their influence on process control in a complex industrial environment. This will be accomplished through a case study of the implementation of a new supervisory control system, together with a series of case studies on operational and process control tasks in an aluminium smelter.
Due to commercial and industrial sensitivity, the smelter where this work was conducted will be referred to as Smelter A. This research will also investigate any environmental or organizational factors that had an impact on the quality of decisions made. The ultimate aim of this research, therefore, is to meet the requirements of the process industry by incorporating human factors and decision making strategies into computer programs, such as supervisory control systems.
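Before turning to the case study, the single-variable compensatory control described in Sect. 1.1 can be made concrete with a minimal proportional-control sketch; the setpoint, gain, and names below are illustrative assumptions, not Smelter A's actual control logic.

```python
def proportional_controller(setpoint, gain):
    """Classic single-variable compensatory control: act on the deviation
    from target (e.g. liquid level -> flow). Names and gain are illustrative."""
    def control(measurement):
        error = setpoint - measurement
        return gain * error  # corrective action proportional to the deviation
    return control

adjust = proportional_controller(setpoint=19.0, gain=0.5)  # target bath height, cm
print(adjust(15.0))  # reading 4 cm below target -> action +2.0
```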
2 A Case Study – Bath Height Control

To facilitate the discussion, some common terms used in the aluminium smelting industry are given here.

Glossary

Anode: The carbon mass connected to the positive side of the power supply and used for the electrochemical removal of oxide ions. The anode is gradually consumed in the electrochemical process, and an anode spends about 28 days in the electrolytic cell before it is replaced.
Anode butts: A replaced anode which has been used in an electrolytic cell and has reached the end of its useful life.
Alumina: The reactant in the process, which is reduced to aluminium in the electrolysis.
Bath: The molten electrolyte used in the cell.
Bath height: The bath liquid level measurement.
Bath processing: Crust material on anode butts and tapped (solidified) bath material are crushed into a size suitable for use in covering new anodes in the pots.
Current efficiency: The ratio of the quantity of metal produced in the cell for the number of Coulombs passed, compared to that theoretically expected from Faraday's Law; expressed as a percentage.
Crusher: Machine for crushing solid material.
Cryolite: Electrolyte.
Crushed bath: Solid bath material or solid crust material after passing through the crusher.
Crust: The solidified electrolyte-alumina powder matrix that forms on top of the anodes.
Pot: An expression used for an electrolytic cell.
Potroom: Production hall which contains a number of pots aligned in series.
Potline: A number of pots linked together in series, electrically separate from any other potline.
Pacman material: The material scooped from the surface or bottom of an anode cavity when an anode is removed by a machine commonly referred to as a Pacman.
Sludge: The alumina-electrolyte matrix beneath the metal pad; synonymous with muck.
Tap: A term used to describe the act of removing bath or metal from a pot.
Section: A group of pots.
Stub: The steel rod that is cast into the anode for electrical contact and physical support.
(Note: definitions are adopted from [39].)
2.1 Background

The significance of the bath mass balance is reflected in its impact on the mass balance, energy balance, and current efficiency of a pot [27, 28]. Maintaining a constant liquid bath volume is important for the bath mass balance. Liquid bath volume can be measured by adding trace elements such as Sr to the bath and analysing them [17, 29]; however, this procedure is relatively difficult and tedious in practical operation. Instead of controlling bath volume, bath height is measured and monitored every day.

Bath height has a direct impact on current efficiency and other process variables, such as alumina concentration, voltage, and metal purity. When the bath height is low, the retention time is short, and alumina tends to sink to the bottom of the pot before it can dissolve in the bath. The alumina concentration in the bath is then low, because the alumina has become part of the sludge formed at the bottom of the pot [17]. With low bath height there is also a potential risk of open circuit; to prevent this, anodes have to be lowered, that is, the anode–cathode distance is reduced. This, however, can result in low bath temperature, sludge formation, and increased bubble noise, as well as back reaction, and hence reduced current efficiency [30, 31]. On the other hand, when the bath height exceeds the upper control limit, there is an increased chance of the liquid bath contacting the stubs, which increases the iron content of the aluminium metal [32].

Low bath height and large fluctuations have been of major concern in Smelter A. This case study therefore aims to identify the variables associated with bath height control in Smelter A and to investigate the causes of its bath height control issues. It also attempts to connect human behaviour and reasoning with process control - in this case, bath height control.

2.2 Variables That Affect Bath Height Control

Table 1 lists the factors which might contribute to bath height variation. In this study, these variables are categorized into two groups. One group is referred to as 'common factors', which affect the entire potline; an example is anode quality. If a common factor has an impact on bath height control, every section in the potline is affected in the same way. The other group is referred to as 'individual factors', which have a different impact on different parts of the potline. For example, if the management or leadership style of section leaders has an effect on bath height control, this effect will differ between sections of the potline with different leaders.

2.3 Method

In this study, process data and events were collected from four sections in Smelter A, as shown in Figure 1. Statistical analysis of the process data provides evidence of the effect of the variables on bath height control. Evaluating pot performance and operating conditions in the potrooms, and working with the operators and section leaders, provided an opportunity to observe their modus operandi, their responses to operational problems, and their work practices.
Table 1. Categorization of possible bath height control variables
  Common Factors                                      Individual Factors
  Availability of crushed bath                        Metal height
  Power interruption                                  Voltage control
  Anode quality                                       Making different decisions
  Cast house machine working condition                Section leader's management/leadership style
  Implementation of information technology            Degree of applying information technology
    such as supervisory control systems                 such as supervisory control systems
  Work practice standards                             Degree of implementing reward and punishment system
  Implementation of reward and punishment system      Degree of implementing correct work practices
Fig. 1. A schematic drawing illustrating the layout of the four sections in Smelter A
A survey with 21 questions was designed to obtain information that would help with understanding how bath height control is carried out in Smelter A. The survey also provided information for investigating the dominant factors that resulted in the low bath heights observed in March, June, August, and October 2008. The participants were 6 section leaders, 5 vice section leaders, 10 shift section leaders, 2 potroom managers, and 1 engineer. An interview with one of the section leaders, who was regarded as a role model by the smelter managers, provided detailed insight into the bath height control situation in Smelter A.
3 Results and Discussion

3.1 The Effect of the Availability of Crushed Material

In the bath processing and recycling circuit, tapped solid bath material, crust material from anode butts, and pacman material are crushed into particles with sizes ranging from 100 µm to 10 mm [33]. The crushed material is used mainly to cover newly set anodes or to re-dress exposed anodes; this serves to prevent oxidation of the anodes and to regulate the heat balance of the pot. The material which falls into the pot cavity during anode changing, dressing, or re-dressing becomes one of the sources that replenish the liquid bath [28, 33, 34].

Figure 2 shows the daily average bath height measurements of the four sections in two potrooms from March to October 2008. As indicated in Figure 2, the target bath height (cl) is 19 cm, the upper control limit (ucl) is 23 cm, and the lower control limit (lcl) is 15 cm.
Fig. 2. Bath height measurements of the four sections from March to October 2008 (Y-axis: Bath height in cm). Period A and Period B mark spells of more than 10 days with the crusher out of action; Period C and Period D mark the periods before and after supervisory control system implementation.
Fig. 3. Bath height measurements for each section when crushed bath was not available – 15/05/2008 to 15/06/2008 (Y-axis: Bath height in cm)
The upper warning limit (uwl) and the lower warning limit (lwl) are 21 cm and 17 cm, respectively. According to the operational staff, the crusher broke down about once every two weeks from March to May and from July to August. From the middle of May to the middle of June (Period A in Figure 2) the crusher was completely out of action, and it was out of action again from September to the end of October (Period B in Figure 2). Periods A and B in Figure 2 thus correspond to periods when the crusher was broken for more than 10 days and, consequently, crushed bath was unavailable. Figure 2 also shows that the average bath height of the four sections decreased when crushed bath was unavailable for a long period (in both Period A and Period B). Figure 3 shows the bath height measurements for each section when the crusher was broken in Period A: all the pots were operating in a low bath height condition. Sections 1, 2, and 3 were operating below the lower control limit for most of the time.
Section 4 operated within the control band, but its bath height was below the lower warning limit for the majority of the time. These data show clearly that bath height is strongly related to the availability of crushed bath.

3.2 The Effect of the Supervisory Control System

Smelter A implemented a newly designed supervisory control system in July 2008. The system warns the user when a process parameter is out of control by means of a visual display incorporating elements such as colour and statistical graphs. For example, when the bath height measurement of a pot is below the lower control limit or above the upper control limit, the pot is highlighted in red on the user interface. This is one of the concepts applied in alarm alerting systems, adopted from the human factors engineering literature [35]. The outcome for bath height control of using the colour alarm system in the supervisory control system is presented below.

Figure 4 shows the daily averages of the four sections' bath height measurements in Period C (before SCS implementation) and Period D (after SCS implementation), illustrating the overall trends. The average bath height in Period C showed large variation, whereas in Period D it was more constant; this illustrates the effectiveness of the supervisory control system. For the data in Figure 4, the standard deviations of the bath height measurements are 2.33 before and 1.79 after implementing the supervisory control system. A sign test, which is commonly used to test the assumption that there is 'no difference' between the continuous distributions of two variables, confirmed the statistical significance of the difference between the bath height measurements in Period C and Period D (p = 0.0136). However, to determine whether this difference was caused by the implementation of the colour alarm in the supervisory control system, further explanations were sought through surveys and interviews.

Findings from the survey and interviews with the section leaders in Smelter A revealed that the red colour alarms used to indicate low bath height gave adequate warning of the process state and put pressure on the leaders to correct the problem. If bath height went out of control, the section leaders would feel responsible for poor operation and face punishment from managers. Therefore, when red colour alarms appeared on the user interface, a section leader would first confirm the situation by gathering more information from both the supervisory control system and the operators, and then instruct the operators to take appropriate action to fix the problem.
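The colour-alarm concept can be sketched as a simple threshold rule over the control limits given above. The paper describes only the red (out-of-control) highlight; the intermediate warning level below is our assumed extension.

```python
CL, UCL, LCL, UWL, LWL = 19, 23, 15, 21, 17  # bath height limits (cm), Sect. 3.1

def alarm_colour(bath_height_cm):
    """Map a bath height reading to a display colour for the user interface."""
    if bath_height_cm > UCL or bath_height_cm < LCL:
        return "red"     # outside the control limits: pot highlighted in red
    if bath_height_cm > UWL or bath_height_cm < LWL:
        return "yellow"  # outside the warning band (hypothetical extra level)
    return "green"       # within the warning limits

print(alarm_colour(14.2))  # 'red' -> the pot is highlighted on the interface
```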
Fig. 4. Bath height measurements for the four sections before (Period C in Figure 2) and after (Period D in Figure 2) implementing the supervisory control system (SCS) - (Y-axis: Bath height in cm)
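The sign test used to compare Periods C and D can be sketched as follows, assuming SciPy is available and that paired daily averages are compared (the pairing scheme is our assumption):

```python
from scipy.stats import binomtest

def sign_test(x, y):
    """Two-sided sign test on paired observations, e.g. daily average bath
    heights from Period C paired with those from Period D."""
    greater = sum(a > b for a, b in zip(x, y))
    lesser = sum(a < b for a, b in zip(x, y))  # ties are discarded
    return binomtest(greater, greater + lesser, p=0.5).pvalue
```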
Some section leaders have set an allowable number of red colour alarms in their control plans, and from observation, the 'red colour alarm' has become a frequent term in their everyday communication. This indicates that the colour alarms in the supervisory control system have a positive impact on bath height control. The impact of the other common factors was not investigated in this case study; future investigations are required to confirm their effects on bath height control.

3.3 The Effects of 'Individual Factors'

The data in Figure 3 indicate that the bath height measurements for Section 4 are closer to the target than those of the other three sections when the crusher was broken. To confirm this observation, a sign test was carried out. Very small p-values, much lower than 0.001, were obtained from the comparisons between Sections 4 and 1, Sections 4 and 2, and Sections 4 and 3, indicating that the bath height measurements for Section 4 are highly significantly different from those of any of the other three sections.

Observation of the process data and other events suggested that the difference in bath height measurements between Section 4 and the other three sections is due to individual factors. When the crusher was out of action, most section leaders would add cryolite to the pots with low bath height, or take no action at all. The Section 4 leader took a different approach: he instructed the operators to tap liquid bath from pots with excess bath into trays. The cooled, solidified tapped-out bath was broken into large pieces and stored away. When crushed bath material was not available, this section leader would ask the operators to break the large lumps of stored bath with hammers, and then feed the pieces into pots with low bath height or use them as anode cover material. This approach led to better bath height control, as clearly shown in Figure 3.

3.4 The Effects of Leadership and Management Style

Motivation

In Smelter A, a reward system was used to motivate and manage the operational staff, with staff performance assessed by the immediate supervisor. The smelter manager, potline manager, and potroom managers all agreed that the Section 4 leader is a competent and effective leader: he understands and is able to control the process, and he is constantly motivated to seek improvement. This section leader has been rewarded and held up as a role model for his co-workers, and the recognition and rewards from his superiors have greatly motivated him.

A reward system was implemented in every section to assist the section leader in managing the operational teams. As an effective leader, the Section 4 leader tested and modified the reward system in his section. The managers recognized that the modified system is more effective, and it has been decided that it will replace the original system in the other sections. In the modified system, the Section 4 leader delegated a person to manage the reward system in his section full time. This person is responsible for checking and recording the work quality of the operators. He rewards operators for outstanding work by giving extra points which contribute to a salary increase.
contribute to a salary increase. If any poor-quality work is found, he informs the responsible operator so that the problems can be fixed. If the quality remains poor, he deducts points (i.e., reduces salary) from the operator concerned. This new system increases the degree of supervision by having a full-time person assessing the work performance of the operators. It also gives the operators a second chance to improve their work quality if it was not up to standard at the first assessment. The modified system has been shown to be effective through the operational and process control outcomes.

Money as a reward to motivate the operators in Smelter A has had a positive impact, as demonstrated by Section 4's operational achievements. This can be explained by Maslow's theory: because Smelter A is situated in a developing area where people have varied educational backgrounds and, especially, different financial situations, a large number of operators are working to secure the needs specified by Maslow's first two levels. These are the physiological and safety needs, which money is able to satisfy for most people.

Decision making. The Section 4 leader is a motivated leader who is also able to motivate the operators to achieve positive results. Motivation is often associated with creative thinking and proactivity [36, 37, 38], which can often be seen in the decisions made to solve a problem or in the quality of a task performed [1, 38]. Specific questions were designed to understand how the operational staff in Smelter A made decisions to solve the bath height control problems. Most of the section leaders recognized low bath height situations by reading tabulated data reports and attending to the 'red' warning alarms in the new supervisory control system. Most of them would search for more information, firstly to confirm the situation, and then to understand and investigate the cause of the low bath height. When crusher breakdown was the root cause of low bath height, most of the section leaders considered this to be a cause outside their control, and therefore took no further action. The Section 4 leader's approach of reusing excess bath (by tapping out and stockpiling), however, was a creative solution to fix and prevent low bath height situations.
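To make the comparison in Sect. 3.3 concrete, the sketch below runs a two-sided sign test on fabricated daily bath height deviations from target; the actual per-pot measurements are not reproduced in the paper, so both the numbers and the sample size here are purely illustrative.

```python
# Hedged illustration of the sign test of Sect. 3.3, on made-up data.
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(42)
dev_s4 = np.abs(rng.normal(0.5, 0.4, 60))   # Section 4: close to target (cm)
dev_s1 = np.abs(rng.normal(1.5, 0.8, 60))   # Section 1: larger deviations (cm)

diff = dev_s4 - dev_s1
n_pos = int(np.sum(diff > 0))               # days on which Section 4 was worse
n = int(np.sum(diff != 0))                  # ties are discarded in a sign test
result = binomtest(n_pos, n, p=0.5, alternative="two-sided")
print(f"sign test: {n_pos}/{n} positive differences, p = {result.pvalue:.2e}")
```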
4 Conclusions

This on-site study was accomplished by analysing process data, surveys, interview information, and observations in Smelter A. The main findings from this case study are:

1. Crusher breakdown made crushed bath material unavailable and was the main root cause of the low bath height situation.
2. The 'red' colour alarms in the supervisory control system increased the operational staff's awareness of the process situation, especially regarding low bath height.
3. Money as a reward in Section 4 had positive impacts on bath height control.
4. Leadership and management styles had an impact on employees' motivation, creativity, and decision making.
Acknowledgement. We would like to acknowledge Dr. Arden Miller from the Department of Statistics, The University of Auckland, for his contribution and discussions regarding the statistical analysis.
References

1. Simon, H.A., et al.: Decision Making and Problem Solving (1986) (2008), http://www.dieoff.org/page163.htm
2. Halford, G.S., Wilson, W.H., Phillips, S.: Processing capacity defined by relational complexity: Implications for comparative, developmental, and cognitive psychology. Behavioral and Brain Sciences 21, 803–831 (1998)
3. Halford, G.S., et al.: How many variables can humans process? American Psychological Society 16(1) (2005)
4. Neerincx, M.A.: Cognitive Support: Extending Human Knowledge and Processing Capacities. In: Human-Computer Interaction, vol. 13, pp. 73–106 (1998)
5. Swets, J.A., Dawes, R.M., Monahan, J.: Psychological Science Can Improve Diagnostic Decisions. American Psychological Society 1(1) (May 2000)
6. Bisseret, A.: Application of Signal Detection Theory to Decision Making in Supervisory Control: The Effect of the Operator's Experience. Ergonomics 24(2), 81–94 (1981)
7. Taylor, M.P., Chen, J.J.J.: Advances in Process Control for Aluminium Smelters. Material and Manufacturing Process 22, 947–957 (2007)
8. Taylor, M.P., Chen, J.J.J., Hautus, M.J.: Operational Control Decision Making in Smelters. In: Proceedings of 9th Australasian Aluminium Smelting Technology Conference (2007)
9. Chen, J.J.J., Taylor, M.P.: Control of Temperature and Aluminium Fluoride in Aluminium Reduction. Aluminium: International Journal of Industry, Research and Applications 81, 678–682 (2005)
10. Scherbinin, S., et al.: Computer-aided system for pre-set voltage control. TMS – Light Metals (2002)
11. Zeng, S., Zhang, Q.: A supervision system for aluminium reduction cell. TMS – Light Metals (2003)
12. Yurkov, V., et al.: Development of aluminium reduction process supervisory control system. TMS – Light Metals (2004)
13. Berezin, A.I., et al.: FMFA-based expert system for electrolysis diagnosis. TMS – Light Metals (2005)
14. Abaffy, C., et al.: CVG Venalum potline supervisory system. Light Metals (2006)
15. Lu, S.P.: Control and supervision of the aluminium electrolysis process with expert system. PhD thesis, University of Quebec (2002)
16. Zeng, S., Li, J., Ding, L.: Fault diagnosis system for 350 kA pre-baked aluminium reduction cell based on BP neural network. TMS – Light Metals (2007)
17. Stam, M.: Common Behaviour and Abnormalities in Aluminium Reduction Cells. TMS – Light Metals (2008)
18. Gao, Y.S., Gustafsson, M., Taylor, M.P., Chen, J.J.J.: The control ellipse as a decision making support tool to control temperature and aluminium fluoride in aluminium reduction. In: Proceedings of 9th Australasian Aluminium Smelting Technology Conference (2007)
19. Ruiz, I.Y.: A global approach for supporting operators' decision-making dealing with plant abnormal events. PhD thesis, Universitat Politecnica de Catalunya (2008)
20. Adler, N.J.: International Dimensions of Organizational Behavior, 2nd edn., pp. 152–160, 160–170. Wadsworth Publishing Company, Belmont (1992) 21. Wickens, C.D., Hollands, J.G.: Engineering Psychology and Human Performance, Pearson Education Taiwan, pp. 293–330 (2002) 22. Ivancevich, J.M., Matteson, M.T.: Organizational Behavior and Management, 2nd edn., Homewood, Boston, pp. 174–217 (1990) 23. Hofstede, G.: Motivation, Leadership, and Organization: Do American Theories Apply Abroad? Organizational Dynamics (1980) (summer) 24. Guzzo, R.A.: Types of Rewards, Cognitions, and Work Motivation. Academy of Management Review 4(1), 75–86 (1979) 25. Opsahl, R.L., Dunnette, M.D.: The Role of Financial Compensation in Industrial Motivation. Psychological Bulletin 66(2), 94–118 (1966) 26. Maslow, A.H.: A Theory of Human Motivation. Psychological Review 50, 370–396 (1943) 27. Tarcy, G.P.: Insight into parameters affecting current efficiency. In: Proceedings of 7th Australasian Aluminium Smelting Technology Conference (2001) 28. Taylor, M.P., Welch, B.J.: Improved energy management for smelters. In: Proceedings of 8th Australasian Aluminium Smelting Technology Conference (2004) 29. Iffert, M.: Challenges in Mass Balance Control, TMS – Light Metals (2005) 30. Welch, B.J.: The Impact of Changes in Cell Heat Balance and Operations on the Electrolyte Composition. In: Proceedings of 6th Australasian Aluminium Smelting Technology Conference (1998) 31. Kvande, H.: Bath Properties and Cell Operational Performances. In: Proceedings of 6th Australasian Aluminium Smelting Technology Conference (1998) 32. Lindsay, S.J.: Measures to control Fe contamination in Pre-bake reduction cells. In: Proceedings of 8th Australasian Aluminium Smelting Technology Conference (2004) 33. Taylor, M.P.: Anode Cover Material – Science, Practice and Future Needs. In: Proceedings of 9th Australasian Aluminium Smelting Technology Conference (2007) 34. Richards, N.E.: Anode Covering Practices. In: Proceedings of 6th Australasian Aluminium Smelting Technology Conference (1998) 35. Wickens, C.D., Gordon, S.E., Liu, Y.: An Introduction to Human Factors Engineering, pp. 223–251. Addison-Wesley Educational Publishers Inc., Reading (1998) 36. Wiener, Y., Vardi, Y.: Relationships between job, organization, and Career commitments and work outcomes – An integrative approach, Organizational Behavior and Human Performance, vol. 26, pp. 81–96 (1980) 37. Dale, K.: Leadership Style and Organizational Commitment: Mediating Effect of Role Stress. Journal of Managerial Issues XX (1), 109–130 (2008) 38. Grant, A.M., Ashford, S.J.: The Dynamics of Proactivity at Work. Research in Organizational Behavior 28, 3–34 (2008) 39. Grjotheim, K., Welch, B.J.: Aluminium Smelter Technology, 2nd edn., pp. 296–309 (1988)
Designers of Different Cognitive Styles Editing E-Learning Materials Studied by Monitoring Physiological and Other Data Simultaneously

Károly Hercegfi1, Olga Csillik2, Éva Bodnár2, Judit Sass2, and Lajos Izsó1

1 Budapest University of Technology and Economics, Department of Ergonomics and Psychology, Egry J. u. 1, 1111 Budapest, Hungary
{hercegfi,izsolajos}@erg.bme.hu
2 Corvinus University of Budapest, Institute of Behavioural Sciences and Theory of Communication, Fővám tér 8, 1093 Budapest, Hungary
{olga.csillik,eva.bodnar,judit.sass}@uni-corvinus.hu
Abstract. At the Corvinus University of Budapest, a series of experiments was performed, applying the INTERFACE testing methodology developed by researchers of the Budapest University of Technology and Economics. This methodology is capable of recording data characterizing the user’s current mental effort derived from Heart Period Variability (HPV) and the user’s emotional state indicated by Skin Conductance (SC) parameters simultaneously and synchronized with other characteristics of Human-Computer Interaction (HCI). The current experiments aim to study how the teachers (electronic curriculum designers, developers) themselves use the e-learning development tools to design and edit a new piece of e-learning material. Keywords: Cognitive styles, analytic and holistic types, e-learning, Learning Management System (LMS), Moodle, usability testing and evaluation, empirical methods, Heart Period Variability (HPV), Skin Conductance (SC).
1 Applying the INTERFACE Methodology

Figure 1 shows the conceptual arrangement of the INTERFACE (INTegrated Evaluation and Research Facilities for Assessing Computer-users' Efficiency) workstation. The advantage of the methodology applied in our study lies in its capability of recording continuous on-line data characterizing the user's current mental effort, derived from Heart Period Variability (HPV), and the user's emotional state, indicated by Skin Conductance (SC) parameters, simultaneously and synchronized with other characteristics of Human-Computer Interaction (HCI). This way, a very detailed picture can be obtained which serves as a reliable basis for the deeper understanding and interpretation of the psychological mechanisms underlying HCI. Elementary steps of HCI, like the different mental actions of users followed by a series of keystrokes and mouse-clicks, are the basic and usually critical components of using software. These steps can be modeled and analyzed by experts, but empirical results are usually more informative than expert analyses. One of the key aspects of the empirical methods is measuring
Fig. 1. Conceptual arrangement of the INTERFACE user interface testing workstation
mental effort as laid down, e.g., in the earlier international standard of software product evaluation (ISO/IEC 9126:1991). Hence we need methods capable of monitoring users' current mental effort during these elementary steps. To attain the above, a complex methodology was developed earlier at the Budapest University of Technology and Economics by Prof. Lajos Izsó and his team [3, 4, 5, 6]. This study presents an improved methodology and a new case study. The INTERFACE simultaneously investigates the following:

• Users' observable actions and behavior
− keystroke and mouse events;
− video record of the current screen content;
− video records of users' behavior: (1) mimics, (2) posture and gestures.
• Psycho-physiological parameters
− power spectrum of Heart Period Variability (HPV), regarded as an objective measure of current mental effort – we have applied this signal successfully for more than 15 years [3, 4, 5, 6];
− Skin Conductance (SC) parameters, indicating mainly emotional reactions – recently integrated into our system.

In addition to observable elements of behavior, the applied complex method also includes traditional interviews to assess mental models, subjective feelings, and the users' opinions about their perceived task difficulty and experienced fatigue. Recording these various data simultaneously requires a more sophisticated technical background than other empirical methods based only on personal observation or simple video recording. However, the multiple channels enable researchers to concentrate on the channels that best highlight the various parts of the current event flow.

1.1 Assessing Users' Performance and Behavior

Performance measures are useful in general and in other projects of ours, but in the current study we apply them only for particular aspects of the interaction. Recording users' behavior has outstanding importance. The video recording of the user's face and activity is an extremely rich source of psychological information, as it directly reflects the mental state (e.g. boredom, routine activity in a familiar environment,
attention-demanding task, getting lost, emotions like frustration, anger, joy, etc.). To analyze this channel, we are working on integrating a new, sophisticated method into our INTERFACE methodology.

1.2 Assessing Mental Effort via Analyzing Users' HPV Power Spectrum

A number of studies [3, 4, 6, 11, 12, 13, 15] have shown that an increase in mental load causes a decrease in the so-called mid-frequency (MF) peak of the Heart Period Variability (HPV) power spectrum. To assess the spectral components of HPV power spectra, an integrated system called ISAX (Integrated System for Ambulatory Cardiorespiratory data acquisition and Spectral analysis) was developed and successfully used by Dr. Eszter Láng and her team. This equipment and the related method have been integrated into our INTERFACE system. The main advantage of our method over previously existing HPV-based methods is that the MF component of HPV shows changes in mental effort in the time range of several seconds (as opposed to the earlier methods, with a resolution of tens of seconds at best). This feature was achieved by an appropriate windowing data processing technique and the application of an all-pole auto-regressive model with built-in recursive Akaike Final Prediction Error criteria and a modified Burg's algorithm. In this study, we apply a new, further developed version of ISAX for the first time.

1.3 Assessing Emotional Responses via Analyzing Users' Skin Conductance Parameters

Changes in the electrical activity of the skin (the so-called Electrodermal Activity – EDA) can be produced by various physical and emotional stimuli. We use the parameters derived from Skin Conductance (SC) responses, especially the Alternating Current (AC) component of the SC. In contrast with our earlier experiments applying Heart Period Variability (HPV), measuring Skin Conductance (SC) in our INTERFACE methodology is relatively new to us.1 We are working to complement the INTERFACE system with a component focusing mainly on the emotional aspects of HCI, in addition to our well-tried approach to mental effort.
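As a rough illustration of the two signal-processing steps described in Sects. 1.2 and 1.3, the sketch below estimates MF-band HPV power and extracts the AC component of an SC trace. It is not the ISAX implementation: it substitutes a Welch periodogram for the AR/Burg estimator, and the 0.07–0.14 Hz MF band and the 0.05 Hz high-pass cut-off are assumptions taken from the general psychophysiology literature, not from this paper.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import welch, butter, filtfilt

def mf_band_power(rr_ms, fs=4.0, band=(0.07, 0.14)):
    """MF-band HPV power from RR intervals (ms); Welch PSD, not AR/Burg."""
    t = np.cumsum(rr_ms) / 1000.0                       # beat times (s)
    grid = np.arange(t[0], t[-1], 1.0 / fs)             # even resampling grid
    rr = interp1d(t, rr_ms, kind="cubic")(grid)
    rr -= rr.mean()                                     # remove DC component
    f, psd = welch(rr, fs=fs, nperseg=min(256, len(rr)))
    m = (f >= band[0]) & (f <= band[1])
    return np.trapz(psd[m], f[m])                       # integrate over MF band

def sc_ac_component(sc, fs=10.0, cutoff=0.05):
    """High-pass filter an SC trace to keep its fast (AC/phasic) component."""
    b, a = butter(2, cutoff / (fs / 2), btype="highpass")
    return filtfilt(b, a, sc)

# Synthetic demo: a 0.1 Hz oscillation riding on a 900 ms RR baseline.
rng = np.random.default_rng(0)
beats = np.arange(300)
rr = 900 + 40 * np.sin(2 * np.pi * 0.1 * beats * 0.9) + rng.normal(0, 10, 300)
sc = 2.0 + 0.001 * np.arange(600) + 0.05 * rng.standard_normal(600)
print(f"MF power: {mf_band_power(rr):.1f} ms^2,"
      f" AC std: {sc_ac_component(sc).std():.3f} muS")
```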
2 Applying INTERFACE to Study Interactions of Editing E-Learning Material

We have used INTERFACE in various areas (e.g. mailing systems, CAD, WAP-based software, a flight control system, etc.) [4, 5, 6]. We have also published details from a multimedia development project led by us [3], where we focused on the students' software usage. Now, we focus on the other side of e-learning: we aim to study how the teachers use the development tools.
1 An interesting series of experiments using the new version of ISAX to analyze SC responses has been completed by one of our colleagues [8, 9, 10]. It is a good example of the promising use of data mining techniques in empirical usability studies, as mentioned below. However, in that case the tool was not yet integrated into the complex INTERFACE system.
Besides evaluating the user interface, as in previous applications of the INTERFACE system, a specific aim of this series of experiments was to compare holistic- and analytic-style users' mental activity and emotional responses while designing e-learning material.

The series of experiments started in December 2007 with pilot experiments. The main part of the series started on 14 December and finished in March; 32 participants were involved. During the two-hour sessions, the task was to design and edit e-learning material with the Moodle Learning Management System (LMS). Participants, who had basic knowledge about Moodle, executed this task with the help of prepared materials: 'raw' texts and illustrations (pictures, video files). Analysis of the data, especially the statistical analysis of the hundreds of variables, is in progress. To execute the statistical calculations and explore the deeper mechanisms, we use the SPSS 17.0 for Windows statistical package and the SPSS Clementine 12.0 data mining package.
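The value of the methodology comes from all channels sharing a common timeline. The toy sketch below shows one way such synchronization can be expressed, attaching the nearest-in-time physiological sample to each logged HCI event; the column names, sampling rate, and use of pandas are our assumptions for illustration, not part of the INTERFACE implementation.

```python
import pandas as pd

# Physiological samples (e.g. SC at an assumed 10 Hz) and sparse HCI events.
physio = pd.DataFrame({"t": [i / 10 for i in range(50)],
                       "sc": [2.0 + 0.01 * i for i in range(50)]})
events = pd.DataFrame({"t": [0.73, 2.15, 4.40],
                       "event": ["mouse_click", "keystroke", "mouse_click"]})

# Attach the nearest-in-time SC sample to each keyboard/mouse event.
merged = pd.merge_asof(events, physio, on="t", direction="nearest")
print(merged)
```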
Fig. 2. The INTERFACE Viewer software, showing the video images of the two cameras, the screen just seen by the user, the AC of SC curve, the timeline with signs (gray and black rectangles for keyboard and mouse actions, red triangles for the experimenter's comments), and the RR and MF-of-HPV curves; the crosshair marks the current moment. At the moment of this screenshot, the user is thinking, so the 3rd (green) curve is low.
2.1 Instruments to Determine Cognitive Styles

In our examination, four questionnaires were used to identify holistic and analytic cognitive style users:
• Cognitive Style Index (CSI) [1] measures the whole/part-processing dimension of cognitive style, identifying an individual's cognitive style as either analytic or intuitive.
• Personal Style Inventory (PSI) measures the dimensions of the Myers-Briggs Type Inventory (MBTI), based on Carl Jung's theory of personality types: extroversion-introversion, sensing-intuition, thinking-feeling, and judging-perceiving.
• Kirton's Adaptor–Innovator Inventory (KAI) assesses a person's position on the adaption-innovation continuum. When confronted with a problem, the adaptor turns to conventional procedures in order to find solutions; in contrast, innovators will typically redefine the problem by approaching it from a novel perspective.
• Digital Natives – Digital Immigrants [14]: the term digital native was coined to describe a new breed of student entering educational establishments; today's learners represent the first generations to grow up with the new technology. Digital immigrants, by contrast, prefer printing documents rather than commenting on screen, and print out their emails.

2.2 Hypotheses from the Point of View of Cognitive Styles

I. Analytical type
The analytical type of tutors/e-curriculum designers is active and experience-oriented in e-learning. The analytic type prefers:

• curriculum rich in visual and verbal units
• possibilities and exercises that require analyses
• simulations, models, and videos
• to annotate and rearrange the curriculum according to their own principles.
II. Holistic type
The 'holistic type' of tutors/e-curriculum designers is more active. They build up their curriculums and personal learning environments from these units. The holistic type prefers:

• curriculum linking up memorized units and parts of curriculum
• to connect parts together
• experience, particular cases and exercises (compilation, thesaurus, collection of curiosities, suggested reading)
• to comprehend the curriculum as a whole.

2.3 Procedure

1st phase (10 minutes). In this phase participants were asked to execute three tasks so that their mental activity could be recorded under different circumstances. Detecting the different patterns of HPV and SC in these baseline conditions provides a basis for comparison with the responses during the task period. The three conditions were: a 1-minute alertness period without any activity, 1.5 minutes of counting down, and a 2-minute relaxation period.
2nd phase (45 minutes). The task was to design and edit e-learning material with the help of the Moodle system. Participants, with basic knowledge about the Moodle Learning Management System (LMS), executed this task with the help of prepared materials: two pages of 'raw' text about the learning material, and illustrations (pictures, video files).

3rd phase (10 minutes). An interview session about participants' experiences during task execution. The structured interview had three main topics: the prepared learning material (e.g. satisfaction, difficulties), the circumstances of work, and experiences with Moodle.
Fig. 3. At the moment of this screenshot, the HPV profile curve is low. This is caused by a typical software problem: the currently selected image does not appear, and the user tries to solve the problem with the help of the context menu; but since this is not a desktop application, only simple web-based software, the context menu contains only the standard commands of the web browser. The seriousness of this problem depends on the user's cognitive style: analytical-type users cope with it more easily.
3 Conclusion

Based on the results presented here as well as in related papers, it can be stated that the INTERFACE methodology can help us to understand the mechanisms underlying HCI in depth. Studying individual differences, such as differences in cognitive styles, is an important opportunity for understanding HCI, as opposed to taking the easier route of designing for 'average' users.

Acknowledgements. The authors would like to thank Dr. Eszter Láng for her guidance, and the participants of the series of experiments for their valuable contribution.
References 1. Allinson, C.W., Hayes, J.: The Cognitive Style Index: a measure of intuition-analysis for organisational research. Journal of Management Studies 33, 119–135 (1996) 2. Chen, D., Vertegaal, R.: Using Mental Load for Managing Interruptions in Physiologically Attentive User Interfaces. In: Proc. CHI 2004, pp. 1513–1516. ACM Press, New York (2004) 3. Hercegfi, K., Kiss, O.E., Bali, K., Izsó, L.: INTERFACE: Assessment of HumanComputer Interaction by Monitoring Physiological and Other Data with a Time-Resolution of Only a Few Seconds. In: Proc. ECIS 2006, ECIS Standing Comm., pp. 2288–2299 (2006) 4. Izsó, L.: Developing Evaluation Methodologies for Human-computer Interaction. Delft University Press, Delft (2001) 5. Izsó, L., Hercegfi, K.: HCI Group of the Department of Ergonomics and Psychology at the Budapest University of Technology and Economics. In: Ext. Abstracts CHI 2004, pp. 1077–1078. ACM Press, New York (2004) 6. Izsó, L., Láng, E.: Heart Period Variability as Mental Effort Monitor in Human Computer Interaction. Behaviour & Information Technology 19(4), 297–306 (2000) 7. Kirton, M.J.: Adaptors and innovators: A description and measure. Journal of Applied Psychology 61, 622–629 (1976) 8. Laufer, L.: Jump is on the Skin Deep: Predicting User Behavior from Skin Conductance Level. In: Laufer, L. (ed.) Ext. Abstracts HCII 2007. Springer, Heidelberg (2007) 9. Laufer, L., Németh, B.: Anticipation of Stress as an Indicator of User Interaction. In: Proc. Workshop on Measuring Affect in HCI: Going Beyond the Individual. ACM Press, New York (2008) 10. Laufer, L., Németh, B.: Predicting User Action from Skin Conductance. In: Proc. IUI 2008, pp. 357–360. ACM Press, New York (2008) 11. Lin, T., Imamiya, A.: Evaluating usability based on multimodal information: an empirical study. In: Proc. ICMI 2006, pp. 364–371. ACM Press, New York (2006) 12. Mulder, G., Mulder-Hajonides van der Meulen, W.R.E.H.: Mental load and the measurement of heart rate variability. Ergonomics 16, 69–83 (1973)
13. Orsilia, R., Virtanen, M., Luukkaala, T., Tarvainen, M., Karjalainen, P., Viik, J., Savinainen, M., Nygard, C.-H.: Perceived Mental Stress and Reactions in Heart Rate Variability – A Pilot Study Among Employees of an Electronics Company. International Journal of Occupational Safety and Ergonomics (JOSE) 14(3), 275–283 (2008) 14. Prensky, M.: Digital Natives, Digital Immigrants. On the Horizon 9(5) (2001) 15. Rowe, D.W., Sibert, J., Irwin, D.: Heart Rate Variability: Indicator of User State as an Aid to Human-Computer Interaction. In: Proc. CHI 1998, pp. 18–23. ACM Press, New York (1998)
Analyzing Control-Display Movement Compatibility: A Neuroimaging Study

S.M. Hadi Hosseini1, Maryam Rostami1, Makoto Takahashi1, Naoki Miura2,3, Motoaki Sugiura2, and Ryuta Kawashima2

1 Department of Management Science & Technology, Graduate School of Engineering, Tohoku University, Sendai, Japan
[email protected]
2 Department of Functional Brain Imaging, Institute of Development, Aging and Cancer, Tohoku University, Sendai, Japan
3 Department of Intelligent Mechanical Systems Engineering, Kochi University of Technology, Japan
Abstract. Despite the huge number of studies on control-display compatibility conducted over the past fifty years, there are still debates concerning the efficacy of conventional measures, such as subjective evaluation and performance measures, for discriminating between compatible and incompatible control-display mappings. Since compatibility refers to a control-display relationship that corresponds to the mental model of the users, we applied a functional neuroimaging technique as a direct objective measure for analyzing the cognitive factors involved in human-machine interaction (HMI). Functional Magnetic Resonance Imaging (fMRI) was applied in order to analyze rotary control-linear display movement compatibility for horizontal and vertical linear displays. Although the results of the behavioral measures were not significantly different for incompatible and compatible control-display mappings, the neuroimaging results were quite successful in discriminating between them. Moreover, the fMRI results showed significantly greater brain activity for the incompatible condition than for the compatible one in the left posterior cingulate and the right inferior temporal gyrus, which reveals the involvement of a greater cognitive load in terms of attention and visuomotor transformation in the incompatible condition. The results of this study suggest that the neuroimaging method is a good complement to conventional measures and is quite helpful for acquiring a better understanding of the cognitive processes involved in HMI.
1 Introduction

Human interactions with machines and equipment are basically performed through interfaces that include several displays and controls. Information about the system status is presented through displays, and necessary actions are taken by users using the corresponding controls, which in turn affect the displayed signals. For most people there is usually a preferred mapping between the characteristics of display signals and the elements in the response set of the corresponding control; this is called a compatible mapping or population stereotype [26]. This mapping can be defined in different aspects of
control-display characteristics, such as spatial arrangement, movement relation, conceptual coding, etc. These kinds of tendencies should be considered by Human-Machine Interface (HMI) designers in order to realize easier learning, faster response times, fewer errors, and generally better performance. Pioneering studies on control-display compatibility (CDC) date back to the work of [8], [34], in which it was argued that displays and controls cannot be examined in isolation. Because of the importance of CDC in human-machine interaction in terms of safety, performance, and user satisfaction, ergonomics studies are still ongoing on different aspects of compatibility [5], [31], [35]. The conventional methods for analyzing CDC are based either on subjective evaluations, like the paper-and-pencil test, and behavioral measures, like subjects' reaction time, or on developed heuristics (e.g. Warrick's principle [34] for control-display movement compatibility), each of which has its own benefits and deficiencies. Subjective measures are quite easy to apply at low cost, but are susceptible to subject bias about the configuration being evaluated and are prone to dissociate from other types of measures, like subjects' performance [12], [25], [29], [32], [33], [35]. Behavioral measures are more objective and realistic than subjective evaluations, but they are mainly criticized for their low sensitivity in discriminating between compatible and incompatible control-display mappings [5], [12], [36]. As for the developed heuristic principles, the main critiques associated with them are that they are not relevant to all control-display geometries and that inconsistency sometimes exists among them within one control-display arrangement [12], [35]. Since control-display compatibility refers to the control-display relationship that corresponds to the mental model of the users, we applied a functional neuroimaging method [3] as a direct objective measure, both to complement the conventional measures of CDC analysis and to analyze the cognitive processes involved in subjects' interaction with compatible and incompatible control-display mappings. There are several functional neuroimaging methods, like Electroencephalography (EEG) [20], Positron Emission Tomography (PET) [1], and functional Magnetic Resonance Imaging (fMRI) [14], among which fMRI has grown very fast in the last decade because of its lack of invasiveness (it can be applied to healthy subjects) and its reasonable temporal and spatial resolution. Functional MRI has enabled scientists to look into the human brain in vivo, to literally "watch it while it works". The technique is safe, allowing repeated examinations to probe time-dependent changes in the human brain. Using this technique, one can analyze which brain areas are activated in response to a specific stimulus compared with a control condition. In recent years, the application of functional neuroimaging methods in ergonomics has been growing very fast [7], [15], [24], [27], [28]. In this study, we used fMRI in order to analyze rotary control-linear display movement compatibility for vertical and horizontal displays. We applied the neuroimaging method from an ergonomic point of view, seeking two objectives: First, to find a cognitive measure for differentiating compatible and incompatible control-display movement mappings as a complement to the conventional measures of analyzing CDC.
Second, to analyze which cognitive processes are involved in the interaction with the incompatible control-display movement mappings. The latter will ultimately lead to better understanding of the human-machine interaction as well as improvement of the HMI design by reducing the load of the involved cognitive processes.
2 Method

2.1 Participants

Fourteen right-handed university students (mean age 22 years) participated in the study. All the subjects were healthy (no signs or history of medical or neurological diseases) and were native Japanese speakers. Subjects' handedness was assessed by the Edinburgh Handedness Inventory [23], and the mean (SD) laterality quotient was 97.1 (6.9). Written informed consent was obtained from each subject in accordance with the guidelines approved by the Strategic Research and Education Center for an Integrated Approach to Language, Brain and Cognition, Tohoku University 21st Century Center of Excellence Program in Humanities, and the Helsinki Declaration of Human Rights, 1975.

2.2 Experimental Paradigm

Two types of linear displays were selected for this study: one horizontal and one vertical display. For each configuration, the subjects were asked to select which control-display movement mapping was preferable for them; the selected mapping was considered the compatible one. The mean (SD) percent preferability for the compatible horizontal and vertical configurations was 87% (4.1) and 75% (5.5), respectively. Based on this evaluation, two response mappings were decided for each configuration: a compatible mapping that was in accordance with the subject's prespecified preference, and an incompatible mapping opposite to the compatible one. The task was designed so that the horizontal or vertical control-display configuration was displayed to the subjects. Meanwhile, the display indicator started to fluctuate within the specified limits, shown by red stripes on the presented display. The subjects were asked to respond using a rotary control: whenever the display indicator exceeded the specified lower or upper limit, they were to respond so as to bring the indicator back to the normal position. For each configuration two tasks were designed, a compatible task and an incompatible task, giving four task conditions in total: Horizontal Compatible (HC), Horizontal Incompatible (HIC), Vertical Compatible (VC), and Vertical Incompatible (VIC). In order to exclude unwanted brain activity, a control task was devised, at the beginning of which an arrow was presented on the screen to show the direction (Clockwise (CW) or Counter-Clockwise (CCW)) in which the subject was supposed to respond. Using this control condition, brain activity related to the perceptual input, to the judgment about whether the indicator exceeds the limit or not, and to the motor output can be excluded from the task conditions in the analysis. Therefore, only the activations related to response selection and stimulus-response mapping remain to be compared among the task conditions. A block paradigm was designed for the experiment, including four 15-sec task blocks named HC, HIC, VC, and VIC, and two 15-sec control blocks named CH and CV, as depicted in Fig. 1.a. A cue before each block indicated the coming task: an "O" sign for the compatible task, an "X" sign for the incompatible task, and a clockwise or counter-clockwise arrow sign for the control task. In each task or control block, five deviations were designated, and each block was repeated five times in a session.
Fig. 1. Experimental paradigm
The order of blocks was counterbalanced among the subjects. The task scenarios were designed so that the total number of CW and CCW responses was equal across the different conditions in each session, in order to remove the corresponding motor-output effects on brain activation. A 10-sec rest condition was also placed between the task conditions, in which subjects were instructed to gaze at a fixation cross presented in the middle of the screen. Immediately after the fMRI scanning, a questionnaire was presented to the subjects about the difficulty of the tasks. Subjects were asked to rate the difficulty of each task and to express the way they managed the compatible and incompatible responses.

2.3 fMRI Data Acquisition

For the fMRI experiment, the subjects were asked to lie supine in the MRI scanner. A semi-lucent screen was placed in front of the subject's face, and the visual stimulus was projected onto the screen from outside the MRI room. A rotary knob was set beside the subject so that it could be manipulated comfortably with the right hand. Thirty-three slices (slice thickness = 3 mm, gap = 1 mm) covering the whole brain were acquired by gradient-echo echo-planar (GE-EPI) magnetic resonance imaging (repetition time = 3000 ms, echo time = 50 ms, flip angle = 90°, FOV = 192 × 192 mm2, voxel size = 3 × 3 × 4 mm3, matrix = 64 × 64) on a 1.5 T Siemens Magnetom Symphony scanner (Siemens, Munich, Germany). A T1-weighted structural image was also acquired for each subject. The fMRI scanning started with the rest condition for dummy scans to stabilize the T1 effect, and the total time of fMRI scanning for each subject was 13 min 45 sec.

2.4 Data Analysis

Image processing and the statistical analysis of the fMRI data were carried out using the statistical parametric mapping (SPM2) software [9]. The initial three scans of each subject were dummy scans to equilibrate the state of magnetization and were discarded from the analysis. The differences in acquisition timing across slices in each scan were adjusted, and the effects of head motion across the scans were corrected by realigning all the scans to the first scan. Subjects with excessive head motion (more than 2 mm in any axis) were to be excluded from the analysis; the data of three subjects did not satisfy this criterion and were excluded. The functional scans were then spatially normalized to the Montreal Neurological Institute (MNI) space [6] and spatially smoothed with a 9-mm full-width at half-maximum (FWHM) Gaussian filter to reduce the noise and minimize the effects of normalization errors.
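As a side note on the smoothing step above, the FWHM of a Gaussian kernel relates to the sigma actually used by the filter via sigma = FWHM / (2 * sqrt(2 ln 2)). The sketch below applies this to the 9-mm kernel and the 3 × 3 × 4 mm voxels reported here; it is a generic scipy illustration on a synthetic volume, not the SPM2 routine.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

fwhm_mm = 9.0
voxel_mm = np.array([3.0, 3.0, 4.0])                 # voxel size from Sect. 2.3
sigma_vox = (fwhm_mm / (2 * np.sqrt(2 * np.log(2)))) / voxel_mm

# 64 x 64 in-plane matrix, 33 slices, as reported in Sect. 2.3.
volume = np.random.default_rng(3).normal(size=(64, 64, 33))
smoothed = gaussian_filter(volume, sigma=sigma_vox)  # anisotropic smoothing
print(f"sigma (voxels): {sigma_vox.round(2)}")
```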
For each subject, voxelwise statistical parametric maps (SPM) were calculated for linear contrasts between the regressors of interest. The SPMs from each subject for a given contrast were then entered into the group analyses, where participants were treated as random effects. Subtraction images were created for task versus control conditions (HC > CH; HIC > CH; VC > CV; VIC > CV), incompatible task versus compatible task for each display type (HC>HIC masked by HC>CH; HIC>HC masked by HIC>CH; VC>VIC masked by VC>CV; VIC>VC masked by VIC>CV), and incompatible condition versus compatible condition (HIC+VIC>HC+VC masked by HIC>HC and VIC>VC; HC+VC>HIC+VIC masked by HC>HIC and VC>VIC). The intersubject maps were created by performing a one-sample t-test on each voxel of each subtraction image on an intersubject basis in order to identify the voxels that had significantly large partial correlation. A statistical threshold was set at P < 0.05 (corrected for Family Wise Error (FWE)). Finally, the resulting activation maps were constructed and superimposed onto the stereotactically standardized T1-weighted MRI images. In addition, the region of interest (ROI) analysis [22] was performed to evaluate the local blood oxygen level dependent (BOLD) signal changes in the activated regions. For each participant and ROI, the mean beta values were extracted separately for each experimental condition relative to the control condition.
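To make the GLM terminology above concrete (regressors of interest, beta values, contrasts), here is a self-contained toy version of a block-design fit on one synthetic voxel; the block timings, HRF shape, and noise level are our own assumptions for illustration, not the study's actual design.

```python
import numpy as np
from scipy.stats import gamma

TR, n_scans = 3.0, 120                        # repetition time (s), scan count
t = np.arange(n_scans) * TR

def hrf(times):
    """Simple gamma-based haemodynamic response: peak minus undershoot."""
    return gamma.pdf(times, 6) - 0.35 * gamma.pdf(times, 12)

def boxcar(onsets, duration):
    """1 during task blocks, 0 elsewhere, convolved with the HRF."""
    box = np.zeros(n_scans)
    for onset in onsets:
        box[(t >= onset) & (t < onset + duration)] = 1.0
    return np.convolve(box, hrf(np.arange(0, 30, TR)))[:n_scans]

# Design matrix: e.g. an incompatible-task regressor, a control regressor,
# and a constant term (hypothetical onsets, 15-s blocks as in Sect. 2.2).
X = np.column_stack([boxcar([30, 150, 270], 15),
                     boxcar([90, 210, 330], 15),
                     np.ones(n_scans)])

# Synthetic voxel that responds more strongly to the task regressor.
rng = np.random.default_rng(1)
y = X @ np.array([1.2, 0.4, 100.0]) + rng.normal(0, 0.5, n_scans)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # GLM fit (beta estimates)
contrast = np.array([1, -1, 0])               # "task > control" contrast
print(f"betas: {beta[:2].round(2)}, contrast effect: {contrast @ beta:.2f}")
```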
3 Results

3.1 Behavioral Results

Fig. 2.a shows the results of the subjective ratings for the HC, HIC, VC, and VIC conditions after the fMRI scanning. Mean subjective ratings for the compatible and incompatible conditions were also included, computed by averaging the ratings of the HC & VC conditions and of the HIC & VIC conditions, respectively. Although these ratings showed that the incompatible task was more difficult than the compatible one for both the horizontal and vertical configurations, the difference between the compatible and incompatible tasks was not statistically significant (t-test, p<0.05). Fig. 2.b shows the mean reaction times for the HC, HIC, VC, and VIC tasks. Mean reaction times were also included for the compatible condition, computed by averaging the reaction times of the HC and VC conditions, and for the incompatible condition, computed by averaging the reaction times of the HIC and VIC conditions. The t-test statistics (p<0.05) did not reveal any significant difference between the compatible and incompatible conditions.
Fig. 2. Results of performance and subjective evaluations: (a) rated task difficulty (%); (b) mean reaction time (sec)
3.2 Neuroimaging Results

The results of the inter-subject analysis revealed significant activation (FWE correction: p<0.05) for the 'incompatible vs. compatible' contrast in the left posterior cingulate gyrus and the right inferior temporal gyrus (ITG). No activation was found for the other contrasts using the same statistical threshold. Fig. 3 shows the sites of peak activation for the 'incompatible vs. compatible' contrast in MNI coordinates, along with the corresponding activation maps. The estimated beta values relative to the control condition are also depicted in Fig. 3. The results showed that activation in the left posterior cingulate was significantly greater in the incompatible condition than in the compatible one (t(10)=7.12, p<0.001, Fig. 3.a (right panel)). Activation in the posterior cingulate was also significantly greater in the HIC condition compared to the HC (t(10)=2.63, p<0.05, Fig. 3.a (middle panel)), as well as in the VIC condition compared to the VC (t(10)=2.40, p<0.05, Fig. 3.a (middle panel)). The ROI analysis for the right ITG also revealed significantly greater activation in the incompatible condition than in the compatible one (t(10)=5.34, p<0.001, Fig. 3.b (right panel)). Activation in the right ITG was also significantly greater in the HIC condition compared to the HC (t(10)=2.47, p<0.05, Fig. 3.b (middle panel)), as well as in the VIC condition compared to the VC (t(10)=4.08, p<0.05, Fig. 3.b (middle panel)). Correlation analysis revealed a strong positive correlation between the difference of activation in the right ITG and the difference in subjects' reaction time for the HIC>HC contrast (r(10)=0.62, p<0.05). A strong positive correlation was also found between the difference of activation in the left posterior cingulate and that in the right ITG for the VIC>VC contrast (r(10)=0.72, p<0.05).
(a) Left posterior cingulate, MNI (-6, -40, 22), t-value = 7.13
(b) Right ITG, MNI (64, -24, -20), t-value = 5.34
* statistically significant difference at p<0.05; ** statistically significant difference at p<0.001
Fig. 3. Activation maps and beta estimates of activations
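A toy version of the ROI-level statistics reported in Sect. 3.2: paired t-tests on per-subject beta estimates (n = 11 after the three exclusions, hence df = 10) and a Pearson correlation against reaction-time differences. The beta and RT values below are fabricated for illustration; the study's per-subject data are not published.

```python
import numpy as np
from scipy.stats import ttest_rel, pearsonr

rng = np.random.default_rng(7)
n = 11                                            # 14 subjects minus 3 excluded
beta_c = rng.normal(0.05, 0.15, n)                # ROI betas, compatible
beta_ic = beta_c + rng.normal(0.35, 0.15, n)      # larger when incompatible

t, p = ttest_rel(beta_ic, beta_c)                 # paired t-test, df = n - 1
print(f"incompatible vs. compatible: t({n - 1}) = {t:.2f}, p = {p:.4f}")

# Does the activation difference track the reaction-time difference?
rt_diff = 0.2 * (beta_ic - beta_c) + rng.normal(0, 0.02, n)
r, p_r = pearsonr(beta_ic - beta_c, rt_diff)
print(f"activation-RT correlation: r = {r:.2f}, p = {p_r:.4f}")
```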
4 Discussion

In this study, functional magnetic resonance imaging was used as a method for complementing the conventional measures for differentiating compatible and incompatible control-display movement mappings, as well as for analyzing the cognitive processes involved in the corresponding interactions. In this experiment, the results of the subjective evaluation of task difficulty (Fig. 2.a) and the subjects' reaction times (Fig. 2.b) showed greater values on average for the incompatible tasks than for the compatible ones. However, the differences were not statistically significant, and thus could not differentiate the compatible and incompatible control-display configurations. Although the subjective evaluation of task difficulty and the performance results were not successful in differentiating the compatible and incompatible control-display movement mappings, the results of the fMRI experiment showed significant brain activation differences when comparing the incompatible with the compatible condition. As shown in Fig. 3, activations in the left posterior cingulate and the right ITG were significantly greater in the incompatible condition than in the compatible one. Moreover, the activations of these areas were also significantly greater in the incompatible conditions for both the horizontal and vertical configurations. These results suggest that activation of the left posterior cingulate and the right ITG is a good predictor for differentiating compatible and incompatible control-display configurations.

Besides differentiating between the compatible and incompatible control-display configurations, the obtained neuroimaging results can also help to reveal distinctions between the cognitive processes involved in compatible and incompatible interactions. In this study, since no significant activation was found when comparing the compatible with the incompatible condition (FWE correction: p<0.05), the obtained activations in the left posterior cingulate and the right ITG represent the salient cognitive mechanisms which are exploited more in subjects' interaction with the incompatible control-display configurations than with the compatible one. The posterior cingulate cortex receives projections directly from the visual cortex and projects strongly to the mediodorsal caudate nucleus, placing this area in an ideal position to integrate visual information and response output [2]. Several neuroimaging studies have reported the involvement of the posterior cingulate in visuomotor transformation, navigation, and spatial orientation [13], [17], [19], [21], supporting the activation of this area in the context of interaction between visuospatial information and motor output. The higher activation of this area in the incompatible task reveals that the incompatible control-display movement mapping increases task difficulty in terms of visuomotor transformation. Since activation of this area was significantly greater in the incompatible conditions than in the compatible ones, activation of this area may be a good measure for differentiating the difficulty of visuomotor transformations in control-display movement mappings.

The other activated area was found in the right ITG when comparing the incompatible with the compatible condition. The ITG belongs to the higher visual cortex and is a part of the ventral visual pathway [10], which is basically involved in object recognition.
However, recent neuroimaging studies have revealed involvement of this brain region in tasks which require sustained attention and attentional control for conflict
resolution [16]. Moreover, neuroimaging studies of attentional load [4], [18] have reported increasing activation in the right ITG with increasing attentional load in a goal-directed visual task, the Stroop task, and the Simon task. The strong positive correlation between activation of this area and reaction time found in the present study supports the idea that the increase in activation of this brain region is attributable to an increase in attentional load, in line with previous neuroimaging studies [18]. Altogether, the greater activation in the right ITG, as a part of the higher visual cortex, can reasonably be attributed to the greater attentional load involved in interacting with the incompatible control-display movement mapping. Consequently, the significantly greater activation of this area in the incompatible condition than in the compatible one, for both the horizontal and vertical configurations, indicates that activation of this area can be a good measure for differentiating the attentional load exploited by different kinds of control-display movement mappings. A significant correlation was also found between activation in the posterior cingulate and the ITG. Since the posterior cingulate receives a massive input from the ventral visual pathway [30], it can be speculated that the increase in task difficulty in terms of visuomotor mapping in the incompatible condition calls for more attention and greater involvement of the ITG.

The results of the present study show that the neuroimaging method can be a good complement to conventional measures in analyzing human-machine interaction. The neuroimaging method is not only more sensitive than conventional measures in revealing differences in human-machine interaction, but is also quite helpful for understanding the cognitive processes involved. Since behavioral measures reflect more than one cognitive process, they are not unique measures of mental workload [11]. Using a neuroimaging technique, one can analyze the different cognitive processes involved in HMI by analyzing the activated brain areas. However, apart from the high experimental cost of fMRI, confounding factors such as large head movements, which prevent the experimenter from using highly interactive tasks that require subjects' locomotion, should also be considered for real-world applications. Nevertheless, these kinds of confounding factors may be relaxed by using other kinds of neuroimaging techniques.
5 Conclusions

fMRI was exploited in this study to analyze compatible and incompatible control-display movement mappings. The application of this neuroimaging technique to differentiating two types of compatible and incompatible control-display configurations revealed its capability to complement the conventional measures of control-display compatibility. The fMRI results showed an increase of neural activity in the higher visual cortex (ITG) and the posterior cingulate, which is attributed to greater cognitive load in terms of attention and visuomotor transformation in the incompatible condition. Altogether, the present study showed that the neuroimaging method can be quite helpful in ergonomics studies for better understanding the cognitive mechanisms involved in human-machine interaction.
Acknowledgment

This research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research (B), 18300039, 2007. We would like to thank Hiroki Miyata and Yuki Tobita for their assistance in the fMRI experiment.
References 1. Bailey, D.L., Townsend, D.W., Valk, P.E., Maisey, M.N.: Positron Emission Tomography: Basic Sciences. Springer, New Jersey (2005) 2. Bussey, T.J., Muir, J.L., Everitt, B.J., Robbins, T.W.: Triple Dissociation of Anterior Cingulate, Posterior Cingulate, and Medial Frontal Cortices on Visual Discrimination Tasks using a Touchscreen Testing Procedure for the Rat. Behavioral Neuroscience 111(5), 920– 936 (1997) 3. Cabeza, R., Kingstone, A.: Handbook of Functional Neuroimaging of Cognition. The MIT Press, Massachusetts (2001) 4. Caleb, M., Adler, C.M., Sax, K.W., Holland, S.K., Schmithorst, V., Rosenberg, L., Strakowski, S.M.: Changes in Neuronal Activation with Increasing Attention Demand in Healthy Volunteers: An fMRI Study. Synapse 42(4), 266–272 (2001) 5. Chan, W.H., Chan, A.H.: Movement Compatibility for Rotary Control and Circular Display: Computer Simulated Test and Real Hardware Test. Applied Ergonomics 34(1), 61– 71 (2003) 6. Collins, D.L., Zijdenbos, A., Kollokian, V., Sled, J.G., Kabani, N.J., Holmes, C.J., Evans, A.C.: Design and Construction of a Realistic Digital Brain Phantom. IEEE Transactions on Medical Imaging 17, 463–468 (1998) 7. Fafrowicz, M., Marek, T.: Quo Vadis, Neuroergonomics. Ergonomics 50(11), 1941–1949 (2007) 8. Fitts, P.M., Seeger, C.M.: S-R Compatibility: Spatial Characteristics of Stimulus and Response Codes. Journal of Experimental Psychology 46, 199–210 (1953) 9. Friston, K.J., Holmes, A.P., Worsley, K.J., Poline, J.B., Frith, C.D., Frackowiak, R.S.J.: Statistical Parametric Maps in Functional Imaging: a General Linear Approach. Human Brain Mapping 2, 189–210 (1995) 10. Gazzaniga, M.S., Ivry, R.B., Mangun, G.R.: Cognitive Neuroscience: The Biology of Mind. W.W. Norton, NY (2002) 11. Hancock, P.A., Szalma, J.L.: The Future of Neuroergonomics. Theoretical Issues in Ergonomics Science 4, 238–249 (2003) 12. Hoffmann, E.R.: Strength of Component Principles Determining Direction of Turn Stereotypes: Linear Displays with Rotary Controls. Ergonomics 40(2), 199–222 (1997) 13. Huang, C., Wahlund, L., Svensson, L., Winbald, B., Julin, P.: Cingulate Cortex Hypoperfusion Predicts Alzheimer‘s Disease in Mid Cognitive Impairment. BMC Neurology 2, 9 (2002) 14. Huettel, S.A., Song, A.W., McCarthy, G.: Functional Magnetic Resonance Imaging. Sinauer Associate Inc., Massachusetts (2004) 15. Karwowski, W., Siemionow, W., Gielo-Perczak, K.: Physical Neuroergonomics: The Human Brain in Control of Physical Work Activities. Theoretical Issues in Ergonomics Science 4, 175–199 (2003)
16. Liu, X., Banich, M.T., Jacobson, B.L., Tanabe, J.L.: Common and Distinct Neural Substrates of Attentional Control in an Integrated Simon and Spatial Stroop Task Assessed by Event-Related fMRI. NeuroImage 22, 1097–1106 (2004) 17. Maguire, E.A.: The Retrosplenial Contribution to Human Navigation: a Review of Lesion and Neuroimaging Findings. Scandinavian Journal of Psychology 42, 225–238 (2001) 18. Mazoyer, P., Wicker, B., Fonlupt, P.: A Neural Network Elicited by Parametric Manipulation of the Attention Load. Neuroreport 13(17), 2331–2334 (2002) 19. Moffat, S.D., Elkins, W., Resnick, S.M.: Age Differences in the Neural Systems Supporting Human Allocentric Spatial Navigation. Neurobiology of Aging 27(7), 965–972 (2006) 20. Niedermeyer, E., Lopes da Silva, F.: Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Lippincott Williams and Wilkins, Philadelphia (2005) 21. Nielsen, F.A., Balslev, D., Hansen, L.K.: Mining Posterior Cingulate. NeuroImage 27(3), 520–532 (2004) 22. Nieto-Castanon, A., Ghosh, S.S., Tourville, J.A., Guenther, F.H.: Region of Interest Based Analysis of Functional Imaging Data. NeuroImage 19(4), 1303–1316 (2003) 23. Oldfield, R.: The Assessment and Analysis of Handedness: the Edinburgh Inventory. Neuropsychologia 9, 812–815 (1971) 24. Parasuraman, R., Rizzo, M.: Neuroergonomics: The Brain at Work. Oxford University Press, NY (2006) 25. Payne, S.J.: Naive Judgments of Stimulus-Response Compatibility. Human Factors 37, 495–506 (1995) 26. Proctor, R.W., Vu, K.P.L.: Stimulus-Response Compatibility Principles: Data, Theory, and Application. CRC Press, UK (2006) 27. Sanderson, P., Pipingas, A., Danieli, F., Silberstein, R.: Process Monitoring and Configural Display Design: a Neuroimaging Study. Theoretical Issues in Ergonomics Science 4, 151–174 (2003) 28. Sarter, N., Sarter, M.: Neuroergonomics and Challenges of Merging Neuroscience with Cognitive Ergonomics. Theoretical Issues in Ergonomics Science 4, 142–150 (2003) 29. Tlauka, M.: Display-Control Compatibility: The Relationship between Performance and Judgements of Performance. Ergonomics 47(3), 281–295 (2004) 30. Vogt, B.A., Vogt, L., Laureys, S.: Cytology and Functionally Correlated Circuits of Human Posterior Cingulate Areas. NeuroImage 29, 452–466 (2006) 31. Vu, K.P., Proctor, R.W.: Determinants of Right-Left and Top-Bottom Prevalence for TwoDimensional Spatial Compatibility. Journal of Experimental Psychology: Human Perception & Performance 27(4), 813–828 (2001) 32. Vu, K.L., Proctor, R.W.: Naive Judgments of Stimulus-Response Compatibility; Implications for Interface Design. Ergonomics 46, 169–187 (2003) 33. Vu, K.L., Proctor, R.W.: Stimulus-Response Compatibility. In: Proctor, R.W., Reeve, T.G. (eds.) Stimulus-Response Compatibility; An Integrated Perspective, pp. 89–116. North Holland, Amsterdam (2001) 34. Warrick, M.J.: Direction of Movement in the Use of Control Knobs to Position Visual Indicators. USAF AMC Rep. No. 694-4C (1947) 35. Yu, R.F., Chan, A.H.: Comparative Research on Response Stereotypes for Daily Operation Tasks of Chinese and American Engineering Students. Perceptual & Motor Skills 98(1), 179–191 (2004) 36. Wu, S.P.: Further Studies on the Spatial Compatibility of Four Control-Display Linkages. International Journal of Industrial Ergonomics 19, 353–360 (1997)
Graphics and Semantics: The Relationship between What Is Seen and What Is Meant in Icon Design

Sarah Isherwood

School of Healthcare, University of Leeds, Leeds, LS2 9JT
[email protected]
Abstract. Visual icons can be considered as a means for designers to convey messages to end-users via the interface of a computer system. This paper explores the relationship between the users’ interpretation of icons and the meaning that designers intend icons to convey. Focussing on interface users’ understanding of icons, recent research has shown that it is the closeness of the relationship between icon and function, known as the semantic distance, that is of prime importance in determining the success of icon usability. This contrasts with previous research which has suggested that the concreteness, or pictorialness, of icons is the key to good design. The theoretical and practical implications of these findings are discussed. Keywords: Icons, Semantic distance, Concreteness, Semiotics.
1 Introduction: Signs and Semiotics

Semiotics is the study of signs; it has, not surprisingly, been influential in assisting research on graphical user interfaces. De Souza [1] claimed that in addition to cognitively-based research, which focuses on interface users' comprehension of signs and the consequent actions performed by those users, semiotic engineering can also play a part in providing guidance to designers. For instance, using the theoretical underpinnings of semiotics we can consider visual icon design as a form of communication from the designer(s) to the user(s) via the interface of the computer system. Information that is designed to be communicated via a computer system will often be produced at a different place and time from when the end-user operates the system. This means that, unlike instantaneous human-to-human communication, the user is unlikely to be able to respond directly to the designer if they do not understand the message that has been sent [1]. The first stage of design is therefore to encode information into a signal which the user will be able to interpret, or decode [2]. In order to communicate information to users, interfaces frequently make use of pictorial and graphical objects, commonly referred to as icons. To develop icons for graphical user interfaces it is necessary to consider how they communicate information. In contrast to other writing systems, visual icons often communicate information in a non-verbal manner, not relying on syntactic or phonological rules to convey meaning [3]. Instead icons attempt to represent objects, concepts and functions by relying on the user's ability to learn the meaning of the icon using their pre-existing knowledge.
Fig. 1. Peirce’s three elements of a Sign with the example of an icon representing the concept of being ‘fast’
One of the founders of the field of semiotics, Charles Sanders Peirce [4], claimed that a sign is 'something which stands to somebody for something in some respect or capacity' (p. 135). According to Peirce, signs are composed of three elements: 1) the Representamen (i.e. the representation); 2) the Object (i.e. the represented object, function or concept); 3) the Interpretant (i.e. the process of interpretation). This relationship is shown in Figure 1, using the example of an icon representing the concept of being 'fast'. There is not necessarily a direct connection between the Object and Representamen [5]. In the example in Figure 1, an interpreter of the icon would have to recognise the hare that is depicted by the icon and know that hares are fast-moving animals in order to arrive at the meaning of the icon ('fast'). Peirce also believed that the Interpretant itself was a sign which could lead to other signs, since interpretation is the process by which people associate meanings with signs. In other words, the more we think of an object, concept or function, the more meanings we can associate with it. In Figure 1, the user could recognise the depiction of a hare and understand it as meaning animal, which then may lead to the
Fig. 2. Sign interpretation of a designer and an interface user
meaning of mammal, and to leporid, and so on. It is not possible, however, to predict the number of these meaningful associations [5]. When users interpret signs they do so uniquely; each user will have their own culture, knowledge, familiarity with the sign or its depicted function, frequency of use of the sign and so on. This means that it is not easy for the designer to determine the relationship between the Interpretant and Object for each interface user 'since it is an inherent function of the person (Interpretant) or culture' [6] (p. 742). It is therefore crucial that the designer considers who the end-users of a system are going to be with regard to their likely culture, knowledge and frequency of use of the icon. For instance, will the users understand the cultural codes used by the designer to communicate a certain message? Indeed, both the end-users and the designer will have their own sets of acquired mental models relating to the knowledge and experiences they have gained during their lifetimes, and the culture that they belong to [7]. The choice of icons used to represent information for a specific computer system will ideally activate accurate mental models in the end-users. Faulkner [8] claimed that the computer interface must facilitate users in developing accurate mental models of the computer system, as it is these mental models that users employ to understand how the system works. These models are likely to evolve as novice interface users recognise some element of the icon which allows them to understand its meaning or function (for instance, a hare is recognised as a fast-moving animal in Figure 1, leading to the deduction that the icon means 'fast'), or as experienced users recognise the icon and its function through repeated exposure to it (the icon in Figure 1 is simply recognised as 'fast'). How the user interprets the sign will depend
on the user's mental models; likewise, how the designer chooses to represent the object may depend on their own set of mental models (see Figure 2). It is important to note that the function assigned to an icon by those designing it may be quite different to the meaning attributed to it by users in practice. Ideally the link between the Representamen and Object should be obvious to all using the interface and so lead to just one Interpretant. It should activate the correct mental model, allowing users not only to understand the function of an icon but also to act on it appropriately.
2 Icon Concreteness

In order to make the relationship between the Object and Representamen obvious, icons may be designed as pictorial representations of the objects they depict (i.e. concrete icons; see Figure 3a and b). Concrete icons are thought to be easy to interpret because they allow people to apply their everyday knowledge about the depicted objects in order to make inferences about the function of the icon [9]. In contrast, abstract icons are likely to represent information using graphical features such as arrows and lines, and consequently have less obvious connections with their real-world referents (see Figure 3c and d). In practice, a user applying everyday knowledge should be able to infer the meaning or function represented by a concrete symbol without any explicit learning of the icon, since such icons draw on what we already know about everyday objects; abstract icons are more likely to require training. Research has shown that users respond more quickly and accurately to concrete icons than to abstract icons, supporting the idea that a pictorial or visually obvious symbol will be most easily understood by a user [10], [11], [12], [13], [14]. However, other experiments have found that such performance advantages diminish over time when users are allowed to gain experience with a set of icons [15], [16], [17]. Therefore, for interfaces that are likely to be used frequently, the initial advantages of concrete symbols will decline as users learn the meanings of the abstract symbols. It is interesting to note that although users prefer concrete symbols to abstract symbols [18], [19], [20], this is not always reflected in users' performance. Stammers [19] found that even when users preferred concrete icons for a function, they did not always respond more quickly or accurately than they did with abstract icons.
3 Semantic Distance

3.1 Semantic Distance and Concreteness

Not everything that needs to be represented on an interface will refer to items, such as simple objects, that are easy to depict concretely. A number of studies have found that as the objects, concepts or functions to be represented become more abstract, they become more problematic to depict pictorially [11], [21]. However, concreteness is not the sole determinant of ease of access to meaning. Semantic distance is the term used to refer to the closeness of the relationship between the icon and what it is intended to represent. This relationship can also be used to determine icon usability and may be either close or distant for both concrete and abstract icons. For instance, both
Fig. 3. Icon-referent relationships. Concrete icons: (a) baggage lockers (direct relationship), (b) information (distant relationship). Abstract icons: (c) entrance (direct relationship), (d) pause (distant relationship).
Figure 3a and Figure 3c have direct relationships between the icon and the function they represent, despite the fact that one is concrete and the other is abstract. Similarly, Figure 3b and Figure 3d have a less obvious, more distant, relationship between icon and function. Interestingly, McDougall et al. [22] examined users' responses to icon sets in which the icon characteristics of semantic distance and concreteness were varied. They found that semantic distance was a stronger determinant of performance than concreteness.

3.2 Semantic Distance as a Continuum

Although the word 'icon' is now commonly used to refer to the pictorial and graphical objects used to communicate information (and is used interchangeably with the words 'symbol' and 'sign' in this report), it was given a more specific meaning by Peirce in his taxonomy of signs. Peirce classified signs into three categories: icon, index and symbol.

1. A sign in the icon category represents an object because it pictorially resembles the object.
2. Signs in the index category refer to the object they represent because they are affected by that object. For instance, Moyes and Jordan [9] give the example of the association between smoke and fire, as smoke can be used as a sign to imply the existence of a fire.
3. Finally, symbols have an arbitrary relationship with the object being symbolized. There is no connection between the symbol and its real-world counterpart. Arbitrary symbols therefore 'represent objects by virtue of a rule or convention' [23] (p. 70).

This taxonomy describes a dimension similar to that represented in the concept of semantic distance: in the first instance there is a close, direct relationship between the icon and its intended function; the second type requires the use of inferences in order to ascertain the meaning of the icon; and the third level consists of arbitrary relationships in which the function of the icon is understood only if users have previously learned its meaning. In practice it is possible to regard this dimension as a continuum running from very closely related to very distantly related [24].

3.3 The Importance of Semantic Distance

The available evidence suggests that semantic distance has an important role to play in determining interpretability [6], [16], [22], [25]. For instance, Isherwood et al. [16] examined the relative importance of icon characteristics (including semantic distance, concreteness and familiarity, amongst others) in determining the speed and accuracy of icon identification as users gained experience with icons. Icon characteristics were found to account for up to 69% of the variance observed in user performance, and semantic distance was initially the primary predictor of user performance, potentially reflecting the users' learning of icon-function relationships. The importance of semantic distance, particularly for novice icon users, suggests that the effects of the visual metaphor employed in concrete icons are less powerful than is commonly supposed, possibly because only a limited number of functions can easily be represented pictorially [11], [13]. Many more concepts can be represented abstractly than pictorially, and so icon design should perhaps focus more closely on the conceptual mapping between icon and function rather than relying on concrete icons. The importance of semantic distance may be related to the fact that it is a measure of the degree to which icon and function labels are related. Familant and Detweiler [2] claimed that the simplest type of icon-referent relationship is one where the signal denotes just one referent (a direct sign relationship). This holds whether the icon is a direct visual metaphor or an abstract representation of its referent. Hence it is the relationship between the signal and referent which is of importance, rather than concreteness per se. 'Goodness-of-fit' also appears to be significant for picture naming: where a number of names are possible, this creates uncertainty and slows semantic access and naming response times [26], [27]. Three types of stored representations are thought to be involved in object naming: visual, semantic, and lexical representations. Each form of representation is usually associated with a series of processing stages. A theoretical model developed by Johnson et al. [27] outlined the following processing stages: 1) search and perception of the picture; 2) retrieval of a matching representation (i.e., stored visual representations); 3) activation of semantic information (i.e., conceptual and functional information associated with the object); 4) access to the function, or name, via referential connections.
It is possible that semantic distance is an index of the closeness and efficacy of the connection between visual, semantic and lexical representations.
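As a concrete illustration of how these stages chain together, the toy sketch below (hypothetical code, not an implementation of Johnson et al.'s model) walks the hare icon of Figure 1 through stages 2 to 4, with each stored representation reduced to a simple lookup table:

```python
# Toy walk through stages 2-4 of the naming model; all stores are
# hypothetical stand-ins for the stored representations described above.
VISUAL_STORE = {"hare-shape": "hare"}                 # stage 2: stored visual representations
SEMANTIC_STORE = {"hare": ("fast-moving", "animal")}  # stage 3: conceptual/functional knowledge
LEXICAL_STORE = {"fast-moving": "fast"}               # stage 4: referential name access

def interpret_icon(percept: str) -> str | None:
    """Stage 1 (search and perception) is assumed already done."""
    obj = VISUAL_STORE.get(percept)               # stage 2: match a visual representation
    if obj is None:
        return None
    for feature in SEMANTIC_STORE.get(obj, ()):   # stage 3: activate semantic features
        name = LEXICAL_STORE.get(feature)         # stage 4: access the name via referential connections
        if name is not None:
            return name
    return None

print(interpret_icon("hare-shape"))  # -> 'fast'
```

On this reading, a semantically close icon is one whose path from semantic activation to name access is short and unambiguous, whereas a distant icon needs extra associative hops before a name is reached.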
4 Familiarity

In addition to the strength of icon-referent relationships, Isherwood et al. [16] and McDougall and Isherwood [25] also found familiarity to be an important predictor of user performance with icons. As users gained experience with the icons in these studies, familiarity with an icon, and with the function of an icon, became important predictors of performance. McDougall and Isherwood [25] argued that the importance of familiarity with both the icon and the function suggests longer-term effects on response times, because familiar items are easier to access in long-term memory even after a number of repeated presentations. These authors suggest that, with regard to the processing stages outlined by Johnson et al. [27], icon familiarity may be an index of the ease with which individuals can access stored visual representations, and may even help drive initial semantic access. In addition to exploring the determinants of icon usability, these studies have also shown that the primary predictors of performance change as users gain experience with the icons. Strong icon-referent relationships were of initial importance, whereas familiarity with the icon and its function became more important to experienced icon users. Icons often have to be learned initially, but this is no longer the case once users have become experienced with them. It is therefore not surprising that predictors of icon identification change as learning occurs [15], [16], [17].
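The regression logic behind such claims can be illustrated with a small synthetic example. The sketch below uses fabricated toy data (not the cited studies' data) and ordinary least squares to show how identification times might be regressed on semantic distance and familiarity; all coefficients and variable names are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
semantic_distance = rng.uniform(0, 1, n)  # 0 = close icon-function relation, 1 = distant
familiarity = rng.uniform(0, 1, n)        # 0 = unfamiliar, 1 = highly familiar
# Assumed generative story: distance slows identification, familiarity speeds it.
rt_ms = 900 + 400 * semantic_distance - 250 * familiarity + rng.normal(0, 50, n)

# Ordinary least squares: rt ~ intercept + distance + familiarity
X = np.column_stack([np.ones(n), semantic_distance, familiarity])
beta, *_ = np.linalg.lstsq(X, rt_ms, rcond=None)
print(dict(zip(["intercept", "distance", "familiarity"], beta.round(1))))
```

With enough trials the fitted coefficients recover the assumed slopes, mirroring how the cited studies quantify each icon characteristic's contribution to identification speed.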
5 Conclusion

To allow the continuing advancement of user-controlled systems, the users involved in human-computer interaction must be better understood. How information can be communicated from one person to another through the use of icons is often less straightforward than simply relying on pictorial associations with the icon's referent. As noted by Familant and Detweiler, 'objects called icons, even if restricted to icons in a computer environment are far more diverse and their relationships to the objects and events they are intended to represent far more complicated, than one might suppose' [2] (p. 705). This report has advocated consideration of the signal-referent relationship in icon design, taking into account the importance of the end-users' input into the icon's interpretation. Icons that are well mapped to their referents and have been designed with consideration for the end-users (whether visual or auditory icons; for instance see [28], [29]) should be unambiguous in their intended meaning and consequently clearly understood and acted on appropriately by users. Good interface design should ideally produce a limited number of meanings for a given message, without limiting the uses or functions of the computer system [1].
References

1. De Souza, C.S.: The semiotic engineering of user interface languages. International Journal of Man-Machine Studies 39, 753–773 (1993)
2. Familant, M.E., Detweiler, M.C.: Iconic reference: Evolving perspectives and an organising framework. International Journal of Man-Machine Studies 39, 705–728 (1993)
3. Carr, T.H.: Perceiving visual language. In: Boff, K.R., Kaufman, L., Thomas, J.P. (eds.) Handbook of Perception and Human Performance: Cognitive Processes and Performance, pp. 29–92. Wiley, New York (1986)
4. Peirce, C.S.: Collected Papers of Charles Sanders Peirce. In: Hartshorne, C., Weiss, P. (eds.) Elements of Logic, vol. 2. Harvard University Press, Cambridge (1932)
5. De Souza, C.S.: The Semiotic Engineering of Human-Computer Interaction. MIT Press, Cambridge (2005)
6. Goonetilleke, R.S., Shih, H.M., On, H.K., Fritsch, J.: Effects of training and representational characteristics in icon design. International Journal of Human-Computer Studies 55, 741–760 (2001)
7. Barker, P., van Schaik, P.: Icons in the mind. In: Yazdani, M., Barker, P. (eds.) Iconic Communication. Intellect Books, Bristol (2000)
8. Faulkner, C.: The Essence of Human-Computer Interaction. Prentice Hall, London (1998)
9. Moyes, J., Jordan, P.W.: Icon design and its effect on guessability, learnability and experienced user performance. In: Alty, J.D., Diaper, D., Guest, S. (eds.) People and Computers VIII, pp. 49–59. Cambridge University Press, Cambridge (1993)
10. Arend, U., Muthig, K.-P., Wandmacher, J.: Evidence for global feature superiority in menu selection by icons. Behavior and Information Technology 6, 411–426 (1987)
11. Rogers, Y., Oborne, D.J.: Pictorial communication of abstract verbs in relation to human-computer interaction. British Journal of Psychology 78, 99–112 (1987)
12. Stammers, R.B., George, D.A., Carey, M.S.: An evaluation of abstract and concrete icons for a CAD package. In: Megaw, E.D. (ed.) Contemporary Ergonomics 1989, pp. 416–421. Taylor & Francis, London (1989)
13. Stammers, R.B., Hoffman, J.: Transfer between icon sets and ratings of icon concreteness and appropriateness. In: Proceedings of the Human Factors Society 35th Annual Meeting, pp. 354–358. Human Factors and Ergonomics Society, Santa Monica (1991)
14. Stotts, D.B.: The usefulness of icons on the computer interface: Effect of graphical abstraction and functional representation on experienced and novice users. In: Proceedings of the Human Factors and Ergonomics Society 42nd Annual Meeting, pp. 453–457. Human Factors and Ergonomics Society, Santa Monica (1998)
15. Green, A.J.K., Barnard, P.J.: Iconic interfacing: The role of icon distinctiveness and fixed or variable screen locations. In: Diaper, D., Gilmore, D., Cockton, G., Shackel, B. (eds.) Human-Computer Interaction – INTERACT 1990, pp. 457–462. Elsevier Science Publishers, Amsterdam (1990)
16. Isherwood, S.J., McDougall, S.J.P., Curry, M.: Icon identification in context: The changing role of icon characteristics with user experience. Human Factors 49(3), 465–476 (2007)
17. McDougall, S.J.P., de Bruijn, O., Curry, M.B.: Exploring the effects of icon characteristics on user performance: The role of icon concreteness, complexity, and distinctiveness. Journal of Experimental Psychology: Applied 6, 291–306 (2000)
18. Nolan, P.R.: Designing screen icons: Ranking and matching studies. In: Proceedings of the Human Factors Society 33rd Annual Meeting, pp. 380–384. Human Factors Society, Santa Monica (1989)
19. Stammers, R.B.: Icon interpretation and degree of abstractness – concreteness. In: Adams, A.S., Hall, R.R., McPhee, B.J., Oxenburgh, M.S. (eds.) Ergonomics International 88, pp. 505–507. Taylor & Francis, London (1988)
20. Stammers, R.B.: Judged appropriateness of icons as a predictor of identification performance. In: Lovesay, E.J. (ed.) Contemporary Ergonomics, pp. 371–376. Taylor & Francis, London (1990)
21. Jones, S.: Stereotypy in pictograms of abstract concepts. Ergonomics 26, 605–611 (1983)
22. McDougall, S.J.P., Curry, M.B., de Bruijn, O.: The effects of visual information on users' mental models: An evaluation of Pathfinder analysis as a measure of icon usability. International Journal of Cognitive Ergonomics 5, 59–84 (2001)
23. Greenlee, D.: Peirce's Concept of Sign. Mouton & Co. N.V. Publishers, The Hague (1973)
24. McDougall, S.J.P., Curry, M.B., de Bruijn, O.: Measuring symbol and icon characteristics: Norms for concreteness, complexity, meaningfulness, familiarity and semantic distance for 239 symbols. Behavior Research Methods, Instruments and Computers 31, 487–519 (1999)
25. McDougall, S.J.P., Isherwood, S.J.: What's in a name? The role of graphics, functions, and their interrelationships in icon identification. Behaviour Research Methods 41, 325–336 (2009)
26. Bates, E., D'Amico, S., Jacobsen, T., Székely, A., Andonova, E., Devescovi, A., Herron, D., Lu, C.C., Pechmann, T., Pléh, C., Wicha, N., Federmeier, K., Gerdjikova, I., Gutierrez, G., Hung, D., Hsu, J., Iyer, G., Kohnert, K., Mehotcheva, T., Orozco-Figueroa, A., Tzeng, A., Tzeng, O.: Timed picture naming in seven languages. Psychonomic Bulletin & Review 10, 344–380 (2003)
27. Johnson, C.J., Paivio, A., Clark, J.M.: Cognitive components of picture naming. Psychological Bulletin 120, 113–139 (1996)
28. McKeown, D., Isherwood, S.J.: Mapping candidate within-vehicle auditory displays to their referents. Human Factors 49, 417–428 (2007)
29. Petocz, A., Keller, P., Stevens, S.: Auditory warnings, signal-referent relations and natural indicators: Re-thinking theory and application. Journal of Experimental Psychology: Applied 14, 165–178 (2008)
The Effect of Object Features on Multiple Object Tracking and Identification

Tianwei Liu1,2, Wenfeng Chen1,*, Yuming Xuan1, and Xiaolan Fu1

1 State Key Laboratory of Brain and Cognitive Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China
{chenwf,xuanym,fuxl}@psych.ac.cn
2 Graduate University of Chinese Academy of Sciences, Beijing 100049, China
[email protected]
Abstract. Tracking of multiple objects is challenging for computer vision systems under certain circumstances. We investigated this problem with human observers. In our experiment, observers were asked to track multiple moving items and to maintain their identities. We found that the human capacity for maintaining the identities of multiple moving objects is about three to four items, and that uniqueness improves both general tracking and identification (ID) performance. The results also showed that observers' ID capacity depended on feature type, suggesting that less resource-demanding processing of identity-related features leads to more effective improvement in tracking. These results provide some indications for the design of computer vision systems that involve human monitoring, and suggest that creating a featural space to map the identities of multiple objects may aid automatic object tracking.
1 Introduction

It is important to identify and maintain information about multiple objects that move through an environment. The Multiple Object Tracking (MOT) paradigm is popularly used to study how our visual system tracks multiple moving objects [1]. This research has generated counterintuitive findings concerning the role of distinct features in tracking. Objects are commonly distinguished by features. Yet relative to object locations, object features such as color and shape are often lost easily during tracking [2]. Intuitively, individuating objects should facilitate tracking. Horowitz et al. [3] addressed a similar question, namely whether unique objects help or hinder MOT performance in comparison with identical objects (animals), and found a unique-object advantage. However, this uniqueness advantage has not always been observed [4]. Furthermore, the role of distinct features in identification based on identity-location binding during tracking remains an open question. In the present work, we therefore tested whether distinct feature information could improve tracking and identification.
* Corresponding author: [email protected]
2 Methods

2.1 Participants

Twenty-four participants (12 female) aged 19–25 (mean 21.1 years) took part in the experiment as paid volunteers. All participants had normal or corrected-to-normal vision.

2.2 Stimuli

The stimuli were presented on a 17-inch color CRT monitor controlled by E-Prime 1.1. The refresh rate of the monitor was 75 Hz at a resolution of 800×600 pixels. The viewing distance was about 60 cm.
Fig. 1. Sequence of events in a trial in the identical condition with number labels. At the beginning of a trial, eight disks were shown at random initial positions (t1). Four of them then flashed on and off to identify them as targets; meanwhile, inside each of these four disks, a number randomly selected from 1 to 8 without replacement was displayed (t2). The numerical labels on the targets disappeared and all items began to move for 5 or 10 seconds (t3). The disks stopped moving and observers selected the four targets (t4). Observers assigned a number to each target (t5).
The tracked objects were eight identical grey disks, four of which were targets. Each disk was 47 pixels in diameter and was surrounded by a white border of 2 pixels. The screen background was black and movement was restricted to a central area subtending about 18.2 × 18.2º of visual angle (the moving area). The initial positions of the eight disks were randomly selected, with the constraint that each disk had to be at least 1º from the edges of the moving area and at least 1.1º from every other disk. The velocity of the items varied between 1.9 and 2.8º/s, with a mean of 2.4º/s.
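For readers reproducing this setup, the sketch below converts the visual-angle specifications into on-screen pixels. The physical screen width (about 32 cm of viewable width for a 17-inch 4:3 CRT) is an assumption, not a value stated in the paper:

```python
import math

VIEW_DIST_CM = 60.0      # viewing distance, from the paper
SCREEN_WIDTH_CM = 32.0   # assumed viewable width of a 17-inch 4:3 CRT
SCREEN_WIDTH_PX = 800    # horizontal resolution, from the paper

def deg_to_px(deg: float) -> float:
    """Width in pixels of a stimulus spanning `deg` degrees of visual angle."""
    width_cm = 2 * VIEW_DIST_CM * math.tan(math.radians(deg) / 2)
    return width_cm * SCREEN_WIDTH_PX / SCREEN_WIDTH_CM

print(f"1.0 deg margin ~ {deg_to_px(1.0):.0f} px")
print(f"18.2 deg arena ~ {deg_to_px(18.2):.0f} px")  # fits within the 800x600 display
```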
Fig. 2. Sequence of events in a trial in the unique condition with number labels. At the beginning of a trial, eight disks were shown at random initial positions (t1). Four of them then flashed on and off to identify them as targets (t2). All items began to move for 5 or 10 seconds, with a unique number appearing in their centers throughout the movement (t3). A white screen flashed for 100 ms to eliminate the visual afterimage (t4). Observers selected the four targets (t5). Observers assigned a number to each target (t6).
2.3 Design and Procedure

The experiment used a mixed design with object differentiability (unique, identical) and moving duration (5 s, 10 s) as within-participant variables and feature type (numeral label, color label) as a between-participant variable. Twelve participants were tested in the number condition and twelve in the color condition. We introduced a variant of the classical MOT procedure. The main change was that an identification task based on identity-location binding was added after the tracking task. The essential change for the unique condition was that all items were visually unique during movement, because the numerical or color labels appeared when the items began to move and remained visible until they stopped moving. At the start of each trial, eight disks appeared on the screen and four of them flashed for 1500 ms to designate them as targets. Then all items began to move for 5 or 10 seconds. While the objects moved, items could randomly overlap. When the disks stopped moving, the participants were instructed to pick out all the items that had flashed by clicking on them with the mouse (tracking task). Then the list of digits from 1 to 8 (or colors) was displayed near the right edge of the screen. Participants were asked to label each target with the number that had been shown at the beginning of the trial (identification task). They could click a target and then click one of the eight numbers. The trial ended after all targets were labeled. The observer then pressed the spacebar to initiate the next trial. The sequence of events in each trial is shown in Figures 1 and 2.
The Effect of Object Features on Multiple Object Tracking and Identification
209
In the unique condition, a number label randomly selected from 1 to 8 (or a color label selected from blue, coffee, cyan, green, orange, purple, red and yellow) was displayed inside each disk during movement. In the identical condition, a number (or color) label was displayed inside each disk only during the flashing phase and disappeared during movement.
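Before turning to the results, a runnable toy simulation of the tracking task may help make the capacity logic concrete. It is a sketch under simplifying assumptions (not the actual experiment): an observer with capacity m reliably tracks m of the four targets and guesses the remaining selections at random:

```python
import random

N_ITEMS, N_TARGETS = 8, 4

def simulate_trial(capacity: int) -> float:
    """Proportion of targets selected by a capacity-limited guessing observer."""
    items = list(range(N_ITEMS))
    targets = set(random.sample(items, N_TARGETS))
    tracked = set(random.sample(sorted(targets), min(capacity, N_TARGETS)))
    pool = [i for i in items if i not in tracked]            # unmarked items
    guessed = set(random.sample(pool, N_TARGETS - len(tracked)))
    return len((tracked | guessed) & targets) / N_TARGETS

if __name__ == "__main__":
    random.seed(1)
    for m in range(N_TARGETS + 1):
        acc = sum(simulate_trial(m) for _ in range(10_000)) / 10_000
        print(f"capacity {m}: mean tracking accuracy {acc:.3f}")
```

Even a capacity-zero observer scores about 0.5 here, because picking four of eight items at random finds half the targets; this is exactly why raw accuracy must be corrected by a guessing model before being interpreted as capacity.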
3 Results

Raw accuracy data were transformed into estimated capacities according to high-threshold guessing models devised for each condition. The estimated capacities for tracking and identification were computed according to the formulas of Horowitz et al. [3]. The tracking capacities are shown in Figure 3. A three-way ANOVA revealed significant main effects of object differentiability [F(1, 22) = 42.44, p < .001] and moving duration [F(1, 22) = 25.69, p < .001], and an interaction between these two variables [F(1, 22) = 18.92, p < .001], but no significant main effect of feature type and no other significant interactions [F(1, 22) < 1.20, p > .12]. Simple main effect analysis showed that the difference between moving durations was significant for the identical condition [p < .001] but not for the unique condition [p = .759].
Fig. 3. Tracking capacity as a function of feature type, object differentiability and moving duration
The identification capacities are shown in Figure 4. A three-way ANOVA revealed significant main effects of feature type [F(1, 22) = 11.11, p < .01], object differentiability [F(1, 22) = 60.71, p < .001] and moving duration [F(1, 22) = 19.82, p < .001], and an interaction between the latter two variables [F(1, 22) = 47.85, p < .001], but no other significant interactions [F(1, 22) < 1, p > .34]. Simple main effect analysis showed that the difference between moving durations was significant for the identical condition [p < .001] but not for the unique condition [p = .160].
Fig. 4. Identification capacity as a function of feature type, object differentiability and moving duration
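The transformation from raw accuracy to capacity can be sketched as follows. The functional form below is an assumed high-threshold guessing model in the spirit of, but not necessarily identical to, the exact formulas of Horowitz et al. [3]: an observer with capacity m tracks m targets and guesses the remaining n - m selections from the N - m unmarked items, so expected accuracy is p(m) = [m + (n - m)^2 / (N - m)] / n, which can be inverted numerically:

```python
def predicted_accuracy(m: float, n: int = 4, N: int = 8) -> float:
    """Expected proportion of targets selected under the assumed guessing model."""
    return (m + (n - m) ** 2 / (N - m)) / n

def estimate_capacity(p_hat: float, n: int = 4, N: int = 8) -> float:
    """Invert the model by bisection on m in [0, n]; p(m) is increasing in m."""
    lo, hi = 0.0, float(n)
    for _ in range(60):
        mid = (lo + hi) / 2
        if predicted_accuracy(mid, n, N) < p_hat:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for acc in (0.75, 0.85, 0.95):
    print(f"accuracy {acc:.2f} -> estimated capacity {estimate_capacity(acc):.2f}")
```

Under these assumptions the predicted accuracies match the simulation sketched in the Methods section, and observed accuracies map onto capacities in the three-to-four-item range reported above.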
4 General Discussion

In this study, we investigated human observers' ability to track multiple moving items and to maintain their identities. We found that observers can, to some extent, maintain the identities of multiple objects. However, observers' capacity in the ID task depended on feature type: ID capacity was about 3.6 items for digit-based representations and about 3 items for color-based representations. We therefore suggest that the human capacity for maintaining the identities of multiple moving objects is about three to four items.

4.1 The Unique Benefit

The results showed that tracking capacity decreased with moving duration, but only with identical items. The use of unique items improved general tracking performance and eliminated the difficulty caused by the longer moving duration. This contrasts with the lack of a uniqueness advantage in [4]. The main difference between that research and this study is the additional identification task, which may have stimulated participants to adopt a voluntary strategy of encoding the distinct features. Participants might have attempted to encode the unique features to help them with the tracking task. This may suggest that tracking implemented by early vision is "feature-blind", as contended by FINST theory [5], but that object identities differentiated by visual properties can be encoded or accessed by higher-level cognitive processes when a voluntary strategy is involved. ID capacity also decreased with moving duration, but only with identical items. The use of unique items improved general ID performance and eliminated the difficulty caused by the longer moving duration. This was consistent with intuitive inference and demonstrates content-addressable representations in multiple-object tracking [3]. The most important result was that ID capacity was better with number labels than with color labels, whether the labels were shown during the initial static phase or during the movement; this was not the case for tracking capacity. It indicates a greater role for digit labels than color labels in content addressing, which may be due to digits being more distinguishable than colors. We can
distinguish and remember eight digits more easily and quickly than eight colors, and thus require fewer resources to process these features. In fact, the digit span of adults is greater than the color span (6 items, [6]). Less resource-demanding processing of identity-related features would lead to more effective improvement in tracking.

4.2 Implications and Applications of the Present Study

In this study, we explored human observers' capacity for multiple object tracking and identification. We found that the human capacity for keeping track of object identities is about three to four items. This limited capacity should be taken into account in computer vision systems involving human monitoring. In a human-computer interaction, for example, to ensure efficient monitoring, at most four targets should be displayed to a human surveillant. And since identity-related features improve human identification performance, it is helpful to label every object in the display with unique features that demand few processing resources. We also found that identification of multiple moving objects is challenging for human observers just as it is for computer vision systems. For human observers, however, this performance can be improved by keeping the identity-related featural information of all items visible during their movement, which suggests that humans can use feature information to maintain the identities of multiple moving objects. This featural correspondence facilitation could be applied in vision systems by creating a feature space to map the identities of multiple objects. However, different features do not have the same impact on identification, which has implications for computer MOT: simple features cannot be selected arbitrarily to serve as efficient cues for improving identification during tracking. Furthermore, our data showed that the uniqueness benefit had different impacts on tracking and ID performance, indicating two separate mechanisms of resource allocation. In computer vision applications, tracking and ID are both likely to be involved in MOT. These separate mechanisms may imply a general design principle with two vision systems (or modules) for MOT in computer vision applications.
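To illustrate the featural-correspondence idea in code, the sketch below (a hypothetical illustration, not a system from the paper) greedily matches current detections to previous tracks using a combined position-plus-feature distance; the feature weight is an assumed tuning knob:

```python
import math

def match_ids(prev: dict[int, tuple[float, float, float]],
              curr: list[tuple[float, float, float]],
              feature_weight: float = 5.0) -> dict[int, int]:
    """prev: object id -> (x, y, feature); curr: new detections.
    Returns a mapping from object id to the index of its matched detection."""
    def cost(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1]) + feature_weight * abs(a[2] - b[2])

    assignment: dict[int, int] = {}
    used: set[int] = set()
    for obj_id, state in prev.items():
        candidates = [i for i in range(len(curr)) if i not in used]
        if not candidates:
            break
        best = min(candidates, key=lambda i: cost(state, curr[i]))
        assignment[obj_id] = best
        used.add(best)
    return assignment

# Two objects cross paths; the feature term keeps their identities straight.
print(match_ids({1: (0.0, 0.0, 0.2), 2: (5.0, 5.0, 0.9)},
                [(5.2, 4.9, 0.88), (0.3, -0.1, 0.25)]))  # -> {1: 1, 2: 0}
```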
Acknowledgements

This research was supported in part by grants from the National Basic Research Program of China (2006CB303101), the National Natural Science Foundation of China (30600182 and 30500157) and the Project for Young Scientists Fund, Institute of Psychology, Chinese Academy of Sciences (08CX032003).
References

1. Pylyshyn, Z.W., Storm, R.W.: Tracking multiple independent targets: Evidence for a parallel tracking mechanism. Spatial Vision 3, 179–197 (1988)
2. Bahrami, B.: Object property encoding and change blindness in multiple object tracking. Visual Cognition 10(8), 949–963 (2003)
3. Horowitz, T.S., Klieger, S.B., Fencsik, D.E., Yang, K.K., Alvarez, G.A., Wolfe, J.M.: Tracking unique objects. Perception & Psychophysics 69, 172–184 (2007)
4. Klieger, S.B., Horowitz, T.S., Wolfe, J.M.: Is multiple object tracking colorblind? Journal of Vision 4(8), 363 (2004)
5. Pylyshyn, Z.W.: Some puzzling findings in multiple object tracking: I. Tracking without keeping track of object identities. Visual Cognition 11(7), 801–822 (2004)
6. Zoelch, C., Seitz, K., Schumann-Hengsteler, R.: From rag(bag)s to riches: Measuring the developing central executive. In: Schneider, W., Schumann-Hengsteler, R., Sodian, B. (eds.) Young Children's Cognitive Development: Interrelationships among Executive Functioning, Working Memory, Verbal Ability, and Theory of Mind. Erlbaum, Mahwah (2005)
Organizing Smart Networks and Humans into Augmented Teams

Martijn Neef, Martin van Rijn, Danielle Keus and Jan-Willem Marck

TNO Defence, Security and Safety, The Netherlands
{martijn.neef,martin.vanrijn,danielle.keus,jan_willem.marck}@tno.nl
Abstract. This paper discusses the challenge of turning networks of sensors, computers, agents and humans into hybrid teams that are capable, effective and adaptive. We propose a functional model and illustrate how such a model can be put into practice to augment the capabilities of the human organization. We focus on the interaction between the human and artificial parts of the system, with particular attention to task delegation, role adjustment and adaptive autonomy. In this paper, we introduce the main concepts and report on observations from initial experiments.
1 Introduction

Networked systems and operations are at the center of many research and development agendas. Our working environments are full of networked devices, and close interactions between humans and networked devices are commonplace. We use sensor networks for remote observations, we interact by means of wireless communication devices, and we benefit from network-accessible information sources. Gradually, we also see networked devices play a more active and cooperative role in operations. We expect this development to continue. Close cooperation between humans and intelligent networks will occur at various levels of cognition, perhaps to the extent that we effectively create 'augmented teams': teams whose capabilities are greatly augmented by the involvement of sensors, networks and artificial actors, and in which technological means practically become part of the team itself. Advances in artificial intelligence fuel such developments, and allow intelligent systems to play a far more pro-active and autonomous role than traditionally. We are already seeing signs of such network-enhanced teams in the military domain, where the availability of networks and smart systems is changing the face of the battlefield. Augmented team concepts will appear in many safety and security domains, because of the incessant need for additional sensing and acting capabilities in such environments. Despite much research on sensors, networks and intelligent systems, there is little practical work on how to incorporate such systems effectively into human-centered teams. We set ourselves two major requirements: (a) the approach must be suitable for the current and future state of technology, and (b) the augmented team must exhibit adaptive capabilities. The first requirement is important in view of ongoing developments in system engineering. In most current systems,
humans are in charge of tasks that demand higher levels of cognition, whereas system components usually take on tasks with lower cognitive requirements. In order to be future-proof, designs must take into account that the traditional division of labour between human and artificial actors will gradually fade away. A related feature that we seek is adaptivity. Adaptivity refers to the ability of a system to modify its structure or behaviour to obtain a better fit when its circumstances change. There is much interest in adaptive capabilities for systems and organizations, but adaptivity is a difficult feature to achieve. Much work on adaptive team concepts stems from the military domain (Klein, 2000), and various examples of adaptive capabilities have been implemented on the battlefield, although mostly in limited form. Most human teams use a fixed role and task allocation. We want our augmented teams to exhibit adaptive capabilities: the team should be able to alter role and task allocations (internal adaptivity) and its course of action (external adaptivity) whenever needed. It should be possible to modify positions and activities in a seamless manner. Because an augmented team also includes artificial actors, this implies that it should be possible to have artificial actors switch roles with human actors, and vice versa. Our design concept should cater for such events, and facilitate adaptive behaviours.
2 A Functional Design for Augmented Teams

An augmented team consists of a collective of sensors, actuators, information processing systems and humans that are interconnected through an intelligent network. An augmented team has adaptive capabilities with respect to organization structure, role and task allocation, and information flow between elements. This implies that roles and tasks may be exchanged between team members without disrupting the integrity of the team and without needing a major redesign of the information flow through the system. It also means that the team can easily accommodate new elements (sensors, actuators, human actors), and that their added capabilities automatically become part of the feature set of the team. We view an augmented team as a cognitive system: a system that is set in the real world, has perceptive and cognitive capabilities (self-reflection, reasoning, understanding, learning, decision making) and can respond to situations with reason and intention. In conventional automation processes, there is a clear divide between the human team and the technical system. Because of our adaptivity and agility requirements, we intentionally disregard the challenge of proper division of labour between human and artificial team members, and start from a purely functional stance. We use the Networked Adaptive Interactive Hybrid Systems (NAIHS) model [2, 3] as a blueprint for our hybrid organizations. The NAIHS model describes a typical sensor-data-driven networked system from a functional point of view, and considers both human actors and artificial entities as candidates to fulfill functional components of the system. NAIHS decomposes systems into functional components: essential functions that need to be fulfilled by actors in the system. The NAIHS model does not prescribe which actor should take responsibility for a certain function, as long as performance criteria for each function are met during execution. This means that a part of the system could be responsible for fulfilling multiple functions, or that multiple actors could jointly achieve a single function.
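To make the functional stance concrete, the sketch below shows one way such a functional component might be described in code. The decomposition dimensions follow the NAIHS principles elaborated below (information abstraction, timescale, physical structure); the class itself is a hypothetical illustration, not part of NAIHS:

```python
from dataclasses import dataclass
from enum import Enum

class Kind(Enum):
    SITUATION_AWARENESS = "SA"
    COMMAND_AND_CONTROL = "C2"

@dataclass(frozen=True)
class FunctionalComponent:
    name: str
    kind: Kind
    abstraction_level: int   # 0 (signal level) .. 3 (high-level assessment/planning)
    timescale_s: float       # characteristic time-constraint of the function
    location: str            # physical structure: where the function is embodied

# Any actor, human or artificial, may be assigned to fulfill this component.
track_fusion = FunctionalComponent(
    name="intruder track fusion",
    kind=Kind.SITUATION_AWARENESS,
    abstraction_level=1,
    timescale_s=1.0,
    location="command center",
)
```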
Fig. 1. The Networked Adaptive Interactive Hybrid Systems model [3]
NAIHS uses three principles to decompose a system into functional components: (a) the level of information abstraction, (b) the timescale of desired effects and (c) the physical structure of the system in its environment. For the decomposition by information abstraction level, NAIHS distinguishes between 'situation awareness' components and 'command and control' components. In addition, NAIHS uses four levels of information abstraction, taken from the JDL model [9], ranging from elementary signal assessment and generation (level 0) to high-level situation assessment and planning (level 3). The second dimension, timescale, emphasizes that functions may have different time-constraints. The third dimension, physical structure, captures the physical aspects of each function. NAIHS uses these dimensions as elemental steering guides for the assembly of effective chains of tasks in a networked system. For further details, see Kester [3].

2.1 Organizing Team Structures

The functional NAIHS model gives us a transparent way to describe the functions of an augmented team. But how do we organize the teamwork, and describe the various interactions between elements? The dynamic nature of an augmented team makes it impractical and undesirable to arrange all possible task allocations and interactions in advance. This means that we need a flexible way to describe the tasks of each element, and its relationships with other elements. Such descriptions should effectively convey what kind of behaviour one can expect from an actor, and can subsequently be used to arrange effective collaborations between actors. Actor interaction in networked augmented teams is comparable with interaction in multi-agent systems. We use a specification framework from the agent research community to represent the organization of an augmented team and the interactions between elements. OperA [1] offers a comprehensive methodology and specification language to represent and structure the dynamic cooperation of artificial agents. OperA uses three models to represent multi-agent organizations. The Organizational Model
represents organizational goals and requirements. It describes roles, generic interaction structures, performance criteria, norms, ontologies and other aspects of an organization that define the boundaries of operation. The Social Model represents the agreements that individual elements adopt when they become part of the organization. These 'social contracts' describe the tasks associated with a role, obligations, permissions and structural relations with other roles. The Interaction Model represents interaction commitments between elements, specifying the format and frequency of interaction. It can also include agreements on how to solve conflicts and other relevant processes.
Fig. 2. Using OperA to implement the organizational structure of an augmented team
OperA uses a formal description language to represent contracts, so that the organization can be validated through logical verification methods. Logical verification of the models can reveal unsatisfied objectives or contracts that are not fulfilled. This is, of course, an interesting feature for distributed systems with respect to task planning and system performance assessment [10].

2.2 Interaction and Adaptivity

In an augmented team, the information flow must 'bring' the right (relevant) information to the right functional component. The flow of information is dynamic: role adoptions at the social level and interaction agreements at the interaction level give rise to flows of information from one actor to another. This means that actors need to make sure that they receive information from other actors in the right form, and at the right time. They learn their information needs when they accept a 'job contract' during role adoption, and ensure that they receive information by negotiating interaction agreements with other actors. Social and interaction contracts are normally not present in an explicit form in organizations. Human teamwork is bound by common agreements that are usually informal in nature. In an augmented team, however, we cannot depend on informal understandings, since artificial entities need to comprehend agreements in order to
participate. The presence of artificial team members in an augmented team makes it necessary to make every collaborative agreement explicit and accessible. This includes agreements between humans: if human actors reach an interaction agreement, their interaction contract must be available in the organization so it can be administered and monitored by other actors. This means that either the actors themselves need to publish the details of the contract, or a third party must capture the details of the agreement. To facilitate the contract processes, it is wise to create a contract manager: a component that is responsible for maintaining an overview of all elements and their contracts. Upon entering an augmented team, an element needs to accept the interaction contracts that are associated with the roles it will fulfill. The contract manager manages this process and keeps an administration of all contracts. Because of its administrative role, the contract manager is also in a position to identify which element fulfils which functionality, and can signal mismatches and impossibilities in task allocation. The contract manager could interact with a process manager to assess the performance of the augmented team. When performance is sub-par, the process manager could instigate adaptive measures by modifying contracts or roles in the organization. The OperA models give us a clear way to describe three types of adaptive organizational behaviour: (1) interaction adaptivity, in which elements adapt their way of interacting; (2) role adaptivity, in which actors change roles; and (3) behavioural adaptivity. The first has the least impact on the structure of the organization, because it only affects the interaction between actors. The last has the most impact, because it affects the fundamental aspects of the organization. Given all the above, the essential design tactic for an augmented team becomes as follows. From the business model, derive a suitable functional model using the NAIHS structure. Create roles and role descriptions, and design suitable control structures that can administer role adoption for all sorts of actors (interface-mediated role adoption for human actors, protocol-driven role adoption for artificial actors). Develop interaction templates that suit the types of actors in the organization, and design control structures that can help to implement and manage these interactions (e.g. communication, information and collaboration settings). And, to facilitate adaptive behaviours, create behaviour patterns that can help to quickly modify a team's behaviour, as a driver for adaptivity (e.g. define working modes for the entire team). Brought together in a network- and service-oriented framework, these elements would form the basis for the development of augmented teams.
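The sketch below illustrates how explicit social and interaction contracts, together with a contract manager, might look in code. The class and field names are hypothetical; OperA itself specifies contracts in a formal logic-based language rather than as program objects:

```python
from dataclasses import dataclass, field

@dataclass
class SocialContract:          # role adoption: who plays which role, with which tasks
    actor: str
    role: str
    tasks: tuple[str, ...]

@dataclass
class InteractionContract:     # who delivers what content to whom, and how often
    provider: str
    consumer: str
    content: str
    period_s: float

@dataclass
class ContractManager:
    social: list[SocialContract] = field(default_factory=list)
    interactions: list[InteractionContract] = field(default_factory=list)

    def who_fulfils(self, role: str) -> list[str]:
        return [c.actor for c in self.social if c.role == role]

    def unserved_needs(self, required_content: set[str]) -> set[str]:
        """Signal mismatches: required information with no provider contract."""
        served = {i.content for i in self.interactions}
        return required_content - served

mgr = ContractManager()
mgr.social.append(SocialContract("operator-1", "coordinator", ("plan", "direct")))
mgr.interactions.append(InteractionContract("tracker", "operator-1", "intruder-position", 1.0))
print(mgr.who_fulfils("coordinator"))                                # ['operator-1']
print(mgr.unserved_needs({"intruder-position", "guard-positions"}))  # {'guard-positions'}
```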
3 Experiments

We are currently carrying out experiments to validate the augmented team concept. From a technical perspective, we test the concept on its ability to self-organize work and system structure under changing conditions. From an operational point of view, we want to learn about the cognitive engineering implications of the augmented team concept. Practically, our aim is to demonstrate that it is possible to assemble humans, sensors, actuators, information systems and communication means into an agile and adaptive system that can operate effectively in safety-critical environments.
Fig. 3. Schematic layout of the experiment environment
Our experiments take place in an indoor field lab. This lab contains a heterogeneous sensor suite, advanced tracking, tracing and prediction software, and a command center with communication and information display facilities. The field lab is set in an actual office environment (the offices of TNO in The Hague) and covers an entire floor. The various sensors and software applications are organized through a service-oriented architecture and structured following NAIHS principles [2, 3]. The current sensor suite contains cameras and radio beacons, which have been placed along the corridors of the field lab. The cameras are used for observation purposes and can recognize objects on which they have been trained in advance. The radio beacon network, in conjunction with wearable tags, is used as a tracking and tracing system. Information from the cameras and radio tags is combined in a tracker application that integrates sensor information over time and is capable of predicting the trajectory of people when they are out of reach of the sensors. This information is displayed in the command center, from where a central coordinator plans actions among the various actors in the environment. For communication purposes, the players have mobile phones at their disposal. The current set of experiments uses a simple intruder detection and apprehension scenario in which an office safety team needs to determine the position of an intruder and capture him by enclosure. The team consists of two roaming guards and a coordinator in the command center. The coordinator has access to the information in the network (the positions of the guards and the predicted location of the intruder) and is in charge of directing the guards. The security team has two main tasks: (a) find the intruder using a search strategy, and (b) arrest him by means of enclosure by the guards. The security team will have no trouble apprehending the intruder, but the time it takes depends on successful communication and planning. Successful runs ended in less than two minutes, while problematic runs took more than six minutes. Figure 4 shows which actors play which role in the organization. Artificial actors do not perform any decision making and coordination tasks yet; this condition limits adaptivity in our scenarios to human role change at the moment. Our main scenario involves the transfer of the coordinator role from the player in the control room to one of the mobile guards. This role change is useful when a guard obtains visual contact with the intruder, and has a better view of the situation than the coordinator.
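As a toy illustration of the prediction step (a sketch only; the field lab's tracker application is more sophisticated), a constant-velocity extrapolation of the last fused state looks like this:

```python
from dataclasses import dataclass

@dataclass
class Track:
    x: float    # last fused position (m)
    y: float
    vx: float   # last estimated velocity (m/s)
    vy: float
    t: float    # timestamp of the last sensor update (s)

def predict(track: Track, now: float) -> tuple[float, float]:
    """Extrapolate the position assuming constant velocity since the last update."""
    dt = now - track.t
    return track.x + track.vx * dt, track.y + track.vy * dt

# A person last seen at (2, 5), walking 0.8 m/s along x, 2.5 s out of coverage:
print(predict(Track(x=2.0, y=5.0, vx=0.8, vy=0.0, t=10.0), now=12.5))  # (4.0, 5.0)
```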
Fig. 4. The basic role layout of the scenario, and the actors that fulfill the various roles. On the left, from top to bottom, the tracker application, the positioning application, the smart camera network and the radio beacon network.
The coordinator-role-change scenario involves a modification of the Social Model. The Organizational Model does not change, because function-wise the organization remains intact. The Interaction Model, the third stage in the OperA suite, does change after the role change: the interactions that the original coordinator had with information services and other actors need to be modified to reflect that there is a new coordinator. In the coordinator-role-change scenario, we use job contracts, predefined packages of roles and interactions, to speed up the negotiation phases. A job contract describes (1) designated tasks (e.g. observe, coordinate, inform); (2) task conditions (e.g. area, time-span, network constraints); (3) interaction contracts with actors that fulfill crucial data and information needs; (4) interaction contracts with actors that fulfill an identical role (e.g. to synchronize actions); and (5) interaction contracts with actors that depend on the role taker to fulfill crucial data and information needs. For the coordinator role, the job contract package establishes the collaboration with the guards (what do they need to know, and how can they be addressed?), with the information systems (what information does the coordinator need, and which role provides it?) and with other available systems (what control relations does the coordinator have, and how can they be put into effect?). The players are given instructions on the job contracts in advance, so that they can negotiate collaborations before the scenario starts. This means that, as soon as a role change occurs, every actor knows exactly which rules to abide by. At the start of the scenario, the player in the control room accepts the coordinator role and implements his job contract. He is given the proper information from the information systems (the screens switch on in the control room), and he gives instructions to his guards. The guards have no access to any information other than what the coordinator relays. When the camera system picks up detections of the intruder, the coordinator directs the guards to the right hallways in order to enclose the intruder. As soon as a guard has visual contact with the intruder, the roles change: since the guard now has the best view of the situation, he takes over the coordinator role.
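Building on the contract manager sketched in Section 2.2 (still hypothetical code), a coordinator handover can be expressed as moving the social contract to the new actor and re-pointing the interaction contracts that served the old coordinator; the Organizational Model, as in the scenario above, is untouched:

```python
def transfer_role(mgr: "ContractManager", role: str, new_actor: str) -> None:
    """Re-assign `role` to `new_actor`, updating Social and Interaction Models."""
    old_actors = set(mgr.who_fulfils(role))
    for sc in mgr.social:
        if sc.role == role:
            sc.actor = new_actor           # Social Model: new role adoption
    for ic in mgr.interactions:
        if ic.consumer in old_actors:
            ic.consumer = new_actor        # Interaction Model: re-point information flows

transfer_role(mgr, "coordinator", "guard-1")
print(mgr.who_fulfils("coordinator"))      # ['guard-1']
```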
The coordinator job contract calls for information about the positions of the actors and about intruder detections. In the command room, this information is readily available on screen. The new mobile coordinator needs this information as well, and thus needs to negotiate new interaction contracts with the original providers of the information (the tracking and tracing applications). If the coordinator has a digital information device at his disposal, he can arrange for that information to appear on his personal device. If he lacks such a device, then his only means of obtaining information about the global situation is the actor in the command room. In that situation, the coordinator could opt to set up an interaction contract with the command room actor, so that the latter functions as an information relay. This requires strict instructions from the coordinator on how to convey information, because it is hard to convey complex spatial information through a speech channel. We have tried numerous interaction variants, and strict adherence to communication agreements and proper preparation appear to be critical. Agile role re-allocation seems feasible, as long as the augmented team as a whole takes care of adjusting interaction contracts and information needs. If that succeeds, then adaptive role allocation becomes a powerful feature that paves the way for interesting options, such as the coordinator role change from our scenario. Other interesting options would be to use human actors as instant sensors and effectors, or to have system actors act as substitutes for human actors in case of emergencies.
4 Design Concerns

The dynamic deployment of roles among human and artificial actors introduces many challenges with respect to human-machine teaming. We briefly discuss some apparent design concerns that we encountered in our practical work.

1. Define who is responsible for role and task allocation
Role and task allocation could be a joint responsibility of all entities, or the sole responsibility of an allocation manager. It needs to be clear who is in charge, because this directly affects the operational chain of command. In a distributed, adaptive setting, the chain of command may become obfuscated because of changing roles and responsibilities [4], which may cause confusion and loss of coordination.

2. Ensure transparency of responsibilities and attributes
Attributes and responsibilities of elements, both artificial and human, should be transparent and observable. This is an essential condition for enabling dynamic role deployment. In a regular organization or system, it is clear from the start who or which system is responsible for which task. In a dynamic setting, the assignments of elements change, and there is a distinct danger of loss of organization awareness. We advocate the use of explicit social contracts to represent responsibilities and capabilities, and the creation of an administrative role to keep an overview of all elements and their contracts in an organization. This adheres to the notions of observability and directability, which are fundamental principles from cognitive systems engineering [4, 8].

3. Make the type of adaptivity a design choice
There are many forms of adaptivity. An important decision that needs to be made during design is whether to use prearranged behavioral patterns for adaptive behavior, or to have adaptivity emerge from an internal collaborative process. To prevent
‘clumsy automation’ issues [8] and uncontrollable adaptivity, use as many predefined adaptive measures as possible. For example, the work of Parasuraman [7] could be used to predefine collaboration types between actors, instead of relying on emergent adaptivity.

4. Prevent communication and interaction issues after role change
All kinds of communication issues may occur when heterogeneous groups of entities collaborate. These problems are commonly known as interoperability problems. Before two elements can collaborate, there must be an agreement on how to communicate, and through which means. If the elements communicate at different levels, they will most likely fail to reach an agreement. As a rule of thumb, we suggest that elements should only communicate with neighboring elements, either on the same level of abstraction or on the level directly above or below it (see the NAIHS model; a minimal sketch of this rule follows at the end of this section).

5. Prevent issues caused by multi-level or multi-role allocation of an element
It is possible for elements to take on multiple roles or tasks with different characteristics, such as different levels of abstraction or different timescales. For each role change, it needs to be checked that an element is not faced with roles that are too divergent, which would cause performance issues. For instance, a complex information analysis task might not combine well with an immediate physical task that would send the actor into the field. There need to be criteria available for assessing combinations of multiple roles, and these criteria should be applicable during contract negotiation.

There are numerous other issues that need to be addressed when dealing with adaptive systems, such as maintaining system awareness, skill degradation concerns and user acceptance. Many are well known and described in numerous papers about adaptive system design (e.g. [6]).
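To make the neighboring-level rule of the fourth concern concrete, it can be expressed as a simple predicate. This is a sketch only; the numeric encoding of abstraction levels is our own illustrative assumption, not something the NAIHS model prescribes.

def may_communicate(level_a: int, level_b: int) -> bool:
    # Rule of thumb: elements communicate only with elements on the same
    # level of abstraction, or on the level directly above or below it.
    return abs(level_a - level_b) <= 1

# A coordinator (level 2) may address a guard (level 1) or a fellow
# coordinator (level 2), but not a low-level sensor (level 0) directly.
assert may_communicate(2, 1) and may_communicate(2, 2)
assert not may_communicate(2, 0)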
5 Conclusions

This paper discussed the challenge of turning networks of sensors, computers, agents and humans into hybrid teams that are capable, effective and adaptive. We propose a functional model and illustrate how such a model can be put into practice to augment the capabilities of a human organization. We specifically focus on the interaction between the human and artificial parts of the system, with particular attention to task delegation and role adjustment. We use a functional model, the Networked Adaptive Interactive Hybrid Systems (NAIHS) model, as a blueprint for our organizations. The NAIHS model considers both man and machine as fulfilling functional roles. To explicate the interactions between these roles, we make use of OperA, an organization modeling framework from multi-agent systems research. These models make it possible to express various aspects of a multi-agent organization and, as a result, help to organize a collection of autonomous agents into a coherent system. Despite the obvious differences between human and artificial actors, we find that these models form an interesting basis for building hybrid organizations. The use of contracts facilitates the interaction and role transfer between actors, and provides a practical way to articulate teamwork requirements in augmented teams. We believe that these interaction contracts are essential to fulfill basic cognitive engineering needs such as mutual
observability, directability and resilience, and help to address complex challenges such as organizational scaling, restructuring and agility.
References

1. Dignum, V., Dignum, F., Meyer, J.-J.C.: An Agent-Mediated Approach to the Support of Knowledge Sharing in Organizations. Knowledge Engineering Review 19(2), 147–174 (2004)
2. Kester, L.J.H.M.: Model for Networked Adaptive Interactive Hybrid Systems. In: Proceedings of COGIS 2006: COGnitive systems with Interactive Sensors, Paris (2006)
3. Kester, L.J.H.M.: Designing Networked Adaptive Interactive Hybrid Systems. In: Proceedings of the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI 2008), Seoul, Republic of Korea, pp. 516–521 (2008)
4. Klein, G., Pierce, L.G.: Adaptive Teams. In: Proceedings of the 6th International Command & Control Research & Technology Symposium (ICCRTS 2001), Annapolis, MD, USA (2001)
5. Klein, G., Woods, D.D., Bradshaw, J.M., Hoffman, R.R., Feltovich, P.J.: Ten Challenges for Making Automation a “Team Player” in Joint Human-Agent Activity. IEEE Intelligent Systems 19(6), 91–95 (2004)
6. Miller, C.A., Funk, H., Goldman, R., Meisner, J., Wu, P.: Implications of Adaptive vs. Adaptable UIs on Decision Making: Why “Automated Adaptiveness” is Not Always the Right Answer. In: Proceedings of the 1st International Conference on Augmented Cognition, Las Vegas, NV, USA (2005)
7. Parasuraman, R., Sheridan, T.B., Wickens, C.D.: A Model for Types and Levels of Human Interaction with Automation. IEEE Transactions on Systems, Man, and Cybernetics, Part A 30(3), 286–297 (2000)
8. Sarter, N.B., Woods, D.D., Billings, C.E.: Automation Surprises. In: Salvendy, G. (ed.) Handbook of Human Factors and Ergonomics, 2nd edn., pp. 1926–1943. Wiley, New York (1997)
9. Steinberg, A.N., Bowman, C.L., White, F.E.: Revisions to the JDL Data Fusion Model. In: Proceedings of the SPIE: Sensor Fusion: Architectures, Algorithms, and Applications III, vol. 3719, pp. 430–441 (1999)
10. van der Vecht, B., Dignum, F., Meyer, J.-J.C., Neef, M.: A Dynamic Coordination Mechanism Using Adjustable Autonomy. In: Sichman, J.S., Padget, J., Ossowski, S., Noriega, P. (eds.) COIN 2007. LNCS, vol. 4870, pp. 83–96. Springer, Heidelberg (2008)
11. Woods, D.D.: Human-centered software agents: Lessons from clumsy automation. In: Flanagan, J., Huang, T., Jones, P., Kasif, S. (eds.) Human Centered Systems: Information, Interactivity, and Intelligence, pp. 288–293. National Science Foundation, Washington, DC (1997)
Quantitative Evaluation of Mental Workload by Using Model of Involuntary Eye Movement

Goro Obinata 1, Satoru Tokuda 2, Katsuyuki Fukuda 1, and Hiroto Hamada 3

1 Nagoya University, EcoTopia Science Institute, Furo-cho, Chikusa-ku, 464-8603 Nagoya, Japan
2 Wichita State University, Wichita, Kansas, USA
3 Toyota Motor Co. Ltd., 1 Toyota-cho, Toyota 471-8572, Japan
[email protected]
Abstract. This study considers a new method to quantify mental workload (MWL) by using mathematical models of the reflex movement of the eye. Several mathematical models of reflex movements have been proposed and experimentally verified by physiologists. Among these, some models of the vestibulo-ocular reflex (VOR) have sufficient accuracy to predict the eye movements of individuals. The engagement of brain function in the VOR is known from learning and adaptation processes. This leads to the assumption that mental workload appears as a change in the characteristics of the VOR. To confirm this assumption, we designed an experimental setup and carried out several experiments. In the experiments, subjects' VOR responses were accurately predicted by the mathematical model, a dynamical model with head movements as input and eye movements as output. The model dynamics changed while the subject was engaged in a higher cognitive activity. The coherence between the VOR predicted by the identified model of a particular subject and the observed VOR was as high as 0.92 when there was no additional mental demand. Moreover, varying the MWL over five different n-back tasks revealed a clear correlation between the predicted-VOR coherences and the MWL demands. This shows that MWL can be objectively quantified by measuring the error between the observed VOR responses and those predicted by the identified model.
1 Introduction

Mental workload (MWL) is an essential concept in most human-machine systems. MWL can be defined as the amount of cognitive resources currently in use by a person at a given point in time. Since cognitive resources in humans are limited, human performance is easily degraded by heavy MWL. Many methods have been proposed to quantify MWL. These techniques can be categorized into three groups [1]: (a) primary/secondary task measures, (b) subjective-rating techniques, including NASA-TLX, and (c) physiological measures. Unfortunately, most existing MWL measures share some major disadvantages, especially when measuring a vehicle driver's MWL. First, the primary/secondary task
measures cannot quantify MWL in real time, because they require a person to perform two different tasks at two different times and compare the performances. Second, subjective-rating techniques require the person to rate the difficulty level of a task, usually after a section of a task. This technique does not measure the person's MWL in real time, does not objectively quantify MWL, and interrupts the main task to record ratings. Third, although physiological measures are good at objectively quantifying mental workload in real time with no or little interruption of main tasks, most of these physiological measures have other disadvantages. For example, heart rate and respiratory rate measures are not accurate in quantifying MWL. Brain imaging techniques such as fMRI and PET are physically obtrusive to a person in a practical situation. All previously existing MWL measures have some shortcomings when used to quantify a human's MWL in practical situations. Recently, many research groups have started focusing on other kinds of physiological measures, such as involuntary reflexes, in the hope of finding better MWL assessment techniques. For example, the vestibulo-ocular reflex (VOR) has grabbed the attention of researchers and has been examined for its effectiveness in quantifying MWL [2], [3]. This VOR method of quantifying MWL has six major advantages over other existing MWL measures: the method (1) is objective, (2) does not interrupt the main tasks, (3) is measurable in real time, (4) quickly reflects MWL, (5) is not physically obtrusive, and (6) does not require large equipment. Building on previous studies, our present study used n-back tasks to determine whether VOR responses could be reasonable measures for quantifying a person's MWL.
2 Model of the Vestibulo-Ocular Reflex (VOR)

The vestibulo-ocular reflex (VOR) is an involuntary human eye movement that keeps the gaze fixed on an object by offsetting head movements. This study integrated two parts of VOR models to estimate the VOR responses. The model, shown in Fig. 1, was proposed by Merfeld and Zupan [4]. The mechanism of the VOR model consists of three stages, as shown in Fig. 1: (1) physical world and sensors, (2) signal processing of neural networks, and (3) generation of the command signal as an angular velocity of eye movements. The VOR mechanism senses head movement: linear accelerations in three dimensions and angular velocities in three dimensions. Since this signal processing is not the same across people, identification of the model is required for an accurate simulation of each person. Eye movements occur on three axes; however, this study utilized only the two main rotations, horizontal and vertical. Our study examined the discrepancies between the eye movements predicted by the model and the observed ones on these two axes. The transformation from the VOR output of the model to the eye angle is shown in Fig. 2. The signal is modified through the velocity storage, the final common path and the extraocular muscle [5]. Consequently, we integrate the model of the final common path and extraocular muscle with the VOR model described in Fig. 1, and use it as the eye movement model of a subject.
Fig. 1. Model of vestibulo-ocular reflex (VOR)
Fig. 2. Model of final common path and muscle dynamics
3 Experimental Setup and Model Identification

The experimental setup, the sensors attached to the subjects, and the definition of the coordinate system are shown in Fig. 3. The rotating chair (Joy Chair), which applies rolling and pitching motions to the subject's body, was set up at a distance of 2 m from the screen. The subjects' head
Fig. 3. Experimental setup and sensors setting (view angles ±28° and ±20°; position & angle sensor; eye recorder)
position and rotation angle, the horizontal and vertical rotation angles of the eyeball, and the rotation angle of the chair were measured at a sampling frequency of 50 Hz. In the experiment for identifying the personal dynamics of the model, the subjects were asked to stabilize their eyes on the center of the screen, so that VOR eye movements were induced by shaking the subject with the moving chair. A person's VOR responses are almost perfectly predicted when the person is not engaging in a cognitively demanding task; the VOR model predicted the person's responses with a coherence between the predicted and observed VOR as high as 0.92. We applied a genetic algorithm together with a local gradient search to estimate the parameters of the VOR model that minimize the error expressed by

$J = \sum_{i=1}^{N} \{\theta_{\mathrm{obs}}(i) - \theta_{\mathrm{mdl}}(i)\}^{2}$  (1)
where $\theta_{\mathrm{obs}}$ is the three-dimensional time series sampled from the observed eye-movement data and $\theta_{\mathrm{mdl}}$ is the time series estimated by the model. Model identification is the process of identifying an individual's eye-movement dynamics, which are represented by the 12 gain parameters shown with the seven triangle shapes in Fig. 1 and Fig. 2. These parameters differ between persons and possibly over time. Before the identification, the measured eye movements were treated with an outlier-removal process that removes rapid eye-velocity data, such as blinks and saccades. The identified model represents the VOR dynamics of a particular subject who was asked to keep his gaze on the center of the screen and was not engaged in any other demanding task. One result of the identification is shown in Fig. 4, which compares the time responses of the measured eye movements with the estimates from the model for two cases. Panel (I) shows the case in which we used the gain parameter values suggested by Merfeld and Zupan [4] and Robinson [5] as initial values; panel (II) shows that a better fit of the predicted angular velocities to the measured ones is obtained after the identification. We can use the frequency responses of the measured and predicted eye movements as an index of agreement between these signals; specifically, we take the coherence between the two signals as the index. The frequency responses shown in Fig. 5 are the power spectra and the coherence between the measured eye movements and those predicted by the identified models; panels (I) and (II) correspond to the cases before and after the identification.
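A minimal sketch of this identification step is given below, with SciPy's differential evolution standing in for the genetic algorithm and a BFGS local search for the gradient method; the two-gain toy model is only an illustrative placeholder for the actual 12-gain VOR model.

import numpy as np
from scipy.optimize import differential_evolution, minimize

rng = np.random.default_rng(0)
t = np.arange(0, 10, 0.02)              # 50 Hz sampling, as in the experiment
head = np.sin(2 * np.pi * 2.0 * t)      # toy head-movement input

def theta_mdl(gains, head):
    # Placeholder model; the real model has 12 gain parameters.
    g1, g2 = gains
    return g1 * head + g2 * np.gradient(head)

theta_obs = theta_mdl([0.9, 0.3], head) + 0.05 * rng.standard_normal(t.size)

def J(gains):
    # Eq. (1): sum of squared errors between observed and modeled output.
    return np.sum((theta_obs - theta_mdl(gains, head)) ** 2)

# Global evolutionary search over the gain bounds ...
best = differential_evolution(J, bounds=[(-2, 2), (-2, 2)], seed=0)
# ... refined by a gradient-based local search, as described above.
refined = minimize(J, best.x, method="BFGS")
print(refined.x)                        # close to the true gains [0.9, 0.3]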
Fig. 4. Example of identification results (time responses): (I) before identification; (II) after identification
Fig. 5. Example of identification results (frequency responses): (I) before identification; (II) after identification
These results mean that the identified model is related linearly to the measured eye movements in the frequency range from 2 Hz to 4.5 Hz. We conducted experiments with three subjects to identify the models. The results show that the average prediction errors for the three subjects ranged from 3 to 6 degrees during eye movements with a range of more than 20 degrees. We observed that the identified parameters varied within a certain range across the three subjects. These identification results show that the identified models can predict the subjects' eye movements well.
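The coherence index used here can be computed with standard spectral tools. A minimal sketch follows; the segment length is an arbitrary choice, the signals are synthetic placeholders, and only the 2-4.5 Hz band mentioned above is averaged.

import numpy as np
from scipy.signal import coherence

fs = 50.0                                    # sampling frequency [Hz]
t = np.arange(0, 30, 1 / fs)
rng = np.random.default_rng(1)
measured = np.sin(2 * np.pi * 3.0 * t) + 0.2 * rng.standard_normal(t.size)
predicted = np.sin(2 * np.pi * 3.0 * t)      # stand-in for the model output

f, Cxy = coherence(measured, predicted, fs=fs, nperseg=256)
band = (f >= 2.0) & (f <= 4.5)               # range where the model is linear
print(f"mean coherence, 2-4.5 Hz: {Cxy[band].mean():.2f}")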
4 Experimental Procedure

Three male students, aged between 20 and 24, participated in this study. Since the main purpose of this study was to examine whether the deviation of the observed vestibulo-ocular reflex (VOR) responses from the identified model was correlated with the required mental workload (MWL), each participant performed under four different MWL conditions (one control and three experimental conditions) while the computer-controlled shaking Joy Chair caused the participant's head to shake at frequencies of 1 to 6 Hz. The participants were asked to gaze at a fixed point on a projector screen for 30 seconds on each trial; the VOR was thus continuously evoked to stabilize the point of gaze against disturbances of head position during the task. Each of the four conditions was repeated three times for each participant. On each trial, eye movements and the reaction time to every verbal presentation were recorded. The four conditions were as follows: a control condition, the Simple Reaction Task (SRT), in which the participant was asked simply to hit the button whenever he heard an alphabet letter, presented every 2.5 seconds; and the 1-back, 2-back, and 3-back tasks. The n-back tasks impose different amounts of MWL on a person, so that the experimenters can manipulate the participant's MWL. In our n-back tasks, one letter was verbally presented to the participant every 2.5 seconds for 30 seconds per trial. The participants were asked to hit the "yes" button
for each verbally presented letter when it matched the letter presented n items back in the sequence, and the "no" button in all other cases. The participants were notified of n beforehand; in the present study, n was either 1, 2, or 3. The n-back tasks require the subjects to hold and update information and to make decisions based on it; working memory acts dynamically to support such functioning during n-back tasks. Several studies imaging brain function with fMRI have observed that the frontal association area, the temporal association area, and Broca's area activate during n-back tasks [6], [7]. MWL loads differ between persons; however, for any person, a higher-number n-back task universally requires more MWL than a lower-number one.
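The response rule of the n-back task can be stated compactly. The sketch below determines the correct answer for each presented letter and scores proportion correct; the letter sequence is an arbitrary example.

def nback_answers(letters, n):
    # Correct answer is "yes" if the letter matches the one presented
    # n items back; otherwise "no" (the first n items are always "no").
    return ["yes" if i >= n and letters[i] == letters[i - n] else "no"
            for i in range(len(letters))]

def proportion_correct(responses, answers):
    hits = sum(r == a for r, a in zip(responses, answers))
    return 100.0 * hits / len(answers)

# 2-back example: the third and sixth letters match two items back.
letters = list("ABACBC")
print(nback_answers(letters, 2))   # ['no', 'no', 'yes', 'no', 'no', 'yes']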
5 Results
Overall, the results show that the more demanding MWL tasks induced more discrepancies between the VOR responses predicted by the identified model and the observed VOR responses. Fig. 6 shows the average results of the three participants for Proportion Correct (PC, the rate of correct answers) and reaction time in the n-back tasks. The participants took longer to answer in the more demanding n-back tasks, and seemed to try to answer correctly by taking their time in all three n-back conditions, rather than simply responding without thinking. This implies that our tasks appropriately imposed different MWL levels with the different n-back tasks.
Fig. 6. Proportion correct in percentage and reaction time in the n-back tasks
Fig. 7 shows the power spectra (red and blue lines) and the coherence (green lines) between the measured VOR and the predicted VOR in two conditions: the control condition (SRT) and the 3-back condition. Note that the frequency responses above 4 Hz in the horizontal eye movements seemed to contain too much noise and to be at error levels, not reflecting VOR responses much. The comparisons between the measured VOR and the simulated VOR across the conditions (Figs. 7 and 8) indicate several interesting things. First, there do not seem to be clear distinctions between the power spectrum curves of the vertical eye movements that would distinguish the two conditions in Fig. 7, and the horizontal axis recorded similar power spectra. Second, and conversely, the coherence curves (green lines) had relatively distinct characteristics between the two conditions. The coherence curves of the control condition were higher on both
Fig. 7. Comparison of coherences and power spectra in the tasks: (I) control; (IV) 3-back
Fig. 8. Evaluating influence of working memory on coherence average (all subjects); *: p < 0.05, **: p < 0.01
horizontal and vertical axes than in the 3-back task; the task with demanding mental workload, the 3-back task, had the lower coherence lines. Fig. 8 shows the average coherence in the n-back tasks over all three participants. This figure adds the VOR results, a condition in which the participants were asked, just as in the identification experiment, to stare at a fixed point on the projector screen without performing any additional task. As expected, the identified model achieved the highest coherence under the identification condition, as high as 0.92 and closest to the perfect coherence of 1.00 among the five conditions. In regard to the n-back tasks, as n increased, the supposed mental demand increased and the deviation of the coherence increased: the coherences were 0.87 for the 1-back task, 0.83 for the 2-back task, and 0.80 for the 3-back task. Larger discrepancies from the baseline value of 0.92 were observed as the supposed MWL increased. The decrease of coherence with demanding mental tasks implies that the person's cognitive activities may have interfered with the VOR mechanism in some way. Our results are consistent with previous studies by other researchers [2], [8], [9]. These results suggest that VOR measures enable quantification of MWL.
6 Extension to Optokinetic Situation

The model-based method for quantitative evaluation of MWL can be extended to cases in which people use eye movements to pursue an object. The model of eye movements for such optokinetic cases can be described with the block diagram shown in Fig. 9. The VOR and final-common-path parts are the same as the sub-models described in Section 2. For the OKR part, we can use a mathematical model proposed by Schweigart et al. [10]. The model contains two components, which represent the dynamics of compensating functions for the retinal velocity error: one is a direct pathway characterized by a rapid rise in eye velocity, and the other is an indirect pathway that produces the slow-onset response. The error between the gaze direction and the position of the forward visual scene is the input to the OKR model, and the model outputs the angular velocity of the eyeball. The outputs are added to the outputs of the VOR block; the summation then becomes the input of the final common path. The procedure for identifying the model of a particular subject is similar to the VOR case; the only difference is that there are more parameters to be identified in the OKR block. Thus, we can obtain a model that represents the dynamics of a person's eye movements in extensive situations. If the model is identified for the case in which the subject works without any heavy MWL, the predicted output from the model can be compared with the subject's real eye movements. The same experimental setup shown in Fig. 3 can be used for optokinetic situations. We added a steering wheel to the chair to make it work as a driving simulator. An example of the experimental display for the optokinetic situation is shown in Fig. 10.
Fig. 9. Block diagram of reflex eye-movement model (general case)
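As a rough illustration of the two OKR components, the sketch below combines a direct pathway (an immediate gain on the retinal velocity error) with an indirect, slowly charging leaky-integrator pathway. This is a simplified discrete-time caricature, not the Schweigart et al. model itself, and all gains and time constants are illustrative assumptions.

import numpy as np

dt = 0.02            # 50 Hz sampling [s]
k_direct = 0.6       # assumed direct-pathway gain (rapid rise)
k_indirect = 0.4     # assumed indirect-pathway gain (slow onset)
tau = 2.0            # assumed indirect-pathway time constant [s]

def okr_command(retinal_slip):
    # Eye-velocity command: direct pathway plus a leaky integrator
    # that builds up slowly (the slow-onset response).
    x = 0.0
    out = np.empty_like(retinal_slip)
    for i, slip in enumerate(retinal_slip):
        x += dt / tau * (k_indirect * slip - x)
        out[i] = k_direct * slip + x
    return out

resp = okr_command(np.ones(500))      # constant retinal slip for 10 s
print(resp[0], resp[-1])              # jumps to ~0.6, then creeps toward 1.0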
Fig. 10. Example of experimental display (the angles in the figure are view angles from the subject)
Fig. 11. Comparison of measured angles with predicted angles by the identified model: (I) primary task (single tracking task); (II) dual task (tracking task and alphabet recognition)
The variation between the target signal (white line in the center) and the controlled signal (red marker) is displayed on the screen. The subjects were required to manipulate the steering wheel to bring the center line of the displayed road close to the center of the screen. The target signal was colored noise generated from Gaussian white noise, with a frequency band between 0.03 Hz and 0.5 Hz. In addition, a circle displayed at the center of the screen randomly changed to red, blue, or yellow; the subjects were required to detect the change of color and to push the button placed on the left side of the steering wheel as quickly as possible whenever the circle turned red. To examine the influence of MWL, we asked the subjects to perform a dual task: the primary tracking task with alphabet recognition as the secondary task. The secondary task was to detect a particular letter among the displayed letters; the displayed letters included the one particular letter that the subjects were asked to detect. Letters were randomly displayed on the screen for one second once every 3 to 6 seconds, as shown in Fig. 10. The eye movements predicted by the identified model for the primary task are compared with the measured eye movements in Fig. 11 (I), which shows good agreement between the measured and predicted eye movements. On the other hand, in the case of the dual task, the eye movements predicted by the model identified for the single task (the primary tracking task) show considerable errors relative to the measured eye movements, as shown in Fig. 11 (II). This result suggests that we can quantify the effect of MWL by taking the norm of the absolute error in this kind of optokinetic situation. Other research in our laboratory examines the real-time aspect of this model, to answer how accurately a person's MWL can be quantified using model-based eye-movement measures in real time [11]. Possible applications of this eye-movement method are many, since it can objectively quantify a person's MWL in real time without using large devices.
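The colored-noise target signal can be generated by band-pass filtering Gaussian white noise. A minimal sketch with the 0.03-0.5 Hz band from the text follows; the filter order and trial duration are arbitrary choices.

import numpy as np
from scipy.signal import butter, filtfilt

fs = 50.0                                  # sampling frequency [Hz]
duration = 60.0                            # trial length [s]
rng = np.random.default_rng(2)

white = rng.standard_normal(int(fs * duration))
# Second-order Butterworth band-pass, 0.03-0.5 Hz, applied zero-phase.
b, a = butter(2, [0.03, 0.5], btype="bandpass", fs=fs)
target = filtfilt(b, a, white)
print(target[:5])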
7 Concluding Remarks

This study has shown that VOR measures can be used as a method to objectively quantify a person's MWL. The results showed that the five different levels of mental demand were negatively correlated with the coherence between the simulated and the observed VOR responses. When the participants were not engaged in a cognitive task, the VOR responses were predicted with a coherence as high as 0.92; however, the coherence was as low as 0.80 when the participants were doing the 3-back task, the most mentally demanding task in our experiments. The observed discrepancies of the VOR from the simulated VOR responses were probably the result of interference with the human VOR mechanism. This indicates that when unusual VOR responses are observed, the person is most likely making heavy use of his cognitive systems. Additionally, an extension to optokinetic cases has been presented, which shows the potential of this model-based method for quantifying MWL in practical situations.
References

1. Stanton, N.A., Salmon, P.M., Walker, G.H., Baber, C., Jenkins, D.P.: Human Factors Methods – A Practical Guide for Engineering and Design. Ashgate, Burlington (2005)
2. Furman, J.M., Müller, M.L.T.M., Redfern, M.S.: Visual-vestibular stimulation interferes with information processing in young and older humans. Experimental Brain Research 152, 383–392 (2003)
3. Shibata, N., Obinata, G., Kodera, H., Hamada, H.: Evaluating the influence of distractions to drivers based on eye movement model. In: FISITA World Automotive Congress 2006, F2006D164 (2006)
4. Merfeld, D.M., Zupan, L.H.: Neural processing of gravitoinertial cues in humans. III. Modeling tilt and translation responses. Journal of Neurophysiology 87, 819–833 (2002)
5. Robinson, D.A.: The use of control system analysis in the neurophysiology of eye movements. Annual Review of Neuroscience 4, 463–503 (1981)
6. Braver, T.S., et al.: A parametric study of prefrontal cortex involvement in human working memory. Neuroimage 5, 49–62 (1997)
7. Cohen, J.D., et al.: Activation of prefrontal cortex in a non-spatial working memory task with functional MRI. Human Brain Mapping 1, 293–304 (1994)
8. Yardley, L., Gardner, M., Lavie, N., Gresty, M.: Attentional demands of perception of passive self-motion in darkness. Neuropsychologia 37, 1293–1301 (1999)
9. Talkowski, M.E., Redfern, M.S., Jennings, J.R., Furman, J.M.: Cognitive requirements for vestibular and ocular motor processing in healthy adults and patients with unilateral vestibular lesions. Journal of Cognitive Neuroscience 17(9), 1432–1441 (2005)
10. Schweigart, G., et al.: Eye movements during combined pursuit, optokinetic and vestibular stimulation in macaque monkey. Experimental Brain Research 127, 54–66 (1999)
11. Obinata, G., Usui, T., Shibata, N.: On-line method for evaluating driver distraction of memory-decision workload based on dynamics of vestibulo-ocular reflex. Review of Automotive Engineering 29, 627–632 (2008)
Spatial Tasks on a Large, High-Resolution Tiled Display: Females Mentally Rotate Large Objects Faster Than Men

Bernt Ivar Olsen 1, Bruno Laeng 2,3, Kari-Ann Kristiansen 4, and Gunnar Hartvigsen 1

1 Department of Computer Science, University of Tromsø, Norway
2 Department of Psychology, University of Oslo, Norway
3 Department of Biological & Medical Psychology, University of Bergen, Norway
4 Department of Psychology, University of Tromsø, Norway
{Bernt.Ivar.Olsen,Gunnar.Hartvigsen}@uit.no, [email protected], [email protected]
Abstract. In order to assess the qualitative properties of large displays compared with smaller displays, we conducted an experiment using a mental rotation task on a large, 230-inch tiled display and compared performance with that on a 14.1-inch laptop computer. We also investigated the effect of the participants' expectations about the novel technology. We found that females rotated objects faster than men on the large display with its wider field of view. Furthermore, we found that females were influenced by the expectation that the large display should give better performance: such a positive expectation yielded faster performance only among females, with no apparent sacrifice of accuracy.

Keywords: Tiled display, Spatial Tasks, Mental Rotation, Gender differences.
1 Introduction

Display sizes for computers and TVs have grown in the last couple of decades, spawning an interest in the effect of display size on cognitive tasks, especially spatial tasks. The introduction of larger displays is based, among other things, on the assumption that a larger screen improves the problem-solving process. Based on the assumption of equality between the genders, little attention has been paid to possible differences between the sexes. We investigated the effect of screen size on the perceptual task of mental rotation [1] by comparing performance on a large 230-inch display with that on a regular 14.1-inch laptop display. The "mental rotation" task has repeatedly been shown to yield robust differences between men's and women's performance (e.g., [2]), favoring the former over the latter. The present work was motivated by an endeavor to introduce a very large, high-resolution display called a Display Wall [3] into the medical domain of radiology [4]. Our choice of the mental rotation task for studying the effects of the large display on cognitive tasks was based on the analogy between this laboratory task and expert visual inspection of images in a radiology department. That is, images of slices of the body are studied and compared in sequences (e.g., inspecting MRI slices of the human brain, for
instance). Intuitively, visual comparison between organs or their parts (e.g., healthy models and pathological "objects") is not unlike what is done in the mental rotation task, where one object is mentally rotated to "fit" or "align" with the other. The somewhat scant research that exists on the cognitive effects of display size suggests that females can gain certain advantages from wider screens that allow a significant increase in the field of view (FOV) [5], [6], FOV meaning the amount of our visual field occupied by the display.
2 Method and Experimental Setup

Participants. 22 men and 18 women participated in the study (age range: 18-43 years; mean age = 29.9, SD = 6.2). All participated voluntarily and were offered two lottery tickets for their participation. Written informed consent was obtained from each participant. Two participants were not Norwegian and were therefore given instructions in English; another foreigner comprehended Norwegian well. All participants had normal or corrected-to-normal eyesight.

Stimuli and Apparatus. The present 'display wall' consists of 28 projectors back-projecting an image onto a screen surface (Fig. 1 shows an example of the display wall in use). The large screen has 7x1024 horizontal pixels and 4x768 vertical pixels, in seven by four tiles, totaling approximately 22 million pixels (a quick check of this figure is given below). The physical visible screen size of the large screen is 230 inches. Within the color spectrum there are 22 million pixels each of red, green and blue. The small-screen setup featured a 14.1-inch screen on a laptop computer, a Dell D600 with a native resolution of 1400 x 1050 pixels and 24-bit color. The SuperLab software was running on the laptop computer, while the image was transferred to the display wall using a 100 Mbit Ethernet interface and a Java-implemented display server running on the Virtual Network Computing (VNC) server on a Dell PowerEdge 2800 with two Xeon 3.8 GHz/2 MB 800 FSB processors, 8 GB Dual Rank DDR2 memory (4x2 GB), and two 146 GB SCSI Ultra320 (15,000 rpm) 1-inch 80-pin hard drives, running the Red Hat Linux operating system.
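The quoted pixel count follows directly from the tile layout; a quick arithmetic check (tile resolution taken from the text):

width = 7 * 1024                 # seven tiles, 1024 horizontal pixels each
height = 4 * 768                 # four tiles, 768 vertical pixels each
print(width, height, width * height)   # 7168 3072 22020096, i.e. ~22 million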
Fig. 1. MR Figures on Display Wall
The computer cluster feeding the projectors comprises 28+1 Dell 370 PCs with P4 Prescott 3.2 GHz, 2 GB RAM and 1 Gbit Ethernet, and a 48-port HP switch. The SuperLab interface (stimuli) was transferred to the display and enlarged to fit the larger display area of the wall. As a consequence, the number of pixels was held constant between the displays, along with the aspect ratio (4:3). As for the screen width and the consequent retinal size of the projected images (visual angle of the screen), the projected screen (the display area covered by SuperLab) was measured with a laser meter at 404 cm, and at 28.5 cm for the small screen. Note that projectors working together to produce a single coherent and continuous image (one "desktop", if you will) have the unique feature that, if aligned correctly, they can produce one image without the bezels that ordinary displays (LCDs) have when aligned in a matrix. There will, however, be small color variations between the different projectors, but the resulting "image" is remarkably coherent. Stacking either projectors or LCD displays together like this, one can produce a display of almost unlimited size with a number of pixels proportional to the number of display devices in the configuration. Measuring an exact viewing distance was difficult, since the participants were instructed to maintain a "comfortable viewing distance" from their chair and table. Nevertheless, the table remained at the same point at all times and was placed 370 cm from the large screen. As a result, the viewing distances from the large and small screens were 370 cm and 65 cm, respectively. The viewing angles for the large and small screens in our experiment were as shown in Table 1. Total visual angle means the visual angle provided by the display in question, while angle between objects refers to the approximate angle from the person to the midpoints of the two objects. The experiment took place in a room with a Display Wall at the Department of Computer Science, University of Tromsø, Norway. The temperature was set at 20° C and the lighting to dark. All participants were tested in the same room with the same equipment within a period of two months. Each participant was pseudorandomly assigned to a group that began testing with either the small-screen or the large-screen condition. In addition, within each of these groups, an equal number of participants was assigned to a group that received prior information about the experiment's hypothesis, which could be either positive or negative. The "positive hypothesis" was a short verbal description, accompanied by a graph, informing the subject that previous research has shown that a larger-than-normal screen causes people to perform better (at mental rotation tasks). In contrast, the "negative hypothesis" stated the opposite, i.e., worse performance with the large screen than with the regular screen. Pseudorandom assignment consisted in alternating the conditions to balance the set with respect to gender, hypothesis, and first trial-run condition (large or small screen).

Table 1. Visual Angles of Between-Objects Distance
                                 Large display   Small display
Total visual angle of display        57.0°           24.7°
Angle between objects                27.3°           11.8°
This was done to counterbalance the practice effect in the within-subjects design [2]. Experimental sets were also counterbalanced for gender, so that one fourth of the females were given the small screen with the negative hypothesis, one fourth the large screen with the negative hypothesis, and so forth. The same setup was used for males. The participants were given 4 training trials before the start of the experiment to ensure that they understood the task. The first two training trials included feedback on whether the objects were similar or not. The task itself was self-paced, and each object remained on the screen until the participant made a decision by pressing one of the two keys "." or "Z" to indicate that the shapes were either the same or different. The computer recorded the result of each key press using SuperLab© 2.0 software. There were a total of 266 trials for each subject in each of the two screen conditions. Both small-screen and large-screen conditions took place in the same room; each participant sat at the same table, in the same position, in order to keep the environmental variables constant. When both conditions were completed, the participants were given a questionnaire that collected some biographical information: sex, age, years and type of education, line of work, and whether they played computer games regularly. Participants were also asked whether they already had a preference for either large or small screens. All participants were left alone in the room, with the door closed. We used a mixed design in which Sex (female/male) and Expectation (for/against the hypothesis) were between-subjects factors and Screen (large/small) and Angle (0°, 30°, 60°, 90°, 120°, 150°, 180°) were within-subject factors. An additional between-subjects factor was Order (large first/small first). Data were analyzed using Statview® (5.0) and SPSS® (v.15).
3 Results

We first calculated descriptive statistics for each participant, obtaining mean response times (RTs) for correct responses and mean % accuracy for each combination of the variables (Screen Size, Match, Angle). Preliminary analyses showed no main effects or interactions of Order in either accuracy or RTs. Hence, we ignore Order as a factor in the analyses presented below.

Response Times. We selected the mean RTs for correct responses in the 'same shape' Match condition, since angular difference in the 'different shape' condition is inherently arbitrary (i.e., there is, by definition, no zero point of perfect alignment). We first performed a repeated-measures ANOVA with Screen (large/small) and Angle (0°, 30°, 60°, 90°, 120°, 150°, 180°) as within-subject factors and Sex (female/male) and Expectation (for/against the hypothesis) as between-subjects factors. This analysis was aimed at directly evaluating our predictions that females would perform relatively better with the large than with the small screen and/or relatively better than the men. We found two significantly reliable effects: a main effect of Screen, F(1,36) = 10.41, p = .00, and an interactive effect of Sex with Screen and Expectation, F(1,36) = 4.15, p = .04. The classic effect of a positive linear trend in RTs with increasing angular disparity was also replicated, F(1,6) = 78.8, p < .00.
Fig. 2. Interactive Effect of Sex with Screen and Expectation
Fig. 2 illustrates the interactive effect of Sex with Screen and Expectation. Women who had been given prior expectations in favor of the large screen showed no increase in RTs in the large-screen condition, whereas all the other groups, including the men regardless of prior expectations, showed an increase in RTs in the large-screen condition. The fact that women who had the prior expectation that the novel situation (display wall) would lead to better performance did indeed respond faster in that condition raises the possibility that they were simply trading accuracy for speed. In order to assess this, we computed the overall mean RTs and the overall mean accuracy for each participant. We then performed a simple regression with accuracy as the regressor and RTs as the dependent variable. We found no evidence for the presence of a speed-accuracy trade-off in the whole group (N = 40), since R² = .04; F(1,38) = 1.4, p = 0.24. Moreover, the slope's coefficient was negative (-45.7), suggesting that RTs tended to be shorter with increasingly accurate performance. When we repeated the same regression analysis on the group of women expecting the display wall to improve performance, there was no evidence for a speed-accuracy trade-off in this specific group either, since the slope's coefficient was larger and still negative (-189.7) and the correlation was now significant: R² = .42; F(1,7) = 5.1, p = .05. As also shown in Figure 2, there was a general slowing of performance with the large screen. However, we had specifically predicted an overall advantage with the large screen for women compared to men. To evaluate the presence of such an effect we computed the 95% confidence interval (C.I. = 523 ms) using the formula of Loftus and Masson [7] for within-subject designs. Women were indeed 603 ms faster on average in the large-screen condition than men (Women: mean RT = 5975, SD = 2736; Men: mean RT = 6578, SD = 3828).
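The speed-accuracy check described above is a simple linear regression of mean RT on mean accuracy. A minimal sketch with synthetic placeholder data (the actual per-participant means are not reproduced in the paper) is:

import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(3)
accuracy = rng.uniform(60, 100, size=40)        # mean % accuracy, N = 40
rts = 7000 - 45 * accuracy + 800 * rng.standard_normal(40)   # mean RT [ms]

res = linregress(accuracy, rts)
# A negative slope (as reported) means faster responses went with more
# accurate performance, arguing against a speed-accuracy trade-off.
print(f"slope = {res.slope:.1f} ms per % accuracy, p = {res.pvalue:.3f}")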
Fig. 3. Mean RTs for both genders in the two display-size conditions
As Figure 3 also illustrates, the sex difference in mean RTs exceeded the 95% confidence interval. There was little difference between men and women with the small screen, while with the large screen there was a clear trend toward slower RTs for males. Finally, one should also note that the interaction of Sex with Screen and Angle approached significance, F(6,216) = 2.04, p = .06.

Accuracy. In the analysis of mean % accuracy we included all correct responses for both 'same shape' and 'different shape' trials; however, we ignored angular difference. We then performed a repeated-measures ANOVA with Screen (large/small) and Match (different/same) as within-subject factors and Sex (female/male) and Expectation (for/against the hypothesis) as between-subjects factors. The analysis revealed no main effect of Sex on accuracy (Females: mean % accuracy = 80.3, SD = 11.7; Males: mean % accuracy = 86.0, SD = 11.2) and no interactive effects of this factor with the others. There was only one significant (main) effect, of Match, F(1,36) = 8.4, p = .006; that is, 'same' trials were performed more accurately (mean % accuracy = 86.1; SD = 12.8) than 'different' trials (mean % accuracy = 80.8; SD = 13.4). The preference data collected after the experiment were also in line with the findings above: 61% of the women reported that they preferred the large display to the small one, while 65% of the men preferred the small display for the mental rotation task. These findings also led us to suspect that the hypothesis presented to the
participants influenced their preferences, but a chi-square analysis revealed that neither males' nor females' preferences were in fact influenced, with χ² = 0.012 for females and χ² = 0.220 for males when testing for dependency between hypothesis and screen preference.
4 Discussion

We hypothesized, based on the relatively scant research that exists on spatial tasks and display size, that women would perform better on a large display relative to men. Indeed, we found support for our hypothesis. However, we also found that specific predictions about the effect of large displays may be strongly influenced by prior "positive expectations" about new techniques and methods (as suggested by the "Hawthorne effect"). Moreover, the effect of gender appears to interact with such expectation effects, as shown in our results. That is, women who expected the display wall to be more effective than a standard computer screen were indeed able to outperform men in speed of mental rotation in the display wall condition, while keeping the same level of accuracy shown with the standard screen. Such a finding is remarkable per se, given that the mental rotation task has typically revealed a robust sex difference favoring men [8], [9]. A possible explanation of our finding could be that women benefit from displays that occupy a larger part of the visual field, for instance because they have a higher sensitivity to distracting objects that remain visible around a smaller display. Another possibility is that women might be more efficient than men at spreading or dividing their attention over a larger portion of the visual field. However, Feng and Spence recently investigated this issue in [10], incidentally testing only females, and found no apparent difference in performance between central vision and the periphery. Previous work by Feng, Spence and Pratt (including both genders) also reports related null findings [11]. In their studies they used counting of individual objects (an enumeration task) and useful-field-of-view (UFOV) tasks, respectively, while mental rotation, and our specific use of the original Shepard and Metzler task comparing two objects, is arguably different. Lastly, we note that Feng, Spence and Pratt's work with videogame training (ibid., p. 853) found that the female control group did show signs of improvement in both the UFOV and the MR tasks, further indicating the potential importance of expectation. One could thus hypothesize that positive expectations enhance the ability of women to expand their functional field of view, which in turn has beneficial effects on spatial tasks like the mental rotation task. Men may be less susceptible to generalizing their expectations from one task (or condition) to another. It is also interesting to note that negative information had little impact on performance for either sex, while positive information had an effect for females but not for males. In this context it is worth noting that previous research has shown that presenting, or stressing, gender differences to participants does have an effect on the mental rotation task [12]. However, in that work, Moè and Pazzaglia found that both men and women were susceptible to information about the superior performance of their own or the opposite gender, and that men in particular were more sensitive to this information. Their results were reported on accuracy and
with no mention of response time; in contrast to the setup of Moè and Pazzaglia, our study did not inform participants about gender biases in the task. The above-mentioned finding is also referred to as "stereotype threat": one's behavior conforms to a stereotype of a group with which one identifies. A recent study confirmed such an effect on the MR task [13], while also showing that self-affirmation diminishes the gender bias and removes the stereotype threat. Stereotype threat generally affects performance in a negative way and depends on knowledge about the stereotype, either conscious or unconscious. As mentioned before, we did not inform participants about any superior group for the mental rotation task, and the fact that there was no difference in performance between males and females in the small-screen condition also contradicts the presence of such an effect. As for the "implications for design", the results we have obtained are ambiguous: the effect of display size interacts with the effect of expectation. Specifically, our results indicate that when there is a change of display technology in a workplace where spatial tasks similar to mental rotation may be relevant (e.g., the radiology department of a hospital), the size of the new display affects the two genders differently, especially when there are positive expectations towards the new technology. An example of contemporary views on the relevance of size in the selection of radiology workstations and software is given by Krupinski and Kallergi [14], who list the "major criteria" for selecting radiology workstations, based on technical and clinical requirements for the system. There is no mention of display size (only pixel-related issues and other qualitative requirements for the display), but the ability to zoom and pan is mentioned as important. In addition, the recent interest in bringing radiology and medical imaging to PDA units within healthcare [15] further adds to the relevance of studies of object/screen size and cognitive issues. As for potential ergonomic issues influencing the current results, there are two variables worth mentioning. The first is related to neck strain: research has shown that a downward viewing angle of 14° or more reduces neck strain in computer users [16]. In the current experiment, participants had a downward angle on the small display, while they had an almost level (horizontal) viewing angle with the large display. The second issue is the difference in viewing distance, 370 and 65 cm for the large and small display, respectively. The most influential factor regarding eyestrain in this context is convergence, the inward turning of the eyes towards the nose when viewing objects up close. The "optimal distance" in this respect is called the resting point of vergence (RPV), the point at which the eyes are set to converge when there is no object to converge on, averaging about 45 inches (114 cm) when looking straight ahead and 35 inches (89 cm) at a 30° downward angle (see [16] for references). Distances beyond the RPV do not affect eyestrain. As can be seen from the figures above, the large-screen condition could result in somewhat more neck strain, while the small screen could induce somewhat more eyestrain relative to the large display. The question remaining is how this should differ between the genders so as to explain the present results.
Even though the results presented in the current article are both interesting and quite surprising, it is premature to call for changes in, e.g., radiology workstation selection or ergonomic guidelines. Firstly, our study was not conducted specifically with radiographers. Secondly, to support ergonomic claims about display size, one would have to do clinical studies in situ. Our investigations and results merely point out that
more research is needed. However, in designing interfaces for, e.g., radiology, it could prove beneficial to keep in mind that the size of objects plays a role in perception, and that this might differ between the genders. It is also interesting to note that field of view seems to be an important component: our findings complement previous accounts that only egocentrically performed tasks benefit from physically larger displays [17], whereas our results imply that a greater FOV might also affect exocentric spatial abilities. Our initial results will need to be replicated in order to show whether the benefit accrued by women with the large screen is observed only in the context of a positive expectation about the use of such a new visual tool. Regardless of which variable was more important in our experiment, the results obtained point to a possibly important spurious effect of expectation that may need attention in future studies of the cognitive and behavioral effects of new technology. Finally, one comment about the main effect of display size: when transferring the visual display of the desktop screen from the laptop computer running SuperLab®, a variable delay of about 800 ms occurred, which we confirmed using a high-speed camera. Hence, in this context, we should ignore the main effect of display size. That is, despite the presence of such a delay, our findings on the effects of sex and expectation indicate that these variables might both have a substantial effect on human performance with large display screens.
Acknowledgements

We would like to thank the technical staff at the Department of Computer Science at the University of Tromsø and the SHARE research group for the software contributions that made this work possible. This work was partly funded by the Tromsø Telemedicine Laboratory (TTL) project.
References
1. Shepard, R.N., Metzler, J.: Mental rotation of three-dimensional objects. Science 171, 701 (1971)
2. Peters, M., Laeng, B., Latham, K., Jackson, M., Zaiyouna, R., Richardson, C.: A redrawn Vandenberg and Kuse mental rotations test: different versions and factors that affect performance. Brain and Cognition 28, 39–58 (1995)
3. Wallace, G., Anshus, O.J., Bi, P., Chen, H., Chen, Y., Clark, D., Cook, P., Finkelstein, A., Funkhouser, T., Gupta, A., Hibbs, M., Li, K., Liu, Z., Samanta, R., Sukthankar, R., Troyanskaya, O.: Tools and Applications for Large-Scale Display Walls. IEEE Computer Graphics and Applications 25, 24–33 (2005)
4. Olsen, B.I., Dhakal, S.B., Eldevik, O.P., Hasvold, P., Hartvigsen, G.: A large, high resolution tiled display for medical use: experiences from prototyping of a radiology scenario. Studies in Health Technology and Informatics 136, 535–540 (2008)
5. Tan, D.S., Czerwinski, M., Robertson, G.G.: Women go with the (optical) flow. In: CHI 2003. ACM Press, New York (2003)
6. Czerwinski, M., Tan, D.S., Robertson, G.G.: Women take a wider view. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Changing our World, Changing Ourselves. ACM Press, Minneapolis (2002)
7. Loftus, G.R., Masson, M.E.J.: Using confidence intervals in within-subject designs. Psychonomic Bulletin & Review (1994)
8. Linn, M.C., Petersen, A.C.: Emergence and characterization of sex differences in spatial ability: a meta-analysis. Child Dev. (1985)
9. Voyer, D., Voyer, S., Bryden, M.P.: Magnitude of sex differences in spatial abilities: a meta-analysis and consideration of critical variables. Psychol. Bull. (1995)
10. Feng, J., Spence, I.: Attending to large dynamic displays. In: CHI 2008, pp. 2745–2750. ACM, Florence (2008)
11. Feng, J., Spence, I., Pratt, J.: Playing an Action Video Game Reduces Gender Differences in Spatial Cognition. Psychological Science (2007)
12. Moè, A., Pazzaglia, F.: Following the instructions! Effects of gender beliefs in mental rotation. Learning and Individual Differences (2006)
13. Martens, A., Johns, M., Greenberg, J., Schimel, J.: Combating stereotype threat: The effect of self-affirmation on women's intellectual performance. Journal of Experimental Social Psychology 42, 236–243 (2006)
14. Krupinski, A., Kallergi, M.: Choosing a Radiology Workstation: Technical and Clinical Considerations. Radiology 242, 671–682 (2007)
15. Georgiadis, P., Cavouras, D., Daskalakis, A., Sifaki, K., Malamas, M., Nikiforidis, G., Solomou, E.: PDA-based system with teleradiology and image analysis capabilities. In: Conference Proceedings: Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2007, pp. 3090–3093 (2007)
16. Goyal, N., Jain, N., Rachapalli, V.: Ergonomics in radiology. Clinical Radiology 64, 119–126 (2009)
17. Tan, D.S., Gergle, D., Scupelli, P., Pausch, R.: Physically large displays improve performance on spatial tasks. ACM Transactions on Computer-Human Interaction 13, 71–99 (2006)
Neurocognitive Workload Assessment Using the Virtual Reality Cognitive Performance Assessment Test

Thomas D. Parsons, Louise Cosand, Christopher Courtney, Arvind Iyer, and Albert A. Rizzo
University of Southern California's Institute for Creative Technologies, Marina del Rey, CA, USA
{tparsons,cosand,courtney,aiyer,arizzo}@ict.usc.edu
Abstract. The traditional approach to assessing neurocognitive performance makes use of paper-and-pencil neuropsychological assessments. This received approach has been criticized as limited in the area of ecological validity. While virtual reality environments provide increased ecological validity, they are often developed without taking seriously the demands of rigorous research design and control for potentially confounding variables. The newly developed Virtual Reality Cognitive Performance Assessment Test (VRCPAT) focuses upon enhanced ecological validity, using virtual environment scenarios to assess neurocognitive processing. After an assessment for potential confounds (i.e., appropriate level of immersion and performance on neuropsychological measures), the VRCPAT battery's Attention Module (i.e., a Humvee scenario) was administered to a sample of healthy adults. Findings suggest that increases in stimulus complexity and stimulus intensity can modulate attention performance within the Attention Module.

Keywords: Neuropsychological assessment, neurocognitive, ecological validity, virtual environment.
1 Introduction

Attentional processing is an area of particular significance for neuropsychological research into the pattern of neurocognitive strengths and weaknesses in both normal and clinical populations. Two predominant attentional networks have emerged from studies using techniques drawn from clinical [1] and experimental neuropsychology [2], [3]. First, there is the "posterior" system, which is believed to include midbrain structures and posterior parietal areas. This "posterior" system is conceptualized as a largely bottom-up network driven by environmental salience. The second, "anterior" system is believed to include frontal and parietal regions as well as the reticular nucleus of the brainstem, and is conceptualized as a top-down regulatory network involving neurocognitively driven response control. From an applied neuropsychological perspective, this means that the "anterior" system focuses upon the voluntary maintenance of vigilance and sustained attention [4].
In general, findings from research related to attention may be understood in terms of automatic and controlled processing [5]. Whilst automatic processing is considered parallel, requiring little effort, and not under the participant's direct control, controlled processing is understood as serial, effortful, and under a participant's direct control [6]. Given the effortful nature of controlled processing, it has been found to exhibit an attentional decrement, in which reaction times slow and error rates increase as an effect of time-on-task. The distinction between automatic and controlled processing can be further refined in terms of exogenous and endogenous attention. While exogenous attention refers to the impact of external physical events upon automatic attention, endogenous attention refers to one's active direction of attention to something the participant deems important [7]. Adjustments to stimulus complexity are used to assess these differing aspects of attentional processing. For example, automatic processing and endogenous attention may be assessed by having a subject view four-digit numbers consistently presented in a fixed central location on a computer screen. Contrariwise, an example of controlled processing and exogenous attention is reflected in a scenario in which the four-digit numbers appear randomly throughout the computer screen. Neuropsychological studies of attention tend to assess neurocognitive (e.g., neuropsychological assessment in a controlled setting) and behavioral (e.g., self and other behavioral rating scales of the subject's activities in a real-world setting) aspects of attention. It is important to note that neurocognitive measures in controlled settings and behavioral ratings based upon naturalistic observations do not consistently proffer parallel findings [8]. Further, dissimilar attentional components may be dissociated both by neurocognitive measures in controlled settings and by behavioral ratings based upon naturalistic observations [3]. A related issue is that while traditional neuropsychological assessments manipulate the complexity of the stimulation, they do little to assess the impact of the intensity of the situation. The assessment of attention should reflect the varying levels of intensity found in real-world situations; a more intense setting may elicit emotional responses. Findings from attentional assessments must be generalizable to real-world situations [9]. While controlled settings offer increased psychometric rigor, naturalistic observation-based behavioral ratings may better capture the subject's performance in a real-world setting.

1.1 Virtual Environments for Neuropsychological Assessment

Virtual reality offers the capacity for merging the benefits of controlled settings (e.g., increased psychometric rigor) with environments that simulate those in which naturalistic observation-based behaviors occur. Recent advances in simulation technology have produced new methods for the creation of virtual environments. With these systems, scenarios can be presented with an ecological verisimilitude reflective of "real world" environments. When delivered via an immersive head-mounted display (HMD), an experience of presence within these scenarios can be supported in human users. As such, the VR assets that allow for precise stimulus delivery within ecologically enhanced scenarios appear well matched for research into attentional processing.
The value in using virtual reality technology to produce simulations targeting neurocognitive and behavioral applications has been acknowledged by an
encouraging body of research. Some of the work in this area has addressed affective processes: anxiety disorders, pain distraction, and posttraumatic stress disorder [10]. Other work has assessed neurocognitive processes such as attention and executive functioning [11], [12]; memory [13], [14], [15]; and visuospatial abilities [16], [17], [18]. While multiple attempts have been made to apply theoretical perspectives to the development of believable virtual environments, little has been done to "objectively" assess human interpretations of these environments. There is a need to incorporate psychophysiological metrics into the assessment of persons' responses while in a virtual environment. As mentioned above, attentional assessment should aim to recreate the environment in which the subject will be processing information. This is especially important when persons are processing information in environments that have different levels of stimulus intensity. Exposure to emotionally intense situations results in regular activation of cerebral metabolism in brain areas associated with inhibition of maladaptive associative processes [19]. Identical neural circuits have been found to be involved in affective regulation across affective disorders [20], [21]. Systematic and controlled exposure to physiologically intense stimuli may enhance emotional regulation through adjustments of inhibitory processes on the amygdala by the medial prefrontal cortex during exposure, and through structural changes in the hippocampus [22]. Thus far, the recording of psychophysiological variables while participants operate within virtual environments has produced useful results in studies examining attention and presence [23], [24], [25]. As such, the VR assets that allow for precise stimulus delivery within ecologically enhanced scenarios appear well matched for this research. Researchers have found that the individual characteristics of study participants may impact the immersiveness and subsequent findings of a given study. Of primary importance is the extent to which a participant is capable of "absorption" and "hypnotism." Hence, individual differences may moderate presence and confound findings. The propensity of participants to become passively involved in an activity, and their ability to concentrate and block out distraction, are important factors to consider when conducting a study. Likewise, evidence suggests that hypnotizability plays a role in the outcome of studies using VR. Research into these moderating individual traits is of value because it may inform participant selection.

1.2 Virtual Reality Cognitive Performance Assessment Test

The project described herein builds upon a larger (ongoing) project that makes use of virtual environments to assess user sensory, perceptual, and neurocognitive performance on various tasks. Neurocognitive and psychophysiological data gleaned from such analyses provide an opportunity for implementing systems that can exploit the capabilities of nervous systems, rather than simply depending upon human adaptation, to improve and optimize human-computer interaction. Monitoring the neurocognitive and psychophysiological activity of persons operating within a complex environment, however, poses severe measurement challenges. It is also likely that neurocognitive and psychophysiological responses in operational environments will be significantly, if not fundamentally, different from those in tightly controlled laboratory settings.
The Virtual Reality Cognitive Performance Assessment Test (VRCPAT) project focuses on the refinement of neuropsychological assessment using virtual environments to assess persons immersed in ecologically valid virtual scenarios. The VRCPAT is a three-dimensional virtual environment (i.e. virtual city and Humvee scenarios) designed to run on a Pentium IV notebook computer with one gigabyte RAM and a 128 megabyte graphics card. The primary aim of the VRCPAT project is to use the already existing library of assets as the basis for creating a VE for the standardized assessment of neurocognitive performance within a contextually relevant VE. The application uses USC’s FlatWorld Simulation Control Architecture (FSCA). The FSCA enables a network-centric system of client displays driven by a single controller application. The controller application broadcasts user-triggered or scripted-event data to the display client. The real-time three-dimensional scenes are presented using Numerical Design Limited’s (NDL’s) Gamebryo graphics engine. The content was edited and exported to the engine, using Alias’s Maya software. Three-dimensional visual imagery is presented using the eMagin z800. Navigation through the scenario uses a common USB Logitech game pad device. Virtual reality-based simulation technology approaches, as delineated herein, are considered to be the future alternative for devising neuropsychological assessment measures that will have better ecological/predictive validity for real-world performance. As well, the flexibility of stimulus delivery and response capture that are fundamental characteristics of such digital environments is viewed as a way for research objectives to be addressed in a more efficient fashion for long term needs. The overall design of this type of assessment tool allows for 1) Verisimilitude: the presentation of realistic environments that reflect activities of daily living; and 2) Veridicality: flexibility in terms of the independent variables that could be studied with this method once the psychometric properties of the standardized test are determined. Such flexibility enables this system to be viewed as an open platform on which a wide range of research questions may be addressed. These include the manipulation of: 1) information load on the front end via the intensity and complexity of target stimuli to be attended to and the type of information in terms of relevance, similarity, vagueness, sensory properties; 2) temporal constraints during varied sustained assessment conditions; 3) distracting activities during the neurocognitive assessments; 4) sensory modality of the information presentation that needs to be attended to; 5) the reward structure used during some tests to assess motivational factors that influence performance; 6) the presentation of aversive stimuli for stressed performance evaluations; and 7) the development of a test bed whereby neurocognitive training and augmented cognition strategies could be assessed under known conditions supported by normative standards.
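The FSCA itself is not described at implementation level in this paper, so the following Python sketch is only a hypothetical illustration of the network-centric pattern named above: a single controller application broadcasting user-triggered or scripted-event data to display clients. The port, message format, and event names are invented for this example and are not the real FSCA API.

```python
# Hypothetical controller-to-client event broadcast (not actual FSCA code).
import json
import socket
import time

BROADCAST_ADDR = ("255.255.255.255", 50007)  # assumed port

def make_controller_socket() -> socket.socket:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return s

def broadcast_event(sock: socket.socket, name: str, **payload) -> None:
    """Send one scripted event (e.g. 'palm_ambush') to all display clients."""
    msg = json.dumps({"event": name, "t": time.time(), **payload})
    sock.sendto(msg.encode("utf-8"), BROADCAST_ADDR)

if __name__ == "__main__":
    sock = make_controller_socket()
    # A scripted sequence mirroring the Humvee scenario sections.
    for section in ["start", "palm_ambush", "safe_zone", "city_ambush"]:
        high = "ambush" in section
        broadcast_event(sock, section, intensity="high" if high else "low")
        time.sleep(1.0)  # placeholder pacing between scripted events
```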
2 Methods

2.1 Participants

The study sample initially included 21 healthy adults: 15 civilians (USC students) and 6 military subjects (West Point cadets). After an analysis of the impact of immersion as a potential confound upon attentional assessment, the
military cohort was removed (see below). The resulting subject pool included 15 healthy subjects (age: mean = 26.71, SD = 4.49; 50% male; education: mean = 15.50, SD = 2.54). Strict exclusion criteria were enforced so as to minimize the possible confounding effects of additional factors known to adversely impact cognition, including psychiatric conditions (e.g., mental retardation, psychotic disorders, diagnosed learning disabilities, Attention-Deficit/Hyperactivity Disorder, and Bipolar Disorders, as well as substance-related disorders within two years of evaluation) and neurologic conditions (e.g., seizure disorders, closed head injuries with loss of consciousness greater than 15 minutes, and neoplastic diseases). Subjects were comparable in age, education, ethnicity, sex, and self-reported symptoms of depression.

2.2 Procedure

The University of Southern California's Institutional Review Board approved the study. Experimental sessions took place over a two-hour period. After informed consent was obtained, basic demographic information and computer experience and usage activities were recorded. Subjects then completed a neuropsychological battery administered under standard conditions. Following completion of the neuropsychological battery, subjects completed the simulator sickness questionnaire (SSQ), which includes a pre-VR exposure symptom checklist. Next, all participants were administered the VRCPAT as part of a larger neuropsychological test battery. While experiencing the VRCPAT, participants' psychophysiological responses were recorded using the Biopac system.

2.3 Potential Confounds: Immersion and Neuropsychological Assessment

The impact of highly immersive VR on participants' psychophysiological responses was compared with responses to a less immersive experience of watching the scenario on a laptop screen. The "high immersion" condition utilized a head-mounted display, headphones, and a tactile transducer. In the "low immersion" condition, participants wore headphones and watched the scene on a laptop computer screen. The stimuli included a virtual environment in which the participants experienced "high intensity" and "low intensity" scenarios that occurred while they drove a Humvee. Stimulus "intensity" was modulated by placing the user in "safe" (low intensity) and "ambush" (high intensity) settings: start section; palm ambush; safe zone; city ambush; safe zone; and bridge ambush. While participants drove the Humvee through the virtual environment scenarios, they were intermittently probed with acoustic startles (110 dB acoustic startle probes). Measures included two psychophysiological measures (startle eyeblink amplitude and heart rate) and responses on two self-report questionnaires (Tellegen Absorption Scale and Immersive Tendencies Questionnaire). The following paper-and-pencil neuropsychological measures were used to assess for potentially confounding differences within the subject pool: to assess attention, we used Digit Span (Forward and Backward) from the Wechsler Adult Intelligence Scale–Third Edition (WAIS-III); to assess processing speed, Digit Symbol Coding from the WAIS-III and the Trail Making Test (TMT) Part A; to assess executive functioning, TMT Part B and the Stroop Color and Word Test; and to assess verbal learning and memory, the Hopkins Verbal Learning Test – Revised
(HVLT-R); to assess nonverbal learning and memory, the Brief Visuospatial Memory Test – Revised (BVMT-R); and to assess lexical-semantic memory, the Controlled Oral Word Association Test (FAS) and Semantic Fluency (Animals).

2.4 VRCPAT Humvee Attention Module

The VRCPAT portion included a Humvee attention task. The Humvee scenario assessed attention using varying levels of both stimulus "intensity" and stimulus "complexity". Manipulation of stimulus intensity involved low intensity situations ("safe zones") and high intensity situations ("ambush zones"): 1) start section; 2) palm ambush; 3) safe zone; 4) city ambush; 5) safe zone; 6) bridge ambush. The manipulation of stimulus complexity involved the presentation of a four-digit number superimposed on the virtual windshield of the Humvee while the subject drove. Each four-digit number was presented for approximately 300 ms and was randomly selected by the computer from a database of prescreened numbers. During low (simple) complexity presentations the numbers were continually presented in a fixed central location on the windshield. During high complexity presentations the numbers were presented randomly throughout the windshield. The design consisted of six Humvee attention conditions:

1. Fixed Position: 2.0 second condition (Start Section): In this condition, the four-digit number always appeared in a fixed central location on the "windshield." The numbers were presented at 2.0 second intervals. This occurred in the "Start Section" and ended just before the "Palm Ambush."
2. Fixed Position: 1.5 second condition (Palm Ambush): The procedure for this condition was identical to the preceding "Fixed Position" condition except that the numbers were presented at 1.5 second intervals. This occurred in the "Palm Ambush" section and ended just before the "Safe Zone" section.
3. Fixed Position: 0.725 second condition (Safe Zone): The procedure for this condition was identical to the preceding "Fixed Position" condition except that the numbers were presented at 0.725 second intervals. This occurred in the "Safe Zone" and ended just before the "City Ambush" section.
4. Random Position: 2.0 second condition (City Ambush): The procedure for this condition was similar to the "Fixed Position" conditions with the exception that the numbers appeared randomly throughout the "windshield" rather than in one fixed central location. The numbers were presented at 2.0 second intervals. This occurred in the "City Ambush" and ended just before the "Safe Zone."
5. Random Position: 1.5 second condition (Safe Zone): The procedure for this condition was similar to the preceding "Random Position" condition except that the numbers were presented at 1.5 second intervals. This occurred in the "Safe Zone" and ended just before the "Bridge Ambush."
6. Random Position: 0.725 second condition (Bridge Ambush): The procedure for this condition was similar to the preceding "Random Position" condition except that the numbers were presented at 0.725 second intervals. This occurred in the "Bridge Ambush."
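The six conditions reduce to a 2 × 3 design: position mode (fixed vs. random) crossed with presentation interval (2.0, 1.5, or 0.725 s). The sketch below illustrates that schedule. The normalized coordinate system and the use of a random-number generator (rather than the authors' database of prescreened numbers) are simplifying assumptions.

```python
# Illustrative sketch (not the authors' code) of the six attention conditions.
import random

CONDITIONS = [
    ("start",         "fixed",  2.0),
    ("palm_ambush",   "fixed",  1.5),
    ("safe_zone_1",   "fixed",  0.725),
    ("city_ambush",   "random", 2.0),
    ("safe_zone_2",   "random", 1.5),
    ("bridge_ambush", "random", 0.725),
]

def next_stimulus(mode: str) -> tuple[str, tuple[float, float]]:
    """Return a four-digit number and its windshield position (0..1 coords)."""
    number = f"{random.randint(0, 9999):04d}"
    if mode == "fixed":
        pos = (0.5, 0.5)                           # fixed central location
    else:
        pos = (random.random(), random.random())   # anywhere on the windshield
    return number, pos

for section, mode, interval in CONDITIONS:
    number, pos = next_stimulus(mode)
    print(f"{section}: show {number} at {pos} every {interval} s "
          f"(each displayed ~300 ms)")
```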
3 Results

To examine potential cohort confounds, the impact of high versus low levels of immersive virtual reality on participants' (N = 14: six West Point cadets and eight University of Southern California civilian students) psychophysiological responses was compared.

3.1 Assessment of Potential Confounds

Given the similarity of participants in terms of age, sex, education, ethnicity, immersiveness, and performance on standard paper-and-pencil neuropsychological assessments, no correction for these variables was employed. Notably, none of the participants reported simulator sickness following VR exposure as measured by the SSQ. West Point cadets, however, responded with significantly lower eyeblink amplitudes overall (F = 7.249, p < 0.05). Participants in the "high immersion" condition, cadet or civilian, had higher eyeblink amplitudes than participants in the "low immersion" condition. A significant interaction between condition ("high immersion" versus "low immersion") and participant group (West Point cadets versus University of Southern California students) was also found in relation to heart rate. West Point cadets had significantly slower heart rates during the "low immersion" condition compared to University of Southern California students (F = 17.662, p < 0.001), while the two groups' average median heart rates in the "high immersion" condition were nearly identical (West Point cadet mean = 0.802, University of Southern California student mean = 0.799, F = 0.001, p = 0.997).

3.2 Analyses after Controlling for Confounded Data

As a result of the above findings, a secondary analysis was done in which the West Point cadets were excluded. Given the similarity of participants in the civilian cohort in terms of age, sex, education, ethnicity, immersiveness, and performance on standard paper-and-pencil neuropsychological assessments, no correction for these variables was employed. Again, none of the participants reported simulator sickness following VR exposure as measured by the SSQ.

Analyses of Immersion Level's Impact upon Users. To examine differences in levels of immersion upon this new cohort, one-way ANOVAs were performed, comparing median startle eyeblink amplitudes in "high immersion" (mean = 0.29; SD = 0.09) versus "low immersion" scenarios (mean = 0.18; SD = 0.03). The results indicated that the increase in immersion caused a significant increase in median startle eyeblink amplitudes (F = 19.17; p < 0.001). Participants' cardiac responses showed a similar trend, as the median beats per minute (BPM) in the "high immersion" condition (mean = 86.71; SD = 47.75) were higher than the median BPM in the "low immersion" condition (mean = 61.21; SD = 11.29). This trend approached significance (F = 7.918; p < 0.005), corroborating the EMG finding that "high immersion" scenarios evoke a stronger physiological reaction than "low immersion" scenarios.
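As an illustration, this kind of one-way ANOVA can be reproduced in outline with SciPy. The per-subject values below are simulated from the reported means and standard deviations; they are stand-ins, not the study's raw data.

```python
# Minimal sketch of a one-way ANOVA on median startle eyeblink amplitudes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated per-subject medians from the reported summary statistics.
high_immersion = rng.normal(0.29, 0.09, size=15)
low_immersion = rng.normal(0.18, 0.03, size=15)

f_stat, p_value = stats.f_oneway(high_immersion, low_immersion)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```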
Analyses of Attentional Processing. To examine scenario differences related to the "complexity" of stimulus presentation, one-way ANOVAs were performed, comparing attentional performance in "simple" stimulus presentations (mean = 43.63; SD = 8.91) versus "complex" stimulus presentations (mean = 34.63; SD = 6.86). The results indicated that the increase in stimulus complexity caused a significant decrease in performance on attentional tasks (F = 5.12; p = 0.04). To examine scenario differences related to the "intensity" of stimulus presentation, we compared attentional performance in low intensity (mean = 40.01; SD = 4.06) versus high intensity (mean = 9.25; SD = 3.70) presentations. The results indicated that the increase in stimulus intensity caused a significant decrease in performance on attentional tasks (t = 9.83; p = 0.01). It is important to note that a confound was not found in the distribution of the standard neuropsychological assessment scores. Given the small sample size, we decided not to assess the construct validity of the VRCPAT Attention Module. Hence, no attempts were made to assess correlations between standard paper-and-pencil tests and the VRCPAT.
4 Conclusions

Our goal was to conduct an initial pilot study of the general usability of the VRCPAT Attention Module scenarios. We aimed to assess whether an increase in stimulus complexity would result in a significant decrease in performance on attentional tasks, and whether an increase in stimulus intensity would do the same. We believe that this goal was met, as the study results indicated that: (1) the increase in stimulus complexity caused a significant decrease in performance on attentional tasks; and (2) the increase in stimulus intensity caused a significant decrease in performance on attentional tasks. We also aimed to assess the impact of potential confounds (cohort, immersion, and performance on traditional neuropsychological assessments) upon neurocognitive performance within virtual environments. First, a cohort confound was found. Results suggest that although West Point cadets found the virtual environment to be a less negative experience than did the University of Southern California student controls, the "high immersion" condition was a more emotionally salient condition. Hence, highly immersive VEs may be effective training tools for simulating military scenarios. It is important to note that a confound was not found in the distribution of the standard neuropsychological assessment scores. This is important and may reflect construct validation. However, this must be corroborated with an increased sample size and a multitrait–multimethod matrix analysis, in which convergent and discriminant validity would be assessed. Our findings should be understood in the context of some limitations. First, these findings are based on a small sample size. As a necessary next step, the reliability and validity of the test need to be established using a larger sample of participants. This will ensure that the current findings are not an anomaly due to sample size. Additionally, the diagnostic utility of this attention assessment tool must be determined. The ability of the VRCPAT's Attention Module to accurately classify participants into attention-impaired and attention-intact groups based on carefully
established critical values must be evaluated. This will involve the generation of specific cut-off points for classifying a positive or negative finding. The VRCPAT Attention Module's prediction of attentional deficits will need to be evaluated using the performance indices of sensitivity, specificity, predictive value of a positive test, and predictive value of a negative test (sketched below). In sum, manipulation of stimulus complexity and intensity in the VRCPAT's Attention Module revealed significant differences in performance on attentional tasks. Complementary comparisons of the VRCPAT's Attention Module with standardized behavioral and neurocognitive tests developed to assess attentional abilities are also warranted with an increased sample size, to determine the VRCPAT's construct validity.
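For reference, the four performance indices named above have standard confusion-matrix definitions. The counts in this sketch are hypothetical, since no VRCPAT classification data exist yet.

```python
# Standard diagnostic indices from a 2x2 confusion matrix.
def diagnostic_indices(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    return {
        "sensitivity": tp / (tp + fn),  # impaired participants correctly flagged
        "specificity": tn / (tn + fp),  # intact participants correctly cleared
        "ppv": tp / (tp + fp),          # predictive value of a positive test
        "npv": tn / (tn + fn),          # predictive value of a negative test
    }

# Hypothetical counts for illustration only.
print(diagnostic_indices(tp=18, fp=4, fn=2, tn=26))
```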
References
1. Mirsky, A.F., Anthony, B.J., Duncan, C.C., Ahearn, M.B., Kellam, S.G.: Analysis of the elements of attention: A neuropsychological approach. Neuropsychology Review 2, 109–145 (1991)
2. Posner, M.I., Petersen, S.E.: The attention system of the human brain. Annual Review of Neuroscience 13, 25–42 (1990)
3. Knudsen, E.I.: Fundamental components of attention. Annual Review of Neuroscience 30, 57–78 (2007)
4. Posner, M.I., Raichle, M.E.: Networks of attention. In: Posner, M.I., Raichle, M.E. (eds.) Images of Mind, pp. 153–179. Scientific American, New York (1994)
5. Fisk, A.D., Schneider, W.: Control and automatic processing during tasks requiring sustained attention: a new approach to vigilance. Hum. Factors 23, 737–750 (1981)
6. Schneider, W., Shiffrin, R.M.: Controlled and automatic human information processing: I. Detection, search, and attention. Psychol. Rev. 84, 1–66 (1977)
7. Posner, M.I.: Orienting of attention. Q. J. Exp. Psychol. 32, 3–25 (1980)
8. Gordon, M., Barkley, R.A., Lovett, B.J.: Tests and observational measures. In: Barkley, R.A. (ed.) Attention-Deficit Hyperactivity Disorder: A Handbook for Diagnosis and Treatment, 3rd edn., pp. 369–388. Guilford, New York (2006)
9. Foa, E.B., Kozak, M.J.: Emotional processing of fear: exposure to corrective information. Psychological Bulletin 99, 20–35 (1986)
10. Parsons, T.D., Rizzo, A.A.: Affective Outcomes of Virtual Reality Exposure Therapy for Anxiety and Specific Phobias: A Meta-Analysis. Journal of Behavior Therapy and Experimental Psychiatry 39, 250–261 (2008)
11. Parsons, T.D., Bowerly, T., Buckwalter, J.G., Rizzo, A.A.: A controlled clinical comparison of attention performance in children with ADHD in a virtual reality classroom compared to standard neuropsychological methods. Child Neuropsychology 13, 363–381 (2007)
12. Parsons, T.D., Rizzo, A.A.: Neuropsychological Assessment of Attentional Processing using Virtual Reality. Annual Review of CyberTherapy and Telemedicine 6, 23–28 (2008)
13. Parsons, T.D., Rizzo, A.A., Bamattre, J., Brennan, J.: Virtual Reality Cognitive Performance Assessment Test. Annual Review of CyberTherapy and Telemedicine 5, 163–171 (2007)
14. Parsons, T.D., Rizzo, A.A.: Initial Validation of a Virtual Environment for Assessment of Memory Functioning: Virtual Reality Cognitive Performance Assessment Test. Cyberpsychology and Behavior 11, 17–25 (2008)
15. Parsons, T.D., Silva, T.M., Pair, J., Rizzo, A.A.: A Virtual Environment for Assessment of Neurocognitive Functioning: Virtual Reality Cognitive Performance Assessment Test. Studies in Health Technology and Informatics 132, 351–356 (2008)
16. Parsons, T.D., Larson, P., Kratz, K., Thiebaux, M., Bluestein, B., Buckwalter, J.G., Rizzo, A.A.: Sex differences in mental rotation and spatial rotation in a virtual environment. Neuropsychologia 42, 555–562 (2004)
17. Parsons, T.D., Rizzo, A.A., Buckwalter, J.G.: Backpropagation and regression: comparative utility for neuropsychologists. Journal of Clinical and Experimental Neuropsychology 26, 95–104 (2004)
18. Parsons, T.D., Rizzo, A.A., van der Zaag, C., McGee, J.S., Buckwalter, J.G.: Gender and cognitive performance: a test of the common cause hypothesis. Aging, Neuropsychology, and Cognition 12, 78–88 (2005)
19. Schwartz, J.M.: Neuroanatomical aspects of cognitive-behavioural therapy response in obsessive-compulsive disorder. An evolving perspective on brain and behaviour. British Journal of Psychiatry Supplement, 38–44 (1998)
20. De Raedt, R.: Does neuroscience hold promise for the further development of behavior therapy? The case of emotional change after exposure in anxiety and depression. Scandinavian Journal of Psychology 47, 225–236 (2006)
21. Mineka, S., Watson, D., Clark, L.A.: Comorbidity of anxiety and unipolar mood disorders. Annual Review of Psychology 49, 377–412 (1998)
22. Hariri, A.R., Bookheimer, S.Y., Mazziotta, J.C.: Modulating emotional responses: effects of a neocortical network on the limbic system. Neuroreport 11, 43–48 (2000)
23. Macedonio, M., Parsons, T.D., Rizzo, A.A.: Immersiveness and Physiological Arousal within Panoramic Video-based Virtual Reality. Cyberpsychology and Behavior 10, 508–516 (2007)
24. Meehan, M., Insko, B., Whitton, M., Brooks, F.: Physiological measures of presence in virtual environments. In: Proceedings of the 4th Annual Presence Workshop, Philadelphia (May 2002)
25. Pugnetti, L., Meehan, M., Mendozzi, L.: Psychophysiological correlates of virtual reality: a review. Presence 10, 384–400 (2001)
Sensing Directionality in Tangential Haptic Stimulation

Greg Placencia, Mansour Rahimi, and Behrokh Khoshnevis
Daniel J. Epstein Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, CA 90089
{placenci,mrahimi,khoshnev}@usc.edu
Abstract. Few studies have explored haptic sensing on a finger pad as a means of transferring complex directional information. Stimulus presentation using Braille or tactile vibrators relies on binary ("on/off") signals, which require large areas to adequately represent data. Our research seems to support that tangential motion on a finger pad is a promising means of transmitting tactile information more compactly, at equal or better rates than current methods. The index fingertips of 62 subjects were stimulated using a random pattern of tangential motion in eight directions over two distances. An ANOVA found that distance was statistically significant, and direction was significant for 0.5 mm displacements, but not at 1.5 mm. Age also significantly affected perception of tangential motion. These results suggest tangential motion could transmit certain types of haptic information effectively, but its effectiveness may decrease with user age.

Keywords: tangential motion, directional haptic sense.
1 Introduction

From infancy we explore and actively manipulate our world through the dynamic two-way interactions of touch. Yet despite the importance of touch, its use as a method of information transfer has been relatively untapped, with the exception of Braille, introduced in 1821 (1). More recently, more complex stimulus presentation through lateral forces, vibration, and finger positioning has been explored as a means of haptic information transfer.

1.1 Normal Force Stimulus

Braille essentially uses normal forces to transmit information. Single tactile elements are raised dots, each providing 1 bit of information (touch or no touch); placed in 3 × 2 cell arrays, spaced 2.5 mm apart (from their centers), they provide up to 6 bits (2^6 = 64 symbols) of information (Fig. 1). An earlier design, from which Braille evolved, used 6 × 2 arrays that were difficult to read because symbols were not felt all at once (1).
Fig. 1. Braille Element
Recently, it was found that the perceptual "frame" through which humans distinguish tactile information is largely confined to the area of contact (2). This means tactile symbols are difficult to resolve when not completely felt. A visual analogy is to read the English letter "Q" in two parts (Fig. 2): reading the upper part, we recognize the symbols "Ø," "O" or "Q," but we only distinguish "Q" after reading the lower part. Because of Braille's proven usability, many efforts have sought to recreate or augment it using small actuators located at the fingertip. But these elements are expensive, limited to small-scale normal forces, and require a great deal of spatial acuity; a fundamental limitation of Braille encoding.

Fig. 2. Cell Analogy

1.2 Lateral Force Information

Fig. 3. Lateral tactile display © 2007 IEEE, reprinted with permission (3)

Lateral motion sensing has been used to implement Braille elements with limited usability (Fig. 3). Its effectiveness improves by increasing motion strength and contrast (3). Theoretically, such elements emulate the information capability of traditional 1-bit Braille elements, using motion/no motion to produce up to 64 symbols. However, lateral motion traverses two axes, allowing at least 2 bits of information: front–back and left–right. Furthermore, angular thresholds for lateral motion have been found from 16°–28° (4), (5). For 360° motions, a single lateral motion element can theoretically transmit:
\[ \left\lfloor \log_2 \frac{360^\circ}{16^\circ} \right\rfloor = \lfloor \log_2(22.5) \rfloor = 4 \text{ bits} = 16 \text{ symbols}, \qquad \left\lfloor \log_2 \frac{360^\circ}{28^\circ} \right\rfloor = \lfloor \log_2(12.9) \rfloor = 3 \text{ bits} = 8 \text{ symbols} \tag{1} \]
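Equation (1) amounts to counting the distinguishable directions at a given angular threshold and rounding down to a whole number of bits, as the short sketch below shows.

```python
# Bits transmissible by a single lateral-motion element per equation (1).
import math

def bits_per_element(threshold_deg: float) -> int:
    directions = 360.0 / threshold_deg        # 22.5 at 16 deg, ~12.9 at 28 deg
    return math.floor(math.log2(directions))  # round down to whole bits

for t in (16.0, 28.0):
    b = bits_per_element(t)
    print(f"{t:.0f} deg threshold: {b} bits = {2 ** b} symbols")
```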
1.3 Sensing Vibration

Vibration elements usually generate 1 bit of information (vibration/no vibration). For this reason they are often strategically placed on the body to convey spatial information, e.g. (6), (7). More advanced elements like Vibratese (8) used three normal intensity levels and three vibration levels at different locations around the chest to create a simple haptic "alphabet." Recently such elements have been incorporated into cell phones with Immersion Corp.'s VibeTonz system, which is capable of 5 distinct vibration channels – shape (steady – ramp), duration (constant – varied), speed (slow – fast), style (sharp strong – sharp), and magnitude (high – low) – that can be used individually or in combination (9).
1.4 Finger Position Information

Finger position has also been used to transmit haptic information. One early effort was the "reverse" typewriter (Fig. 4), which pushed user fingers in the x, y, and z axes (10). By replicating typing motions, a finger received about 3 to 4 characters (1–2 bits) of information; using 8 fingers, the entire contents of a 1960s QWERTY keyboard could be transmitted. The Tactuator (Fig. 5) was a generalized form of the reverse typewriter that used movable rods to transmit force amplitude, frequency, and relative motion of varied durations to three fingers (11). In those studies, multiple dimensions provided greater sensational contrast than multiple levels within a dimension. This indicated how tactile signals are masked unless distinctly separated (11).
Fig. 4. Finger stimulator with detail © 1962 IEEE; reprinted with permission (10)
Fig. 5. Tactuator finger position and motions © MIT; reprinted by permission (11).
1.5 Factors of Accurate Tangential Information Transfer – Distance, Direction, and Age

Tangential motion can transmit complete signals to reduce confounding (2). It uses two dimensional axes to increase perceptual contrast (12), especially when paired with varied distance. Studies have suggested that lateral motion enhances tactile information transfer (13). Finally, prior studies have established force and angular thresholds for lateral motion (3), but have not examined their interaction with other factors. We identified direction and distance as possible main effects influencing accurate perception of an applied directional stimulus. To our knowledge, the interaction of tangential motion with distance has not been studied. We therefore wanted to test the perceptual effects of varied distance on tangential motion, in order to examine displacement restrictions and understand how compact signal representations could be made. When matching tangential forces to normal ones, direction was not found statistically significant (14). However, when judging angular Just Noticeable Differences (JNDs) against different references, statistical differences were found (4), (5). While we could not explicitly control age, we chose to examine this effect as a covariate factor. Tactile (15) and vibration thresholds (16), (17) increase significantly
with age, indicating that there might be a substantial effect on tangential motion perception.
2 Methodology

We tested three haptic factors: two directions, front–back (distal–proximal) and left–right (radial–ulnar), and distance (0.5 mm and 1.5 mm). While up to 22 tangential directions could be distinguished (equation (1)), we chose to use only 8 in order to increase contrast while maximizing available motion. The distances corresponded to > 75% recognition of distal–proximal motion for probes glued to forearms (18), with additional compensation for decreased lateral sensitivity at the fingertip (14). We developed an automated tangential-motion device that stimulated subject fingertips using a round nylon probe (μs = 0.25–0.5; contact area ~ 6 mm diameter × 1.25 mm deep) moving at approximately 5 mm/sec (Fig. 6). To ensure consistent fingertip stimulation, the subject's hand and finger were immobilized during testing. Probe motion was aligned to correspond to body position (axes) so as to reduce spatial confounding effects. A one-newton normal force was applied during testing using a weighted plunger (Fig. 6) to help subjects calibrate the force with which they touched the probe. We calculated that this generated a 0.33–0.45 N tangential force during testing, which roughly corresponded to the 0.5 N tangential force used as baseline perception in (14). Sixty-two subjects underwent 32 randomized trials (2 distances × 8 directions × 2 replicates), with trial order randomized across subjects. Subjects indicated their gender and age category in response to a questionnaire, which also screened for possible illness or injury that could affect perception. Subject breakdown is shown in Table 1.

Fig. 6. Stimulus Interface
Table 1. Demographic Breakdown of Study

Gender:        Male: 28 (45%); Female: 34 (55%)
Age category:  1: 18–34: 17 (27%); 2: 35–44: 11 (18%); 3: 45–54: 12 (19%); 4: 55–64: 14 (23%); 5: 65+: 8 (13%)
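For concreteness, the 32-trial, 2 × 8 × 2 design can be sketched as follows. The direction encoding and per-subject seeding are our own illustrative choices, not taken from the study's control software.

```python
# Sketch of the randomized trial schedule: 2 distances x 8 directions x 2 replicates.
import itertools
import random

DISTANCES_MM = (0.5, 1.5)
DIRECTIONS_DEG = tuple(range(0, 360, 45))   # 8 directions at 45 deg spacing
REPLICATES = 2

def subject_schedule(seed: int) -> list[tuple[float, int]]:
    trials = [
        (dist, angle)
        for dist, angle in itertools.product(DISTANCES_MM, DIRECTIONS_DEG)
        for _ in range(REPLICATES)
    ]
    random.Random(seed).shuffle(trials)      # fresh random order per subject
    return trials

schedule = subject_schedule(seed=1)
assert len(schedule) == 32
print(schedule[:4])
```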
Subjects were familiarized with the test procedure and probe motion prior to testing. During testing, subjects wore a blindfold and a noise-cancellation headset (Creative HN–700) to negate visual and audio cues, and were seated in an ergonomic chair with the option of having their arm supported to reduce fatigue. Trials started with the probe at a neutral (center) position on the finger pad. Subjects were prompted to lower their fingertip onto the probe and prepare for stimulation by two separate tones played through the headset. The probe moved in a random direction and distance, followed by another tone signaling subjects to lift their finger off the probe and report their percept scores. The ensuing 20–60 second delay during scoring allowed the skin to unload, to mitigate confounding between sequences. The probe was then reset for the next trial. Subjects reported their perception after each trial using a ten-point Likert scale describing perceived strength (magnitude), from 1 (no perception) to 10 (strongest perception), in one or more of the eight possible directions: "front" (V1 – towards the fingertip), "back" (V5 – towards the palm), "left" (V7), "right" (V3), plus the four in-between diagonals (Fig. 7).

Fig. 7. Finger pad directions (top view)

Scores were not restricted to single directions, in order to measure complete perception. This resulted in the ith percept generating a vector:
\[ \vec{v}_i = \begin{bmatrix} v_{i1} & v_{i2} & v_{i3} & v_{i4} & v_{i5} & v_{i6} & v_{i7} & v_{i8} \end{bmatrix} \tag{2} \]
2.1 Dimensional Reduction–The Percept Vector
Approximately 42% of responses reported values in two or more directions. We therefore needed to accurately represent single as well as multiple responses in terms of both stimulus direction and strength (magnitude). Details of our heuristic procedure are provided in (19), but are outlined here. Our intuition was to align subject percepts to the actual stimuli and then break down the magnitudes of the ith percept into their x and y values (Fig. 8), so as to generate a percept vector (Vip). A simpler form of this was used in (20) to calculate a mean vector from a series of unit vectors representing mechanoreceptor responses. In our case Vip was a function of directional (Vixy) and magnitude (Mip) components.

Fig. 8. Finger pad axes with example stimulus (top view)

Direction was defined as:
\[ V_{ixy} = \begin{bmatrix} v_{ix} \\ v_{iy} \end{bmatrix} = \begin{bmatrix} v_{i1}\cos\alpha_{ij1} + v_{i2}\cos\alpha_{ij2} + \cdots + v_{i7}\cos\alpha_{ij7} + v_{i8}\cos\alpha_{ij8} \\ v_{i1}\sin\alpha_{ij1} + v_{i2}\sin\alpha_{ij2} + \cdots + v_{i7}\sin\alpha_{ij7} + v_{i8}\sin\alpha_{ij8} \end{bmatrix} \tag{3} \]
where αij was determined by the directional stimulus during a trial as:
\[ \begin{bmatrix} \alpha_{i1} = \text{front} \\ \alpha_{i2} = \text{front--right} \\ \vdots \\ \alpha_{i7} = \text{left} \\ \alpha_{i8} = \text{front--left} \end{bmatrix}, \qquad A = \begin{bmatrix} 0 & 45 & \cdots & 270 & 315 \\ 315 & 0 & \cdots & 225 & 270 \\ \vdots & & \ddots & & \vdots \\ 90 & 135 & \cdots & 0 & 45 \\ 45 & 90 & \cdots & 315 & 0 \end{bmatrix} \tag{4} \]
Larger magnitudes contributed more significantly than smaller ones, thereby "weighting" the vector towards their direction. The resulting angle was retrieved using:
\[ \alpha_i^{*} = \arctan(V_{ixy}) \tag{5} \]
which can easily be computed using the ATAN2 function (or its equivalent) in any spreadsheet program or dedicated statistical analysis package. When two or more scores were reported, they tended to reinforce each other, generating magnitudes greater than the largest reported value when we calculated |vi|. Instead, we "averaged" the magnitude Mip as a weighted sum of vi:
\[ M_{ip} = \sum_{j=1}^{8} v_{ij} \, w_{ij} \tag{6} \]
We were interested in finding the perceived magnitude in the actual stimulus direction. We therefore used αi* to estimate the value of Mip in that direction using the transformation:
\[ V_{icp} = M_{ip} \cos\alpha_i^{*} \tag{7} \]
to generate the corrected percept vector (Vicp), which ranged from +10 to −10. Positive values indicated accurate perception, while zero and negative values indicated motion was perceived perpendicular or opposite to the actual stimulus, respectively (Fig. 8). Results from these transformations compared favorably with established JNDs in (14), (21), and (5), and are detailed in (19).
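A compact sketch of equations (2)–(7) follows. It uses absolute direction angles with ATAN2 and then projects onto the stimulus direction, which is equivalent to the stimulus-aligned formulation above. Because the exact weights w_ij in equation (6) are deferred to (19), the sketch assumes they are the reported scores normalized to sum to one; that choice is an assumption, not the authors' specification.

```python
# Sketch of the percept-vector reduction, equations (2)-(7).
import math

# V1 (front) .. V8 (front-left) mapped to the axes of Fig. 8:
# +y toward the fingertip ("front"), +x toward "right".
ANGLES_DEG = [90, 45, 0, -45, -90, -135, 180, 135]

def corrected_percept(v: list[float], stimulus_deg: float) -> float:
    """Collapse an 8-direction score vector into Vicp along the stimulus."""
    vx = sum(s * math.cos(math.radians(a)) for s, a in zip(v, ANGLES_DEG))
    vy = sum(s * math.sin(math.radians(a)) for s, a in zip(v, ANGLES_DEG))
    alpha_star = math.atan2(vy, vx)                            # equation (5)
    total = sum(v)
    # Equation (6) with assumed weights w_ij = v_ij / sum(v).
    m_ip = sum(s * (s / total) for s in v) if total else 0.0
    # Equation (7): project the magnitude onto the actual stimulus direction.
    return m_ip * math.cos(alpha_star - math.radians(stimulus_deg))

# Example: a subject scoring 7 "front" and 3 "front-right" for a "front" stimulus.
scores = [7, 3, 0, 0, 0, 0, 0, 0]
print(round(corrected_percept(scores, stimulus_deg=90), 2))  # ~5.65
```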
3 Analysis

A breakdown of Vcp by direction and distance is shown as a star diagram in Fig. 9 and in Table 2. An ANOVA found direction (p = 0.004), distance (p < 0.001), and their interaction (p = 0.007) all statistically significant. A Tukey's test showed distance significant (p < 0.001). For direction, significance was only found between forward motion and back–right (p < 0.001), back–left (p = 0.003), and front–left (p = 0.002) motion. Direction was also
Fig. 9. Median Vcp by Direction (top of finger)
found to be of borderline significance for left motion versus back–right (p = 0.05). The F test used in ANOVA simultaneously considers all possible contrasts of treatment means, not just pair-wise comparisons (22). Further examination of the 0.5 mm and 1.5 mm data using one-way ANOVA showed that direction is significant (p < 0.001) for the shorter distance, but not for the longer (p = 0.88). This corresponds with results reported in (14), (21), and (5).

Table 2. Corrected Magnitude Perception (Vcp) by Direction

                    Front   Front-  Right   Back-   Back    Back-   Left    Front-
                            Right           Right           Left            Left
Overall  Average    5.97    4.97    5.25    4.29    5.01    4.62    5.36    4.57
         Median     6.79    5.65    6.00    5.39    5.83    5.65    6.00    5.66
0.5 mm   Average    5.60    4.10    4.22    2.87    4.12    3.75    4.29    2.90
         Median     5.81    4.66    5.00    4.00    5.00    4.54    4.35    3.58
1.5 mm   Average    6.34    5.83    6.23    5.64    5.85    5.41    6.42    6.14
         Median     7.07    6.69    7.00    6.36    6.84    6.47    7.00    6.80
3.1 The Effects of Age

A general linear model found age to have a statistically significant effect on Vp (p < 0.001). This effect is illustrated for distance in Fig. 10, with similar results found for direction.
Fig. 10. Average Vp by Age Category and Distance
4 Discussion

Our analysis showed that the accuracy of perceived tangential motion is related to the distance of the stimuli. This effect was so pronounced that, when incorporating direction,
shorter distances displayed significant differences between forward motion and the diagonal motions back–right, back–left, and front–left, while the longer distance did not. This finding, in conjunction with (14), suggests distances of 1.0 mm or greater transmit at least 3 bits (8 directions) of information accurately, provided we use 45° separation. In contrast, distances less than 1.0 mm appear to transmit less information (slightly more than 2 bits), particularly when moving diagonally. A practical application of these results could augment the results of (3), which sought to replicate Braille elements using 0.1 mm radial–ulnar motion with limited results. While Braille elements are typically 2.5 mm apart, our results suggest that increasing motion at such small scales can improve perception. Tracing outlines of Braille characters, as suggested by (23), using 2-axial motion is another possibility. A third option is generating an enhanced Braille alphabet using 2-axial motion, which could effectively double, if not quadruple, the information a single element can transmit (Fig. 11). However, development of any such paradigms must consider the age of the user. Our covariate analysis showed that older subjects perceived tangential stimuli less accurately than younger ones, particularly after 55 years of age. This finding, in conjunction with (15), (16), and (17), suggests research on "haptic amplifiers" may be warranted, especially when 1 in 8 of the earth's population is predicted to be 65+ years old by 2030 (24). Haptic amplifiers improve tactile perception much like hearing aids augment audition. For example, (25) suggested larger contact areas reduce pressure thresholds. We therefore used a large, low-friction probe rather than the "pin-like" probe in (14), (21), and (5). This allowed our interface to be more comfortable for subjects while producing similar results within the parameters of our test. Such research would increase overall usability, thereby inducing users to adopt such technologies more readily.

Fig. 11. Proposed Augmented Braille Design (top view)
Acknowledgements

The authors would like to acknowledge the invaluable help of Dr. Kurt Palmer with the statistical analyses of the complex data set.
References
1. Brunson, M.: A Brief History of Braille. American Council of the Blind (2005), http://www.acb.org/resources/braille-history.html
2. Salada, M., et al.: Fingertip Haptics: A Novel Direction in Haptic Display. In: Proceedings of the 8th Mechatronics Forum International Conference. University of Twente, Enschede (2002)
3. Levesque, V., Pasquero, J., Hayward, V.: Braille Display by Lateral Skin Deformation with the STReSS2 Tactile Transducer. In: Second Joint EuroHaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, pp. 115–120 (2007)
4. Webster III, R., et al.: A Novel Two-Dimensional Tactile Slip Display: Design, Kinematics and Perceptual Experiments, vol. 2, pp. 150–165. ACM, New York (2005)
5. Vitello, M., Drif, A., Giachritsis, C.D.: Final Evaluation Report for Haptic Displays with Guidelines. TOUCH-HapSys, Towards a Touching Presence: High-Definition Haptic Systems (2006)
6. Cholewiak, R.W., Beede, K.: The Representation of Space through Static and Dynamic Tactile Displays. In: Proceedings of Haptics International, Las Vegas, NV (2005)
7. Tan, H.Z., et al.: A Haptic Back Display for Attentional and Directional Cueing. Haptics-e: The Electronic Journal of Haptics Research 3 (2003), http://www.haptics-e.org
8. Geldard, F.A.: Some Neglected Possibilities of Communication. Science, New Series 131, 1583–1588 (1960), http://www.jstor.org/stable/1705360
9. VibeTonz SDK. Immersion Corporation, http://www.immersion.com/mobility
10. Bliss, J.C.: Kinesthetic-Tactile Communications. IRE Transactions on Information Theory 8, 92–99 (1962)
11. Tan, H.Z.: Information transmission with a multi-finger tactual display. PhD thesis, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science (1996)
12. Tan, H.Z., et al.: Information transmission with a multifinger tactual display. Perception & Psychophysics 61(6), 993–1008 (1999)
13. Srinivasan, M.A., Whitehouse, J.M., LaMotte, R.H.: Tactile Detection of Slip: Surface Microgeometry and Peripheral Neural Codes. Journal of Neurophysiology 63, 1323–1332 (1990)
14. Biggs, S.J., Srinivasan, M.A.: Tangential versus normal displacement of skin: relative effectiveness for producing tactile sensation. In: Proceedings of IEEE VR (2002)
15. Thornbury, J.M., Mistretta, C.M.: Tactile sensitivity as a function of age. Journal of Gerontology 36, 34–39 (1981)
16. Stuart, M., et al.: Effects of aging on vibration detection thresholds at various body regions. BMC Geriatrics 3 (2003)
17. Verrillo, R.T.: Age-related changes in the sensitivity to vibration. Journal of Gerontology 38, 185–193 (1980)
18. Olausson, H., et al.: Remarkable Capacity for Perception of the Direction of Skin Pull in Man. Brain Research 808, 120–123 (1998)
19. Placencia, G.: Information Transfer via Tangential Haptic Stimulation. PhD thesis, Epstein Department of Industrial and Systems Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles (2009)
20. Birznieks, I., et al.: Encoding of Direction of Fingertip Forces by Human Tactile Afferents. The Journal of Neuroscience 21, 8222–8237 (2001)
21. Drewing, K., et al.: First Evaluation of A Novel Tactile Display Exerting Shear Force via Lateral Displacement. ACM Transactions on Applied Perception 2(2), 118–131 (2005)
22. Montgomery, D.C.: Design and Analysis of Experiments, 6th edn. Wiley, Chichester (2004)
23. Millar, S.: Reading by Touch. Routledge, New York (1997)
24. Why Population Aging Matters: A Global Perspective. Department of State and the Department of Health and Human Services, National Institute on Aging, National Institutes of Health, Washington, DC (2007)
25. Tan, H.Z., et al.: Human Factors for the Design of Force-reflecting Haptic Interfaces. In: Proceedings of the Winter Annual Meeting of the American Society of Mechanical Engineers: Dynamic Systems and Control, vol. 55, pp. 353–359 (1994)
Effects of Design Elements in Magazine Advertisements

Young Sam Ryu¹, Taewon Suh², and Sean Dozier¹
¹ Ingram School of Engineering, ² Department of Marketing, Texas State University-San Marcos, 601 University Drive, San Marcos, TX 78666, USA
{yryu,ts21,sd1100}@txstate.edu
Abstract. In this study, unlike previous studies in which participants were instructed to pay attention to the advertisements, we set up a more naturalistic magazine-reading situation. Five major design elements (body text, head text, brand logo, product image, and human model image) were investigated. Our results showed that pictorial elements captured more looking time and fixations than textual elements in general, and that textual elements received more looking time and fixations per unit size than pictorial elements. Also, a comparative data analysis of two different but very similar advertisements for competing products provided design implications regarding the use of human model images and head text.

Keywords: print advertisement, eye tracking.
1 Introduction
Print advertisements in magazines remain a major advertising medium, accounting for a 13% share of ad spending in the United States in 2003 [2]. Thanks to eye tracking techniques, psychologists and marketing researchers have conducted empirical studies over the last decade measuring visual attention to each design element of print advertisements. Most of them used eye fixation durations and the number of eye fixations as measurements of visual attention. An experimental study of print advertisements [3] found that fixation durations were longer on the picture than on the text; however, more fixations were made on the text. The study also found that viewers tend to read in the sequence of large text, smaller text, and then picture. A common belief in print advertising is that larger advertisements should attract more attention. However, the size effects of the major design elements of print advertisements have rarely been studied empirically. Pieters and Wedel [2] investigated the effects of three key advertisement elements (pictorial, catch copy, and brand) on attention capture in magazine advertisements and claimed that the pictorial element is effective regardless of its size, while text captures attention in proportion to its surface size. Recently, Aoki and Itoh [1] investigated the effects of the same three elements during the reading of printed advertising. They found that more attention was paid to ads containing the three elements than to those without. They also reported that viewers looked at the 'body text' first instead of the three elements.
In this study, we set up a more naturalistic magazine-reading situation and examined viewers' visual activity toward each design element of the advertisements using an eye tracking technique. Unlike most previous studies, which instructed participants to pay attention to the advertisements, the participants in this study were not instructed to do anything other than read the magazine presented. Instead of the three design elements used by Pieters and Wedel [2] and Aoki and Itoh [1], five major design elements of print advertisements were classified and used for this study: body text, head text, brand logo, product image, and human model image. We examined looking time, looking sequence, and initial fixations within each ad in order to explore the effectiveness of each of the five design elements in proportion to its surface size. Also, a comparative data analysis was performed between two different but very similar advertisements of competing products.
2 Method
2.1 Participants
To reduce any gender effect, the contents of the magazine selected for this experiment were targeted only at females in their late teens or early twenties. Accordingly, twenty female students enrolled in business courses at Texas State University were recruited to participate in the experiment in exchange for extra course credit. Their ages ranged from 20 to 36, with an average of 22.76 years (SD = 3.46).
2.2 Materials
The magazine was edited to consist of 13 advertisement pages and 10 non-advertising pages. Eight of the thirteen advertisements were of interest in the experiment; all of them were for cell phones or cell phone service providers, to reduce the effect of product domain. The other advertisements were randomly selected to act as dummies. Each advertisement page included various components of a typical advertisement, such as brand logo, head text, body text, product image, and human model image. The magazine viewed by the participant was simulated on a 19-inch LCD screen using a PDF file. One screen showed two facing pages of the magazine, as in the traditional layout.
2.3 Equipment
Eye movements of the magazine viewers were recorded by an ASL 6000 eye tracking system. A heavy-duty chin rest was installed to fix the position of the participants' heads and minimize variation in the eye tracking data.
2.4 Procedure
The participant was asked to wear the head-mounted eye tracking device and to scan the computer screen to perform calibration, assisted by the experimenter. After the calibration, the participant was asked to scan through the magazine presented on
the computer screen. The participant was not informed of the purpose of the study. They were asked to read the magazine as they normally would, in order not to focus on viewing advertisements, and no time restriction was enforced. The participant was able to move to the next pages by pressing the space bar on the keyboard. After the magazine viewing session, the participant filled in a structured questionnaire to recall information about the advertisements in the magazine she had viewed.
3 Results
3.1 Looking Time and Number of Fixations
There were eight advertising pages of interest in the magazine. We analyzed the data in terms of the amount of time participants looked at each of the five design elements: brand logo, body text, head text, product image, and human model image. Not all of the eight advertisements contained all five design elements, and some contained additional design elements beyond the five. Thus, the average looking time and number of fixations for each element across the eight advertisements are shown (Fig. 1). Without considering the surface size of each design element, participants looked at the human model image more than any other element in terms of looking time, F(4,76)=15.91, p<0.01, and number of fixations, F(4,76)=26.12, p<0.01. Product image was the second most looked-at element, head text was the third, body text was the fourth, and brand logo was the last. Considering the surface size of each element, average looking time and number of fixations per square centimeter are shown (Fig. 2). According to the data, head text received the highest looking time, F(4,76)=5.76, p<0.01, and number of fixations per square centimeter, F(4,76)=8.92, p<0.01. Body text and product image were the next highest, human model image was the fourth, and brand logo received the least.
Fig. 1. Looking time and number of fixations for each design element
Fig. 2. Looking time and number of fixations per cm2 for each design element
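The per-unit-size metrics in Fig. 2 are obtained by dividing each element's aggregate looking time and fixation count by its surface area. The sketch below illustrates that normalization; the element areas and raw totals are hypothetical placeholders, not the study's data.

```python
# Sketch of the per-unit-size normalization behind Fig. 2.
# All numbers are hypothetical placeholders, not the study's data.

elements = {
    # element: (looking time in s, fixation count, surface area in cm^2)
    "body text":         (1.2,  5,  30.0),
    "head text":         (2.0,  8,  12.0),
    "brand logo":        (0.3,  1,   4.0),
    "product image":     (3.5, 12,  60.0),
    "human model image": (5.0, 18, 150.0),
}

for name, (time_s, fixations, area_cm2) in elements.items():
    # Dividing by surface size makes elements of different sizes comparable.
    print(f"{name}: {time_s / area_cm2:.3f} s/cm2, "
          f"{fixations / area_cm2:.3f} fixations/cm2")
```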
3.2 Comparison of Two Ads
Among the eight advertisements used in this experiment, we chose two that consisted of almost identical design elements to be compared (BlackBerry vs. Samsung). Average looking time and number of fixations for each advertisement were charted comparatively (Fig. 3). Without considering the surface size of each element, the human model image in the Samsung advertisement received significantly more looking time than any other design element, F(4,76)=4.24, p<0.01. However, there was no significant difference between the two ads, F(1,19)=0.14, p=0.71. There was a significant difference across design elements in terms of number of fixations as well, F(4,76)=5.54, p<0.01. Considering the surface size of each element, average looking time and number of fixations per square centimeter for both advertisements are shown (Fig. 4). According to the data, head text received the highest looking time, F(4,76)=5.10, p<0.01, and number of fixations per square centimeter, F(4,76)=5.58, p<0.01, for both advertisements. Product image was the second highest element for the BlackBerry advertisement, while human model image was the second highest for Samsung.
Fig. 3. Comparison of looking time and number of fixations for two ads
Fig. 4. Comparison of looking time and number of fixations per cm2 for two ads
3.3 Recall Data
Twelve of the twenty participants recalled the BlackBerry advertisement, while six remembered that the Samsung phone advertisement was in the magazine. Two of those who recalled the Samsung phone remembered the name of the female human model in the advertisement instead of the product or brand name.
4 Discussion
4.1 Looking Time and Number of Fixations
Upon initial inspection, the human model image appears to be the most significant element of all the ads, based on the most looking time and number of fixations. Since the product image received the second highest looking time and number of fixations, it appears that pictorial elements captured more looking time and fixations than textual elements in general. This result contradicts the results of previous studies [2, 3]. However, if we consider the surface size of each element, head text becomes the most significant element according to looking time and fixations per square centimeter. Since body text becomes the next highest, it appears that textual elements received more looking time and fixations per unit size than pictorial elements. This contradiction may be explained by the experimental setup of the viewing activity. Previous studies instructed participants to look at specific ads, so the viewers had strong motivation to study the printed advertisements. In contrast, our study set up a naturalistic magazine-viewing situation, so participants were instructed to engage in normal magazine-viewing behavior. No emphasis was placed on either the ads or the magazine contents.
4.2 Comparison of Two Ads
Comparative analysis of two different advertisements for similar products, composed of identical design elements, gave us the opportunity to investigate the effectiveness of advertisements as a whole. Without considering the unit size of each element, both ads
followed the average pattern of pictorial elements receiving more looking time and fixations than textual elements, as discussed above. Considering the unit size of each element, head text received the most viewing time for both ads. However, the human model received relatively more viewing time in the Samsung ad than in the BlackBerry ad, while the product image received more in the BlackBerry ad than in the Samsung ad. This disparity may be attributed to the celebrity status of the human model in the Samsung ad.
4.3 Recall Data
Recall data measure the comparative effectiveness of the two ads. BlackBerry and Samsung were presented equally throughout the magazine; however, BlackBerry was recalled more often (12 vs. 6). BlackBerry in particular received higher viewing time on the head text and product image, both of which highlighted the product or included the product name. It appears that the higher viewing time on the human model in the Samsung ad diverted viewers' attention from the product. In fact, two participants recalled the name of the human model alone instead of the product or brand name.
4.4 Design Implications
Based on the results of this study, several design implications for print advertisements can be listed. Although it takes less cognitive processing time to comprehend a pictorial element than a textual element, magazine viewers spent more time on pictorial elements (human model image and product image) than on other design elements. Given one page for each advertisement, the effectiveness of each element in achieving the goal of the advertisement should depend on the overall time spent on each element rather than the time spent per cm2. Thus, according to the results of this study, the use of pictorial elements should be more important than that of the others. The use of a celebrity human model image can attract viewers' attention; however, it can also divert their attention from the product. This was supported by the recall data comparison between the Samsung and BlackBerry ads (see 3.3). Thus, establishing a strong, intuitive relationship between the celebrity and the product or brand name would be an important premise for using a celebrity human model image in advertisements. Among text elements, the use of head text is more important than that of body text. The brand logo captured hardly any attention in this study, which seems to be due to its small size. Also, a brand logo can be comprehended intuitively through the viewer's peripheral vision instead of focal vision, assuming that viewers are already familiar with it.
5 Conclusion
In this study, unlike previous studies where participants were instructed to pay attention to the advertisements, we set up a more naturalistic magazine-reading situation. Five major design elements (body text, head text, brand logo, product image, and human model image) were investigated. Our results showed that pictorial elements captured more looking time and fixations than textual elements in general, whereas textual elements received more looking time and fixations per unit size than
pictorial elements. In addition, a comparative data analysis of two different but very similar advertisements for competing products provided design implications regarding the use of the human model image and head text. Overall, pictorial elements attracted more looking time from viewers than other design elements; however, head text remains a very effective design element in advertisements.
References 1. Aoki, H., Itoh, K.: Eye Tracking Analysis of Effects of Key Styling Factors on Visual Attention During Reading of Printed Advertising. In: Applied Human Factors and Ergonomics International, Las Vegas, NV (2008) 2. Pieters, R., Wedel, M.: Attention Capture and Transfer in Advertising: Brand, Pictorial, and Text-Size Effects. Journal of Marketing 68(2), 36–50 (2004) 3. Rayner, K., et al.: Integrating text and pictorial information: Eye movements when looking at print advertisements. Journal of Experimental Psychology: Applied 7(1), 219–226 (2001)
The Influence of Shared-Representation on Shared Mental Models in Virtual Teams
Rose Saikayasit and Sarah Sharples
Human Factors Research Group, Faculty of Engineering, The University of Nottingham, University Park, Nottingham NG7 2RD, United Kingdom
{epxrs7,Sarah.Sharples}@nottingham.ac.uk
Abstract. This paper reports a laboratory experiment investigating the influence and effects of shared-representation facilities on collaboration and shared mental model development in virtual teams. The experiment had two conditions: with or without shared-representation facilities. Participants were asked to work in pairs on a 'house hunting' scenario. The results showed no significant difference in overall performance between the two conditions; however, shared mental model development was significantly higher where partners were able to use shared-representation facilities. Keywords: shared mental models, shared-representation, collaboration, virtual teams.
1 Introduction
Virtual teams have become commonplace in today's world, where organizations are constantly searching for better ways to plan and carry out their business. These teams consist of geographically distributed members who rely heavily on communication technologies for effective collaboration. The use of collaborative tools has risen rapidly due to the decreasing costs and increasing availability of the internet. Technologies such as email, instant messaging, internet telephony, video conferencing and application/desktop sharing seek to offer better solutions to aid collaboration. These technologies contribute to and affect collaboration differently depending on the type of task being performed and the nature of the team. Many organizations have adopted shared publication spaces to rectify the problem of restricted sharing in distributed workplaces. Often a collaborative portal brings all project information into one central location that can be accessed anytime, anywhere, very easily [1]. This paper describes a study that evaluates the effects of some aspects of collaborative technologies, such as desktop sharing or shared-representation, on decision making and shared mental models. Shared-representations allow two or more people to view or share the same visualization and resources at the same time during synchronous collaboration, for example in a virtual meeting.
The objective of this study was to test how the availability of shared-representation affects the overall level of collaboration in virtual teams as well as the development of the teams' shared mental models. A laboratory study was carried out in which participants were divided into two groups, one for each experimental condition. In the control group, participants worked in pairs on a problem-solving task without the use of a shared-representation facility. In the second experimental group, participants also worked in pairs but were able to use the shared-representation facility during the task. The results, such as performance, satisfaction, shared mental model development and collaboration, were then compared between the two conditions.
1.1 Virtual Teams
Virtual teams are fundamentally groups of individuals separated by distance and/or time who nevertheless have common tasks to perform [3]. Edward and Wilson [4] have divided virtual teams into three categories: project teams, service teams and process teams. Project teams come together only for a finite period of time in response to a project brief; service teams exist as a resource on call for the resolution of problems or for advice; process teams exist over an undefined period to respond to ongoing needs within a certain domain. These teams are, however, more complex than traditional co-located teams, and given the nature of virtual teams, members must work apart more than together, which reduces the amount of formal and informal communication. This in turn may result in members feeling isolated and lacking a sense of belonging. All these factors may impact the development of their shared mental models, which many organizations have to overcome.
1.2 Shared Mental Models (SMMs)
The increasing use of technologies has contributed to the complexity of many tasks performed in the workplace, making it difficult for personnel to complete their work independently. Many organizations therefore rely on work-teams consisting of members from different fields or of the same background to carry out set tasks and projects. This emphasizes the importance of SMMs in virtual teams. Mental models (MMs) are organized knowledge structures that allow individuals to interact with their environment, and to predict and explain the behavior of the world around them. They also allow individuals to work together in a team whilst recognizing relationships among components as well as other members within the environment [2, 4]. Cannon-Bowers et al. [3] suggested that teams needing to adapt quickly to changing task demands might be drawing on shared or common MMs, using the rationale that members must predict what their teammates are currently doing and what they are going to need in order to accomplish the task. Orasanu and Salas [6] state that in order for a team to work together successfully, members must perceive, encode, and retrieve information in similar ways, thereby constructing similar individual MMs. This allows teams to coordinate tightly with each other even under high pressure
conditions, as members are able to anticipate and predict each other's needs and reactions. Bristol [2] summarized the following:
• Knowledge convergence leads to similar individual MMs
• Similar individual MMs lead to similar problem solutions
• Similar problem solutions help anticipation of another's needs or actions
• Anticipation enables team coordination without extensive communication
This concept has mainly been developed for co-located teams, where members work closely together all the time and develop situation awareness of each other within the team. However, when considering distributed teams, where members seldom communicate with each other directly, it is unclear how these virtual teams of skilled members develop their shared mental models or how their locations and distances affect this development.
1.3 House-Hunting Experiment
The experiment for this study is referred to as 'house-hunting'. This scenario was chosen because it is easy to understand and allows volunteers to take part without prior knowledge of the task. Spatial information is known to be difficult to verbalize and is therefore very suitable for this experiment, which tests distributed collaboration where participants cannot communicate face-to-face. One of the objectives was to compare performance between conditions. In the first condition, both participants working in the same team could see the same spatial information, whereas in the other condition participants relied purely on each other to verbalize all the information. Participants were asked to bring someone they knew to the experiment, with whom they could work as a pair. A pilot study was conducted to gather a list of important information people need when looking for a house to let. This included, for example, information on price, crime rates, transportation, distance to shops and parking availability. The locations of the ten properties to let were given to only one of the participants within a pair. This meant that, in one of the conditions, the partners had to describe to each other where the houses were. It was decided that information such as rent would be expressed as areas, as shown in Figure 1. This information was given to the other partner, who did not have the locations of the properties. Partners therefore had to communicate clearly with each other, which also ensured that collaboration was necessary in order to complete the task. Participants working together in a pair needed to combine the information from Figure 1 in order to establish how each of the houses fell into different price brackets for rent. Participants were also given a different set of criteria from their partners. This was to ensure that the experiment simulated the conflicting goals and needs of a real working environment as well as the need to compromise. It was also important to avoid one participant taking charge and becoming the main decision maker for the pair, as might happen if both had the same criteria to work towards.
Fig. 1. Rent expressed as areas, available to participant 1 (left), and the locations of properties, available to participant 2 (right)
This experiment also allowed the investigation of SMMs in the sense that participants had to work together to solve problems as well as to make decisions based on collaboration and the development of SMMs. Three main hypotheses were tested, as follows:
H1: There is a difference between the levels of shared mental model development in the two experimental conditions
H2: There is a difference in the overall performance between the two conditions
H3: There is a difference in the user experience, such as the level of satisfaction and perceived difficulty, when comparing the two conditions
2 Method
Participants. 32 paid volunteers took part in the experiment, forming 16 pairs; eight for each experimental condition (17 male; 15 female; modal age group 22-25). Participants were asked to bring a friend along to the experiment so they could work as partners.
Design. The experiment was a between-subjects design, with eight pairs per condition. The two experimental conditions were: 1) 'shared-representation' (SR) and 2) 'no shared-representation' (NSR). Participants were located in the same room but were separated from their partners by a partition screen between their desks. This prevented them from directly sharing information with each other or communicating face-to-face during the experiment. However, they were able to communicate vocally.
Task. Participants were asked to select three houses they would like to rent together, from a selection of ten. Within a pair, each participant received different pieces of information from their partner. They therefore needed to collaborate in order to combine and utilize all the available information as a team. The
information was divided to encourage collaboration and to ensure that partners could not complete the task properly unless they collaborated. Partners were also given conflicting criteria for the required house. This simulated real-life situations in which the different needs and requirements of collaborating individuals conflict and compromises therefore have to be made. Participants were given 40 minutes to complete the task.
Materials. All sessions were recorded by two video cameras on tripods, i.e. one camera per partner working on the task. Booklets containing the information on the task as well as the instructions were given to all participants at the start of the experiment. Each participant was given a list of criteria for the potential house to let, which was not, however, given to their partner, who received a conflicting set of criteria. A partition screen was used to separate participants. Each participant was given a laptop to use during the experiment, which they used to browse through PowerPoint slides containing information about the different properties. These laptops were also attached to a spare monitor on the other side of the partition screen, on the partner's desk, allowing each laptop to project a copy of its display onto the spare monitor. In the SR condition, the spare monitors were switched on, meaning both partners had their own laptop as well as a spare monitor showing a copy of their partner's screen. The spare monitor was switched off in the NSR condition, where participants were not allowed to see their partner's screen. After task completion, post-experiment questionnaires were given to all participants to gather subjective data such as the perceived difficulty of the task, the level of satisfaction with the final selection of houses, and collaboration.
Procedure. Once the participants arrived as a pair, they were asked to be seated on either side of the partition screen. They were asked to read and sign the consent forms before being given a briefing on the task as well as their information booklet. Participants were allowed a few minutes to familiarize themselves with all the given information and their own criteria. They were also informed that they did not have the same slides as their partners. The experimenter then showed each participant how to use and navigate the given presentation slides in Microsoft PowerPoint. Participants were able to ask questions about the experiment. Observation notes were made throughout by the experimenter, who sat at the back of the room. Participants were allowed to ask questions relating to PowerPoint, whilst requests for direct help that might influence the teams' decisions went unanswered. At the end of the experiment, participants were given a set of post-task questionnaires to complete. They were asked to speak English at all times during the experiment and were all paid after they had completed it.
3 Dependent Variables
The analysis of this experiment was divided into four parts: 'SMM development', 'body movement', 'performance' and 'questionnaire'. The first two parts were mainly video analysis of both verbal and non-verbal communication during the experiment. The coding for shared mental model
development was adapted from Bristol [2]. This was further divided into two sections: one on shared mental models and team decision-making, and another based mainly on how partners navigated each other around during the experiment. The final part, on performance, was gathered from the properties selected by participants at the end of the task.
SMM development. The first set of codes was used to analyze conversations and interactions within a team during the experiment. It was also used to identify the stages of development of the SMMs. Examples include: sharing/initiating plans, sharing/initiating evaluation, questioning/informing partner of own criteria, questioning/informing of partner's criteria, requesting partner's opinion on actions/decisions, offering reasons behind decisions, and debate. Communication and body movement codes were also used to analyze activities related to how participants within a team navigated each other around their own maps to ensure that the same specific points were being looked at and understood. This showed how participants verbalized spatial as well as textual information to their partners. It also showed how well participants understood the given information and what they thought were crucial facts that they needed to share with their partners. Examples include: general navigation (left, right, square, center), reference to a specific landmark, dividing the screen into quadrants, giving specific driving directions using roads, and giving directions in terms of north, south, east and west. Body movement considered the conscious and subconscious body language used during interactions, even though these gestures could not be seen by the partners. This therefore focused mainly on the non-verbal communication aspect, taken from the video recordings. Example codes include: pointing at one's own computer screen, looking at the screen showing a copy of the partner's screen, other gesturing when talking to the partner, and turning towards the partition screen to talk to the partner on the other side during discussion. This coding allowed analysis to be done with respect to issues such as presence and articulation. For instance, in the condition where participants could see a copy of their partner's screen, the number of times they switched between the two computer screens was recorded. These coding schemes were considered part of SMM development, as partners were exchanging information and ensuring that they both had the same understanding of the maps or representations given.
Performance. At the end of the experiment, participants were asked to complete an answer sheet with their partners and finalize the three properties they would like to rent. They were asked to list them in order of preference and give the reasons for their choices. Because this scenario was designed specifically for this experiment, the marking scheme was designed alongside it, taking into consideration all the criteria given to the participants. This marking scheme was then used to rank all ten properties in order, and scores were given to each property. This enabled the final decisions made by all pairs to be measured and quantified on the same scale. Scores were then given to participants' three most preferred properties. If they managed to select the top three properties in the correct order according to the marking scheme, they were awarded bonus points. These scores were then used to compare performance between pairs in both conditions. However, because the
marking scheme was designed especially for this experiment, the scores were considered non-parametric.
Questionnaire. Subjective data such as satisfaction, ease of communication and the perceived difficulty of the task were taken from the questionnaire responses of all participants. Five-point Likert rating scales were used throughout the questionnaire, which asked participants to rate their agreement with the given statements. Finally, the data were coded into the categories mentioned above, allowing further statistical comparison between participants from the two experimental conditions.
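Before turning to the results, the marking-scheme logic described under Performance can be made concrete. The sketch below scores a pair's three preferred properties against a ranked key and awards bonus points for listing the top three in the correct order; the property point values and the bonus are hypothetical, as the paper does not publish the actual key.

```python
# Hypothetical sketch of the performance marking scheme.
# Property point values and the order bonus are illustrative, not the real key.

PROPERTY_SCORES = {"A": 40, "B": 35, "C": 30, "D": 20, "E": 15,
                   "F": 10, "G": 8, "H": 5, "I": 3, "J": 1}
ORDER_BONUS = 15  # awarded if the top three are listed in the correct order

def score_pair(choices):
    """Score a pair's three preferred properties, listed in order of preference."""
    total = sum(PROPERTY_SCORES[p] for p in choices)
    best_three = sorted(PROPERTY_SCORES, key=PROPERTY_SCORES.get,
                        reverse=True)[:3]
    if list(choices) == best_three:
        total += ORDER_BONUS
    return total

print(score_pair(["A", "B", "C"]))  # 120 with these placeholder values (bonus applied)
print(score_pair(["B", "A", "D"]))  # 95: not the key's top three in order, no bonus
```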
4 Results
Several statistical tests were carried out in order to find differences in SMM development, performance and satisfaction between the two experimental conditions.
SMM development. Table 1 lists the variables that showed significant differences between the two conditions in t-tests.

Table 1. SMM Development Results

Variable                                                                      Condition with higher mean
Initiating/sharing of strategies (t = 2.421; df = 21.65; 2-tailed; p<0.05)    SR
Initiating/sharing of evaluation (t = 2.639; df = 30; 2-tailed; p<0.05)       SR
Giving instructions (t = 2.39; df = 30; 2-tailed; p<0.05)                     SR
Debating (t = 6.92; df = 21.19; 2-tailed; p<0.05)                             SR
Suggesting solutions (t = 2.86; df = 30; 2-tailed; p<0.05)                    SR
Offering reasons (t = 2.68; df = 30; 2-tailed; p<0.05)                        SR
Reading out loud (t = 2.09; df = 30; 2-tailed; p<0.05)                        NSR
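The fractional degrees of freedom in Table 1 (e.g., df = 21.65) suggest that a Welch-type correction was applied where variances were unequal. A minimal sketch of one such comparison on per-participant code counts follows; the counts are invented for illustration.

```python
# Sketch of the per-variable comparison behind Table 1, using Welch's t-test
# (equal_var=False), consistent with the fractional df values reported.
# The code counts below are invented for illustration only.
from scipy import stats

sr_debating  = [12, 15, 9, 14, 11, 16, 13, 10, 12, 15, 9, 14, 11, 16, 13, 10]
nsr_debating = [4, 6, 3, 5, 4, 7, 2, 5, 4, 6, 3, 5, 4, 7, 2, 5]

t, p = stats.ttest_ind(sr_debating, nsr_debating, equal_var=False)
print(f"Debating: t = {t:.2f}, p = {p:.4f} "
      f"({'SR' if t > 0 else 'NSR'} condition has the higher mean)")
```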
Body language. The only significant difference found was in gesturing when giving directions (t = 2.06; df = 30; 2-tailed; p<0.05). The NSR condition had a higher mean for this variable, meaning participants in this group gestured more when verbalizing information to their partners.
Performance. A Mann-Whitney test was performed on the scores given to all teams in both conditions.

Table 2. Performance Results

Condition   Minimum   Maximum   Mean    Std. Deviation
SR          25        120       68.75   28.878
NSR         45        120       88.13   32.176

Three pairs in the no-shared-representation condition scored full marks of 120, whilst one pair from the shared-representation condition scored full marks. However, no significant difference was found.
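A sketch of this non-parametric comparison is shown below; the pair scores are invented, though chosen to match the summary statistics in Table 2.

```python
# Sketch of the non-parametric performance comparison (Table 2).
# Pair scores are invented, consistent with Table 2's min/max/mean values.
from scipy import stats

sr_scores  = [25, 45, 50, 60, 75, 80, 95, 120]    # SR pairs (mean 68.75)
nsr_scores = [45, 60, 70, 85, 85, 120, 120, 120]  # NSR pairs (mean 88.13)

u, p = stats.mannwhitneyu(sr_scores, nsr_scores, alternative="two-sided")
print(f"Mann-Whitney U = {u}, p = {p:.3f}")  # no significant difference expected
```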
Questionnaires. Responses from the questionnaires were also analyzed and compared between the two experimental conditions. T-tests were performed on the collected data. Significant differences were found in only two variables from the questionnaires.

Table 3. Questionnaire Results

Variable                                                                                          Condition with higher mean
"I communicated with my partner articulately" (t = 2.36; df = 30; 2-tailed; p<0.05)               NSR
"I found it easy to navigate my partner to a location" (t = 2.46; df = 25.42; 2-tailed; p<0.05)   SR
Observation. In general, all teams from both conditions started the task by studying each other's given criteria and narrowing down the choices of available properties to suit and compromise each other's preferences. Most pairs narrowed down their search by looking at all the text-based information given and ruling out properties that obviously did not satisfy their criteria. They then carried on to look at the remaining properties; participants often went back to the choices they had ruled out earlier in the experiment when a perfect match was not found in the remaining pile. Many participants in the SR condition wrote down a list of the information their partners had, and requests such as "Can you open the slide with the distance to the bus stop?" were observed throughout the experiment. In the NSR condition, most pairs went through all the information available to each of them first. Specific questions were used when requesting or exchanging information with their partners. Participants in this condition were also forced to give directions to their partners to specific points on the maps, as they had different information on their slides. Directions were requested such as "starting from the yellow line, the motorway, from left to right, can you direct me to where house A is?" or "if you were driving to house A, from the main junction at the bottom left of your map, can you explain how you would get to it?" These were less likely in the SR condition, where partners could see the same pieces of information. Overall, it was observed that SR teams often spent more time on more aspects of a property before making their final selections in comparison with NSR teams.
5 Discussion
The three main hypotheses tested in this study concerned the development of SMMs within teams (including levels of navigation and communication between partners), performance, and the user experience in the
two experimental conditions. The data gathered from the experiment were tested, and some of the variables belonging to each category, as shown in the results section, showed significant differences. By eliminating face-to-face communication and direct sharing of resources, participants were forced to communicate clearly with their partners at various stages of the task. They also communicated about the organization of the task, roles, crucial steps and strategies in solving the problem. Because the given task relied heavily on the use of spatial information and navigation, participants were unable to simply read out information to their partners; hence the importance of shared mental models. It can be seen from the statistical tests performed that the SR group showed more SMM development through their progressive conversations and decision making. Partners in this condition shared more evaluation, strategies and organization of the task, as well as more debates backed up with reasons, than those in the other group. In theory, this should allow the participants in the SR group to form 'clearer' shared mental models with their partners. From the analysis, participants in the SR group spent on average 96% of their total communication with their partners on SMM development, whilst only 69% of the time was spent on this by the other group. Participants in the NSR group spent 30% of their time navigating each other around, as they needed to make sure they could combine their spatial information with each other's, given that they could not view this information themselves. It was also found that in the NSR condition, participants were keen to inform their partners of their current activities, such as what they were looking at or what they were trying to find out. This was less important in the SR condition, as participants could already see what their partners were looking at. Much of the SMM development in the SR condition came in the form of strategic talk, discussion, evaluation and debate. This may be due to the fact that both partners felt they had an equal opportunity to find the solution, as they both technically had all the information. This happened much less in the NSR condition, where SMM development came in the form of navigation. In the NSR condition, the overall team performance relied on how well partners navigated and communicated with each other, as they had no shared visual representation on which to base their mental models. Overall, there was no difference between the performance and the overall difficulty ratings of the two conditions. All teams in both conditions were allowed the same amount of time to complete the task. Satisfaction levels on various aspects differed between the two conditions. Participants in the SR group perceived the information given for the task to be easier to navigate and understand, whilst those in the NSR group had a higher level of satisfaction with how well they communicated with their partners. By observation, it could be seen that the majority of participants in the SR condition paid attention to more aspects when making decisions. Participants in the NSR condition were forced to trust their partners' judgment and the information given to them by their partners. For example, one pair of participants in the NSR group started the task with the participant who could see all the houses telling his partner where they all were. His partner then mapped these houses onto the crime zones and decided by himself to quickly eliminate the houses that fell into the crime zones. On the other hand, participants in the SR condition
preferred to study all houses, considering all factors, before they started the elimination process. It can be concluded from this study that if a virtual team needs to communicate briefly and precisely and make decisions promptly, the use of a shared visual representation between members does not seem to benefit the performance outcome. It is also possible that participants in the SR condition were overloaded by the amount of available information, which included information of low importance and hence hindered them from focusing on the main criteria, whereas without the shared representation, participants were forced to make quick judgments.
References
1. Balme, S., How, J., Utzel, N.: Using Remote Participation Tools to Improve Collaborations. Fusion Engineering and Design 74, 903–907 (2005)
2. Bristol, N.: Shared Mental Models: Conceptualization and Measurement. PhD Thesis, The University of Nottingham (2005)
3. Cannon-Bowers, J.A., Salas, E., Converse, S.: Shared Mental Models in Expert Team Decision Making. In: Castellan Jr., N.J. (ed.) Individual and Group Decision Making, pp. 221–246. Lawrence Erlbaum Associates, Hillsdale (1993)
4. Edward, A., Wilson, J.R.: Implementing Virtual Teams: A Guide to Organizational and Human Factors. Gower, Cornwall (2004)
5. Mathieu, J.E., Heffner, T.S., Goodwin, G.F., Salas, E., Cannon-Bowers, J.A.: The Influence of Shared Mental Models on Team Process and Performance. Journal of Applied Psychology 85(2), 273–283 (2000)
6. Orasanu, J.M., Salas, E.: Team Decision-making in Complex Environments. In: Klein, G.A., Orasanu, J., Calderwood, R., Zsambok, C.E. (eds.) Decision-making in Action, pp. 350–370. Ablex, Norwood (1993)
Harnessing the Power of Multiple Tools to Predict and Mitigate Mental Overload
Charneta Samms, David Jones, Kelly Hale, and Diane Mitchell
U.S. Army Research Laboratory, Human Research and Engineering Directorate, ATTN: AMSRD-ARL-HR-MB, Aberdeen Proving Ground, Maryland 21005-5425
{charneta.samms,diane.k.mitchell}@us.army.mil
Design Interactive, Inc., 1221 E. Broadway, Suite 110, Oviedo, FL 32765
{david,Kelly}@designinteractive.net
Abstract. Predicting the effect of system design decisions on operator performance is challenging, particularly when a system is in the early stages of development. Tools such as the Improved Performance Research Integration Tool (IMPRINT) have been used successfully to predict operator performance by identifying task/design combinations leading to potential mental overload. Another human performance modeling tool, the Multimodal Interface Design Support (MIDS) tool, allows system designers to input their system specifications into the tool to identify points of mental overload and provides multimodal design guidelines that could help mitigate the overload identified. The complementary nature of the two tools was recognized by Army Research Laboratory (ARL) analysts. The ability of IMPRINT to stochastically identify task combinations leading to overload, combined with the power of MIDS to address overload conditions with workload mitigation strategies, led to ARL sponsorship of a proof of concept integration of the two tools. This paper aims to demonstrate the value of performing low-cost prototyping to combine associated technologies and amplify the utility of both systems. The added capabilities of the integrated IMPRINT/MIDS system are presented along with future development plans for the system. Keywords: mental workload, overload, IMPRINT, MIDS, command and control, multimodal, integrated toolset.
1 Introduction
Throughout the system development process, it is important to consider the effect of system design decisions on operator performance. However, predicting this effect is challenging, particularly when a system is in the early stages of development. Tools such as the Improved Performance Research Integration Tool (IMPRINT) and the Multimodal Interface Design Support (MIDS) tool individually provide analysts with
great capability to examine the mental workload of system operators, but with some limitations. By merging their individual strengths, the combination could create a powerful capability to predict and mitigate mental overload. Combining the strengths of sophisticated modeling tools such as IMPRINT and MIDS can provide a great advantage to analysts needing to conduct complex system analyses. In this paper, we describe the process and benefits of performing low-cost prototyping to combine associated technologies, using the IMPRINT and MIDS integration as a case study. The paper concludes by presenting the added capability of the integrated prototype system and planned extensions of the technology.
1.1 Improved Performance Research Integration Tool (IMPRINT)
IMPRINT, developed by the U.S. Army Research Laboratory (ARL), has been used successfully to predict the system performance of conceptual systems as a function of operator performance by predicting the mental workload of the operators and identifying task/design combinations that lead to mental overload [6]. It is a dynamic, stochastic, discrete event simulation tool that enables analysts to represent all of the functions and tasks required to complete a particular mission. When building the model, the analyst must parameterize each task with data, such as the time the task will take to perform and how much mental demand it will place on the operator. All of the functions and tasks are coded to determine the order in which they will be performed, simulating the execution of the mission by the operators. During model execution, IMPRINT calculates the mental workload associated with all of the combinations of tasks that occur during the mission and provides various reports for the analyst to review. With these data, the analyst can examine the mental workload profile of each operator over the whole mission to identify workload peaks and the tasks the operator is performing during those peaks. Due to its stochastic nature, IMPRINT is an excellent tool for identifying potential workload peaks and thereby predicting mental overload; however, once overload is detected, it does not provide guidance on how to change the system design in order to mitigate it. It is currently up to the analyst to draw upon their knowledge of human performance research to determine what should be done to mitigate the identified peaks.
1.2 Multimodal Interface Design Support (MIDS) Tool
Another human performance modeling tool, developed by Design Interactive, Inc., is the Multimodal Interface Design Support (MIDS) tool. It also calculates mental workload, based on a modified version of the Workload Index (W/INDEX) equation [5], but for a slightly different purpose. The MIDS tool allows system designers to input the specifications of tasks and user interface (in the form of task type, display type and mental workload demand) to calculate overall estimated workload and identify points of mental overload. During these overload peaks, MIDS relates the tasks ongoing at those instances to specific multimodal design guidelines and heuristics that could be implemented in the system design to mitigate the overload. While the capability of MIDS to connect mental overload to potential mitigation strategies is powerful, it is focused on a generalized set of Command, Control,
Communications, Computers, Intelligence, Surveillance, and Reconnaissance (C4ISR) tasks, and does not include stochastic task execution.
1.3 The Idea
ARL analysts recognized that the respective strengths of MIDS and IMPRINT could be used to compensate for the constraints of each tool. For the past decade, ARL analysts have used IMPRINT to predict when human operators are likely to experience high workload [1-3]. At the completion of these projects, however, program managers (PMs) requested recommended design modifications that would mitigate the high workload predicted by IMPRINT. Since IMPRINT did not provide this type of guidance, the analysts followed their own techniques for developing design recommendations, which are inherently limited by the experience and knowledge of the analyst. This created an inconsistent process and could lead to contradictory recommendations being provided by different IMPRINT analysts. In attempting to address this problem, the ARL analysts identified the MIDS tool as a potential solution. Because the MIDS tool contains consistent guidelines on how to mitigate mental overload, the analysts could use it to develop mitigation strategies for the PMs. Although the analysts could use the MIDS tool to develop mitigation strategies, it was not the complete solution. While the MIDS tool's capability to predict workload is very similar to IMPRINT's, MIDS is not a stochastic modeling tool and is focused on C4ISR tasks, which meant that analysts would still need to use IMPRINT to better predict mental overload and then use MIDS to obtain the guidelines. Ultimately, this two-stage process requires the entire modeling process to be performed twice and provides an opportunity for discrepancies to occur between the two models, making it difficult to match up states of overload detected in IMPRINT with design guidance provided in MIDS. Therefore, the ARL analysts suggested combining the guideline consistency of MIDS with the stochastic capability of IMPRINT to give analysts a streamlined capability to predict and mitigate mental overload in their system analyses. The next section describes the development of an integration approach and the challenges encountered during the effort.
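Before describing the integration, it helps to make the shared workload model concrete. Both tools descend from W/INDEX, in which the demands of concurrently active tasks are summed per resource channel and penalties are added when tasks conflict on shared channels. The sketch below is a simplified illustration of that general scheme, not the actual IMPRINT or MIDS calculation; all demand values and conflict weights are invented.

```python
# Simplified W/INDEX-style workload sketch: per-channel demands of
# concurrent tasks are summed, and a conflict penalty is added for each
# pair of tasks loading the same channel. Illustrative only; demand
# values and conflict weights are invented.
from itertools import combinations

# Per-task demand on each resource channel (channels not listed are unused).
tasks = {
    "monitor display": {"visual": 5.0, "cognitive": 1.0},
    "radio call":      {"auditory": 3.0, "speech": 2.0, "cognitive": 4.6},
    "enter waypoint":  {"visual": 3.7, "motor": 2.2, "cognitive": 1.2},
}

# Conflict weight applied when two concurrent tasks share a channel.
CONFLICT_WEIGHT = {"visual": 1.0, "auditory": 1.0, "motor": 0.8,
                   "speech": 0.6, "cognitive": 1.2}

def workload(active):
    """Total workload of a set of concurrently active task names."""
    total = 0.0
    for name in active:                    # base demand component
        total += sum(tasks[name].values())
    for a, b in combinations(active, 2):   # pairwise conflict component
        for channel in set(tasks[a]) & set(tasks[b]):
            total += CONFLICT_WEIGHT[channel]
    return total

print(workload(["monitor display", "radio call", "enter waypoint"]))
```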
2 Proof of Concept Integration
After exploring multiple methods to integrate IMPRINT and the MIDS tool, a proof of concept MIDS plug-in was developed that allows IMPRINT to seamlessly pass information to the plug-in, which then presents appropriate design guidelines to analysts.
2.1 Selection of an Integration Approach
Prior to developing a link between IMPRINT and MIDS, multiple approaches were evaluated to determine the optimal method for inputting the task model information into MIDS to drive guideline presentation. Because the goal of this initial integration was a proof of concept system, the evaluation of these approaches was based on: 1) the capability to provide all of the necessary information for MIDS to properly trigger
guidelines associated with times of predicted overload; and 2) the modifications required to IMPRINT and the level of effort associated with integrating each approach. Specifically, the selection criteria targeted an integration approach that highlighted the benefits of integrating MIDS into IMPRINT without significant modifications to IMPRINT. The selected integration method resulted in a MIDS plug-in that accepts a single exported IMPRINT operator workload spreadsheet as input. As seen in Figure 1, developing the required worksheet within IMPRINT is a three-step process of entering mission data, running the simulation, and requesting the report based on the simulation run. If the MIDS plug-in is installed, the system detects this and allows the IMPRINT analyst to request the associated guidelines. The workload resources that were added to the IMPRINT model must then be associated with the standard human information processing resources available within MIDS. Once these steps are completed, MIDS accepts the operator workload spreadsheet developed by IMPRINT as the workload input to drive the extraction of relevant MIDS design guidelines at each point of overload.
Fig. 1. Integrated MIDS/IMPRINT System - Plug-in Approach
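The interchange artifact in this approach is the exported operator workload spreadsheet. The sketch below shows how such a report might be parsed to recover overload events for guideline triggering; the column layout and the threshold value are assumptions, since the actual export format is not documented here.

```python
# Hypothetical parser for an exported IMPRINT operator workload report.
# The column names and overload threshold are assumptions about the export
# format; a small inline example stands in for a real export file.
import csv
import io

REPORT = io.StringIO(
    "clock,task,visual,auditory,motor,speech,cognitive\n"
    "0:01:10,Monitor COP,5.8,1.0,0.0,0.0,4.6\n"
    "0:01:12,Radio call,7.4,5.9,0.0,2.0,8.1\n"
)
THRESHOLD = 7.0  # assumed per-resource overload threshold

for row in csv.DictReader(REPORT):
    for resource in ("visual", "auditory", "motor", "speech", "cognitive"):
        value = float(row[resource])
        if value > THRESHOLD:
            # An overload event the plug-in would match to guidelines.
            print(row["clock"], row["task"], resource, value)
```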
This method provided the MIDS plug-in with all the information needed to trigger guidelines and did not require major changes to IMPRINT.
2.2 Addressing Integration Challenges
Although IMPRINT and MIDS are similar in many ways, variations between the tools posed challenges to their integration.
Timing of Workload Evaluation. Since both tools look at mental workload relating to shifting concurrent tasks over time, the identification of mental overload depends on when mental workload is calculated during the model run. The MIDS tool calculates workload on a second-by-second basis by adding all of the tasks that occur at a given second, regardless of whether or not the tasks have changed (i.e., a new task has begun or a previous task has ended). IMPRINT, however, calculates workload whenever a contributing task starts or stops, not at regular intervals. This allows the analyst to identify mental overload that occurs at any time in the model, unlike the one-second interval dependency of the MIDS approach. The IMPRINT
approach also avoids reporting mental workload for tasks that have not changed. With this in mind, the MIDS plug-in was modified to follow the IMPRINT approach.
Discrepancies in Workload Calculations. Although both tools' workload calculations are based on W/INDEX, MIDS utilizes a modified version of the index to account for separate verbal and spatial cognitive resources, which allows for more directed design guidance. To ensure that the most appropriate calculation was utilized in the integrated tool, a comparative analysis was performed. To evaluate the difference between the workload levels calculated by the two systems, an identical test scenario was created within IMPRINT and MIDS. The workload levels calculated by each system were compared to: 1) determine if the average workload levels varied across systems; and 2) identify any discrepancies in when mental overload would be detected by each tool. The results of this comparison show that there were only negligible average differences between the total and individual resource average workload scores across the two systems (Table 1). There were no occasions where mental overload was detected differently between the two systems due to the workload calculations used. Although there were eight additional cases of overload detected using the MIDS calculation, this was due to task time rounding done in the MIDS tool. In the MIDS tool, task times of less than 1 second are rounded up to 1 second, while IMPRINT calculates at the millisecond level. This caused an extended task to occur concurrently with another task, leading to an erroneous overload condition. This revelation further supported the need to adopt IMPRINT's approach of calculating workload when tasks begin and end instead of at 1-second intervals. The overall results of the comparison suggest that the workload values calculated by IMPRINT could be used to drive the selection of MIDS guidelines.
Matching Mental Resources. Another challenge involved the naming of mental resources within each tool. IMPRINT allows analysts to re-name or create mental resources, while MIDS uses a specific set of mental resources. Since the design guidelines are tied directly to the mental resources, it would be difficult to match up any mental resources different from those used in MIDS. To overcome this challenge, it was determined that analysts would need to map their IMPRINT mental resources to those used within the MIDS plug-in. As shown in Figure 2, once the MIDS plug-in is launched, a "Resource Mapping Request" interface appears that requires users to match each resource in IMPRINT with the resources available within MIDS.

Table 1. MIDS/IMPRINT Comparative Workload Results

Workload Resource   Average Difference   Overload Mismatches
Visual                    0.01                   0
Auditory                  0.01                   0
Motor                    -0.01                   0
Speech                    0.00                   0
Cognitive                -0.40                   8*
Total                    -0.38                   0

*Due to differences in timescales used
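The eight cognitive-channel mismatches in Table 1 come down to the timing scheme: sampling at whole seconds can make a sub-second task appear to overlap a neighbor, while event-driven evaluation recomputes workload only when a task starts or stops. The sketch below illustrates the event-driven approach the plug-in adopted; the task intervals and demand values are invented.

```python
# Sketch of event-driven workload evaluation (the IMPRINT approach the
# plug-in adopted): workload is recomputed only when a task starts or
# stops, never at fixed 1-second intervals. Intervals/demands invented.

# (start time, stop time, demand), in seconds; sub-second precision kept.
tasks = [(0.0, 4.25, 3.0), (1.5, 2.1, 5.5), (2.0, 6.0, 2.0)]

# Collect every instant at which the active task set changes.
events = sorted({t for start, stop, _ in tasks for t in (start, stop)})

for t in events[:-1]:
    # Sum demands of all tasks active at time t (half-open intervals).
    load = sum(d for start, stop, d in tasks if start <= t < stop)
    print(f"t = {t:6.3f} s: workload = {load}")
```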
Fig. 2. Resource Mapping Request Window
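Functionally, the mapping gathered by the Resource Mapping Request interface is a lookup table from analyst-defined IMPRINT resource names to the fixed MIDS resource set. A minimal sketch follows; the IMPRINT-side names are hypothetical.

```python
# Minimal sketch of the resource-mapping step (Figure 2): analyst-defined
# IMPRINT resource names are mapped onto the fixed MIDS resource set.
# The IMPRINT-side names here are hypothetical.

MIDS_RESOURCES = {"visual", "auditory", "motor", "speech", "cognitive"}

# Analyst-supplied mapping, as gathered by the Resource Mapping Request UI.
mapping = {
    "Vision (NVG)":      "visual",
    "Radio Listen":      "auditory",
    "Fine Motor":        "motor",
    "Voice Response":    "speech",
    "Spatial Reasoning": "cognitive",
}

# Validate before the plug-in consumes the workload report.
for imprint_name, mids_name in mapping.items():
    if mids_name not in MIDS_RESOURCES:
        raise ValueError(f"{imprint_name!r} mapped to unknown MIDS "
                         f"resource {mids_name!r}")
print("All IMPRINT resources mapped to valid MIDS resources.")
```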
2.3 The MIDS Plug-in
Because MIDS is designed as a plug-in, IMPRINT must determine whether or not the plug-in is installed on the system. If it is, when an IMPRINT analyst completes the model execution and opens the reports, the reports window provides an option to launch the MIDS plug-in (labeled "MIDS guidelines"), as depicted in Figure 3.
Fig. 3. Modified Report Request Interface
Once the "MIDS Guidelines" box is checked, the analyst selects "OK" and completes the resource mapping discussed earlier; the MIDS plug-in then reads the corresponding IMPRINT report and opens the MIDS interface.
MIDS Plug-in Interface. The MIDS plug-in interface (Figure 4) was designed to ensure that analysts can easily gather design requirements and associate them with points of predicted mental overload.
Fig. 4. MIDS Interface
The MIDS Timeline pane (Figure 4, Pane 1) provides a visual summary of predicted mental overload for each mental resource within the IMPRINT model run. The default setting of the MIDS plug-in shows a timeline of the scenario in which red cross-hatch bars represent conditions of overload and white bars depict conditions where workload is below threshold. Red bars can be selected with a left-click of the mouse, which highlights the section of the timeline associated with the selected overload. When a section of the scenario is highlighted, the MIDS plug-in displays a list of tasks performed at that time and guidelines targeted at reducing the overloaded resources in the lower portion of the interface labeled “Guidelines.” The interface allows the user to change or manipulate the timeline presented in the graph and to access guidelines related to specific instances of overload. The Guidelines pane displays the multimodal guidelines applicable during the user-selected instance of overload in the MIDS Timeline (Figure 4, Panes 2-4). An analyst is able to view guidelines specific to a selected overloaded resource, or general design guidelines not specifically triggered by overload, and the specific tasks to which the guideline(s) apply. The Guideline pane has three subcomponents (a minimal data-flow sketch follows this list):

• Guideline Selection Pane (Pane 2) - This pane lists all the multimodal design guidelines applicable to the instance of overload selected in the MIDS Timeline. Within this display, one tab is presented for each resource overloaded during the selected time instance. For example, if a period of time is selected where only Visual overload is present, then that tab becomes active along with a “General” tab, which is used to present broader-level guidelines for the tasks that are ongoing at that time. Guidelines listed in any tab may be selected to obtain further detail regarding the information provided. When selected, a separate window is created that identifies which of the active tasks (see the Task Presentation Pane below) the guideline is applicable to. In addition, detailed guidance is also displayed in the Guideline Presentation Pane.
• Task Presentation Pane (Pane 3) - When a user selects a time instance from the MIDS Timeline, the Task Presentation Pane is populated with the tasks that are active during that period of the analyzed mission, along with the resource-specific workload demand associated with each.
• Guideline Presentation Pane (Pane 4) - When a guideline is selected in the Guideline Selection Pane, the complete presentation (full text version) of the guideline is presented here. Whenever a guideline is selected and presented in this portion of the display, the “Show Rationale” option is also active, providing a link to a brief explanation of the guideline and its source.
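To make the pane interactions concrete, here is a minimal Python sketch of the underlying lookup: given a selected time instance, find the overloaded resources and return the guidelines tagged to them, together with the active tasks. The data structures, threshold, and task demands are assumptions for illustration only (the two guideline texts are taken from Table 2 below); the real plug-in reads this information from the IMPRINT report.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    start: float
    end: float
    demand: dict  # resource name -> workload demand

# Guidelines keyed by the resource whose overload triggers them; the texts
# follow Table 2, everything else is an illustrative stand-in.
GUIDELINES = {
    "Visual": [(2, "Use congruent pairings of color and position to reduce reaction time.")],
    "Cognitive Verbal": [(21, "Present highest priority verbal task using audio instead of visual input.")],
}
THRESHOLD = 7.0  # assumed per-resource overload threshold

def guidelines_at(t, tasks):
    """Return (active tasks, overloaded resources, triggered guidelines) at time t."""
    active = [tk for tk in tasks if tk.start <= t <= tk.end]
    totals = {}
    for tk in active:
        for res, d in tk.demand.items():
            totals[res] = totals.get(res, 0.0) + d
    overloaded = [res for res, w in totals.items() if w > THRESHOLD]
    triggered = {res: GUIDELINES.get(res, []) for res in overloaded}
    return active, overloaded, triggered

tasks = [Task("Monitor COP", 0, 600, {"Visual": 5.0}),
         Task("Monitor Chat", 100, 500, {"Visual": 3.0, "Cognitive Verbal": 4.0})]
print(guidelines_at(300, tasks))  # Visual totals 8.0 -> guideline 2 is triggered
```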
3 Validation

To validate the utility and applicability of the MIDS plug-in and the guidelines it is intended to provide, a future command and control (C2) concept IMPRINT model was modified to create a 30-minute scenario. The original model was built in IMPRINT to identify potential mental overload issues within a futuristic C2 battalion [4]. The modified model was executed in IMPRINT, and the results were read by the MIDS plug-in to extract all guidelines triggered by the scenario. For this model, 25 unique guidelines were presented that could be used to aid system designers in the redesign process. Table 2 provides an example of guidelines triggered at one instance during the C2 scenario. When guidelines are triggered, the type of overload and the task to which the triggered guideline should be applied are also displayed. Each of the guidelines triggered was reviewed by the MIDS integrators (time and resources did not allow for an independent validation) to ensure it was applicable for the condition of overload it was designed to mitigate. Specifically, the team went through the process of theoretically applying each guideline to the interfaces that were actively in use within the concept IMPRINT model, rating each as “applicable” or “not applicable.” The results of this analysis showed that 100% of the triggered guidelines were applicable at the time that they were triggered, demonstrating the validity of the integrated guidelines.

Table 2. Example of guidelines triggered at a given instance (Clock 1:05:14) during the validation scenario

| Task | Overload | Guidelines Triggered |
|------|----------|----------------------|
| Monitor COP | Visual | 2 Use congruent pairings of color and position to reduce reaction time. |
| | Visual | 3 Use motion to enhance detection of objects in the periphery to overcome poor illumination. |
| | Visual | 4 Precede visual information with an auditory alert tone. Use congruent pairings of pitch and position to reduce reaction time. |
| | Visual | 5 Use vibratory/tactile cues for alerts/warnings. |
| Monitor Chat | Cognitive Verbal | 21 Present highest priority verbal task using audio instead of visual input. |
| | Cognitive Verbal | 22 Present one task at a time: hold lowest priority task in queue until highest priority task is complete. |
| Monitor VoiceNet/radio/intercom | Auditory | 42 Present one auditory task at a time: hold lowest priority verbal task in queue until highest priority task is complete. |
| | Auditory | 30 Keep auditory warning messages simple and short. |
| | Auditory | 42 Present one auditory task at a time: hold lowest priority verbal task in queue until highest priority task is complete. |
| Obtain and Process voice info | Cognitive Verbal | 44 Use written text for conveying detailed, long information. |
| | Cognitive Verbal | 45 Add spatialized audio to aid identification of auditory verbal messages in noisy environments. |
| | Cognitive Verbal | 21 Present highest priority verbal task using audio instead of visual input. |
| Communicates via typing | Cognitive Verbal | 22 Present one task at a time: hold lowest priority task in queue until highest priority task is complete. |
4 Conclusion

The proof-of-concept integration of IMPRINT and MIDS showed the benefit of bringing together the strengths of both tools to address complex issues associated with system design. It also demonstrated a low-cost approach to developing a proof-of-concept integrated toolset that can be used to integrate other tools with complementary capabilities into IMPRINT. This prototype will be the basis of future efforts to enhance IMPRINT to provide analysts with the ability not only to predict and identify areas of mental overload, but also to recommend validated multimodal design changes to minimize overload and enhance system performance. In the future, the goal is to integrate the MIDS plug-in into IMPRINT more seamlessly and to provide detailed reports on identified guidelines related to the associated mental overload. Another planned enhancement is the capability to prioritize all of the identified guidelines and incorporate them back into the IMPRINT model to demonstrate the potential impact those guidelines would have on mental workload if adopted by the system designer. These enhancements will ease the burden on system analysts when providing clear, substantiated design recommendations to their customers. By harnessing the power of these tools, system analysts will have a powerful modeling tool that predicts mental overload and provides tailored human performance research guidance to mitigate overload, thereby helping them develop more successful systems.
References

1. Beideman, L.R., Munro, I., Allender, L.: IMPRINT Modeling for Selected Crusader Research Issues. U.S. Army Research Laboratory, Aberdeen Proving Ground, MD (1999)
2. Little, R., Dahl, S., Plott, B., Wickens, C., Powers, J., Tillman, B., Davilla, D., Hutchins, C.: Crew Reduction in Armored Vehicles Ergonomic Study (CRAVES), ARL-CR-80. Army Research Laboratory, Aberdeen Proving Ground, MD (1993)
3. Mitchell, D.K.: Predicted Impact of an Autonomous Navigation System (ANS) and Crew-Aided Behaviors (CABs) on Soldier Workload and Performance, ARL-TR-4342. U.S. Army Research Laboratory, Aberdeen Proving Ground (2008)
4. Mitchell, D.K., Samms, C.L., Kozycki, R.W., Kilduff, P.W., Swoboda, J.C., Animashaun, A.F.: Soldier Mental Workload, Space Claims, and Information Flow Analysis of the Combined Arms Battalion Headquarters Command and Control Cells, ARL-TR-3861. U.S. Army Research Laboratory, Aberdeen Proving Ground (2006)
5. North, R.A., Riley, V.A.: W/INDEX: A predictive model of operator workload. In: McMillan, G.R. (ed.) Applications of Human Performance Models to System Design. Plenum Press, New York (1989)
6. Improved Performance Research Integration Tool (2009), http://www.arl.army.mil/IMPRINT
Acceptance of E-Invoicing in SMEs

Karl W. Sandberg1, Olof Wahlberg2, and Yan Pan3

1 Mid Sweden University, Institution of Information Technology and Media, 851 70 Sundsvall, Sweden
2 Mid Sweden University, Institution of Social Sciences, 851 70 Sundsvall, Sweden
3 MTO-kompetens, 853 56 Sundsvall, Sweden
{karl.w.sandberg,olof.wahlberg}@miun.se, [email protected]
Abstract. Electronic invoicing (e-invoicing) refers to the sending and receiving of invoices by electronic means. Small and medium-sized enterprises (SMEs) have not accepted e-invoicing to the same extent as large companies and the public sector in Sweden. The purpose of the present study was to gain a better understanding of the acceptance of e-invoicing in SMEs, particularly small businesses, by describing the factors that affect e-invoicing in SMEs in rural areas. The study is part of an ongoing project, “The Digital Age in Rural and Remote Areas” (DARRA). We proposed a research model built on factors found significant in prior research, grouped into four categories: organisational readiness, external pressure, owner/manager characteristics, and perceived benefit. To validate the model, we collected data from owners/managers of SMEs using a survey. The main results of the present study indicate that SMEs are ready to accept e-invoicing. Pressure from customers is considered an important factor for e-invoicing acceptance in SMEs. Furthermore, the SMEs perceive that acceptance of e-invoicing can be beneficial and can lead to increased internal efficiency as well as impact on business processes and relationships. The innovativeness of the owner/manager was also found to influence acceptance of e-invoicing.

Keywords: Acceptance of e-invoicing, SMEs.
1 Introduction

The globalisation of markets, technology and competition has increased businesses’ requirements for flexibility, quality, cost-effectiveness and timeliness. As a way of meeting these requirements, information technology (IT) has transformed the way business is done [6]. Both businesses and consumers are buying and selling goods and services on the Internet or via other electronic networks. The possible advantages and disadvantages of e-invoicing compared to traditional invoicing in paper form are:

Advantages
• Can bring very high automatisation of the processes
• Few errors
• Great benefits for both seller and customer
• Well standardised, with many business line solutions

Disadvantages
• Difficult for smaller businesses, with the exception of some services directed only at them
• Requires large volumes
• Different standards
• High investment and integration costs
Large organizations, companies and municipalities have invested in e-invoicing, but small and medium-sized enterprises (SMEs) are lagging behind in this process [2, 19]. SMEs are unique, with special needs, knowledge and resources when introducing e-invoicing [11]. The public sector in Sweden, including several municipalities, has invested in systems for sending and receiving e-invoices, while many SMEs are still sending their invoices the traditional way. Ballantine, Levy and Powel [4] and Hansemark [13] explain that SMEs have:

• Lack of business and IT strategy
• Limited access to capital resources
• Influence of major customers
• Limited information skills
• An owner/manager who applies a high degree of locus of control in decision making [25]

It has also been shown that IT acceptance decisions in SMEs are typically made by a single owner/manager [11, 24]. Business size and the characteristics of the chief executive officer (CEO) are other important factors affecting IT acceptance in SMEs, especially small businesses. Small businesses are more likely to adopt IT when their CEOs are more innovative, have a positive attitude towards adoption of IT, and possess greater IT knowledge [28]. There is no standard definition of SMEs. This paper follows the European Commission's 2005 definition, which states that SMEs are autonomous, partner or linked enterprises with 10 to 250 employees and a total turnover of less than 50 million EUR or a balance sheet total of less than 43 million EUR [10]. SMEs' lack of business and IT strategy leads to a short-term view of IT acceptance and implementation. The influence of major customers/suppliers usually makes SMEs' approach toward e-commerce acceptance more reactive than proactive, generally doing just enough to meet their customers'/suppliers' needs (Chen and Williams, 1998). The present study focuses on the acceptance of e-invoicing in SMEs from the perspective of the company sending the invoice in B2B. The handling of received invoices is not considered in this study.
2 Research Model

Davis proposed TAM, a model that has been tested in many studies [14]. Lederer et al. [18] summarized sixteen articles that tested the model for different technologies (e.g., ATM, e-mail, Netscape, Access, Internet, Word, and Excel). In their model, they considered beliefs about ease of use and perceived usefulness to be the major factors influencing attitudes toward use, which, in turn, affected intentions to use. Many other studies have attempted to describe the factors influencing IT acceptance in SMEs. For example, Iacovou et al. [15] studied factors influencing the acceptance of IT by SMEs in different industries; they included perceived benefits, organizational readiness, and external pressure. To measure perceived benefits they used awareness of both direct and indirect benefits. Variables measuring organizational readiness were the financial and technological resources. Riemenschneider et al. [24] proposed a combined model using the theory of planned behaviour [1] and TAM. They tested individual models and partially integrated models, and found that the combined model provided a better fit. We propose a research model (Figure 1) based on this earlier research. We identified the factors influencing acceptance of e-invoicing in SMEs that were found significant in prior research and grouped them into four factors: perceived benefits, organisational readiness, external pressure, and owner/manager characteristics (Table 1). Organizational readiness was assessed by including two items about the financial and technological resources that the company may have available, as well as factors dealing with the compatibility and consistency of e-invoicing with the firm’s culture, values, and preferred work practices. Such items were found relevant in other research [5, 8, 23, 26].
Fig. 1. The proposed research model
Table 1. Summary of acceptance factors

| Factor in the current study | Factors in previous studies | Source |
|-----------------------------|-----------------------------|--------|
| Perceived benefits | Advantages and risks | Iacovou et al. (1995); Grandon et al. (2005); Thong (1999) |
| Organizational readiness | Organizational readiness | Iacovou et al. (1995) |
| | Readiness | Chwelos et al. (2001) |
| | Organization | Kuan and Chau (2000) |
| | Organizational readiness | Mehrtens et al. (2001) |
| | Intra/extra organizational factors | Igbaria et al. (1997) |
| External pressure | External pressure | Chwelos et al. (2001); Mehrtens et al. (2001) |
| | Environment | Kuan and Chau (2000) |
| | External competitive pressure | Premkumar and Roberts (1999) |
| Owner/manager characteristics | | Thong et al. (1995); Riemenschneider et al. (2003) |
External pressure was assessed by incorporating five items: competition, social factors, dependency on other firms already using e-invoicing, the industry, and the government.
3 Research Question

The research question we explored was: What factors affect the acceptance of e-invoicing by managers/owners of SMEs, particularly small businesses (fewer than 50 employees)?
4 Methodology

4.1 Subjects

We targeted managers/owners of SMEs from a variety of industries in a rural region of Sweden. In our study, we considered the number of employees the principal criterion in determining whether a firm qualified as an SME, since other categorizations involving revenue, total capital and/or other measures are more difficult to apply and can result in misleading classifications.

4.2 Data Collection

The data were gathered by means of an ongoing survey of SMEs administered during 2009. We identified the company name, a contact person, an e-mail address for that person, a postal address, and a telephone number. The contact person was typically the owner of the business or a manager. Documentation and interviews were used as collection methods for the present study.
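The percentages reported in Section 5 below are simple response proportions over the twenty case companies. A minimal Python sketch of that tally follows, assuming a hypothetical yes/no coding of each interview item; the item names and counts are invented for illustration, not the study's actual data:

```python
# Hypothetical yes/no coding of interview items for the n=20 case companies.
n_companies = 20
yes_counts = {
    "cost savings": 15,
    "pressure from customers": 20,
    "financial readiness": 15,
    "innovativeness": 10,
}

# Response proportion per item, as reported in Section 5 (e.g., 15/20 = 75 %).
for item, yes in yes_counts.items():
    print(f"{item}: {100 * yes / n_companies:.0f} %")
```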
5 Results

In this section we present the main results of our empirical study of the case companies (n=20). For each factor in our research model that is supposed to affect the acceptance of e-invoicing, the major characteristics of the analyses are emphasised. The percentage within parentheses shows the proportion of responses to each question.

5.1 Overview of Perceived Benefits of E-Invoicing

The main advantages of e-invoicing:
• Cost savings (75 %)
• Staff resources can be freed for other tasks (75 %)
• Interest revenues and fewer invoice reminders needed (50 %)
• Reduced paper handling is beneficial for the environment (75 %)
• Using a modern, rational and environmentally friendly technique can be good from a PR perspective (75 %)
• Strengthening of customer relations (50 %)
• Improved customer loyalty (50 %)
• Competitive advantage (50 %)
• Lower error rate (25 %)
• Service for the customers which makes it easier to do business (25 %)
Almost all of the advantages presented in our model were identified by the owners/managers of the companies.

5.2 Overview of External Pressure to Use E-Invoicing

The importance of external pressure:
• Pressure from customers (100 %)
• Competitive pressure (100 %)
The analyses indicate that all companies have considered adopting e-invoicing because customers have requested it, which is consistent with our research model.

5.3 Overview of Organisational Readiness to Use E-Invoicing

The importance of organisational readiness:
• Technological readiness (50 %)
• Financial readiness (75 %)
The level of technological resources differs between the companies, ranging from a high level to sufficient IT utilisation and IT knowledge for using an e-invoicing system. Regarding financial resources, most of the companies have enough financial resources to accept e-invoicing, and they also prioritise this investment.
5.4 Owner/Manager Characteristics

The importance of owner/manager characteristics:

• Innovativeness (50 %)

The level of the owner/manager's innovativeness differs between the companies. In one company the owner/manager is described as very innovative, while in the other companies the owners/managers are described as between innovative and conservative.
6 Discussion

The purpose of the present study was to gain a better understanding of the adoption of e-invoicing in SMEs. The overall result of the study indicated that SMEs are experiencing pressure from their customers to accept e-invoicing. The SMEs also perceive that acceptance of e-invoicing can be beneficial and can lead to increased internal efficiency as well as impact on business processes and relationships. Technologically, the SMEs are ready for acceptance of e-invoicing. Furthermore, technological and financial assistance and coercive methods are important strategies to facilitate the acceptance of e-invoicing in SMEs. All the SMEs in the study perceive advantages with e-invoicing, although they perceive different advantages and risks. The results of this study show that the factors described by Iacovou et al. [15], together with the owner/manager characteristics highlighted by Kuan and Chau [17], have proved useful when describing the factors that affect acceptance of e-invoicing in SMEs. Perceived benefits of e-invoicing are concluded to affect SMEs' acceptance of e-invoicing. Advantages relating to increased internal efficiency, such as cost savings and interest revenues, are identified as important by the SMEs. Advantages that have an impact on business processes and relationships are also identified as important by the SMEs; this advantage has not been explicitly expressed in the literature. The SMEs in the study believe that the advantages of e-invoicing outweigh the risks, and the benefits the companies perceive with adoption of e-invoicing are a factor affecting adoption of e-invoicing in SMEs. The external pressure factor has been found to be the most influential factor affecting adoption of e-business systems in other studies, e.g., [15, 20], and this factor is concluded to be an important factor for e-invoicing adoption in this study as well. The results show that the SMEs in this study have experienced external pressure to adopt e-invoicing, mainly from customers that have requested electronic invoicing and hence made the SMEs consider adopting e-invoicing. The majority of the SMEs also consider the competitive pressure, that is, the e-invoicing capability of the firm's competitors, to be a factor affecting adoption of e-invoicing. It can be noted that these SMEs also consider e-invoicing to be a competitive advantage. The SME that did not identify electronic invoicing as a competitive advantage did not consider the e-invoicing capability of its competitors to be an important factor either. On an organisational level, the results indicate that the organisational readiness for adoption of electronic invoicing was generally sufficiently high for e-invoicing adoption.
The resource poverty of SMEs is widely described in the literature, e.g., by Ballantine et al. [4] and Hansemark [13], but was identified in only one of the cases, where limited financial resources were considered an obstacle to adoption of e-invoicing. In general, the cost of the e-invoicing solution is not concluded to be an obstacle as long as the adoption is well motivated through a good understanding of the benefits of e-invoicing. The technological resources of the SMEs vary, but are generally sufficient to accept and use an e-invoicing solution. The SMEs with a high level of IT knowledge had also already accepted an e-invoicing solution. When discussing the importance of the owner/manager's characteristics for adoption of e-invoicing, it can be concluded that the more innovative the owner/manager is, the more likely it is that the SME accepts e-invoicing. The reason for this is probably that innovative owners/managers are more willing to take the risk of accepting e-invoicing, as stated by Thong and Yap [28].
References

1. Ajzen, I.: The theory of planned behavior. Organizational Behavior and Human Decision Processes 50, 179–211 (1991)
2. Al-Qirim, N.A.Y. (ed.): Electronic Commerce in Small to Medium-Sized Enterprises: Frameworks, Issues and Implications. Idea Group Inc., Hershey (2003)
3. Amor, D.: The E-Business Revolution: Living and Working in an Interconnected World. Prentice Hall, Upper Saddle River (2000)
4. Ballantine, J., Levy, M., Powel, P.: Evaluating information systems in small and medium-sized enterprises: issues and evidence. European Journal of Information Systems 7(4), 241–251 (1998)
5. Beatty, R.C., Shim, J.P., Jones, M.J.: Factors influencing corporate web site adoption: a time-based assessment. Information and Management 38, 337–354 (2001)
6. Burgess, S. (ed.): Managing Information Technology in Small Business: Challenges & Solutions. Idea Group Publishing, Hershey (2002)
7. Chang, M.K., Cheung, W.: Determinants of the intention to use Internet/WWW at work: a confirmatory study. Information and Management 39, 1–14 (2001)
8. Chin, W.W., Gopal, A.: Adoption intention in GSS: relative importance of beliefs. Data Base 26(2-3), 42–64 (1995)
9. Chwelos, P., Benbasat, I., Dexter, A.: Research report: empirical test of an EDI adoption model. Information Systems Research 12(3), 304–321 (2001)
10. EC: The New SME Definition. User Guide and Model Declaration. European Commission (2005)
11. Fillis, I., Johannson, U., Wagner, B.: Factors impacting on e-business adoption and development in the smaller firm. International Journal of Entrepreneurial Behaviour & Research 10(3), 178–191 (2004)
12. Grandon, E.E., Pearson, J.M.: Electronic commerce adoption: an empirical study of small and medium US businesses. Information & Management 42(1), 197–216 (2004)
13. Hansemark, O.C.: The effects of an entrepreneurship programme on need for achievement and locus of control of reinforcement. International Journal of Entrepreneurial Behaviour and Research 4(1), 28–50 (1998)
14. Hendrickson, A.R., Massey, P.D., Cronan, T.P.: On the test-retest reliability of perceived usefulness and perceived ease of use scales. MIS Quarterly, 227–230 (June 1993)
15. Iacovou, A.M., Benbasat, I., Dexter, A.: Electronic data interchange and small organizations: adoption and impact of technology. MIS Quarterly 19(4), 465–485 (1995)
16. Igbaria, M., Zinatelli, N., Cragg, P., Cavaye, A.: Personal computing acceptance factors in small firms: a structural equation model. MIS Quarterly, 279–302 (September 1997)
17. Kuan, K.K.Y., Chau, P.Y.K.: A perception-based model for EDI adoption in small businesses using a technology-organization-environment framework. Information & Management 38(8), 507–521 (2001)
18. Lederer, A.L., Maupin, D.J., Sena, M.P., Zhuang, Y.: The technology acceptance model and the World Wide Web. Decision Support Systems 29, 269–282 (2000)
19. MacGregor, R.C.: Factors associated with formal networking in regional small business: some findings from a study of Swedish SMEs. Journal of Small Business and Enterprise Development 11(1), 60–74 (2004)
20. Mehrtens, J., Cragg, P.B., Mills, A.M.: A model of Internet adoption by SMEs. Information & Management 39(3), 165–176 (2001)
21. Mirchandani, A.A., Motwani, J.: Understanding small business electronic commerce adoption: an empirical analysis. Journal of Computer Information Systems, 70–73 (Spring 2001)
22. Premkumar, G., Roberts, M.: Adoption of new information technologies in rural small businesses. Omega: The International Journal of Management Science 27(4), 467–484 (1999)
23. Premkumar, G., Potter, M.: Adoption of computer aided software engineering (CASE) technology: an innovation adoption perspective. Data Base 26(2-3), 105–123 (1995)
24. Riemenschneider, C.K., Harrison, D.A., Mykytyn, P.P.: Understanding IT adoption decisions in small business: integrating current theories. Information and Management 40, 269–285 (2003)
25. Sandberg, K.W., Millet, P.: Impact of locus of control on how owner-managers perceive network usage and value in a small industrial park in rural Sweden. In: The 8th Uddevalla Symposium & The 8th McGill International Entrepreneurship Conference (2005)
26. Thong, J.Y.L.: Resource constraints and information systems implementation in Singaporean small businesses. Omega 29, 143–156 (2001)
27. Thong, J.Y.L.: An integrated model of information systems adoption in small businesses. Journal of Management Information Systems 15(4), 187–214 (1999)
28. Thong, J.Y.L., Yap, C.S.: CEO characteristics, organizational characteristics and information technology adoption in small businesses. Omega 23(4), 429–442 (1995)
Mental Models in Process Visualization: Could They Indicate the Effectiveness of an Operator’s Training?

Karin Schweizer1, Denise Gramß2, Susi Mühlhausen2, and Birgit Vogel-Heuser3

1 Faculty of Social Sciences, University of Mannheim, [email protected]
2 Faculty of Education, Technical University of Braunschweig, [email protected], [email protected]
3 Embedded Systems, University of Kassel, [email protected]
Abstract. Nowadays, process plant visualization and operation take place without the operator’s physical presence at the technical device. As a consequence, many complex systems must be visualized simultaneously on one or more monitors. Conventional two-dimensional man-machine interfaces hardly meet the requirements imposed by the increasing complexity of production processes. One approach to dealing with the increasing number of faults during process plant monitoring is the creation and implementation of 3D visualizations. We examined the development of mental models with 2D and 3D visualizations and different forms of training (freeze image vs. slider vs. slider with interaction) regarding completeness and structure, as well as the relation between the quality of problem solving and the accurate recognition of critical situations. Additionally, we investigated the mental demand in the different visualization and training groups.
1 Mental Models

Successfully adapting to changing technologies is often a question of developing and generating functioning mental models. Mental models represent actual situations. If researchers are able to explain how people generate those mental models, they are able to explain and design for understanding and reasoning about complex systems. The term mental model has, however, been used in many contexts and for many purposes. Johnson-Laird (1983) [1] proposed mental models as a way of describing the process which humans go through to solve deductive reasoning problems. Gentner and Stevens (1983) [2] use the term to propose that mental models provide humans with information on how physical systems work. Mental models of complex systems represent different types of knowledge [2, 4]. According to Hegarty (1991) [5], there are three different types of knowledge necessary to understand a complex system: knowledge of the basic components of the system, of the possible states and the interrelation of the components, and of the source or origin of problems. Such knowledge is represented in a “mental model” of the system. It is acquired when interacting with the system [4] (see also [3]).
Mental models are assumed to play an important role in enhancing understanding and facilitating interactions, especially when the operator has to anticipate what needs to be done before actually physically interacting with the machine. Norman (1983) [4] assumes that even if a user is not given a conceptual model on which a mental model can be based, he or she tends to develop a model which is likely to be incorrect. Other authors claim that people without a conceptual model rather apply trial-and-error methods, not being able to fit the pieces together [6]. To test these hypotheses, Borgman (1999) [3] investigated whether people can use a system better if they have a correct mental model of it. Her analyses were based on four measures coded from interviews on the mental models: the completeness of the model, the accuracy of the model, the level of abstraction, and the use of a model in approaching the tasks. An important question here is how to measure the effectiveness of mental models [7]. Rowe and Cook (1995) [7] provide an overview of several methods to examine mental models and the predictability of performance. They compared four measures (laddering interview, relatedness rating, diagramming, and think-aloud/verbal troubleshooting) and evaluated their ability to predict performance. The results indicated that the laddering interview and the relatedness rating are adequate techniques to investigate mental models as predictors of performance in a troubleshooting task.
2 Process Visualization

Nowadays, monitoring tasks in industrial processes take place without the operator’s physical presence at the technical device. As a consequence, many complex systems must be visualized simultaneously on one or more monitors [8]. One way to supply the operator with process information in a way adapted to human perception and information reception might be a spatial visualization in terms of three-dimensional process visualization. Smallman et al. (2001) [9] report several benefits of a three-dimensional display compared to a two-dimensional one: increased ecological feasibility, a reduction of users’ mental workload through the integration of spatial dimensions in one representation, and users’ preference for the familiarity and simplicity of three-dimensional displays. However, the authors also point out the risk of ambiguity in three-dimensional displays, which can result in problems with exact position determination. Empirical results regarding the comparison of two- and three-dimensional displays are, however, not unequivocal [11, 9]. Nevertheless, there are several findings that support the assumption of the proximity compatibility principle (PCP) [10]. Thus, the question arises how to generate 3D visualizations which enhance information processing, and whether mental models can indicate the success of those applications.
3 Applying Mental Models in Process Visualization

To examine our research task in terms of an application example, we used a thermo-hydraulic process to produce particle boards. This process is typically monitored by operators in a control room where different information is visualized on various displays. In our view, one possibility to improve the operators’ work task is an alternative form of data presentation on the interface; therefore, we employed graphical displays in 2D and 3D which were developed in an interdisciplinary research project [12].
Fig. 1. (a) 3D display of the distance of the steel bands in the hydraulic press; (b) 2D display of the distance of the steel bands in the hydraulic press (for explanations see text below)
The simulated production process is divided into three sections with various parameters, which were visualized in four graphs on the displays. First, the material arrives on a belt and is weighed. Second, the material is transported into the machine, where it is pressed, heated and compressed between two steel bands. This section was visualized in two graphs: one showed the pressure exerted on the material, and the other visualized the distance between the two steel bands in the hydraulic press, which indicates the thickness of the material. Third, the final product has to be verified at the end of the process. In our experiment, 2D and 3D graphs (a surface plot) presented the same information in different ways. According to the PCP [10], the three-dimensional presentations integrate more information in the graphs in question, while the 2D graphs show it separately. In contrast to the 3D graphs (see Figure 1), the 2D graphs supply two lines to monitor the right- and left-hand sides of the steel bands in the hydraulic press. Deviations of the process data from the correct course are color-coded, either through the complete graph (3D) or on the bottom of the display (2D), and can also be seen in changes of graph characteristics (e.g., the height of the graph).
4 Method

4.1 Hypothesis

Our research hypothesis is divided into two closely connected assumptions. First, we assume that 3D visualizations are more efficient than 2D visualizations in enhancing the operator's understanding of the current process. We also assume that the application of 3D visualization facilitates the operator's work task and thereby reduces his or her mental demand. Second, we hypothesize that 3D visualization and slider training, as well as slider training with interaction, support forming better mental models than 2D visualization and training with freeze images. We therefore further assume that better elaborated mental models enhance the ability to recognize and react when a problem occurs.
4.2 Participants

The participants were 70 students (34 male and 36 female) from different departments at four universities, who were rewarded with credit points or comparable rewards for their participation. Their ages ranged from 18 to 41 (M=23.19, SD=4.055). Most students studied science (32) or engineering (21). Furthermore, students of the humanities (7), computer science (7) and other subjects (3) took part in the experiment. Experimental groups were randomized.

4.3 Experimental Procedure

The study employed five experimental groups which differed in the type of dimensional data visualization and in the kind of training. The fifth group had the additional possibility to interact with the graphs.

Table 1. Experimental groups – each group consisted of 14 participants

|    | Freeze  | Slider  | Slider with interaction |
|----|---------|---------|-------------------------|
| 2D | group 1 | group 2 | –                       |
| 3D | group 3 | group 4 | group 5                 |
In order to train participants, we developed a training program consisting of four different stages (see Section 4.4). After the training, participants were asked to monitor different critical and non-critical situations in two test phases. A critical situation is defined as a deviation from the normal process. Deviations can also occur during a normal process, but these stay within a normal tolerance range. In a critical situation, a corrective intervention as trained before was required. Each problem required an appropriate reaction to correct it. Participants' reactions were to be as fast and as correct as possible. Besides the monitoring task, secondary tasks (naming the critical situation and chatting with colleagues) were integrated. After the test phases, participants were interviewed (see Section 4.5). Finally, questionnaires on presence [13], self-efficacy and mental workload (NASA-TLX) were administered [14]. The whole experiment took about 150 minutes for each individually tested participant.

4.4 Training

According to experimental group, the training differed in the type of visualization (2D vs. 3D) and the kind of training (freeze image vs. slider vs. slider with interaction). As described above, the training was divided into four sections:

1. An audio-visual presentation with a description of the functionality and assembly of the press and the characteristics of selected problems. Participants were taught how a problem can be recognized and how to react when it occurs. Each problem required a different corrective intervention.
2. An exploration phase in which the kind of training (freeze image vs. slider vs. slider with interaction) became relevant. During this part of the training, participants could explore several problems and a normal situation. Participants should learn
how a problem arises and how it is characterized. Self-paced, they could look at different states of the problems and their development out of a normal situation in detail.
   − Training with slider: The slider condition is characterized by the possibility to move across a problem. Participants could move the slider forward or backward, fast or slowly, and could explore the current problem at their own discretion to observe its development in detail. The slider condition was realized for both data visualizations (2D and 3D).
   − Training with slider and interaction: Besides the slider, this condition is supplemented by the possibility to interact with the data visualization. Interaction means the possibility to move the diagram to an arbitrary viewpoint.
   − Training with freeze image: In the freeze image conditions, participants could only explore static pictures of several problem sections. Handling was similar to the slider condition, but while moving within a chosen situation only two or three meaningful points in time were available. This condition was also realized for each kind of data visualization (2D and 3D).
3. A summary of the substantial problem characteristics followed, to consolidate participants' knowledge. Several aspects of each problem were repeated in compressed form.
4. The final training phase resembled the test phase. In this section, participants had to monitor the current process, detect problems and make corrective inputs to solve the occurring problem as trained before. Various critical and non-critical situations were shown. Feedback was provided for each reaction.

4.5 Measurements

Measurement of Mental Models. To measure participants' knowledge, we used a semi-structured interview which combined different methods and was subdivided into two parts. First, a combination of the teach-back method [15] and card sorting was used to examine participants' mental model of the production process, from raw material to final product, when no critical situation arises. Thereby, they had to differentiate between processes, states and constituents, which were characterized by differently colored cards. In contrast to conventional card sorting, we used blank cards that were to be inscribed by the participants. Each card stood for an element of the model and was arranged within the created model. Afterwards, the created model was explained to the interviewer. The questions following the teach-back method were intended to detect illogical or incomplete parts and have them explained further. Afterwards, we asked questions about the diagrams visualized in the monitoring task, their characteristics, and how, in general, a critical situation could be recognized. Additionally, the possibility to draw the diagrams seen was provided. The second part of the interview collected knowledge about problem characteristics. After having named the problems, participants were asked to describe them in detail. Experts evaluated and scored the statements in the interviews based on a manual which includes assessment guidelines. Each agreement was rated with one point. First, the model created during the interview and the explanation of the model were examined considering processes (5 points), constituents (10 points) and states (8 points). Each created model was analyzed for completeness in comparison to the master model. Additionally, structure was evaluated according to the sections of the
system (max. 3 points) and the sections in the press (3 points), which were finally summarized into an overall structure score (6 points). The second part of the interview, concerning problem characteristics, was also evaluated by experts according to assessment guidelines for each problem. We differentiated between two elementary and three complex problems. For the explanation of the complex problems, five points could be achieved for each problem: two for the causes, two for the required reactions, and one additional point for complexity, which means the essential relation characterizing the current problem, e.g., the relation between distance and exerted pressure.

Measurement of Mental Workload – NASA-TLX. Measurement of participants' mental workload was realized with selected items of the NASA Task Load Index [14]. Participants were asked to rate on a scale their effort to solve the task, their visual and mental effort, the experienced time pressure, and the frustration felt during the task.

4.6 Results

Completeness of Mental Models. To analyze the completeness of mental models across experimental groups, a Kruskal-Wallis test was computed. The results showed that the five groups tend to differ in the number of mentions for states (chi-square=6.981; P=.137). In contrast, no hints of group differences in constituents and processes were found. Subsequently, a Mann-Whitney test comparing group 1, with the highest average, and group 5, with the lowest average for states, was calculated. The results indicated a significant difference between the groups (U=53.000; P<.05). The participants who worked with the 2D freeze image named significantly more states of the production process than those using the 3D slider with interaction. No other differences could be found. For the next step, conditions with the same dimensionality were subsumed into one group (2D and 3D), and the fifth group (3D with interaction) was not allocated to either of these groups. Again, a Kruskal-Wallis test for constituents, processes and states was computed. The result indicated a significant difference for states (chi-square=8.571; P<.05). Additionally, Mann-Whitney tests were calculated, whose results showed a significant difference in states (U=233.000; P<.01): participants using 2D named more states than those using 3D with interaction. Also, for both groups a tendency towards a difference in processes could be found (U=303.000; P=.128). A comparison of 3D and 3D with interaction showed a significant difference for states (U=255.000; P<.05). The results for the groups subsumed by dimensionality indicated that 3D with interaction achieved a lower score than 2D and 3D for states in the production process. Groups were also subsumed according to the kind of training (freeze image and slider), again without the fifth group. First, a Kruskal-Wallis test was computed, and the results showed that the groups differ in the number of mentions for states (chi-square=8.739; P<.05), with a trend towards a difference in the number of constituents (chi-square=3.899; P=.142). We calculated Mann-Whitney tests to examine the differences in detail. First, comparing the freeze image condition with the slider group revealed a significant difference concerning constituents (U=274.000; P<.05): participants who were trained with the slider remembered more constituents of the press. Secondly, we
compared the slider and slider-with-interaction conditions. The results indicated a significant difference in the number of states (U=249.000; P<.05) and a tendency towards a difference in the processes named in the created model and its explanation (U=298.000; P=.106). Working with the slider led to higher scores in states and, by trend, in processes. Finally, the examination of freeze image and slider with interaction resulted in a significant difference in the number of mentioned states (U=239.000; P<.01). Again, the group using the slider during training achieved higher scores. Summarizing the results on the completeness of mental models, we can see that states, as a characteristic of a mental model, indicated on the one hand an advantage of 2D and on the other hand a superiority of the slider condition.

Structure of Mental Models. To examine our hypothesis that the structure of mental models differs between experimental groups, we computed the non-parametric Kruskal-Wallis test. The results indicated a tendency towards a difference between experimental conditions concerning sections of the system (chi-square=7.649; P=.105). To compare the various groups we used the Mann-Whitney test. Table 2 shows the significant results (bold) and tendencies we found.

Table 2. Results of Mann-Whitney tests for sections of the system and overall structure
|           | vs. 2D freeze image |                   | vs. 3D freeze image |                   |
|-----------|---------------------|-------------------|---------------------|-------------------|
|           | sections of system  | overall structure | sections of system  | overall structure |
| 2D slider | U=63.000; P=.095    | U=54.500; P<.05   | U=59.500; P=.071    | U=55.500; P<.05   |
| 3D slider | U=64.000; P=.086    | U=63.000; P=.095  | U=65.000; P=.137    | U=59.500; P=.071  |
Significant results were found indicating that the groups that worked with the slider mentioned the sections of the system more often. The fifth group was not different from the other groups. Additionally, a Kruskal-Wallis test was executed for the overall structure score (chi-square=6.143; P=.189). As can be seen in Table 2, only tendencies were found; each time, the slider conditions achieved the higher scores. As before, groups with the same dimensionality (2D and 3D) were subsumed (without group 5). The Kruskal-Wallis test we calculated, however, did not indicate any difference between the visualization conditions, neither for sections of the system nor for the overall structure score. Finally, the various kinds of training were subsumed into groups (slider and freeze image); the 3D slider with interaction was investigated separately. First, a Kruskal-Wallis test was examined, which showed significant differences for sections of the system (chi-square=7.563; P<.05) and the overall structure score (chi-square=6.052; P<.05). For detailed investigation, we computed several Mann-Whitney tests, which indicated a significant difference between freeze image and slider for sections of the system (U=239.000; P<.01) and the overall structure score (U=245.000; P<.05). Comparing freeze image and slider with the fifth experimental group (slider with interaction) showed no differences.
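For readers less familiar with the statistics reported above: the non-parametric tests used throughout this section are available in standard software. The following minimal Python sketch uses invented scores (only the choice of tests mirrors the paper) to show how a Kruskal-Wallis test across several groups and a follow-up Mann-Whitney U test between two of them are computed with scipy.

```python
from scipy import stats

# Hypothetical "states" scores for three training groups (invented numbers,
# 14 participants per group as in Table 1).
freeze = [3, 4, 4, 5, 3, 4, 5, 4, 3, 4, 5, 4, 3, 4]
slider = [5, 6, 5, 7, 6, 5, 6, 7, 5, 6, 6, 5, 7, 6]
slider_interaction = [3, 3, 4, 2, 3, 4, 3, 2, 4, 3, 3, 2, 4, 3]

# Omnibus test across all groups (analogous to the chi-square values reported).
h, p = stats.kruskal(freeze, slider, slider_interaction)
print(f"Kruskal-Wallis: H={h:.3f}, P={p:.3f}")

# Pairwise follow-up (analogous to the U and P values reported).
u, p = stats.mannwhitneyu(freeze, slider, alternative="two-sided")
print(f"Mann-Whitney: U={u:.1f}, P={p:.3f}")
```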
In summary, we can state that participants working with the slider developed a more structured and, in this regard, more detailed mental model than participants who received training with the freeze image or the slider with interaction. Similar results were demonstrated for the overall structure score. The kind of data visualization revealed no differences in structure.

Quality of Problem Solving and Its Relation to Accurate Performance. The next step in investigating mental models was the examination of the quality of problem solving and its relation to accurate performance. To analyze the quality of problem solving, we subsumed the points for the complex problems into one sum score. Accuracy in recognizing complex problems was measured by the hits on complex problems across both test trials. We calculated a Pearson correlation between the overall sum score for complex problems and the hits on complex problems, and found a significant correlation (r=.325; P<.01). Knowledge about problem characteristics was associated with better performance, that is, with the accurate recognition of the complex problems.

Mental Workload. To examine mental workload, we first compared the NASA-TLX ratings of the different experimental groups by means of ANOVA. Only one significant difference resulted, for time pressure (F=4.631; P<.005). A Bonferroni test was computed to test possible differences between single groups. The results indicated that participants working in the 2D slider condition reported stronger feelings of time pressure during the working task than the 3D freeze (P<.01), 3D slider (P=.067) and 3D slider with interaction (P<.05) groups. Thus, all three groups with three-dimensional displays showed lower demand concerning time pressure. Furthermore, the investigation of the sum score of overall workload indicated a tendency (F=1.916; P=.119). A further Bonferroni test revealed a tendency towards a difference between the 2D slider condition and the group working with the 3D slider with interaction (P=.114); participants of group 5 reported a reduced demand. In line with the analysis of mental models, we combined groups with the same dimensionality and analyzed the differences in participants' ratings by means of a t-test (without group 5). The results showed that time pressure (t=3.804; P<.01) differed significantly: working with 2D displays seems to be more demanding than 3D visualization. Additionally, the sum score of workload judgments indicated a tendency towards a difference between groups with different visualizations (t=1.685; P=.098). Thus, 3D visualization can be a relevant factor for operators' relief.
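The correlation and group comparisons reported here follow the same pattern as the earlier sketch; again with invented data, where only the choice of tests mirrors the paper:

```python
from scipy import stats

# Invented per-participant scores, for illustration only.
problem_knowledge = [8, 12, 10, 15, 9, 14, 11, 13, 7, 12]  # complex-problem sum score
hits = [3, 5, 4, 6, 3, 6, 4, 5, 2, 5]                       # complex problems recognized

# Relation between problem knowledge and recognition performance.
r, p = stats.pearsonr(problem_knowledge, hits)
print(f"Pearson correlation: r={r:.3f}, P={p:.3f}")

# Hypothetical NASA-TLX time-pressure ratings for the two pooled
# visualization groups, compared with an independent-samples t-test.
tlx_2d = [70, 65, 80, 75, 72, 68, 77, 74]
tlx_3d = [55, 60, 50, 58, 52, 62, 57, 54]
t, p = stats.ttest_ind(tlx_2d, tlx_3d)
print(f"t-test: t={t:.3f}, P={p:.3f}")
```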
5 Discussion

We intended to show that 3D visualizations with slider and interaction training are more efficient in enhancing operators' understanding of the current process than 2D visualizations under freeze conditions. We further assumed that this enhancement is due to the generation of better mental models than during 2D visualization and training with freeze images. We therefore investigated the mental models of 70 participants in a lab experiment with different forms of visualization (2D vs. 3D) and varying forms of training (slider vs. freeze image). We measured reaction times and error rates, we interviewed the participants, and we employed a questionnaire to measure mental workload (NASA-TLX).
Summarizing the results on the completeness and the structure of mental models, we found that participants working in the slider conditions developed better mental models than participants who worked in the freeze conditions. In contrast to our hypothesis, 3D visualizations did not lead to better mental models than 2D visualizations: we found no reliable differences between the dimensionalities or for the group that worked with interaction. However, the mental workload reported after the experiment indicates that participants experienced higher mental demand in the 2D slider condition. Taking these results together, one could ask whether mental models can indicate the effectiveness of training conditions at all, were it not for the relationship between the quality of problem solving and the correctness of responses: we found a weak but reliable correlation between these two measurements. These are controversial findings at first glance. A further look shows that participants in the 2D slider condition had to integrate different pieces of information that were presented separately while sliding through the various problems. On closer examination of the results, we therefore assume that participants in the 2D groups grappled more with the data visualization, because in this condition it was more difficult to monitor the process. Perhaps this difficulty led to a more intense engagement with the problems and the system; consequently, the mental model was developed further than in the other conditions. One could also argue that the measures we employed to analyze the quality of mental models are not sufficient or reliable. Indeed, as Rowe and Cook (1995) [7] indicated, there is great disunity about suitable measures for mental models. We based some of our measures on the study of Borgman (1999) [3]. We used a semi-structured interview, especially the combination of the teach-back method and card sorting, to examine how participants understood the system. The method allowed the interviewer to detect illogical and incomplete explanations and to inspire participants to rethink. These investigations of mental models provide useful insights into participants' understanding of and reasoning about complex systems, and developing different measurements, as we did in this investigation, might even improve these insights.
References

1. Johnson-Laird, P.N.: Mental Models: Towards a Cognitive Science of Language, Inference and Consciousness. Harvard University Press, Cambridge (1983)
2. Gentner, D., Stevens, A.L. (eds.): Mental Models. Lawrence Erlbaum Associates, Hillsdale (1983)
3. Borgman, C.L.: The user's mental model of an information retrieval system: an experiment on a prototype online catalog. Human-Computer Studies 51, 435–452 (1999)
4. Norman, D.A.: Some observations on mental models. In: Gentner, D., Stevens, A.L. (eds.) Mental Models, pp. 7–14. Lawrence Erlbaum Associates, Hillsdale (1983)
5. Hegarty, M.: Knowledge and processes in mechanical problem solving. In: Sternberg, R.J., Frensch, P.A. (eds.) Complex Problem Solving: Principles and Mechanisms, pp. 253–285 (1991)
6. Moran, T.P.: The command language grammar: a representation for the user of interactive systems. International Journal of Man-Machine Studies 15, 3–50 (1981)
7. Rowe, A.L., Cook, N.J.: Measuring mental models: choosing the right tools for the job. Human Resource Development Quarterly 6, 243–255 (1995)
8. Zeipelt, R., Vogel-Heuser, B.: Nutzen der 3D-Prozessdatenvisualisierung in der industriellen Prozessführung. ATP – Automatisierungstechnische Praxis 45(3), 45–50 (2003)
9. Smallman, H.S., St. John, M., Oonk, H.M.: Information availability in 2D and 3D displays. IEEE Computer Graphics and Applications, 51–56 (2001)
10. Wickens, C.D., Andre, A.D.: Proximity compatibility and information display: effects of color, space, and objectness on information integration. Human Factors 32(1), 61–77 (1990)
11. Baumann, J.D., Blanksteen, S.I., Dennehy, M.: Recognition of descending aircraft in a perspective naval combat display. Journal of Virtual Environments (1997)
12. Vogel-Heuser, B., Schweizer, K., van Burgeler, A., Fuchs, Y., Pantförder, D.: Auswirkungen einer dreidimensionalen Prozessdatenvisualisierung auf die Fehlererkennung. Zeitschrift für Arbeitswissenschaft 1, 23–34 (2007)
13. Gramß, D., Schweizer, K., Mühlhausen, S.: Influence of presence in three-dimensional process control. In: PRESENCE 2008, Proceedings of the 11th International Workshop on Presence, pp. 319–325. CLEUP Cooperativa Libraria Universitaria, Padova (2008)
14. Hart, S.G., Staveland, L.E.: Development of NASA-TLX (Task Load Index): results of empirical and theoretical research. In: Hancock, P.A., Meshkati, N. (eds.) Human Mental Workload, pp. 139–183 (1988)
15. Van der Veer, G.C., Puerta Melguizo, M.C.: Mental models. In: Jacko, J.A., Sears, A. (eds.) The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications, pp. 52–80. Lawrence Erlbaum Associates (2002)
Effects of Report Order on Identification on Multidimensional Stimulus: Color and Shape

I-Hsuan Shen¹ and Kong-King Shieh²

¹ Department of Occupational Therapy, Chang Gung University, 259 Wen-Hwa 1st Road, Kwei-Shan, Tao-Yuan 330, Taiwan
[email protected]
² Department of Healthcare Administration, Oriental Institute of Technology, Taipei, Taiwan
Abstract. Two experiments were conducted to investigate the effects of order of report on multidimensional stimuli under between-subject and within-subject designs. The two orders of report were Order Color/Shape and Order Shape/Color. In the between-subject study, eighteen participants each responded according to one instructed order of report. Results showed that response time for Order Color/Shape was significantly shorter than for Order Shape/Color. Order Color/Shape, which fits the Chinese "adjective then noun" grammar, is more appropriate if people report stimulus attributes in ways consistent with their long-standing language habits. In the within-subject study, however, another group of eleven participants responded according to a task cue that alternated between the two orders. Results showed that the switch cost, as indicated by the increase in response times, was greater for Order Color/Shape than for Order Shape/Color (97 msec vs. 41 msec for response time for the first stimulus dimension; 95 msec vs. 28 msec for total response time). These results did not support the hypothesis that the switch cost would be greater for Order Shape/Color than for Order Color/Shape. The order in which the color attribute should be reported therefore requires careful consideration. Keywords: report order, multidimensional stimulus identification, task switch, color coding.
1 Introduction
The problems of designing displays have been among the most important topics in human factors engineering. Compacting information into a single multidimensional stimulus can be an effective way of utilizing limited display space and reducing clutter [1]. It may also facilitate integration of information [2] and reduce mental workload [3], [4].
1.1 Report Order
If operators search the dimensions in a particular order, it implies that the discriminability of two symbols is determined by the specific order in which features are examined [5]. To develop an optimal symbol set, the designer must take into account the order of the search through the dimensions. Likewise, the order of reporting dimensional values
may play an important role in the accuracy and speed of identifying targets. Shieh and his colleagues [6], [7] investigated the effects of order of report on the speed and accuracy of identifying multidimensional stimuli. They found that subjects responded faster and more accurately if there was a natural, language-appropriate order of reporting the dimensional attributes. The stimuli they used were a subset of the Naval Tactical Display System symbols [8] (see Fig. 1 for the stimulus set). Two dimensions were attached to each symbol: the first dimension was Shape and the second dimension was Part. They found that the order of reporting dimensional values affected the speed and accuracy of identification. Subjects responded faster and more accurately if the order of reporting stimulus-dimension values was appropriate, that is, an order in which long-standing habits based on a standard word order are not violated. Reporting the Part dimension first and then the Shape dimension is more consistent with the "adjective then noun" habit of native speakers of American English or Chinese. For example, reporting a whole circular shape as "full circular" is more natural than reporting it as "circular full," hence giving rise to order-of-report effects. Such results were consistent with the finding of Harris and Haber [9] that performance based on the "adjective then noun" order is better than that based on the reverse order. However, this report-order effect failed to replicate in other studies [10], [11], suggesting that it may not be a robust finding.
Fig. 1. Nine symbols defined by part and shape dimensions. Each dimension had three values.
1.2 Compatibility
The order of report in a multidimensional situation might be considered one type of S-R compatibility. Because reporting the Part dimension first and then the Shape dimension is more consistent with the "adjective then noun" habit of native speakers of American English or Chinese, the order of report effect is a case of S-R compatibility; that is, S-R compatibility is greater for Order Part/Shape than for Order Shape/Part.
1.3 Task Switch
In the task-switching paradigm, subjects are given two tasks. On some trials, subjects switch between the tasks, while on others they repeat the previous task. A robust finding is that when a task is performed on a switch trial (Task B then Task A), there is a sizable decrement in performance, measured as increased reaction time or decreased accuracy, compared with performance of a task that is repeated (Task A then Task A). This decrement is called the switch cost and is measured by reaction time [12], [13]. Gilbert and Shallice [14] suggested that an asymmetry of task-switch cost is obtained when the two tasks demand large differences in top-down control input, as in the classic Stroop tasks: it is easier to switch to the weaker task. The reverse pattern is obtained when the tasks differ principally in the strength of their S-R mapping, as with typical S-R compatibility effects. Consider an experimental paradigm that amalgamates a task-switch design with the two report orders as the two tasks. The appropriate order of report is the order in which long-standing habits based on a standard word order are not violated, and the order of report effect is a case of S-R compatibility; that is, S-R compatibility is greater for the appropriate order than for the other. Based upon the results of the task-switching paradigm, it was assumed that switching from the report order of low S-R compatibility to the report order of high S-R compatibility is easier than switching in the opposite direction. In other words, if the report order of high S-R compatibility is indeed more appropriate, this effectiveness should show up as an asymmetry of task-switch cost: switch cost will be greater for the report order of low S-R compatibility than for the report order of high S-R compatibility.
1.4 Color Dimension
Another important multidimensional stimulus code is color, which has been one of the most studied coding techniques. Color attracts attention, enhances contrast between objects, and even appeals to our esthetic senses [2], [15]. The effect of order of report in multidimensional situations involving color coding has been an ambiguous issue at best. Some studies [1], [16], [17] suggested that color is less vulnerable to order-of-report effects and that memory for color deteriorates less during a retention interval than does memory for other stimulus attributes; hence, reporting color later may have an advantage. Other researchers [18], [19] considered color to be a natural adjective and an attention attractor, which should be used to code important stimulus dimensions and those to be reported first. Shieh and Chen [7] suggested that the order in which the color attribute should be reported depended on the characteristics of the stimulus dimensions. If there was a natural, language-appropriate order of reporting the dimensional attributes, the color attribute should be used as an adjective and reported first. Otherwise, reporting the color attribute later seemed to have an advantage because color is less vulnerable to memory deterioration. The efficiency gains of color coding in processing multidimensional information deserve further empirical study.
1.5 Hypotheses
Both color and shape dimensions were used in the multidimensional stimuli in the present study. If the natural language order effect is consistent, performance for Order Color/Shape will be better. We have two hypotheses: first,
participants respond faster for Order Color/Shape than for Order Shape/Color in both the between-subject and within-subject studies. Second, response times would be longer for Order Shape/Color and for switch trials, and the switch cost would be greater when switching from Order Color/Shape to Order Shape/Color than in the opposite direction.
2 Method
Two experiments were designed to provide the intended information. In Exp. 1, one group reported in color-shape order (Order Color/Shape) and the other group reported in shape-color order (Order Shape/Color); Order Color/Shape is the natural language order. Exp. 2 was designed to investigate performance using a task-switching paradigm; that is, the order of reporting stimulus dimensions was a within-subject factor.
Experiment 1: Between-Subject Design
2.1 Participants
Eighteen male college students between 19 and 26 years old (M = 21.5 yr., SD = 1.9) were tested. All had 18/20 corrected visual acuity or better and normal color vision. The participants were paid for their participation.
2.2 Stimuli
Nine symbols were used as the stimulus set (see Fig. 2). Each of these nine symbols was encoded with two basic dimensions. The first dimension, "color," was yellow, green, or red. The second dimension, "shape," was circular, square, or angular. The nine stimuli were presented against a black background. The height and width of the stimuli were about 1 cm by 1 cm. The luminance of the display symbols was about 35 cd/m² on the black background.
Fig. 2. Nine symbols defined by color and shape dimensions. Each dimension had three values.
2.3 Design A between-subject design was conducted with 18 male participants randomly assigned to the two order conditions. There were 9 participants in each treatment group. The
study evaluated one independent variable, order of report, with two levels. In Order Color/Shape, participants were instructed to report the "color" dimension first and the "shape" dimension second. In Order Shape/Color, the order of report was reversed. A block consisted of 8 random presentations of the 9 symbols. There was one practice block prior to the experiment. Participants completed six blocks of 72 trials each in the experiment, with a 2-min break between blocks.
2.4 Procedure
The symbols were presented one per trial at the center of the display during the identification task. Viewing distance between the participant and the display was approximately 60 cm. Each symbol remained visible until the participant completed the response. Before each trial, the participant fixated on a small cross in the middle of the screen, presented for 0.9 sec prior to the onset of each stimulus. Each symbol was identified on two dimensions according to the instructed order of report for the particular experimental condition. The participants were instructed to identify the symbol presented on each trial by pressing the buttons that defined the symbol on a keypad. Two columns on the keypad, with three buttons in each column, were labeled with the descriptive names. The two columns represented the two stimulus dimensions, and the three buttons in each column represented the three values of that dimension. Participants were instructed to enter the dimensional value of the left column first and then the dimensional value of the right column. The order of report was manipulated by which dimension the left or right column represented. Each participant responded to the two stimulus dimensions using the right index finger.
2.5 Performance Measures and Data Analysis
Behavioral data were averaged separately for each order-of-report condition. Four behavioral measures were collected. Response time for the first stimulus dimension (RT1) was the time between the presentation of a symbol and the subject's correct response to the first dimension. Response time for the second stimulus dimension (RT2) was the time between the subject's identification of the first dimension and the correct response to the second dimension. Total response time (RTT) was the sum of RT1 and RT2. Total percentage correct was 100 times the number of symbols correctly identified divided by the number of symbols presented under each experimental treatment.
2.6 Results
Table 1 shows mean response times for the first and second dimensions, total response time, and total percentage correct for each order of report. The overall mean response time for the first dimension (RT1) was 826 msec. Analysis of variance showed that order of report had a significant effect on RT1: RT1 for Order Color/Shape (735 msec) was significantly shorter (F(1, 16) = 6.32, p < .05) than for Order Shape/Color (916 msec). The overall mean response time for the second dimension (RT2) was 343 msec, much shorter than RT1. RT2 was 298 msec and 388 msec for Order Color/Shape and Order Shape/Color, respectively; this difference was not significant. The overall mean total response time (RTT) was 1169 msec. Results of
analysis of variance showed that order of report had a significant effect on RTT: RTT for Order Color/Shape (1034 msec) was significantly shorter (F(1, 16) = 5.90, p < .05) than for Order Shape/Color (1304 msec). Apparently, RT1 contributed more to RTT than RT2 did. The overall percentage correct was 94.2% (92.6% for Order Color/Shape and 95.8% for Order Shape/Color; see Table 1). Analysis of variance performed on the percentage correct showed no significant effect.
Table 1. Means and Standard Deviations (msec.) for the Four Behavior Measures for Each Report Order
Condition      n    RT1 M (SD)    RT2 M (SD)    RTT M (SD)     % Correct M (SD)
Color/Shape    9    735 (151)     298 (94)      1034 (237)     92.6 (4.8)
Shape/Color    9    916 (153)     388 (118)     1304 (234)     95.8 (3.1)
Grand mean     18   826 (174)     343 (113)     1169 (267)     94.2 (4.2)
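For readers who want to reproduce this kind of analysis, the between-subjects ANOVA on RT1 can be computed as in the following minimal Python sketch. The per-participant values are illustrative placeholders (only the group means of 735 and 916 msec are reported above), so the exact F value will differ from the paper's.

from scipy.stats import f_oneway

# Mean RT1 (msec) per participant, one value per subject (n = 9 per group).
# Illustrative placeholder values, NOT the study's raw data.
rt1_color_shape = [640, 700, 735, 760, 690, 810, 720, 755, 805]
rt1_shape_color = [860, 940, 905, 980, 870, 1010, 890, 925, 864]

# One-way between-subjects ANOVA with df = (1, 16); with two groups this
# is equivalent to the square of an independent-samples t statistic.
f_stat, p_value = f_oneway(rt1_color_shape, rt1_shape_color)
print(f"F(1, 16) = {f_stat:.2f}, p = {p_value:.3f}")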
3 Experiment 2: Within-Subject Design
3.1 Participants
Eleven male college students between 19 and 24 years old (M = 20.8 yr., SD = 1.6), none of whom had participated in Experiment 1, were tested. All had 18/20 corrected visual acuity or better and normal color vision. The participants were paid for their participation.
3.2 Design and Procedure
The experimental stimulus sets and performance measures were the same as those used in Exp. 1, except as noted below. A task cue prior to each presentation of a symbol indicated which task the participant was to perform. For example, if the task cue was 'color/shape', the participant had to report the 'color' dimension first and the 'shape' dimension second by pressing the appropriate buttons on the keypad; if the task cue was 'shape/color', the order of report was the reverse. The task cue changed randomly. On a non-switch trial, the task cue was preceded by the same task cue; on a switch trial, it was preceded by a different task cue. The task cue was presented for 0.9 sec prior to the onset of each stimulus. Each participant completed five blocks; each block consisted of eight random presentations of the nine symbols (a total of 72 trials). The task cue was randomly selected and balanced for each stimulus. There was one practice block prior to the experiment. The independent variables in the analysis formed a 2×2 within-subjects design: order of report (Order Color/Shape or Order Shape/Color) and trial type (switch trial or non-switch trial).
3.3 Results
Tables 2 to 5 summarize the means and standard deviations for the four dependent measures for the two orders of report and the two trial types (switch and non-switch) separately. The switch cost was calculated by subtracting the mean reaction time
on non-switch trials from the corresponding values on switch trials. The overall mean response time for the first dimension (RT1) was 1097 msec. Analysis of variance showed that order of report had a significant effect on RT1: RT1 for Order Color/Shape (1144 msec) was significantly longer (F(1, 10) = 14.49, p < .01) than for Order Shape/Color (1050 msec). The effect of trial type was also statistically significant (F(1, 10) = 12.58, p < .01): RT1 was greater for switch trials (1132 msec) than for non-switch trials (1063 msec). The interaction between order of report and trial type was not significant. The overall mean response time for the second dimension (RT2) was 388 msec, much shorter than RT1. RT2 was 372 msec and 403 msec for Order Color/Shape and Order Shape/Color, respectively; this difference was statistically significant (F(1, 10) = 5.58, p < .05). The effect of trial type and the interaction between order of report and trial type were not significant. The overall mean total response time (RTT) was 1485 msec. Analysis of variance showed that order of report had a significant effect on RTT: RTT for Order Color/Shape (1517 msec) was significantly longer (F(1, 10) = 12.09, p < .01) than for Order Shape/Color (1454 msec). The effect of trial type was also statistically significant (switch trials: 1516 msec; non-switch trials: 1455 msec; F(1, 10) = 11.21, p < .01). The interaction between the two factors was not significant. Analysis of variance performed on the correct rate showed no significant effect of order of report (Order Color/Shape: 80.2%; Order Shape/Color: 80.8%). However, the effect of trial type was statistically significant (switch trials: 77.7%; non-switch trials: 83.2%; F(1, 10) = 6.32, p < .05). The interaction between the two factors was not significant.
Table 2. Means and Standard Deviations (msec.) of RT1 for Each Experimental Condition
Condition      Switch M (SD)    Non-Switch M (SD)    Switch Cost    Overall M (SD)
Color/Shape    1193 (344)       1096 (310)           97             1144 (323)
Shape/Color    1071 (306)       1030 (280)           41             1050 (287)
Overall        1132 (324)       1063 (290)           69             1097 (306)

Table 3. Means and Standard Deviations (msec.) of RT2 for Each Experimental Condition
Condition      Switch M (SD)    Non-Switch M (SD)    Switch Cost    Overall M (SD)
Color/Shape    372 (88)         374 (81)             -2             372 (83)
Shape/Color    398 (100)        410 (113)            -12            403 (105)
Overall        385 (93)         392 (98)             -7             388 (94)
Table 4. Means and Standard Deviations (msec.) of RTT for Each Experimental Condition
Condition      Switch M (SD)    Non-Switch M (SD)    Switch Cost    Overall M (SD)
Color/Shape    1565 (403)       1470 (366)           95             1517 (379)
Shape/Color    1468 (377)       1440 (370)           28             1454 (365)
Overall        1516 (384)       1455 (359)           61             1485 (369)
Table 5. Means and Standard Deviations of Correct Rate for Each Experimental Condition
Condition      Switch M (SD)    Non-Switch M (SD)    Switch Cost    Overall M (SD)
Color/Shape    76.3 (8.6)       84.1 (7.6)           -7.8           80.2 (8.8)
Shape/Color    79.1 (11.2)      82.4 (6.1)           -3.3           80.8 (8.9)
Overall        77.7 (9.8)       83.2 (6.8)           -5.5           80.5 (8.8)
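The switch-cost entries in Tables 2 to 5 follow directly from the subtraction described in Section 3.3; a minimal sketch (the function name is ours, not the authors'):

def switch_cost(mean_rt_switch, mean_rt_nonswitch):
    # Switch cost = mean RT on switch trials minus mean RT on non-switch
    # trials; negative values mean switch trials were actually faster.
    return mean_rt_switch - mean_rt_nonswitch

# Reproducing the RT1 values in Table 2 (msec):
assert switch_cost(1193, 1096) == 97  # Order Color/Shape
assert switch_cost(1071, 1030) == 41  # Order Shape/Color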
4 Discussion and Conclusions
Shieh et al. [6] suggested that subjects respond faster and more accurately if the order of reporting stimulus dimensions is natural-language appropriate. In Exp. 1, the results showed that response time for the first stimulus dimension (RT1) and total response time (RTT) were significantly shorter for Order Color/Shape than for Order Shape/Color. These results suggest that Order Color/Shape, which fits the Chinese "adjective then noun" grammar, is more appropriate, consistent with the view that people report stimulus attributes in ways consistent with their long-standing language habits. The results of Exp. 1 support the findings of Shieh et al. [6] and Shieh and Lai [20], [21]. However, the results of Exp. 2 did not support the natural-language order effect. Comparison of the results of Exps. 1 and 2 showed that the participants' responses in Exp. 2 were significantly slower than those in Exp. 1. Participants in Exp. 1 responded according to a single instructed report order and never had to switch to the other report order. In Exp. 2, however, as contextual requirements changed randomly, previously relevant information became irrelevant and cognitive processes had to be reconfigured to deal with the changing contextual demands, suppressing the previously relevant report order and implementing the currently relevant one. With respect to the effect of task switch in Exp. 2, RT1, RTT, and error rate were significantly greater for switch trials than for non-switch trials. Exp. 2 thus revealed that even a simple switch between two response rules takes individuals extra time to complete. Performance on a more demanding task (the within-subject study) may differ unexpectedly from performance on a simpler task (the between-subject study). In Exp. 1, Order Color/Shape was more appropriate, supporting the view that people report stimulus attributes in ways consistent with their long-standing language habits. However, that report-order effect was not consistent in Exp. 2: RT1 and RTT for Order Color/Shape were significantly longer than for Order Shape/Color. Moreover, switch cost for RT1 and RTT was greater when switching from Order Shape/Color to Order Color/Shape than in the opposite direction. These results are contrary to the findings of Shieh and Shen [11]. Different stimulus sets might be responsible for the different results. The color dimension is an attention attractor and not a spatial attribute like shape or part. Attention research by Treisman [22] clearly showed that some dimensions (such as color or orientation) are preattentively discriminated and have a special status, whereas others do not. With respect to the effect of task switch, switch cost for RT1 and RTT was greater when switching from Order Shape/Color to Order Color/Shape than in the opposite direction. Our second hypothesis, that the switch cost would be greater for Order Shape/Color than for Order
Color/Shape, was not supported by these data. The effects of color coding in processing multidimensional information deserve further empirical study.
References
1. Tsang, P.S., Bates, W.E.: Resource allocation and object displays. In: Proceedings of the Human Factors Society 34th Annual Meeting, pp. 1484–1488. Human Factors Society, Santa Monica (1990)
2. Wickens, C.D., Andre, A.D.: Proximity compatibility and information display: effects of color, space, and objectness on information integration. Human Factors 32, 61–77 (1990)
3. Duncan, J.: Selective attention and the organization of visual information. Journal of Experimental Psychology: General 113, 501–517 (1984)
4. Carswell, C.M., Wickens, C.D.: Mixing and matching lower-level codes for object displays: evidence for two sources of proximity compatibility. Human Factors 38, 1–23 (1996)
5. Fisher, D.L., Tanner, N.S.: Optimal symbol set selection: a semiautomated procedure. Human Factors 34, 79–95 (1992)
6. Shieh, K.K., Lai, C.J., Ellingstad, V.S.: Effects of report order, identification method, and stimulus characteristics on multidimensional stimulus identification. Perceptual and Motor Skills 82, 99–111 (1996)
7. Shieh, K.K., Chen, F.F.: Effects of report order and stimulus type on multidimensional stimulus identification. Perceptual and Motor Skills 95, 783–794 (2002)
8. Osga, G.: An Evaluation of Identification Performance for Raster Scan Generated NTDS Symbology (Final Report). Naval Ocean Systems Center, San Diego (1982)
9. Harris, C.S., Haber, R.N.: Selective attention and coding in visual perception. Journal of Experimental Psychology 65, 328–333 (1963)
10. Shen, I.H., Shieh, K.K., Ko, Y.H.: Event-related potential as a measure of effects of report order and training on identification of multidimensional stimuli. Perceptual and Motor Skills 102, 197–213 (2006)
11. Shieh, K.K., Shen, I.H.: Report order and task switch on multidimensional stimulus identification: a study of event-related brain potential. Perceptual and Motor Skills 102, 905–918 (2006)
12. Allport, A., Styles, E.A., Hsieh, S.: Shifting intentional set: exploring the dynamic control of tasks. In: Umiltà, C., Moscovitch, M. (eds.) Attention and Performance XV: Conscious and Nonconscious Information Processing, pp. 421–452. MIT Press, Cambridge (1994)
13. Rogers, R.D., Monsell, S.: Costs of a predictable switch between simple cognitive tasks. Journal of Experimental Psychology: General 124, 207–231 (1995)
14. Gilbert, S.J., Shallice, T.: Task switching: a PDP model. Cognitive Psychology 44, 297–337 (2002)
15. Silverstein, L.D.: Human factors for color CRT display systems: concepts, methods and research. In: Durrett, H.J. (ed.) Color and the Computer, pp. 26–61. Academic Press, Orlando (1987)
16. Christ, R.E.: Review and analysis of color coding research for visual displays. Human Factors 17, 542–570 (1975)
17. Lappin, J.S.: Attention in the identification of stimuli in complex visual displays. Journal of Experimental Psychology 75, 321–328 (1967)
18. Luder, C.B., Barber, P.J.: Redundant color coding on airborne CRT displays. Human Factors 26, 19–32 (1984)
19. Shiffrin, R.M., Schneider, W.: Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review 84, 127–190 (1977)
20. Shieh, K.K., Lai, C.J.: Effects of practice on the identification of multidimensional stimuli. Perceptual and Motor Skills 83, 435–448 (1996)
21. Shieh, K.K., Lai, C.J.: Multidimensional stimulus identification: instructing subjects in the order of reporting stimulus dimensions. Perceptual and Motor Skills 84, 995–1008 (1997)
22. Treisman, A.: Feature binding, attention and object perception. Philosophical Transactions of the Royal Society, Series B 353, 1295–1306 (1998)
Confidence Bias in Situation Awareness Ketut Sulistyawati and Yoon Ping Chui Centre for Human Factors and Ergonomics School of Mechanical and Aerospace Engineering Nanyang Technological University 50 Nanyang Avenue, Singapore 639798 [email protected], [email protected]
Abstract. In this paper, we explore the concept of confidence bias in Situation Awareness (SA), i.e., in the perception of one's own situational knowledge, a metacognitive aspect of SA. Two studies were conducted to evaluate the nature of confidence bias across present and future status and across individual and team missions, as well as its relation to performance outcome. The results from both studies were consistent. Participants' confidence bias was higher for the future than for the present status, but did not differ significantly across individual and team missions. Participants who had lower confidence bias were found to have better performance. Keywords: situation awareness, confidence bias, meta-cognition.
1 Introduction
Loss of Situation Awareness (SA) is often cited as the leading cause of performance errors in high-risk, dynamic, and complex environments. SA is commonly viewed as knowing what is going on and projecting what is going to happen in the near future. This definition encompasses the time elements of the "present" and the "future". Endsley [1] further broke SA down into three levels, namely perception of elements in the environment (level 1), comprehension of the current situation (level 2), and projection of future status (level 3). The level of an operator's SA can be assessed objectively by comparing the operator's situational knowledge with the actual situation. Various SA assessment tools are derived from this concept, for example, the Situation Awareness Global Assessment Technique (SAGAT) [2], the Situation-Present Assessment Method (SPAM) [3], and the Situation Awareness Control Room Inventory (SACRI) [4]. Subjectively, operators can be asked to provide a self-appraisal of their SA level. This method requires the operators to direct their attention toward themselves and evaluate the extent to which they are aware of the situation. This self-appraisal can be treated as a person's confidence level in his situational knowledge. It can determine the selection of actions, e.g., whether to gather more information or to act immediately. Existing tools to measure self-appraisal of SA include the Situation Awareness Rating Technique (SART) [5], the Situation Awareness Rating Scales (SARS) [6],
and the Crew Awareness Rating Scale (CARS) [7]. There is evidence that self-appraisal is independent of the actual quality of the acquired SA. For instance, in a comparison between SAGAT and SART, no significant correlation [8] or only a moderate correlation [9] was found between the two. Alfredson [10] suggested that the difference between what an operator is aware of (i.e., objective SA) and what he thinks he is aware of (i.e., self-appraisal, subjective SA) is an important SA indicator. Similarly, Nofi [11] suggested that SA is a function of the two measures, with high objective and high subjective scores being the best SA, and low objective and high subjective scores being the worst SA. For example, a person who thinks that he is fully aware of the situation when in fact he is not may decide to act directly and confidently without further assessment of the situation. On the other hand, a person who is aware that he has not gained enough understanding of the situation may decide to assess the situation further before taking actions, or select more conservative actions. In her experiment involving air-to-air combat, Endsley [12] noted that low SA elicited using SAGAT did not necessarily accompany low performance. She suggested that this was because the pilots could modify their behavior and act conservatively when they knew that their own knowledge was incomplete. Research on people's perception of their own knowledge can be traced back to the 1980s. Fischhoff [13] revealed the tendency of people to over-estimate how much they knew about general knowledge. Lichtenstein et al. [14] summarized several studies that investigated overconfidence. It was reported to be higher for difficult items than for easy items in questions related to general knowledge. Overconfidence was also shown in the calibration of future events, although the calibration for future events was somewhat better than for the general-knowledge items. In the context of SA, Lichacz et al. [15] involved 32 individuals in simulations and collected the subjects' SAGAT scores and their confidence in their responses. The results showed that the participants had less over-confidence bias in level 3 than in level 1 and 2 SAGAT answers. Unfortunately, no other study in the SA context has been found, and the results from previous studies are somewhat inconsistent. This paper aims to explore the nature of confidence bias across the present and future elements of SA and across individual and team missions. We hypothesize that there will be differences in confidence bias across present and future status, and across individual and team missions. In addition, we also aim to investigate the relationship between absolute confidence bias and performance outcome. Based on the theoretical propositions described above, it is plausible to hypothesize that participants with lower absolute confidence bias will have better performance. In the following, we present two studies conducted to explore the concept of confidence bias. The first study was conducted in a simulated urban warfare environment, and the second study in a simulated air combat environment. We use the notion of "absolute confidence bias" to represent the difference between self-appraisal and what one is actually aware of. A higher bias is associated with lower SA. Zero confidence bias, or good calibration, means good self-awareness: knowing what one knows and does not know.
2 Study 1
2.1 Method
Task. Urban warfare was simulated using a multiplayer first-person shooter game, Counter-Strike™ (CS). There were two opposing teams in CS, namely counter-terrorists and terrorists. The mission's objective was to eliminate the opponent while inflicting the least injury on one's own team. The team members were assigned the same responsibilities, i.e., to search for and kill the opposing players.
Participants. Thirty-two students aged 18 to 24 years (M = 20.38, SD = 1.70) participated in the study. Their CS experience ranged from 1 to 9 years (M = 3.86, SD = 2.12). They played individual and team missions. Four participants with matched experience levels were involved in each experimental session. Two participants played against each other in the individual missions, and two teams of two participants played against each other in the team mission.
Assessment Tools. A paper-and-pen version of SAGAT was used to objectively assess participants' knowledge of the current and future situation. Examples of SAGAT queries include the location, weapon, activity, and health level of each player; identification of the opponent posing the higher threat; whether an opponent is within weapon reach; and prediction of what will happen within the next 10 seconds, e.g., the projected location, direction of movement, activity, weapon reach, and health level of all players. The participants were also asked to rate their confidence level (high/low) in each answer given to the SAGAT queries. A score for confidence bias was calculated as the average confidence rating across all items minus the proportion of the same items (SAGAT queries) that were answered correctly [16]. The absolute value of the confidence bias score indicates the distance from a well-calibrated, zero-value self-awareness. The performance outcome was assessed by the health level (survivability) and the number of enemies successfully killed.
Set Up. Five networked computers (four for participants and one for the experimenter) were used. The workstations were partitioned so that the participants could not see each other's displays. The participants within the same team communicated using headsets through Skype™. The game and communication were recorded using Fraps™. Four small tables were placed behind the participants for them to answer the SA assessment sheets.
Procedure. The participants were first introduced to the study and briefed on the procedure of the experiment. They were given about five minutes to familiarize themselves with the map used in the game. Each mission was paused twice to administer the SA assessment. During the pause, the participants were asked to turn around and answer the questions, which were placed on the small tables behind them. They took about 1-2 minutes to complete the questions, and then the simulation resumed.
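The confidence bias score described under Assessment Tools can be made concrete with the following minimal sketch. Coding a 'high' rating as 1 and a 'low' rating as 0 is our assumption, since only a binary confidence rating is described above; the example values are hypothetical.

def confidence_bias(confidence_ratings, answers_correct):
    # McGuinness's QUASA-style score [16]: mean confidence rating minus
    # the proportion of the same SAGAT items answered correctly.
    # Positive values indicate over-confidence, negative values
    # under-confidence; the absolute value is the distance from a
    # well-calibrated zero.
    mean_confidence = sum(confidence_ratings) / len(confidence_ratings)
    proportion_correct = sum(answers_correct) / len(answers_correct)
    return mean_confidence - proportion_correct

# Hypothetical participant: 'high' confidence on 4 of 5 queries (0.8),
# correct on 3 of 5 (0.6) -> bias = +0.2, absolute confidence bias = 0.2.
bias = confidence_bias([1, 1, 1, 1, 0], [1, 0, 1, 1, 0])
absolute_bias = abs(bias)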
2.2 Result In this study, the participants were assigned randomly as either terrorists or counterterrorists. A one-way analysis of variance (ANOVA) was performed to determine whether the SA scores were independent of the side to which each subject was assigned. The analysis showed that none of the scores in individual and team missions was affected by the role assignment. The subsequent analyses will not differentiate between the two roles. The means and standard deviations of participants’ absolute confidence bias in individual and team missions are presented in Table 1. Table 1. Means and Standard Deviations for absolute confidence bias (N = 32)
Absolute confidence bias    Individual mission M (SD)    Team mission M (SD)
Present                     .15 (.11)                    .14 (.09)
Future                      .22 (.17)                    .28 (.19)
A 2 × 2 within-subjects ANOVA was performed to evaluate participants' absolute confidence bias for present versus future status and for individual versus team missions. The results showed a significant difference between the absolute confidence bias for the present and future status, F(1,31) = 23.00, p < .001: participants were significantly more calibrated in how much they knew about the present than about future predictions (see Figure 1). There was no significant difference between participants' absolute confidence bias in individual and team missions, F(1,31) = 1.05, p = .31. The interaction effect was also not significant, F(1,31) = 2.16, p = .15.
[Figure: estimated marginal means of absolute confidence bias by SA status (present, future), plotted separately for individual and team missions.]
Fig. 1. 2 x 2 ANOVA of absolute confidence bias (Study 1)
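For completeness, the 2 × 2 within-subjects ANOVA reported above can be run, for example, with the statsmodels package. The sketch below assumes long-format data with one absolute-confidence-bias score per participant and cell; the column names are hypothetical, and this is not the authors' analysis code.

import pandas as pd
from statsmodels.stats.anova import AnovaRM

def run_within_subjects_anova(df: pd.DataFrame):
    # df: one row per participant x SA status x mission type, with
    # hypothetical columns 'participant', 'sa_status' ('present'/'future'),
    # 'mission' ('individual'/'team'), and 'abs_bias'.
    model = AnovaRM(df, depvar='abs_bias', subject='participant',
                    within=['sa_status', 'mission'])
    result = model.fit()
    # The returned table holds F and p values for both main effects
    # and the interaction, matching the statistics reported above.
    return result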
In this study, the participants played against each other, and thus the performance score was tied to the opponent who played in the same game. The existence of active resistance, i.e., opponents directly trying to prevent the team from accomplishing its goals, increases uncertainty: the outcome for a team does not
only depend on its own capability and team processes, but also on the capability and processes of the opponent team. As the players' experience levels within one game were balanced, the losing team in that game was not necessarily worse than the winner of another game. Two observers therefore identified four better and four worse performers from the entire pool of subjects. Independent-samples t tests were conducted to evaluate whether the better and worse performers differed in their absolute confidence bias. In the individual mission, the difference between the better and worse performers was significant, t(6) = -3.27, p < .05. In the team mission, the difference was marginally significant, t(6) = -2.26, p = .06. The results suggest that participants with better performance were more calibrated in their SA (lower absolute confidence bias).
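The independent-samples t tests above can be reproduced in the same spirit; the values below are illustrative placeholders, since only the t statistics are reported in the paper.

from scipy.stats import ttest_ind

# Absolute confidence bias for the four better and four worse performers
# (placeholder values, NOT the study's data; df = 4 + 4 - 2 = 6).
better_performers = [0.08, 0.10, 0.12, 0.09]
worse_performers = [0.18, 0.22, 0.20, 0.25]
t_stat, p_value = ttest_ind(better_performers, worse_performers)
# A negative t, as reported above, means the better performers had
# the lower absolute confidence bias.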
3 Study 2
3.1 Method
Task. An air combat environment was simulated using a PC-based simulation game, Falcon 4.0. Two fighter pilots were involved in each experimental session. The mission's objective was to sweep all adversaries along the assigned navigation route. An additional objective, to reach a designated checkpoint at a specific time, was incorporated so that the participants would meet the pre-programmed enemy aircraft at the targeted location.
Participants. Sixteen military fighter pilots participated in the study. Their ages ranged from 25 to 37 years (M = 29.43, SD = 3.08), with flight experience ranging from 390 to 2600 flying hours (M = 1091, SD = 530). They were assigned to eight teams of two, one as the flight lead and the other as the wingman.
Assessment Tools. As in Study 1, a paper-and-pen version of SAGAT was used to assess participants' knowledge of the current and future situation. Examples of SAGAT queries include determining the location, altitude, heading, and airspeed of one's own aircraft and the bearing, range, and altitude of enemy aircraft (SA level 1); determining whether the aircraft is within the enemy's weapon envelope (SA level 2); and predicting whether they will be in a position to take shots in the next 10 seconds (SA level 3). The participants were also asked to rate their confidence level (high/low) in each answer given to the SAGAT queries. A score of absolute confidence bias was derived as described in Study 1. The performance outcome was measured by the number of times the aircraft was shot down by the enemy (survivability).
Set Up. Two simulator consoles and one control station with a network connection were used. A partition was placed between the two simulator consoles to prevent the subjects from seeing each other's screens during the team missions. A Thrustmaster® HOTAS Cougar Flight Controller, an almost exact replica of the flight control used in F-16 aircraft, was used to control the simulation. Skype™ was used to facilitate team communication, and Fraps™ to record the simulation.
Procedure. After being introduced to the study, the participants were given adequate time to familiarize themselves with the system. The actual missions were paused several times at
specified trigger points for data collection. During each freeze, the participants were asked to turn around and answer the SA assessments placed on the tables behind them. The participants took about 1 to 3 minutes to complete the questions, depending on the number and type of questions asked in each freeze. At the end of the session, the participants were debriefed.
3.2 Result
The means and standard deviations of participants' absolute confidence bias in individual and team missions are presented in Table 2.
Table 2. Means and Standard Deviations for absolute confidence bias (N = 16)
Absolute confidence bias    Individual mission M (SD)    Team mission M (SD)
Level 1                     .10 (.08)                    .17 (.15)
Level 2                     .11 (.07)                    .15 (.17)
Level 3                     .24 (.17)                    .36 (.21)
A 2 × 3 within-subjects ANOVA was performed on participants' absolute confidence bias. The results showed a significant main effect of SA level, F(2,28) = 17.96, p < .001: the absolute confidence bias was significantly higher for level 3 than for level 1 and level 2 SA (see Figure 2). The absolute confidence bias did not differ significantly across individual and team missions, F(1,14) = 3.58, p = .08. The interaction effect was also not significant, F(2,28) = .36, p = .70.
[Figure: estimated marginal means of absolute confidence bias by SA level (1, 2, 3), plotted separately for individual and team missions.]
Fig. 2. 2 x 3 ANOVA of absolute confidence bias (Study 2)
Pearson product-moment correlation coefficients (one-tailed) between the absolute confidence bias and performance scores in the individual and team missions were calculated. The correlations were significant in both the individual mission, r = .63, p < .01, and the team
mission, r = .68, p < .05, indicating that participants with better SA calibration (lower absolute confidence bias) had better survivability (less damage due to enemy shots).
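A minimal sketch of the one-tailed correlation analysis follows. Note that scipy reports a two-tailed p value, so halving it is a common way to obtain the one-tailed value when the direction is predicted in advance; that this was the authors' exact procedure is our assumption, as it is not spelled out above.

from scipy.stats import pearsonr

def one_tailed_pearson(abs_bias, shots_down):
    # One value per pilot: absolute confidence bias and the number of
    # times the aircraft was shot down (the survivability measure).
    r, p_two_tailed = pearsonr(abs_bias, shots_down)
    return r, p_two_tailed / 2.0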
4 Discussion
The results from the two studies were consistent. The effect of SA level on confidence bias was significant: participants in both studies had a higher confidence bias in their responses about the future (level 3 SA) than about the present situation (level 1 and 2 SA). In other words, they were more calibrated in what they thought they knew about the present than about the future situation. This contradicts the similar study by Lichacz et al. [15], who reported that participants had less confidence bias in level 3 than in level 1 and 2 SAGAT answers. However, it is somewhat in line with Lichtenstein et al. [14], who reported that confidence bias was higher for difficult items than for easy items. SA probes about the present situation required the participants to report specific information such as current location, flight parameters, and the location of the enemy. The participants usually knew when they did not report this information accurately and reported low confidence when they were unsure about the accuracy of their answer. On the other hand, it was harder to gauge whether their prediction of what was going to happen would be correct, resulting in poorer judgment of the accuracy of this future knowledge. The impact of mission type (individual or team) on confidence bias was not significant in either study. Although the effect was not statistically significant, Figure 2 suggests that we cannot totally dismiss this factor. As things get more complicated in a team mission, due to the presence of a teammate and more enemies, the confidence bias in team missions might well be higher than in individual missions. Further studies to explore this issue are warranted. With respect to performance outcome, both studies showed that better performing individuals and teams had significantly better calibration of what they knew (lower confidence bias), consistent with the hypothesis. When people are over-confident, they prematurely close off the search for evidence, feeling that they "know the truth"; they are less likely to seek additional needed information and confidently make decisions and take actions that are prone to error. Overconfidence in one's own ability is dangerous, as one is willing to take higher risks, thinking one is doing better than one really is, and does not hesitate to take an aggressive approach when the situation should in fact be handled with greater caution. For instance, as observed in the second study, some of the fighter pilots over-confidently continued with an enemy engagement because they thought they knew the situation well, when in fact they did not. This poor decision resulted in low survivability (being shot by the enemy). A less over-confident pilot would rather abort the engagement and plan to retarget. On the other hand, people who were under-confident mainly showed low situational knowledge, and their correctly guessed answers to SAGAT questions were mainly due to luck. Under-confidence can make people more cautious and possibly hesitant to commit to decisions and actions, resulting in less chance of winning the mission. Finally, we also noted that the distribution of confidence bias in Study 1 ranged from under-confident to over-confident, reflecting that some participants felt less confident about what they knew while others thought that they knew more than they actually
did. Participants who were not sure of the answers to SAGAT queries were allowed to write down their guess and were asked to rate "low" on the respective confidence level. As a result, under-confidence bias partly reflects the number of guessed answers given by the participants. In Study 2, however, we found that the pilots were rarely under-confident in their responses, which can be attributed to the nature of pilots, who tend to be highly confident in what they know and can do. This finding provides some insight into how job nature and personality might affect confidence bias. Such individual differences might also explain the inconsistency between the findings of the two studies reported here and the study by Lichacz et al. [15]. In summary, this paper provides a contribution to the fundamental theory of SA, specifically regarding confidence bias toward one's own situational knowledge, which has received little research attention so far. Given the significant impact of confidence bias on performance outcome, future studies to better understand this concept are necessary.
References
1. Endsley, M.R.: Toward a theory of situation awareness in dynamic systems. Human Factors 37, 32–64 (1995)
2. Endsley, M.R.: Situation awareness global assessment technique (SAGAT). In: Proceedings of the IEEE National Aerospace and Electronics Conference, pp. 789–795 (1988)
3. Durso, F.T., Hackworth, C.A., Truitt, T., Crutchfield, J., Manning, C.A.: Situation awareness as a predictor of performance in en route air traffic controllers. Air Traffic Control Quarterly 6, 1–20 (1998)
4. Hogg, D.N., Folleso, K., Strand-Volden, F., Torralba, B.: Development of a situation awareness measure to evaluate advanced alarm systems in nuclear power plant control rooms. Ergonomics 38, 2394–2413 (1995)
5. Taylor, R.M.: Situational awareness rating technique (SART): the development of a tool for aircrew systems design. In: Situational Awareness in Aerospace Operations (AGARD-CP-478). NATO-AGARD, Neuilly-sur-Seine, France (1990)
6. Waag, W.L., Houck, M.R.: Tools for assessing situational awareness in an operational fighter environment. Aviation, Space, and Environmental Medicine 65, A13–A19 (1994)
7. McGuinness, B., Foy, L.: A subjective measure of SA: the Crew Awareness Rating Scale (CARS). In: Human Performance, Situation Awareness and Automation Conference, Savannah, Georgia (2000)
8. Endsley, M.R., Selcon, S.J., Hardiman, T.D., Croft, D.G.: A comparative analysis of SAGAT and SART for evaluations of situation awareness. In: Proceedings of the 42nd Annual Meeting of the Human Factors and Ergonomics Society, pp. 82–86. Human Factors and Ergonomics Society, Santa Monica (1998)
9. Endsley, M.R., Sollenberger, R., Stein, E.: Situation awareness: a comparison of measures. In: Human Performance, Situation Awareness and Automation: User Centered Design for the New Millennium Conference. SA Technologies, Inc., Savannah, GA (2000)
10. Alfredson, J.: Differences in Situational Awareness and How to Manage Them in Development of Complex Systems. Linköping University, Linköping (2007)
11. Nofi, A.A.: Defining and Measuring Shared Situation Awareness. Center for Naval Analyses (2000)
12. Endsley, M.R.: Predictive utility of an objective measure of situation awareness. In: Proceedings of the Human Factors Society 34th Annual Meeting, pp. 41–45. Human Factors and Ergonomics Society, Orlando (1990)
13. Fischhoff, B.: Perceived informativeness of facts. Journal of Experimental Psychology: Human Perception and Performance 3, 349–358 (1977)
14. Lichtenstein, S., Fischhoff, B., Phillips, L.D.: Calibration of probabilities: the state of the art to 1980. In: Kahneman, D., Slovic, P., Tversky, A. (eds.) Judgment under Uncertainty: Heuristics and Biases, pp. 3–20. Cambridge University Press, New York (1982)
15. Lichacz, F.M.J., Cain, B., Patel, S.: Calibration of confidence in situation awareness queries. In: Proceedings of the 47th Annual Meeting of the Human Factors and Ergonomics Society, pp. 222–226. Human Factors and Ergonomics Society, Santa Monica (2003)
16. McGuinness, B.: Quantitative analysis of situational awareness (QUASA): applying signal detection theory to true/false probes and self-ratings. In: 9th International Command and Control Research and Technology Symposium: The Power of Information Age Concepts and Technologies, San Diego, California (2004)
Tactical Reconnaissance Using Groups of Partly Autonomous UGVs Peter Svenmarck, Dennis Andersson, Björn Lindahl, Johan Hedström, and Patrik Lif Swedish Defence Research Agency (FOI) Box 1165, SE-58111 Linköping, Sweden {peter.svenmarck,dennis.andersson,bjorn.lindahl, johan.hedstrom,patrik.lif}@foi.se
Abstract. This paper investigates how one operator can control a multi-robot system for tactical reconnaissance using partly autonomous UGVs. Instead of controlling individual UGVs, the operator uses supervisory control to allocate partly autonomous UGVs into suitable groups and define areas for search. A state-of-the-art pursuit-evasion algorithm then performs the detailed control of the available UGVs. The supervisory control was evaluated by letting subjects control either six or twelve UGVs for tactical reconnaissance along the route of advance of a convoy traveling through an urban environment with mobile threats. The results show that increasing the number of UGVs improves the subjects' situation awareness, increases the number of threats that are detected, and reduces the number of hits on the convoy. More importantly, these benefits were achieved without any increase in mental workload. The results support the common belief in autonomous functions as an approach to reducing the operator-to-vehicle ratio in military applications. Keywords: Supervisory Control, UGV, Operator-to-Vehicle Ratio, Reconnaissance, Multi-Robot Systems.
1 Introduction
Unmanned robotic systems are increasingly used in military operations to reduce the risk to military personnel. Typically, unmanned robotic systems are used for strategic intelligence, surveillance, and reconnaissance (ISR). There is, however, recent interest in also using unmanned systems in tactical situations where manned and unmanned systems operate together as a team [1], [2]. For example, unmanned robotic systems may provide critical information while the manned systems remain in cover or outside lethal range. In urban combat, this may mean that unmanned ground vehicles (UGVs) are used to peek around corners or survey areas from which the opponent may approach [3]. Tactical reconnaissance is especially important in urban environments, where the defender usually has an advantage. Each robotic system in military applications is often controlled by one or a few operators to avoid problems of information overload. This operator-to-vehicle ratio hampers the benefit of robotic systems in military applications, where ideally one
operator should be able to control a multi-robot system without being overloaded [4]. Simply allowing the operator to control multiple robots is usually not sufficient, since the operator quickly becomes overloaded even when only performing a basic navigation task [5]. Introducing additional robots may in fact decrease target detection performance in an urban search and rescue task [4], or decrease the operator's monitoring performance [6]. One approach to reducing the operator-to-vehicle ratio is to develop better interfaces that allow operators to switch attention between robotic systems more effectively. A better interface may improve performance by supporting interruption management when switching between robots [7], executing the switch [8], or regaining situation awareness since the last interaction with the robot [9]. Another approach to reducing the operator-to-vehicle ratio is to introduce autonomous functions that can perform low-level control tasks without operator intervention. For example, Crandall et al. [10] show how increasing levels of autonomy for navigation can drastically increase the time between operator interventions. Higher levels of autonomy for navigation, in the form of path planning, allow better utilization of UGVs for reaching inspection points [11]. Preferably, the autonomous function should be able to coordinate multiple robots instead of merely providing individual autonomy for each robot [12]. The operator can then focus on supervisory control by initiating the autonomous function in a way that is appropriate for the situation and allowing the robots to perform the low-level synchronization and coordination. For example, Wang and Lewis [12] show how mixed-initiative control of three partly autonomous search and rescue robots resulted in higher performance than both manual and fully automated control. Although the robots autonomously avoided duplicating search efforts, the subjects could intervene any time they preferred and assume manual control or redirect a robot. Similarly, Parasuraman et al. [13] found that delegation-type interfaces, where subjects can choose manual control or autonomous modes, improve performance in the RoboFlag simulation environment compared to using only manual control or only autonomous modes. In their study, the subjects controlled between four and eight robots using the delegation-type interface. See Wang and Lewis [12] for further examples of supervisory control of multi-robot systems. The results by Wang and Lewis [12] and Parasuraman et al. [13] are promising for reducing the operator-to-vehicle ratio. However, more studies are needed to evaluate supervisory control of multi-robot systems in military applications, particularly regarding how interdependencies between levels of control tasks affect the supervisory control of partly autonomous functions. For example, Hollnagel [14] and Hollnagel and Woods [15] characterize the task complexities facing human-robot systems as requiring four hierarchical layers of control, where (1) tracking keeps the system within predetermined performance boundaries, (2) regulating achieves short-term goals, such as specific maneuvers, (3) monitoring of the system state relative to the environment serves the initiation of action or goal setting for the tracking layer, and finally (4) targeting achieves mission goals. Additionally, there are interdependencies between the levels of control, where tracking may provide input that is necessary for the situation assessment required for targeting.
At all levels there are also, typically, interdependent tasks for control of the robot and the robot’s sensors. An understanding of these interdependencies is important to avoid problems of automation surprises and out-of-the-loop performance when partly autonomous robotic functions are introduced. Therefore, the poorer performance with higher levels of autonomy for navigation of ground robots
may be due to the difficulty of finding a suitable division of subtasks [12]. Further, Wang and Lewis [12] discuss how tasks that decompose more readily into weakly related subtasks may be more suitable for higher levels of autonomy; for example, search tasks may be partitioned into navigation and perceptual subtasks. Similarly, automated systems are usually most beneficial for those aspects that do not require much interaction between the human and the machine [16]. The present study investigates how supervisory control of multi-robot systems can be applied to tactical reconnaissance using groups of partly autonomous UGVs. The study was mainly intended to evaluate how an autonomous search function affects the operator-to-vehicle ratio, since search may be a suitable task for higher levels of autonomy. Increasing the number of partly autonomous vehicles should therefore provide additional functionality without significantly changing the operator's task, at least when there are weak interdependencies between the operator's task and the autonomous search function.
2 Method

2.1 Participants

Twelve paid subjects, ten male and two female students from Linköping University, participated in the study. All subjects had normal or corrected-to-normal vision.

2.2 Apparatus

A PC workstation with a dual Intel® Core 2 Duo CPU at 2.66 GHz and 2.0 GB RAM, with a 20" TFT monitor, was used to run the simulation in the Man-System-Interaction laboratory at FOI, Linköping, Sweden. A standard keyboard and mouse were used to control the simulator interface.

2.3 Design and Stimuli

A simulated urban environment was used to create the types of reduced line-of-sight found in military operations in urban terrain (MOUT). The task was to escort a convoy of 15 vehicles that traveled through the urban environment along a path about 1.7 km long. Since the convoy traveled at about 18 km/h, the total time until the last convoy unit arrived at the destination was about 5 ½ minutes. The interface allowed the subjects to manage the groups of UGVs, monitor the progress of the convoy, view the position of the UGVs, designate search areas, and view the position of threats within the UGVs' detection range. Along the convoy's route of advance there were 28 mobile threats, which moved along fixed paths that either crossed or came near the convoy's route of advance. The participants were only informed that there were mobile threats in the area, but not exactly where, nor that they moved along fixed paths. Once a convoy unit was within firing range of a threat, it received a hit. The convoy unit's icon then changed color to indicate that the convoy was attacked. The threat then continued to fire on additional nearby convoy units until it was outside of firing range or the convoy units were in
Tactical Reconnaissance Using Groups of Partly Autonomous UGVs
329
cover from urban structures. The possibility of multiple hits was intended to make the convoy sensitive to the total exposure time to a threat. Participants had either six or twelve partly autonomous UGVs available for tactical reconnaissance along the convoy's route of advance. Further, a state-of-the-art pursuit-evasion algorithm was implemented to enable the autonomous search function of the UGVs [17]. The algorithm was based on a representation where the urban environment was collapsed into convex cells such that any threat within a cell is within detection range and line-of-sight of a UGV in that cell. This makes the algorithm applicable to large complex environments, as in the present study. The convex cells were derived from a node-link network that describes acceptable trajectories. From a start node, nodes within line-of-sight and within the UGVs' detection range were collapsed into a convex cell. The process was then repeated from a node outside the convex cell until all nodes had been investigated. The pursuit-evasion algorithm was designed to minimize the expected time to capture the evader rather than to guarantee capture. A short time to capture was important since the convoy does not wait for the area to be cleared of threats. Initially, the evader was assumed to be in any convex cell with equal probability. The probabilities were then updated depending on how the evader could move between adjacent cells. The most suitable destination for a UGV was selected using a heuristic that minimizes the expected entropy cost over five consecutive cells. This heuristic had the shortest time to capture in the tests by Hollinger et al. [17]. An A* path-finding algorithm was then used to navigate the UGV to the destination (a simplified sketch of the search logic is given at the end of this section).

The autonomous function enabled participants to define areas for search along the convoy's route of advance. Groups of UGVs could thereby search user-defined areas without intervention. At the start of a trial, the UGVs were positioned near the convoy's starting position and were not allocated to any group. The subjects could then allocate UGVs into suitable groups by dragging icons of the UGVs in a pane on the left side of the screen. A maximum of six color-coded groups were available in both experimental conditions, irrespective of the number of available UGVs. The UGVs could be reallocated between groups at any time during a trial. The groups of UGVs were controlled by selecting a group in the control pane, designating a search area on the map by drawing a search area box, and then confirming the search by hitting the spacebar. The convoy started moving when the first search area was confirmed. A search area box could be drawn at any time during a trial. However, the UGVs only went to the new search area after they had reached their current destination. Some care was therefore needed when choosing search areas to avoid unnecessary movement by the UGVs over long distances. Any time a threat was within detection range of a UGV, the threat's position was shown on the map. Further, the UGV's icon in the control pane changed to red, and an auditory signal was presented. The stand-off detection of threats may seem far-fetched, but technologies are currently being developed for detection of concealed weapons and explosives from more than 10 m. Any currently detected threat could be neutralized by clicking on the threat's icon on the map with the right mouse button. The threat then disappeared. However, a new threat was introduced on the same fixed path after some delay.
The delay was set so that the threat could approach the convoy again before it had passed. A threatening area could therefore not be left unsupervised until after the convoy had passed. Although neither the
behavior of the convoy and threats nor the UGVs' autonomous search was optimal in any way, the overall task appears to be representative of the asymmetric threats that are common in current conflicts.

The design included controlling either six or twelve UGVs, with the order of presentation counterbalanced between participants. Each number of UGVs was controlled twice in two blocked trials. Thus, the experiment used a within-subjects design with two levels of the number of UGVs (six or twelve) and two trials, where the trials were nested within the number of UGVs. Finally, after completing each trial, the participants completed three questionnaires: mental workload using the National Aeronautics and Space Administration (NASA) Task Load Index (TLX) [18], situation awareness using the three-dimensional (3-D) Situation Awareness Rating Technique (SART) [19], and trust in the autonomous search in terms of overall trust and three components of trust, that is, predictability, capability, and reliability [20][21]. All questions were rated on a 7-point scale.

2.4 Procedure

The participants were informed about the experiment's purpose and allowed to ask clarifying questions. They were then instructed on how to use the keyboard and mouse to control presentation of the map, allocate UGVs into groups, designate search areas, monitor the progress of the convoy, and neutralize threats. All participants were then trained for about 15 minutes in two scenarios, first with six and then with twelve UGVs available. Before beginning a trial, participants were allowed to view a printed map of the area and were encouraged to reflect on how to use the available UGVs. Since the experiment was fairly fast-paced, the participants were encouraged to draw all initial search area boxes first and then confirm them one by one.
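To make the search function concrete, the following is a minimal sketch of the belief update and destination heuristic described in Sect. 2.3, loosely after Hollinger et al. [17]. The cell graph, the evader motion model, and the mass-covering lookahead (used here in place of the expected entropy cost) are illustrative assumptions, not the implementation used in the simulator; actual navigation to the chosen cells would use A*.

```python
def diffuse(belief, adjacency):
    """One step of the evader model: probability mass in each convex
    cell spreads uniformly over the cell and its adjacent cells."""
    nxt = dict.fromkeys(belief, 0.0)
    for cell, p in belief.items():
        options = [cell] + adjacency[cell]
        for o in options:
            nxt[o] += p / len(options)
    return nxt

def best_cell_chain(belief, adjacency, start, depth=5):
    """Enumerate chains of up to `depth` adjacent cells from `start` and
    return the chain covering the most probability mass, a crude stand-in
    for minimizing expected entropy cost over five consecutive cells."""
    best, best_mass = [start], belief[start]
    stack = [([start], belief[start])]
    while stack:
        path, mass = stack.pop()
        if mass > best_mass:
            best, best_mass = path, mass
        if len(path) < depth:
            for nb in adjacency[path[-1]]:
                if nb not in path:          # no revisits within the lookahead
                    stack.append((path + [nb], mass + belief[nb]))
    return best

# Toy environment: four convex cells in a line, evader equally likely anywhere.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
belief = {c: 1.0 / len(adjacency) for c in adjacency}
belief = diffuse(belief, adjacency)          # evader may have moved
print(best_cell_chain(belief, adjacency, start=0))  # -> [0, 1, 2, 3]
```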
3 Results

A repeated measures ANOVA shows that the convoy received significantly more hits when the subjects controlled only six UGVs than when they controlled twelve UGVs (F(1, 11) = 18.6, p < 0.01). Figure 1 shows that the convoy received a mean of 233 hits with six UGVs and 176 hits with twelve UGVs. The convoy thus received about 24 % fewer hits when subjects controlled twelve UGVs instead of six. The high number of hits shows that the subjects were not consistently able to neutralize the threats before they approached the convoy. However, the subjects were still reasonably successful, since the convoy could have received as many as 496 hits if no threats had been neutralized. The improved performance with twelve UGVs was primarily because the subjects detected more threats (F(1, 11) = 12.3, p < 0.01) and, more importantly, neutralized more threats (F(1, 11) = 76.1, p < 0.001). A mean of 186 threats were detected with six UGVs and 235 threats with twelve UGVs. There was also a significant difference in the number of detected threats between the first and second trial (F(1, 11) = 5.59, p < 0.05). A mean of 223 threats were detected during the first trial and 199 threats during the second trial. The subjects may therefore have used a slightly less efficient strategy during the second trial, although the effect was weak. A mean of 46 threats were neutralized with six UGVs and 61 threats with twelve UGVs.
Fig. 1. Mean number of hits on the convoy with six and twelve UGVs. The bars denote ± Standard Errors.
The 33 % increase in the number of neutralized threats shows that the subjects had the capacity to cope with the additional threats that were detected when twelve UGVs were available. The higher number of detected threats relative to neutralized threats was due to threats often appearing only briefly within the UGVs' detection range, or remaining within line-of-sight for only a short period of time before they disappeared. The subjects therefore often did not have enough time to neutralize the threat. This was particularly prevalent when the convoy traveled through a market area where many small structures provided opportunities for cover. Further, it was advantageous to focus attention on threats that posed, or were about to pose, a threat to the convoy. The analysis of how the subjects used the search areas shows a significant difference in the average number of search areas that were used simultaneously (F(1, 11) = 9.94, p < 0.01). A mean of 5.2 areas were used simultaneously with six UGVs and 5.6 areas with twelve UGVs. The subjects therefore used slightly fewer groups when six UGVs were available. With twelve UGVs available, the six predefined groups were generally all in use. No significant difference was found in the number of defined search areas; a mean of 18.7 areas were defined. This shows that the subjects were overall quite active in changing the search areas. Neither was there any significant difference in the average size of the search areas. The mean size of the search areas was 13,263 m². However, there was a significant difference in the average time that the search areas were used
(F(1, 11) = 6.09, p < 0.05). The mean time for using a search area was 101 seconds with six UGVs and 120 seconds with twelve UGVs. Further, there was a significant difference in the standard deviation of the time for which the search areas were used (F(1, 11) = 22.6, p < 0.001). The mean standard deviation was 50.1 seconds with six UGVs and 59.9 seconds with twelve UGVs. This shows that the subjects differentiated the time for using the search areas slightly more when twelve UGVs were available. There was also a significant difference between trials in the standard deviation of the time for using the search areas (F(1, 11) = 6.50, p < 0.05). The mean standard deviation was 58.0 seconds in the first trial and 52.1 seconds in the second trial. The subjects were therefore slightly more consistent in the time for using the search areas during the second trial. Naturally, the subjects allocated fewer UGVs to the search groups when they controlled only six UGVs than when they controlled twelve UGVs (F(1, 11) = 867, p < 0.0001). The mean numbers of UGVs in the search groups were 1.14 with six UGVs and 2.06 with twelve UGVs. However, there was also a significant difference in the standard deviation of the number of UGVs in the search groups (F(1, 11) = 10.8, p < 0.01). A mean standard deviation of 0.40 was observed with six UGVs and 0.70 with twelve UGVs. This shows that the subjects generally allocated the UGVs evenly to the search groups when six UGVs were available. When twelve UGVs were available, the subjects differentiated their allocation of UGVs to the search groups slightly more, although they commonly used two UGVs per search group. Further, there were no significant differences in the number of reallocations of UGVs between search groups. A mean of 4.65 reallocations was performed. However, the standard deviation of the number of reallocations was fairly large at 8.17, which shows that some subjects performed a large number of reallocations. Only two subjects did not perform any reallocations. These overall statistics on the number of reallocations further support that the subjects had sufficient capacity to cope with the additional number of UGVs, since the number of reallocations would probably have been reduced if the subjects had been overloaded.

The analysis of the subjective measurements is consistent with the results from the objective measurements. Overall subjective mental workload was computed by averaging the six NASA-TLX subscales and submitted to a repeated measures ANOVA. There were no significant differences in subjective mental workload between six and twelve UGVs. The mean overall subjective mental workload was 4.5, which can be characterized as moderate. The subscales for mental demand and temporal demand received the highest mean ratings, 5.15 and 5.02, respectively. The subscale for physical demand received the lowest mean rating, 3.77. Further, there was a significant difference in subjective situation awareness between six and twelve UGVs (F(1, 11) = 11.5, p < 0.01). The overall subjective situation awareness was 4.88 with six UGVs and 5.71 with twelve UGVs. The higher situation awareness with twelve UGVs shows that the subjects had the capacity to cope with the additional threats that were detected without being overloaded. However, since the scale for the computed overall situation awareness ranges from -5 to 13, there is still considerable room for improvement.
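For reference, the overall scores above can be computed as follows; this is our sketch of the standard scoring rules for the unweighted NASA-TLX [18] and the 3-D SART [19], with illustrative ratings rather than the study's raw data.

```python
def overall_tlx(ratings):
    """Unweighted NASA-TLX: the mean of the six subscale ratings."""
    assert len(ratings) == 6
    return sum(ratings) / 6.0

def overall_sart(understanding, demand, supply):
    """3-D SART: SA = U - (D - S). With 7-point subscales this ranges
    from 1 - (7 - 1) = -5 up to 7 - (1 - 7) = 13."""
    return understanding - (demand - supply)

# Illustrative ratings only.
print(overall_tlx([5.15, 5.02, 3.77, 4.6, 4.2, 4.3]))     # about 4.5, moderate
print(overall_sart(understanding=6, demand=4, supply=4))  # 6 on the -5..13 scale
```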
Finally, there were no significant differences in the overall trust in the autonomous search function when using six and twelve UGVs. The mean overall trust was 4.48. This shows that the subjects had moderate trust in the autonomous search function irrespective of the number of UGVs available.
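The within-subjects analyses reported in this section follow a standard repeated measures ANOVA. Below is a minimal sketch using the statsmodels package (our choice of tool; the paper does not state its software), with placeholder data for 12 subjects, two UGV levels, and two trials, treating trial as a crossed factor for simplicity.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
# One row per subject x condition x trial; 'hits' is a placeholder measure.
data = pd.DataFrame({
    "subject": np.repeat(np.arange(12), 4),
    "n_ugvs":  np.tile([6, 6, 12, 12], 12),
    "trial":   np.tile([1, 2, 1, 2], 12),
    "hits":    rng.poisson(200, 48),
})
res = AnovaRM(data, depvar="hits", subject="subject",
              within=["n_ugvs", "trial"]).fit()
print(res)  # F and p values per within-subject factor
```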
Table 1. Standardized beta weights, t-values, and significance levels for a multiple regression analysis of trust

Component             Beta    t(43)   p-level
Capability            0.735   7.45    0.000
Unexpected behavior   0.274   2.62    0.012
Reliability           0.257   2.48    0.017
Predictability        0.127   1.25    0.219
A multiple regression analysis of how the subjects' trust ratings are predicted by the components of trust shows that the components explain about 62 % of the variance in trust (R = 0.79, R2 = 0.62, F(4, 43) = 17.7, p < 0.0001). Table 1 shows that capability was the best predictor of trust, as in many other studies of trust in automated functions [22]. It is unclear why the two predictability-related components were positively related to trust, since they usually lower trust.
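A regression of the form reported in Table 1 can be sketched as below; z-scoring all variables yields standardized beta weights, and 48 observations with four predictors give the reported 43 residual degrees of freedom. The data here are random placeholders, not the study's ratings.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
cols = ["trust", "capability", "unexpected", "reliability", "predictability"]
df = pd.DataFrame(rng.integers(1, 8, size=(48, 5)), columns=cols)  # 7-point ratings

z = (df - df.mean()) / df.std()        # standardize to obtain beta weights
X = sm.add_constant(z[cols[1:]])
fit = sm.OLS(z["trust"], X).fit()
print(fit.rsquared)                    # cf. R2 = 0.62 in the text
print(fit.params, fit.tvalues, fit.pvalues)
```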
4 Discussion

The results show that, for the tactical reconnaissance task used in the present study, increasing the number of partly autonomous robots does not significantly change the operator's task. Increasing the number of robots only provides benefits in the form of better situation awareness from the detection and neutralization of more threats, which reduces the number of hits on the convoy. These benefits were achieved without any significant effects on the subjects' mental workload or their behavior when interacting with the simulator. For example, the number of reallocations between search groups would probably have been reduced if the subjects had been overloaded when controlling more robots. Clearly, increasing the number of robots in the search groups does not significantly change the operator's task, since the autonomous search function manages the additional control requirements. The lack of effect on the operator's task is not entirely surprising, since the number of available search groups was the same in both conditions. However, the results show that autonomous functions can reduce the operator-to-vehicle ratio in military applications of robotic systems. The positive effect of the autonomous function in the present study was only possible because the subjects largely perceived their task as independent from the autonomous function. Overall, the rather large search areas and the uniform distribution of the number of UGVs in the search groups show that the subjects had difficulty adapting to an intricate urban environment. More experienced subjects would probably attempt more detailed control, which would have strengthened the relationship between subtasks and reduced the effect of the autonomous function in improving the operator-to-vehicle ratio. Therefore, the strength of the relationship between subtasks depends on both the subjects' level of expertise and the formal properties of the task, as in the study by Wang and Lewis [12]. The operator's decision to direct, cede, or regain initiative when using autonomous functions is, from a theoretical perspective, partly based on a judgment of both
the operator's own and the autonomous functions' capacity for coping with future task demands. However, more research is needed on how to analyze interdependencies between subtasks and on what the decision actually means in terms of control requirements for conceptual autonomous systems. For example, Wang et al. [23] show how coordination demands can be estimated from data on user interaction with robots. Empirical investigations of operator interaction with autonomous systems will, however, remain important until more complete theories are developed. Additionally, such theories can be used to estimate the optimal interaction with partly autonomous robots. Crandall and Cummings [24] provide one example of how the optimal interaction can be estimated for individual autonomous robots. The optimal interaction can then be used for comparison with actual operator behavior, as well as to develop decision support systems for guiding the operator's control decisions. An evaluation of successful and less successful strategies in the present study may provide some information about the optimal interaction. Further studies may also investigate interfaces for mixed-initiative control, for example, by allowing different degrees of freedom in using freely designated search areas and fixed search areas along the convoy's route of advance. Future studies will also investigate the effects of using advanced autonomous robotic surveillance functions in a fictive layout of the Swedish Camp in Afghanistan.
References

1. Barnes, M.J., Cosenzo, K.A., Mitchell, D.K., Chen, J.Y.C.: Human robot teams as soldier augmentation in future battlefields: an overview. In: Proceedings of HCI International, 11th International Conference on Human Computer Interaction, Las Vegas, NV (2005)
2. Svenmarck, P.: Principles of human-robot coordination for improved manned-unmanned teaming. FOI Memo 1507. FOI – Swedish Defence Research Agency, Linköping, Sweden (2005)
3. Lif, P.J., Jander, H., Borgvall, J.: Tactical evaluation of an unmanned ground vehicle during a MOUT exercise. In: Proceedings of the 50th Annual Meeting of the Human Factors and Ergonomics Society, pp. 2557–2561. HFES, San Francisco (2006)
4. Chadwick, R.: Multiple robots and display views: an urban search and rescue simulation. In: Proceedings of the 49th Annual Meeting of the Human Factors and Ergonomics Society, pp. 387–391. HFES, Orlando (2005)
5. Lif, P., Hedström, J., Svenmarck, P.: Operating multiple semi-autonomous UGVs: target detection, strategies, and instantaneous performance. In: Harris, D. (ed.) HCII 2007 and EPCE 2007. LNCS, vol. 4562, pp. 731–740. Springer, Heidelberg (2007)
6. Chadwick, R.A.: Operating multiple semi-autonomous robots: monitoring, responding, detecting. In: Proceedings of the 50th Annual Meeting of the Human Factors and Ergonomics Society, pp. 329–333. HFES, San Francisco (2006)
7. Trouvain, B., Schlick, C.M.: A comparative study of multimodal displays for multirobot supervisory control. In: Harris, D. (ed.) HCII 2007 and EPCE 2007. LNCS, vol. 4562, pp. 184–193. Springer, Heidelberg (2007)
8. Crandall, J.W., Cummings, M.L., Nehme, C.E.: A predictive model for human-unmanned vehicle systems. Journal of Aerospace Computing, Information, and Communication (submitted, 2008)
9. Goodrich, M.A., Quigley, M., Cosenzo, K.: Task switching and multi-robot teams. In: Parker, L.E., Schneider, F.E., Schultz, A.C. (eds.) Multi-Robot Systems: From Swarms to Intelligent Automata, vol. III. Proceedings from the 2005 International Workshop on Multi-Robot Systems. Springer, Heidelberg (2005)
10. Crandall, J.W., Goodrich, M.A., Olsen, D.R., Nielsen, C.W.: Validating human-robot interaction schemes in multi-tasking environments. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans 35, 438–449 (2005)
11. Trouvain, B., Wolf, H.L., Schneider, F.E.: Impact of autonomy in multirobot systems on teleoperation performance. In: Schultz, A.C., Parker, L.E., Schneider, F.E. (eds.) Multi-Robot Systems: From Swarms to Intelligent Automata, vol. II. Proceedings from the 2003 International Workshop on Multi-Robot Systems. Kluwer, Dordrecht (2003)
12. Wang, J., Lewis, M.: Human control for cooperating robot teams. In: Proceedings of the Second ACM/IEEE International Conference on Human-Robot Interaction, HRI 2007. IEEE, Los Alamitos (2007)
13. Parasuraman, R., Galster, S., Squire, P., Furukawa, H., Miller, C.: A flexible delegation-type interface enhances system performance in human supervision of multiple robots: empirical studies with RoboFlag. IEEE Transactions on Systems, Man, and Cybernetics, Part A, Special Issue on Human-Robot Interactions 35, 481–493 (2005)
14. Hollnagel, E.: Analysis of UAV scenarios using the extended control model. Final report, NATO Working Group HFM-078-017, Uninhabited Military Vehicles (UMVs): Human Factors Issues in Augmenting the Force (2005)
15. Hollnagel, E., Woods, D.D.: Joint Cognitive Systems: Foundations of Cognitive Systems Engineering. CRC Press, Boca Raton (2005)
16. Sarter, N.B., Woods, D.D., Billings, C.E.: Automation surprises. In: Salvendy, G. (ed.) Handbook of Human Factors and Ergonomics. Wiley, New York (1997)
17. Hollinger, G., Kehagias, A., Singh, S.: Probabilistic strategies for pursuit in cluttered environments with multiple robots. In: Proceedings of the 2007 IEEE International Conference on Robotics and Automation, ICRA 2007, Roma, Italy, pp. 3870–3876 (2007)
18. Hart, S.G., Staveland, L.E.: Development of NASA-TLX (Task Load Index): results of empirical and theoretical research. In: Hancock, P.A., Meshkati, N. (eds.) Human Mental Workload, pp. 139–183. Elsevier Science/North-Holland, Amsterdam (1988)
19. Taylor, R.M.: Situational awareness rating technique (SART): the development of a tool for aircrew systems design. In: Situational Awareness in Aerospace Operations, AGARD-CP-478, pp. 3/1–3/17. NATO-AGARD, Neuilly-sur-Seine, France (1990)
20. Muir, B.M.: Trust between humans and machines, and the design of decision aids. International Journal of Man-Machine Studies 27, 527–539 (1987)
21. Muir, B.M.: Trust in automation: Part I. Theoretical issues in the study of trust and human intervention in automated systems. Ergonomics 37, 1905–1922 (1994)
22. Muir, B.M., Moray, N.: Trust in automation. Part II: Experimental studies of trust and human intervention in a process control simulation. Ergonomics 39, 429–460 (1996)
23. Wang, J., Wang, H., Lewis, M.: Assessing cooperation in human control of heterogeneous robots. In: Proceedings of the Third ACM/IEEE International Conference on Human-Robot Interaction, HRI 2008, pp. 9–16. IEEE, Los Alamitos (2008)
24. Crandall, J.W., Cummings, M.L.: A predictive model for human-unmanned vehicle systems. Report No. HAL 2008-5. MIT, Cambridge (2008)
Use of High-Fidelity Simulation to Evaluate Driver Performance with Vehicle Automation Systems

Timothy Brown, Jane Moeckli, and Dawn Marshall

National Advanced Driving Simulator, 2401 Oakdale Blvd, Iowa City, IA
(tbrown,jmoeckli,marshall)@nads-sc.uiowa.edu
Abstract. Automation is an important tool for improving driver safety over the coming decades. Vehicle automation will tend to be implemented in stages, with the intent of incrementally increasing the overall safety of driving through the reduction of crashes related to driver error. Driving simulators play a critical role in assessing the effectiveness of these new technologies. This paper discusses vehicle automation and provides examples of the use of high-fidelity simulators to evaluate new automation technologies in several different forms.
1 Introduction

Automation can be an important tool for vehicle designers in augmenting the capabilities of drivers to improve overall safety or efficiency. From an engineering perspective, automation is the best design solution when the human is incapable of performing the task or when automation can perform the task more safely or efficiently than the human operator can [1]. Special care in the design and implementation of automation must be taken to ensure that the overall safety of the human-machine system is not reduced by the addition of the automation. Automation, by definition, shifts the role of the human from operator to an increasingly supervisory position; the resulting transformation presents a challenging change in the relationship between the human and the system being controlled. Care must be taken in the allocation of function to carefully balance the assigned roles between the supervising human and the automated system [2]. Attention must also be paid to situations in which the operator is required to intervene rapidly when the automation fails. Given the inherent limitations of the human operator, automation failure lends itself to situations where the operator is unable to accurately perceive the situation and respond quickly enough to prevent a more catastrophic system failure. When considering the fundamental purposes of automation in improving system safety and efficiency, there are clear benefits that can be realized in areas such as driving. The introduction of automation to vehicles has been studied for many years. Janssen et al. [3] have previously defined five stages in the evolution of automation in vehicle control, starting with navigational and longitudinal control at Stage 1 and ending with full automation at Stage 5. They warn, however, that changes in driver behavior in response to increasing automation can ultimately reduce the potential
safety gains that might be realized. As such, careful testing from the perspective of the combined human-machine system is necessary to fully understand the implications of each type of automation.
2 Vehicle Automation Systems

The concept of vehicle automation is not new, and extensive prior research has already been undertaken. Work on vehicle automation was first presented to the public during the 1939 World's Fair in New York by General Motors (GM) [4]. Additional development work completed by GM on the driverless vehicle by the early 1960s demonstrated utility for robotic trucks. More recent studies conducted in the 1990s as part of the National Automated Highway System Consortium on full automation examined a number of critical issues associated with the automation of traffic [5]. Concurrent with these efforts, simulator studies examined the human factors issues surrounding the use of automated lanes, including complacency and the entering and exiting of these dedicated lanes [6-8]. Although these research efforts showed that vehicle automation held promise, the research program ended in 1998. Since then, research has shifted away from traffic automation toward a greater focus on individually automated vehicles, based on technology that can control the speed and headway of the vehicle and other technology that can track the lane and maintain lane position. Research on these topics has included new advances in Adaptive Cruise Control [9]. Although much research has already been undertaken toward developing and deploying these systems, many research questions still need to be answered to allow continued development. Janssen et al. [3] have defined one approach to classifying vehicle automation in terms of the stage of deployment that provides a useful framework for discussion. Specifically, they envision a process where development progresses from one stage to the next, with the stages defined in Table 1. Although the stage approach proposed by Janssen et al. provides a logical view of how automation is likely to be deployed in vehicles, it is clear that much of the development work has been undertaken concurrently. Some automated vehicle systems have already been deployed. They include Stage 1 systems such as Adaptive Cruise Control (ACC), which has been introduced into the vehicle fleet by companies such as Toyota, Nissan, and General Motors, and navigation systems (including real-time traffic information) that have been deployed aftermarket by companies such as Garmin and TomTom. Other systems starting to be deployed include Stage 2 technologies such as the SAVE-IT [10] concept by Delphi, which monitors the driver and the environment to mitigate changes in driver state, as well as other systems such as the Volvo Driver Alert Control and Lane Departure Warning. Even now, Stage 3 systems designed to control the vehicle's position in the lane using vision-based sensors are under development, both independently and as an offshoot of lane departure warning systems. System designs for future developments in dedicated lanes and full automation have been ongoing since the 1990s [8, 11], but much more work is needed before their infrastructural and operational constraints can be sufficiently addressed.
Table 1. Stages of Automation [3]

Stage  Description
1      Navigation and Longitudinal Control
2      Integrated Systems as Co-Driver
3      Extension towards Lateral Control
4      Dedicated Lanes
5      Full Automation
Another classification scheme, which can be used in parallel with Janssen et al.'s stages of automation, defines three broad approaches to automation: augmented perception, augmented vehicle control, and overall vehicle control. Augmented perception is the foundation for most of the technologies that are used to provide control to the vehicle. The general approach is to use sensors to monitor the environment and provide alerts to the driver concerning situations of which the driver may not be aware, such as the vehicle approaching the lane boundary or a braking lead vehicle. Augmented vehicle control means the vehicle provides assistance, based upon sensor data, to help the driver complete an intended maneuver. Example applications include systems designed to help maintain control when traction is reduced and systems that provide assisted braking during an emergency brake response. Overall vehicle control addresses the total control of a specific aspect of driving rather than just assisting the driver. Example applications include full lateral or longitudinal control and, ultimately, full automation of the vehicle.
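Restating this scheme as a small data structure (our illustration, matching the labels used later in Table 2):

```python
from enum import Enum

class AutomationApproach(Enum):
    AUGMENTED_PERCEPTION = "sensors alert the driver (e.g., lane departure warning)"
    AUGMENTED_VEHICLE_CONTROL = "vehicle assists the driver's maneuver (e.g., brake assist)"
    OVERALL_VEHICLE_CONTROL = "vehicle fully controls one driving aspect (e.g., full longitudinal control)"
```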
3 Applications of NADS Simulators to the Evaluation of Vehicle Automation

The National Advanced Driving Simulator at the University of Iowa is host to the NADS-1 simulator platform and the fixed-base NADS-2 (see Fig. 1). The NADS-1 comprises a 13-degree-of-freedom motion base with the largest motion envelope of any publicly available driving simulator in the world. The motion system's unique capabilities set it apart from other simulators, enabling the NADS-1 to accurately reproduce motion cues for sustained acceleration and braking maneuvers, movement across multiple lanes of traffic, and interaction with varying road surfaces in a way that is not possible in fixed-base or limited-lateral-movement simulators. Fully instrumented car, sport utility vehicle, or truck cabs are mounted inside a 24-foot dome. The interior of this dome serves as the projection surface for the 360-degree photo-realistic visual display system. The cabs are instrumented to fully capture driver interaction while also providing configurable force feedback. Multiple in-vehicle cameras provide customized views of the cab environment. The NADS-2 utilizes the same architecture and vehicle cabs as the NADS-1, without the motion platform or the wrap-around visuals. Since the NADS simulators became operational in 2001, they have been used for a variety of efforts to evaluate a range of transportation human factors issues, from driver behavior to cognitive changes due to medication use to advanced vehicle systems [12]. Table 2 provides a list of the research efforts that will be discussed in this paper.
Fig. 1. (a) NADS-1 Platform, (b) NADS-2 Platform

Table 2. Summary of NADS projects

Topic                                                              Stage  Automation Approach  Type of Evaluation
Evaluation of driver/environment monitoring on driver performance   2    Augmented Detection  Proof of Concept
Electronic Stability Control                                        2    Augmented Control    Safety Benefit
Automatic Warning Modes for Night Vision Enhancement Systems        2    Augmented Detection  Design Concepts
Adaptive Cruise Control                                             1    Vehicle Control      Early Adopters
These projects provide a cross-section of automation-related research that covers different types of automation and different study purposes. Each project in the table is defined in terms of its stage of automation, automation approach, and type of evaluation. The discussion of these projects makes it possible to explore how high-fidelity driving simulation can factor into the evaluation of vehicle automation.

3.1 Evaluation of Driver/Environment Monitoring System

This project served as one of several summary evaluations of the SAVE-IT [10] concept. The technology to be tested was designed to monitor and mitigate driver distraction in response to the proliferation of in-vehicle and carry-in technologies that drivers now use. This technology would be defined as Stage 2 automation due to its intent to assist the driver by monitoring the environment and the driver's own state. In terms of the automation approach, this technology can clearly be defined as augmented detection, in that the system functionality is geared toward aiding the driver in detecting safety-critical situations, using its various sensors to detect potential lane departures and collisions. Driver state was evaluated using a vision-based system that could detect when the driver was and was not attending to the driving environment. The type of evaluation undertaken can be defined as a proof of concept, in that the evaluation focused on assessing the potential safety benefits of a prototype system.
This evaluation focused on the ability of the system to accurately detect when the driver is not attending to the driving environment, and on the ability of the system to mitigate that distraction through changes to the in-vehicle display, providing earlier warnings when distraction is present and suppressing false alarms when the driver is attentive. The alerts included visual, auditory, and haptic components. The study involved each driver completing three drives in which different implementations of the collision and lane departure detection algorithms were tested. Throughout each drive, the driver was periodically instructed to read incoming text messages and to identify a location on an electronic map. Both of these tasks were completed on an interactive system mounted in place of the radio on the center console. The haptic component of the alert was the release of the throttle, producing a subtle motion cue to the driver. Subtle motion cues of this type have been shown to be an effective component of in-vehicle safety systems due to their compatibility with the required reaction from the driver [9]. The high-fidelity motion system of the NADS-1 is necessary to produce motion cues as subtle as a throttle release. Results showed that drivers engaged in the tasks, but self-mitigated the distraction by chunking the tasks and glancing between the task and the roadway. Interview data indicated that some drivers attempted to anticipate when the system would provide alerts in order to minimize the number of alerts they received. This is an example of humans adapting their behavior when automation is introduced. However, this was a single, first-time use of the system. Different behavior may be seen with long-term use of similar automation. Drivers may be more willing to engage in riskier behavior, such as looking away from the roadway for longer periods and trusting the system to detect hazards they do not. Even upon first-time use, some interviewees' responses support this trend, noting that they chose to engage in the distracting task because they trusted that the system would notify them when they needed to shift their attention back to the roadway (see Fig. 2).
Fig. 2. Reliance on and Utility of In-Vehicle Information System (IVIS)
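The mitigation logic described above can be illustrated with a simple gating rule. This sketch is ours, with made-up thresholds and function names; it is not the SAVE-IT implementation.

```python
def warning_threshold(base_ttc_s, eyes_on_road):
    """Adapt the time-to-collision alert threshold to driver state.
    The multipliers are illustrative placeholders."""
    return base_ttc_s * (0.8 if eyes_on_road else 1.3)

def should_alert(ttc_s, eyes_on_road, base_ttc_s=3.0):
    """Alert earlier when the driver looks away; suppress likely
    false alarms when the driver is attending to the road."""
    return ttc_s <= warning_threshold(base_ttc_s, eyes_on_road)

print(should_alert(3.5, eyes_on_road=False))  # True: earlier warning
print(should_alert(3.5, eyes_on_road=True))   # False: alert suppressed
```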
3.2 Electronic Stability Control Evaluation (ESC)

Electronic Stability Control (ESC) is an active safety system that detects when the vehicle is moving in a direction that differs from the intended direction. The system automatically triggers computer-controlled braking of the appropriate wheels or down-shifting to stabilize the vehicle and help the driver maintain control. It can be defined as Stage 2 automation in that it functions as an aid to the driver by monitoring the vehicle and providing input when the system identifies a discrepancy. ESC would be defined as augmented control because, although the system intervenes to control the vehicle, it does so only in a limited way that aids the driver rather than replacing driver control. The type of evaluation can be defined as a safety-benefit analysis for existing technology. NADS undertook one corporate-sponsored and two government-sponsored studies to examine the impact of ESC under various driving situations and road surface conditions. The studies examined the crash rate on dry pavement, crash rate reduction on wet pavement, the value of iconic indicators of system activation, and the differential impact between types of vehicles. Study drives involved completing avoidance and curve-negotiation maneuvers to assess the safety benefits of this technology. Across these studies, data were collected from more than 650 participants, ranging in age from 16 to 74, to assess the potential safety implications of this technology.

These evaluations of the ESC system required the ability to reproduce the subtle cues that are present at the onset of loss-of-control situations. These subtle cues provide the early indications the driver needs to sense that the vehicle is in danger. Without these cues, drivers without an ESC system would be significantly hampered, compared to the real world, in their ability to respond to these situations, and the evaluation would likely overrepresent the benefits of the system. It was therefore critical to use the NADS-1, due to the high-fidelity motion cues it is able to generate. Study results showed significant decreases in the crash rate across all vehicle types. In the first evaluation, based on data from 120 participants [13], a significant reduction in loss of control was found with the system: collapsing across all the data, there was an 88% reduction in loss of control with ESC present. The results of this study did not show any significant main effects of driver age or gender. The two follow-on NHTSA studies provided additional data to the government for consideration in its rule-making role. These studies on wet and dry pavement continued to show reductions in loss of control associated with ESC technology in a variety of situations, including left and right avoidance maneuvers, curves, wind, and obstacle avoidance. Formal reports on the results of these studies are currently pending with NHTSA. Based in part on these research studies, NHTSA formulated rulemaking (Federal Motor Vehicle Safety Standard No. 126, 49 CFR Parts 571 and 585) that requires ESC in all light vehicles by model year 2012. NHTSA estimates that this technology could annually save up to 9,600 lives in the United States and prevent up to 238,000 injuries [14].

3.3 Adaptive Cruise Control Evaluation

Adaptive cruise control (ACC) is designed to provide longitudinal control of the vehicle. As with conventional cruise control, ACC maintains a set vehicle speed while
driving; unlike conventional cruise control, ACC is also able to maintain a specified distance from a lead vehicle. The driver's longitudinal travel is therefore controlled by an algorithm that balances the driver's set speed preference against the flow of traffic in the driver's lane (a simplified sketch of this logic is given later in this subsection). This automation technology can clearly be classified as Stage 1 automation. The automation approach would be defined as vehicle control, since the system is intended, once engaged, to fully control the speed and headway of the vehicle under the supervision of the driver. The type of evaluation that this study constitutes is an examination of the types of errors that early adopters make when using the system in safety-critical situations. Developed as a precursor to fully automated longitudinal travel, current implementations of ACC are mainly designated as convenience systems, not systems designed to improve safety. This is, in part, due to the infancy of the technology and the complexity of the task. ACC employs a vision-based system, using a radar or laser to determine the headway between the driver's vehicle and the lead vehicle. The system may fail if the radar or laser is occluded (e.g., by fog or dirt on the lens), and because of the vision system's angle of view, it may register vehicles in adjacent lanes or "lose" the lead vehicle on a curve. Human error is also significant. Although ACC has been available as an optional or standard feature for the past decade, its availability continues to be limited to select luxury vehicles, and thus its deployment is limited. In a series of NHTSA- and AAAFTS-sponsored early adopter reports [15-17], researchers found that although ACC has been well received among early adopters, based primarily on its perceived convenience and improved safety, relatively few respondents fully understand how the system operates or, critically, the system's limitations. The lack of knowledge regarding ACC's operation presents challenges to safe vehicle operation: errors are made when users do not know when to take over from the automatic control, and users become less attentive to the driving task, more reliant on the automated system, and therefore less equipped to address time-sensitive critical driving situations that exceed the system's limits. The purpose of the NADS ACC study [18] is to examine driver performance among users of ACC in order to discern usage and error patterns that could be addressed through empirically grounded countermeasures. Critical to this evaluation is recruiting participants with sustained use of an ACC system. Of particular interest are: (1) the participant's reaction to feedback provided by the ACC system when its operational limits are about to be exceeded, such as in a rapid deceleration situation associated with an impending collision with a parked vehicle. Does the participant know how to accurately interpret the system's feedback and provide an appropriate and timely response? How does the participant balance the system's automation and their own control of the vehicle while driving?; (2) identifying driver usage patterns and error patterns that occur among drivers with ACC system experience, and attempting to link these patterns to their root causes.
Errors caused by automation complacency and by system misunderstanding will be of special interest, as will the criticality of driver errors and their impact on safety; and (3) devising countermeasures, based on expert feedback, relevant findings in the literature, and empirical data, that have the potential to decrease the frequency and severity of errors or to address problematic usage patterns.
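As a sketch of the longitudinal logic described at the start of this subsection: ACC holds the driver's set speed unless a lead vehicle forces a lower, headway-based speed. The gain and headway values below are illustrative assumptions, not any production tuning.

```python
def acc_command(set_speed, own_speed, lead_speed=None, gap_m=None,
                headway_s=1.8, k_gap=0.3):
    """Return a commanded speed in m/s; gap_m is distance to the lead vehicle."""
    if lead_speed is None or gap_m is None:
        return set_speed                       # behaves as conventional cruise control
    desired_gap = headway_s * own_speed        # constant time-headway policy
    follow_speed = lead_speed + k_gap * (gap_m - desired_gap)
    return max(0.0, min(set_speed, follow_speed))  # never exceed the set speed

print(acc_command(30.0, 29.0))                               # free road: 30.0
print(acc_command(30.0, 29.0, lead_speed=25.0, gap_m=40.0))  # 21.34: drop back to open the gap
```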
The evaluation of the ACC system requires the ability to reproduce the subtle cues indicating acceleration and deceleration by the system. These subtle cues provide the primary feedback to the driver on the functioning of the system; without them, the driver cannot accurately understand the system's current state. As with ESC, these critical system components of ACC require the high-fidelity motion cues of the NADS-1.

3.4 Enhanced Night Vision Systems

Enhanced night vision systems are designed to provide the driver with critical visual information from the night driving scene that might not normally be available, and to alert the driver when a pedestrian enters the driver's path. These systems aid the driver in identifying and avoiding pedestrians, and potentially other hazards, in the nighttime environment. This automation technology can be classified as Stage 2 automation due to its design intent to assist the driver through the presentation of additional information. The automation approach would be defined as augmented perception. The type of evaluation will be a design concept study, since the goal of the study is to provide design guidance concerning the choice of alert modality, as well as to define a framework for additional system comparisons. Night vision systems have been used for decades by the military to assist soldiers in nighttime maneuvers. Despite widespread military use, this technology has not garnered widespread use in automobiles. Initial implementations in automobiles provided the driver with only the infrared signatures from the environment, without the threat warnings that are currently being implemented. Current technology allows these systems to differentiate between a target in the environment and a threat to the driver. Tsuji et al. [19] provide an excellent summary of the system approach under evaluation. This study [20] will examine the effectiveness of various alerting modalities on the ability of the driver to respond to pedestrian threats. Tsuji et al. [19] reported that 68% of all pedestrian fatalities occur at night, that 90% occur while the driver is traveling straight ahead, and that 78% occur when the driver is traveling faster than 40 km/h. These statistics point to a general difficulty in identifying and responding to a pedestrian quickly enough to avoid hitting them, which night vision systems with target warning should be able to help address. A critical question in the development of these systems remains what the best way is of alerting the driver to the presence of a potential threat. As the events to which this system would alert the driver are relatively infrequent, it is critical to identify an alert mechanism that can effectively warn the driver of the threat in such a way that they can safely respond. Options under consideration include a visual alert on the display indicating a threat, an auditory tone, and seat vibration. The ultimate aim will be to provide appropriate guidance to NHTSA on the utility of visual, auditory, and tactile warnings as part of enhanced night vision systems. Although this study requires high-fidelity visuals to provide an accurate understanding of the driver's ability to detect targets in the night driving scene and to provide the infrared display, there is not a compelling need for motion cueing to assess this technology. This study will, therefore, utilize the NADS-2 simulator rather than
the full-motion NADS-1. As the system being tested is forward-looking only, this is an ideal match between the needs of this study and the NADS-2.
4 Conclusions

Vehicle automation will become an increasingly important tool for vehicle designers in a continued effort to improve overall safety on our nation's highways. Each stage of automation provides different types of safety benefits. As the implementation of automation progresses beyond the Stage 1 and 2 automations currently being deployed, drivers will increasingly be moved to more supervisory roles. This has the potential to reduce crashes by reducing the impact of the greatest cause (human error); however, there is a risk that this transition will lead to more severe crashes when they do occur. Stage 3, 4, and 5 automations are linked to needed technological advances in lateral and route control. Once these advances are in place, it should be expected that automation in these stages will be quickly developed in response to drivers' desire for increased convenience and an overarching goal of reducing crashes. Special care is, and will continue to be, needed in the evaluation of automation technologies across all five stages to ensure that a net safety benefit occurs through the addition of the automation into the driver-vehicle system. Driving simulators can play an important role in the evaluation of these technologies. Simulators of different fidelities play unique roles: lower-fidelity simulators are used earlier in the design phase, and higher-fidelity simulators provide more definitive answers later in the process. High-fidelity simulators such as those located at NADS allow for nearly full immersion in a driving environment that is as close to real-world driving as is possible in a simulator. For many research efforts, the ability to provide valid motion cueing to the drivers is critical to achieving realistic driver responses that can be used to accurately assess safety-critical systems. High-fidelity driving simulation environments, particularly those provided at the National Advanced Driving Simulator, offer the opportunity to test a variety of automation systems to assess their potential safety impact in a manner that is both safe and reliable. The value of these types of assessments is most clearly demonstrated by the Electronic Stability Control research conducted in the NADS-1.
References

1. Wickens, C.D.: Engineering Psychology and Human Performance, 2nd edn. HarperCollins Publishers Inc., New York (1992)
2. Sanders, M.S., McCormick, E.J.: Human Factors in Engineering and Design, 7th edn. McGraw-Hill, Inc., St. Louis (1993)
3. Janssen, W., Wierda, M., Van Der Horst, R.: Automation and the Future of Driver Behavior. Safety Science 19, 237–244 (1995)
4. Rillings, J.H.: Automated Highways. Scientific American 277(4), 6 (1997)
5. Tan, H.-S., Rajamani, R., Zhang, W.-B.: Demonstration of an Automated Highway Platoon System. In: American Control Conference, Philadelphia, PA (1998)
6. Bloomfield, J.R., et al.: Driving Performance and Commuting via an Automated Highway System. Federal Highway Administration (1998)
7. Bloomfield, J.R., et al.: Using an Automated Speed, Steering, and Gap Control System and a Collision Warning System When Driving in Fog. Federal Highway Administration, McLean, VA (1998)
8. Bloomfield, J.R., et al.: Driving Performance after an Extended Period of Travel in an Automated Highway System. Federal Highway Administration, McLean, VA (1998)
9. Lee, J., et al.: Effects of Adaptive Cruise Control and Alert Modality on Driver Performance. Transportation Research Record: Journal of the Transportation Research Board 1980, 8 (2006)
10. Safety Vehicle Using Adaptive Interface Technology, SAVE-IT (2009), http://www.volpe.dot.gov/hf/roadway/saveit/ (cited 2-4-2009)
11. Thorpe, C., Jochem, T., Pomerleau, D.: The 1997 Automated Highway Free Agent Demonstration. In: IEEE Conference on Intelligent Transportation Systems (1997)
12. NADS projects (2009), http://www.nads-sc.uiowa.edu/projects/projects.htm (cited 2009)
13. Papelis, Y.E., et al.: Study of ESC Assisted Driver Performance Using a Driving Simulator. National Advanced Driving Simulator, Iowa City, IA, p. 35 (2004)
14. Federal Motor Vehicle Safety Standards; Electronic Stability Control Systems; Controls and Displays. National Highway Traffic Safety Administration, Washington, DC (2007)
15. Jenness, J.W., et al.: Use of Advanced In-Vehicle Technology by Young and Older Early Adopters. Survey Results from Five Technology Surveys. Washington, DC (2008)
16. Jenness, J.W., et al.: Use of Advanced In-Vehicle Technology by Young and Older Early Adopters. Survey Results on Adaptive Cruise Control Systems. Washington, DC (2008)
17. Llaneras, E.: Exploratory Study of Early Adopters, Safety-Related Driving with Advanced Technologies - Final Report. Washington, DC (2006)
18. Brown, T.L., Moeckli, J., Dow, B.: Evaluation of Adaptive Cruise Control (ACC) Interface Requirements on the National Advanced Driving Simulator (NADS): Proposal. National Advanced Driving Simulator, Iowa City, IA, p. 60 (2008)
19. Tsuji, T., et al.: Development of Night-Vision System. IEEE Transactions on Intelligent Transportation Systems 3(3), 7 (2002)
20. He, Y., Moeckli, J., Schnell, T.: Evaluation of Automatic Warning Modes for Night Vision Enhancement Systems: Proposal. National Advanced Driving Simulator, Iowa City, IA, p. 73 (2008)
Applying the "Team Player" Approach on Car Design

Staffan Davidsson¹ and Håkan Alm²

¹ Industrial PhD Candidate at Volvo Cars Corporation, Gothenburg, Sweden
² Luleå University of Technology, Luleå, Sweden
[email protected]
Abstract. Automation can cause problems with 'the human factor'. One approach is to make automation a team player. Team players agree on a common ground, show intention, show reasoning, express their limits of performance, and so on. This approach was applied to adaptive driver information in the present study. Ten experts on different in-vehicle systems were interviewed. The experts found the team-play approach both challenging and interesting. However, they also found it difficult to combine the increased visual workload required to "be a team player" with car driving, which is already visually, manually, and cognitively challenging. The experts believed that the approach described by the researchers describes agents before they become team players rather than agents being team players; what is needed is "teambuilding". The solution suggested is a compromise and could be described as a separate view for the above-mentioned information.
1 Introduction

Whether we like it or not, a lot of new information is available to the driver, and there is more to come when cars become connected to the internet or other infrastructure networks. Car-to-car information and car-to-infrastructure information, as well as in-car information coming from new sources such as radars and sensors, will most likely invade cars. The reasons for this may be to improve safety, environmental friendliness, and transport efficiency, and perhaps also because drivers simply like the information [3]. However, it is not reasonable to show all the information from these gadgets or functions simultaneously, due to visual and cognitive workload. Therefore, some car manufacturers (e.g., Volvo Cars) have introduced workload managers. So far, these have been limited to reducing workload by blocking information to the driver in critical situations. The next step will most likely not only block information but also provide situation-adapted information. For instance, do we really need the speedometer in the garage? Maybe a 360-degree camera would be more helpful there and, of course, vice versa on the highway. In many ways, the car can then be said to work as an agent or an automatic system that controls the information flow. In the extreme, the whole driving task may be automated [12]. Automation is very well investigated. When introducing automation, one reason is often to reduce workload. However, capitalizing on some strength of automation does not simply replace a human weakness; it creates new human strengths and weaknesses, often in unanticipated ways [1]. Stanton and Young [11] discuss driving automation and raise issues such as trust, mental workload, locus of control, driver stress, and mental
representation. Norman [8] observes that technological artifacts can enhance human expertise or degrade it, "make us smart" or "make us dumb". It is therefore obvious that introducing automation can cause problems, but what do we do about it? One way of getting away from at least some of the automation-induced problems is to make the automation a friend. Dekker and Woods [5] conclude that system developers should abandon the traditional "who does what" question of function allocation. Instead, the more pressing question today is how to make humans and automation get along together. Parasuraman et al. [9] do not agree with much of the content in Dekker and Woods [5] but see some value in the team play approach. Woods and Sarter [13] raise several questions about automation in aircraft cockpits. What should we learn from the problems? Do they represent over-automation or human error? Or is there a third possibility: do they represent coordination breakdowns between operators and the automation? Instead of reducing automation, or designing mainly to reduce errors, it is suggested that the complexity of a system can be tamed by making the automation act as a team player. Young et al. [14] suggest that a blend throughout the driving subtasks may prove most efficient and that we should think in terms of shared authority rather than either human or technological authority. Sarter and Woods [10] consider supervisory control of automated resources as a cooperative or distributed multi-agent architecture. One cooperative agent concept, "management by consent," requires that the human members of the team agree to changes in target or mode of control before they are activated. This cooperative architecture could help the people in the system to stay involved and informed about the activities of their automated partners. In summary, when designing a joint system for a complex, dynamic, open environment, where the consequences of poor performance by the joint system are potentially grave, the need to shape the machine agents into team players is critical [2]. All of this sounds reasonable. However, the big problem is: how do we create car systems that act as team players? Klein et al. [7] outline ten challenges for making automation components into effective "team players" when they interact with people in significant ways:

1) To be a team player, an agent must fulfill the requirements of a Basic Compact to engage in common grounding activities.
2) To be an effective team player, agents must be able to adequately model the other participants' intents and actions vis-à-vis the state and evolution of the joint activity.
3) Human-agent team members must be interpredictable, i.e. able to observe and correctly predict the future behavior of teammates.
4) Agents must be directable.
5) Agents must be able to make pertinent aspects of their status and intentions obvious to their teammates.
6) Agents must be able to observe and interpret pertinent signals of status and intentions.
7) Agents must be able to engage in goal negotiation.
8) Planning and autonomy support technologies must enable a collaborative approach.
9) Agents must be able to participate in the management of attention.
10) The costs of coordinated activity must be controlled.

Christoffersen and Woods [2] conclude that observability and directability are the keys to fostering a cooperative relationship between the human and machine agents in any joint system.
Summarizing the ten research challenges by Klein et al. [7] and the other researchers' statements about how to make an agent become a team player, and adding a time line, may look like Table 1.
Table 1. Behavior as a team player

Before: Share common goals, Show intention, Share representation of the problem state, Directable, Negotiable, Being observable, Observe humans, Negotiable levels of authority, Future oriented
During action: Being gentle, Not overloading, Not clumsy, What is it doing?, Negotiate, Show its limits of performance, Share representation of the activity
After: Explain why action, Feedback, Why did it do this?, Observability (including things being observed, observer, context), Change behavior after negotiation, Give humans feedback
1.1 Purpose

The purpose of this study was to create a starting point for making automation a team player in a driver information context, using the research on how to become a team player. The study should also serve as a way to highlight important questions when automating the information flow in the vehicle.
2 Method

2.1 Participants

Two female and eight male experts at Volvo Car Corporation, Luleå University of Technology and Chalmers participated in the study. The reason for having so many experts was that they were experts in highly automated car systems, but in different areas, which may lead to disparate answers. The areas of expertise were the design of active safety systems, the design of driver information systems and HMI design.

2.2 Material

The experts were sent an e-mail with a scenario before the interview (see italic below). The experts were also provided with a list of potential problems caused by automating the information flow. The problems described were over-trust, under-trust, skill degeneration and workload when automation fails.

Scenario: "It is morning and you have just left the bed. You go down to the kitchen and take a quick look at the computer. There are no queues yet but the road condition seems to be slightly slippery. The car is filled up with fuel and doesn't need a service for a while but perhaps it is better to leave earlier due to the road condition. It may take a few extra minutes to get to work. You spread the table and start to browse through the paper. You like to take it easy in the morning. You clear the table and go to the car. Where there used to be a speedometer and tachometer there is a screen with information about the car's status, about the same information as on the computer but in more detail. In addition you get feedback about how fuel efficient you drove last time and you are reminded to change gear earlier. You start the engine. The screen now shows a 360 degrees view around the car to make sure that there are no obstacles around the car. You reverse the car and enter the road. Now, the speedometer, a map over the area and where the roads are heading in the next
crossing are shown. When you approach the crossing, more details are shown and the speedometer shrinks. There is also advice that you should mind oncoming traffic when turning left, since many accidents have happened…."

2.3 Procedure

The Delphi procedure [6] was used. Anonymity of groups and interaction with controlled feedback reduce bias and also make measurable feedback available. The Delphi method first elicits judgments from experts individually; each expert then receives the judgments of the previous experts and can re-evaluate his/her own judgment. The method was modified in that, instead of individual feedback sessions after all interviews were completed, all experts were called to a focus group meeting.
3 Results

Table 2 summarizes the results from the interviews. The comments are listed in order of frequency (most commonly mentioned first).

Table 2. Summary of interview results

1. How do you define a "team player"?
Team players achieve an improved result by working together rather than individually and have a holistic view. The team work is built on knowledge of the others' and one's own performance limits, and on trust in that everyone is doing their best. Team players find pleasure in working together.

2 A. Share common goals
Most of the experts mentioned a combination of different ways of sharing a common goal.
• The driver could enter a goal mode for the trip, e.g. environmentally friendly, safe, sporty or efficient.
• The system should understand the goal of the trip and adjust the information accordingly. This could be done by comparing the present state with a database, e.g. time and location, but also dynamic information such as driving behavior, or by looking at the response to information (e.g. discarding it).
  o "A real team player knows what I want."
  o It should be set at the car dealer depending on personality.
• The information should be good as it is (one mentioned 95%) but could be adjusted in a menu or similar.
• It is not team work if the driver is the only one in charge.
• Another mentioned that some things could be predetermined, e.g. that the driver doesn't want to crash.

2 B. Show intention, Future oriented
• Some suggested that the next upcoming mode should be visible.
• The experts accepted different levels of intrusiveness: visible by looking at it rather than highlighted for attention / it should be possible to accept a new mode before it is shown / it should be possible to reject a change of mode / the change should be continuous rather than in steps.
• Others suggested that showing intention is not needed at all.
• The information system could show intention if the driver wants this (via a menu or similar).
2 C. Show reasoning
• The parameters behind the logic could be shown / if the algorithm is complex it should not be shown (1). This type of info is better before and after driving / don't believe in briefing.
• Some suggested that the reasoning could be shown if the driver wants to (e.g. via settings).
• Reasoning should not be shown at all due to information overload.
• "A really good team player does not need to show intention or reasoning."
• "How do you learn to become a team player if no one shows intention or reasoning?"

2 D. Understand reasoning
• Driver behavior (e.g. input from radars, cameras, accelerator, brake, steering, time, GPS, map data).
• The system needs training to understand the driver (statistics could be developed for e.g. most likely path, time, etc.).
• Some conclusions could be drawn from how the driver responds to information, e.g. if the driver discards information many times it may not be wanted.
• In early stages of product development it can be investigated to what extent a function is requested or not in a certain situation. If there is low consensus for a function in a situation, it should maybe not be automated. (1)

2 E. Share representation of the problem state
• Fear of too much dialogue.
• Many suggested a dialogue with the agent that shows how the different goals affect each other: the driver could prioritize different goals / show how the different goals affect each other, e.g. 100% safety gives 10% sportiness / the impact on different goals is shown, e.g. ACC distance too close => impact on safety / by analyzing driver behavior ("You seem to have this prioritization").
• Feedback: some kind of feedback from the agent about how the driver performs according to the goals could be shown.

2 F. Directable, Negotiable levels of authority
• Levels of automation: the level of automation should be adjustable / most should be good from the start but possible to fine-tune.
• Levels of help: it was suggested that the level of help can also be changed: in two levels, much - little / distribute information differently over time: first little information to get to know the system, then more info such as tips and tricks, and finally, when you know the system, less information again.
• By behavior: if you say no to some information several times, the system does not suggest it any longer.

2 G. Negotiable
• The algorithm for how the agent reasons should be shown (e.g. how the different goals depend on each other).
• React on input: if the driver sets the travel time to e.g. Stockholm to 4 h, the agent tells the driver that this affects safety, green driving etc.
• This is not a good idea. It should be a part of the planning.

2 H. Being observable

2 I-J. Being gentle, Not overloading, Not clumsy
• HMI design: minimalistic / very integrated functionality / should not call for attention / gradually rather than discretely changing information / reduce the number of times information is shown / settings for how much information: little - much.
• Give feed-forward information to prepare the driver.
• Don't believe in pre- and post-trip info.
• Give the correct information at the right time.
2 K. What is it doing?
• Show which mode it is in and what it is doing (e.g. thinking). The reason for this may be to show that it is consistent.
• HMI design: change information without calling for attention, be subtle, like body language.

2 L. Negotiate
• It should be possible to change the level of automation during driving but also to change mode manually. Some said that it should not be done while driving.
• It could be somewhat adaptive, e.g. change the ACC time gap depending on driving style without ACC.
• It should be possible to reject a change of mode as a way to negotiate.

2 M. Show its limits of performance
• Show performance limits: show signal strength from sensors with a bar graph / inform in discrete levels, show active or inactive / inform when passing the limits / show the trend.
• This is not important since it is not directly driving related.
• Showing this information may be bad from a competition point of view.
• The system should have limits but it should be possible to override them.
• A separate view to see the limits of performance.

2 O. Explain why action
• Not important.
• This is mainly for urgent warnings.
• Location of this information: could be located at the same place as intention / it could, if people are interested, also be shown as a briefing or a log afterwards.

2 P. Feedback
• A time line that shows what has happened.
• Post-accident, for urgent warnings.
• History of ordinary happenings after driving (briefing).

2 Q. Change behavior after negotiation
• The system could change behavior in three different ways: by looking at the response to the information / post-trip evaluation / after some time a question could be shown: "The learning period has ended. Do you still want this or that information?"

2 R. Give humans feedback
• Give feedback to the driver depending on the goals the agents agreed about.
• Feedback should be given with a positive spirit, moderate and sophisticated.
• Feedback as a coach or a game.

3. Biggest challenge
• To keep the communication on the correct level.
• To match the mental models of the drivers.
• To create robust solutions acceptable to and enjoyed by most.
• To understand the driver's intention.
4 Discussion

4.1 General

It was obvious that the idea of making the automation in a car a team player was new to the experts. However, they found it challenging and interesting, and agreed that making the automation a team player could reduce some of the problems with automation. They also came up with several ideas of how to
improve today's systems - not only about automating the information flow but also within their own areas of responsibility in the car, such as navigation, active safety and other support systems. The answers were divergent and showed that the idea of making automation a team player is immature, at least among the experts within the car industry (and specifically at Volvo Cars).

4.2 Specific Comments on the Questions

The experts defined a "team player" as someone who achieves an improved result by working together rather than individually and has a holistic view. Team work is built on knowledge of the others' and one's own performance limits and trust in that each player is doing their best. Team players find pleasure in working together. When discussing how to share a common goal, two extremes were represented: either the driver decides all by himself what the goal is, by setting a mode (goal) for the trip, or the designer or car dealer decides what is important. In the continuum between these, some of the experts stated that "it is not team work if the driver is the only one in charge", or that the car should look at the driver's behavior to adjust the goal. The most commonly mentioned way of showing intention is to show the next presentation mode that the car plans to enter. It should be presented in a way such that the change does not call for attention. Some suggested that it should be possible to choose in a menu whether to see intention or not. Some also discussed how to reject changes, but this might be more correctly located under "Directable" (see Table 2). Here, too, the answers were disparate. Some of the experts did not want to show reasoning at all, mainly because they could not find a way to do so without risk of overload and/or because they could not understand why it was important. Others found it more important and suggested a separate view for reasoning, showing the algorithms behind the logic in a pedagogic way. Perhaps the most interesting ideas came up under question 2 C. Some of the experts stated that what the researchers described were not "team players" but rather "not yet team players". One expert claimed that "a really good team player does not need to show intention or reasoning" or that "a real team player knows what I want". Another expert commented: "But how do you learn to become a team player if no one shows intention or reasoning?" This is probably the key to the whole issue of making the automation in a car a team player. In a car, while driving, the visual demands are high and it is therefore not recommended to show too much information. On the other hand, if the system does not show intention or reasoning, or is not directable etc., automation-induced errors are likely. Just as the agent showed the driver how it reasons, the driver should show how s/he reasons. As in the other points, the answers were rather differentiated. Perhaps a good compromise would be to first use user clinics etc. to get a good picture of how people think, and then fine-tune dynamically by looking at behavior and building a database with historical data to be used to predict driver behavior.
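As a minimal illustration of the adaptation logic the experts sketched (suppress information a driver has repeatedly rejected, and log responses as historical data for later prediction), something along the following lines could apply. The class, threshold and field names are hypothetical assumptions for this sketch, not part of the study:

from collections import defaultdict

REJECTIONS_BEFORE_SUPPRESSION = 3   # assumed threshold for "several times"

class InformationAdapter:
    """Adapts the in-vehicle information flow to observed driver responses."""

    def __init__(self):
        self.rejections = defaultdict(int)
        self.history = []   # historical data, usable to predict driver behavior

    def on_driver_response(self, info_type, accepted, context):
        """Record whether the driver accepted or discarded an information item."""
        self.history.append((info_type, accepted, context))
        if not accepted:
            self.rejections[info_type] += 1

    def should_show(self, info_type):
        # "If you say no to some information several times, the system does
        # not suggest it any longer" (Table 2, question 2 F).
        return self.rejections[info_type] < REJECTIONS_BEFORE_SUPPRESSION

adapter = InformationAdapter()
for _ in range(3):
    adapter.on_driver_response("eco_tips", accepted=False, context={"road": "highway"})
assert adapter.should_show("eco_tips") is False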
When discussing how to share the representation of the problem state, some of the experts feared too much dialogue. On the other hand, some came up with ideas about how to solve the issue. The main idea was to show how the different goals affect each other (e.g. a view shows that if the driver prioritizes sportiness this will affect fuel consumption or safety). It could also be integrated within the different systems (e.g. if you choose this route, sportiness or safety will be two out of ten). When discussing directability and negotiable levels of authority, the experts suggested that it should be possible to choose the level of automation and the level of help. This may be done in a menu or by adapting to how people respond to information (e.g. if the driver says no to some information several times, the system does not propose this information any longer). Being gentle, not overloading and not clumsy was discussed mainly in general terms. To summarize, it was mainly about being minimalistic, moderate and careful in the HMI design. However, it was also suggested that feed-forward information could help the driver to reduce workload in critical situations. The main idea of showing what the system is doing was to change the information without demanding attention. The driver should see the changes only if s/he looks, which could be achieved by changing information gradually rather than all at once. As in several other points, the extremes were represented among the answers when discussing if and how to show limits of performance. Some wanted to show signal strength from sensors and others did not want the information at all. However, as in other points, it was suggested that signal strength could be shown in a separate view, available on request. It was agreed that an explanation of why an action occurred is mainly important for urgent warnings, where intention or reasoning is impossible to show due to time constraints. One interesting suggestion was to give feedback to the driver depending on the goals the agents agreed about. The main thought was that feedback should be given with a positive spirit, moderate and sophisticated, and that feedback is preferably designed as a coach or a game. On the question about the biggest challenges, the top ones were: to match the mental models of the drivers, to keep the communication on the correct level, to create robust solutions acceptable to and enjoyed by most, and to understand the driver's intention.

4.3 Conclusion

Applying the team player approach to car automation seems to be difficult. The main problem is that showing intention and limits of performance, and negotiating with or directing the automation, according to the experts, requires visual attention - visual attention that is also important for the driving task. It is worth mentioning that very few experts suggested modalities other than the visual for communicating with the agent. The experts believed that the approach described by the researchers describes agents before they become team players rather than team players as such. Real team players do not need to show intention, reasoning etc. The main issue is therefore the journey towards becoming team players. This process could perhaps be called "team building". On the other hand, car manufacturers would prefer automation that does not need specific driver training. It is unlikely that considerable progress could be made
in car-driving support if one simply relies on learning by doing [4]. From what was said in the interviews, it seems that a compromise with a separate view for goals, intention, reasoning, limits of performance, negotiations etc. is the most suitable solution. Future research could investigate modalities of interaction other than visual/manual. It would also be interesting to study empirically whether a team player approach applied to car design could reduce automation-induced errors.
References

1. Bainbridge, L.: Ironies of automation. In: Johannsen, G., et al. (eds.) Analysis, Design and Evaluation of Man-Machine Systems, pp. 151–157. Pergamon, Oxford (1982)
2. Christoffersen, K., Woods, D.: How to Make Automated Systems Team Players. In: Salas, E. (ed.) Advances in Human Performance and Cognitive Engineering Research, vol. 2. JAI Press/Elsevier (2004)
3. Davidsson, S.: Work Domain Analysis for Driver Information (2008) (manuscript submitted for publication)
4. Hoc, J.-M., Young, M.S., Blosseville, J.-M.: Cooperation between drivers and automation: implications for safety. Theoretical Issues in Ergonomics Science. Taylor and Francis, Abingdon (2008)
5. Dekker, S.W.A., Woods, D.D.: MABA-MABA or Abracadabra: Progress on human-automation cooperation. Cognition, Technology and Work 4(4), 240–244 (2002)
6. Kirwan, B., Ainsworth, L.: A Guide to Task Analysis, pp. 157–158. CRC Press, Boca Raton (1992)
7. Klein, G., Woods, D., Bradshaw, J., Hoffman, R., Feltovich, P.: Ten Challenges for Making Automation a Team Player in Joint Human-Agent Activity. IEEE Intelligent Systems 19(6), 91–95 (November/December 2004), doi:10.1109/MIS.2004.74
8. Norman, D.: The Design of Everyday Things. The MIT Press, London (1989)
9. Parasuraman, R., Sheridan, T., Wickens, C.: Situation Awareness, Mental Workload, and Trust in Automation: Viable, Empirically Supported Cognitive Engineering Constructs. Journal of Cognitive Engineering and Decision Making 2(2), 140–160 (2008)
10. Sarter, N.B., Woods, D.D.: How in the world did we ever get into that mode? Mode error and awareness in supervisory control. Human Factors 37(1), 5–19 (1995)
11. Stanton, N.A., Young, M.: A proposed psychological model of driving automation. Theoretical Issues in Ergonomics Science 1(4), 315–331 (2000)
12. Walker, G.H., Stanton, N.A., Young, M.S.: Where is computing driving cars? International Journal of Human-Computer Interaction 13(2), 203–229 (2001)
13. Woods, D., Sarter, N.: Clumsy Automation and "Going Sour" Accidents. In: Sarter, N., Amalberti, R. (eds.) Cognitive Engineering in the Aviation Domain. Erlbaum, Hillsdale (1999)
14. Young, M.S., Stanton, N.A., Harris, D.: Driving automation: Learning from aviation about design philosophies. International Journal of Vehicle Design 45(3), 323–338 (2007)
New HMI Concept for Motorcycles – The Saferider Approach

J.P. Frederik Diederichs1, Marco Fontana2, Giacomo Bencini3, Stella Nikolaou4, Roberto Montanari5, Andrea Spadoni5, Harald Widlroither1, and Niccolò Baldanzini3

1 Fraunhofer-Institut für Arbeitswirtschaft und Organisation IAO, Human Factors Engineering, Stuttgart, Germany
2 PERCRO Laboratory, Scuola Superiore Sant'Anna, Pisa, Italy
3 Università degli Studi di Firenze, Dipartimento di Meccanica e Tecnologie Industriali, Firenze, Italy
4 Centre for Research & Technology Hellas, Hellenic Institute of Transport, Athens, Greece
5 HMI Group, Engineering Science and Methods Department, University of Modena and Reggio Emilia, Reggio Emilia, Italy
[email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Abstract. For more than a decade the European Commission has been focusing on the enhancement of road safety by funding research on Advanced Driver Assistance Systems (ADAS) and Intelligent Vehicle Information Systems (IVIS) in the automotive field. However, the application of such technologies in Powered Two-Wheelers (PTW) is currently lagging behind. While in the automotive sector extensive knowledge has also been generated on the Human-Machine Interface (HMI) for ADAS and IVIS, this is far from true for the PTW sector. This paper presents the outline of the SAFERIDER (Advanced telematics for enhancing the safety and comfort of motorcycle riders) project and focuses on the new HMI concept and the haptic interface devices that are being developed within the project.

Keywords: SAFERIDER, haptic Human-Machine Interface, Powered Two-Wheelers, Motorcycle, Advanced Driver Assistance System, Advanced Rider Assistance System, Intelligent Vehicle Information System.
1 SAFERIDER Outline

SAFERIDER is a 3-year research project, launched in January 2008 and funded by the 7th Framework Programme of the European Commission under DG Information Society & Media. The SAFERIDER consortium is a group of 20 partners from industry, organizations, research institutes and universities across Europe. It aims to study the potential of ADAS (consequently called ARAS – Advanced Rider
Assistance Systems) and IVIS integration in PTW and to develop an efficient and rider-friendly HMI concept and new HMI elements for riders' comfort and safety. The SAFERIDER homepage [1] provides further information on the project.

1.1 SAFERIDER ARAS and IVIS

SAFERIDER aims to develop and test eight ARAS/IVIS subsystems and their elements and to integrate them in different combinations on the project demonstration vehicles, which are six PTW of different types and three simulators. The SAFERIDER ARAS and IVIS are well-known and approved applications from the automotive field; hence the main aim of SAFERIDER is to investigate the feasibility of integrating them into modern PTW. Four ARAS have been chosen:

SPEED ALERT: To present information to the rider when and by how much the speed exceeds the legal speed limit.
CURVE WARNING: To provide the rider with constant feedback during curve maneuvers about necessary deceleration or other maneuver errors.
FRONTAL COLLISION WARNING: To warn the rider when an obstacle is detected in the motorcycle's near-field front area.
INTERSECTION SUPPORT: Integration of the three above functionalities to provide efficient warnings of potential hazards at intersections.

Additionally, four IVIS are developed for riders' comfort and for a potentially positive impact on traffic safety:

eCALL: Emergency call to a server or person when a PTW crash or fall is detected.
TELEDIAGNOSTIC SERVICE: Remote data logging of the principal PTW parameters such as position, speed, acceleration, fuel consumption, oil temperature and more.
NAVIGATION AND ROUTE GUIDANCE: Provision of route information to the rider.
WEATHER, TRAFFIC & BLACK SPOT WARNING: Provision of dynamic information to the rider about weather conditions, traffic jams, road blockings or dangerous points along the planned route.
2 HMI Development

The introduction of active safety functions in PTW has constantly faced problems regarding user acceptance. In order to facilitate the acceptance of such applications, it is mandatory to offer a highly intuitive HMI that is smoothly integrated into the feedback regulation process between bike and rider – offering further information about the environment without limiting fun and direct feedback about the PTW and road status. In SAFERIDER it is believed that HMI elements have been selected, and are under development, that will enable an innovative HMI concept for new PTW ARAS and IVIS and lead to a higher likelihood of rider acceptance.
At the same time, deciding which HMI elements would be applied in SAFERIDER was the first step in a development chain which ensures a user-centred development process with iterative user feedback, starting at the very beginning of development and continuing until the final prototype/product.

2.1 Methodology

Since the state of the art of the Human-Machine Interface in the PTW domain is lagging behind, HMI solutions cover a key role within the project. The development methodology takes care of different aspects in order to identify possible HMI elements, starting from the feasibility of some devices and finishing with on-road pilot tests in real contexts. Generally, the development of the SAFERIDER HMI follows a methodology based on 8 sequential development phases:

1. DEFINITION and first feasibility analyses of HMI elements within the project. Driven by experts' opinion in the consortium and a rider's needs-and-wants questionnaire filled in by 5 further highly experienced riders from different fields, a list of HMI elements was selected for prototype development.
2. DEVELOPMENT of prototypes and prototypical integration. The selected HMI elements are developed as prototypes and integrated into mock-ups in order to enable first tests with users.
3. First OPTIMIZATION step of characteristics. The early prototypes are tested with a limited number of users in order to tune the haptic sensing towards acceptable and meaningful characteristics.
4. ASSIGNMENT of HMI elements to SAFERIDER applications. Some HMI elements are potentially useful for more than one application. By integrating the prototypes into simulators and on-road PTW, the elements can be tested as feedback for different applications. The result is a 2D matrix matching the best elements to applications.
5. Second OPTIMIZATION step for characteristics. After elements are assigned to the best matching applications, the second optimization step aims at improving sensing characteristics towards the precise information that needs to be given by the matched application(s).
6. Development of the best ACOUSTIC and VISUAL attendances for the HAPTIC prototypes. In line with the approach of focusing on haptic HMI elements while developing a multimodal interface, this step aims to enrich the sensing characteristics by defining organic visual and acoustic attendances.
7. Final INTEGRATION taking safety and quality requirements into account. The applications and HMI elements have to be integrated into the PTW and connected by the overall system architecture.
8. Pilot TESTING of the complete system and optimization. Finally, users will test the complete system in on-road tests in simulated and real contexts. This step will lead to further enhancement of the whole system.
Specifically, in order to improve the impact of the users on the design, a simple User-Centred Design approach is applied both before and after each relevant development step. With this method [2] the users are involved many times during the design/development process, for requirements gathering and usability testing.

2.2 HMI Elements

For PTW riders, even more than in 4-wheel vehicles, the demand of focusing their attention totally on the road and the riding task has to be considered when developing HMI solutions for these vehicles. Two-wheelers are very sensitive vehicles in terms of motion dynamics, and any unexpected or sudden change in motion, caused by an alerted/distracted rider or a dynamic assistance system, could easily lead to loss of control over the PTW. Hence the traditional visual information presentation comes with some fundamental disadvantages. It is assumed that – even more than in the automotive sector – visual information is distracting and inappropriate for presenting imminent warnings and information in complex riding tasks. The motorcyclist's attention is and shall be fully directed towards the road section in front of him; hence intuitive haptic and acoustic warnings are supposed to be most supportive in raising the rider's situation awareness, guiding his attention and supporting corrective actions in case of danger. The present paper focuses on the haptic HMI elements – while visual and acoustic elements will be applied as attendances in order to create multimodal and organic information. For the visual information a head-up display and a dashboard display are foreseen. For acoustic information a binaural speaker system will be available. The 4 haptic HMI elements are:

FORCE-FEEDBACK IN THROTTLE: In SAFERIDER a throttle with programmable return force is developed. Analogous to the successful introduction of the Accelerator Force Feedback Pedal (AFFP) by Continental [3], Volkswagen and Nissan in the automotive sector, it is assumed that information given directly through the speed-controlling device is highly intuitive. In the case of the car, the force that the pedal exerts on the driver's foot is programmed in order to deliver intuitive messages to the user regarding speed limits [4] or even road condition [5]. In the SAFERIDER project the PTW will be equipped with a force-controlled throttle able to tune the return force through a servo-controlled electric motor in order to communicate a speed reduction warning. The throttle is a critical interface for the rider's stability and control of the vehicle, and every action and force applied to it has to be carefully chosen. For this reason the programmed behaviour of the throttle is designed for non-invasive and highly intuitive feedback. The control principle is based on gradually increasing the stiffness of the return spring, adding a simulated stiffness when a speed warning has to be transmitted (see Fig. 1). The system consists of an electric motor that is connected through a pulley to the return cable of the throttle. A dedicated electronic unit controls the motor to behave as a virtual spring in parallel to the return spring. The wanted stiffness and the stiffness variation slope are programmed through RS-232. A set of tests on a riding simulator is foreseen in order to tune the system parameters and study the stability and safety of the system before integrating it into an on-road motorcycle.
Fig. 1. Force versus throttle angle: Range of simulatable stiffness
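To make the virtual-spring principle concrete, the following minimal control sketch may help. It assumes a simple parallel-spring model with illustrative parameter values; it is not the project's actual firmware or its RS-232 protocol:

def return_torque(theta, k_spring, k_virtual, ramp):
    """Throttle return torque at opening angle theta [rad].

    k_spring:  stiffness of the mechanical return spring [Nm/rad]
    k_virtual: simulated stiffness added by the servo motor [Nm/rad]
    ramp:      0..1, progress of the gradual stiffness increase, so the
               warning builds up smoothly instead of stepping in abruptly
    """
    return (k_spring + ramp * k_virtual) * theta  # virtual spring in parallel

# While a speed warning is active, ramp the simulated stiffness up over an
# assumed two-second slope (in SAFERIDER the slope is programmable).
dt, ramp_time, ramp = 0.01, 2.0, 0.0
warning_active = True
for _ in range(300):                      # 3 s of a 100 Hz control loop
    if warning_active:
        ramp = min(1.0, ramp + dt / ramp_time)
    torque = return_torque(theta=0.5, k_spring=0.8, k_virtual=1.2, ramp=ramp)

The gradual ramp reflects the design intention stated above: the throttle is safety-critical, so the added stiffness should never appear as a sudden step.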
HAPTIC FEEDBACK IN HANDLE: In contrast to the force-feedback throttle, there is an approach to give haptic feedback either in the right handle (the throttle) or in the left handle. This feedback is perceived as a variation of the handle's shape, produced by appropriate elements sliding out of the inner part of the handle and moving the surface of the handle itself (Fig. 2).
Fig. 2. Mock-up system for sensitivity tests (handle not covered)
Several studies have been conducted on conveying information through tactile means, vibrations or pressure variations [6, 7]. Using the mock-up system of the haptic handle, a first campaign of sensitivity tests was carried out, involving a small group
of 8 volunteers. Each trial consisted of applying a tactile stimulus to the hand of the volunteer while he/she was squeezing the handle as in natural riding conditions. The tactile stimulus was applied three times, at intervals of 10 s, with the same amplitude but with different frequencies: 1.0 Hz, 1.5 Hz and 2.0 Hz [8]. These parameters were combined with two different durations of the tactile stimulus, 3 s and 5 s, and two tactile conditions of the hands: bare hands and wearing gloves. Each volunteer had to perform three trial sessions, each consisting of four trials (two with bare hands and two wearing gloves) in which the frequencies were executed in random order. At the end of each trial, the volunteers were asked to rate each stimulus concerning two aspects: the perception of the feedback, and the discomfort level due to the stimulus. Table 1 collects the answers to these questions for all the conditions investigated. The data reported in Table 1 show a clear trend: there is approximately a linear correlation between perception of the stimulus and its discomfort level. The tactile stimulus executed with the highest frequency, 2.0 Hz, is the most strongly perceived, but at the same time the volunteers indicate this frequency as the most tiresome. This discomfort level rises when the feedback is applied for 5 s. The contrary happens with the stimulus executed with the lowest frequency, 1.0 Hz. Based on these considerations, two possible applications for the haptic handle can be identified. In a first application the haptic handle can be used to communicate information on potentially dangerous situations; this is strongly related to the tactile characteristics of the system: feedback executed with a frequency of 2 Hz, applied for a short period, not longer than 5 seconds. A second possible application of the haptic handle can be realized using the low frequencies, 1.0 Hz or 1.5 Hz, which showed the highest acceptability in terms of discomfort, to call the rider's attention to useful information for better riding, for example a weather warning visualized on the navigator.

VIBRATION FEEDBACK BRACELET: A wirelessly driven bracelet with integrated vibrotactile elements that give vibratory feedback on the left, right, top and bottom sides of the right wrist is developed. The vibrations will be employed to transmit navigation hints and to communicate when speed limits are exceeded. The effectiveness of this principle for providing navigation hints has been largely investigated: Heuten [9], Regenbrecht [10] and Sergi [11] have shown the capability of a similar device to provide guidance cues. The vibrations are produced through eccentric-mass electric motors, similar to those used in mobile phones. The vibration signals are delivered at a fixed frequency of about 150-180 Hz and are modulated in ON/OFF pulsating mode with a pulsation frequency in the range of 0-40 Hz. The bracelet, shown in Fig. 3, is connected through Bluetooth using the Serial Port Profile (SPP). Custom driver electronics have been realized and integrated in the bracelet. Non-standardized user tests already show a highly intuitive interpretation of the directional information and a potential for giving intuitive speed-related information on a motorcycle as well, by activating the upper and lower vibrators in certain time sequences. The twist of the wrist necessary to manipulate the throttle can be stimulated in both directions. The tests also confirm what the literature states [see e.g. 12]: the imminence of the event that has to be communicated can be expressed with frequency. For example, the closer the rider gets to a curve on the right-hand side, the higher the delivered pulsing frequency on the right side of the wrist. An alternative prototype has also been developed with the same vibrating elements integrated in a motorcyclist's glove. As with the bracelet, the information appears to be highly intuitive for directional and speed-related cues.
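The pulsation logic described above can be sketched as follows. The carrier and pulse ranges come from the text, while the warning distance, duty cycle and function names are assumptions made for illustration:

CARRIER_HZ = 165.0        # eccentric-mass motors run at roughly 150-180 Hz
MAX_PULSE_HZ = 40.0       # upper end of the 0-40 Hz ON/OFF modulation range
WARN_DISTANCE_M = 200.0   # assumed distance at which curve pulsing starts

def pulse_frequency(distance_m):
    """ON/OFF pulsation frequency [Hz]: the closer the curve, the faster."""
    if distance_m >= WARN_DISTANCE_M:
        return 0.0
    return MAX_PULSE_HZ * (1.0 - distance_m / WARN_DISTANCE_M)

def vibrator_on(t, distance_m):
    """ON/OFF state of the curve-side vibrator at time t [s] (50% duty)."""
    f = pulse_frequency(distance_m)
    return f > 0.0 and (t * f) % 1.0 < 0.5

print(pulse_frequency(50.0))   # 30.0 Hz pulsing 50 m before the curve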
Table 1. Answers percentage for all the conditions investigated

3 Seconds Duration with Gloves
How would you define the intensity of the warning?
        Slightly Perceptible   2       3       Strongly Perceptible
1,0Hz   29,2%    58,3%   12,5%   0,0%
1,5Hz   0,0%     16,7%   54,2%   29,2%
2,0Hz   0,0%     16,7%   33,3%   50,0%
How would you define the discomfort level of the warning?
        Acceptable   2       3       Tiresome
1,0Hz   54,2%    20,8%   20,8%   4,2%
1,5Hz   20,8%    62,5%   8,3%    8,3%
2,0Hz   8,3%     25,0%   50,0%   16,7%

3 Seconds Duration without Gloves
How would you define the intensity of the warning?
        Slightly Perceptible   2       3       Strongly Perceptible
1,0Hz   20,8%    58,3%   16,7%   4,2%
1,5Hz   0,0%     20,8%   58,3%   20,8%
2,0Hz   0,0%     4,2%    37,5%   58,3%
How would you define the discomfort level of the warning?
        Acceptable   2       3       Tiresome
1,0Hz   62,5%    29,2%   8,3%    0,0%
1,5Hz   33,3%    45,8%   8,3%    12,5%
2,0Hz   8,3%     29,2%   58,3%   4,2%

5 Seconds Duration with Gloves
How would you define the intensity of the warning?
        Slightly Perceptible   2       3       Strongly Perceptible
1,0Hz   20,8%    58,3%   20,8%   0,0%
1,5Hz   4,2%     12,5%   58,3%   25,0%
2,0Hz   0,0%     4,2%    29,2%   66,7%
How would you define the discomfort level of the warning?
        Acceptable   2       3       Tiresome
1,0Hz   62,5%    25,0%   12,5%   0,0%
1,5Hz   33,3%    45,8%   16,7%   4,2%
2,0Hz   4,2%     37,5%   41,7%   16,7%

5 Seconds Duration without Gloves
How would you define the intensity of the warning?
        Slightly Perceptible   2       3       Strongly Perceptible
1,0Hz   8,3%     66,7%   16,7%   8,3%
1,5Hz   0,0%     8,3%    58,3%   33,3%
2,0Hz   0,0%     12,5%   20,8%   66,7%
How would you define the discomfort level of the warning?
        Acceptable   2       3       Tiresome
1,0Hz   54,2%    25,0%   20,8%   0,0%
1,5Hz   20,8%    58,3%   20,8%   0,0%
2,0Hz   8,3%     29,2%   45,8%   16,7%
Fig. 3. Vibration Bracelet for navigation and speed limits warning
VIBRATION IN SEAT: Since a large amount of feedback from the motorcycle and the road is received through the rider's seat contact, it is consequent to think about an HMI element that gives direct feedback at this contact area between rider and motorbike. In SAFERIDER a vibration unit is developed which gives feedback in the seat to communicate a warning to the rider. Analogously to the vibration bracelet, the system consists of a custom vibrating eccentric-mass motor, which is however more powerful. The motor is fixed under the saddle of the motorbike and is controlled in ON/OFF mode through dedicated electronics.
3 Results / Status

The development of the SAFERIDER HMI strategy and prototypes is under way. Phase 1 of the development methodology has been completed and the selected HMI elements are presented in this paper. Currently the development is in phase 2 for the vibration-in-seat and force-feedback throttle devices. Both elements have been realized as prototypes and are ready for the first tests of phase 3. For the vibration bracelet a prototype has been developed and preliminarily tested, and with the haptic handle device a set of standardized phase-3 tests has already been completed, with detailed results which contribute to the assignment of this element to applications. This assignment is planned for phase 4 in summer 2009, while the characteristics optimization of phase 5 and the development of acoustic and visual attendances of phase 6 are planned for autumn and winter 2009. Phase 7, the integration into PTW, will be realized in winter and spring 2010, and the final pilot tests shall be conducted in phase 8 in summer 2010.
4 Conclusion

In order to realize higher safety for PTW riders, SAFERIDER intends to introduce active safety and information systems in the form of ARAS and IVIS for a wide spectrum
of PTW. The acceptance of these ARAS and IVIS will highly depend on the user interface; hence the HMI is a special focus of the SAFERIDER project. The SAFERIDER approach follows a methodology based on 8 development phases and will now enter phase 4 of the process. Based on the assumption that visual information is inappropriate for PTW riders, the SAFERIDER HMI will be based mainly on the haptic HMI elements presented in this paper, which will be accompanied by visual and acoustic stimuli in order to achieve a multimodal and organic information message that is able to gain riders' acceptance.
References

1. SAFERIDER, http://www.saferider-eu.org
2. Abras, C., Maloney-Krichmar, D., Preece, J.: User-Centered Design. In: Bainbridge, W. (ed.) Encyclopedia of Human-Computer Interaction. Sage Publications, Thousand Oaks (2004)
3. Continental, http://www.continental.com
4. Abbink, D.A., Boer, E.R., Mulder, M.: Motivation for continuous haptic gas pedal feedback to support car following. In: Intelligent Vehicles Symposium 2008, pp. 283–290. IEEE Press, Los Alamitos (2008)
5. Aoki, J., Murakami, T.: A method of road condition estimation and feedback utilizing haptic pedal. In: AMC 2008, 10th IEEE International Workshop on Advanced Motion Control, pp. 777–782. IEEE Press, Los Alamitos (2008)
6. Gunther, E.: Skinscape: A tool for composition in the tactile modality. Massachusetts Institute of Technology (2001)
7. van Veen, H.A.H.C., van Erp, J.B.F.: Tactile Information Presentation in the Cockpit. In: Murray-Smith, R. (ed.) Haptic HCI 2000. LNCS, vol. 2058, p. 174. Springer, Heidelberg (2001)
8. Hale, K.S., Stanney, K.M.: Deriving Haptic Guidelines from Human Physiological, Psychophysical and Neurological Foundations. IEEE Computer Graphics and Applications 24(2), 33–39 (2004)
9. Heuten, W., Henze, N., Boll, S., Pielot, M.: Tactile wayfinder: a non-visual support system for wayfinding. In: NordiCHI 2008, Proceedings of the 5th Nordic Conference on Human-Computer Interaction: Building Bridges, Lund, Sweden, pp. 172–181. ACM, New York (2008)
10. Regenbrecht, H., Hauber, J., Schoenfelder, R., Maegerlein, A.: Virtual reality aided assembly with directional vibro-tactile feedback. In: GRAPHITE 2005, Proceedings of the 3rd International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia, pp. 381–387. ACM, New York (2005)
11. Sergi, F., Accoto, D., Campolo, D., Guglielmelli, E.: Forearm orientation guidance with a vibrotactile feedback bracelet: On the directionality of tactile motor communication. In: 2nd IEEE RAS & EMBS International Conference on Biomedical Robotics and Biomechatronics, BioRob 2008, pp. 433–438 (October 2008)
12. Jones, L.A., Sarter, N.B.: Tactile Displays: Guidance for their Design and Application. Human Factors: The Journal of the Human Factors and Ergonomics Society 50 (2008)
Night Vision - Reduced Driver Distraction, Improved Safety and Satisfaction

Klaus Fuchs, Bettina Abendroth, and Ralph Bruder

Institut für Arbeitswissenschaft, TU Darmstadt, Petersenstr. 30, 64287 Darmstadt
{fuchs,abendroth,bruder}@iad.tu-darmstadt.de
Abstract. Accidents in the dark in non-urban areas result in disproportionately high rates of pedestrian injuries (Hülsen 2003). This paper describes the methods and results of test drives with 39 test persons for the assessment of driver behavior when interacting with different Human-Machine Interfaces (HMI) for Night Vision systems that inform drivers about the presence of pedestrians. Driver eye-movement data were analyzed to evaluate the different HMI designs of pedestrian alerts, in a passenger vehicle with Head-up Display (HUD), regarding ergonomic suitability and safety benefit. All systems were compared in field tests on public roads and on a test track.

Keywords: night vision, head-up display, eye tracking, driver distraction.
1 Introduction

Night Vision with pedestrian detection supports drivers' cognition. The examined systems try to decrease the high rates of pedestrian injuries through visual assistance (Hülsen 2003). On the other hand, due to the increasing number of driver assistance systems in today's vehicles, there is a need to evaluate the potential risk of increased driver distraction caused by additional displays before they become available to end users. This applies especially to Night Vision systems, which, according to Green (cited in Jones 2006), generate the problem "that these systems demand that the driver take his or her focus from the road". To enhance the perception of pedestrians, Scheuner et al. (2005) and Fardi et al. (2005) already studied pedestrian detection using infrared-sensor-based Night Vision systems. This study focuses on drivers' behavior and the allocation of drivers' attention, comparing different HMI designs for pedestrian detection using Head-up Display technology to visualize the information. Three different HMI designs for pedestrian detection were compared with two baselines, where one baseline was "without Night Vision" and the second a "conventional Night Vision" without pedestrian alert. The study examines different objective and subjective appraisals of HMI systems depending on the environment and the driver's task, with a focus on driver distraction.
2 Description of Methods

2.1 Experimental Design

The Institute of Ergonomics (IAD) of the TU Darmstadt developed a method to test different implementations of the pedestrian warning of the Night Vision HMI. Three different HMI designs were compared with two baselines (see Table 1). The baselines were "without Night Vision" and a "conventional Night Vision" without pedestrian alert. The examined HMI implementations varied in the level of abstraction and the duration of the presented information. The drivers were able to test all systems and baselines with several pedestrians before they used the system on public roads. During the field test each subject had the chance to experience each night vision system and baseline with two pedestrian events.

Table 1. Experimental design

3 systems:   System 1, System 2, System 3
2 baselines: Baseline 1, Baseline 2
A BMW E60 with far-infrared Night Vision was used as the basis for the experiment. It was equipped with the brightest Xenon technology available to ensure the best preconditions for the drivers' vision. To test Night Vision systems under lifelike and reproducible conditions, tests were carried out on public roads and on the test track of the TU Darmstadt. The drivers' job was to drive the given route according to their habits. The route passed eight positioned pedestrians. The subjects were neither asked to look out for pedestrians nor asked for any response in case they detected one. However, after the subject passed a pedestrian, there was a questionnaire asking whether he had detected him or not. The subjects had to drive 22.5 km (approx. 14 miles) on public roads. After the field test on the public road, the variants were compared again on the test track. There, the drivers' job was to drive at 50 km/h (approx. 30 mph) and to reduce speed to 30 km/h (approx. 20 mph) once they saw a pedestrian. The location of the pedestrian was altered. The passenger car trials were carried out with 39 female and male test persons in two age groups. The younger age group included drivers aged 25 to 40 years, and the older age group included drivers from 50 to 65 years. The trials were carried out under comparable visibility conditions after sunset. The subjects drove 50 to 60 minutes. The overall duration of the experiment was 1½ to 2 hours.

2.2 Methods of Measurement

Eye tracking is an important tool for evaluating different HMI designs in passenger vehicles. Because of the correlation between eye movement and observed information
(Seifert et al. 2001, Rötting 2001), the recorded data expose the allocation of drivers' attention. The recorded data were analyzed based on the Eye-Mind Assumption and the Immediacy Assumption of Just and Carpenter (1980), and the Sequence Assumption. The sequence assumption postulates that the sequence of fixations allows conclusions to be drawn about the subjects' sequence of information processing. The drivers' eye movements were recorded with a modified head-mounted SMI eye tracker (resolution: 0.5° – 1.0°, 720 × 576 pixels at 50 Hz; 8000 kbps; measurement range ±30° horizontally, ±25° vertically). The recorded data were analyzed manually. Besides eye tracking, various other data were collected, such as car speed, steering wheel angle, braking pressure and videos of the driver and the environment. Questionnaires were used to collect subjective data. The subjects had to complete questionnaires before the trial and were interviewed after they passed a pedestrian. After the trial they had to complete another questionnaire.
3 Results

In this study, the visual distraction of the different HMIs was analyzed by the percentage of the accumulated durations of fixation, the maximum durations of fixation and the frequency of fixations on different areas of interest (AOI) during the driving task. The scene video was divided into the following AOI: street, HUD, instrument cluster, interior, pedestrians, miscellaneous and measurement error.

3.1 Response Time

The duration period of cognition (Te) is defined as the time difference between the first fixation on the pedestrian and the passing of the pedestrian. It is a measure of the time which remains for the driver to handle the situation appropriately; longer Te values are better. A comparison of the medians in Figure 1 illustrates that subjects with "system B" had, on average, more time to react appropriately to the presence of a pedestrian after they detected him. The difference of the mean values of Te was verified in a t-test for paired samples. The significant results are shown in Table 2. This demonstrates that "system B" gives drivers significantly more time to react appropriately compared to "baseline 1", and also significantly more time for an appropriate reaction compared to "system C". The results for the duration period of cognition (Te) were compared with the response times measured in a track test that followed the field test, where the drivers were asked to slow down as soon as they saw a pedestrian on the test track. The graph in Figure 2 shows the distance of the drivers to the pedestrians at the moment they applied the brake pedal. A longer distance is superior, because it leads to a longer time span in which to react appropriately to the presence of a pedestrian. The comparison of the medians in Figure 1 and Figure 2 reveals that temporary renditions of the information ("system B" and partially "system A" as well as "system C"), which showed excellent results in the realistic field test, did not perform as well on the test track, where the driving task tempted drivers to focus on the secondary task, the HUD.
Fig. 1. Box plot: Duration period of cognition (Te in seconds): difference between the first fixation and the passing of the pedestrian (Baseline 1: N=36, Baseline 2: N=22, System A: N=61, System B: N=62, System C: N=60)

Table 2. Significant results of t-test for the duration period of cognition (Te)

Version                  p       T
Baseline 1 – System A    0.000   5.825
Baseline 1 – System B    0.000   6.803
Baseline 1 – System C    0.001   3.822
System B – System C      0.002   3.284
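For illustration, the two measures compared here can be computed from logged timestamps and distances along the following lines. The variable names and sample values are hypothetical, not the study's data or analysis code:

from scipy import stats

def duration_of_cognition(t_first_fixation, t_passing):
    """Te [s]: time between first fixation on the pedestrian and passing him."""
    return t_passing - t_first_fixation

def time_to_pass(distance_m, speed_kmh=50.0):
    """Convert a braking-initiation distance (Fig. 2, left) into the
    corresponding time to pass (Fig. 2, right) at the instructed speed."""
    return distance_m / (speed_kmh / 3.6)   # 50 km/h is about 13.9 m/s

print(round(time_to_pass(70.0), 1))   # braking 70 m ahead leaves ~5.0 s

# Paired t-test over per-event Te values, as in Table 2 (made-up numbers):
te_system_b  = [8.2, 7.5, 9.1, 6.8]
te_baseline1 = [5.1, 4.9, 6.0, 4.4]
t_value, p_value = stats.ttest_rel(te_system_b, te_baseline1)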
Furthermore, a response time T1 was defined. It is the difference between the first HUD warning and the first HUD fixation of the subject. A shorter response time T1 gives the driver more time to react appropriately to the presence of a pedestrian. Figure 3 shows that the median of the subjects' first focus on the pedestrian warning is significantly earlier with "system B" compared to systems "A" or "C". The results of a paired t-test have proven a significant difference between the mean values of "system B" and "system C" (p=0.027; T=2.268). A smaller interquartile range is observed with "system B" compared to "system A" and "system C". Because of the temporary nature of the information presentation in "system B", the visual stimuli for the driver in "system B" may be more intense compared to "system A" and "system C".
Fig. 2. Box plot: Initiation of slowdown on the test track in meters (left) ahead of the pedestrian and the calculated time to pass (right)
Fig. 3. Box plot: Response time (T1 in seconds): difference between the first HUD warning and the subject’s first fixation on the HUD
3.2 Comparison of Subjective and Objective Rating of Distraction

According to Yan et al. (2008), "drivers' distraction could be the highest risk factor leading to the failure of attempting to avoid crashes". For this reason, one focus of this study was the evaluation of drivers' distraction.
[Box plots of subjective distraction ratings, from "not distracted at all" to "very distracted", for Baseline 2 and Systems A, B and C; n=39.]
Fig. 4. Box plot: Questionnaire: "How do you rate your distraction by the particular variant?" Scale from 1 "not distracted at all" to 5 "very distracted"
Drivers' distraction was measured in several ways, including a subjective rating by the drivers. The subjective rating of distraction was collected with a questionnaire at the end of the test drive. Figure 4 shows the subjective rating of the drivers' distraction. A paired one-way analysis of variance (ANOVA) was used to test for differences in the rankings; a significant main effect was found (F=7.501, df=3, error df=36, p=0.001). Differences between single groups were calculated using contrasts. There is a significant difference between "system B" and "baseline 2" (p<0.001, F=20.030, df=1), and a tendency for "system C" to be superior to "baseline 2" (p=0.051, F=4.063, df=1). One of the features of "system B" is the temporary display of the pedestrian information. Several objective indicators that allow conclusions about the drivers' distraction, including various response times, were recorded and calculated. One way to rate the potential distraction objectively is the maximum duration of the drivers' fixations on the secondary task. As long as drivers focus their attention on a secondary task, they cannot react to sudden changes in the road environment; for example, other road users crossing the road or sudden speed changes of other vehicles cannot be processed while the driver's attention rests on the secondary task. A lower maximum fixation duration is therefore preferable. Figure 5 shows the maximum duration of fixations on the HUD during the driving task. The drivers' maximum fixation durations on the HUD are significantly reduced with "system C" compared to "baseline 2", as well as compared to "system B". The results of the t-test for paired samples are shown in table 3.
Fig. 5. Box plot: Maximum duration of fixations on the HUD during the driving task [s]

Table 3. Significant results of the t-test for the maximum duration of fixations on the HUD
Version                   p       T
System C – Baseline 2     0.037   2.267
System C – System B       0.045   2.079
According to Zwahlen et al. (1988), a secondary task is too distracting if the subject fixates a secondary object for more than approximately 2 seconds. However, these limit values were not intended for head-up displays, with which the drivers' eyes do not have to accommodate and adapt when they return their gaze to the road. The maximum duration of fixations, combined with the total duration of fixations and the number of fixations, indicates whether a design allows drivers to interrupt the secondary task easily and return their attention to the road and other road users. The medians of "system C" were lower than those of the other systems, while the duration of fixations was lower and the number of fixations was higher. This leads to the conclusion that it is easier with "system C" to interrupt the monitoring of the assistance system "Night Vision with pedestrian detection" than with the other systems. The discrepancy between the questionnaire results and the maximum fixation durations (figure 4 / figure 5; table 3) can be attributed to rating inaccuracy (subjects tend to rate things they like more favorably) and to the difficulty of condensing all indicators of distraction into a single subjective item.
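As a sketch of how the fixation metrics above could be derived from coded gaze data (the AOI labels and durations are hypothetical, and the 2 s threshold is the approximate guideline of Zwahlen et al., not a fixed standard):

from collections import defaultdict

# Hypothetical fixation records: (AOI label, fixation duration in seconds)
fixations = [("HUD", 0.8), ("street", 1.4), ("HUD", 2.3),
             ("instrument cluster", 0.6), ("HUD", 1.1)]

total = defaultdict(float)    # total fixation duration per AOI
count = defaultdict(int)      # number of fixations per AOI
longest = defaultdict(float)  # maximum fixation duration per AOI
for aoi, dur in fixations:
    total[aoi] += dur
    count[aoi] += 1
    longest[aoi] = max(longest[aoi], dur)

for aoi in longest:
    flag = " (exceeds ~2 s guideline)" if longest[aoi] > 2.0 else ""
    print(f"{aoi}: n={count[aoi]}, total={total[aoi]:.1f} s, "
          f"max={longest[aoi]:.1f} s{flag}")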
3.3 Number of Observed Pedestrians

One important factor for benchmarking the systems is the number of observed pedestrians in the field test, since drivers can only react appropriately to pedestrians they have perceived. The analysis shows that the design of the HMI has a substantial influence on the number of observed pedestrians. Figure 6 visualizes the number of observed pedestrians depending on the location where the subjects observed them first. The drivers achieved a 100% detection rate in this test while using the HMI of "system B"; the other systems did not improve the number of perceived pedestrians. With the altered driving task on the test track, there was a 100% detection rate with all systems, which might be caused by the predictable, less demanding environment compared to the field study. This leads to the conclusion that Night Vision systems with an appropriate HMI design can increase road safety even in demanding driving environments at night.
[Stacked bars per system showing the percentage of pedestrians not observed, observed on the road only, observed on the road first, and observed in the HUD first; Baseline 1 (N=19), Baseline 2 (N=18), Systems A, B and C (N=37 each).]
Fig. 6. Number of observed pedestrians (%) in the field test, depending on where the pedestrian information was first acquired
The analysis of the drivers' fixation durations on the AOI "road" showed no significant reduction, and no significant reduction in the frequency of fixations on the AOI "road" was observed during these tests. In addition to the results discussed above, the average speed of the subjects was analyzed. No significant difference in average speed was observed between the different HMI designs; however, a slight trend toward a reduced average speed on the road compared to the test track was observed.
4 Conclusion

The results show that vehicles equipped with appropriate HMI implementations of Night Vision systems, which alert drivers to the presence of pedestrians, are
significantly superior to the "non Night Vision" vehicles investigated in "baseline 1". The appropriate HMI implementations are also superior to conventional Night Vision systems, as seen in "baseline 2". "System B" was superior in most of the analyzed parameters. However, "system C" showed great potential for the future because its HMI leads to significantly shorter maximum durations of HUD fixations than the other systems. With an appropriate HMI design, Night Vision with pedestrian detection increases the perception of pedestrians in darkness and thereby contributes to road safety. Furthermore, there was no indication of increased driver distraction with an appropriate HMI design. The study also shows that a less realistic driving task and an environment that demands less attention can lead to different ratings and different driver behavior. The appropriate choice of environment and tasks is therefore important if the results are to generalize.
References
1. Fardi, B., Scheunert, U., Wanielik, G.: Shape and motion-based pedestrian detection in infrared images: a multi sensor approach. In: IEEE Intelligent Vehicles Symposium, Las Vegas, NV (2005)
2. Hülsen, H.: Unfallgeschehen mit Fußgängern bei Nacht. In: Deutscher Verkehrssicherheitsrat e.V. (ed.) Unfälle in der Dunkelheit, pp. 14–17. Deutscher Verkehrssicherheitsrat e.V., St. Augustin (2003)
3. Jones, W.: Safer Driving in The Dead of Night. In: Jones, D. (ed.) IEEE Spectrum, vol. 43(3), pp. 20–21 (2006)
4. Just, M.A., Carpenter, P.A.: A theory of reading: From eye fixations to comprehension. Psychological Review 87, 329–354 (1980)
5. Rötting, M.: Parametersystematik der Augen- und Blickbewegungen für arbeitswissenschaftliche Untersuchungen. Schriftenreihe Rationalisierung und Humanisierung, 34. Shaker, Aachen (2001); Doctoral Dissertation, RWTH Aachen
6. Scheunert, U., Cramer, H., Fardi, B., Wanielik, G.: Multi-Sensor-Daten-Fusion zur Personenerkennung mit dem Merkmalsmodell. In: INFORMATIK 2005 - Informatik LIVE! vol. 2, Beiträge der 35. Jahrestagung der Gesellschaft für Informatik e.V., GI (2005)
7. Seifert, K., Rötting, M., Jung, R.: Registrierung von Blickbewegungen im Kraftfahrzeug. In: Jürgensohn, T., Timpe, K.P. (eds.), pp. 207–228. Springer, Berlin (2001)
8. Yan, X., Harb, R., Radwan, E.: Analyses of factors of crash avoidance maneuvers using the general estimates system. Traffic Inj. Prev. 9(2), Knoxville, Tennessee, USA (June 2008)
9. Zwahlen, H.T., Adams Jr., C.C.: Safety aspects of CRT touch panel controls in automobiles. In: Gale, A.G., Freeman, H.M., Haslegrave, C.M., Smith, P., Taylor, S.P. (eds.) Vision in Vehicles II, pp. 335–344. Elsevier, Amsterdam (1988)
Measurement of Depth Attention of Driver in Frontal Scene
Mamiko Fukuoka1, Shun’ichi Doi1, Takahiko Kimura3, and Toshiaki Miura2
1 Department of Intelligent Mechanical Systems Engineering, Kagawa University, 2217-20 Hayashi-Cho, Takamatsu, Kagawa, 761-0396 Japan
[email protected], [email protected]
2 Department of Applied Cognitive Psychology, Osaka University, 1-2 Yamadaoka, Suita, Osaka, 565-0871 Japan
[email protected]
3 Kansai University of Welfare Sciences, 3-11-1 Asahigaoka, Kashiwara, Osaka, 582-0026 Japan
[email protected]
Abstract. Safe driving depends on appropriate monitoring of the frontal scene, and delayed reactions have been found to be a major cause of vehicle accidents. The purpose of this paper is to investigate the depth-attention characteristics of drivers in the traffic environment using a semi-realistic setting: a three-dimensional (3-D) attention measurement system. Experiments were conducted to clarify the effects of individual driver characteristics and the traffic environment on depth attention by studying the effects of aging, illuminance and display color on drivers' depth attention across three elements of traffic.

Keywords: display color, three-dimensional space, shift of attention, depth, allocation of attention, driver.
1 Introduction

Recently, the number of serious traffic accidents has been declining. However, the proportion of traffic accidents involving senior citizens is increasing, and the number of accidents at twilight also stands out. A driver's visual system receives and prioritizes massive amounts of information from road signs and markings in the traffic environment, so drivers have to switch their visual attention continually between the objects presented in front of them. In-car devices such as car navigation systems and speedometers have become widespread in recent years, so drivers obtain information from various in-car devices while driving. The display colors, brightness, contrast, etc. of in-car devices have been investigated from the viewpoint of visibility. However, there is little research examining how the relation between in-car devices and the traffic environment affects drivers' depth attention.
In a preceding study of depth attention, in which a cue indicated the position where a target would appear, it was shown that reaction time (RT) was faster when the target appeared at the cued position than when no cue was presented, and slower when the target did not appear at the cued position than when no cue was presented. Regarding the transfer of depth attention, RT was slower when subjects' attention was shifted from near to far than from far to near (an asymmetry), and slower when the shift distance was long than when it was short. Furthermore, these characteristics appeared more markedly under dynamic conditions than under static conditions. In the present study, we clarify the effects of individual driver characteristics and the traffic environment on depth attention by studying the effects of aging, illuminance and display color on drivers' depth attention across three elements of traffic.
2 Influence of Aging on the Characteristics of Depth Attention

2.1 Method

Based on the results of this experiment, the depth-attention characteristics of young and elderly subjects were analyzed.

Subjects: There were two groups of subjects. The first group consisted of 17 volunteers recruited at Kagawa University, all with normal or corrected-to-normal vision. The second group consisted of 17 elderly subjects aged 59 to 68. One subject's data were excluded because the error rate exceeded 20%.

Apparatus and stimuli: The overall length of the device is 8 m. Inside it, a 1/25-scale model resembles a tunnel containing four targets. The subjects observed the scene through an eyepiece (1/2 multiple), so the apparent scenery was at 1/50 scale. The fixation point was a yellow LED with a brightness of approximately 4.3 cd/m2, presented at a distance of 120 cm from the subject. The targets were red LEDs with a brightness of approximately 7.6 cd/m2, located at 30 cm, 81 cm, 158 cm and 231 cm from the subject. The subject sat on the chair of a cart moving forward through the tunnel alleyway. The stimuli were digital LEDs located in the subject's line of sight. Two targets lay in front of the fixation point (30 cm and 81 cm from the observer) and two behind it (158 cm and 231 cm); through the eyepiece these correspond to 15 m, 40.5 m, 79 m and 115 m in real space (see the worked example below). The luminance of the environment was a daylight condition (480-680 lx).
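As a check on the stated geometry, the following worked example uses only the numbers given above; the notation is ours:

\[
\underbrace{\tfrac{1}{25}}_{\text{model scale}} \times
\underbrace{\tfrac{1}{2}}_{\text{eyepiece}} = \tfrac{1}{50}
\quad\Longrightarrow\quad
d_{\mathrm{real}} = 50\, d_{\mathrm{model}}
\]

Thus 50 x {0.30, 0.81, 1.58, 2.31} m = {15, 40.5, 79, 115.5} m, matching the stated equivalents of 15 m, 40.5 m, 79 m and (approximately) 115 m.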
Design: Three independent variables were examined: cue validity (valid, neutral and invalid), asymmetry ("far to near" or "near to far") and subject group (elderly versus young).

Procedure: The observing condition was a moving condition (0.44 m/s). Three within-subject levels of cue validity were used: 65% of all trials were valid, 15% invalid and 20% neutral. The entire experiment consisted of a single session of 320 trials, with a short rest every 160 trials. The fixation point was presented 1000 ms after the beginning of each trial, and information about the target location was presented at the fixation position by a digital LED (1 to 4). The targets were then presented until the subject responded. So that the targets had to be identified accurately, they took the shape "E" or "3". The subject's task was to judge whether the target appeared nearer or further than the fixation point and to push the corresponding button as quickly as possible, responding according to the information ("E" or "3") presented beforehand at the target location. Subjects practiced before the formal experiment; the practice procedure and task were almost identical to the formal experiment, and practice stopped once the subject met the accuracy criterion. Reaction time (RT) was calculated for correct trials only, for every subject. In addition, trials with reaction times above 1000 ms or below 100 ms were excluded as implausibly slow or fast responses.
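A minimal sketch of the stated trial-exclusion rule follows; the trial records are hypothetical, and treating the 100 ms and 1000 ms bounds as inclusive is our assumption.

# Keep correct trials only, and drop RTs above 1000 ms or below 100 ms.
trials = [(350, True), (95, True), (1200, True), (420, False), (510, True)]

valid_rts = [rt for rt, correct in trials if correct and 100 <= rt <= 1000]
print(valid_rts)  # [350, 510]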
[Schematic of the apparatus: the observer views the scene through binoculars; the cue/fixation point lies at 120 cm; targets T1-T4 lie at 30, 81, 158 and 231 cm; the cart moves at 0.44 m/s.]
Fig. 1. A three-dimensional depth attention measurement system
2.2 Results and Discussion

Fig. 2 shows RT and standard deviation for the three cue cases (valid, neutral and invalid). There were significant differences in RT for cue [F(2,62)=25.02, p=.000], subject group [F(1,31)=56.36, p=.000] and asymmetry [F(1,31)=9.65, p=.004]. In additional analyses, RT was slower for invalid than for valid (p=.000) and neutral (p=.000) cues, and faster for valid than for neutral cues (p=.002). These results indicate that the RTs of elderly subjects were delayed by approximately 200 ms relative to young subjects, and that RT was slower when attention was shifted from near to far than from far to near. In addition, RT was faster when attention was directed by a valid cue than when no cue was presented, and slower when attention was directed to a location other than the appearance position than when no cue was presented. These results are consistent with the preceding study.
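The valid/neutral/invalid pattern reported here is the classic spatial-cueing effect; written out in our notation (following the cueing paradigm of Posner et al. [2]), the comparisons amount to a cueing benefit and a cueing cost:

\[
\text{benefit} = \overline{RT}_{\mathrm{neutral}} - \overline{RT}_{\mathrm{valid}} > 0,
\qquad
\text{cost} = \overline{RT}_{\mathrm{invalid}} - \overline{RT}_{\mathrm{neutral}} > 0
\]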
[Reaction time in ms (300-900) per cue condition and expectancy of validity, for younger and elder subjects.]
Fig. 2. Reaction Time of each cue
3 Influence of Illuminance on the Characteristics of Depth Attention

3.1 Method

Based on the results of this experiment, the influence of illuminance on depth attention was analyzed.

Subjects: 13 students of Kagawa University (high visual adaptability group: 6; low visual adaptability group: 7) participated as subjects.

Apparatus and stimuli: The apparatus and stimuli were the same as in chapter 2 except for the peripheral illuminance, which was set to a dawn condition (5-8 lx), a twilight condition (95-135 lx) or a daylight condition (480-680 lx).

Design: Four independent variables were examined: cue validity (valid, neutral and invalid), asymmetry ("far to near" or "near to far"), illuminance (daylight, twilight and dawn) and subject group (high versus low visual adaptability).

Procedure: The basic procedure was the same as in chapter 2.

3.2 Results and Discussion

Subjects were categorized into two groups, of low and high visual adaptability, as shown in Fig. 3. Fig. 4 shows RT and standard deviation of the low group for the three cue types (valid, neutral and invalid). There were significant differences in RT for cue [F(2,22)=27.63, p=.000] and asymmetry [F(1,11)=35.91, p=.000]; RT was slower when attention was shifted from near to far than from far to near. In additional multiple comparisons, RT was slower for invalid than for valid (p=.000) and neutral (p=.000) cues, and faster for valid than for neutral cues (p=.016). These results are consistent with the preceding study. However, in this experiment, there
[Scatter of subjects' moving adjustability against ambience adaptability, separating the high and low visual adaptability groups.]
Fig. 3. Moving and Ambience adaptability
[Reaction time in ms (300-500) per cue condition for the high and low visual adaptability groups under daylight, twilight and dawn illuminance.]
Fig. 4. Reaction Time of each cue
were no significant differences in RT across illuminance levels [F(2,22)=1.07, p=.360]. Therefore, the main effect of illuminance was examined in each visual adaptability group separately. In the high visual adaptability group there were no significant differences in RT across illuminance [F(2,10)=0.05, p=.956], and in the low visual adaptability group there were likewise no significant differences [F(2,12)=1.77, p=.212]. However, as seen in Fig. 4 and the analysis results, these findings suggest that the low visual adaptability group was more susceptible to illuminance than the high visual adaptability group.
4 Influence of Display Color on the Characteristics of Depth Attention

Based on the results of the following experiments, the influence of display color on depth attention was analyzed.
Ⅰ Influence of Color Combination on Depth Attention Characteristics

1 Method

Based on the results of this experiment, the influence of the combination of cue and target colors on depth attention was analyzed.

Subjects: The subjects were 10 volunteers recruited at Kagawa University, all with normal or corrected-to-normal vision.

Apparatus and stimuli: The apparatus and stimuli were the same as in chapter 2 except for the colors of the fixation point and targets. The fixation point was a red (10 cd/m2) or green (18 cd/m2) LED, and the targets were red (5.5 cd/m2) or green (5.0 cd/m2) LEDs.

Design: Four independent variables were examined: condition (static or dynamic), cue validity (valid, neutral and invalid), asymmetry ("far to near" or "near to far") and combination of colors (cue-target: red-red, green-green, red-green and green-red).

Procedure: The procedure was the same as in chapter 2 except for the condition, the number of trials and the way of presenting the cue. The observing conditions were dynamic (0.44 m/s) and static. The entire experiment consisted of a single session of 640 trials, which subjects completed in two blocks of 320 trials on separate days. The information about the target location was presented at the fixation point by a digital LED (in front: U; behind: inverted U; neutral: H).

2 Results and Discussion

Fig. 5 shows RT and standard deviation for the three cue cases (valid, neutral and invalid) in the dynamic condition. There were significant differences in RT for cue [F(2,18)=4.50, p=.026], asymmetry [F(1,9)=16.06, p=.003] and combination of colors [F(3,27)=5.52, p=.004], and a cue x color-combination interaction [F(6,54)=2.54, p=.031]. In additional multiple comparisons, RT was slower for invalid than for valid (p=.017) and neutral (p=.019) cues; there was no significant difference between valid and neutral (p=.952). This result is attributed to the use of a range cue (attention allocation by a top-down process), i.e. "in front" versus "behind". In addition, RT for the green-red color combination was faster than for the green-green (p=.003) and red-green (p=.002) combinations. In simple main effect analyses, there were significant differences in RT across color combinations for invalid cues [F(3,81)=8.44, p=.000]; multiple comparisons showed significant differences between green-green and green-red (p=.000), green-green and red-red (p=.001), red-green and green-red (p=.001), and red-red and red-green (p=.010). These results indicate that RT was slower when attention was shifted from near to far than from far to near, and that RT was slower when attention was directed to a location other than the appearance position than when no cue was presented, consistent with the preceding study. In addition, RT was faster when the target color was red than when it was green, suggesting that the main effect of the cue depended on the target colors rather than on the colors of the fixation point.
[Reaction time in ms (300-500) per cue condition for the four cue-target color combinations (red-red, green-green, red-green, green-red) in the dynamic condition.]
Fig. 5. Reaction Time of each cue in dynamic condition
Ⅱ Influence of Display Colors of the Same Luminance on Depth Attention Characteristics
1 Method

Based on the results of this experiment, the influence of display colors of the same luminance on depth attention was analyzed.

Subjects: The subjects were 27 volunteers (male: 12, female: 15) recruited at Kagawa University, all with normal or corrected-to-normal vision. One subject's data were excluded because the error rate exceeded 20%.

Apparatus and stimuli: The apparatus and stimuli were the same as in experiment Ⅰ except for the colors of the fixation point and targets. The fixation point was a red, green or blue LED (8.2-8.5 cd/m2), and the targets were red LEDs (7.4 cd/m2).

Design: Five independent variables were examined: gender, condition (static or dynamic), cue validity (valid, neutral and invalid), asymmetry ("far to near" or "near to far") and display color (red, green and blue).

Procedure: The procedure was the same as in experiment Ⅰ except for the distribution of cue validity types and the number of trials: 64% of all trials were valid, 14% invalid and 22% neutral. The entire experiment consisted of a single session of 336 trials.

2 Results and Discussion

Fig. 6 shows RT and standard deviation for the three cue cases (valid, neutral and invalid) in the dynamic condition. There were significant differences in RT for cue [F(2,48)=50.71, p=.000] and a condition x asymmetry interaction [F(1,24)=6.62, p=.017], as well as a tendency toward a gender x condition x display-color interaction [F(2,48)=2.73, p=.075]. In additional multiple comparisons, RT was slower for invalid than for valid (p=.000) and neutral (p=.000) cues; there was no significant difference between valid and neutral (p=.170). This result is attributed to the use of a range cue (attention allocation by a top-down process) such as
[Reaction time in ms (300-500) per cue condition for red, green and blue display colors in the dynamic condition.]
Fig. 6. Reaction Time of each cue in dynamic condition
"in front" and "behind". This result is consistent with experiment Ⅰ. In simple main effect analyses, there was a significant difference in RT for asymmetry in the dynamic condition [F(1,48)=7.69, p=.008]. These results indicate that, in the dynamic condition, RT was slower when attention was shifted from near to far than from far to near, and that RT was slower when attention was directed to a location other than the appearance position than when no cue was presented.
Ⅲ Influence of Display Colors of the Same Amount of Energy on Depth Attention Characteristics
1 Method

Based on the results of this experiment, the influence of display colors of the same amount of energy on depth attention was analyzed.

Subjects: The subjects were 14 volunteers (male: 7, female: 7) recruited at Kagawa University, all with normal or corrected-to-normal vision.

Apparatus and stimuli: The apparatus and stimuli were the same as in experiment Ⅱ except for the luminance of the fixation point, which was a red (11 cd/m2), green (13 cd/m2) or blue (13 cd/m2) LED.

Procedure and design: The procedure and design were the same as in experiment Ⅱ.

2 Results and Discussion

Fig. 7 shows RT and standard deviation for the three cue cases (valid, neutral and invalid) in the dynamic condition. There were significant differences in RT for cue [F(2,24)=32.56, p=.000] and display color [F(2,24)=12.26, p=.000], and gender x display-color [F(2,24)=3.98, p=.032] and cue x display-color [F(4,48)=9.94, p=.000] interactions. In additional multiple comparisons, RT was slower for invalid than for valid (p=.000) and neutral (p=.000) cues. There were no
[Reaction time in ms (300-500) per cue condition for red, green and blue display colors in the dynamic condition.]
Fig. 7. Reaction Time of each cue in dynamic condition
significant differences in RT between valid and neutral cues (p=.511), consistent with experiment Ⅰ. In addition, RT was slower when the color was red than when it was green (p=.004) or blue (p=.000), indicating that RT was slower for red cues than for green or blue ones. In simple main effect analyses, there were significant differences in RT across display colors for invalid cues [F(2,72)=31.62, p=.000] and for female subjects [F(2,24)=15.10, p=.000]. In multiple comparisons for invalid cues, there were significant differences in RT between red and green (p=.000), red and blue (p=.000), and green and blue (p=.006); for female subjects, between red and green (p=.002) and red and blue (p=.000), with a tendency between green and blue (p=.068). Overall, RT was slower when attention was directed to a location other than the appearance position than when no cue was presented, consistent with the preceding study. In addition, RT was faster when the display color was blue than when it was green, and slower when it was red than when it was green. Furthermore, these results suggest that invalid trials and female subjects were particularly susceptible to the display color.
5 General Discussion

The reaction adaptability of young drivers remains faster than that of elderly drivers, whose visual function declines due to retinal degeneration. RT was slower for invalid than for valid and neutral cues. This implies an elevated risk when elderly drivers drive in dark conditions, because their RTs are slower than those of young drivers. The luminance of the peripheral environment influences drivers' response adaptability while driving. These attention characteristics appear more markedly in the twilight condition than in the daylight and dawn conditions for drivers with low visual adaptability; that is, for each cue case, reaction time is slowest in the twilight condition.
The display color also influences drivers' response adaptability while driving. These attention characteristics appear more markedly for invalid cues and for female drivers, who are more susceptible to the display color. The shift of depth attention showed an asymmetry; that is, RT differed depending on whether attention was shifted from nearer to further space or from further to nearer space. RT was slower for "near to far" than for "far to near" shifts, and this tendency stood out in the dynamic condition; on the other hand, it became weak when the target color was red. In future work, visual perception will be examined on the basis of the characteristics of ocular convergence and people's depth vision. In addition, the effects of changing the brightness and color of the targets will be examined, and, through measurements of the subjects' brain waves, the feasibility of evaluating the impact of the road traffic environment on driver comfort using brain-wave fluctuations will be discussed. Moreover, driving-safety applications for road traffic safety will be developed on the basis of the above studies, for example a traffic-safety educational system for elderly drivers. From these considerations, to prevent traffic accidents it must be clarified how delayed reactions arise from a lack of understanding of the traffic context and from insufficient foresight of approaching danger in the frontal scene.
References
1. Eriksen, B.A., Eriksen, C.W.: Effect of noise letters upon the identification of a target letter in a nonsearch task. Perception & Psychophysics 16, 143–149 (1974)
2. Posner, M.I., Snyder, C., Davidson, B.: Attention and the detection of signals. Journal of Experimental Psychology: General 109, 160–174 (1980)
3. Shulman, G.L., Remington, R.W., McLean, J.P.: Moving attention through visual space. Journal of Experimental Psychology: Human Perception and Performance 5, 522–526 (1979)
4. Andersen, G.J., Kramer, A.F.: Limits of focused attention in three-dimensional space. Perception & Psychophysics 53, 658–667 (1993)
5. Downing, C., Pinker, S.: The spatial structure of visual attention. In: Posner, M.I., Martin, O. (eds.) Attention and Performance, vol. XI, pp. 171–187 (1985)
6. LaBerge, D., Brown, V.: Variations in size of the visual field in which targets are presented: an attentional range effect. Perception & Psychophysics (1986)
7. Miura, T., Shinohara, K., Kanda, K.: Shift of attention in depth in a semi-realistic setting. Japanese Psychological Research 44, 121–123 (2002)
8. Xia, R., Fukushima, M., Doi, S., Kimura, T., Miura, T.: The Study of Characteristic of Attention in Depth of Driver in Supposed Traffic Environment. In: IEEE/ICME, pp. 1413–1418 (2007)
Understanding the Opinion Forming Processes of Experts and Customers During Evaluations of Automotive Sounds
Louise Humphreys1, Sebastiano Giudice1, Paul Jennings1, Rebecca Cain1, Garry Dunne2, and Mark Allman-Ward3
1 University of Warwick
{Louise.Humphreys,S.D.Giudice,Paul.Jennings,R.Cain.1}@warwick.ac.uk
2 Jaguar Land Rover
[email protected]
3 Sound Evaluations Ltd.
[email protected]
Abstract. A challenge in automotive engineering is to understand the subjective reactions of individuals to vehicle sounds; this is necessary in order to improve decision making during product design. We can use "structured evaluations" to achieve this, but we need to ensure that 1) we understand the reasons behind such evaluations, i.e. the opinion forming process, and 2) such evaluations are analogous to appraisals of vehicles on the road. Hence, for structured evaluations to be effective, it is important that we understand the opinion forming process in real-life situations. Since there is a lack of knowledge on how people form perceptions about vehicles in reality, an appraisals framework is described in this paper. Moreover, this paper discusses a pilot study that investigated how experts assess vehicle sounds on-road, as well as planned future studies to examine how customers evaluate automotive sounds.
1 Introduction

Automotive products will always ultimately be used by people. Therefore, as well as understanding the objective functional attributes of such products, we also need to understand the subjective human perspective. Decision making throughout automotive product development can be improved through the most effective and efficient use of the subjective reactions of drivers. In particular, the sound of a car is one attribute that is not easily quantified: the best sound isn't necessarily the quietest. Hence subjective evaluations are particularly important for the development of automotive sounds. In a structured evaluation, customers appraise products, or elements of products (e.g. a vehicle's sound), in a controlled environment (for sound, in a listening room or using an interactive noise, vibration and harshness [NVH] simulator). Structured evaluations can be used before the actual car exists, saving costs during product development. However, not only do we need to know what the results of these
evaluations are, we also need to understand why individuals make the decisions they do, i.e. their "opinion forming processes". Moreover, the way individuals appraise products in real life can differ from the way they appraise them in structured evaluations, owing to the effect of context. It is therefore important for manufacturers that the results of product appraisals within structured evaluations match, or come as close as possible to, real life. We therefore need to develop consistent methods for capturing opinion forming processes in different environments (e.g. on-road, in vehicle simulators, and in listening rooms) and to compare these opinion forming processes across the different levels of reality. From this we can gain an understanding of how real-world representation can influence decision making during product development, which will lead to guidelines for representing the real world in simulated environments. It is also important to note that evaluations are often carried out by experts (i.e. NVH engineers and decision makers), but such evaluations may be very different from those of customers. It is important to be aware of such differences: structured evaluations may need to be organised differently for customers and experts. The overall aim of our research is to optimise structured evaluations by learning more about the appraisal of vehicles by experts and customers. This paper demonstrates how we have started to address this aim. It describes an appraisals framework developed in response to a gap in knowledge about subjective appraisals of vehicles; the framework identifies key influences that may govern people's opinion forming processes. The paper also discusses a pilot study carried out to examine how expert assessors evaluate vehicle sounds on-road, as well as planned future studies that will investigate how customers appraise vehicles.
2 Appraisals Framework

When an individual subjectively evaluates a car, their opinion forming processes are not simply determined by the attributes of the car itself, but also by other influences, particularly relating to the individual themselves. Indeed, a person's response to a car will be influenced by (1) their knowledge of the brand, (2) their self-image and personality, (3) their emotional responses whilst driving and the extent to which they enjoy driving in general, (4) their preferences, needs and demographics, (5) their driving behaviour and experience, and (6) their expertise in appraisals, i.e. customers versus experts. In addition to these influences, a person's assessment of a car might change depending on when it takes place. Each of these influences will now be discussed in more detail.

2.1 Knowledge of the Brand

An individual's pre-expectation of a car is likely to have an impact on how the car is perceived. In particular, the brand of the car could be crucial in shaping the opinion formed. Wänke, Herrmann, and Schaffner [1] have demonstrated the importance of brand name in determining brand perception. They found that individuals rated hotels according to their pre-existing knowledge of the hotel names; they were influenced by the semantic associations of the names. Although this study investigated the perception of hotels based on their names, it is useful in illustrating the cognitive biases that
can occur when appraising products. Ellermeier and Vase Legarth [2] carried out a related experiment investigating automotive sound quality. Subjects had to judge automotive sound recordings; before hearing each sound, subjects saw a picture of either a high-performance or a low-performance car. The results showed that the former produced a positive bias in the evaluations and the latter a negative bias. Although this experiment did not explicitly assess brand image, it does demonstrate how individuals carry knowledge about certain types of cars, which influences their opinions of such cars.

2.2 Self-Image and Personality

When evaluating a product, brand image/knowledge may interact with an individual's own self-image; an individual may believe that the brand is a reflection of their own image. Self-concept theory maintains that individuals act in ways that preserve and develop their self-concept; this can be achieved through the purchase and use of products. Gardner and Levy [3] and Levy [4] argue that consumers frequently purchase products for their symbolic value, and not for their physical attributes or functional benefits. Consumers hold symbolic meanings (or images) of themselves, as well as of products and of the types of customers who buy and use those products. Essentially, an evaluation of a car might be determined not just by the brand image itself, but also by an individual's own self-image and the image they hold of the type of customer who buys the car. The image congruence hypothesis postulates that an evaluation of a product will be influenced by the degree of congruence between product image and self-image. It is possible that personality plays a role in shaping an individual's subjective evaluation of a vehicle. Indeed, the congruency hypothesis may be more appropriate for individuals with certain personality types. Graeff [5] found that self-monitoring moderated the image congruence hypothesis: greater self-monitoring is linked with a greater influence of image congruence on consumers' evaluations of publicly consumed brands. Self-monitoring is the extent to which an individual monitors and controls his/her expressive behaviour and self-presentation to concur with social cues (Gould [6], Snyder [7]). A number of personality types have been linked to driving behaviour: an 'external' locus of control (e.g. Montag and Comrey [8]), i.e. a disposition towards attributing causation or blame to outside factors such as luck or chance; sensation seeking (e.g. Heino, van der Molen and Wilde [9]), i.e. the tendency to seek sensation and excitement; and the Type A personality (e.g. Perry and Baldwin [10]), i.e. a strong sense of urgency, high competitiveness, impatience, and quick irritability. It is possible that such personality types may also be linked to subjective evaluations of cars: if personality influences driving behaviour, then it can be argued that it can also influence subjective appraisals of vehicles. Choo and Mokhtarian [11] found a link between personality and choice of car. They found that organisers (those who like to be in charge) are more likely to drive moderate cars, as opposed to smaller, larger or speciality cars. Moreover, calmer people are more likely to drive minivans.
2.3 Emotional Responses Whilst Driving and Driving Enjoyment

It is possible that emotional reactions that occur during driving alter the opinions formed about the car. A number of studies have attempted to measure driving pleasure (Matsuura, Araki and Chen [12]; Tischler, Peter, Wimmer and Voskamp [13]). Tischler et al. found that both emotional speech and facial expression varied according to the type of car. However, they also found that people did not express happiness when they drove alone, and that there was a lot of non-emotional facial movement due to observing traffic, which made facial recognition difficult. Such studies are useful in highlighting emotional responses that occur during driving, but more studies are needed that explicitly link such emotional responses with the opinions formed. It is possible, however, that emotional responses that occur during driving are not the cause of the evaluations but a product of them. Nevertheless, it is important to measure such responses when investigating subjective reactions because they can tell us what individuals really feel about a car, rather than what they believe they feel; they remove language as a barrier. The extent to which an individual enjoys driving in general may alter their perception of a car. A car enthusiast might evaluate a vehicle differently from somebody who does not enjoy driving. Moreover, being a car enthusiast could influence evaluations positively or negatively. For example, someone who enjoys driving might put up with imperfections and evaluate the car more positively than someone who does not; conversely, a car enthusiast might demand the very best from the car and evaluate it more negatively. There might also be a difference between enthusiasts and non-enthusiasts when evaluating luxury and non-luxury cars; enthusiasts might evaluate sports cars more positively and less expensive cars more negatively than non-enthusiasts. These ideas are quite speculative but warrant further investigation. The pilot study described in this paper attempts to develop a measure of driving enjoyment, although because the sample consisted of expert assessors of automotive sounds, most participants scored highly. This measure will be used when assessing how customers appraise vehicle sounds.

2.4 Preferences, Needs and Demographics

The next influence refers to what an individual requires of a car (covering both the car itself and how the individual will use it), as well as to their demographic profile (e.g. age, gender, income). This is a very important influence, possibly the most important, on subjective evaluations of vehicles. A person might see a sports car as congruent with their self-image and might also be a car enthusiast who prefers sports cars, but if the person does most of his/her driving over long distances, they might not evaluate such cars positively; sports cars are not necessarily practical for long journeys. Moreover, affordability may influence an individual's evaluation of a car; an individual who can afford an expensive sports car might evaluate it more positively than someone who cannot. The importance of these factors has been highlighted in a study by Choo and Mokhtarian [11]. This study showed a link between the amount of time people think they
spend travelling and their choice of car: individuals who state that they travel frequently over short distances are more likely to drive sports cars, while those who state that they travel frequently over long distances are less likely to do so. Choo et al. suggest that sports cars are desirable for travelling around town but not practical for long trips. Moreover, demographic variables had a significant effect on an individual's choice of vehicle; for example, age was negatively associated with driving small cars, sports cars or SUVs. In addition, Choo et al. found that personality, travel attitude and lifestyle had a significant effect on car preference. For example, with regard to travel attitude, individuals who dislike travel are more likely to drive luxury cars; with regard to lifestyle, workaholics are less likely to drive luxury and sports cars, and status seekers are more likely to drive them. This latter finding could be linked to personality: status seekers believe that their car is a status symbol.

2.5 Driving Behaviour/Driving Experience

The way an individual drives a car is likely to have an impact on how he/she evaluates it. For example, an aggressive driver might be more likely to enjoy driving a sports car than a more conservative driver. The literature tends to focus on driving behaviour in terms of risky driving; however, it might also be useful to investigate driving behaviour in relation to subjective appraisals of vehicles. In particular, driving aggressively, as opposed to calmly, results in different sounds (and performance levels) being evaluated. Moreover, driving experience could also play a role in the opinion forming process.

2.6 Expertise in Appraisals

The way a customer evaluates a car might be very different from the way an expert, such as a member of a VET (Vehicle Evaluation Team) or an NVH (Noise, Vibration & Harshness) expert, evaluates a vehicle. Indeed, some of the determinants outlined in this paper might not be as influential for experts.

2.7 Time

A person's assessment of a car might change depending on when it takes place; an individual's pre-expectation of a car could be very different from an appraisal given during or after driving, or after 90 days of experience with the car. The duration of an assessment can also be an influencing factor.
3 The Pilot Study

The framework outlined in section 2 was used as the basis for a pilot study that aimed to gain insight into how expert evaluators appraise cars in reality, with a particular focus on the sounds of a car. The study aimed to measure not only the actual assessments of the car (i.e. the results of the evaluations) but also how participants arrived at those appraisals, by measuring influences both within the person and across time. Moreover, this study was a prelude to a wider customer study; the aim
was to understand the process of decision making among experts before examining how customers appraise vehicle sounds.

Method. 13 experts (12 NVH engineers and 1 member of the Vehicle Evaluation Team [VET]), all employees of Jaguar Land Rover in the UK, took part in the study. Participants first completed a pre-drive questionnaire containing three sections. Section 1 contained questions on the participants' demographics, driving experience, their current/next car and their driving enjoyment.1 Section 2 contained 20 questions taken from Zuckerman's [14] sensation seeking questionnaire, in which each item offers two choices, A and B, and subjects choose the statement that best describes their likes and feelings. One statement is always indicative of a sensation seeker and the other is not; for example, participants had to choose between (a) "a sensible person avoids activities that are dangerous" and (b) "I sometimes like to do things that are a little frightening". Driving behaviour, driving enjoyment and car preference questions constructed by the authors were added to the personality questions in the same forced-choice format. New driving behaviour questions were constructed, rather than using existing questionnaires (e.g. the Driver Behaviour Questionnaire of Reason, Manstead, Stradling, Baxter and Campbell [15]), because such questionnaires focus on risky driving. The driving behaviour questions measured whether the subjects are aggressive or passive drivers; the driving enjoyment questions measured whether the subjects are car enthusiasts; the car preference questions asked subjects about their car requirements. Finally, section 3 asked participants about their knowledge and pre-expectations of the car to be evaluated (a Jaguar sports convertible2 [Car A]). The pre-drive questionnaire was followed by a test drive to obtain evaluations whilst driving, emotional responses whilst driving, and objective driving behaviour data. Participants were asked to carry out a test drive similar to those conducted as part of their work routine. Drivers could choose their route, and the length of the drive varied between 30 and 80 minutes. Participants were asked to evaluate the sound of the vehicle as they drove by vocally expressing their opinions; in particular, they were asked to verbalise their actions, motivations and the impressions formed. To measure emotional responses whilst driving, video and voice recordings were taken, allowing facial expressions and tone of voice to be observed. To measure objective driving behaviour, and to obtain an electronic record of what the drivers were doing, CAN data were recorded from the car (rpm, speed, throttle, brake, gear [selected and actual], and steering angle). GPS data were collected throughout the evaluation to identify the location at any given time during the appraisal. Finally, participants were given a post-drive questionnaire to evaluate Car A after driving. Participants were also asked whether they were affected by the in-car recording kit, to determine whether they were aware of the video camera whilst driving and whether this influenced their behaviour, and whether they found the commentary distracting.
1 The driving enjoyment question was taken from Lajunen and Summala [16]. However, instead of ticking one of four options in response to the question "what does driving mean to you", participants rated each of the four statements according to how much they agreed with it.
2 Assessments were conducted with the roof closed.
Results and Discussion. The demographic data showed that the sample was fairly homogeneous, i.e. males in their forties living in the suburbs. Regarding driving experience, driving licence duration ranged from 16 to 32 years and annual mileage from 6k to 20k; nearly all of the participants did most of their driving on country roads or the motorway. The sensation seeking results showed that few participants scored highly overall. However, over half of the participants obtained high thrill and adventure seeking scores; for example, they liked to do frightening activities. The forced-choice driving behaviour questions showed that only 27% of participants could be classified as aggressive drivers (i.e. gave aggressive responses on at least 4 questions out of 5; a sketch of this scoring rule follows below). However, all participants said that they drove fast and sporty; very few said that they were inconsiderate drivers, that they got angry whilst driving, or that they sped. Hence there could be some social desirability in the responses. The forced-choice driving enjoyment questions showed that 64% of participants could be classified as car enthusiasts (i.e. gave driving enjoyment responses on at least 4 questions out of 5); participants were mixed on whether they like to drive for pleasure or just to get from A to B, and on whether they buy car magazines. When asked to rate various statements in terms of agreement, 73% of participants agreed with the statement 'I enjoy driving' and 55% agreed with the statement 'to me driving is, after all, a way to move from one place to another'. These findings agree with the forced-choice questions: although the participants enjoy driving, in their spare time they mainly drive to get from one place to another, perhaps because they drive cars as part of their jobs. All subjects said that the sound of a car would influence whether they would buy it; all but one said that they prefer luxurious cars to less expensive cars; 64% said they prefer loud and throaty cars to quieter cars; 64% said it was important for them to buy an eco-friendly car; only 45% said that they prefer sporty cars to practical cars. All of the subjects were familiar with Car A, had driven it before and were able to give an approximate value for it. Moreover, prior to the test drive, most subjects judged Car A to be extremely powerful, fun, exciting, sporty and spirited, with little variation in their views. The questionnaire results reported so far can largely be explained by the nature of the subjects' professions. In particular, most subjects were car enthusiasts, which is not surprising given their roles. Moreover, over half of these individuals enjoy exciting and frightening activities, which could be one reason why they enjoy driving and hence why they do their job. Interestingly, in their spare time they see cars more functionally, rather than as a source of pleasure. Perhaps only 45% prefer sporty cars to practical cars because these individuals drive sports cars as part of their job; this could be a common trend for expert assessors who evaluate cars on a day-to-day basis. The fact that sound is important to all subjects is quite predictable from their occupation; much more variability could be expected from customers.
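The forward-referenced scoring sketch: a hypothetical implementation of the classification rule described above (at least 4 of 5 forced-choice items answered in the target direction); the function name and data are illustrative, not the study's materials.

def classify(choices, threshold=4):
    # choices: True where the respondent picked the aggressive (or enthusiast)
    # option on that forced-choice item; classified if >= threshold of 5
    return sum(choices) >= threshold

print(classify([True, True, True, True, False]))   # True  (4 of 5)
print(classify([True, False, True, False, True]))  # False (3 of 5)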
Whilst taking part in the test drive, expert evaluators showed little variation in emotional expression (facial and tonal). For example, participants did not display facial expressions indicating emotions such as enjoyment, surprise, anger or disgust, and there was little change in tone of voice. All assessors chose to conduct their evaluation on a different route; however, similarities between the assessors existed. The following were deduced from the observations and analysis. Country lanes were used for full-load accelerations (throttle fully applied) and overrun assessment
(coasting down after acceleration). Motorways (60 to 70 mph) were used for road and wind noise assessments. Powertrain sound quality (noise from engine, exhaust, intakes and gears) was also evaluated there by sequentially applying and releasing the throttle whilst in top gear and at high speed. Urban areas were used to evaluate error states such as gear whine, rattles and squeaks. Broken surfaces were chosen for evaluating road impacts and cabin vibration. These results show the similarities that occur between experts when assessing cars, although this could be because all assessors were based within the same company. Different assessors did, however, approach the evaluation from different perspectives depending on their role within the NVH team. The assessor from the VET focused on the customer's perspective of the brand and the vehicle; this person frequently performed full-load accelerations whilst trying to appreciate how the customer would perceive the sound during these driving manoeuvres. The NVH technical specialists focused on similar perspectives, but used insight into the constraints of the vehicle development programme to formulate their opinions. The NVH engineers focused particularly on specific aspects of the vehicle's refinement; they would dwell on the error states and comment more on sound and vibration levels than on the character of the sound. Finally, there were assessors who were not NVH experts but were part of the Sound Quality team; these people provided an overview of their impressions of the car and did not focus on error states or the character of the sound. The post-drive questionnaire showed that expert evaluators were not affected by any of the in-car recording kit (i.e. the video and voice recordings). Moreover, the evaluations formed remained relatively unchanged over time; subjects judged Car A to be extremely spirited, fun, exciting, sporty and powerful, as in the pre-drive questionnaire. There were, however, a few opinion changes: 25% of subjects judged Car A to be somewhat effortless instead of extremely effortless, 12.5% judged the car to be extremely aggressive as opposed to somewhat aggressive, 12.5% changed their answer from somewhat comfortable to not at all comfortable, 12.5% from extremely luxurious to somewhat luxurious, and 25% from extremely pleasant to somewhat pleasant. The pilot study has given some insight into how expert evaluators appraise vehicles in reality; in particular, it will provide benchmark data for future comparisons with real customers. Although many of the measurements showed little variation, this is not surprising given the expertise of the individuals who carried out the appraisals; greater variation is expected from customers. For example, participants' pre-expectations of Car A, as well as their later evaluations, were similar across participants; a wider variation in opinions would be expected from customers owing to the factors mentioned in the appraisals framework. In addition to the similarities across participants, there was also consistency across time, with few opinion changes between the pre- and post-drive questionnaires. Participants also showed little facial expression and few alterations in tone of voice whilst driving. These results are still important, as they serve as a benchmark for future studies.
Although in this pilot study the small number of participants could not support statistical tests of effects and correlations between the different measurements (e.g., opinions formed about the car with personality, driving behavior, demographics and so on), the data are still valuable in showing the type of results that emerge. Moreover, the study provided an understanding of suitable methods, a starting point for future studies.
In addition, it is important to point out that decision making throughout automotive product development usually involves experts only, and decisions are made without input from other individuals; customers are rarely used, as they are expensive and evaluations are time-consuming. By considering the problem from this perspective, the sample size should not make the observations or the outcomes any less valid. However, much larger sample sizes are necessary for future studies. Therefore, the next stage of research is to conduct a large-scale questionnaire study; this will allow us to identify participants for future on-road studies. Questions may be added to determine personality factors that influence the image congruence hypothesis, as the pilot study focused only on sensation seeking as a personality factor. The questionnaire may also contain questions relating to how important automotive sounds are to individuals; it may be possible to organize subjects into two groups based on whether sound is important to them or not. By carrying out a questionnaire study on a large sample of participants, it will be possible to validate the questionnaire. After completing the questionnaire study, participants will be selected for another on-road study using real customers.
4 Conclusions
There are many factors which can influence the evaluation of automotive sounds. These influences need to be captured in order to gain an understanding of the decision-making process. By gaining insight into how people appraise vehicles in reality, we can use this information to optimise structured evaluations. Part of the process of optimising structured evaluations might involve organising them differently for customers and experts; if the decision-making process differs between customers and experts in the real world, it is likely to differ within structured evaluations as well. By developing guidelines for representing the real world in simulated environments, we can improve decision making during product design.
Acknowledgements This research was funded by the UK’s Engineering and Physical Sciences Research Council (EPSRC) through the Warwick Innovative Manufacturing Research Centre.
HR Changes in Driving Scenes with Danger and Difficulties Using Driving Simulator
Yukiyo Kuriyagawa1, Mieko Ohsuga2, and Ichiro Kageyama1
1 Nihon University, 1-2-1 Izumi-cho, Narashino-shi, Chiba 275-8575, Japan
2 Osaka Institute of Technology, 5-16-1 Omiya, Asahi-ku, Osaka 535-8585, Japan
[email protected], [email protected], [email protected]
Abstract. To provide a safe and comfortable driving environment, it is effective to extract the variety of stress scenes experienced by drivers and use them to investigate actual causes and ways to assist drivers. To find scenes that can be investigated efficiently in this way, we proposed a method based on changes in a driver's physiological indices that may have been caused by emotional changes. In this paper, we examined the possibility of applying this method to experimental situations using a driving simulator (DS). An experiment using a DS has an advantage over one done in a real-life situation in that the experimental parameters can be controlled. This paper examines the relationship between a driver's emotional changes and physiological changes during driving. As a result, we suggest that whether an event is recognized and how much emotion it caused can be estimated by combining measurements of changes in heart rate (HR), skin conductance (SC), and respiration.
1 Introduction
Recently, scientists in Japan have actively pursued research using drive recorders to analyze the causal processes and factors in traffic accidents and latent accidents [1]. Triggered by the vehicle's behaviour, a drive recorder records operating conditions before and after a traffic accident, with the aim of obtaining information that will help to decrease traffic accidents. In addition, we have measured drivers' emotional changes to detect problem situations in daily driving. We think that measuring these emotional changes can effectively decrease not only traffic accidents but also driving workload. A driver may be startled when he encounters potentially dangerous situations or hazards (e.g. an abrupt crossing by a pedestrian, or sudden braking by the car in front of him), whereas he is emotionally strained when he has to anticipate dangerous or difficult situations (e.g. pulling into traffic or passing another car). We can measure appropriate physiological responses that will enable us to extract a driver's startle response and emotional strain without disturbing his driving. We observed two kinds of heart rate (HR) changes, GL (gentle and lasting) and SA (sharp and abrupt), in data taken from drivers during real car driving (see Fig. 1). Considering the temporal relationship between the HR changes and the relevant events, it was suggested that the target of study in the subjective reports would be different for GL and SA [2]. For GL, the target of study would be the time when GL came to an end; for SA, it would be the time when SA began.

Fig. 1. Two major categories of HR fluctuation: (a) GL (gentle and lasting); (b) SA (sharp and abrupt) [y-axis: HR in bpm]

In our research we used a driving simulator to analyze in detail the changes in HR and their correspondence to the driving situation. From the results, we observed the same two kinds of HR changes that occur when a driver is driving a real car. In addition, we suggest there are two kinds of HR changes within GL. We think that the HR change depends on the driver's feeling: for example, how does the driver perceive the event, and how does the driver handle it? In this study, we examined the relationship between the driver's feelings and HR changes.
2 Measurement of Data During Use of a Driving Simulator
2.1 Experimental Scenario
We carried out an experiment using a driving simulator (DS) with a spherical screen and a 6-DOF motion base in our laboratory (see Fig. 3). In on-road experiments using a real vehicle, it is not possible to set up actually dangerous situations and collect reproducible data. Experiments using a DS, however, can set up various specific driving conditions; DS experiments therefore have an advantage in driving research. DS studies can compare results among different subjects under the same experimental conditions, or investigate the variation in response of the same subject by repeatedly executing the same experiments. As objects of examination for a traffic incident, we set up two points in the simulation at which the driver came to a narrow one-way passage on the main road he was driving on (see Fig. 2). The approaches to these places where the road narrowed were different: the road curved differently at each place. Therefore, the anticipatory tension time until the driver discovered the narrowing of the road was different at each place, and his approach was also different. The main road in the driving scenario was a double-lane country road. In addition, there was a lot of traffic in the scenario, so the subjects would not know what aspect of their driving would be tested in the simulation. The accelerator and the brake were controlled by a computer program, and the driver only had to control the steering wheel. The purpose was to control the time from a subject's discovery of an incident event to the time of his reaction to it. To maintain the appropriate tension, the driving speed was kept at
approximately 70 km/h. The subjects participated in an experimental session of eight minutes.

Fig. 2. The narrowing of the road on the Driving Simulator

Fig. 3. Driving Simulator

2.2 Procedure
We first conducted a practice session that aimed to familiarize subjects with the DS system and the sensors for measuring physiological responses. In the practice session, the subjects were intentionally made to experience an accident at one of the places in the simulation where the road narrowed. The purpose was to induce anticipatory tension in all subjects at that location.
After each experiment, the subject was required to report on the object of the examination: when he expected the incident, when he recognized the incident, how he felt about the incident, and how he handled the incident during the experimental driving.
2.3 Measurement
To examine the extracted situations multi-dimensionally, the subject's physiological measurements (ECG, SC and respiration) were recorded by a Polymate (AP1124). The timing of these measurements at the discovery of the narrowing road differed according to the direction of the subject's gaze while driving. The physiological indices used for examining a subject's response change when the subject discovers an incident, so we measured exactly when the subject discovered an incident by having him press a push-button. In addition, the driver's view and his or her facial expression were video-recorded (see Fig. 4).
Fig. 4. Front view of a driver and his facial expression on the Driving Simulator
2.4 Subjects
Eight young drivers participated in the experiment. Their driving frequency ranged from twice a month to seven times a week. After the experiment was explained, each subject signed an informed consent form.
3 Results
3.1 Data Processing of Physiological Indices
HR changes were quantified by the following data processing. First, radio-frequency noise and baseline fluctuations were removed from the measured chest electrocardiogram (ECG). Second, an R-wave enhancement filter was applied and R waves were detected [3]. Instantaneous HR was obtained beat by beat from the sequence of R-R intervals and then converted into equi-interval data by third-order spline interpolation. The interpolated HR was passed through a low-pass filter (cut-off frequency, 0.08 Hz), which removed the respiratory and Mayer-wave components of heart rate variability. The other physiological responses yielded raw data that allowed us to observe details of the changes.
3.2 Correspondence of Physiological Indices, Subjective Reports, and Driving Behavior
In the practice session all subjects caused an accident at one of the places where the road narrowed, as intended by the experimenters. The interviews with the subjects about the first place where the road narrowed showed that the subjects had various feelings about it. Such comments as "I will cause a car accident when I come to a place like this" and "It was possible to pass through the narrow area because I have become accustomed to driving" illustrate this point. When the subjects came to the second place where the road narrowed during the simulation, most of them made positive comments such as "It is possible to pass through this narrow area if I drive as well as I did before", having experienced going through the first narrow passage. Here, by visual observation, HR elevation could be seen in the data of subjects who felt strong danger. Figure 5 shows one example of the results. Time zero in Figure 5 corresponds to the moment the subject pushed a button to indicate that he had detected the narrowing of the main road. The narrow passage was set to come up in the scenario while the subject was driving on a curve on the main road. About 15 seconds elapsed between the start of the curve and the subject's discovery of the narrowing of the road to a one-way passage.
(1) Physiological indices: HR decreased in all subjects before they came to the places where the road narrowed. Respiratory amplitude decreased for subjects B, D, and E, and increased for subjects A and C. In addition, SC rose in subjects C and D when they discovered the road narrowing, and SC rose in subjects A and B when they went into the narrow passage.
(2) Subjective reports: Most subjects commented, "(Because I was able to safely pass the first place) I thought it would be safe if I drove as fast as I did for the first place". However, subjects A, B, and C also answered, "I drove carefully from the point when I detected that the road became narrow", "I drove carefully because I couldn't get a good enough grip on my senses", and "I was afraid although I was able to drive well through the narrow passage". It seemed that the subjects still felt uneasy when they came to the second place where the road narrowed.
(3) Driving behaviour: All subjects confirmed the position of the vehicle and the places where the road narrowed. They operated the steering wheel at the second place where the road narrowed in the same way as at the first.
From these results, it was shown that HR rose in the scenes in which a subject felt strong danger or difficulty. HR decreased in the scenes that did not require concentration when there was no feeling of strong danger. The respiratory amplitude also decreased sharply in response to the event, and then increased when the event was over. This result was consistent with our past findings. In addition, for SC we could not find a clear relationship between a subject's emotional change and his handling of the event.
Fig. 5. Example of HR changes, SC, and Respiration at R2000 curve
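For concreteness, the HR quantification described in Sect. 3.1 might be sketched in Python as follows. This is a minimal illustration under stated assumptions: the R-peak times are assumed to be already detected from the noise-filtered ECG, and the 4 Hz resampling rate and fourth-order Butterworth filter are our own choices, since the text specifies only the 0.08 Hz cut-off and third-order spline interpolation.

```python
# Minimal sketch of the HR preprocessing in Sect. 3.1. Assumes R-peak
# times (in seconds) have already been detected from the filtered ECG.
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import butter, filtfilt

def smoothed_hr(r_peak_times, fs_out=4.0, cutoff_hz=0.08):
    rr = np.diff(r_peak_times)          # R-R intervals [s]
    hr = 60.0 / rr                      # beat-by-beat instantaneous HR [bpm]
    t_beats = r_peak_times[1:]          # each HR value assigned to its beat
    t_uniform = np.arange(t_beats[0], t_beats[-1], 1.0 / fs_out)
    hr_uniform = CubicSpline(t_beats, hr)(t_uniform)  # third-order spline
    # The 0.08 Hz low-pass removes respiratory and Mayer-wave components;
    # the filter order (4) and output rate (4 Hz) are assumptions.
    b, a = butter(4, cutoff_hz, btype="low", fs=fs_out)
    return t_uniform, filtfilt(b, a, hr_uniform)
```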
4 Discussion
In this study, it was confirmed that HR decreased when subjects concentrated on going through a narrow one-way passage while driving on the main road. On the other hand, it was also confirmed that HR increased when the subjects felt danger in experiencing an incident. Therefore, we found that the style of HR change differed according to the style of coping with an incident. We think that the HR decrease before coming to a place where the main road narrowed to a one-way passage depended on a subject's concentration more than on the activation of his sympathetic nervous system. Moreover, we confirmed that SC changed when the subject discovered an incident. In a former report [5], we suggested the possibility that SC might indicate acknowledgment of an incident when HR did not; but the opposite was the case in this study. The data showed large individual variations in SC, and there were also many subjects who did not appear to show any reaction to the incident in terms of SC change. So we need to examine the relationship between SC change and emotional change in the future. Also, we confirmed the trend that respiratory amplitude increases and decreases synchronously with HR increase and decrease, respectively.
5 Conclusions
In this study, we examined the relationship between a driver's feelings and HR change. As a result, it was clarified that HR rose when the subject felt strong danger in regard to an incident. On the other hand, HR decreased when the subject concentrated on handling the incident. We also observed that in many cases respiratory amplitude increased when HR rose, and decreased when HR decreased. Therefore, we suggest that a driver's emotional change can be detected by respiratory amplitude. Furthermore, SC change was also observed in discovering and dealing with an incident. However, the change was not the same for all subjects and driving situations. Given these results, we can say that this report clarified the relationship between HR change and a driver's emotional change; however, many issues remain unresolved regarding the relationship between SC, respiratory amplitude and a driver's emotional change. Therefore, we will increase the number of subjects and continue examining these research topics.
References
1. Shino, M.: Relationship between driver behaviour and traffic circumstance based on analysis of forward. In: Proceedings of JSAE, vol. 01-08, pp. 27–30 (2007)
2. Kuriyagawa, Y., Ohsuga, M., Kageyama, I.: Extraction of Driving Scenes with Difficulties and Danger of Aged Drivers Using Heart Rate Changes. Journal of Human Interface Society 9(2), 117–124 (2007)
3. Ohsuga, M.: On a QRS Detection Algorithm Applicable to the Evaluation of Mental State in Normal Subjects. Japanese Journal of Medical Electronics and Biological Engineering 30(2), 130–134 (1993)
4. Ohsuga, M., Shimono, F., Genno, H.: Assessment of Phased Work Stress Using Autonomic Indices. International Journal of Psychophysiology 40(3), 211–220 (2001)
5. Kuriyagawa, Y., Ohsuga, M., Kageyama, I.: Physiological Indices Change in Driving Scenes with Danger and Difficulties Using Driving Simulator, vol. 51-08, pp. 25–28 (2008)
Driver Measurement: Methods and Applications Shane McLaughlin, Jonathan Hankey, and Thomas Dingus Virginia Tech Transportation Institute 3500 Transportation Research Dr Blacksburg VA, 24061 {smclaughlin,jhankey,tdingus}@vtti.vt.edu
Abstract. This paper presents an overview of methods used when measuring driver behavior and performance. Simulator, test-track, on-road, field-operational-trial, and naturalistic methods are described, along with useful driver measures. Three examples are provided of the application of driver measurement in product design and evaluation.
Keywords: driver, behavior, performance, measurement.
1 Methods
The objective of this paper is to provide an overview of methods used for measuring driver behavior and performance. Driver behavior refers to tasks or actions, including both driving-related and non-driving-related activities. Driver performance refers to the human perceptual and physical capabilities and limitations that affect safe driving. In the second section of the paper, a set of applications is presented to illustrate types of driver measurement and how they are used.
1.1 Study Methods
Methods used to study driver behavior can be considered on a continuum. At one end are the controlled studies. These are studies of driving that follow the methods used in traditional experimental research. In these driving studies, two or more conditions are created (manipulated), while as many extraneous factors as possible are controlled, and measures are collected to evaluate the driver's response to the conditions. The controls present in these studies can limit the ability to generalize to the complete set of conditions that occur in the real world. At the other end of the continuum, minimal or no instruction is given to drivers and they are measured as they perform the things they normally do while driving. In these naturalistic studies, fidelity to real-world driving is high, but isolating relationships between factors is more difficult. Driver measurement in simulators, on test tracks, on-road, in field operational trials, and in naturalistic driving studies describes the range of control and realism found in most driving studies.
Simulators. Simulators range in fidelity from desktop driving simulators to multi-axis motion simulators with 360 degrees of display and a range of visual, auditory, and
tactile feedback to the driver [1]. The value of simulators in driving research is that conditions can be controlled precisely and the driver can be tested in conditions which would be hazardous if not created through simulation. The disadvantage of simulators is the difficulty of quantifying how well simulator results will transfer to real driving results.
Test Track. Test tracks provide some of the realism of on-road driving, while still permitting experimental and safety controls. Test tracks can be used to measure general driving behavior or to present scripted scenarios and measure performance. When controlling the vehicle on a test track, the driver must remain involved in the driving task. As with simulators, however, the degree to which test-track driving creates the same driving burden as real road driving depends on execution. Factors such as the expectations of participants, presence of experimenters, level of traffic, and test track design should be considered along with the research question. For example, if the research question requires that the participant maintain a thorough visual scan around the vehicle, it may be necessary to include other traffic or pedestrians in the experimental protocol.
On-Road. Some tests relating to driving do not require the safety controls or experimental controls provided by a test track or simulator. When the system being tested and the testing methodology have minimal impact on the participant's normal driving, testing on public roadways is possible. Examples of these situations include testing simple center stack interfaces or testing design iterations that are only a small change from currently common in-vehicle systems. In these cases, on-road protocols provide increased face validity because the driver is aware that the typical driving hazards are present, and so must maintain vigilance and control during the study. In some cases, a confederate vehicle is used to create a specific driving scenario of interest. For example, an experimenter in a lead vehicle might gradually reduce speed, and in doing so, elicit a pass maneuver from the participant.
Field Operational Trial. A Field Operational Trial (FOT) is a study in which a developmental system is incorporated into a vehicle and the vehicle is put into normal use in the field while measurements are collected related to the system of interest. FOTs often involve different design alternatives. The Automotive Collision Avoidance System (ACAS) FOT provides a good example [2]. In this FOT, 11 Buick LeSabres were provided to 96 participants, accumulating approximately 137,000 miles of driving with developmental ACASs installed on the vehicles. For most FOTs, data collection for one participant might last two to six weeks. In many cases, the vehicles used in an FOT will not be the participant's personal vehicle. This is generally due to logistical reasons: the systems being tested are new systems which are difficult to add to someone's current vehicle, and, until recently, instrumenting a personal vehicle for a short period was cost prohibitive. The advantage of the FOT is that the driving environment is real. FOTs provide an excellent approach for investigating driver behavior with specific systems of interest. FOT driving is likely different from everyday driving due to the presence of the test system, the newness of the vehicle, and lack of ownership of the vehicle. The duration of the driving time for a participant may not create exposure to a complete
set of driving scenarios. For example, some participants might only experience summer driving during the study, while others might only experience winter driving. Additionally, for participants to use a system on the open road, it must be at the final stages of safety testing. For this reason, FOTs are generally not feasible until somewhat late in the product development process.
Naturalistic. Naturalistic driving studies involve instrumenting participants' own vehicles and measuring driver behavior and performance over extended periods. In a naturalistic study, no specific instructions are given to the participant and disturbance of their daily routine is minimized. Instrumentation is generally inconspicuous and includes multiple video views, forward radar, accelerations, speed, pedal actuation, and latitude and longitude. The first driving study of this type recorded approximately 2 million miles of driving from more than 100 vehicles over 12-13 months [3]. In this study, data were recorded continuously. In some studies, triggering is used in an attempt to retain only events of interest, and so reduce data storage requirements. The advantages of naturalistic studies are that the driving data collected represent real-world driving and any situation in which a driver finds himself or herself. In the Dingus et al. study, 69 crashes of various types and severities were captured. When continuous data are collected, data describing routine driving are available for use in quantifying exposure to different conditions. This tends to be of value in determining event rates, estimating risk, or comparing what occurs in challenging events to more routine situations. The disadvantages of naturalistic driving studies are that conditions are uncontrolled and project setup and management can be fairly complex compared to other approaches.
1.2 Measures
Within the five study methods, many different driving measures can be used. The next sections of the paper introduce these measures and provide some background on their use.
Glance. Due to the highly visual information content employed in driving, monitoring the eyes is considered valuable for making a number of inferences. Wickens [4] begins his discussion of the selective nature of attention with our visual field and visual sampling behavior. He indicates that gaze direction is indicative of direction of attention. Different search behaviors are also indicative of the nature of a task. An ordered movement of the eyes from fixation to fixation is indicative of a supervisory level of attention, where the operator has a developed mental model of the task and associated expectancies about where the most useful information will occur. A more random search pattern is indicative of a target search where the location of information is unknown. Glance locations generally include driving-related locations, including the forward road scene and mirrors, and, depending on the method of monitoring the eyes and reducing data, can include specific exterior objects or gaze location measured on some coordinate system. When approaching curves, drivers look to the tangent point of the curve, and the proportion of glances to the curve tangent increases rapidly in the 1-2 seconds before it is reached [5]. Glance behavior has also been investigated in terms of driver experience [6, 7], driver route familiarity [8], night versus day driving [9], mirrors and driving tasks [10], and in-vehicle tasks.
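As a rough illustration of how such eye-glance records are commonly reduced to summary measures, the sketch below computes per-location glance counts and durations from a frame-by-frame gaze annotation. It is a hypothetical example rather than a procedure taken from the cited studies; the location labels and the 30 Hz frame rate are assumptions.

```python
from itertools import groupby

def glance_measures(gaze_labels, frame_rate_hz=30.0):
    """Reduce a frame-by-frame gaze annotation (e.g. 'forward', 'mirror',
    'center_stack') to number of glances, total glance time, and mean
    single-glance time for each location."""
    totals = {}
    for location, run in groupby(gaze_labels):  # consecutive frames = one glance
        duration_s = len(list(run)) / frame_rate_hz
        count, total = totals.get(location, (0, 0.0))
        totals[location] = (count + 1, total + duration_s)
    return {loc: {"glances": n, "total_s": t, "mean_s": t / n}
            for loc, (n, t) in totals.items()}
```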
When specific in-vehicle interfaces are of interest, measurements of the gaze to these interfaces or locations in the vehicle are also collected [11]. More time looking ahead is clearly of value; longer and more frequent glances to in-vehicle locations increase risk. In measuring glance behavior while driving and in the presence of secondary tasks, total glance time, mean single glance time, and number of glances are considered surrogate measures of safety. In other words, because safety is difficult to measure directly, they are measured to provide an indication of safety. In-vehicle tasks should not require glances away from the forward scene longer than approximately 1.6 seconds [12]. Tijerina [13] used glance behavior in a unique way by exploring car-following measures at the instant drivers look away from the forward scene.
Task. A frequent part of driver measurement involves collecting information about tasks that drivers perform. These may be driving-related tasks, such as turning across traffic at an intersection, or analyses of secondary tasks, such as selecting a song from a music player. When investigating an in-vehicle system, driving performance measures are monitored for degradation. In addition to the visual measurements described in the previous sections, driving-related task measurements generally relate to safety measures, such as unplanned lane departures, abrupt maneuvers, or late reactions. Task measurement can also use many of the techniques used for task analyses in non-driving applications, such as counting the number of steps, errors, task duration, etc. In some cases, due to the difficulty of monitoring a driver's readiness to respond to critical events, a peripheral detection task [14] is used, in which the driver's ability to detect events outside the vehicle is measured while interacting with some in-vehicle system.
Speed and Braking. Speed is an obvious indicator of many aspects of driver behavior and performance. Drivers tend to reduce speed as workload increases [15]. How drivers adjust speed is considered indicative of a number of factors. Summala [16] indicates that speed adjustment can relate to motivation level, including motivations other than transportation-related ones, or to subjective risk perception. Speed is the main variable that captures the kinematics of the driving situation. Abrupt decelerations tend to indicate late recognition of a situation or insufficient monitoring of speed and distance. Comfortable decelerations on surface streets range from approximately 0.15g to 0.4g [17]. However, naturalistic driving studies indicate that braking at 0.6g or higher is common, depending on the driver and the driving situation [3]. On freeways, where speeds are higher, decelerations of 0.1g to 0.2g can be considered high [17, 18].
Range, Range Rate, and Time-to-Collision. Forward measures to a lead vehicle are of value in a number of investigations. In general, the range to a lead vehicle, when considered simultaneously with the range rate (i.e., closing or separating speed), provides an indication of severity, aggressiveness, or criticality. Range also provides a coordinate system on which the interaction between the participant's vehicle and other vehicles, objects or pedestrians can be analyzed for many different driving investigations. The addition of range rate permits kinematic analyses of events, the timing of actions, and estimation of alternative outcomes. A short range with a high
closing range rate tends to be a risky situation. As drivers close on lead vehicles, they must either decelerate or overtake. More aggressive drivers will demonstrate these interactions more frequently than more conservative drivers. When range is divided by negative range rate, it indicates the time-to-collision (TTC) if path or speed is not altered. Note that the TTC calculation does not include acceleration of the lead or following vehicle. For this reason, it only provides an instantaneous assessment of TTC. If the lead vehicle is braking hard and the following vehicle is not, the time available will be shorter than this TTC computation indicates. Investigations involving this type of braking should include acceleration in the computation of TTC.
Following. When measuring more steady-state following situations, headway is generally used. If one vehicle is following another and range is divided by the speed of the following vehicle, the value indicates headway. Similar to TTC, headway is a time-based measure which accounts for speed. Headway is the value often provided in driver training, in which a time-based following distance is recommended for safety (e.g., 2 seconds, or 1 car length for every 16 km/h (10 mi/h)). Headway is different from TTC in that it indicates the time available for the following driver to match a deceleration of a lead vehicle. Forbes [19] indicated that drivers are following, i.e. responding to lead vehicle speed and distance, when headway is between 0.5 and 4 seconds. At longer headways, the following vehicle is involved in overtaking and not directly adjusting speed to match the lead vehicle. Brackstone et al. [17, 20, 21] have done considerable work describing how drivers adjust to and follow a lead vehicle.
Steering. Steering provides a good measure of driving performance. Smooth and continuous inputs are considered indicative of driver preview of the roadway. When steering is smooth, no error has been allowed to build up between where a driver wants the vehicle to track and where it is tracking. When an error in lateral position is detected or anticipated by a driver, a correction is made. These corrections, often referred to as "steering reversals," can be used as a measure of the difficulty a driver is having in maintaining lateral position [22, 23]. Measurement of steering has also been used to monitor workload or attention of the driver when additional task demands are present, either from driving conditions or secondary tasks [24, 25].
Lane Keeping. As introduced in the description of steering, the layout of the road lines and edges is essentially a display that drivers are tracking. When a vehicle exceeds a lane boundary, it is generally considered an indication of poor driving performance. The standard deviation of lane position is a measure of the lateral range within which a driver is holding the vehicle. A limitation of lane position as a measure is that there is debate about the objectives drivers employ in tracking and about what to judge as poor performance. For example, wandering left from one lane into a lane in the same direction on a highway with no other cars around may be acceptable, and is probably more acceptable than wandering to the right onto the shoulder. Related to this, it is likely that the experienced driver does not continuously pursue some precise lane position, but instead seeks to maintain the vehicle within some satisfactory range [26].
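The TTC and headway definitions above amount to a few lines of arithmetic. The sketch below is illustrative only; the function names and the sign convention (negative range rate means closing) are our assumptions.

```python
def time_to_collision(range_m, range_rate_mps):
    """Instantaneous TTC (s): range divided by closing speed. Defined only
    while closing (negative range rate) and, as noted in the text, it
    ignores lead- and following-vehicle acceleration."""
    if range_rate_mps < 0:
        return range_m / -range_rate_mps
    return None  # separating or holding range: no instantaneous collision course

def headway_s(range_m, follower_speed_mps):
    """Time headway (s): range divided by the following vehicle's own speed."""
    return range_m / follower_speed_mps if follower_speed_mps > 0 else float("inf")
```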
Lateral Acceleration. Lateral acceleration is used in a similar manner to longitudinal acceleration. Percentiles are typically used to quantify what is comfortable for drivers, but variability in driving situations restricts the use of this measure alone. Lateral accelerations at one speed may be comfortable, while at higher speeds they would not be. Observed lateral accelerations exceeding some level are considered a possible indication of failure to select an appropriate speed for a given situation.
Incident or Event. While much of driving is routine, at times an unexpected event will occur. Measures of driver performance in these situations provide both system design guidance and a method for evaluating the readiness of drivers to respond to unexpected situations. Response is considered to be composed of detection, recognition, decision, and movement. In the actions of drivers, these phases of response are often not clearly delineated. For example, as a driver detects something ahead, he or she may begin a movement, such as reducing pressure on the gas pedal, before fully determining what the object is, what path it may be following, and what the best final response would be. Olson and Sivak [27] measured perception time as the time to release the accelerator, reaction time as the time from releasing the accelerator to pressing the brake, and the total time (perception-response time) for drivers detecting an object in the road ahead. Lerner [28] measured response time of different age groups when a barrel was released into the road. Measurements of driver braking and steering response while using anti-lock braking systems have been collected in the simulator [29] and on the test track [30]. Response effectiveness and timing vary with factors including driver expectations, stimuli, and number of alternatives [4].
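Returning to the steering-reversal measure described above, a reversal can be counted whenever the steering angle changes direction by more than some minimum gap. The sketch below is a generic illustration, not the method of [22-25]; the 2-degree gap threshold is an assumed value.

```python
def count_steering_reversals(angle_deg, gap_deg=2.0):
    """Count direction changes of a sampled steering-angle trace whose
    amplitude exceeds gap_deg (the threshold is an illustrative choice)."""
    reversals = 0
    extreme = angle_deg[0]   # most extreme angle reached in the current direction
    direction = 0            # +1 increasing, -1 decreasing, 0 not yet known
    for a in angle_deg[1:]:
        if direction >= 0:                  # increasing, or direction unknown
            if a >= extreme:
                if direction == 0 and a - angle_deg[0] > gap_deg:
                    direction = 1           # initial direction established
                extreme = a
            elif extreme - a > gap_deg:     # turned back by more than the gap
                if direction == 1:          # count only once a direction existed
                    reversals += 1
                direction, extreme = -1, a
        else:                               # currently decreasing
            if a <= extreme:
                extreme = a
            elif a - extreme > gap_deg:
                reversals += 1
                direction, extreme = 1, a
    return reversals
```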
2 Applications
The following three applications provide examples of how some of the measurements that have been described can be used to guide product design.
2.1 Navigation System Evaluation
In the early days of in-vehicle navigation systems, a number of safety-related questions were posed regarding different design alternatives. An on-road study was conducted to determine whether any of the navigation configurations would result in unsafe driving behavior [11]. Driving was measured in the presence of five different navigation system alternatives, as well as with a conventional paper map. Driving performance measures included the number of crashes, assessment of crash causal factors, eye glance duration, abrupt lateral maneuvers, abrupt braking maneuvers, unplanned lane deviations, dangerously close headways, turn tracking errors, unsafe intersection behavior, late or inappropriate reaction to an external event, unplanned speed variation greater than 16 km/h (10 mi/h), and stopping in unsafe circumstances. This research was used by the system manufacturer to optimize their system, and by the federal government to evaluate the safety of a new in-vehicle system.
2.2 Risk Perception in Car Following
While systems which warn or intervene in critical events have received attention for some time, a number of vehicle systems are in development that are intended to
support drivers in a more ongoing manner. As with most human-centered systems, a good reference point for design is to measure how people currently do things. To develop this type of understanding in relation to TTC, previously collected naturalistic driving data were used [31]. In Figure 1, the two component measures of TTC are shown over time (i.e. range and range rate), followed by the computed value of TTC in the third plot.

Fig. 1. Range (m), range rate (m/s), and TTC (s) shown over time (x-axis, s). As range decreases, a negative range rate is created, and TTC approaches some minimum until range stops decreasing.

Fig. 2. The distribution of time (seconds per hour) spent at TTCs between 0 and 10 seconds, in 0.5-s bins, for younger and older drivers.
In the figure, between approximately 29 s and 32 s, as the instrumented following vehicle approaches a lead vehicle, TTC decreases to some minimum as shown. When the vehicles begin separating, TTC has a singularity as range rate passes through zero, and then TTC is negative as the vehicles separate. In 8,203 trips from vehicles in the 100-Car Naturalistic Driving Study [3], following was first identified by locating oscillations in following distance behind a lead vehicle. Then, TTC measures were collected at each 0.1-s time sample. These intermediate data were used to create a distribution of the amount of time spent in 0.5-s time bins for a younger age group (19-24 yrs) and an older age group (56-68 yrs) (see Fig. 2). Though not exactly how a driver would want an automated system to operate, these curves provide guidance as to how frequently these two groups of drivers experience TTCs between 10 s and 0 s. Due to individual differences, it is likely that different drivers have different comfort levels and preferences.
2.3 Target Detection with Swiveling Headlamps
Headlamps that swivel have been of interest for many years as a potential way of putting light on the vehicle's path of travel earlier than is possible with non-swiveling headlamps. In a study conducted on public roads, McLaughlin et al. [32] placed 18 cm × 18 cm targets along the roadway and measured the distance at which participants detected the targets with swiveling and with non-swiveling high-intensity discharge headlamps. The targets were placed in right and left curves of radii between 20-50 m (e.g., intersections) and 215 m. In addition to target detection distances, driving performance measures including speed, speed variance, longitudinal and lateral acceleration, steering variance, and yaw rate were collected. The first iteration of the study indicated that while target detection distances were greater with the swiveling lamps in left-hand turns, in right-hand turns the detection distances were shorter. These results, particularly when identified with the curve radii in which the differences were found, provided guidance to system engineers. After modifying the algorithm that controlled the swiveling, the study was repeated. With the modifications, the benefits found in left-hand turns were maintained and performance in right-hand turns was equivalent to performance without swiveling headlamps.
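Returning to the exposure analysis of Sect. 2.2, the reduction of sampled TTC values to a time-in-bin distribution might look like the sketch below. This is an illustrative reconstruction, not the study's code; normalizing by total driving hours (to obtain seconds per hour, as in Fig. 2) is an assumption about how the plotted quantity was formed.

```python
import numpy as np

def ttc_exposure(ttc_samples, total_drive_hours, sample_dt_s=0.1,
                 bin_width_s=0.5, max_ttc_s=10.0):
    """Seconds per hour of driving spent in each 0.5-s TTC bin between
    0 and 10 s, from TTC values sampled every 0.1 s."""
    ttc = np.asarray([t for t in ttc_samples if t is not None and t > 0])
    edges = np.arange(0.0, max_ttc_s + bin_width_s, bin_width_s)
    counts, _ = np.histogram(ttc, bins=edges)
    seconds_in_bin = counts * sample_dt_s        # sample counts -> seconds
    return edges, seconds_in_bin / total_drive_hours
```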
3 Summary
The measures described here provide a starting point for quantifying driver behavior and performance. Selection of the best measures depends on the questions that must be answered. Similarly, a range of methodologies is available to researchers and practitioners. For most questions, a trade-off is present between control of conditions and fidelity to real-world driving. Controlled methodologies are more powerful for identifying differences between conditions, but it may be difficult to determine how well the results reflect real-world outcomes. Naturalistic studies capture what occurs in real-world driving, but may have difficulty isolating what factors are influencing behaviors and performance.
References
1. Hoffman, J.D., Lee, J.D., Brown, T.L., McGehee, D.V.: Comparison of Driver Braking Responses in a High-Fidelity Simulator and on a Test Track. Transportation Research Record 1803, 59–65 (2002)
2. Kiefer, R., Cassar, M.T., Flannagan, C.A., LeBlanc, D.J., Palmer, M.D., Deering, R.K., Shulman, M.A.: Forward Collision Warning Requirements Project: Refining the CAMP Crash Alert Timing Approach by Examining "Last-Second" Braking and Lane Change Maneuvers under Various Kinematic Conditions (DOT HS 809 574). National Highway Traffic Safety Administration (2003)
3. Dingus, T.A., Klauer, S.G., Neale, V.L., Peterson, A., Lee, S.E., Sudweeks, J., Perez, M.A., Hankey, J., Ramsey, D., Gupta, S., Bucher, C., Doerzaph, Z.R., Jarmeland, J., Knipling, R.R.: The 100-Car Naturalistic Driving Study, Phase II - Results of the 100-Car Field Experiment. National Highway Traffic Safety Administration, Washington, D.C. (2006)
4. Wickens, C.: Engineering Psychology and Human Performance, 2nd edn. HarperCollins, New York (1992)
5. Land, M.F., Lee, D.N.: Where We Look When We Steer. Nature 369 (1994)
6. Mourant, R.R., Rockwell, T.H.: Strategies of Visual Search by Novice and Experienced Drivers. Human Factors 14(4), 325–335 (1972)
7. Summala, H., Nieminen, T., Punto, M.: Maintaining Lane Position with Peripheral Vision During In-Vehicle Tasks. Human Factors 38(3), 442–451 (1996)
8. Mourant, R.R., Rockwell, T.H., Rackoff, N.J.: Drivers' Eye Movements and Visual Workload. Highway Research Record, Washington, DC, pp. 1–10 (1969)
9. Rackoff, N.J., Rockwell, T.H.: Driver Search and Scan Patterns in Night Driving. In: Driver Visual Needs in Night Driving - TRB Special Report 156. Transportation Research Board, Washington, D.C. (1974)
10. Mourant, R.R., Donohue, R.J.: Mirror Sampling Characteristics of Drivers (SAE 740964). Society of Automotive Engineers, Warrendale, pp. 1–12 (1974)
11. Dingus, T.A., McGehee, D.V., Hulse, M.C., Jahns, S., Manakkal, N., Mollenhauer, M., Fleischman, R.: TravTek Evaluation Task C3 - Camera Car Study (FHWA-RD-94-076). Federal Highway Administration (1995)
12. Wierwille, W.: Visual and Manual Demands of In-Car Controls and Displays. In: Peacock, B., Karwowski, W. (eds.) Automotive Ergonomics, pp. 299–320. Taylor & Francis, Washington (1993)
13. Tijerina, L.: Driver Eye Glance Behavior During Car Following on the Road. Society of Automotive Engineers, Warrendale, PA (1999)
14. van Winsum, W., Martens, M., Herland, L.: The Effects of Speech Versus Tactile Driver Support Messages on Workload, Driver Behaviour and User Acceptance (TNO report TM99-C043). TNO Human Factors, Soesterberg (1999)
15. Newcomb, T.P.: Driver Behavior During Braking (SAE 810832). Society of Automotive Engineers, Warrendale (1981)
16. Summala, H.: Risk Control Is Not Risk Adjustment: The Zero-Risk Theory of Driver Behavior and Its Implications. Ergonomics 31(4), 491–506 (1988)
17. McLaughlin, S., Serafin, C.: On-Road Investigation of Driver Following and Deceleration on Surface Streets. In: Proceedings of the IEA 2000/HFES 2000 Congress, pp. 3-294–3-297 (2000)
18. Brackstone, M.A., Waterson, B., McDonald, M.: Determinants of Following Headway in Congested Traffic. Transportation Research Part F 12(2), 131–142 (2009)
19. Forbes, T.W.: Human Factors in Highway Traffic Safety Research. Wiley-Interscience, Hoboken (1972)
20. Brackstone, M.A., McDonald, M.: Car-Following: A Historical Review. Transportation Research Part F 2, 181–196 (1999)
21. Brackstone, M.A., Sultan, B., McDonald, M.: Motorway Driver Behavior: Studies on Car Following. Transportation Research Part F 5, 31–46 (2002)
22. McLean, J.R., Hoffmann, E.R.: The Effects of Lane Width on Driver Steering Control and Performance (Paper No. 881). ARRB Proceedings 6(3), 418–440 (1972)
23. McLean, J.R., Hoffmann, E.R.: Steering Reversals as a Measure of Driver Performance and Steering Task Difficulty. Human Factors 17(3), 248–256 (1975)
24. Macdonald, W.A., Hoffmann, E.R.: Review of Relationships between Steering Wheel Reversal Rate and Driving Task Demand. Human Factors 22(6), 733–739 (1980)
25. Nakayama, O., Futami, T., Nakamura, T., Boer, E.: Development of a Steering Entropy Method for Evaluating Driver Workload. SAE, Warrendale (1999)
26. McLean, J.R., Hoffmann, E.R.: Steering Reversals as a Measure of Driver Performance and Steering Task Difficulty. Human Factors 17(3), 248–256 (1975)
27. Olson, P.L., Sivak, M.: Perception-Response Time to Unexpected Roadway Hazards. Human Factors 28(1), 91–96 (1986)
28. Lerner, N.D.: Brake Perception-Reaction Times of Older and Younger Drivers. In: Proceedings of the 37th Annual Meeting of the Human Factors and Ergonomics Society, pp. 206–210 (1993)
29. Mazzae, E., Baldwin, G.H.S., McGehee, D.V.: Driver Crash Avoidance Behavior with ABS in an Intersection Incursion Scenario on the Iowa Driving Simulator (1999)
30. McGehee, D.V., Mazzae, E.N., Baldwin, G.H.S.: Driver Reaction Time in Crash Avoidance Research: Validation of a Driving Simulator Study on a Test Track. In: Proceedings of the IEA 2000/HFES 2000 Congress, pp. 320–323 (2000)
31. Yamamura, T., Kuge, N., McLaughlin, S., Hankey, J.: Research on Quantification of Drivers' Risk Feelings While Car-Following Using Naturalistic Driving Data (JSAE 20085164). Society of Automotive Engineers of Japan (2008)
32. McLaughlin, S., Hankey, J., Green, C., Larsen, M.: Target Detection Distances and Driver Performance with Swiveling HID Headlamps (SAE 2004-01-2258). Society of Automotive Engineers Government/Industry Meeting, Washington, DC (2004)
The Assessment of Driver's Arousal States from the Classification of Eye-Blink Patterns
Yoshihiro Noguchi1, Keiji Shimada1, Mieko Ohsuga2, Yoshiyuki Kamakura2, and Yumiko Inoue2
1 Information Technology Lab., AsahiKASEI Corp., AXT Main-tower 22F, 3050 Okaga, Atsugi-shi, Kanagawa 243-0021, Japan
{noguchi.yg,shimada.kb}@om.asahi-kasei.co.jp
2 Biomedical Engineering, Osaka Institute of Technology, 5-16-1 Omiya, Asahi-ku, Osaka 535-8585, Japan
{ohsuga@bme,kamakura@is,yumiko@is}.oit.ac.jp
Abstract. To realize real-time assessment of a driver's arousal states, we propose an assessment method based on the analysis of eye-blink characteristics from image sequences. The driver's arousal level while driving does not fall monotonically from high to low. We propose a two-dimensional arousal-state transition model that takes into account the fact that a driver usually holds out against sleepiness. Eye-blink pattern categories were classified from image sequences using a Hidden Markov Model (HMM), and the driver's arousal states were then assessed by an HMM applied to the histogram distribution of those typical eye-blink categories. The arousal assessment results are also verified against the rating results of trained raters.
Keywords: arousal states, drowsiness, blink, image, EOG, HMM, driver.
1 Introduction
Drowsy driving is one of the main causes of traffic accidents. Recently, the assessment of the driver's arousal level has come to be expected as an element of technology for establishing a safe transportation system. Drivers tend to be unaware that they are falling into a drowsy state, or may be compelled to overcome the drowsiness and continue driving despite being drowsy. In this paper, we propose a method to assess the driver's arousal states, which is regarded as one of the most important issues for the development of adaptive driving support systems. Arousal is important in regulating consciousness, attention, and information processing. It can be observed through complex variations of physiological measurements, such as brain and heart activities, and facial expressions. We focus on the changes in eye-blink activities, particularly the eye-blink patterns, as an important element for the assessment of arousal states. Our system provides a non-intrusive and driver-independent approach with the aim of identifying the struggling state, the state in which drivers suffer from low arousal levels and spend a large effort trying to overcome drowsiness. We rely on advanced image processing and pattern
classification methods based on Hidden Markov Models (HMMs) to classify eye-blink categories from the driver's facial video. The arousal states are then estimated from the distributions of the classified eye-blink categories. Despite a long history of research in this field [1], [2], [3], no method that efficiently assesses arousal states has yet been realized in the real world. One line of research on the assessment of the driver's arousal level is sleepiness detection using the driver's blinks. Until now, eyelid closure time and blink interval, which are relatively easy to measure, have received attention as blink characteristics for assessing the driver's arousal level. However, the driver's arousal level while driving does not fall monotonically from high to low. The driver usually holds out against sleepiness while driving, so we consider that the same eyelid closure time and blink interval can sometimes occur even though the driver's arousal level is different. Therefore it is difficult to assess the driver's arousal level using these characteristics alone, and such assessment has not yet been put to practical use. In this paper, we propose a new method for the assessment of the driver's arousal states from eye-blink image sequences, based on the histogram distribution of typical eye-blink waveform categories. As the arousal state changes, the histogram distribution of the eye-blink waveform categories changes. We therefore classify blink pattern categories from eye-blink image sequences using an HMM, and the driver's arousal states are finally assessed by a second HMM applied to the histogram distribution of those typical blink categories. The arousal assessment results are also verified against the rating of the driver's facial expression by trained raters.
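As a rough sketch of the general approach (not the authors' implementation), one Gaussian HMM can be trained per blink category on per-blink feature sequences, with new blinks assigned to the category whose model gives the highest likelihood. The example below assumes the open-source hmmlearn library and a hypothetical feature extraction; the number of hidden states is an arbitrary choice.

```python
# Hypothetical per-category HMM blink classification (not the authors' code).
# X_by_cat maps a category name to a list of per-blink feature sequences,
# each a (T_i, n_features) NumPy array, e.g. eyelid-aperture trajectories.
import numpy as np
from hmmlearn import hmm

def train_category_hmms(X_by_cat, n_states=3):
    models = {}
    for category, sequences in X_by_cat.items():
        X = np.vstack(sequences)               # concatenated observations
        lengths = [len(s) for s in sequences]  # per-sequence lengths
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=50)
        m.fit(X, lengths)
        models[category] = m
    return models

def classify_blink(models, sequence):
    # assign the category whose HMM yields the highest log-likelihood
    return max(models, key=lambda c: models[c].score(sequence))
```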
2 Driver's Arousal States
2.1 Two-Dimensional Arousal States Model
In a traditional method for drowsiness assessment, the drowsiness states are expressed subjectively. They are rated into one of the following states [4]: not drowsy, becoming drowsy, very drowsy, and extremely drowsy. Each drowsiness state corresponds to a qualitative drowsiness level ranging from 1 to 4. By contrast, another model based on the Karolinska Sleepiness Scale (KSS) [5] defines sleepiness levels from 1 to 9. These measurements consider only one dimension of drowsiness characteristics, namely the drowsiness level. However, because drivers usually hold out against sleepiness while driving, they tend to experience more complicated arousal states. Before the drivers enter the extremely drowsy state and give in to sleep, they may have to spend large efforts striving to keep themselves awake. We refer to this state as a struggling state. This state involves complex variations of physiological measurements due to the driver's efforts to overcome drowsiness. Therefore, it is necessary to include this countermeasure effort of the driver as an additional factor in the model. Based on this idea, we propose the two-dimensional arousal states model shown in Fig. 1. The horizontal dimension corresponds to the arousal level, from high arousal on the left to low arousal on the right. The vertical dimension represents the effort against drowsiness, ranging from low effort at the top to high
effort on the bottom. The struggling state lies in the bottom-right corner of the model, where the arousal level is low but the countermeasure effort is high.

Fig. 1. Two-dimensional arousal states model (horizontal axis: arousal, high to low; vertical axis: effort, small to large; the Alert, Asleep, and Struggling states occupy the corners)

2.2 Data Preparation
Driving Data Collection. We recruited fifty-nine able-bodied paid volunteers (both genders, aged between 20 and 60) to participate in the experiments. All subjects were licensed drivers, but their driving experience varied significantly. The objective of the experiments was to intentionally induce drowsiness in the drivers; however, extra care was taken to ensure that no subject was aware of this objective. We used a video game system, projected on a big screen, and a controller equipped with a steering wheel, an accelerator and a brake pedal. A simulated driving environment with a monotonous driving course was specially created for this experiment. A video camera was mounted on the dashboard for recording the driver's face and eye-blink activity. A vertical electro-oculogram (v-EOG) recording was also carried out. The sampling rate of the v-EOG was 200 Hz at a gain of 5,000, with a low-pass filter at 35 Hz. A 640×480-pixel, 30-fps video sequence of the driver's eye-blinks was also shot simultaneously. Each subject was instructed to perform a fifty-minute driving task which they could opt to abort at any time. Before the experiment started, a brief training session was given so that the driver became familiar with the simulated driving system.
Arousal States Rating. We propose a scale for arousal state assessment in accordance with the two-dimensional arousal states model proposed in Fig. 1. The degree of arousal (the first dimension) is represented by a number from 1 to 4, smaller numbers meaning higher arousal levels. We divide states 2 and 3 into α and β states according to the level of effort used to overcome drowsiness. States 2α and 3α indicate low countermeasure effort against drowsiness, whereas states 2β and 3β represent high effort to fight against drowsiness. Finally, the state W indicates the state in which a driver suddenly wakes up from a very drowsy condition; W denotes a special situation in which the driver's arousal state changes rapidly. As a result, this scale is composed of 7 arousal states (1, 2α, 2β, 3α, 3β, 4, W).
Fig. 2. Experimental setting

Table 1. Arousal states rating from facial expression

State | Description
1     | High arousal
2α    | Slightly low arousal
2β    | Slightly low arousal with struggling
3α    | Very low arousal
3β    | Very low arousal with struggling
4     | Extremely low arousal
W     | Suddenly awake

(α: small countermeasure effort, β: large countermeasure effort)
The characteristics of each arousal state based on facial expressions are summarized in Table 1. To define ground-truth data of arousal states, we need to determine the arousal state of the driver at any given time period. In our experiments, the drivers' arousal states were evaluated by three trained raters. The video sequence of each fifty-minute driving task was divided into twenty-second segments, and the three trained raters independently rated the arousal state from the driver's facial expression in each segment, with the segments presented in random order. The level of effort (β) was rated from example behaviors such as moving the mouth, rubbing the face, head movement, conscious eye-blinking, and conscious deep breathing. Two samples of the arousal states rating results from a fifty-minute driving session are shown in Fig. 4. The arousal states according to our proposed two-dimensional model can be seen in these results: the vertical axis corresponds to the arousal levels, from 1 (high arousal) to 4 (low arousal), with gray shading indicating low effort (α) and black shading indicating high effort or struggling states (β).
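As a compact illustration of this scale, the sketch below maps a rated arousal level and effort judgment onto the seven state labels of Table 1. It is a minimal illustration of the rating scheme, not the raters' actual protocol; Python, the function name, and the boolean flags are our own choices.

```python
# Minimal sketch: map an arousal level (1-4) plus an effort judgment onto the
# seven-state scale of Table 1. The boolean flags are illustrative inputs.
def arousal_state(level: int, high_effort: bool = False,
                  sudden_wake: bool = False) -> str:
    if sudden_wake:
        return "W"                      # rapid change from a drowsy condition
    if level in (2, 3):                 # only these levels split by effort
        return f"{level}{'β' if high_effort else 'α'}"
    return str(level)                   # 1 (high arousal) or 4 (extremely low)
```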
3 Arousal States Assessment Using Eye-Blink Categories

3.1 v-EOG Based Eye-Blink Categories Clustering

Given the v-EOG waveform of a fifty-minute driving session, we extracted only the parts where eye-blinks occurred. Three parameters were derived from each v-EOG eye-blink waveform (see Fig. 3): the eye closing time (Tc), the eye opening time (To), and the eye-blink aperture (PA). The eye-blink patterns were classified by K-means clustering with the number of clusters (K) set to eight. An expert analysis was then carried out to decide which cluster represented the standard eye-blink category A in the high-arousal state 1. The average parameters PA, Tc, and To of eye-blink category A were subsequently computed, and the remaining blinks were individually assigned to blink categories based on their parameters and the constraints shown in Table 2. Each parameter was compared with that of the standard eye-blink category A: a parameter was regarded as 'standard' when it fell within ±5% of category A for PA and within ±10% for Tc and To, whereas 'high' and 'low' indicate levels above and below these ranges, respectively.
Fig. 3. Parameters of v-EOG waveform

Table 2. Characteristics of eye-blink categories

Category | Characteristics
A        | Standard
B        | Large PA and/or long (Tc and/or To)
C        | Small PA
D        | Standard or small PA and long (Tc and/or To)
E        | Small PA and short (Tc and To)
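The clustering-plus-rules procedure could be sketched as follows. This is a minimal sketch under stated assumptions: scikit-learn's KMeans stands in for the unspecified clustering implementation, the expert's choice of the standard cluster is a placeholder argument, and, because the printed rules of Table 2 overlap, the order in which the rules are tested is our own assumption.

```python
# Sketch: cluster blink parameters [PA, Tc, To] with K-means (K = 8) and
# assign Table 2 categories relative to the category-A averages.
import numpy as np
from sklearn.cluster import KMeans

def categorize_blinks(params: np.ndarray, std_cluster: int = 0) -> list:
    """params: (n_blinks, 3) array of [PA, Tc, To]; std_cluster is the
    cluster an expert judged to be the standard category A."""
    labels = KMeans(n_clusters=8, n_init=10).fit_predict(params)
    ref_pa, ref_tc, ref_to = params[labels == std_cluster].mean(axis=0)

    def level(x, ref, tol):  # 'low' / 'standard' / 'high' vs. category A
        if x < ref * (1 - tol):
            return "low"
        return "high" if x > ref * (1 + tol) else "standard"

    cats = []
    for pa, tc, to in params:
        pa_l = level(pa, ref_pa, 0.05)                 # +/-5% band for PA
        tc_l = level(tc, ref_tc, 0.10)                 # +/-10% band for Tc
        to_l = level(to, ref_to, 0.10)                 # +/-10% band for To
        long_ = "high" in (tc_l, to_l)
        short = tc_l == "low" and to_l == "low"
        if pa_l == "low" and short:
            cats.append("E")            # small PA and short Tc and To
        elif pa_l != "high" and long_:
            cats.append("D")            # standard/small PA and long duration
        elif pa_l == "high" or long_:
            cats.append("B")            # large PA and/or long duration
        elif pa_l == "low":
            cats.append("C")            # small PA
        else:
            cats.append("A")            # everything standard
    return cats
```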
3.2 Arousal States Assessment

Samples of eye-blink category classification results based on v-EOG signals from two drivers are shown in Fig. 4, which plots the distribution of blink categories over a 50-minute driving session along with the corresponding arousal states rating results.
Fig. 4. The correspondence between eye-blink categories distribution and arousal states rating
It can be seen that an increase in eye-blink category D is associated with low arousal levels, whereas frequent occurrences of eye-blink category B tend to accompany a rise in the countermeasure effort against drowsiness (the struggling states). Since the histogram distribution of the v-EOG-based eye-blink categories changes with the arousal states, we consider it feasible to assess a driver's arousal state from this histogram variation.
4 Arousal States Assessment from Eye-Blink Image Sequences

The system requires only the input from a video camera attached to the dashboard. It analyzes eye-blink patterns from the acquired video streams, and the arousal states of the driver are derived from the histogram distributions of the eye-blink patterns. The system consists of two sub-stages: the eye-blink category classification stage and the arousal states assessment stage.

4.1 Eye-Blink Categories Classification

Feature Extraction. Given a video sequence of the drivers captured at 30 frames per second, the first process is to detect the eyes in each image. The eye-blink images are transformed into sequential feature vectors for eye-blink category classification. We measure the eye-blink aperture as the ratio between the height (H) and width (W) of the eye, as shown in Fig. 5(b), which is invariant to the distance between the driver's face and the camera. We apply a method for eye-blink aperture measurement based on the Active Shape Model (ASM) [8]. The ASM method uses a statistical model of the eye (in this case, a set of 20 points corresponding to the outline of the upper and lower eyelids) constructed from 340 eye images of 17 drivers. By varying the model parameters based on these constraints, we can generate virtually any eyelid shape of the eye-blink.
Fig. 5. ASM eye-blink aperture measurement

Fig. 6. Eye-blink aperture normalization
Considering an unseen eye image, we rely on an iterative optimization process to find the model parameters that best fit the eye, as shown in Fig. 5. Fig. 5(c) shows an example of the eye-aperture measurement result for an eye blink. The curve is very similar to the v-EOG despite the difference in measurement sampling rates (200 Hz for the v-EOG and 30 fps for video). We then perform duration normalization by subtracting the average aperture measurement of blink category A from each aperture measurement. Fig. 6 shows the mean shapes of each blink category for subjects Ⅰ and Ⅱ on the left, together with the corresponding results after duration normalization on the right. Despite the differences in the blink durations of the two subjects, the resulting shapes for each category share similar characteristics.
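The two video-side steps just described could be sketched as follows: the scale-invariant H/W aperture ratio and the normalization against the category-A curve. The landmark layout, the resampling to a fixed length, and all names are illustrative assumptions, not the authors' implementation.

```python
# Sketch: per-frame aperture ratio from fitted eyelid landmarks, and a
# blink curve normalized against the mean category-A curve.
import numpy as np

def aperture_ratio(landmarks: np.ndarray) -> float:
    """landmarks: (20, 2) eyelid points from the fitted ASM."""
    h = landmarks[:, 1].max() - landmarks[:, 1].min()   # eye height H
    w = landmarks[:, 0].max() - landmarks[:, 0].min()   # eye width  W
    return h / w                     # invariant to camera distance

def normalize_blink(curve, category_a_mean, n: int = 30):
    """Resample a blink's aperture curve to n points, then subtract the
    (equally resampled) category-A average curve."""
    xs = np.linspace(0.0, 1.0, n)
    f = np.interp(xs, np.linspace(0.0, 1.0, len(curve)), curve)
    g = np.interp(xs, np.linspace(0.0, 1.0, len(category_a_mean)),
                  category_a_mean)
    return f - g
```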
Categories Classification. The eye-blink category classification relies on Hidden Markov Models (HMMs), statistical classifiers capable of describing complex dynamic behaviors. In this case, we construct five HMMs, one for each category A to E, where each HMM represents a driver-independent model of its eye-blink category. Given the normalized feature vectors extracted from a sequence of eye-blink images as input, classification is done by choosing the HMM that gives the highest likelihood.
Fig. 7. Topology of the HMM
The blink category represented by that HMM is regarded as the recognition result. The HMM has a left-to-right topology, as shown in Fig. 7: each state has a transition to itself and to the next state. The HMM parameters for state i consist of the state transition probabilities from state i to state j (aij) and the output probability density function (pdf) bi, here defined by a mixture of Gaussian pdfs. The optimal parameters of the HMMs must be computed before they can be used as classifiers; we train the models using the maximum likelihood (ML) training method. Training samples and their category labels are derived from the v-EOG eye-blink category classification as ground-truth data.

4.2 Arousal States Assessment

The histogram distribution of eye-blink categories over a driving session has a close relationship to the driver's arousal states, including a sign of the driver entering the struggling state. We therefore build another HMM classifier for the assessment of arousal states. The ratio of each blink category is obtained by dividing the histogram by the total number of blinks that occurred in that period. We construct a feature vector from these blink ratios (a five-dimensional feature vector, one dimension per blink category) plus another dimension indicating the total number of blinks in M seconds. We extract feature vectors continuously every N seconds, from which a temporal change in eye-blink ratios can be observed; thus, consecutive feature vectors overlap by M−N seconds. We generate six HMMs in accordance with the proposed two-dimensional arousal model. The data used for HMM training are the v-EOG eye-blink category histogram distributions as ground truth together with the corresponding arousal state ratings. Since we focused on a driver-independent system, all experiments were carried out with leave-one-out cross-validation. The experimental results of four subjects are shown in Fig. 8. For each subject, the top graph is the arousal states rating from facial expression over a driving session as ground truth, the middle graphs (A to E) show the histogram distribution of eye-blink categories from the eye image sequence, and the bottom graph shows the arousal states assessed from that histogram distribution. The average correlation coefficient between the ground truth and the assessed arousal states of the four subjects is 0.82.
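The windowed feature extraction and the HMM scoring described above could be sketched as follows. The hmmlearn package is assumed as a stand-in for the unspecified HMM implementation, and the window lengths, data layout, and function names are illustrative only.

```python
# Sketch: 6-D windowed features (five blink-category ratios + blink count)
# and arousal-state assessment by the highest-scoring per-state HMM.
import numpy as np
from hmmlearn.hmm import GaussianHMM   # assumed HMM implementation

CATEGORIES = "ABCDE"

def window_features(blink_times, blink_cats, t_end, M=60.0, N=10.0):
    """Windows are M seconds long and start every N seconds, so consecutive
    feature vectors overlap by M - N seconds, as in the text."""
    feats, t = [], 0.0
    while t + M <= t_end:
        in_win = [c for bt, c in zip(blink_times, blink_cats)
                  if t <= bt < t + M]
        n = len(in_win)
        ratios = [in_win.count(c) / n if n else 0.0 for c in CATEGORIES]
        feats.append(ratios + [float(n)])   # ratios plus total blink count
        t += N
    return np.asarray(feats)

def assess_state(feature_seq, models):
    """models: dict mapping an arousal-state label to a trained GaussianHMM;
    the label whose model gives the highest log-likelihood wins."""
    return max(models, key=lambda s: models[s].score(feature_seq))
```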
Fig. 8. Comparison between the ground truth (top) and the arousal states assessment (bottom)
5 Conclusions

We investigated the assessment of a driver's arousal state from the driver's eye-blink image sequence as a non-contact measurement. We proposed a two-dimensional arousal states model, classified blink pattern categories from the eye-blink image sequences using HMMs, and finally assessed the driver's arousal states with a further HMM operating on the histogram distribution of those eye-blink categories. The results indicate that a driver's arousal state can be assessed from the variation of the blink-category histogram.
References

1. Hamada, T., Ito, T., Adachi, K., Nakano, T., Yamamoto, S.: Detecting method for drivers' drowsiness applicable to individual features. Intelligent Transportation Systems 2, 1405–1410 (2003)
2. Miyakawa, T., Takano, H., Nakamura, K.: Development of non-contact real-time blink detection system for doze alarm. In: SICE 2004 Annual Conference, vol. 2, pp. 1626–1631 (2004)
3. Horne, J.A., Reyner, L.A.: Driver Sleepiness. In: IEE Colloquium on Sleep Monitoring (1995)
4. Wierwille, W.W., Ellsworth, L.A.: Evaluation of Driver Drowsiness by Trained Raters. Accident Analysis and Prevention 26(5), 571–581 (1994)
5. Akerstedt, T., Gillberg, M.: Subjective and Objective Sleepiness in the Active Individual. Int. Journal of Neuroscience 52(1-2), 29–37 (1990)
6. Noguchi, Y., Nopsuwanchai, R., Ohsuga, M., Kamakura, Y., Inoue, Y.: Classification of Blink Waveforms towards the Assessment of Drivers Arousal Levels - An Approach for HMM Based Classification from Blinking Video Sequence. In: Harris, D. (ed.) HCII 2007 and EPCE 2007. LNCS, vol. 4562, pp. 779–786. Springer, Heidelberg (2007)
7. Ohsuga, M., Kamakura, Y., Inoue, Y., Noguchi, Y., Nopsuwanchai, R.: Classification of Blink Waveforms toward the Assessment of Drivers Arousal Levels - An EOG Approach and the Correlation with Physiological Measures. In: [6], pp. 787–795
8. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active Appearance Models. IEEE Trans. on Pattern Analysis and Machine Intelligence 23(6), 681–685 (2001)
Guiding a Driver’s Visual Attention Using Graphical and Auditory Animations T. Poitschke, F. Laquai, and G. Rigoll Technische Universität München Institute for Human-Machine Communication Theresienstrasse 90, 80333 Munich, Germany {poitschke,laquai,rigoll}@tum.de
Abstract. This contribution presents our work towards a system that autonomously guides the user's visual attention to important information (e.g., the traffic situation or an in-car system status signal) in error-prone situations while driving a car. To this end, we use a highly accurate head-mounted eyetracking system to estimate the driver's current focus of visual attention. Based on these data, we present our strategies for guiding the driver's attention to where it should be focused. These strategies use both graphical animations, in the form of a guiding point on the graphical user interface, and auditory animations presented via headphones using a virtual acoustics system. At the end of this contribution, we present the results of a usability study.
1 Introduction

This contribution presents our work on a system that autonomously guides the user's visual attention to important information (e.g., the traffic situation or an in-car system status signal) in error-prone situations while driving a car. According to investigations by the U.S. NHTSA (National Highway Traffic Safety Administration) from the year 2000, driver inattention or distraction is the main cause of 25% of all traffic accidents [10]. Austrian and German accident statistics likewise consider an attention deficit the main cause of at least 10% of all accidents with fatalities (e.g., [6]). This can largely be attributed to the high risk of distraction posed by the operation of today's infotainment systems: compared with entering a telephone number or operating the simple radio systems of the past, entering a navigation destination, for example, takes much more time. Furthermore, during such tasks not only does (visual) attention decrease, but the driver's gaze also remains off the road and inside the car for longer periods. Thus, one goal in the development of new infotainment systems must be to direct the driver's attention, especially in a dangerous situation, quickly and safely to where it is most needed. The task of recognizing a potential accident risk can be partially accomplished by the car itself, for example by continuously monitoring the distance to the vehicle in front or by recognizing lane borders.
Guiding a Driver’s Visual Attention Using Graphical and Auditory Animations
425
2 Previous Work

To implement the presented concepts, previous work from our institute and our research partners was integrated: a highly accurate eyetracking system to analyze the current focus of visual attention, and a playback system for virtual acoustics using headphones.
Fig. 1. Head-mounted eyetracking system [13] and headphones for the virtual acoustics
2.1 Eyetracking

In order to guide a user's attention efficiently, it is very important to know where the user is currently focusing. The user's current focus of attention is highly correlated with the current line of sight and the corresponding fixated point. For the detection of the user's line of sight and gaze point, we used an eye tracking system (see Fig. 1). To compute highly accurate and fast gaze data, we combined the head-mounted gaze tracking system EyeSeeCam [13] with an external tracking system [1]: the gaze tracker computes the user's line of sight in its own coordinate system and is simultaneously tracked by a stereo camera system that also defines the world coordinate system. The gaze data are then transformed into the world coordinate system using the 3D position and orientation of the gaze tracker. From the current line of sight we can compute the user's current gaze point by simply intersecting the gaze line with previously calibrated areas [1].
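The geometry of this step can be sketched as follows: rotate the locally measured gaze ray into the world frame using the tracker pose and intersect it with a calibrated plane. This is a minimal sketch; the plane representation, the variable names, and the omission of an eye-to-tracker offset are our simplifying assumptions, not the authors' code.

```python
# Sketch: transform a gaze ray from the tracker frame into the world frame
# (pose R, t from the stereo system) and intersect it with a calibrated
# plane n . x = d (e.g., the display surface).
import numpy as np

def gaze_point_on_plane(ray_dir_local, R, t, plane_n, plane_d):
    origin = np.asarray(t, float)                 # ray origin: tracker position
    d_world = np.asarray(R) @ np.asarray(ray_dir_local, float)  # world-frame ray
    s = (plane_d - plane_n @ origin) / (plane_n @ d_world)      # ray parameter
    return origin + s * d_world                   # 3-D fixation point on plane
```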
2.2 Virtual Acoustics

To present the warnings in a manner that is distinguishable from other ordinary warnings in the automobile, and to test whether a moving sound can have a stronger effect on the focus of attention than a fixed one, we used the virtual acoustics system presented in [12] to replay the sounds. Using headphones, it is thus possible to present a sound at any horizontal angle around the test person.
3 Concept and Implementation

To present the warning strategies in as realistic an environment as possible, a rudimentary In-Vehicle Information System (IVIS) was implemented. The interaction system is based on a large central information display: a 15-inch touchscreen mounted in a position comparable to the central displays of current mass-production cars (see Fig. 3 and [8]). On this screen, the graphical user interface (GUI) of our infotainment system is visualized. For this contribution we did not implement the complete functionality, but the user can enter navigation destinations or telephone numbers. The system also processes the data from the eyetracker and controls the virtual acoustics functionality.

3.1 In-Vehicle Information System

The GUI of the prototypical IVIS is divided into a status bar at the bottom and an extensive control panel (see Fig. 2). The status bar serves as a return key for the menu navigation (e.g., returning to a superior menu level) and displays warnings. In the control panel, the user can, for example, switch to the phone menu, open a calendar function, or select the radio. However, for this contribution only the menu items navigation and phone provided functionality, as these menus hold the largest potential for distraction in the car [11]. To ensure good operability, the buttons are illustrated with icons that reflect their function, e.g., a telephone or a street. In the phone menu, the user can choose between normal input of a telephone number and an address book entry; the number pad is arranged as on a mobile phone. The navigation menu provides four options for entering a navigation destination, of which only the option "New Destination" was used for this contribution. Destination input is realized with an alphabetical keyboard, and once destination entry is finished, guidance can be started. The user can switch between input boxes at any time to correct mistakes.

3.2 Graphical Animations

There are several possibilities for guiding a user's attention on a screen. For example, the user can be advised of messages that do not have to be attended to immediately, but should be as soon as possible, e.g., a new traffic message. From the orientation reflex, one can assume that a blinking light (i.e., movement) in the peripheral field of view should attract sufficient attention; a traffic jam message, for instance, can be indicated by a blinking light in the status bar. Since the gaze direction is known through the eyetracking system, a more detailed dialog may open when the user looks at the message. Such messages can easily be acknowledged by tapping on the monitor.
Guiding a Driver’s Visual Attention Using Graphical and Auditory Animations
427
Fig. 2. Screenshots of the interaction system. The left part shows the main menu of the implemented interaction system. The right part illustrates the graphical guidance of attention using a guiding dot (light dot on the GUI). The small black dot represents the current gaze point, and the orange line marks the target; neither is visualized during system operation.
Fig. 3. System overview: mock-up and mounted displaying areas
However, there is also a second method for guiding the user's attention when it is concentrated on the road or when the blinking on the monitor is simply missed. For such cases, we also implemented a guiding point. In addition to the blinking icon in the status bar, the entire screen is faded out except for an area of about 30 pixels. This area attracts the user's attention (see [7]) and is designed to guide it to the actual target. The area does not move around autonomously, but is positioned on the connecting line between the current gaze point and the actual target, at which the user is supposed to look. A distance of 150 pixels between the current gaze point and the guiding point has proven useful. The phenomenon is comparable to a dust particle on the eyeball: the eye follows the particle involuntarily, so that, for example, the eye moves ever upwards. Since the movement is not autonomous but relative to the viewing direction, it is difficult to miss or ignore. The point disappears when the user's view arrives in the "target area", which extends 300 pixels around the warning message. The area is chosen this large because the target lies at the border of the working range of the eyetracking system. No unintended triggering of the warning messages occurred during the presented tests.

3.3 Auditory Animations

To ensure fast and robust attention guiding, an auditory signal can also be reasonable, as it cannot be overlooked or occluded; such advice should therefore be perceived more reliably. For this purpose, the virtual acoustics system presented in [12] was used, which allows a sound to be replayed to the test person from any point on a circle around him. To design a sound that can be located as well as possible, it is important that the sound have the widest possible spectrum. Since the sound should also enable attention guiding from the low-mounted IVIS to the top (e.g., the road), elevation has to be simulated additionally; for this, the sound's spectrum has to contain components at 8 kHz, which are perceived as coming from above [5]. However, white noise should not be used as a warning sound, because humans are not used to it, and a single sine tone at 8 kHz is very unpleasant; hence, more spectral components must be present. The basic warning sound is therefore composed of three sine tones one octave apart, at 500 Hz, 1 kHz, and 2 kHz, together with a broadband noise from 0.1 to 8 kHz. In addition, a narrow-band noise with a higher level at 8 kHz is used. This sound mixture is easy to locate and can direct the driver's attention to the top without being unpleasant. Similar to the guiding point, we implemented a sound that moves with respect to the current gaze point: it is located between the current gaze direction and the target direction (e.g., the road), so the tone is designed to direct the user's attention solely with its movement. In contrast to this animated sound, we also evaluated a fixed sound from the direction of the road, which might guide the attention faster because of its immediacy.
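The described sound mixture could be synthesized roughly as follows. This is a minimal sketch assuming NumPy/SciPy; the filter orders, mixing levels, and duration are our own assumptions, since the text specifies only the spectral components.

```python
# Sketch: three octave-spaced sines (500 Hz, 1 kHz, 2 kHz), broadband noise
# from 0.1 to 8 kHz, plus a louder narrow-band noise around 8 kHz.
import numpy as np
from scipy.signal import butter, lfilter

def warning_sound(dur=0.5, fs=44100):
    t = np.arange(int(dur * fs)) / fs
    tones = sum(np.sin(2 * np.pi * f * t) for f in (500.0, 1000.0, 2000.0))
    b, a = butter(4, [100 / (fs / 2), 8000 / (fs / 2)], btype="bandpass")
    broad = lfilter(b, a, np.random.randn(t.size))       # 0.1-8 kHz noise
    b2, a2 = butter(4, [7500 / (fs / 2), 8500 / (fs / 2)], btype="bandpass")
    narrow = lfilter(b2, a2, np.random.randn(t.size))    # emphasis near 8 kHz
    mix = tones / 3 + 0.3 * broad + 0.6 * narrow         # assumed levels
    return mix / np.abs(mix).max()                       # normalized signal
```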
Guiding a Driver’s Visual Attention Using Graphical and Auditory Animations
429
acoustic guidance. The entire concept does not only address the attention guiding on the screen, but particularly guiding the view from the screen to the road. Thus, the immediately visible point on the IVIS starts the movement of the user’s view from the screen to the road. This movement will be perpetuated by the sound, even if the user’s view has left the displaying area.
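The placement logic for this combined guidance can be sketched as follows: the dot sits 150 pixels from the gaze point towards the target and vanishes inside the 300-pixel target area, while the sound azimuth lies between the gaze and the target directions. The coordinate conventions and the interpolation weight are illustrative assumptions.

```python
# Sketch: position the guiding dot on the gaze-to-target line and pick a
# sound azimuth between the current gaze direction and the target direction.
import numpy as np

def guiding_dot(gaze, target, step=150.0, target_radius=300.0):
    v = np.asarray(target, float) - np.asarray(gaze, float)
    dist = np.linalg.norm(v)
    if dist <= target_radius:
        return None                          # view arrived: hide the dot
    return np.asarray(gaze, float) + step * v / dist

def sound_azimuth(gaze_az_deg, target_az_deg, w=0.5):
    """Azimuth between gaze and target; w is an assumed interpolation weight."""
    return (1.0 - w) * gaze_az_deg + w * target_az_deg
```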
4 System Evaluation

The experiments were carried out in our driving simulation lab. The presented systems were integrated into a vehicle mock-up (see Fig. 3), a rudimentary cockpit with a driver's seat, a force-feedback steering wheel, and a touchscreen displaying the IVIS. In addition, the eye tracking system was installed; besides the eyetracker goggles, the test persons (TPs) wore the headphones for the virtual acoustics. To measure the driving performance of the TPs, the Lane Change Test (LCT, see [3]) was used as the primary-task measure and the Peripheral Detection Task (PDT, see [9]) as the secondary-task measure.

4.1 Test Procedure

At the beginning of the test, the TP had to fill out a first questionnaire to collect demographic data. After each part of the test, another questionnaire composed of the SEA scale (a self-report measure of subjective mental workload, see [4]) and the System Usability Scale (SUS, see [2]) was handed out, and at the end of the entire test run there was a final interview. During the test, the reaction times and missed points of the PDT as well as the lane-keeping performance in the LCT were recorded for later analysis. Furthermore, the response times to alerts and warnings, and the corresponding gaze position at the beginning of the system feedback, were saved. First, the subjects had the opportunity to become acclimatized to the LCT and PDT. The next step was a test drive during which a given navigation destination had to be entered into the navigation system without any guiding concept available. Afterwards, three runs with simultaneous navigation input and our warning strategies were arranged: graphical warnings displayed on the IVIS, the moving warning sound, and the fixed warning sound. The order varied from person to person to exclude learning effects from the results, and the navigation inputs were chosen such that the TP was engaged as long as possible. At the beginning and the end of the experiment, a baseline ride without any secondary tasks (only LCT and PDT) was performed as a reference.

4.1.1 Graphical Warnings

The subjects were first informed that several warnings would appear during the test rides, followed by a short demonstration of each graphical warning strategy (i.e., with and without guiding point). During the trip, the timing and manner of the warnings were controlled by the test supervisor; it was thus ensured that messages with a guiding point occurred as often as those without, while the timing itself was chosen randomly. If a warning message was not detected after 10 seconds, it was canceled and registered as disregarded.
4.1.2 Auditory Warnings

The auditory warning strategies (i.e., moving or fixed sound) were also presented before the test runs started, so that the volume could be adjusted to a pleasant value. Furthermore, the TPs were informed that several auditory warnings would appear prior to the lane changes during the test runs, indicating that the driver needed to refocus attention on the road. These sounds were triggered by the test supervisor. Again, the warnings were uniformly distributed between the attention guiding with and without guiding point.

4.2 Results

The following section summarizes the results of the questionnaires, the PDT, and the LCT. The experiments were conducted with 15 test persons (14 male, one female) with an average age of 28.6 years. Seven TPs stated that they had already operated a navigation system with a touchscreen, five had used only systems without a touchscreen, and three had no experience with navigation systems. The level of interest in technology within the sample was high, almost all expressed interest in technical innovations, and all subjects reported using their PCs daily. It is therefore unlikely that differences in the test results were caused by different abilities in operating the IVIS. During the baseline runs, all subjects responded to all PDT points with an average response time of 766 ms, and the mean lane deviation was 0.51 m. During driving with concurrent navigation input and no warnings, only an average of 81% of the PDT points were recognized, with an average response time of 1043 ms; the reaction was thus 36% slower (277 ms on average) than driving without load. The lane deviation was 1.03 m, twice as high as during the reference runs. These results are all statistically significant. During a test without a driving task, all TPs could differentiate between a fixed sound and a sound rotating around them. During driving, however, only four of the 15 TPs stated that they recognized a difference between the fixed and the moving sound, so a separate evaluation is addressed only very briefly. This difference from the static test is probably caused by the high load during the driving experiments (i.e., LCT and PDT combined with navigation input); one TP even stated that he did not perceive the movement of the sound because he was not able to pay attention to it. The TPs who perceived a difference rated the task with the fixed sound as more strenuous (105 vs. 90 on the SEA scale) and had a larger lane deviation than during the runs with the moving sound (1.03 m vs. 0.95 m); these differences are not significant. In contrast to runs without warnings, we achieved a significant improvement in lane-keeping performance when the warning strategies were enabled: the lane deviation (with warning, fixed sound) averaged only 0.83 m, only about 1.6 times that of the reference run, whereas during runs without warnings the TPs' lane deviation was more than twice that of the reference runs. This represents an improvement of about 40%. The data for trips with a moving sound were slightly worse, but the difference was not significant. The remaining additional lane deviation was caused by poor lane keeping while typing the navigation destination, when the user's gaze rested on the IVIS.
Guiding a Driver’s Visual Attention Using Graphical and Auditory Animations
431
auditory warning strategies we achieved a reduction of late lane changes and missed road signs. Furthermore, the warning strategies did not have a significant effect on the PDT response times (1064ms to 1043ms). Here, an effect was not expected, since the timing for the warnings was tuned to the road signs, but not to the PDT. Interestingly, the detection rate decreased by 10% (from 80.92% to 70.06%). This is probably related to the fact, that the TPs relied on the auditory warning and therefore did not check the PDT points as often as without warnings. This is also related to the test setup, as the top priority task for the TPs was the lane changing. Furthermore, during normal driving warnings will appear with a much lower frequency than in the presented experiment. Therefore, habituation to the sound is not as strong as in the given experiment. However, this effect is still not negligible. Summing up, all test persons stated that the auditory warning attracts their attentiveness very fast and consequently can guide it very quickly and safely back on the road. For the attention guiding using graphical animations on the screen (i.e., the guiding point and flashing warnings), the response time for the PDT is something higher compared to runs, in which only an input into the navigation system was to make (1109ms to 1043ms). This difference is significant but small. Analyzing the PDT detection rates, this effect gets more obvious. The detection rate is only at 66% compared to 81%. This is probably due to the fact, that the driver’s view is often focused on the IVIS. The lane deviation is with 1.18m also slightly higher. However, this difference is not significant. Therefore, it can be assumed that the warning provides only a minimal additional burden. A similar result provides also the evaluation of subjectively experienced effort. With 97 points during the runs with guidance point and 98 points for the rides without a point, this is in the same range as for the runs with auditory warnings. Since the primary task (LCT) is always processed first, the evaluation of the response times was not very enlightening. It was apparent that in the average 37% of the flashing warnings had to be aborted, since it was not detected after 10 seconds. This might be partly linked to the placement of the warnings. During our experimental procedure, this proved to be adverse, since the subjects occluded the warnings several times with their own arms, even though they were instructed that this problem might occur. However, they were often too busy with the primary task that they could not always respond to the flashing warning. Nevertheless, the attention guiding using the guiding point always led to the desired goal. The attention guidance using the guiding point also performed better regarding the results from the System Usability Scale. Here, the guiding point achieved 75 from 100 possible points, compared to 64 points for the flashing warnings. This result is statistically significant. Even though an evaluation of the response times was not reliable due to non deterministic latencies that occurred randomly during the experiments, more than half of the subjects stated, that they could react faster, if they could simply follow the guiding point with their eyes. Anyhow, half of TPs felt restricted by this guidance concept. During normal driving, a warning should not occur as frequently as in the experiment and not always during the interaction with the IVIS. 
Therefore, one can assume that the sense of restriction will decrease.
Nearly all TPs stated that the guiding point guided their focus of attention to the monitor in every case. Further, in the final questionnaire, 12 TPs (80%) rated the guiding point as safe and 10 (67%) as pleasant.
5 Conclusion and Outlook

The evaluation showed that while the guiding system was activated, the test persons missed fewer traffic signs and therefore accomplished the primary task with a smaller lane deviation. The overall lane deviation was also smaller when the system exhorted the drivers to reallocate their visual attention to the driving task. In our study, the auditory animations were more effective than the graphical animations. We are currently working on expanding the presented system. In particular, we are testing the benefits of the presented concepts in a multi-display setup (i.e., a freely programmable instrument cluster, a large-area head-up display, and the presented central information display). We are also working on integrating our own remote eye- and gaze-tracking system, which should ensure a higher level of acceptance through non-intrusive measurement of the user's focus of visual attention. Furthermore, we are working on a hardware setup that allows sound presentation comparable to the virtual acoustics system but without the currently required headphones; this should also increase the acceptance of the system.
Acknowledgements This work is supported in part within the DFG excellence initiative research cluster "Cognition for Technical Systems - CoTeSys", see also www.cotesys.org. Also, we would like to thank the student workers from TU München who supported the work presented in this contribution. Especially we would like to thank Miss Monika Forstner.
References

1. Bardins, S., Poitschke, T., Kohlbecher, S.: Gaze-based Interaction in various Environments. In: Proceedings of the 1st ACM International Workshop on Vision Networks for Behaviour Analysis, VNBA 2008, Vancouver, Canada (October 31, 2008)
2. Brooke, J.: SUS - A quick and dirty usability scale. Technical report, Redhatch Consulting Ltd. (1996)
3. DaimlerChrysler AG, Research and Technology: Lane Change Test 1.2 User Guide (2004)
4. Eilers, K., Nachreiner, F., Hänecke, K.: Entwicklung und Überprüfung einer Skala zur Erfassung subjektiv erlebter Anstrengung. Zeitschrift für Arbeitswissenschaft 40, 215–224 (1986)
5. Fastl, H., Zwicker, E.: Psychoacoustics, Facts and Models. Springer, Heidelberg (1990)
6. Kuratorium für Verkehrssicherheit: Verkehrsunfallstatistik (2007), http://www.kfv.at
Guiding a Driver’s Visual Attention Using Graphical and Auditory Animations
433
7. Heinecke, A.: Mensch-Computer-Interaktion. Hanser Fachbuch Verlag, Leipzig (2004)
8. Laquai, F., Ablassmeier, M., Poitschke, T., Rigoll, G.: Using 3D Touch Interaction for a Multimodal Zoomable User Interface. In: Proceedings of the International Conference on Human-Computer Interaction, HCI International 2009, San Diego, USA (2009)
9. Matthews, G., Desmond, P.: Stress and Driving Performance: Implications for Design and Training. In: Hancock, P., Desmond, P. (eds.) Stress, Workload and Fatigue, ch. 1.8. Lawrence Erlbaum Associates, Mahwah (2001)
10. National Highway Traffic Safety Administration (NHTSA): NHTSA Driver Distraction Research: Past, Present, and Future (July 2000), http://www-nrd.nhtsa.dot.gov/departments/nrd-13/driver-distraction/PDF/233.PDF
11. Stutts, J., Reinfurt, D., Staplin, L.: The Role of Driver Distraction in Traffic Crashes. AAA Foundation for Traffic Safety (2001)
12. Völk, F., Kerber, S., Fastl, H., Reifinger, S.: Design und Realisierung von virtueller Akustik für ein Augmented-Reality-Labor. In: Tagungsband Fortschritte der Akustik - DAGA 2007, Stuttgart, March 19-22, pp. 559–560. DEGA (2007)
13. EyeSeeCam Homepage, http://www.eyeseecam.com
Fundamental Study for Relationship between Cognitive Task and Brain Activity During Car Driving

Shunji Shimizu (1,2,3), Nobuhide Hirai (3), Fumikazu Miwakeichi (3,4), Senichiro Kikuchi (3), Yasuhito Yoshizawa (5), Masanao Sato (5), Hiroshi Murata (6), Eiju Watanabe (3), and Satoshi Kato (3)

1 Tokyo University of Science, Suwa
2 Brain Science Institute, RIKEN
3 Department of Psychiatry, Jichi Medical University
4 Faculty of Engineering, Chiba University
5 Graduate School of Engineering Management, Tokyo University of Science, Suwa
6 Interface Co., Ltd.
[email protected]
Abstract. Research on human spatial recognition has been conducted for a long time. It is needed to build robots and automatic driving systems for cars or wheelchairs with functions as advanced as those of humans: spatial perception, decision-making, and direction finding. The final goal of our brain-activity measurement research is to contribute to the development of welfare robots with human-like responsiveness. In this paper, the hemoglobin density change of the human frontal lobe is measured. To analyze human spatial perception, experiments using a driving movie were designed, in which NIRS (near-infrared spectroscopy) was used.
Keywords: Drawing circle line, Frontal lobe, Prefrontal cortex, Premotor area.
1 Introduction

Human movement changes a person's relative relation to the environment; nevertheless, the person recognizes the new location and decides what behavior to take. Analyzing human spatial perception is therefore important for developing autonomous robots or automatic driving. The relation of theta brain waves to human spatial perception was discussed in [1], [2]: when humans perceive space, for example in a maze, and try to decide on the next action, theta waves appear saliently, reflecting searching behavior to find the goal of an unknown maze. On the side of human navigation, E.A. Maguire et al. measured brain activation using a complex virtual-reality town [3]. However, these tasks are notional, and the particulars of the mechanism that enables humans to perceive space and direction remain unknown.
Recently, the functional localization of the brain has been gradually clarified through research on brain function. It is well known that the frontal lobe performs higher-order processing such as memory, judgment, and reasoning. However, there is little information on what happens in the frontal lobe when individual acts of driving are performed. We aim to grasp the mechanism of the brain's information processing by analyzing data on human brain activity during car driving; the goal of this study is to find a way to apply the results to new assistance systems for human motion. To achieve this goal, the activity of the frontal lobe, which is related to behavioral decision-making, is discussed from the viewpoint of human spatial perception. First, a driving movie is shown to the subjects as sensory information. Second, we measure how the frontal-lobe activity associated with the driver's steering-wheel handling relates to the action of car driving. The brain activity is measured with NIRS, and the mechanism of the brain's information processing is discussed by analyzing the experimental data recorded during car driving.
2 Experiment

2.1 Brain Activity While a Driving Movie Is Shown

The subjects for this experiment were eight men aged 22 to 24 (average age 22.7, standard deviation 0.74). All of the subjects were right-handed, and they were asked to read and sign an informed consent regarding the experiment. An NIRS system (Hitachi Medical Corp. ETG-100) with 24 channels (sampling frequency 10 Hz) was used to record the density of oxygenated hemoglobin (oxy-hemoglobin) and deoxygenated hemoglobin (deoxy-hemoglobin) in the frontal cortex area. The driving movie for the experiment was recorded from a car with a video camera aimed in the direction of movement. The movie includes two scenes at T-junctions where it must be decided whether to turn right or left; in the second scene, there is a road sign with directions. Nine variations of the movie, each about one minute long, were prepared. Before the movie was shown, subjects were given directions to turn right or left at the first T-junction. They were also told the destination that appears on the road sign at the second T-junction, so they had to decide the direction when they looked at the road sign; they were asked to push a button when they realized which direction to turn. The subjects rested with their eyes closed for 10 seconds before the movies were shown and again afterwards, and the brain activity was recorded from the first eyes-closed rest to the last. Here we define Tasks A, B, and C. Tasks A and C were proposed as the same experimental task. In Task B, an operation was added in which, after pushing the button, the subjects turned the steering wheel in the direction of the destination as soon as they could judge it. For this experiment, a PC (personal computer) displayed the movie on an HMD (head-mounted display). The PC emitted a trigger pulse at the start of the eyes-closed rest and of the driving movie, and the NIRS recorded the brain activity, the trigger pulse from the PC, and the pulse from the button pushed at the second T-junction. Fig. 1 illustrates this experiment.
Fig. 1. NIRS records each hemoglobin and pulses
Subjects were seated in a car seat, fitted with the NIRS probe and the HMD, and covered with a black cloth to shut out outside light.

2.2 Brain Activity on Handling Motion

The five subjects were healthy right-handed males in their 20s with good driving histories. Each subject was asked to perform simulated car driving, moving the hand in circles as if using a steering wheel. A PC mouse on the table was used to simulate handling a wheel, and NIRS (near-infrared spectroscopy) was used to monitor the oxygen-content change in the subjects' brains. NIRS measurements of the frontal lobe were made while the subject, sitting on a chair, drew circles with the right or left hand 1) clockwise and 2) counterclockwise. The subject was asked to draw on the table a circle 30 cm in diameter five times consecutively, spending four seconds per circle. The time design was rest (10 seconds) - task (20 seconds) - rest (10 seconds).
3 Experimental Result

3.1 Brain Activity While a Driving Movie Is Shown

In Tasks A and B, the subjects headed for the suggested place in the direction they had been informed of at the first T-junction, and decided which way to turn from the road sign at the second; at that T-junction, they pushed the button when they realized the direction. In Task B, the steering-wheel operation was added after the button indicating the direction to turn was pushed.
Fig. 2. Result from subject A of task A

Fig. 3. Comparison between turning the steering wheel and not

Fig. 4. Comparison between turning the steering wheel and not
The hemoglobin variation was compared between the results of Tasks A and B, and of Tasks A and C, to see the brain activity pertaining to spatial perception during the same movie. Equation (1) was used to compare the data. τ1 was set as the interval of 1 second before the button was pushed, and τ2 was set in a similar way. xi(t) denotes the variation of oxy- or deoxy-hemoglobin in channel i, and xi(t) is accumulated over τ1 and τ2; in the defined c(i), i indexes the measurement channel. Because the sampling frequency was 10 Hz, the calculation was made 10 times per second. Fig. 2 shows a sample result of this calculation c(i) for Task A with oxy-hemoglobin.
c(i) = \sum_{t \in \tau_2} x_i(t) - \sum_{t \in \tau_1} x_i(t)    (1)
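A sketch of this computation follows. Treating τ2 as the one-second window after the button press is our reading of the text, and the data layout is illustrative.

```python
# Sketch of Eq. (1): per-channel difference between the summed hemoglobin
# signal in tau_2 and in tau_1 (the 1 s window before the button press).
import numpy as np

def c_of_i(x, press_idx, fs=10):
    """x: (n_samples, 24) oxy- or deoxy-Hb traces; press_idx: button sample."""
    tau1 = x[press_idx - fs:press_idx]           # 1 s before the press
    tau2 = x[press_idx:press_idx + fs]           # 1 s after (assumed)
    return tau2.sum(axis=0) - tau1.sum(axis=0)   # one value per channel i
```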
A comparison was then made between the situations in which the steering wheel was turned and those in which it was not. Fig. 3 shows the calculation result for test subject A with Tasks A and B, and Fig. 4 the result for the same subject with Tasks A and C. In Fig. 3, the increase in oxy-hemoglobin was larger when the steering wheel was turned than when it was not; in Fig. 4, on the other hand, some channels showed a tendency toward increasing deoxy-hemoglobin. It can therefore be concluded that the total amount of hemoglobin increased in the frontal lobe.
Fig. 5. Comparison between turning the steering wheel and not (average oxy-Hb)

Fig. 6. Comparison between turning the steering wheel and not (average deoxy-Hb)
The next step was to calculate the average over all subjects; Figs. 5 and 6 show the results. Many upward tendencies can be found in Fig. 5, which might have occurred while the subjects were finding their way from the road sign. In addition, the results indicate a greater increase when the subjects turned the steering wheel, meaning that brain activity was observed during movement based on spatial perception. On the whole, the variation in deoxy-hemoglobin (Fig. 6) was smaller than in oxy-hemoglobin; however, there was a large increase in channel 18, which might reflect variation based on spatial perception. Next, differences in the subjects' brain activity were investigated between two cases: in the first, the subjects turned in a direction they had been told; in the second, they turned in a direction they decided themselves from the road sign. d1 and d2, shown in Fig. 7, are defined as follows: d1 is the hemoglobin variation when turning at the first T-junction, and d2 is the variation at the second one. From the measurement results for d1 and d2 over all 269 trials of the subjects, there were significant differences in oxy-hemoglobin at channel 3 (p<0.02, paired t-test) and channel 20 (p<0.03), as shown in Fig. 8. The subjects pushed a button before turning at the second T-junction, which may have influenced brain activity; the possibility of a correlation between d2 and the time from the button push until the turn at the second T-junction in the movie was therefore investigated. The correlation coefficient was calculated for each hemoglobin channel, and a significant difference was found only at deoxy-hemoglobin channel 10 (p<0.07, paired t-test).
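A sketch of this per-channel comparison follows, assuming SciPy for the paired t-test; the 0.5-second windows follow Fig. 7, while the data layout is illustrative.

```python
# Sketch: hemoglobin change across 0.5 s windows around each turn (d1 at the
# first T-junction, d2 at the second), tested per channel with a paired t-test.
import numpy as np
from scipy.stats import ttest_rel

def turn_delta(x, turn_idx, fs=10):
    """Mean Hb just after a turn minus mean just before (0.5 s windows);
    x: (n_samples, 24)."""
    half = fs // 2
    return (x[turn_idx:turn_idx + half].mean(axis=0)
            - x[turn_idx - half:turn_idx].mean(axis=0))

def channel_pvalues(d1_trials, d2_trials):
    """d1_trials, d2_trials: (n_trials, 24) arrays of per-trial deltas."""
    return ttest_rel(d1_trials, d2_trials, axis=0).pvalue
```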
Fig. 7. Definition of the hemoglobin variations d1 and d2
Fig. 8. Channels with significant differences in NIRS oxy-hemoglobin are shown in gray
From this result alone, the relationship between pushing the button and d2 cannot be judged. During the hand motion, an increase in the oxy-hemoglobin density of the brain was found in all subjects, with the active regions differing between individuals. The subjects were observed 1) at the start of movement and 2) 3-5 seconds after starting to move 3) the right hand or 4) the left hand, 5) clockwise or 6) counterclockwise. Although some individual variation existed, the results showed significant differences and some characteristic patterns, as follows. Regardless of conditions 1), 2), 3), and 4) above, the change in the oxy-hemoglobin density of the brain was seen at the 5% significance level or less in three of the five subjects. The region involved was the area adjoining both the left premotor area and the left prefrontal cortex; in the adjoining part of the prefrontal cortex in particular, a number of significant differences were seen in four of the five subjects. When more emphasis was put on the rotation direction, 5) clockwise or 6) counterclockwise, no large density change was found in any subject for 6), but a significant difference was seen in four of the five subjects for 5) (Fig. 9). It is well known that the lateral prefrontal cortex performs higher-order processing such as behavior control. It is inferred that the premotor area was activated when the subjects moved the hand in the way stated above, because the premotor area is responsible for behavior control, for transforming visual information, and for generating the neural impulses that control movement.
Fig. 9. Brain activity (Clockwise)
4 Conclusion

The hemoglobin density change of the human subjects' frontal lobe was partly observed in the experiments we designed, in which three kinds of tasks were performed to analyze human brain activity from the viewpoint of spatial perception. The NIRS measures of hemoglobin variation in the channels suggest that different types of human behavioral decision-making may cause different brain activities, as seen in the tasks: 1) take a given direction at the first T-junction, 2) take a self-chosen direction from a road sign at the second T-junction, and 3) turn the wheel or not. Some significant differences (paired t-test) in NIRS oxy-hemoglobin were obtained, along with only weakly interrelated results between "pushing a button" and brain activity at the second T-junction. Research into human brain activities other than spatial perception will be necessary, with accumulated data from fMRI, EEG, etc. Furthermore, the experimental results indicate that when the subjects moved their hand in circles, regardless of right or left hand, 1) the same response was observed in the prefrontal cortex and premotor area, and 2) different patterns of brain activity were generated by moving the hand clockwise versus counterclockwise. The regions observed were only those at the 5% significance level or less; a possible extension for future study is to consider regions at the 10% significance level. With a larger number of subjects, the brain activity patterns need to be clarified.
References

1. Kahana, M.J., Sekuler, R., Caplan, J.B., Kirschen, M., Madsen, J.R.: Human theta oscillations exhibit task dependence during virtual maze navigation. Nature 399, 781–784 (1999)
2. Nishiyama, N., Yamaguchi, Y.: Human EEG theta in the spatial recognition task. In: Proceedings of 5th World Multiconf. on Systemics, Cybernetics and Informatics (SCI 2001), 7th Int. Conf. on Information Systems, Analysis and Synthesis (ISAS 2001), pp. 497–500 (2001)
3. Maguire, E.A., Burgess, N., Donnett, J.G., Frackowiak, R.S.J., Frith, C.D., O'Keefe, J.: Knowing Where and Getting There: A Human Navigation Network. Science 280 (May 8, 1998)
A Study on a Method to Call Drivers' Attention to Hazard

Hiroshi Takahashi

Shonan Institute of Technology, Fujisawa, Kanagawa, 251-8511, Japan
[email protected]
Abstract. This paper presents a new warning method for increasing drivers' sensitivity in recognizing hazardous factors in the driving environment. The method is based on a subliminal effect. The results of experiments performed with six subjects show that the response time for detecting a flashing mark tended to decrease when a subliminal mark was shown in advance. This paper also proposes a scenario for implementing this method in real vehicles.

Keywords: Driving Assistant System, Subliminal Information, Attention.
1 Introduction

Driver-assistance systems are being put on the market in conjunction with ongoing research concerning Intelligent Transport Systems (ITS). Driver-assistance systems reduce a driver's workload in executing the operations that make a vehicle go, turn, and stop, representing the basic dynamic behavior of vehicles. Adaptive cruise control (ACC) [1] and lane-keeping support systems [2] are typical examples of such systems designed to lighten drivers' workload. In addition to mechanical intervention, other conceivable forms of driver assistance include support for the tasks of perception, cognition, and judgment. For example, a navigation system [3] that can detect a distant pedestrian and display the information on a dashboard monitor can reduce drivers' perception workload by aiding their ability to perceive the outside world. From this perspective, this study focused on a system for assisting drivers' situational awareness. There are many instances where a discrepancy between a driver's situational awareness and the actual situation can lead to a serious accident. However, it is not easy to assist drivers in perceiving their surrounding circumstances in diverse driving situations. In general, the direct presentation of information about the driving environment can be considered a means of making the driver aware of a potentially critical situation; a typical example is a following-distance warning system [4]. However, there are times when the provision of such information by a system can interfere with a driver's cognition of some other aspect of the driving environment. For example, warning a driver about the presence of a forward object might
reduce the attention resources allocated to other aspects [5]. There are also problems related to the accuracy (certainty) of the sensed information provided. If the sensed information is 100% accurate, providing it to a driver can be expected to have some effect in assisting driving operations. However, if the information presented contains uncertainties, the provision of uncertain sensed information to a driver, even if it happens very infrequently, could cause confusion and induce a phenomenon known as risk homeostasis [6]. So long as remote sensing is used to detect the information that is the target of a driver-assistance system, it is necessary to construct cognitive support systems that tolerate uncertain information within given limits. For example, consider a situation where many vehicles are stopped due to congestion in the oncoming traffic lane of a two-lane road with two-way traffic. For the drivers of vehicles traveling in the opposite, uncongested lane, there is the uncertain possibility that someone might suddenly dash out from behind one of the stopped vehicles in the congested oncoming lane. There are people who exercise vigilance against such a possibility and drive very cautiously [7]; on the other hand, there are others who are completely indifferent to it. If the latter drivers could be alerted that the situation requires caution, it could heighten their awareness of, and attention to, the uncertain possibility of an accident caused by someone suddenly dashing out in front of their vehicle, and it could quicken their response time should such a situation actually occur. In fact, the importance of training in risk prediction is recognized at driver education schools, and such prediction training is expected to improve drivers' attention and sensitivity to the driving environment. Against this backdrop, this study focused on a method of improving drivers' risk sensitivity and prediction in relation to unsafe factors of an uncertain nature that might occur at any time, referred to here as hazardous factors. Focusing on the subliminal consciousness that acts on human latent consciousness, a study was made of the possibility of using subliminal warnings to increase drivers' awareness of and attention to their surrounding circumstances. Section 2 discusses subliminal consciousness, and Section 3 explains the basic experiments that were conducted. Section 4 discusses the experimental results, and Section 5 summarizes the findings of this study.
2 Warning Action

2.1 Previous Studies on Subliminal Consciousness

The term subliminal combines the prefix sub, meaning below, and the Latin word limen, meaning threshold or boundary. A threshold is a value that marks the dividing line between whether a presented stimulus can be consciously perceived or not. A stimulus that does not reach that level is subliminal. The quantity of visual information humans can perceive is reported to be 10 Mbit/s [8], which is equivalent to the number of characters on approximately 53 newspaper pages. Of that amount, it is said that humans can consciously perceive an information flow of only 40 bit/s, which means
cognition of 2.5 characters per second. Subtracting 40 bit/s from 10 Mbit/s gives the quantity of information that is not consciously perceived and is processed as subliminal information.

An example of an early study of subliminal consciousness is the experiment conducted by James Vicary, a marketing researcher, at a movie theater in Fort Lee, New Jersey in 1957. During the showing of the movie Picnic, he flashed two different messages (Eat Popcorn and Drink Coca-Cola) on the screen repeatedly every five seconds and measured the impact on sales. The frames containing the messages were displayed for such a brief duration that they were not consciously perceivable by the viewers. The experiments conducted by William Kunst-Wilson and Robert Zajonc can be cited as another example. Not all of the scientific experiments dealing with subliminal consciousness have demonstrated an effect on conscious perceptibility [9]: an effect has been reported in some studies but not in others, depending on the conditions. It is not scientifically known at this point under what conditions the effect becomes stronger.

2.2 Application of Subliminal Warnings to Driver Awareness Assistance

Assuming that a driver-assistance system is capable of detecting a potentially hazardous situation, there is the issue of how that information should be conveyed to the driver. The information can be presented visually by using dashboard warning lamps, HUD images or other visual means. The driver can also be warned of the situation by audible or haptic alerts. Guiding the driver's line of sight for an extended period of time is not deemed a very suitable means of providing information for predicting potential risk when the occurrence of the event is uncertain. Audible or haptic alerts cannot easily present information on spatial positions, and drivers might sometimes find such alerts themselves annoying. From these perspectives, we think that the presentation of a hazard warning by a technique that operates on the subliminal consciousness is a method of conveying information that is worthy of examination. This study investigated a method of presenting a warning to a driver in a situation where a potential hazard has been detected in the driving environment. Hazard detection, which is the precondition for issuing a warning, is accomplished by means of a hazard estimation model [10] that we constructed previously using a neural network and fuzzy logic rules.
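The hazard estimation model itself is detailed in [10] and is not reproduced here. Purely as an illustration of how fuzzy logic rules and a neural network might be combined to gate a subliminal warning, the following Python sketch uses made-up scene features, membership functions, network weights and a decision threshold; none of these are taken from [10].

```python
import numpy as np

def fuzzy_memberships(gap_m, occlusion, oncoming_congested):
    """Map raw scene features to fuzzy truth values in [0, 1] (all illustrative)."""
    near = np.clip((30.0 - gap_m) / 30.0, 0.0, 1.0)   # "gap to stopped traffic is small"
    hidden = np.clip(occlusion, 0.0, 1.0)             # "view behind the vehicles is blocked"
    jam = 1.0 if oncoming_congested else 0.0          # "oncoming lane is congested"
    return np.array([near, hidden, jam])

def hazard_score(x, W1, b1, w2, b2):
    """Tiny feed-forward network mapping fuzzy inputs to a hazard score in [0, 1]."""
    h = np.tanh(W1 @ x + b1)
    return 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))       # sigmoid output unit

# Random parameters stand in for the trained model of [10].
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
w2, b2 = rng.normal(size=4), 0.0

x = fuzzy_memberships(gap_m=12.0, occlusion=0.8, oncoming_congested=True)
if hazard_score(x, W1, b1, w2, b2) > 0.5:             # warning threshold is assumed
    print("present subliminal predictor mark at the estimated hazard location")
```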
3 Basic Experiments

The experiments examined how long attention remained improved after the presentation of information at the subliminal consciousness level. The experiments also examined the relationship between the timing of the subliminal presentation and the effect on improving attention. A concrete explanation of the experimental procedure is given below.
Fig. 1. Example of presented image (1)
Fig. 2. Example of presented image with predictor mark (2)
Several different types of images to be presented to the subjects were prepared in advance. The images included moving images and still images containing disturbance elements such as the direction and speed of a moving object. First, a still image like that shown in Fig. 1 was presented for over ten seconds. Then, an image like that in Fig. 2 was interposed for 0.02 s, after which the display returned to the image in Fig. 1. The interposed image in Fig. 2 corresponded to a visual stimulus that acted on the subliminal consciousness. The exclamation mark in the figure (referred to here as a predictor) is highlighted for illustration, but in the actual image the mark was difficult to perceive unless one looked closely. After the displayed image was changed from that in Fig. 2 back to the image in Fig. 1, an interval of 1, 3, 5, 7, 9 or 10 s was provided and then the image in Fig. 3 was presented. The subjects were asked to press a button as soon as they recognized some change in the image, as shown in Fig. 3. The interval from the presentation of the image in Fig. 3 until the subjects pressed the button was measured. The star in the image in Fig. 3 was located in the same position as the predictor mark, for stimulating
[Figure annotation: when this mark is displayed, the subject presses the button]
Fig. 3. Example of presented image with final mark (3)
Fig. 4. Time chart for presenting images
the subliminal consciousness in the image in Fig. 2. It was thought that the subjects should immediately perceive the appearance of the star in Fig. 3 if they were cognizant of the predictor mark in Fig. 2. A time chart for the presentation of the images is shown in Fig. 4. The subjects were not told about the presence of the predictor and only knew that their task was to press the button as soon as they recognized a change in the image in Fig. 3. The attributes of the subjects are shown in Table 1, and the computer environment used in the experiments is described in Table 2. A typical example of another image used is shown in Fig. 5; this image represents a driving scene on an expressway.

The subjects were shown sample images in advance so that they would clearly understand the experimental procedure. These samples were newly created for this specific purpose, and none of the images used in the experiments was presented at this time. After the subjects understood the procedure, they were shown the images in turn, using different image combinations that had been prepared in advance. Six sets of images were created at random, including one set that did not contain the predictor mark. One set of images was randomly selected for use in each experiment.
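The presentation schedule described above can be summarized in pseudo-runnable form. The sketch below uses the timing values from the text; the `show` and `wait_button` routines stand in for the actual playback software (APlayer) and the button-logging hardware, which are not described at this level of detail, and a real implementation would need frame-accurate display for the 0.02 s exposure.

```python
import random
import time

INTERVALS_S = [1, 3, 5, 7, 9, 10]          # predictor-to-final-mark intervals (from the text)

def show(image, duration_s):
    """Placeholder display routine; real stimulus software is required for a
    frame-accurate 0.02 s exposure."""
    time.sleep(duration_s)

def run_trial(base_img, predictor_img, final_img, interval_s, wait_button):
    show(base_img, 10.0)                   # still image (Fig. 1) for over ten seconds
    show(predictor_img, 0.02)              # subliminal frame (Fig. 2) interposed for 0.02 s
    show(base_img, interval_s)             # back to the base image for 1-10 s
    t0 = time.monotonic()
    wait_button(final_img)                 # final image (Fig. 3); blocks until button press
    return time.monotonic() - t0           # measured response time [s]

# e.g.: rt = run_trial(fig1, fig2, fig3, random.choice(INTERVALS_S), wait_fn)
```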
Table 1. Attributes of test subjects

Subject   Gender   Age
A         Male     23
B         Male     23
C         Male     22
D         Male     50
E         Male     37
F         Male     44
Table 2. Computer environment used for presenting images

OS                        Windows Vista™ Home Premium
CPU                       Intel®Core™2[email protected]
RAM                       2030 MB
Image creation software   Ulead VideoStudio® 11
Image playback software   APlayer
Fig. 5. Example of presented image
The order in which the images were presented was entirely random for all of the subjects. Because the limit of human reaction speed is reported to be 0.1 s, any response time of less than 0.1 s was excluded from the data collected during the experiment, as such presses must have been initiated before the switch to the final image.
In addition, data were excluded that deviated greatly (relative to the standard deviation) from the response times recorded after the presentation of the final image. Approximately 250 data samples were thus collected; they represent the experimental results described in the following section.
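The two exclusion rules can be stated compactly. The sketch below assumes response times in seconds; the outlier rule encodes one possible reading of the text, namely dropping responses lying more than one standard deviation above the mean.

```python
import numpy as np

def filter_response_times(rt_seconds):
    """Apply the two exclusion rules to an array of response times [s]. The
    outlier rule is an assumed reading of the text: drop responses more than
    one standard deviation above the mean."""
    rt = np.asarray(rt_seconds, dtype=float)
    rt = rt[rt >= 0.1]                         # below the 0.1 s human reaction limit
    return rt[rt <= rt.mean() + rt.std()]      # "deviated greatly": > mean + 1 SD (assumed)
```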
4 Results and Discussion

4.1 Experimental Results

The mean and standard deviation of the response-time data obtained from all of the subjects are shown in Fig. 6. The vertical axis indicates the measured response time, and the horizontal axis shows the interval between the presentation of the predictor (exclamation mark) and the presentation of the final mark (star). The results in the graph indicate that the response time tended to be shorter when the predictor mark, which presumably acted on the subliminal consciousness, was presented, compared with the condition in which it was not shown. In addition, the mean response time became faster as the interval between the presentation of the predictor mark and the presentation of the final mark was lengthened from 1 s to 3 s to 5 s. Among all the intervals, the greatest improvement in response time was seen for an interval of 5 s, after which the response time became slower again. A test for statistical significance indicated a significant difference at the 95% confidence level for the patterns in which the images were changed after intervals of 7, 9 and 10 s, compared with the condition without the predictor mark.
[Figure: mean response time (s) with ±SD bars plotted against the interval between the predictor mark and the final mark: no predictor, 1 s, 3 s, 5 s, 7 s, 9 s and 10 s]
Fig. 6. Relationship between presentation of predictor mark and response time
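The paper does not name the statistical test behind the significance result. A conventional choice for comparing each interval condition against the no-predictor baseline would be a two-sample t-test at the 5% level, sketched below; the use of Welch's t-test is our assumption.

```python
from scipy import stats

def compare_to_baseline(rt_condition, rt_no_predictor, alpha=0.05):
    """Welch two-sample t-test of one interval condition against the
    no-predictor baseline; the specific test is an assumption."""
    t, p = stats.ttest_ind(rt_condition, rt_no_predictor, equal_var=False)
    return t, p, p < alpha

# e.g.: t, p, significant = compare_to_baseline(rt_7s, rt_none)
```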
Fig. 7. Relationship between predictor mark visibility and response time
The following experiment was conducted to verify the reliability of the measured data. The continuity of the measured data was examined to see whether there was any difference attributable to the ease of perceiving the predictor mark. The response time was investigated for the pattern in which the presented images were changed after 5 s, the interval that showed the fastest response time overall. Three types of images were used in this experiment: one without a predictor mark, one with a difficult-to-perceive predictor mark and one with an easy-to-perceive predictor mark. The results in Fig. 7 are for a 5-s interval between the presentation of the predictor mark and the presentation of the final mark. The vertical axis shows the measured response time, and the horizontal axis shows the interval in seconds between the presentation of the predictor mark and the changing of the images to the final one. The results in the graph clearly indicate that the response time was faster for the images with the predictor mark than without it. The response time became progressively faster in the order of no predictor mark, difficult-to-perceive predictor mark and easy-to-perceive predictor mark. The continuity of the effect on the response time attributable to the difficulty of perceiving the presented information can be seen from the results.

4.2 Application to an Actual System

A preliminary study was made of the effectiveness of presenting subliminal information that acts on the subliminal consciousness of drivers. This section discusses how such information might be used as a warning. As mentioned earlier, the authors have previously proposed a method of predicting the location and position of hazards in the driving environment [10]. This method uses a camera to observe the forward direction and a neural network to estimate the location and position of hazards that typically
Fig. 8. Concrete example of operation of in-vehicle system
draw the attention of experienced drivers. The neural network has been taught the typical points about which experienced drivers are generally vigilant. While the general applicability of this method requires further detailed discussion, it is thought that the configuration of the system at least provides a mechanism for arousing the attention of drivers. As illustrated in Fig. 8, that can be accomplished by briefly projecting a light on the places in the forward view wherever and whenever hazards are detected with this method, so as to act on the driver's subliminal consciousness and encourage the driver to devote attention to those locations.
5 Conclusion

This paper has proposed the use of subliminal warnings as a new way of presenting warning information to drivers and has described the results of a preliminary study of their effectiveness. It was found that the presentation of subliminal warnings to six subjects significantly increased their awareness of a change in the images of the driving scenes presented. In future work, it is planned to investigate the effectiveness of the proposed method for various presentation times, specific presentation modes and other factors in order to examine its applicability in the field of vehicle safety under more diverse levels of subliminal consciousness.
References

1. de Bruin, D., et al.: Design and Test of a Cooperative Adaptive Cruise Control System. In: Proc. of 2004 IEEE Intelligent Vehicles Symposium, pp. 392–396 (2004)
2. Ishida, S., et al.: Development, Evaluation and Introduction of a Lane Keeping Assistance System. In: Proc. of 2004 IEEE Intelligent Vehicles Symposium, pp. 943–945 (2004)
3. Tsuji, F., et al.: Development of a Support System for Nighttime Recognition of Pedestrians. Preprint of JSAE Scientific Lecture Series, 20055287 (2005)
4. Katoh, et al.: Risk Reduction with a Following Distance Warning and Emergency Braking System. The Institute of Electronics, Information and Communication Engineers Technical Report 101(102), 11–16 (2001)
5. Ishibashi: Human Factors and Error Countermeasures. Journal of National Institute of Public Health 51(4), 232–244 (2002)
6. Wilde, G.J.S.: Target Risk. Japanese translation by Haga, S., Shinyosha, Tokyo (2007)
7. Kokubun, M., et al.: Analysis of Drivers' Risk Sensitivity Characteristics. Transactions of the Human Interface Society 5(1), 27–36 (2003)
8. Zimmermann, M.: Neurophysiology of Sensory Systems, pp. 68–166 (1977)
9. Karremans, J.: Beyond Vicary's Fantasies: The Impact of Subliminal Priming and Brand Choice [Electronic Version]. Journal of Experimental Social Psychology 42, 792–798 (2006)
10. Takahashi, H., et al.: A Study on Predicting Hazard Factors for Safe Driving. IEEE Transactions on Industrial Electronics 54(2), 781–789 (2007)
An Analysis of Saccadic Eye Movements and Facial Images for Assessing Vigilance Levels During Simulated Driving

Akinori Ueno 1, Shoyo Tei 2, Tomohide Nonomura 2, and Yuichi Inoue 3

1 Department of Electric and Electronic Engineering, School of Engineering, Tokyo Denki University, 2-2 Kanda-Nishiki-cho, Chiyoda-ku, Tokyo 101-8457, Japan
2 Master's Program of Electronic and Computer Engineering, Graduate School of Science and Engineering, Tokyo Denki University, Ishizaka, Hatoyama, Saitama 350-0394, Japan
3 Japan Somnology Center, Neuropsychiatric Research Institute, 1-24-10 Yoyogi, Shibuya-ku, Tokyo 151-0053, Japan
[email protected], {syoyo_t,nonomura}@ff.f.dendai.ac.jp, [email protected]
Abstract. The authors analyzed facial video recordings and saccadic eye movements during 1-hour simulated driving in 10 subjects. The mean cross-correlation coefficient between the visually determined facial sleepiness and the proposed index of saccade (i.e. PV/D) for 9 subjects was -0.56, and the maximum coefficient of inverse cross-correlation was 0.83. The mean cross-correlation coefficient for 6 repeated measurements of another subject was -0.72, and the maximum was 0.84. Variation in PV/D preceded that in facial sleepiness in 13 of 15 measurements and was synchronized with it in the other 2 measurements. From these results, we confirmed a fair potential of the PV/D to detect decline in vigilance levels earlier than facial sleepiness. We also found that narrow fluctuations throughout the measurement could lead to a low inverse cross-correlation, below 0.60, between the two indices. Experimenters should therefore take care to design the experimental procedure so as to ensure broad fluctuations of the subject's vigilance levels during the measurement.

Keywords: doze prevention, saccade, facial sleepiness, advanced safety vehicle.
1 Introduction

The increasing number of traffic accidents due to diminished vigilance levels of drivers has become a serious concern all over the world. Drivers with diminished vigilance levels suffer a marked decline in their abilities of perception, recognition, and vehicle control. Such drivers therefore jeopardize not only their own
lives but also the others’ around them. Accordingly, it is essential to develop a vehicle safety system based on driver’s vigilance monitoring. Several researchers have worked on vigilance monitoring of the drivers with a camera. For instance, Ueno H. et al. proposed a drowsiness detection system based on eye closure metrics [1]. Ji et al. presented a prototype real-time computer vision system for monitoring vigilance of a driver using visual cues of eye tracking, eyelid movement, face pose, and gaze [2]. Smith et al. developed a system for analyzing human driver’s visual attention using head and facial features including three-dimensional gaze [3]. All of these studies showed viability and robustness of the system for extracting facial or head features. However, relevancy and validity of their indices of vigilance have not been examined quantitatively at all except in [1]. Even in the article [1], another vigilance index defined by the authors was used for the evaluation, and adequacy of the self-defined index has not been verified objectively at all. Conceivably, this is because there is no gold standard for estimating vigilance levels. This makes the research for the vigilance monitoring quite difficult. In order to acquire driver’s acceptance and broad social acceptance to newly proposed vigilance index, the researchers are desired to explain the mechanism by which the index value are affected in accordance with the change in vigilance level. Since vigilance level is usually defined as an activity level in the brain in staying awake, researchers often seek to explain the mechanism using the activity signal of the brain such as electroencephalogram (EEG). However, even for EEG, any sensitive and widelyaccepted metric for estimating vigilance levels has not been proposed yet. And this makes the research of vigilance monitoring more troublesome. With the above background in mind, our research group has focused on dynamic characteristics of saccadic eye movement (saccade: SC) as a candidate of vigilance index, and have investigated correlations between the characteristics of SC and other vigilance indices that are quantitatively assessed from physiological, psychological, and psychophysical variables, respectively [4]-[11]. In the articles [4]-[8], spectral powers of EEG at Fz, Cz, and Pz during eye tracking tasks were analyzed, and were found to correlate closely with standardized peak velocity, standardized duration, and ratio of peak velocity to duration (PV/D) of SC. In the article [6]-[8], correlations with subjective sleepiness of Kwansei-Gaukin Sleepiness Scale (Japanese translation of Stanford sleepiness scale [12]) and also with the score of self-rated vigilance level were investigated through 24-hour sleep deprivation experiment, and results showed high correlations among the SC parameters and subjective vigilance indices. Additionally, we have showed implications between the SC parameters and a newly devised performance index of visual perception [9]-[11]. In view of our previous results and the fact that SC is the eye movement most frequently generated in driving, the dynamic characteristics of SC seems promising as a sensitive and reliable index for monitoring driver’s vigilance levels. In order to initiate a feasibility study toward a vehicle safety system based on driver’s vigilance monitoring, we have launched validation of the proposed vigilance index (PV/D) during simulated driving. 
In this paper, we report some preliminary results based on an analysis of facial images and SC associated with vigilance levels.
2 Materials and Methods

2.1 Subjects

Nine males, aged 21 to 23, participated in experiment 1. Another male, aged 21, participated in experiment 2. All subjects possessed uncorrected visual acuity of more than 14/20. They did not wear corrective lenses during the experiments and showed no ocular or oculomotor pathology. They were instructed to refrain from drinking beverages containing caffeine and from taking any stimulants on the day of the experiment. Informed consent was obtained from all subjects before the experiment.

2.2 Apparatus

A custom-built driving simulator was employed in all experiments (see Fig. 1). The simulator outputs the rotation angle of the steering wheel, the acceleration signal of the accelerator pedal and the lateral position in the lane. Horizontal and vertical monocular eye movements were measured with an optical eye movement monitor (Takei Scientific Instrument, 2930C). A facial image of the driver was recorded with a camcorder (Sharp, VL-MX1 PRO). The electro-oculogram (EOG) in the vertical and horizontal directions was also measured with a bioamplifier (Biopac Systems, EOG100C). In addition, the electroencephalogram (EEG) at Fz, Cz, Pz, and Oz of the international 10-20 system, and the electromyogram (EMG) of the mentalis and masseter muscles were measured in experiment 1. All signals except the facial image were sampled at 1 kHz with 16-bit resolution, and then stored on a personal computer using a data recording system (Biopac Systems, MP-150).
Fig. 1. An image of the custom-built driving simulator employed in this study
2.3 Experimental Procedure

In experiment 1, the subjects steered the driving simulator for one hour. They were instructed to keep a speed of 100 km/h in the left lane, but to slow down to 40 km/h while a LED cue in the rearview mirror was lit. The LED was switched on and off manually, in a random manner, by an experimenter. A monotonous and
repetitive driving course composed of a sine waveform was employed so that the deviation of the car body from the lane could be easily measured. The experiment started at 13:30 or 15:10. In experiment 2, the subject steered the simulator for one hour in the same manner as described for experiment 1, but repeated the steering 6 times on 6 different days. The experiments started at 13:30 on the first 3 days and at 15:10 on the latter 3 days. All subjects had a meal, served by the experimenter, 90 minutes prior to the start of the measurement. They then practiced steering for more than 10 min, until they became accustomed to the simulator. The sensitivity of the eye movement monitor was adjusted prior to the start of each experiment. Nine calibration markings, aligned in a cross shape with a spacing of 300 mm, were presented on the front face for the adjustment. The subjects fixated on each marking according to the instructions of the experimenter, while the amplifier gain and offset of the eye position signal were inspected and adjusted. The distance between the subject's eye and the center marking was also measured. All experiments were carried out in a darkened room.
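The voltage-to-angle calibration can be sketched as follows: with the 300 mm marking spacing and the measured viewing distance, the target angles follow from the arctangent, and a gain and offset per axis can be obtained by a least-squares fit. The linear monitor response and the example viewing distance are assumptions; the actual adjustment procedure of the 2930C is not documented here.

```python
import numpy as np

def target_angles_deg(offsets_mm, distance_mm):
    """Angular positions of the calibration markings relative to the center one."""
    return np.degrees(np.arctan2(offsets_mm, distance_mm))

def fit_axis(voltages, angles_deg):
    """Least-squares gain and offset for one axis (linear response assumed)."""
    gain, offset = np.polyfit(voltages, angles_deg, deg=1)
    return gain, offset

# Five markings per axis of the cross, spaced 300 mm apart:
offsets = np.array([-600.0, -300.0, 0.0, 300.0, 600.0])
angles = target_angles_deg(offsets, distance_mm=900.0)   # viewing distance is made up
# gain, off = fit_axis(measured_voltages, angles); angle_deg = gain * volts + off
```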
3 Data Analyses

3.1 Analysis of Saccade

The two voltage signals obtained from the eye movement monitor were first transformed into horizontal and vertical rotation angles using the calibration data. Then, the rotation angles were separately differentiated by the two-point central difference algorithm [13] and converted into angular velocity signals. Angular velocity thresholds were set to ±25 deg/s for detecting SC candidates. In order to discriminate SCs from blinks, an interval threshold was employed: since the refractory period of SC is known to last more than 50 ms after the end of each SC motion, SC candidates with an interval of less than 50 ms were regarded as blinks. The reliability of the SC classification algorithm was confirmed in our previous study [14]. Small SCs were discarded because of the limited angular resolution of the monitor; the amplitude threshold for small SCs was set to 2.0 deg. For each selected SC, a synthetic velocity waveform was calculated using the Pythagorean theorem, and the vigilance index PV/D was calculated by dividing the peak velocity (PV) by the duration (D) of the SC (see Fig. 2). PV/D is an accelerative characteristic with the unit deg/s² and is known from our previous studies [4]-[11], [15], [16] to decrease in accordance with decline in vigilance levels. PV/D values in each experiment were time-stamped with the SC occurrence relative to the start of the experiment. PV/D values within the period from 0 to 60 s were pooled by referring to the stamped times, and a mean PV/D in the 60-second time window was computed. The time window was then shifted by 5 s and the corresponding mean PV/D was calculated. This processing was repeated until the end of the window reached the end of the experiment (i.e. 3600 s).
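A minimal sketch of the described pipeline is given below: two-point central difference differentiation, a ±25 deg/s velocity threshold, the 50 ms interval rule for rejecting blinks, the 2.0 deg amplitude floor, PV/D per saccade, and the 60 s window shifted in 5 s steps. The event segmentation is simplified relative to the validated algorithm of [14].

```python
import numpy as np

FS_HZ = 1000.0                                # sampling rate [Hz]

def central_diff(x, fs=FS_HZ):
    """Two-point central difference differentiation [13]."""
    v = np.zeros_like(x)
    v[1:-1] = (x[2:] - x[:-2]) * fs / 2.0
    return v

def pv_over_d(h_deg, v_deg, fs=FS_HZ, vel_th=25.0, min_gap_s=0.050, min_amp_deg=2.0):
    """Detect saccades and return (onset time [s], PV/D [deg/s^2]) pairs."""
    vel = np.hypot(central_diff(h_deg), central_diff(v_deg))   # synthetic velocity
    above = vel > vel_th
    d = np.diff(above.astype(int))
    starts, ends = np.flatnonzero(d == 1) + 1, np.flatnonzero(d == -1) + 1
    if len(ends) and len(starts) and ends[0] <= starts[0]:
        ends = ends[1:]                        # discard an event already in progress at t = 0
    out, last_end = [], -np.inf
    for s, e in zip(starts, ends):
        if (s - last_end) / fs < min_gap_s:    # < 50 ms after the previous event: blink
            last_end = e
            continue
        last_end = e
        amp = np.hypot(h_deg[e] - h_deg[s], v_deg[e] - v_deg[s])
        if amp < min_amp_deg:                  # below the monitor's angular resolution
            continue
        pv, dur = vel[s:e].max(), (e - s) / fs
        out.append((s / fs, pv / dur))
    return out

def windowed_mean(stamped_pvd, t_end=3600.0, win=60.0, step=5.0):
    """Mean PV/D in a 60 s window shifted in 5 s steps, as described above."""
    t = np.array([ts for ts, _ in stamped_pvd])
    v = np.array([x for _, x in stamped_pvd])
    out = []
    for t0 in np.arange(0.0, t_end - win + step, step):
        m = (t >= t0) & (t < t0 + win)
        out.append((t0, v[m].mean() if m.any() else np.nan))
    return out
```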
[Figure: (a) horizontal rotation angle (scale bar 10 deg), (b) vertical rotation angle (scale bar 2 deg), and (c) synthetic velocity (scale bars 200 deg/s and 50 ms) with peak velocity (PV) and duration (D) marked]
Fig. 2. Records obtained from the eye movement monitor and calculated SC parameters
3.2 Analysis of Facial Video

Two trained raters analyzed the video recording of each subject's face every 5 s and rated facial sleepiness (FS) on five levels from 1 to 5 according to Kitajima's method [17]. The two FS ratings for each 5-second video segment were averaged, and a moving average with a 60-second time window was then applied to the averaged FS.

3.3 Cross-Correlation Analysis

The cross-correlation between the processed FS and the PV/D was analyzed for each experiment. The first 10 minutes of data were excluded from the analysis to curb improper influence caused by naive rating of FS. The maximum cross-correlation coefficient and the time lag of the FS relative to the PV/D were computed for each experiment.
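The FS smoothing and the lagged cross-correlation can be sketched as follows, assuming both indices are resampled onto a common 5 s grid; the maximum lag searched (here 60 s) is our assumption, as the paper does not state a search range.

```python
import numpy as np

DT = 5.0                                    # step between successive values [s]

def smooth_fs(fs_rater1, fs_rater2, win_s=60.0):
    """Average the two raters' FS, then apply the 60 s moving average."""
    fs = (np.asarray(fs_rater1, float) + np.asarray(fs_rater2, float)) / 2.0
    k = int(win_s / DT)
    return np.convolve(fs, np.ones(k) / k, mode="valid")

def max_xcorr_with_lag(fs, pvd, max_lag_s=60.0, skip_s=600.0):
    """Largest-magnitude correlation between FS and PV/D and the lag of FS
    behind PV/D; the first 10 min are excluded, per the text."""
    start = int(skip_s / DT)
    fs, pvd = np.asarray(fs, float)[start:], np.asarray(pvd, float)[start:]
    best_r, best_lag = 0.0, 0.0
    for lag in range(int(max_lag_s / DT) + 1):
        a = fs[lag:]                          # FS shifted later in time by lag steps
        b = pvd[:len(pvd) - lag] if lag else pvd
        n = min(len(a), len(b))
        r = np.corrcoef(a[:n], b[:n])[0, 1]
        if abs(r) > abs(best_r):
            best_r, best_lag = r, lag * DT
    return best_r, best_lag
```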
4 Results and Discussion

4.1 Results of Experiment 1

Fig. 3 shows a representative example of time-course variations in FS and in PV/D during simulated driving. As can be seen in the figure, the two indices changed quite similarly with time. In particular, the reactions to the three vocal alerts at 2304, 2977 and 3360 s were closely synchronized. As the result of the cross-correlation analysis, the two variations showed a high correlation of -0.72, and the PV/D preceded the FS by 10 s.

[Figure: FS and PV/D plotted over 0–3600 s with three vocal alerts (V.A.) marked; R = 0.72]
Fig. 3. Time-course variations in FS and PV/D during 1-hour simulated driving. The downward direction on the vertical axis corresponds to a sleepy state and vice versa (Subject B).

Table 1. Cross-correlation coefficient and lag time between FS and PV/D

Subject   Correlation Coefficient   Lag Time [s]
A         -0.43                     15
B         -0.72                     10
C         -0.55                     15
D         -0.54                     10
E         -0.54                     10
F         -0.71                     0
G         -0.47                     35
H         -0.23                     15
I         -0.83                     0
Ave.      -0.56                     12.2

Trial No. (Subject J)   Correlation Coefficient   Lag Time [s]
1st                     -0.50                     25
2nd                     -0.72                     25
3rd                     -0.76                     15
4th                     -0.84                     5
5th                     -0.82                     10
6th                     -0.69                     10
Ave.                    -0.72                     15.0

Since the lag times of FS in Table 1 were greater than or equal to 0 for all subjects, PV/D seems to have the potential to detect a lowering of the vigilance level earlier than FS. However, it must be noted that not all subjects showed high correlations between the indices, as Table 1 indicates. One factor has been the narrow fluctuation range of the subject's vigilance levels throughout the 1-hour measurement; we discuss this in detail in the next subsection.

4.2 Results of Experiment 2

As indicated in the right column of Table 1, six repetitions of the measurement for one subject revealed wide intra-individual differences (up to 0.34) in the cross-correlation between FS and PV/D.
[Figure: FS and PV/D plotted over 0–3600 s for two measurement days; (a) 1st measurement day, R = 0.50; (b) 4th measurement day, R = 0.84]
Fig. 4. Contrastive results of time-course variations in FS and PV/D in repetitive simulated driving (Subject J). (a) and (b) correspond to the 1st and 4th measurement days, respectively. The lowest and the highest correlation coefficients were obtained in the results of (a) and (b), respectively.
Comparison between the time-course variations in the indices in Fig. 4(a) and (b) suggests that a narrow fluctuation range of the subject's vigilance levels throughout the measurement may cause the low cross-correlation. In fact, the difference between the maximum and minimum FS in the period from 600 to 3600 s in
Fig. 4(a) was 1.21. By contrast, the difference in Fig. 4(b) was 2.88, more than twice that in Fig. 4(a). Furthermore, the summary of the differences for all measurements in Table 2 supports our hypothesis: all measurements in which the difference was less than or equal to 2.00 showed a low cross-correlation below 0.60. Therefore, the inter- and intra-individual differences in the cross-correlation coefficients between FS and PV/D in Table 1 can be attributed mainly to narrow fluctuation of the subject's vigilance levels throughout the measurement. Accordingly, the experimenter has to pay more attention to designing experimental procedures that ensure broad fluctuations of the subject's vigilance levels in the future.

Table 2. Cross-correlation coefficient between FS and PV/D, and difference between maximum and minimum FS in the period from 600 to 3600 s

Subject   Correlation Coefficient   Difference in FS
A         -0.43                     1.58
B         -0.72                     2.83
C         -0.55                     1.75
D         -0.54                     1.50
E         -0.54                     2.42
F         -0.71                     3.08
G         -0.47                     2.00
H         -0.23                     2.75
I         -0.83                     3.83

Trial No. (Subject J)   Correlation Coefficient   Difference in FS
1st                     -0.50                     1.21
2nd                     -0.72                     2.15
3rd                     -0.76                     2.15
4th                     -0.84                     2.88
5th                     -0.82                     2.25
6th                     -0.69                     2.25
5 Conclusion and Future Issues

The authors analyzed facial video recordings and saccadic eye movements during 1-hour simulated driving in 10 subjects. The mean cross-correlation coefficient between the visually determined facial sleepiness (FS) and the proposed index of saccade (PV/D) for 9 subjects was -0.56, and the maximum coefficient of inverse cross-correlation was 0.83. The mean cross-correlation coefficient for 6 repeated measurements of another subject was -0.72, and the maximum was 0.84. Variation in PV/D preceded that in FS in 13 of 15 measurements and was synchronized with it in the other 2 measurements. From these results, we confirmed a fair potential of the PV/D to detect decline in vigilance levels earlier than FS. We also found that narrow fluctuations throughout the measurement could lead to low inverse cross-correlations, below 0.60, between the indices. Experimenters should therefore take care to design the experimental procedure so as to ensure broad fluctuations of the subject's vigilance levels.

Future issues to be addressed are as follows:

• reliability analysis of FS in view of the concordance rate between the two raters of facial sleepiness,
• repeatability investigation of the voltage-to-degree calibration for eye movement monitoring,
• analysis of driving performance during the measurement,
• analysis of simultaneously measured EEGs.
Acknowledgement. This work was carried out as part of a collaborative project with DENSO Corporation.
References 1. Ueno, H., Kaneda, M., Tsukino, M.: Development of Drowsy Detection System. In: Vehicle Navigation & Information Systems, pp. 15–20 (1994) 2. Ji, Q., Yang, X.: Real Time Visual Cues Extraction for Monitoring Driver Vigilance. In: Schiele, B., Sagerer, G. (eds.) ICVS 2001. LNCS, vol. 2095, pp. 107–124. Springer, Heidelberg (2001) 3. Smith, P., Shah, M., Lobo, N.V.: Determining Driver Visual Attention with One Camera. IEEE Trans. ITS 4, 205–218 (2003) 4. Ueno, A., Ota, Y., Takase, M., Minamitani, H.: Relationship between Vigilance Levels and Characteristics of Saccadic Eye Movement. In: 17th Ann. Int. Conf. IEEE EMBS, pp. 572–573 (1995) 5. Ueno, A., Ota, Y., Takase, M., Minamitani, H.: Characteristics of Visually Triggered and Internally Guided Saccade Depending on Vigilance States. IEICE Trans. Information and Systems PT 2 J81-D-II, 1411–1420 (1998) (in Japanese) 6. Ueno, A., Ota, Y., Takase, M., Minamitani, H.: Parametric Analysis of Saccadic Eye Movement Depending on Vigilance States. In: 18th Ann. Int. Conf. IEEE EMBS, pp. 319– 320 (1996) 7. Ueno, A., Hashimoto, H., Takase, M., Minamitani, H.: Diurnal Variation in Vertical Saccade Dynamics. In: 4th Asia-Pacific Conf. Medical and Biological Engineering, p. 373 (1999) 8. Ueno, A., Tateyama, T., Takase, M., Minamitani, H.: Dynamics of Saccadic Eye Movement Depending on Diurnal Variation in Human Alertness. System and Computers in Japan 33, 95–103 (2002) 9. Ueno, A., Uchikawa, Y.: Relation between Human Alertness, Velocity Wave Profile of Saccade, and Performance of Visual Activities. In: 26th Ann. Int. Conf. IEEE EMBS, pp. 933–935 (2004) 10. Ueno, A., Sakamoto, S., Uchikawa, Y.: Relation between Dynamics of Saccade and Bit Rate for Visual Perception during Numerical Targets Comparing. IEICE Trans. Information and Systems PT 2 J87-D-II, 2062–2070 (2004) (in Japanese) 11. Ueno, A., Uchikawa, Y.: An Approach to Quantification of Human Alertness Using Dynamics of Saccadic Eye Movement: For an Application to Human Adaptive Mechatronics. In: 8th Int. Conf. Mechatronics. Technol., pp. 563–568 (2004) 12. Herscovitch, J., Broughton, R.: Sensitivity of the Stanford Sleepiness Scale to the Effects of Cumulative Partial Sleep Deprivation and Recovery Oversleeping. Sleep 4, 83–92 (1981) 13. Bahill, A.T., Kallman, J.S., Lieberman, J.E.: Frequency limitations of the two-point central difference differentiation algorithm. Biol. Cybern. 45, 1–4 (1982)
14. Komatsu, Y., Ueno, A., Hoshino, H.: Real-time discrimination of saccade and eye blink from output signal of eye movement monitor. In: 19th Ann. Conf. The Society of Life Support Technology, p. 54 (2003) (in Japanese) 15. Ueno, A., Kokubun, S., Uchikawa, Y.: A Prototype Real-Time System for Assessing Vigilance Levels and for Alerting the Subject with Sound Stimulation. Int. J. Assist. Robotics and Mechatronics 8, 19–27 (2007) 16. Kokubun, S., Ueno, A., Uchikawa, Y.: Influence of Sound Pressure Level of Beep Stimulation on Arousal Effect: Quantitative Evaluation Based on Analysis of Saccade and Electroencephalogram. Trans. SICE 44, 871–877 (2008) (in Japanese) 17. Kitajima, H., Numata, N., Yamamoto, K., Goi, Y.: Prediction of Automobile Driver Sleepiness: 1st Report, Rating of Sleepiness Based on Facial Expression and Examination of Effective Predictor Indexes of Sleepiness. Trans. JSME Series C 63, 3059–3066 (1997) (in Japanese)
Implementing Human Factors within the Design Process of Advanced Driver Assistance Systems (ADAS)

Boris van Waterschoot and Mascha van der Voort

Laboratory of Design, Production and Management, University of Twente, Drienerlolaan 5, 7500 AE Enschede, The Netherlands
{b.m.vanwaterschoot,m.c.vandervoort}@utwente.nl
Abstract. This paper introduces our research which aims to develop a design approach for ADAS applications in which human factors (including stakeholder feedback and objective performance measures) are explicitly accounted for. Since driving is part of a complex (traffic) system, with a large number of interacting components, ADAS design is confronted with choices for which the influence on the system, and the driving performance in particular, is not immediately manifested. Therefore, providing designers with relevant feedback during the design process, about the consequences of specific choices, will increase the efficiency and safety of driver assistance systems. Keywords: Advanced driver assistance systems, driving task, human factors engineering, interaction design, design support, automation.
1 Introduction

In-vehicle support systems show a rapid change in terms of sharing control with the human driver, and present-generation technologies are shifting their support from low-level vehicle control towards high-level driving tasks. Conventional vehicle automation (e.g., automatic gear changing or cruise control) is being replaced by systems that show enhanced driving automation, i.e. systems that are able to perceive, decide and act in an appropriate manner. Adaptive cruise control (ACC), for example, shows cognitive abilities in maintaining both the required driving speed and the distance from the leading vehicle; cognitive tasks which are (temporarily) taken over from the driver. As [1] already addressed, these emerging trends in vehicle automation are not only advances; the increase of automation and the shifts in task control are cause for concern as well. An extensive amount of research deals with the problems that might arise when technology and humans have to coordinate their tasks in order to reach a common goal. Endsley made an important contribution by showing that situation awareness (SA) interacts with automation [2]. In a similar vein, Walker and colleagues [3, 4] communicated their concerns regarding the implications of the increasing amount of automation in vehicle design. The present paper introduces our research, which is directly aimed at providing advanced driver assistance system (ADAS) design with the appropriate knowledge and insight concerning the implications of supporting technologies within the driving task.
Currently, ADAS design is highly technology-driven, which means that new functions are added when they are feasible rather than because they are needed [5]. If ADAS design focuses on technique, attempting to automate whatever is possible, at least two main drawbacks arise. Firstly, since the supporting technologies are often developed independently, the overall performance of the vehicle remains unknown: potential problems due to shared control cannot be evaluated until the actual realization of a given technology within the driver-vehicle system as a whole. Secondly, mere feasibility of technology discards the view of a joint driver-vehicle system. Although single technologies (e.g., ACC or automatic parking) can offer support for specific functions within the overall driving task, they are part of a larger system in which driver and vehicle share control in order to manoeuvre in traffic. The inclusion of dedicated support should therefore complement the overall driving task without interference.

By shifting control capabilities towards the higher-level driving task, driver support systems reduce certain requirements placed on the driver. The implications of driver support, however, are not clear-cut, because improper driver support can be responsible for both mental underload and overload. Designing ADAS is therefore not only a matter of implementing advanced technologies; it should address the implications for human operation as well. The present paper asserts that scientific inquiry has already provided much insight into the influence of automation on human (cognitive) performance. Multidisciplinary fields like cognitive engineering and human factors research have put much effort into highlighting the issues that arise with the emergence of automation. However, specific solutions for overcoming unwanted implications of automated driver support are not readily available. For this reason, our research is aimed at developing an adaptive design environment in which designers are provided with the consequences of their design choices on human-system performance. This knowledge should support the implementation of technologies that complement the human driver within the overall driving task; a complementation intended to improve the quality of the (joint) cognitive system in terms of efficiency, safety and convenience.

The appreciation for evaluating in-vehicle systems already exists and is supported by a large body of scientific knowledge that encompasses, among others, driving behaviour, human factors and human-technology interaction. However, true knowledge about how to translate system evaluation into specific design improvements is still lacking. This is mainly because the influence of additional design changes (or re-designs) on driving behaviour remains unknown until a re-evaluation takes place. Secondly, as already mentioned, the ADAS industry is highly technology-driven: whether a certain technology is implemented often depends on the possibilities for such a development instead of the true need from a driver-vehicle system point of view. We argue that these approaches are not mutually exclusive, but can only be realized when ADAS designers receive feedback about the system's requirements and when the nature of potential problems is evaluated during the design process.
If the influence of design choices cannot be predicted in advance, it should at least be evaluated early in the design process with short and adaptive iterations. Before addressing the main issues of ADAS design and evaluation in Section 4, we first address the role of ADAS in contemporary driving and the consequences of increased automation.
2 Driving and Advanced Driver Assistance Systems

In 1938, driving was thought of as psychologically analogous to walking or running, with the addition that driving is locomotion by means of a tool (i.e. the car) [7]. In this view, the goal of the driver is to move from one point in space to another: the destination. During this locomotion obstacles are perceived, and the driver's task is to avoid them. While driving can still be seen as an intimately merged perceptual-motor task, today's driving task is seen as considerably more than that [8], and theoretical attempts to understand the driving task remain a challenge for the scientific community. Two main contributors to the changing driving task are the introduction of ADAS in modern cars, which is (deliberately) aimed at reducing the requirements placed on the driver, and the car manufacturers' aim to make driving more comfortable.

The predominant reason for introducing in-vehicle support systems is the assumption that supporting the driving task through increased automation gives drivers additional information and enables them to share the overall driving task with dedicated support. Reducing the cognitive effort required of the driver and providing relevant information should make driving more comfortable and safer. By equipping vehicles with sensors, navigation and motion planning, the driving task becomes shared between the human driver and the supporting assistance systems. Adding and improving cognition and control techniques could ultimately lead to autonomous vehicles in which the driving task is controlled by the vehicle and the responsibility is shifted towards the vehicle and its manufacturer. Although legal issues and high infrastructural demands will prevent the introduction of such autonomous vehicles in the near future, research has already produced (semi-)automated concept cars in which minimal or no intervention of human actors is required. Meanwhile, different assistance systems already support the present-day driver by means of sensory information (e.g., visibility aids or lane departure warnings), correction (e.g., anti-lock braking or traction control) or even control (e.g., automatic parking).

Apart from the introduction of ADAS, the experience of driving has changed with the assumption, employed in modern cars, that it is beneficial to reduce internal car noise [9]. According to [10], minimizing interior vehicle noise and vibration would allow passengers to "enjoy the latest advances in communications and entertainment technologies" (p. 83). In accordance, [4] observed that in automotive design the level and type of feedback available to the driver is diminishing. Although this trend (reduced interior vehicle noise and softer suspension) is not directly within the scope of our research, it is mentioned for two reasons. First, it is an explicit example of the reduced feedback received by the modern car driver. Reducing the noise and vibrations made by the car (generated by the engine, tires and wind), either by isolating the driver's cockpit from external noise or through the use of absorbing techniques and materials, removes valuable motion cues that are used to make safety-related judgments [4]. Reduced (auditory) feedback alters drivers' speed perception, which potentially causes them to choose faster speeds and places them at greater risk of crashing [9], because drivers' choice of driving speed has been found to be an important predictor of crash risk [11-13].
This brings us to a second reason why we mention the trend of reducing internal car noise. While reducing the level of
internal noise is seen as a goal in designing modern cars, it reveals an emerging conflict between comfort and safety. In addition, it conflicts with the efforts made in developing ADAS applications, because these supporting systems should increase both the driver's comfort and safety. By increasing comfort through designing cars with reduced levels of internal noise, unwanted potential safety issues arise. While car design and the implementation of dedicated support systems should result in safe and comfortable driving, it is argued that these attempts are often considered in isolation and that comfort and safety do not necessarily go hand in hand.

Gibson and Crooks [7] described the driving task as "locomotion within an optic array", and the concept of traffic can be described as a dynamical system [14]. The driver-vehicle system, which travels within traffic, deals with constraints, and these constraints can be defined as either limitations or abilities (affordances), but they are generally investigated separately for human and machine (or artefact). Performance issues (e.g., memory and skills or processing speed) are for the most part determined separately for user and artefact, without viewing the overall performance as a quality of the driver-vehicle system. Although the view of a unified driver-vehicle system is generally recognized, it is not reflected in most design or evaluation research, which usually approaches human and vehicle behaviour separately. Thanks to a wide range of feedback and assessment techniques, the possibilities for evaluating human-machine systems are numerous, and the development of controlled research environments (e.g., driving simulations and validated experimental designs) has resulted in a large body of investigations concerning the influence of in-vehicle support on users' behaviour [15, 16]. Oddly enough, it is either the human factor or the technology that is subject to investigation. In the development and evaluation of HCI environments, humans are often seen as the weakest link [17], and their cognitive and physical properties are typically treated as limitations. This stance, however, is arbitrary and potentially narrows the perspective of the researcher. Furthermore, this view may result in a machine-centred bias, which holds that automation compensates for human inadequacies [18]. The expectation that "machines do it better" induces at least a competition between humans and technology, while they are expected to cooperate. Consequently, this section argues that the observed trend of increased automation in the driving task poses new challenges in the design of driver support.
3 Automating Driver Support

Due to the introduction of (semi-)automated support systems, a shift in control has taken place within the driving task [1]. This has led to three main consequences. First of all, the increase of automation and the shift in control have led to the recognition of behavioural consequences for the driver. The out-of-the-loop problem (which can be characterized as an insufficient interaction between driver and vehicle) and the related decrease of situation awareness are typical examples of this influence [19, 2]. Secondly, the shift in control between driver and in-vehicle support has led to an additional shift concerning the responsibilities and the level of interaction between 'human and machine'.
Fig. 1. An idealized, simplified picture of the relationship between driver and vehicle within the human-vehicle system (HVS). The x-axis represents the (cumulative) level of control, which corresponds linearly with the amount of responsibility within the driving task. The y-axis represents the level of interaction between driver and vehicle, which in turn depends on the amount of control (x-axis).
Fig. 2. An example of the interactive relationship between Responsibility and Control of the driving task
While in conventional driving the driver is in full control, a control shift occurs when the need for human intervention diminishes (theoretically, in a fully autonomous vehicle no intervention is needed). When control shifts towards the vehicle and its supporting systems, the interaction between vehicle and driver increases, which potentially results in an additional shift in responsibility. Theoretically, in a fully autonomous vehicle with no human intervention (i.e. no human control in the driving task), responsibility shifts entirely towards the vehicle and its manufacturer. Figure 1 shows a hypothetical image of this changed relationship between driver and vehicle due to the increase of automation and the shift in control. To exemplify the relationship between driver and vehicle, an example is presented in Figure 2. At (X = -1), the driver is in full control and receives no additional information from the vehicle. Here, the amount of interaction (grey line,
y-axis) is zero. Accordingly, the driver's share of responsibility within the overall driving task decreases as the amount of interaction between driver and vehicle increases. At (X = -0.75), the driver shares his responsibility with the vehicle, as he receives additional information from it (e.g., driving speed, temperature, distance to preceding cars, etc.). The amount of interaction between driver and vehicle is greater than at (X = -1), because the information presented by the vehicle influences the driver's behaviour. In turn, this information is highly interdependent with the driver's behaviour, because he can act on it accordingly (e.g., increasing speed, decreasing the distance to the preceding car, etc.). At (X = 0), the driver's and the vehicle's control of the overall driving task are in a (metaphorical) equilibrium. Here, control and responsibility are equally shared, and the interaction between driver and vehicle is maximal. The expression "equilibrium" is chosen to emphasize the instability of this level of control. After all, to date, any type of information or supporting assistance system (e.g., displayed driving speed or adaptive cruise control, respectively) can be overruled or ignored by the driver. Therefore, the state in which control of the driving task is equally shared between driver and vehicle (and hence the available ADAS) remains hypothetical.

A third consequence of the increase of automation and the shift in control towards the vehicle's supporting systems is the growing appreciation in the human factors and design communities of the potential (safety) problems involved. Psychological constructs like underload and overload have been adopted within the driving context (e.g., [20]), and mental (work)load and performance issues have gained increased attention. However, although these constructs are able to reveal the influence of automation on the behaviour of the human driver, no consensus exists about the optimum values of these concepts. Individual differences and (human) cognitive flexibility are two main reasons for this shortcoming.
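Figures 1 and 2 describe this relationship only qualitatively. Purely to make the stated properties concrete, the toy function below encodes them: interaction is zero when either party has full control (X = ±1) and maximal at the shared-control equilibrium (X = 0). The triangular form is our own illustration, not the authors' model.

```python
def interaction_level(x: float) -> float:
    """Toy model of driver-vehicle interaction as a function of control balance x.
    x = -1: driver in full control; x = +1: vehicle in full control; x = 0: shared.
    The triangular shape is an illustrative assumption only."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("control balance must lie in [-1, 1]")
    return 1.0 - abs(x)

# interaction_level(-1.0) == 0.0, interaction_level(-0.75) == 0.25,
# interaction_level(0.0) == 1.0 (the hypothetical equilibrium of Figure 2)
```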
4 ADAS Design and Evaluation

Regardless of the nature of the systems at stake, it is important that ADAS designers are provided with the impact of their design choices, manifested for instance by distraction-related problems or cognitive under- and overload. Evaluation enables the designer to improve or modify the concept according to the evaluation outcome. Human factors experts and ADAS designers try to optimize ADAS applications by using evaluation sessions and experimentally obtained information. They are confronted, however, with two main problems. Even if close collaboration between design and human factors research is established, temporal disparities remain between actual design decisions and subsequent evaluation. When prototypes are evaluated (e.g., by their impact on driving behaviour), researchers receive performance outcomes and provide recommendations with which designers head back to the 'drawing board'. After applying design changes, the design process is back where it was when the first prototype was finished: not yet able to reveal its impact on driving behaviour and its overall surplus value. Moreover, the assessment of design choices is not only concerned with the influence on driving behaviour; both design and evaluation are concerned with a driver-car-traffic system that changes when individual in-vehicle
applications are added or reconfigured. When dealing with such a complex system, ADAS design and evaluation are confronted with the difficult task of disentangling the influence of specific design choices.

A second problem concerning the evaluation of ADAS applications (and hence ADAS design in general) concerns the type of information the designers have to work with. Psychological constructs like situation awareness and workload, which are well established within human factors and psychological research, have proven their applicability in addressing problems that concern the interaction between technology or artefacts and their users. Although experimental designs and evaluations using these constructs are appreciated for their ability to account for the human factor (i.e. how technology influences human behaviour and which behavioural effects arise when humans interact with automation), they do not provide explicit information for designers to hold on to. The behavioural constructs and the related experimental outcomes are therefore not able to directly instruct how ADAS design should be improved. The human factors and design communities still lack a considerable amount of common ground [21]. For ADAS design and evaluation this has two major implications. On the one hand, the technological stance often puts human factors professionals in a difficult position, because when consulted, they have to convince engineers of the importance of human-centred design. On the other hand, evaluation considering human factors is confronted with the difficult task of translating its results (the influence of in-vehicle support on driver and vehicle behaviour) into specific design considerations, which presents ADAS designers with the difficulty of accounting for human factors in accordance with the system's performance and the preferred outcome, including stakeholders' preferences.
5 Conclusions and Future Work

In this paper we argue that present-day driving and the design of automated driver support have three main issues to deal with. In general, ADAS design has to deal with a biased technological viewpoint: developers ask themselves if and how new technologies can be implemented [5]. During this process, human factors professionals evaluate the assistance system through its behavioural impact on the driving behaviour of the user or on the performance of the vehicle. Psychological constructs used for this evaluation are, for example, situation awareness and workload. This type of research provides a clear view of the influence or impact on driving behaviour, and (re)design recommendations can be made accordingly. The behavioural and physiological measurements used for evaluation, however, do not directly relate to or 'produce' design suggestions. This means that a conversion from evaluation results into design solutions is missing. In addition, while the measurements ordinarily used for evaluation can present the influence of isolated and controlled independent variables on driving behaviour (typically represented by SA, workload or other psychological measures), they do not relate to the (entire) driver-vehicle system, let alone to the influence on the traffic system. In our view, design should therefore take account of the needs, competencies and limitations of the joint driver-vehicle system.
Secondly, in the development process of ADAS applications by the automotive industry, much effort is spent on producing prototypes before evaluating them. This iterative process can take a long time and is reflected in high development costs. If experts from different disciplines (e.g., designers, engineers, human factors practitioners) were to test the design at an earlier stage, the performance and impact of specific design choices could be observed more rapidly, enabling them to apply design improvements during the design process. Improvements can be made by using, for example, VR simulations, user experiences, performance measures and expert collaboration. In the present paper we argue that a distinct, straightforward and controlled design environment, which accounts for human factors and at the same time enables designers to improve their designs with unambiguous design solutions, is still lacking. This could be overcome by early involvement of human factors knowledge during the design process. Finally, the increase of automation and the shift of control in the driving task are reasons for concern and raise the question of how to deal with the consequences and how to prevent unwanted and unforeseen effects. A vast body of scientific research allows us to evaluate the consequences of automation and control shifts, but ready-to-use solutions (that are applicable and reliable for each configuration) are not available. A large body of research agrees that because automation shifts control and potentially influences driver behaviour or modifies the driving task, safety can be jeopardized [22-26]. However, no consensus exists about how to implement automation in order to optimize driver support. Perhaps Norman [21] is quite revealing when he asserts that “automation always looks good on paper. Sometimes you need real people”. At the least, the (automated) support and driver system should be in ‘balance’ and complement each other. While technological innovations can be promising and entail potential surplus value, they cannot disregard the actual and true human needs that are represented by ‘the human factor’ that keeps the driver in the loop. Optimizing driver support is an endeavour that will remain a current affair until we can seat ourselves in the first commercially available autonomous vehicle and human intervention is minimal. To summarize, the aim of our study is to improve the design process of advanced driver assistance systems by developing a new design approach. We introduced our research by discussing some major issues concerning advanced driver assistance, its design and the implementation of automation. In our view, a new design approach should:
• Be able to take advantage of both human abilities (i.e. human factors) and technological possibilities.
• View the driving task as being part of a joint driver-vehicle system.
• Support the design process with expert knowledge about the system’s needs, expectations and abilities.
• Be able to identify performance (i.e. facilitate system evaluation).
• Translate evaluation outcomes into design solutions or suggestions.
In order to realize a surplus value, the new design approach will implement human factors knowledge during the design process. In line with [27], the design environment therefore enables, at least, early stakeholder involvement. The next steps towards an integration of human factors and ADAS design are twofold. On the one hand,
subsequent research will consist of modelling the relation between driving functions and the related system behaviour. On the other hand, we will examine the integration of problem, solution and assessment within the context of driver support. In addition, we will explore the applicability of (existing) behavioural measures that have the strength to serve as the translation between evaluation and design.
Acknowledgement This research is part of the knowledge centre AIDA, a collaboration between TNO and the University of Twente.
References

1. Young, M.S., Stanton, N.A., Harris, D.: Driving automation: learning from aviation about design philosophies. International Journal of Vehicle Design 45, 323–338 (2007)
2. Endsley, M.R., Kiris, E.: The out-of-the-loop performance problem and level of control in automation. Human Factors 37, 381–394 (1995)
3. Walker, G.H., Stanton, N.A., Young, M.S.: Where Is Computing Driving Cars? International Journal of Human-Computer Interaction 13, 203–229 (2001)
4. Walker, G.H., Stanton, N.A., Young, M.S.: The ironies of vehicle feedback in car design. Ergonomics 49, 161–179 (2006)
5. Hollnagel, E.: A function-centred approach to joint driver-vehicle system design. Cognition, Technology & Work 8, 169–173 (2006)
6. Young, M.S., Stanton, N.A.: Malleable Attentional Resources Theory: A New Explanation for the Effects of Mental Underload on Performance. Human Factors 44, 365–375 (2002)
7. Gibson, J.J., Crooks, L.E.: A Theoretical Field-Analysis of Automobile-Driving. The American Journal of Psychology 51, 453–471 (1938)
8. Groeger, J.A.: Understanding Driving: Applying Cognitive Psychology to a Complex Everyday Task. Psychology Press, Hove (2000)
9. Horswill, M.S., Plooy, A.M.: Auditory feedback influences perceived driving speeds. Perception 37, 1037–1043 (2008)
10. Trainham, J.: Quieter rides. Automotive Engineering International 113, 83 (2005)
11. Wasielewski, P.: Speed as a measure of driver risk: observed speeds versus driver and vehicle characteristics. Accident Analysis and Prevention 16, 89–103 (1984)
12. Horswill, M.S., McKenna, F.P.: The development, validation, and application of a video-based technique for measuring an everyday risk-taking behaviour: drivers’ speed choice. Journal of Applied Psychology 84, 977–985 (1999)
13. Evans, L.: Traffic Safety. Science Serving Society, Bloomfield Hills (2004)
14. Cantarella, G.E., Cascetta, E.: Dynamic Processes and Equilibrium in Transportation Networks: Towards a Unifying Theory. Transportation Science 29, 305–329 (1995)
15. Stanton, N.A., Young, M.S.: Vehicle automation and driving performance. Ergonomics 41, 1014–1028 (1998)
16. Wierwille, W.W.: Demands on driver resources associated with introducing advanced technology into the vehicle. Transportation Research Part C: Emerging Technologies 1, 133–142 (1993)
17. Flach, J.M., Hoffman, R.R.: The Limitations of Limitations. IEEE Intelligent Systems 18, 94–96 (2003)
18. Norman, D.A.: Things That Make Us Smart. Addison-Wesley, Boston (1993)
19. Endsley, M.R.: Toward a Theory of Situation Awareness in Dynamic Systems. Human Factors 37, 32–64 (1995)
20. Walker, G.H., Stanton, N.A., Young, M.S.: Feedback and driver situation awareness (SA): A comparison of SA measures and contexts. Transportation Research Part F: Traffic Psychology and Behaviour 11, 282–299 (2008)
21. Norman, D.A.: The Design of Future Things. Basic Books, New York (2007)
22. Sheridan, T.B.: Human Factors of Driver-Vehicle Interaction in the IVHS Environment. Publication No. NHTSA/MIT, DOT-HS-807-737, National Highway Traffic Safety Administration, Department of Transportation, Washington, DC (1991)
23. Hancock, P.A., Parasuraman, R.: Human factors and safety in the design of intelligent vehicle-highway systems (IVHS). Journal of Safety Research 23, 181–198 (1992)
24. Michon, J.A.: Generic Intelligent Driver Support. A Comprehensive Report on GIDS. Taylor & Francis, London (1993)
25. Verwey, W.B.: Further evidence for benefits of verbal route guidance instructions over symbolic spatial guidance instructions. In: Reekie, D.H.M. (ed.) Proceedings of the Vehicle Navigation and Information Systems Conference (VNIS 1993), pp. 227–231. IEEE, Toronto (1993)
26. Lansdown, T.C., Brook-Carter, N., Kersloot, T.: Distraction from Multiple In-Vehicle Secondary Tasks: Vehicle Performance and Mental Workload Implications. Ergonomics 47, 91–104 (2004)
27. van der Voort, M.C., Tideman, M.: Combining Scenarios and Virtual Reality into a New Approach to Including Users in Product Design Processes. Journal of Design Research (in press)
A Survey Study of Chinese Drivers’ Inconsistent Risk Perception

Pei Wang, Pei-Luen Patrick Rau, and Gavriel Salvendy

Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
[email protected], [email protected], [email protected]
Abstract. It is important to identify factors contributing to drivers’ risk taking behaviors in order to reduce traffic accidents and fatalities. This study conducted a survey to investigate drivers’ risk perception towards different risks encountered in daily life. In total, 438 subjects responded to the survey and indicated their likelihood of engaging in risky activities in different domains. The mean likelihood of lending money to a friend was the highest, and that of shoplifting was the lowest. Respondents were most likely to engage in financial risks, followed by social risks, safety risks, recreational risks and ethical risks; they were least likely to engage in health risks. Male drivers were more likely than female drivers to engage in risks on some factors.

Keywords: Chinese drivers; risk perception; risk domains.
1 Introduction

In China, the number of cars and the number of drivers are increasing. According to the China National Bureau of Statistics, there were 19.58 million cars and 107 million car drivers by the end of 2007. Compared with 2006, the number of cars and the number of licensed car drivers increased by 26.7% and 9.17% respectively. In 2006, a total of 378,781 traffic accidents occurred in China; 89,455 persons died and 431,139 persons were injured [1]. Traffic accidents bring loss of life, injuries and economic damage. Furthermore, the problems are serious not only in big cities, such as Beijing and Shanghai, but also in small and medium-sized cities. Drivers’ risk taking behavior is a contributing factor to injury and death in road crashes. In order to reduce the number of traffic violations and accidents, researchers have produced many empirical studies and theoretical commentaries on drivers’ risk perception and risk taking behavior [7]. It has been found that individuals do not appear to be consistently risk seeking or risk averse across different domains even when using the same assessment method, as documented in both laboratory studies and managerial contexts [5]. For example, managers appear to have different risk attitudes when making decisions involving personal versus company money, or when evaluating financial versus recreational risks. The purpose of this study is to investigate Chinese drivers’ risk perception towards
different risks and to determine whether Chinese drivers’ risk taking is related to the domain of the risk, what the risk domains are, and whether there are gender differences in these risk domains.
2 Literature Review

People show different degrees of risk taking in activities such as games of chance/gambling, financial investing, business decisions, personal decisions, social decisions, and ethical decisions. Personal decisions involving risk taking can be broken down into categories that differ in content and in the variables that affect risk perception and risk taking, such as familiarity and controllability [6]. According to content, risk taking decisions can be divided into health/safety decisions (seatbelt usage, smoking), recreational decisions (sky diving versus bowling), social decisions (confronting co-workers or family members), and ethical decisions (cheating on exams, terminating a comatose family member’s life support). These results have been found using different scales for decision making, such as Kogan and Wallach’s Choice-Dilemma Scale, Budner’s Intolerance of Ambiguity Scale, and Zuckerman’s Sensation-Seeking Scale version V [8]. Weber [8] conducted a survey to assess risk taking in five risk domains: financial decisions, health and safety decisions, recreational decisions, and ethical and social decisions. It was found that the degree of respondents’ risk taking was highly domain-specific, not consistently risk seeking or consistently risk averse in all risk domains. This means that conventional risk attitudes are also domain-specific rather than reflections of a stable attitude or trait. Weber also pointed out that individual, gender, and content-domain differences in apparent risk taking seem to be associated primarily with differences in the perception of the activities’ benefits and risk, rather than with differences in attitude towards perceived risk. Byrnes, Miller, and Schafer [3] conducted a meta-analysis of 150 studies comparing risk-taking behaviors of men and women in a variety of domains (e.g. financial or health risks) and tasks (e.g. hypothetical choices or self-reported behaviors). They found gender differences in risk perception that varied as a function of content domain. Bromiley and Curley [2] assume that risk taking is influenced jointly by the situation and by characteristics of the decision maker.
3 Questionnaire Design

This was a pen-and-paper survey, and the questionnaire consisted of three parts. The first part was an introduction to the survey. The second part contained a total of 53 items on risk taking behaviors, including 2 replications and 1 overall question. All items of the questionnaire are shown in the Appendix. Most of the questions are taken from the items used by Weber [8]; four items about driving risk taking are from Parker’s study [4]; and the others were created empirically. The questions cover six content domains that involve risk taking decisions: financial decisions (e.g. buying stock), health decisions (e.g. smoking), safety decisions (e.g. seatbelt usage), recreational decisions (e.g. going down a ski run), social decisions (e.g. lending money to a friend), and ethical decisions (e.g. cheating on exams), with the aim of covering risks in daily life.
After all the questions about risk taking behavior, the last part of the questionnaire asks for the respondents’ personal information, including gender, age, education, income, and driving profile, such as years of holding a driving license and cumulative driving mileage. Respondents were asked to evaluate their likelihood of engaging in these risk behaviors on a seven-point rating scale ranging from 1 (‘Extremely unlikely’) to 7 (‘Extremely likely’). Items were not presented by content but randomly interspersed.
4 Results Analysis

Participants in this survey were drivers who had held a driver’s license for at least one year and had driving experience on highways. All respondents were invited by personal contact. In total, 438 drivers aged 20 to 60 responded to the survey. After filtering out incomplete and inconsistent answers from the 438 respondents, 365 valid responses remained (261 males and 104 females). The average age of the respondents was 31.3 years (SD = 5.70 years), and their average driving experience was 4.7 years (SD = 4.40 years). Most of the respondents were highly educated: 53.2% had a bachelor degree and 43.8% had a master or higher degree.

The 50 questions on the likelihood of engaging in risks were analyzed. The mean likelihood of lending money to a friend was the highest, at 5.8 (SD = 1.23). In the domain of financial risks, the means for most items are above 5.0. The mean likelihood of buying stock and of investing in a moderate fund was 5.6 (SD = 1.32) and 5.5 (SD = 1.39) respectively, and the mean likelihood of holding a stock in loss and of holding a stock in profit was 5.2 in both cases (SD = 1.50 and SD = 1.45 respectively). In the domain of traffic risks, the mean likelihood of overtaking on the inside was 5.2 (SD = 1.62); the means of speeding to save 15 minutes and to save 5 minutes were 4.2 (SD = 1.81) and 4.1 (SD = 1.97); and the means of speeding for fun, driving down the hard shoulder and disregarding the speed limit were 3.9 (SD = 2.05), 3.8 (SD = 1.81) and 3.4 (SD = 1.89) respectively. Only two items in traffic risks had means below 3.0: drunk driving and not wearing a seatbelt on the freeway, with means of 2.7 (SD = 1.74) and 2.6 (SD = 1.74). The means for binge drinking, smoking too much, taking medicine with side effects and shoplifting were the lowest: 2.2 (SD = 1.65), 2.2 (SD = 1.89), 2.1 (SD = 1.38) and 1.4 (SD = 0.79). Considering the mean of all items in each domain, respondents were most likely to engage in financial risks, followed by social risks, safety risks, recreational risks and ethical risks; they were least likely to engage in health risks.

Exploratory factor analysis was used to identify the factor structure of the questionnaire. The principal components analysis (PCA) extraction method and orthogonal rotation methods were used. After a series of factor analyses and item reductions, a model was obtained with principal components analysis extraction and Varimax rotation with Kaiser normalization. For this model, the KMO measure is 0.834 and Bartlett’s test of sphericity is significant. Both of these indicate that the items correlate highly with each other and that the data are suited to factor analysis. Cronbach’s alpha for all items is 0.868, indicating an acceptable level of internal consistency. The model explained 58.148% of the total variance. There are 11 factors in the model, consistent with risks in the domains of safety, health, recreation, ethics, social life, and finance,
but some of the risk domains are further divided into two or three sub-domains. The risks in the safety domain are divided into two parts. One comprises risks that are frequently taken by Chinese people, such as speeding in order to save 5 minutes. The other comprises risks that are taken less often but have more serious consequences, such as drunk driving. The financial domain is divided into three parts: recreational risks related to spending money, financial investment, and investment choice based on loss and benefit. The risks in the social domain are also divided into two parts: social risks related and not related to money. It was found that male and female respondents differed significantly in their perceptions of five risk categories: the two safety sub-domains, recreation, health, and social risk related to money. Male drivers appeared to be more risk seeking than female drivers in these five categories. In the domain of social risks not related to money, such as speaking about an unpopular issue and disagreeing with one’s boss, female drivers were more risk seeking, though the difference was not significant.
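As an illustration of the analysis pipeline reported here, the sketch below runs the same sequence of checks and extractions (Bartlett's test, KMO, principal-components extraction with Varimax rotation, Cronbach's alpha) using the Python factor_analyzer package. The file name, data and item set are hypothetical stand-ins, not the study's data.

```python
# Sketch of the reported factor-analysis pipeline; "risk_items.csv" is a
# hypothetical matrix of 7-point responses, one column per questionnaire item.
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import (
    calculate_bartlett_sphericity, calculate_kmo)

items = pd.read_csv("risk_items.csv")  # rows = respondents, columns = items

# Sampling adequacy: the paper reports KMO = 0.834 and a significant Bartlett test.
chi2, p = calculate_bartlett_sphericity(items)
kmo_per_item, kmo_overall = calculate_kmo(items)
print(f"Bartlett chi2={chi2:.1f}, p={p:.4f}; overall KMO={kmo_overall:.3f}")

# Principal-components extraction with Varimax rotation, 11 factors as reported.
fa = FactorAnalyzer(n_factors=11, method="principal", rotation="varimax")
fa.fit(items)
_, _, cum_var = fa.get_factor_variance()
print(f"Cumulative variance explained: {cum_var[-1]:.5f}")  # paper: 0.58148

def cronbach_alpha(df: pd.DataFrame) -> float:
    """Internal consistency over all items (paper reports 0.868)."""
    k = df.shape[1]
    item_vars = df.var(axis=0, ddof=1)
    total_var = df.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

print(f"Cronbach's alpha = {cronbach_alpha(items):.3f}")
```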
5 Conclusion

The purpose of this study was to investigate whether Chinese drivers’ risk taking is related to the domain of the risk, what the risk domains are, and whether there are gender differences in these risk domains. It was found that the mean likelihood of lending money to a friend was the highest and that of shoplifting the lowest. Respondents showed a very high likelihood of engaging in financial risks; the means for most financial risks are above 5.0. In the domain of traffic risks, three items have means above 4.0: overtaking on the inside, speeding to save 15 minutes and speeding to save 5 minutes. Only two items in traffic risks have means below 3.0: drunk driving and not wearing a seatbelt on the freeway. The mean likelihood of engaging in health risks is the lowest. Respondents’ reported likelihood of taking traffic risks is higher than that of health risks, which reflects Chinese drivers’ low risk perception of traffic risks. This is noteworthy for further research on Chinese drivers’ risk perception and risk taking behaviors.

In the process of factor analysis, it was found that some items were always extracted into one component. In the domain of traffic risks, speeding to save 15 minutes, speeding to save 5 minutes, disregarding the speed limit, speeding for fun, driving down the hard shoulder, and overtaking on the inside loaded on one component; drunk driving, riding a motorcycle without a helmet, and not wearing a seatbelt on the freeway loaded on another. In the domain of social risks, those not related to money (speaking about an unpopular issue, wearing provocative clothes and disagreeing with one’s boss) formed one component, and those related to money (co-signing for a friend, lending money to a friend) formed another. In the domain of financial risks, risks related to spending money formed one component, including playing mah-jongg, betting in a poker game and spending money impulsively. Risks in the ethical domain (stealing a TV cable connection, cheating on income tax, forging a signature, using office supplies and cheating on an exam) were always in one component.

Comparing the means of the 11 factors, male drivers appeared to be more risk seeking than female drivers on some factors. For social risks not related to money, female drivers were more risk seeking, though the difference was not significant.
References

1. CRTAS: China Road Traffic Accidents Statistics. Traffic Administration Bureau of China State Security Ministry, Beijing, China (2006)
2. Bromiley, P., Curley, S.P.: Individual differences in risk taking. In: Yates, J.F. (ed.) Risk-taking Behavior, pp. 87–132. John Wiley & Sons, Chichester (1992)
3. Byrnes, J.P., Miller, D.C., Schafer, W.D.: Gender Differences in Risk Taking: A Meta-Analysis. Psychological Bulletin 125(3), 367–383 (1999) (abstract retrieved)
4. Parker, D., McDonald, L., Rabbitt, P., Sutcliffe, P.: Elderly drivers and their accidents: the Aging Driver Questionnaire. Accident Analysis and Prevention 32, 751–759 (2000)
5. Schoemaker, P.J.H.: Are risk-attitudes related across domains and response modes? Management Science 36(12), 1451–1463 (1990)
6. Slovic, P.: Perception of Risk. Science, New Series 236(4799), 280–285 (1987)
7. Turner, C., McClure, R.: Quantifying the role of risk-taking behaviour in causation of serious road crash-related injury. Accident Analysis and Prevention 36(3), 383–389 (2004)
8. Weber, E.U., Blais, A.-R., Betz, N.E.: A Domain-specific Risk-attitude Scale: Measuring Risk Perceptions and Risk Behaviors. Journal of Behavioral Decision Making 15, 263–290 (2002)
Appendix: Questionnaire

1. Moving into a newly decorated house, tolerating the chemicals
2. Driving at a speed of more than 60 miles/hour in a heavy fog on the freeway
3. Investing 10% of your annual income in a moderate growth fund
4. Ignoring some persistent physical pain by not going to the doctor
5. Driving down the hard shoulder of the motorway when the other lanes are jammed
6. Co-signing an application to go abroad for a friend
7. Consuming five or more servings of alcohol in a single evening
8. Eating ‘expired’ food products that still ‘look okay’
9. Approaching your boss to ask for a raise
10. Chasing a typhoon by car to take dramatic photos
11. Crossing a junction when the traffic lights have already turned against you
12. Investing 10% of annual income to buy stock of a well-performing company
13. Disregarding the speed limits late at night or early in the morning
14. Driving even when you realize that you may be over the legal blood-alcohol limit
15. Shoplifting a small item (e.g. a lipstick or a pen)
16. Cheating on an exam
17. Driving over 120 km/h on the freeway just for fun
18. Already having a loss, continuing to hold a stock which began to fall two days ago
19. Regularly riding your motorcycle without a helmet
20. Forging somebody’s signature
21. Buying a prohibited medicine (e.g. an antibiotic) for your own use
22. Openly disagreeing with your boss in front of your coworkers
23. Betting a day’s income on the lottery
24. Speeding in order to save 15 minutes on a 1-hour journey
25. Getting involved in unofficial ‘races’ with other drivers
26. Cheating by a significant amount on your income tax return
27. Wearing provocative or unconventional clothes on occasion
28. Not having a smoke alarm in or outside of your bedroom
29. Trying out bungee jumping
30. Already having a profit, continuing to hold a stock which fell two days ago
31. Going on a two-week vacation in a third-world country without prearranged travel and hotel accommodations
32. Periodically engaging in a dangerous sport (e.g. sky diving)
33. Dating someone that you got to know on the internet a week ago
34. Going down a ski run that is beyond your ability or closed
35. Not wearing a seatbelt when driving on the freeway
36. Becoming impatient with a driver in the outer lane and overtaking on the inside
37. Deciding to share an apartment with someone you don’t know well
38. Taking a week’s income to play mah-jongg
39. Speaking your mind about an unpopular issue at a social occasion
40. Smoking a pack of cigarettes per day
41. Frequent binge drinking
42. Taking a medical drug that has a high likelihood of negative side effects
43. Swimming in natural waters without any safeguard
44. Lending a friend an amount of money equivalent to one month’s income
45. Speeding in order to save 5 minutes on a 1-hour journey
46. Using office supplies for your personal business
47. Going camping in the wilderness with a friend, beyond the civilization of a campground
48. Stealing an additional TV cable connection
49. Betting a day’s income in a high-stakes poker game
50. Spending money impulsively without thinking about the consequences
51. Already having a profit, selling a stock which began to fall two days ago
52. Not driving when you realize that you may be over the legal blood-alcohol limit
53. What is the highest speed at which you have ever driven on the highway?
Design for Smart Driving: A Tale of Two Interfaces

Mark S. Young1, Stewart A. Birrell1, and Neville A. Stanton2

1 School of Engineering and Design, Brunel University, Uxbridge, Middlesex UB8 3PH, UK
2 Transportation Research Group, University of Southampton, Highfield, Southampton, Hampshire SO17 1BJ, UK
{m.young,stewart.birrell}@brunel.ac.uk, [email protected]
Abstract. The environmental and financial costs of road transport are a key issue for governments, car manufacturers and consumers. Alongside these issues remain longstanding concerns about road safety. The ‘Foot-LITE’ project is aimed at designing a ‘smart’ driving advisor to improve safe and eco-driving behaviours. This paper presents part of the human-centred design process to devise an in-car human-machine interface which will facilitate the desired behaviours while avoiding negative consequences of distraction. Two rapid prototyping studies are presented, and the results of feedback from potential users as well as subject matter experts are discussed with respect to implications for the future interface design.
1 Introduction

Over the past decade, the environmental and economic costs of road transport have become a key issue for governments, car manufacturers and consumers [10]. Meanwhile, road safety remains a key concern alongside these issues (e.g., [8]). One way in which the costs of driving can be reduced is by adopting ‘smart’ behaviours, which combine fuel-efficient and safe driving styles. ‘Foot-LITE’ is a UK project, consisting of a consortium of five commercial companies, four governmental/charity organisations, and three universities, aimed at developing a system to encourage ‘smart’ driving behaviours. This is to be achieved by providing pertinent advice on driving style, enabling drivers to adapt their behaviour and to make informed decisions about the trade-offs between eco- and safe driving. The work presented here is part of a package focused on the ergonomics of the system, with particular emphasis on the in-vehicle human-machine interface (HMI), which will present information to the user while they are driving. In this paper, we explain the human-centred design development process for two candidate HMI concepts for Foot-LITE, before going on to describe two rapid prototyping studies aimed at enhancing and evaluating these concepts.

1.1 Design Development

In order to facilitate the design process, a Cognitive Work Analysis (CWA; [9]) was previously conducted for the Foot-LITE project [1]. Based on the output of this CWA,
two concept HMIs were generated for the present study, one drawing on the principles of Ecological Interface Design (EID; cf. [3]), while the other represents a more conventional dashboard (DB) layout based on traditional best practice in interface design. Specifically relevant to the Foot-LITE project, EID offers to dynamically reflect the driving environment and integrate complex information onto a single, direct-perception display.

Fig. 1. Prototype EID interface

Figure 1 shows an early iteration of the EID interface developed at Brunel University for the Foot-LITE project¹. The principal aspects of the interface are based on Gibson and Crooks’ [4] notion of the ‘field of safe travel’, which they described as ‘…a spatial field but it is not fixed in physical space. The car is moving and the field moves with the car through space.’ (p. 456). As an alternative to the EID concept, a more conventional dashboard-type HMI (DB) has also been developed according to best practice in the human factors literature. Based on a vehicle instrument panel layout, the DB interface consists of bar charts, warning icons, pop-ups and textual information (see Fig. 2). The basic principles of the design are that smart driving information is grouped (as with the EID), with the eco-driving parameters all presented in the left-hand circle, while safety-related information is shown in the circle on the right. The main centre circle has a smart driving meter situated at its crest, with additional driving-related information or predefined smart driving tips presented underneath. The DB design is intended to offer familiarity to drivers, being akin to the standard instrument panels available in most vehicles.
¹ Due to commercial sensitivities, we are unable to publish any more recent or detailed versions of the interfaces.
Fig. 2. Prototype DB interface (eco parameters presented in the left circle, driving info and tips in the centre, safety parameters in the right circle)
1.2 The Rapid Prototyping Study

In keeping with a human-centred design approach to the project, it is necessary to present the EID and DB concepts to potential users for their evaluation and consideration. In order to make an early human factors comparison of the two designs, static rapid prototypes of each have been produced using standard desktop software. Two iterative studies were conducted as part of this rapid prototyping phase, both with related aims and objectives. Study 1 was a questionnaire to determine user requirements (exactly what information should be presented on the interface) and to develop appropriate icons for the relevant parameters. This was followed by Study 2, a desktop presentation study of a variety of driving scenarios on each HMI for user evaluation. In the rest of this paper, we detail these two rapid prototyping studies and discuss their outcomes and implications for future development of the Foot-LITE HMI. It is not an aim of the present study to decide which of the two concepts will be taken forward for development; further detailed work is planned using the Brunel University Driving Simulator (BUDS) to evaluate the effects of each HMI on driving behaviour.
2 Study 1: User Requirements Questionnaire

2.1 Method

Design. The questionnaire was split into two sections. The first focused on determining what type and format of information should be presented on the Foot-LITE in-vehicle HMI, while the second asked participants to rank, in order of preference, a selection of icons representing different aspects of green and safe driving. These icons were derived by reviewing standardised icons already present in current vehicles (i.e. adaptive cruise control, gear shift indicators etc.), by following International Standards Organisation guidelines for in-vehicle icons (e.g., ISO 15008:2003 [7], ISO 11429 [6]), and from other icons generated specifically during the present research. Four different icons were selected to represent each of eight different aspects of ‘smart’ driving. These aspects were: headway,
fuel economy, lane deviation, acceleration and braking forces, inappropriate cornering speed, gear shift indicators, approaching hazard warning, and driver alertness warning.

Participants. The questionnaire was distributed to 20 participants from the Brunel University driving study participant pool. All participants held current and valid driving licences, drove on a regular basis and had at least three years’ driving experience. In total, 15 questionnaires were completed and returned (mean age of respondents 40.1 years; SD = 9.2 years), comprising nine females and six males. In addition, 11 subject matter experts (SMEs) representing partners from the Foot-LITE project consortium completed and returned the questionnaire. The SME group primarily comprised males aged 18-59.
Fuel economy (as a numeric value – i.e., miles per gallon / litres per kilometre) Real-time traffic information Headway Driver alertness warning Approaching hazard warning
In terms of format of presentation, both the users and the SMEs wanted information (e.g., fuel consumption, braking forces, emissions etc.) to be presented in an instantaneous format (i.e., actual moment-to-moment data) as well as an average for the entire journey. More specific questions on presentation format covered fuel consumption, headway, and emissions data. Both groups clearly preferred miles per gallon for fuel consumption, with other potential options being actual fuel used, cost of fuel used, or a graphical format. A simple generic representation of headway information (i.e., safe, dangerous etc.) was also favoured by both the user and SME groups. An option on actual, numeric data to be presented alongside resulted in a preference for time headway (in seconds) over distance or relative speed. The presentation of emissions data (e.g., CO2, NOx etc.) split the respondent groups. Two thirds of users and half the SMEs wanted emissions information to be presented to them while driving. When asked what types of emissions they wanted to include on the display, half of the respondents stated CO2 plus other emissions, a quarter only wanted CO2, while the remainder did not know. The second section of the questionnaire was concerned with icon presentation. Respondents ranked icons in terms of their preference. They were also encouraged to suggest combinations of icons or any amendments that would, in their mind, make an icon clearer to understand. The preferred icons for each driving parameter were aggregated across respondents to determine which icons would be used in study 2.
As well as rating their preferences, the respondents gave some useful feedback about icon design. Key points from these comments related to advice on cornering speeds and the representation of gearshift information. Icons for cornering speed received a mixed response. With further probing it transpired that participants did not want to receive such information while actually driving the corner (as was intended), as this could be distracting. Meanwhile, responses from SMEs and users differed for the gearshift indicators. A simple numerical gear icon was preferred by users over a more elaborate image of a gear ‘gate’ pattern. On the other hand, SMEs preferred the gate, but since they also rated the simpler icon a close second, the latter option was taken forward.
3 Study 2 – Desktop Evaluation

3.1 Method

Design. The principal aim of the desktop study was to evaluate users’ subjective responses to the two candidate HMI designs. The output of the user requirements questionnaire was used as input to the EID and DB designs described earlier. Information content and format were incorporated into the prototypes to reflect the opinions of users and SMEs as found in Study 1, with icons for information presentation taken from the selection chosen by participants. A series of five driving scenarios was developed to represent various aspects of safe and/or eco-driving, and static exemplars of each version of the HMI were constructed to represent these scenarios. Both positive (i.e., desirable) and negative (undesirable) situations were represented. It is important to note that the scenarios and the associated HMI representations were carefully designed such that the information presented on each interface (EID and DB) remained the same – it is merely the format of presentation which was varied and evaluated. The scenarios were designed to represent likely situations which may occur during normal driving, with each interface presenting comparable information. Thus the experimental design comprised one independent variable (HMI design – EID vs. DB), while dependent variables covered performance measures of response times and accuracy in interpreting the scenarios. Qualitative analyses of participants’ descriptions of the displayed scenarios were used to infer the accuracy of their understanding. In addition, participants were asked to complete the System Usability Scale (SUS; [2]) as a quantitative reflection of their subjective opinions on the usability of each HMI design. All participants were shown the scenarios for both HMI options in a counterbalanced repeated-measures design.

Participants. Ten non-expert participants (mean age 43.8 years; SD = 10.6 years) were recruited from the Brunel University driving study participant pool, of whom six were female and four were male. All participants held current and valid driving licences, drove on a regular basis and had at least three years’ driving experience. These participants were a different sample from those who took part in the questionnaire study. A minimum of ten participants was needed for the study in accordance with SAE Recommended Practice J2364, which suggests that for early development phases, when using a mock-up or computer simulation, static task time averaged over ten participants should be less than 15 seconds [5].
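The SUS [2] produces a 0–100 score from ten 5-point items via a fixed scoring rule: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the total is multiplied by 2.5. A minimal sketch of that standard scoring, with a made-up response set, is below.

```python
# Minimal sketch of standard SUS scoring [2]; the example ratings are invented.
def sus_score(responses: list) -> float:
    """Score one participant's 10 SUS items (each rated 1-5) on a 0-100 scale."""
    assert len(responses) == 10
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # odd items (index 0, 2, ...): r-1; even: 5-r
        for i, r in enumerate(responses)
    ]
    return 2.5 * sum(contributions)

# Example: one participant's hypothetical ratings for one interface.
print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # prints 80.0
```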
Procedure. The desktop evaluation took place in a laboratory setting, with the scenarios and interfaces presented on a laptop PC. Participants were given a brief introduction to the Foot-LITE project and the aims of the study, and were informed of the basic principles of ‘smart’ driving. In addition, participants were given a basic overview of the principles of each interface design (EID and DB). Following the briefing, the smart driving scenarios for one of the interfaces were presented one at a time, followed by the scenarios for the other interface. As well as the counterbalancing of the HMI variable, the order of presentation of the scenarios was randomised to minimise any learning effects. After each scenario was shown, participants were asked the same set of questions:

1. What did this scenario mean to you?
2. What aspects indicate this?
3. How would you change your driving to rectify this situation?

Responses to these questions were recorded verbatim for further subjective analysis and to assess the accuracy of each response (i.e., did participants identify what the interface was intended to display). Response times were also recorded with respect to initial understanding of the display (i.e., how long it took participants to verbalise what they saw on the display). After all scenarios were completed for one HMI design, participants completed the SUS questionnaire. Following completion of the SUS questionnaire, participants were asked other, more open questions regarding their thoughts on the best and worst aspects of the interface and aspects they would change, and finally, at the end of the study, which interface they ultimately preferred.

3.2 Results and Discussion

The response time for each participant to verbalise their interpretation of the scenarios was recorded. Absolute time to respond was analysed, irrespective of whether the response was correct. Mean response times for the EID were 0.4s faster than for the DB interface (8.0s vs. 8.4s respectively), although a Wilcoxon test revealed that this difference was not statistically significant (Z = -0.56; p = 0.58). Furthermore, the standard deviation of response times for the EID was also lower than for the DB (4.0s vs. 5.1s respectively), suggesting more consistent (and therefore perhaps more predictable) response times. It is notable that the response times recorded here are well within the 15-second rule for static task completion as suggested by Green [5] and as part of SAE Recommended Practice J2364, thus implying safe use of either of these in-vehicle HMIs while driving. Both interfaces showed some degree of a learning effect, in that response times reduced for the scenarios presented later. When reviewing the transcripts, a common theme emerged that users were either slow to grasp or misunderstood the EID interface with respect to headway, interpreting the display as representing an obstacle or hazard instead – as some sort of collision warning system. These results indicate that headway needs to be made clearer to users, either by explanation beforehand or through a change to the design. Conversely, users were quicker to identify the scenario representing poor fuel economy and excessive acceleration with the EID than with the DB.
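The reported comparison is a paired non-parametric test across the ten participants; the sketch below shows how such a test runs with SciPy, using invented response times rather than the study's data.

```python
# Sketch of a Wilcoxon signed-rank test on paired response times (seconds).
# The two arrays are hypothetical stand-ins for the ten participants' data.
from scipy.stats import wilcoxon

eid_times = [7.1, 8.4, 6.9, 8.0, 7.5, 9.2, 8.8, 7.7, 8.3, 8.1]
db_times  = [7.9, 8.1, 7.4, 9.5, 8.0, 9.9, 8.6, 8.8, 8.5, 8.9]

# Paired test: each participant saw both interfaces (repeated measures).
stat, p = wilcoxon(eid_times, db_times)
print(f"W={stat:.1f}, p={p:.3f}")  # p > 0.05 would mirror the reported null result
```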
not correctly identified), or ‘incorrect’. Approximately one-third more participants correctly identified the scenario with the EID interface compared to the DB display. At the same time, more participants incorrectly identified the scenario with the EID; thus more participants only partially identified the scenario with the DB compared to the EID. With more fully correct responses on the EID, the results suggest that this interface allows both the safety and fuel economy aspects of the design to be more clearly identified. However, it is a notable concern that five participants could not correctly identify any aspect of the interface, with all of these incidents occurring on the very first slide presented. Again, this implies that there is a steeper learning curve with the EID and highlights the need for a detailed explanation or ‘tour’ of the interface before use in an actual driving situation. A potential reason why the DB interface may not have performed as well in these initial tests may be its use of warning icons, which are notoriously misunderstood or ignored, particularly in motor vehicles. A study conducted by the AA in 2006² found that almost half of women and a third of men could not correctly identify symbols for frequently used functions or basic warning lights. The mean SUS scores showed that the DB design was rated higher than the EID (74.1 vs. 67.8 respectively), but a Wilcoxon test revealed that this difference was not significant (Z = -0.65; p = 0.61). As with response times, the standard deviations in the data again indicated more consistent ratings for the EID interface; there was thus a larger discrepancy between those who liked and those who disliked the DB interface. Moreover, despite the mean rating for the EID interface being lower than for the DB, more participants rated the EID higher than the DB: five participants rated the EID higher, four rated the DB higher, and one gave identical ratings. Finally, analysis of participants’ responses to the general questions for each interface revealed some recurring comments which may form the basis of future iterations of both the DB and EID interfaces. Other comments supported proposed aspects of the individual interfaces, particularly the EID. For instance, participants clearly linked their acceleration/braking patterns directly to fuel economy with the EID design. This represents a significant achievement for the EID in helping drivers to understand key factors in eco-driving and linking these to positive changes in their driving behaviour. The EID interface also integrates such information by keeping all aspects within the ‘Green Zone’ to maximise fuel economy. Conversely, participants noted a potential limitation of the DB interface: it requires the user to look in two separate places, hold this information in memory and then make a mental leap to link the two together. A very clear response from the vast majority of participants regarding the EID interface was that, after initial confusion as to what the display was showing, they quickly learned to understand it. A degree of confusion was to be expected given the limited exposure to the interfaces, particularly for the EID, as it is a novel design and thus unfamiliar to car drivers. If the EID design is taken forward to development, a clear explanation of the interface, or brief tour, would be needed to overcome the learning effect.
On the DB display, the driving information aspects (journey time, fuel consumption etc.) received mixed reviews from the participants. Some stated that the
information was useful and gave plenty of detail, while others suggested that there was too much information and they would not want this to be displayed while driving. Other suggestions were that the DB interface was a little inconsistent with respect to how some information was presented: some as what the driver ‘should’ do (e.g. change to 4th gear), while other information was feedback on what the driver ‘did’ do (e.g. cornering speed was too fast). On the positive side, participants liked the fuel economy gauge and gearshift indicator, as well as the lane deviation information and headway representation. Overall, the DB interface was well received by the participants, who found it easily understandable. Some stated that the design was recognisable for car drivers, being similar to existing displays, whereas the EID was a new concept for drivers to engage with. After completing the study, participants were asked which interface they preferred. Six participants stated a preference for the DB design, two for the EID, and two had no preference. This split was also reflected in the SUS ratings, with those who stated a preference for a particular interface rating it approximately 20 points higher than the other. Those who stated no preference rated both interfaces approximately the same on the SUS. However, it is interesting to note that those who preferred the DB design still performed better with the EID, generally responding 0.7s faster to each smart driving scenario and correctly identifying more EID scenarios when compared with DB information. Meanwhile, the two participants who preferred the EID also performed better with this interface.

² http://www.theaa.com/motoring_advice/breakdown_advice/warning_lights.html
4 General Discussion

Results from the user requirements questionnaire suggested what information participants wanted to see on the Foot-LITE system, and also gave a clear indication of the format in which they wanted the relevant information presented. In addition, the process generated a ‘bank’ of icons to represent different aspects of smart driving, and icon preferences were recorded. These responses were used to develop the two candidate HMI designs taken forward into the desktop evaluation study. As an early human factors analysis of proposed interface designs for a ‘smart’ driving advisor, the current study has served its purpose. Compared with the original specification from the CWA output [1], participants viewed fuel economy as a key component of eco-driving, while headway, driver alertness and hazard warnings were the preferred aspects of safety information. Conversely, it was clear from the questionnaire that feedback on cornering speed was not seen as beneficial while driving the corner, due to the driver being otherwise occupied during this task. There was a clear preference for real-time, moment-to-moment feedback on driving. The desktop evaluation revealed that whilst drivers largely preferred a traditional dashboard-style layout, the more adventurous ecological interface design had performance advantages once the initial learning curve had been overcome. Participants also made clearer links between their driving style and changes to fuel economy with the EID interface, validating the integrated and direct-perception nature of this design. Safety aspects of driving seem to be well represented in both designs, with the pop-up icons of the DB interface preferred.
The subjective responses given by the participants form a good basis for potential iteration of both interfaces. Recommended changes to the EID include the removal of real-time warning information regarding inappropriate cornering speed, which was deemed confusing and potentially distracting if given while negotiating the corner. Furthermore, safety aspects should be made clearer (perhaps using pop-up icons as with the DB), the use of the eco-driving divisions on the display should be reconsidered, and gear change information should be incorporated into the EID, given that this was popular on the DB. Specific recommendations for the DB interface were that journey information should be limited, the fuel economy gauge could be changed to a smart driving meter giving an indication of safety performance, and gearshift information should be simplified. The next stage of the research is to refine each interface design based on the output of the current studies, and to take both options forward to more dynamic testing in the Brunel University Driving Simulator.

Acknowledgements. Foot-LITE is sponsored by the EPSRC, the DfT, and the Technology Strategy Board under the Future Intelligent Transport Systems initiative. The Foot-LITE consortium comprises: MIRA, TRW, Autotxt, Hampshire County Council, the Institute of Advanced Motorists, Ricardo, TfL, Zettlex, the University of Southampton, Newcastle University, and Brunel University.
References

1. Birrell, S.A., Young, M.S., Stanton, N.A., Jenkins, D.P.: Improving driver behaviour by design: A Cognitive Work Analysis methodology. In: Applied Human Factors and Ergonomics 2nd International Conference, CD-ROM (2008)
2. Brooke, J.: SUS: A “quick and dirty” usability scale. In: Jordan, P.W., Thomas, B., Weerdmeester, B.A., McClelland, A.L. (eds.) Usability Evaluation in Industry, pp. 189–194. Taylor and Francis, London (1996)
3. Burns, C., Hajdukiewicz, J.: Ecological Interface Design. CRC Press, Boca Raton (2004)
4. Gibson, J.J., Crooks, L.E.: A theoretical field-analysis of automobile driving. The American Journal of Psychology 51, 453–471 (1938)
5. Green, P.: Estimating compliance with the 15-second rule for driver-interface usability and safety. In: 43rd Annual Meeting of the Human Factors and Ergonomics Society (1999)
6. ISO 11429: Ergonomics – System danger and non-danger signals with sounds and lights
7. ISO 15008:2003: Road vehicles – Ergonomic aspects of transport information and control systems – Specifications and compliance procedures for in-vehicle visual presentation (2003)
8. PACTS: Beyond 2010 – a holistic approach to road safety in Great Britain. Parliamentary Advisory Council for Transport Safety, London (2007)
9. Vicente, K.: Cognitive work analysis: Toward safe, productive, and healthy computer-based work. Lawrence Erlbaum Associates, Mahwah (1999)
10. Young, M.S., Birrell, S.A., Stanton, N.A.: Safe and fuel efficient driving: Defining the benchmarks. In: Bust, P.D. (ed.) Contemporary Ergonomics 2008, pp. 749–754. Taylor & Francis, London (2008)
Supervision of Autonomous Vehicles: Mutual Modeling and Interaction Management

Gilles Coppin1,2, François Legras1,2, and Sylvie Saget1,2

1 Institut Télécom; Télécom Bretagne; UMR CNRS 3192 Lab-STICC
2 Université européenne de Bretagne
{gilles.coppin,francois.legras,sylvie.saget}@telecom-bretagne.eu
Abstract. As the capabilities of Unmanned Vehicle Systems increase, the tasks of their operators become more and more complex and diverse. Accordingly, the interfaces of these UVSs must become smarter in order to support these tasks and assist the operator. In this paper, we discuss how an Operator Support System can leverage dynamic interaction strategies to modulate the workload of the operator and how it could impact trust in automation. Keywords: unmanned vehicles systems, interaction, dialogue, trust in automation.
1 Introduction

Unmanned Vehicle Systems (UVSs) will evolve considerably within the next two decades. In the current generation of UV systems, several ground operators operate a single vehicle with limited autonomous capabilities, whereas in the next generation a ground operator will have to supervise a system of several cooperating vehicles performing a joint mission, i.e. a Multi-Agent System (MAS) [2,3]. In order to enable mission control, the autonomy of the vehicle and of the system will increase, requiring new and richer forms of human-system interaction. The operator of a UVS performs two tasks at the same time: (1) mission command & control, and (2) interaction with the system. Both tasks induce varying workloads during the system’s operation. In current systems, interaction is barely distinguishable from command & control. But as UVSs evolve, the interaction workload will rise as the operator has to switch between several vehicles, streams of data, decision support systems, and so on. As Mouloua et al. have pointed out [1], the complexity of the interaction mechanisms between the operators and the system, and the complexity of the mission, should vary in opposite ways. If mission complexity goes up (higher workload), interaction complexity should go down (simpler interaction with the system). In this perspective, we propose to dynamically leverage different interaction strategies in the context of operator-UVS activities in order to modulate the workload of the operator. In the remainder of this section, we describe the roles of the Operator Support System in these future UVSs and the kind of interaction that they should
support. In Section 2, we discuss the interaction models that provide a basis for our work. Section 3 exposes our rationale for workload adaptation via dynamic interaction strategies, and Section 4 discusses trust in automation and mutual modeling between the operator and the interactive system.

1.1 Roles of an Operator Support System in Future UVSs

From the perspective of this paper, the main component of the Ground Control Station¹ (GCS) of a UVS is the Operator Support System (OSS), i.e. the information system that allows the command & control of the UVS by the operator. The OSS should support the following functions:

• Situation Awareness (SA) and information display: the OSS should make available to the operator all mission-related data, i.e. information about vehicles (position, status, etc.), information about mission elements (objectives, threats, maps, etc.), and data from the vehicles’ sensors (e.g. video feed);
• Vehicle command & control (C2): the OSS should allow the operator to issue commands to the vehicles of the system and track their progress;
• Decision support: as UVS operational capabilities increase, the OSS should provide the operator with decisional aids of some sort (e.g. semi-automatic route planning, information fusion).

To support these new functions, OSSs will integrate two additional roles in the UVS:

• Interaction management: considering the wealth of possible two-way interactions at different levels (SA, C2, decision support; see Section 1.2) between the OSS and the operator, it becomes necessary to regulate these interactions. In similar ways to human interaction, it becomes necessary to manage turn-taking, interruptions, task priorities and so forth.
Fig. 1. Illustration of the concept of semantic bridge in a multi-UV system
1 For the sake of simplicity we will consider only UVSs with a single GCS and a single operator in the scope of this paper.
• Semantic bridge: as interactions within the UVS become more abstract (higher level) than the current remote operation, some kind of translation service becomes necessary. Operational commands from the operator side must be converted into machine-understandable instructions, for instance when the operator instructs a system of several vehicles to perform a search on a specific zone (see Fig. 1). Similarly, multiple sensor data or Decision Support System (DSS) output must be converted into human-understandable form.

1.2 Types of Interaction and Non-Understandings

As stated earlier, several types of interaction can happen between the operator and the OSS. Furthermore, several parts of the UVS can initiate interactions, notably the operator, the vehicles and the decision support systems. Table 1 illustrates some of these types of interaction.

Table 1. Types of interaction

Initiator → addressee | Interaction type       | Example in natural language
Operator → Vehicles   | Command & control      | “Send the two nearest vehicles to patrol zone 1”
DSS → Operator        | Decision support       | “The intruder detected on the north fence seems headed toward hangar 2”
Operator → OSS        | Interaction management | “Do not disturb me for the next two minutes”
Operator → OSS        | Interface manipulation | “Display the power lines and zoom on the alarm”
Every type of high-level interaction2 can produce non-understandings. Humans usually deal with non-understanding through clarification. This is also applicable to operator-OSS interactions, with the OSS asking for a rephrasing or a clarification, as in the following example:

- Operator: “Send UV2 to the building”
- OSS: “Which building?”
- Operator: “The building near the airport”
- OSS: “North of the airport?”
- Operator: “Yes”
- OSS: “OK.”

Here, the feedback “OK” allows the participants to consider the non-understanding solved. Although this example is expressed as natural language interaction, a similar exchange could happen via more conventional means, for example with the OSS highlighting the building on the map.
2 High-level interaction is considered as opposed to low-level interaction like clicking on a button.
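To make this clarification mechanism concrete, here is a minimal sketch of such a grounding loop in code; the referent-resolution function, candidate set and message strings are purely hypothetical and only illustrate how the OSS could iterate until a unique referent is established and then close the exchange with positive feedback.

```python
def resolve(description, candidates):
    """Return the objects whose tags match every word of the description."""
    return [c for c in candidates
            if all(w in c["tags"] for w in description.split())]

def ground_referent(description, candidates, ask_operator):
    """Iterate clarification requests until exactly one referent remains."""
    matches = resolve(description, candidates)
    while len(matches) != 1:
        if not matches:                # nothing fits: ask for a full recast
            description = ask_operator("Please rephrase.")
        else:                          # ambiguity: request a refinement
            description += " " + ask_operator(
                "Which one? " + ", ".join(c["name"] for c in matches))
        matches = resolve(description, candidates)
    print("OK.")                       # positive feedback closes the exchange
    return matches[0]

buildings = [{"name": "B1", "tags": ["building", "airport", "north"]},
             {"name": "B2", "tags": ["building", "airport", "south"]}]
# Simulated operator who answers "north" to any question:
target = ground_referent("building airport", buildings, lambda q: "north")
print(target["name"])                  # B1
```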
2 Interaction

2.1 Interaction as a Collaborative Activity

The traditional view of interaction [4,5] defines it as a unidirectional process resulting from two individual activities: the generation of a communicative act by the speaker, and the understanding and interpretation of this communicative act by the addressee. The success of an interaction is a consequence of the cooperative attitude of the speaker (his sincerity, his relevance, etc.). Consequently, the production of a suitable communicative act is concentrated in a single exchange and a single agent. The complexity (i.e. the cognitive load) of such a process is high. Moreover, the set of possible strategies to produce and understand a communicative act is very limited. Since the addressee has a passive role, positive feedback such as “Okay”, “Mhm”, “uh-huh”, nodding, etc., signaling successful understanding, is not necessary. Finally, non-understandings are regarded as communication errors, which have to be handled by additional complex mechanisms.

In contrast with this traditional view, collaborative models define interaction as a bidirectional process resulting from a single social activity [6]. Interaction is considered as a collaborative activity between dialog partners oriented toward the shared goal of reaching mutual understanding. Mutual understanding is reached through negotiation on interpretation, which is a form of interactive refinement of understanding until a sufficient point of intelligibility is reached. Consequently, the production of a suitable communicative act can be distributed over several exchanges between several dialog partners. The effort needed from each partner in such a process is lower than the effort produced by the speaker in the traditional view of interaction, i.e. each partner has to contribute in some way to the interaction. The addressee has an active role: explicit and implicit feedback is required in order to publicly signal successful understanding. Finally, note that non-understandings are fully expected events in the process of negotiation and are part of the model.

2.2 Interaction as a Subordinate Activity

As is usual for goal-oriented interaction, the operator of a UVS is engaged in two activities: achievement of the mission and interaction with the OSS. As stated by Clark [6,7]: “Dialogues, therefore, divide into two planes of activity. On one plane, people create dialogue in service of the basic joint activities they are engaged in - making dinner, dealing with the emergency, operating the ship. On a second plane, they manage the dialogue itself - deciding who speaks when, establishing that an utterance has been understood, etc. These two planes are not independent; for problems in the dialogue may have their source in the joint activity the dialogue is in service of, and vice versa. Still, in this view, basic joint activities are primary, and dialogue is created to manage them.” Interaction is defined by the dialog partners' goal to understand each other, in other words to reach a certain degree of intelligibility, sufficient for the current purpose. That means that:
• Perfect understanding is not required. The level of understanding required is dictated by the basic activity (i.e. the mission) and the context (e.g. time pressure);
• The attention of the operator (and the associated workload) is split between the two activities.

As we consider that interaction is subordinated to mission achievement and that the operator has finite cognitive resources, interaction complexity should vary depending on the complexity induced by the mission [1]. Indeed, the collaborative effort for a basic activity (i.e. not subordinated) has to be optimized, whereas the collaborative effort for a subordinated activity has to be minimized [8]. Such behavior is rational (i.e. coherent) at both the collaborative and individual levels [9].
3 Load Adaptation

Collaborative models of interaction have already been used for the design of OSSs, for example in the WITAS [10] and GeoDialogue [11] projects. The model of collaboration that we propose to use, outlined in Section 2, has two main characteristics: (1) the interpretation process is simple and therefore allows for a realistic implementation; (2) it supports a wide range of methods for generation and interpretation. As we will see in the following subsections, the OSS can adjust its cooperative attitude toward the operator by choosing among different strategies for handling the understanding of communicative acts, as well as for the generation and interpretation of communicative acts.

3.1 Understanding and Non-Understanding

In all collaborative models of interaction, the reaction to the understanding of a communicative act is the same: positive feedback. This feedback ranges from a simple “OK” to a comprehensive recast of the act (e.g. “OK. Sending UV2 to building B213.”)3, but from a collaborative point of view both have the same value. Concerning non-understandings, several strategies are available, which we illustrate here in decreasing order of collaborative effort on the part of the OSS:

• Proposing a refinement or clarification to the operator (disambiguation), e.g. “Do you mean the building north of the airport?”;
• Requesting a refinement or clarification from the operator (disambiguation), e.g. “Which building?”;
• Asking for a recasting of the whole communicative act, e.g. “Please rephrase.”;
• Postponing or giving up because something more urgent is coming, e.g. “Hold on… Intruder detected at XY”.

The more collaborative effort the OSS puts into non-understanding management, the less effort is needed from the operator: the operator only needs to answer “Yes” (or “No”) to the question “Do you mean the building north of the airport?”,

3 Again, this example uses natural language for simplicity, but the OSS could very well recast such a communicative act graphically by highlighting UV2 on the display and showing its route toward building B213.
whereas he or she has to recompose his or her request if the OSS replies “Please rephrase.” Thus, the management of non-understandings constitutes the first degree of freedom in setting the cooperative attitude of the OSS.

3.2 Generation and Interpretation

Similarly, many different strategies with varying levels of complexity exist for generating and interpreting communicative acts. Our model allows the use of many different strategies, e.g.:

• Basing interpretation solely on keyword recognition (the most basic form of interaction);
• Selfish attitude: considering solely one's own beliefs, i.e. not taking into account what the operator knows or is supposed to know. With this strategy, one does not take into account what the other knows, has perceived, or how he or she refers to particular objects, e.g. the OSS uses only its own terminology about mission objects for generating or interpreting communicative acts;
• Cooperative attitude: considering solely the other's beliefs or knowledge, e.g. if the OSS adopted a selfish attitude, the operator would have to adopt a cooperative one in order to interact;
• Mutual awareness: considering the part of the situational context which is accessible both to oneself and to the other, e.g. the OSS uses its own terminology only if the operator has shown signs that he or she understands it;
• Perspective taking on the addressee's point of view on mutual awareness;
• Higher levels of consideration of each other's mutual beliefs are possible, but are rarely deployed.

These strategies are listed in order of ascending complexity [12]. This complexity concerns human beings, who tend to rely on the simplest strategy sufficient for the current purpose, but it also concerns the OSS if these strategies are to be implemented. The choice of a strategy for generation and interpretation constitutes the second degree of freedom in setting the cooperative attitude of the OSS.

3.3 Rationale for Adjusting the Cooperative Attitude of the OSS

One can define the cooperative attitude of the OSS as its level of contribution to the interaction at a given moment, expressed by the choices of strategies for (1) non-understanding management and (2) generation and interpretation. A high level of cooperative attitude on the part of the OSS requires less effort from the operator to interact with the OSS, thereby allowing him or her to concentrate on the mission. Therefore, it would seem logical to set the OSS at the maximum level of cooperative attitude throughout the whole mission.4 On the contrary, we argue that it can be beneficial to set the OSS to a lower level of cooperative attitude in some circumstances.

4 We will not consider computational power in the scope of this paper, even though it could indeed happen that some interaction strategies are too cost intensive in some circumstances.
A low level of cooperative attitude on the part of the OSS forces the operator to put more effort into the interaction process. Such an effort notably leads to:

• Maintaining the operator's situation awareness (SA): it is easy to lose track of what is happening if all the operator has to do is answer “Yes” or “No” to the requests of the OSS, particularly during low-stress phases of the mission;
• Improving the operator's feeling of being part of a team (working conjointly) with the OSS;
• Constructing and maintaining a common language, thereby enabling the use of “cooperative” or “mutual awareness” strategies (see Section 3.2) later on, e.g. by using the “selfish” strategy at one point, the OSS forces the operator to learn its terminology, which allows for more effective interaction later;
• Developing more accurate models of each other's capabilities through the use of different interaction possibilities.

Of course, during high-stress, high-workload phases of the mission, the OSS should bear the brunt of the interaction workload, thereby relieving the operator. This balancing effect will be all the more effective if the operator has had the opportunity to interact at different levels with the OSS.
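As a minimal illustration of this balancing principle, the sketch below selects a non-understanding strategy and a generation/interpretation strategy from an estimated operator workload. The workload scale, thresholds and the particular pairings are illustrative assumptions, not part of the model described above; only the strategy names follow Sections 3.1 and 3.2.

```python
# Strategies from Sections 3.1 (increasing OSS effort) and 3.2 (increasing
# complexity); the mapping from workload to strategy is hypothetical.
NON_UNDERSTANDING = ["postpone", "ask_rephrase", "request_clarification",
                     "propose_clarification"]
GENERATION = ["selfish", "cooperative", "mutual_awareness"]

def cooperative_attitude(workload: float) -> tuple[str, str]:
    """Map estimated operator workload in [0, 1] to an OSS attitude.

    High workload -> the OSS bears the interaction effort (most collaborative
    clarification, richest interpretation strategy).
    Low workload  -> the OSS is deliberately less cooperative, keeping the
    operator engaged and maintaining situation awareness.
    """
    if workload > 0.75:       # high-stress phase: relieve the operator
        return "propose_clarification", "mutual_awareness"
    elif workload > 0.4:      # nominal phase
        return "request_clarification", "cooperative"
    else:                     # low-stress phase: keep the operator active
        return "ask_rephrase", "selfish"

print(cooperative_attitude(0.9))  # ('propose_clarification', 'mutual_awareness')
print(cooperative_attitude(0.2))  # ('ask_rephrase', 'selfish')
```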
4 Mutual Modeling

4.1 System Predictability and Mutual Modeling

Focusing solely on workload assessment may lead to limitations in setting an adequate mode of man-machine cooperation. We therefore expect an adequate interaction manager to include or rely on a model of the cooperation between the operator and the system. We consider here the concept of cooperation in the sense of Klein et al.'s approach [13], involving the notions of basic compact, common ground, predictability and directability. We will not address the basic compact (which represents the underlying agreement to work as a team and to follow a cooperative behavior) or the common ground (which we have proposed to enrich in our approach, as described below). Predictability and directability, on the contrary, are central to the approach. According to Klein et al., predictability means that the operator is able to anticipate the future behavior and strategies of the system - including interaction strategies - so that (s)he can better synchronize and coordinate with it. Still according to these authors, directability means that the operator is able to “guide” the system toward a desired behavior. In this approach, both operator and system should therefore have a model of each other. This mutual modeling can be used for a better cross-estimation of mutual performances and for feeding an optimized allocation mechanism functioning at the meta level. But it also opens the way to a different kind of relationship between the operator and the system, based on trust.
4.2 Trust

Classical models of trust try to relate the concepts of workload, trust and global performance. Lee's model [14], for instance, proposes to compute a dynamic level of trust through a first-order autoregressive equation as follows:

Trust(t) = α1 Trust(t-1) - α2 Fault(t) + α3 Fault(t-1) + α4 Perf(t) - α5 Perf(t-1)    (1)

where Perf is the system's (i.e. operator + machine) performance and Fault represents a fractional variation of the control system with respect to the reference values. Still according to Lee et al. [14], one can derive from this equation a percentage of work allocation to the automation, defined through:

work_auto(t) = work_auto(t-1) + β1 (Trust(t) - Sc(t)) + β2 Indiv + ε    (2)
where work_auto(t) is the percentage of work allocated to the machine at time t, Sc(t) is the level of self-confidence, Indiv is a constant individual bias depending on the operator, and ε is a random noise. Even if it could be adapted to our context of multi-vehicle control, this kind of model obviously ignores the effect of adequate (or inadequate) interaction between the operator and the system on trust and - possibly mutual - reliance. Based on our approach of adaptive interaction management, we propose to enhance the classical approaches with new indicators, especially the level of understanding that has been reached and, as a correlate, the relevance of the communication strategy of the automation given the context and the operator workload. Taking interaction strategies and modes into account thus allows us to go beyond classical performance analysis (observation of behavior) and to include complementary aspects of trust such as process (understanding of causal mechanisms) and purpose (intent of use).
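The sketch below shows how equations (1) and (2) could be iterated over discrete time steps. The coefficient values, the input series and the clipping of the allocation to [0, 1] are illustrative assumptions; Lee and Moray [14] estimated such parameters empirically, and the values here are arbitrary placeholders.

```python
import random

# Hypothetical coefficients for Eqs. (1) and (2).
A = [0.8, 0.3, 0.1, 0.5, 0.2]   # alpha_1 .. alpha_5
B1, B2 = 0.1, 0.05              # beta_1, beta_2

def step(trust, work, fault, fault_prev, perf, perf_prev, sc, indiv):
    """One discrete-time update of trust (Eq. 1) and work allocation (Eq. 2)."""
    trust = (A[0] * trust - A[1] * fault + A[2] * fault_prev
             + A[3] * perf - A[4] * perf_prev)
    noise = random.gauss(0.0, 0.01)                # epsilon
    work = work + B1 * (trust - sc) + B2 * indiv + noise
    return trust, min(max(work, 0.0), 1.0)         # keep allocation in [0, 1]

trust, work = 0.5, 0.5
fault_prev, perf_prev = 0.0, 0.5
for t in range(10):
    fault, perf = 0.0, 0.6                         # hypothetical constant inputs
    trust, work = step(trust, work, fault, fault_prev, perf, perf_prev,
                       sc=0.5, indiv=0.0)
    fault_prev, perf_prev = fault, perf
print(trust, work)
```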
5 Conclusion

This paper has emphasized the role and interest of interaction management in man-machine cooperation and adaptive authority sharing. Dynamic management of interaction may not only have a direct impact on the global workload of the operator, by acting on the interaction-related workload, but may also have a direct impact on the operator's trust in the system. Increasing the level of trust in the system should have - at least indirectly - a positive impact on the way the operator uses the automation and should consequently lead to performance enhancement.
References 1. Mouloua, M., Gilson, R., Kring, J., Hancock, P.A.: Workload, situation awareness, and teaming issues for UAV/UCAV operations. In: Proceedings of the Human Factors and Ergonomics Society, vol. 45, pp. 162–165 (2001) 2. Johnson, C.: Inverting the control ratio: Human control of large, autonomous teams. In: Proceedings of AAMAS 2003 Workshop on Humans and Multi-Agent Systems (2003)
3. Legras, F., Coppin, G.: Autonomy spectrum for a multiple UAVs system. In: COGIS 2007 - COgnitive systems with Interactive Sensors (2007)
4. Grice, H.P.: Logic and conversation. In: Syntax and Semantics, vol. 3: Speech Acts, pp. 43–58 (1975)
5. Searle, J.R.: Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, Cambridge (1969)
6. Clark, H.H.: Using Language. Cambridge University Press, Cambridge (1996)
7. Bangerter, A., Clark, H.H.: Navigating joint projects with dialogue. Cognitive Science 27, 195–225 (2003)
8. Cherubini, M., van der Pol, J.: Grounding is not shared understanding: Distinguishing grounding at an utterance and knowledge level. In: CONTEXT 2005 (2005)
9. Saget, S., Guyomard, M.: Goal-oriented dialog as a subordinated activity involving collective acceptance. In: Proceedings of Brandial 2006, University of Potsdam, Germany, pp. 131–138 (2006)
10. Lemon, O., Gruenstein, A., Cavedon, L., Peters, S.: Collaborative dialogue for controlling autonomous systems. In: Proceedings of the AAAI Fall Symposium (2002)
11. Cai, G., Wang, H., MacEachren, A.: Communicating vague spatial concepts in human-GIS interactions: A collaborative dialogue approach. In: Proceedings of the Conference on Spatial Information Theory 2003, pp. 304–319 (2003)
12. Bard, E.G., Anderson, A.H., Chen, Y., Nicholson, H., Havard, C.: Let's you do that: Enquiries into the cognitive burdens of dialogue. In: Proceedings of DIALOR 2005 (2005)
13. Klein, G., Feltovich, P.J., Bradshaw, J.M., Woods, D.D.: Common ground and coordination in joint activity. In: Rouse, W.B., Boff, K.R. (eds.) Organizational Simulation. Wiley, New York (2005)
14. Lee, J.D., Moray, N.: Trust, self-confidence, and operators' adaptation to automation. International Journal of Human-Computer Studies 40, 153–184 (1994)
Conflicts in Human Operator – Unmanned Vehicles Interactions

Frédéric Dehais, Stéphane Mercier, and Catherine Tessier

ISAE-ONERA CSDV, 2/10 av. E. Belin, 31055 Toulouse Cedex 4, France
[email protected], {mercier,tessier}@onera.fr
Abstract. In the context of the supervision of one or several unmanned vehicles by a human operator, the definition and the dynamics of the shared authority among these agents are a major challenge. Indeed, lessons learned from modern aviation reveal that authority sharing issues between aircrews and on-board processes are remarkable precursors of air accidents (twenty accidents in the last twenty years). The analysis of these events highlights that the authority of the on-board processes is designed a priori and fails to adapt in case of conflict with the aircrew's actions. Moreover, the poor design of the HMIs (e.g. there is no dialogue between artificial and human agents) and the complexity of the interactions may lead the aircrews to lose situation awareness and to enter a perseveration syndrome. We present the basic concepts of an approach aiming at dynamically adjusting the autonomy of an agent in a mission relative to its operator, based on the formal detection of conflicts. An experimental set-up is under construction to assess our hypotheses.

Keywords: human-automation conflicts, adaptive autonomy, authority sharing.
1 Introduction

1.1 The Man-System Conflict, an Alternative to Human Error

Traditionally, the measurement and prediction of man-system performance relies on a deviation between the human operator's real activity and the expected task [1]. Many formal approaches based on human error detection have been proposed [2,1]: the real activity is reconstituted dynamically from behavioral data (i.e. the human operator's actions on the HMI) and then compared to a database that represents the reference activity (e.g. flight procedures). These methods show their limits as they face two epistemological problems: the existence and the status of human error. Indeed:

• The formalization of human error is risky as long as the concept of norm to which it relates is not always defined. Moreover, it is recognized that operators change the norms for new procedures that are more effective and safer [3];
• The occurrence of an error does not necessarily lead to the degradation of the man-system interactions. For example, expert pilots inevitably make errors but fix most
of them [4]. In fact, this production-detection-fixation of the error is a characteristic of expertise [5];
• The error plays a key role in the human operator's self-assessment of his or her own performance (e.g. fatigue) [6].

An alternative approach is to consider the concept of conflict as a metric to assess man-system performance. Indeed, sociology [7] shows that its presence is an indicator of a dynamic of tension and opposition between individuals; in distributed artificial intelligence, the occurrence of a conflict is used to diagnose associated interaction problems between artificial agents [7]. In addition, research conducted in a flight simulator reveals that the emergence of conflicts in flight management [8] is a remarkable precursor of the degradation of aircrew-system interactions. In particular, these experiments show that the pilot, when facing a conflict, tends to persevere in solving the problem instead of monitoring vital parameters. From a formal point of view, this concept is thus more interesting than error: it does not systematically relate to a norm. A conflict between agents exists without any norm or truth: its essence is contradiction, i.e. the difference between two points of view [7]. Therefore it appears possible to detect conflicts through the analysis of phenomena of opposition, interference or behavioural inconsistencies between several (human and/or artificial) agents.

1.2 Conflicts in Human Operator-Automation Interactions

The aim of this study is to consider the conflict within operator-system interactions. Therefore this section examines the nature of those conflicts that occur with artificial systems capable of “performing, in series or in parallel, sorting out, decision and diagnosis functions that are usually assigned to the operator” [9]:

• Automation changes the operator's role to a supervisor's role. This has decreased the occurrence of benign errors (e.g. routine errors [10]) but has created authority conflicts with human operators whose consequences are much more serious [11]:
− Increasing automation leads to the occurrence of programming errors (e.g. erroneous waypoint selection). These errors are rarely detected because the operators tend to check only the final states of the system but not its intermediate states [12]. Moreover, this kind of error has the insidious characteristic of remaining latent before emerging in the process;
− From the human operator's point of view, representational conflicts are the consequences of a lack of knowledge of the automation [13]: the human operator often fails to behave accurately (i.e. take over vs. reconnect the automation). Moreover, the mode transitions of the automation are rarely detected or understood by operators and may have catastrophic consequences [14];
• These systems are not suited to the operators, because their designers, coming from an engineering tradition, have a very logical and procedural vision of human reasoning that does not take the end users into account [12];
• The role and the authority of the agents are pre-defined and fail to adapt in abnormal operation. Moreover the poor design of the HMIs (e.g. there is no
dialogue between artificial and human agents) and the complexity of the interactions may lead the aircrews to lose situation awareness [15].
2 Concepts for Human Operator – Unmanned Vehicle Interactions

The definition and the dynamics of the shared authority among a human agent and unmanned vehicles are a major challenge for avoiding conflicts. Consequently, the aims of the work are to:

• detect any unexpected change in authority;
• identify the agents that are concerned by the change, as well as the consequences of the change;
• adjust the artificial agents' plans or give advice to the human agents so that conflicts are avoided or solved.
The approach that is proposed is to ground the study of authority sharing dynamics on objective components of the mission. More precisely, the concept of resource is considered the key concept; e.g. tasks within the plans of the unmanned vehicles are considered as resources that may or may not be available for further tasks [16]. Authority conflicts between agents can therefore be defined as resource conflicts, and a Petri net based model of resources allows them to be clearly modelled (see Section 3). The concepts described in the next sections will be used [18].

2.1 Conflict Detection: Situation Assessment

The situation assessment task [17] constantly analyzes the current state of the system; it compares the expected results of the actions performed by the agents and the operator with the actual results and detects gaps that may appear. Moreover, situation assessment estimates the possible future states of the system, according to the action plan and evolution models of the environment, of the system itself, and of all other relevant objects. This allows potential conflicts to be detected. A conflict represents a mismatch between a plan of actions and its execution. Unexpected events coming from the environment can make the plan outdated; this is a conflict with the environment. If the plan shows inconsistencies due to an input of the operator, this is a conflict between the agents and the operator. A third objective of situation assessment is the recognition of procedures initiated by the operator. The only information about an operator's intentions is provided by her/his inputs into the system. However, if a pattern is recognized from these inputs and can be associated with one or several procedures known by the agents, this constitutes valuable knowledge about the non-explicit goals of the operator and may contribute to anticipating her/his future actions.

2.2 Planning and Task Allocation

Planning is one of the key tasks the agent should be able to execute. It lets the agent create structured lists of actions to perform, in order to achieve complex goals while satisfying the mission constraints. To do so, a model of the possible actions must be
provided to coordinate them in a logical manner: for instance, task B cannot be executed as long as task A is not completed, so the condition “done(task_A)” is a precondition (or: a resource) for task B. As the agent has to react to unexpected events occurring during the mission, the plan of actions has to be continuously updated. This process is called replanning and is a mandatory ability of the agent; in order to be useful, it also has to respect time constraints and be executed quickly. Besides organizing the tasks that will be executed in a consistent manner, the planning process is also in charge of allocating them to the entities. For each individual task and depending on the current system situation, it assigns the task either to one or several agents or to the operator. Among the criteria considered are global performance, safety, permissions and the operator's workload, but also her/his situation awareness. Capacity models for each entity have to be provided in order to describe the nominal application conditions and a current estimation of the available resources for the tasks that each entity is likely to execute.

2.4 Conflict Detection

If conflicts that are likely to impact the mission are detected, they have to be solved. If several conflicts are detected simultaneously, they have to be prioritized according to the risk they involve. The system is designed so that the agents adapt their behaviours thanks to the replanning process and task updates. However, inconsistencies may appear, as some goals or constraints may not be satisfied. Situation assessment points out the origin of the conflicts: unavailable resources, timeouts, contradictory goals, unsatisfied constraints… Therefore choices have to be made among the tasks and goals according to the risks involved and according to who (an agent or the operator) will be able to achieve them safely. This is one of the key points of authority sharing and adaptive autonomy: task reallocation for the best possible mission achievement, under the requirement that each agent and operator within the system is aware of this reallocation and of its outcome on the mission.
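As a toy illustration of the situation-assessment loop described above, the sketch below compares the expected results of planned tasks with observed results and classifies each mismatch as a conflict with the environment or with the operator. The event representation, field names and classification rule are hypothetical simplifications, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Expectation:
    task: str
    predicted: str      # expected outcome of the planned task
    source: str         # "agent" or "operator" input that produced this step

def assess(expectations, observations):
    """Compare expected and observed task results, as in Section 2.1.

    A gap traced to an operator input is classified as an agent-operator
    conflict; any other gap is a conflict with the environment.
    """
    conflicts = []
    for exp in expectations:
        observed = observations.get(exp.task)
        if observed != exp.predicted:
            kind = ("agent-operator" if exp.source == "operator"
                    else "environment")
            conflicts.append((exp.task, kind, exp.predicted, observed))
    return conflicts

plan = [Expectation("reach_wp3", "done", "agent"),
        Expectation("patrol_zone1", "done", "operator")]
obs = {"reach_wp3": "blocked", "patrol_zone1": "done"}
print(assess(plan, obs))   # [('reach_wp3', 'environment', 'done', 'blocked')]
```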
3 Resources to Support Authority Sharing Dynamics

3.1 Resources

The first step toward a formal and operational definition of adaptive autonomy is to model the basic concepts of a mission operated by physical agents and human operators. Our approach consists in modeling the resources needed for mission accomplishment and how they may be shared (among the operators, the agents and the environment). Authority sharing during the mission consists in detecting conflicts on the resources, assessing their impact on the mission and managing them in order to satisfy the objectives while meeting criteria like security or performance. We consider a mission as a set of resources that are arranged in time so that the goal of the mission can be reached. Resources include physical objects, energy, permissions, information, algorithms, logic conditions and tasks. The plan to reach the objectives of the mission is built from the resources and the constraints that connect them with each other.
3.2 Resource: A Generic Petri Net Model

Resource Net. Resources interact with each other as a result of the allocation process, with a using resource on one side and one or several used resources on the other side. However, interactions affect resources and their internal states in different ways depending on their characteristics. In order to model the different kinds of resource interactions, we have identified several properties, considered from the used resource's point of view:

• a consumable resource is available in a limited quantity; once it is spent, no using resource can use it anymore; a non-consumable resource is restored in its entirety after use;
• a shareable resource can be allocated to several using resources simultaneously; an unshareable resource can be allocated to only one using resource;
• a resource is preemptable if, while it is used by one using resource, it can be taken by another one, thus rejecting the first; notice that a resource cannot be both shareable and preemptable.
Fig. 1. A generic resource Petri net
Fig. 2. A generic interface Petri net
The resource net (see Fig. 1) is a generic representation that shows (1) the properties of the resource and (2) the three possible states of the resource: absent; present and unallocated (not used); allocated (used). The transitions from one state to another are triggered by external events coming from the situation assessment function or from other resources, changing the marking of the net according to the properties of the resource. If the resource is:

• shareable (4): path A is used. The resource can be allocated several times and released, until the last resource using it releases it, leaving it unallocated;
• unshareable (3), not preemptable (2): path B is used. The resource cannot be reallocated until the unique using resource releases it;
• unshareable (3), preemptable (1): path C is used. After allocation to a first using resource, the resource can be preempted and reallocated to another one. Consequently the first using resource is rejected;
• consumable (5): paths A, B and C. When the resource is spent, it cannot be allocated anymore.

Interface Net. The interface net (see Fig. 2) matches the availability of the used resource (offer) with the request of the using resource. This results in the allocation process: when offer and request are matched, that is, when the used resource is available for allocation and at the same time the using resource requests it, the allocation transitions within the interface net and the used resource net are simultaneously fired through transition fusion. Additionally, the interface net shows the status of a request: absent, pending or being satisfied. As the dependency relationship of the using resource is specific to each needed resource, there are as many interface nets as resources required by the using resource.
A using resource is characterized by its dependency upon the used resource(s), which is set in the initial marking:

• initialization-dependent (1): the using resource needs the used resource only to become available; if the used resource disappears, the using resource remains. In the interface net, path A is used. The allocation of the used resource triggers the production of the using resource, making it available for further use. Simultaneously, its need for the used resource disappears: its request is cancelled;
• presence-dependent (2): in the interface net, path B is used. The allocation triggers the production of the using resource as before. However, the used resource must remain allocated to the using resource for it to remain available. If the used resource disappears (failure, preemption by another using resource), the using resource is cancelled;
• end-dependent (3): in the interface net, path C is used. The using resource needs the used resource to completely satisfy its request so as to become available, thus releasing the used resource.
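To give a concrete feel for the resource states and properties above, the sketch below mimics the three states (absent, unallocated, allocated) and the property-dependent allocation paths in plain code rather than as a Petri net. The class layout and method names are illustrative assumptions, not the authors' model; in particular, the Petri net marking and transition-fusion mechanics are abstracted away.

```python
class Resource:
    """Toy model of a used resource: absent / unallocated / allocated."""

    def __init__(self, shareable=False, preemptable=False, consumable=False):
        assert not (shareable and preemptable)  # excluded by the model
        self.shareable, self.preemptable, self.consumable = \
            shareable, preemptable, consumable
        self.state = "unallocated"              # resource is present, not used
        self.users = []

    def allocate(self, user):
        if self.state == "absent":
            return False
        if self.state == "allocated" and not self.shareable:
            if not self.preemptable:
                return False                    # path B: wait for release
            rejected = self.users.pop()         # path C: preempt current user
            print(f"{rejected} rejected by preemption")
        self.users.append(user)                 # path A covers sharing
        self.state = "allocated"
        return True

    def release(self, user):
        self.users.remove(user)
        if not self.users:                      # last user gone
            self.state = "absent" if self.consumable else "unallocated"

r = Resource(preemptable=True)
r.allocate("task_A")
r.allocate("task_B")    # prints: task_A rejected by preemption
```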
4 Experimental Environment and Scenario

In order to test our approach to adaptive autonomy and shared authority in concrete applications, a framework for experiments in real conditions, with human operators interacting with “autonomous” vehicles, is already being designed.

4.1 The Scenario

The scenario (cf. Fig. 3) is the localization and assessment of a fire by a UGV in a partially unknown area. The mission for the UGV and the operator consists in looking for starting fires around a factory or a facility and determining their properties (localization, size, dynamics) so that they can be quickly put out. The area is hardly accessible, dangerous and partially unknown (no precise and updated map available). Additionally, the scenario could be extended with the possibility for the UGV to carry an extinguisher. This would allow the UGV to directly put out a fire at a very early stage or to delay a fire's evolution in a given area, e.g. close to sensitive items. As the extinguisher would be very small, its use would have to be carefully chosen. Figure 3 shows the scenario. The area where the UGV evolves is divided into two parts: the start area, which is known (a map is available), and the search area, which is partially unknown:

• the known area includes obstacles to avoid, but they are localized on a map;
• the human operator has no direct visual contact with either the UGV or the outdoor environment;
• there are sensitive items in the known area, which have to be protected against the fire threat coming from the partially unknown area;
• the fires may evolve, possibly blocking known paths or endangering the UGV;
• a fire's evolution is determined by the objects that can burn;
• the access paths to the search area are limited and narrow, making access to the zone difficult.
Fig. 3. Experimental environment
Fig. 4. The experimental set-up. On the left, a beta version of the ground station; on the right, a top view of the Emaxx.
Additionally, some hazards may impair the mission:

• communication breakdowns between the UGV and the operator;
• dynamic and uncertain environment in the search area (obstacles, fires);
• possible loss of GPS positioning;
• sensor failures.
4.2 The Experimental Set-Up

ISAE is developing an experimental set-up (see Fig. 4) composed of a ground station and several UGVs (Emaxx). The UGVs may be controlled either using a remote control (in case of problems) or a graphical interface (normal use). They carry several sensors (GPS, inertial sensors, scene camera, ultrasound sensors, odometer) and are able to follow a set of waypoints autonomously. Algorithms are currently being developed to be implemented onboard in order to equip them with decision abilities (planning, situation assessment, authority sharing management).
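As a minimal illustration of the waypoint-following ability mentioned above, the fragment below steers toward successive GPS waypoints and falls back to requesting remote control when positioning is lost (one of the hazards listed in Section 4.1). All function names, parameters and thresholds are hypothetical; the actual onboard controller is not described in the paper.

```python
import math

def follow_waypoints(waypoints, get_position, send_heading, reach_radius=1.0):
    """Drive toward each (x, y) waypoint in turn.

    get_position() returns the current (x, y) fix, or None if GPS is lost;
    send_heading(theta) forwards the desired heading to the low-level control.
    Returns "done", or "remote_control_requested" on GPS loss.
    """
    for wx, wy in waypoints:
        while True:
            pos = get_position()
            if pos is None:                      # GPS loss hazard
                return "remote_control_requested"
            x, y = pos
            if math.hypot(wx - x, wy - y) < reach_radius:
                break                            # waypoint reached
            send_heading(math.atan2(wy - y, wx - x))
    return "done"

if __name__ == "__main__":
    path = iter([(0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (2.0, 2.0)])
    status = follow_waypoints([(2.0, 2.0)],
                              get_position=lambda: next(path, None),
                              send_heading=lambda theta: None)
    print(status)   # "done" once the simulated fixes reach the waypoint
```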
5 Future Work and Conclusion

We have presented the general principles and some basic concepts of an approach to operational adaptive autonomy. Using situation assessment as a conflict detector within the system (agents + operator) or between the system and the environment, it is possible to identify the key elements of the conflicts so as to solve them in a relevant manner. This is indeed the very basis of dynamic shared authority or adaptive autonomy, i.e. reallocating tasks within the system so that conflicts are solved safely with every entity being aware of what is being performed. Task reallocation will take into account the current capacities of the agents and operators, the operators' desires, the constraints of the mission and the priorities of the goals. Early conflict detection will allow agents to adapt their behaviours to the estimated operator's intentions as long as the main constraints and objectives are respected, thereby improving the overall system performance. However, whether the operator intervenes or not, the agents are still expected to have the means to react “alone” to key issues. Another aspect of adaptive autonomy is the fact that agents should be able to alleviate the operator's workload, e.g. relieving her/him of routine tasks and letting her/him focus on the key tasks of the mission. Again this is based on mutual situation monitoring and assessment and on a better allocation of tasks and resources within the system when the context changes. Current work focuses on a formal definition of mission execution, with the dynamic aspects of the basic concepts we have defined (goals, resources, constraints, tasks), and on a fine identification of what precisely is involved in task reallocation. At the same time, experiments, first with one Emaxx UGV, will be prepared at ISAE to assess our concepts for adaptive autonomy in real conditions. Reliability, overall performance and the operator's satisfaction will be among the observed criteria.
References 1. Callantine, T.J.: Activity tracking for pilot error detection from flight data. Technical report, NASA (2002) 2. Heymann, M., Degani, A.: On formal abstraction and verification of human-machine interfaces: the discrete event case. NASA Technical Memorandum (2001)
3. Chaudron, L., Dehais, F., Le Blaye, P., Wioland, L.: Human activity modelling for flight analysis. In: Proceedings of HCP 1999, Intl. Conf. on Human Centered Processes, Brest, France (September 1999)
4. Rizzo, A., Bagnara, S., Visciola, M.: Human error detection processes. International Journal of Man-Machine Studies 36, 253–259 (1987)
5. Allwood, C.M.: Error detection processes in statistical problem solving. Cognitive Science 8, 413–437 (1984)
6. Wioland, L.: Etudes des mécanismes de protection et de détection des erreurs. Contribution à un modèle de sécurité écologique. PhD thesis, Université Paris V (1997)
7. Castelfranchi, C.: Conflict ontology. In: Müller, H.-J., Dieng, R. (eds.) Computational Conflicts - Conflict Modelling for Distributed Intelligent Systems, pp. 21–40. Springer, Heidelberg (2000)
8. Dehais, F., Tessier, C., Chaudron, L.: Ghost: experimenting countermeasures for conflicts in the pilot's activity. In: IJCAI 2003 Conference, Acapulco, Mexico (August 2003)
9. Amalberti, R.: Une réflexion sur le rôle des hommes dans les systèmes intelligents et automatisés. In: Le rôle de l'être humain dans les systèmes automatisés intelligents, Varsovie, Pologne, RTO HFM (2002)
10. Reason, J.: Human Error. Cambridge University Press, Cambridge (1990)
11. Sweet, W.: The glass cockpit. IEEE Spectrum, pp. 30–38 (1995)
12. Amalberti, R.: La conduite des systèmes à risques. PUF (1996)
13. Sarter, N.B., Wickens, C.D., Kimball, S., Marsh, R., Nikolic, M., Xu, W.: Modern flight deck automation: pilots' mental models and monitoring patterns and performance. In: Proceedings of the International Symposium on Aviation Psychology, Dayton (2003)
14. Mumaw, R., Sarter, N., Wickens, C.: Analysis of pilots' monitoring and performance on an automated flight deck. In: International Symposium on Aviation Psychology, Columbus, Ohio (2001)
15. Dehais, F., Goudou, A., Lesire, C., Tessier, C.: Toward an anticipatory agent to help pilots. In: Proceedings of the AAAI Fall Symposium, Arlington, Virginia, USA (2005)
16. Mercier, S., Tessier, C., Dehais, F.: Basic concepts for shared authority in heterogeneous agents. In: AAMAS 2008 Workshop on Coordination, Organisations, Institutions and Norms in Agent Systems (COIN 2008), Estoril, Portugal (2008)
17. Lesire, C., Tessier, C.: A hybrid model for situation monitoring and conflict prediction in human supervised “autonomous” systems. In: Proceedings of the AAAI 2006 Spring Symposium “To Boldly Go Where No Human-Robot Team Has Gone Before”, Stanford, California (2006)
18. Mercier, S., Tessier, C., Dehais, F.: Adaptive autonomy for a human-robot architecture. In: 3rd National Conference on Control Architectures of Robots (CAR 2008), Bourges, France (May 2008)
Ergonomic Analysis of Different Computer Tools to Support the German Air Traffic Controllers Muriel Didier, Margeritta von Wilamowitz-Moellendorff, and Ralph Bruder Institute of Ergonomics, Darmstadt University of Technology, Petersenstrasse 30, D-64287 Darmstadt, Germany [email protected]
Abstract. The Institute of Ergonomics of the Darmstadt University of Technology supported the German Air Traffic Control in analyzing how the implementation of new tools influences the work procedures of the air traffic controllers and if a modification of the work structures is necessary. New tools, such as “Data link” (contact with the pilot via an electronic connection), the Problem Display Window and the Main Data Window that replaces the former flight strips, were tested in simulations. The goals of the study were to select the most appropriate investigation method to be used, taking into account the specific features of the controllers’ work, to test different tools in a simulated environment and to analyze the impact of the new tools on the air traffic controllers. Keywords: ergonomic analysis, air traffic controller, eye movements.
1 Introduction: The Air Traffic Controllers' Activities

Eurocontrol estimated that air traffic movements in the year 2015 will be twice as high as in the year 1999. Air traffic has continuously increased over the last decades and will continue to do so in the future, which has a large effect on Europe. The national organizations have to adapt to the evolution of air traffic to ensure the high reliability of air control activities. The German Air Traffic Control (DFS) continuously develops and optimizes the tools, the work structures and the work procedures of the air traffic controllers to cope with the changes in air traffic. One of the major issues when developing new tools or procedures is to understand the impact they will have on the activities of the air traffic controllers and their mental processes, and to evaluate whether the expectations will be met. In this context, the Institute of Ergonomics (IAD) of the Darmstadt University of Technology in Germany supported the DFS in analyzing how the implementation of the new tools influences the work procedures of the air traffic controllers and whether a modification of the work structures is necessary.

Air traffic controller activities. Air traffic controllers currently work in a team, consisting of one “planner controller” and one “executive controller”, responsible for one sector. The role of the “planner controller” (also called “planner”) is to monitor the
air traffic in his sector as well as in the adjacent sectors, to anticipate the airplane movements in order to avoid conflicts, and to choose the most adequate flight route for each airplane. The “executive controller” (also just called “executive”) is responsible for informing the pilots about any route modification in his sector that the planner has decided. A sector is not only defined horizontally as a certain part of the airspace, but also vertically, separating the sky into “high sectors” and “low sectors”. To keep up with the increase in air traffic, new tools are regularly tested in simulations to evaluate their influence on the activities of the controllers. This paper reports on the impact of different tools that were evaluated in the year 2008:

• The introduction of “Data Link” for 35% of the simulated air traffic, which is representative of the expected share of airplanes using Data Link in the near future. Data Link makes it possible to inform the pilot about route changes via an electronic connection instead of a radio-voice connection. It decreases the amount of verbal communication performed by the executive controller.
• The introduction of a Problem Display Window (PDW). The PDW shows potential conflicts of airplane trajectories on a coordinate system that represents the distance between the airplanes and the time before they enter the predefined safety zone. The main goal of this display is to support the planner when monitoring the future air traffic.
• The use of a Main Data Window (MDW), where all the information about each flight is presented on a separate monitor, replacing the former paper flight strips.
2 Choice of the Investigation Method

The task of air traffic control is mainly related to mental activities. Air traffic controllers have to detect the airplanes, analyze their trajectories, plan the near future, make decisions and transmit information to the pilots. The main objective is to permanently ensure the safety distance between airplanes, but a second objective that receives more and more weight is to support the pilots in choosing the optimal trajectory in order to reduce fuel consumption. Several methods are available to analyze mental activities. Subjective measurements such as self-reports [1,2], performance measurements [3], body posture/movements [4], and objective measurements such as physiological indicators [5] are typical examples of methods often applied to analyze mental activities. However, some of those approaches partially suffer from limitations in predictive power and general applicability [6]. Another possibility to analyze mental activities is the recording of eye movements [7]. The eye is a sensory/perceptual organ for gathering information about our environment. Once processed, the collected information is used for further mental processes. The gaze pattern, based on the observation of fixations and saccades (jumps between the fixations), provides information on attention, interest, possible motivations, and intentions. This non-verbal and non-invasive method is often mentioned in the literature for collecting information on the mental processes of a subject. Just and Carpenter [8] developed one of the first theories that clearly associated gaze movements with the underlying cognitive processes.
Fig. 1. Eye tracking system (SMI)
The analysis of gaze movements has been successfully used in different research fields (airplanes, vehicles, usability, etc.). The method does not interrupt the activities of the subject and lets the subject move freely. This is a big advantage in comparison to other evaluation methods of mental processes, and especially important for air traffic controllers. But the method also has some limitations: for example, it cannot be guaranteed that looking at an object is systematically followed by the mental processing of this object. However, in the case of the air traffic controllers, the study focuses on the comparison between different electronic tools and their impact on the work procedures, and not on a deep understanding of the mental processes. In the case of gazes that are not followed by a processing of the information, we can assume that they are coincidental and happened at random in all conditions. Additionally, the DFS recorded supplementary data: a simplified version of the NASA-TLX, some DFS-specific situation awareness questionnaires, debriefing interviews, and log files of the technical simulation parameters (such as number, type, and direction of airplanes).

The eye tracking system. A head-mounted eye tracker (from SMI) was chosen, due to the numerous body and head movements of the controllers during their work. The selected system is a helmet equipped with an eye tracking system (Fig. 1) based on the cornea reflex method. The output of the eye tracker is an MPEG video of the viewed scene on which the fixations are superimposed in the form of a colored circle.
3 Experimental Set Up

Simulation of air traffic situations. The study took place in the air traffic simulator of the DFS. The room is very similar to a normal control room and the whole electronic equipment is identical to the real workstations. A simulation session lasts between 5 and 10 days; each day, between 2 and 3 scenarios, named “runs” (one-hour-long simulation scenarios), are evaluated. Several runs are used as training runs to ensure that the functions and the manipulation of the new tools are correctly understood.
The results presented in this paper are based on one simulation session of 10 days that took place in 2008. During this simulation 20 runs were conducted, 7 of which were training runs. The main objective of the Institute of Ergonomics of Darmstadt during this simulation session was to analyze the influence of three different tools on the activities of the controllers: Data Link, Main Data Window (MDW) and Problem Display Window (PDW). To analyze the use of those tools, two head-mounted eye-tracking systems were set up. This configuration allowed the simultaneous collection of the eye-movement patterns of a working team (“executive controller” and “planner controller”), which increases the reliability and the quality of the analyses. Two test supervisors attended the runs and monitored the eye trackers.
3 Collection and Preparation of the Data

The first step of the analysis consisted in manually encoding the videos with the help of the software “Interact” (from Mangold). The second step of the analysis was performed with the statistical program “Statistical Package for the Social Sciences” (SPSS).

First step: Encoding. The encoding of the videos delivers information about the duration, the frequency and the sequence of some pre-defined events. The quality of the analysis is highly dependent on the choice of the events and on the precision of the encoding. Based on previous experiments and on the specific questions of this simulation session, the fixation areas were defined as follows (Fig. 2): radar, PDW, MDW, radar (partner), MDW (partner), partner, displays (other controllers), other controllers, radio, phone, notes, NASA-TLX, miscellaneous. Moreover, some events related to the type of activity were defined: tasks related to control activities or planning activities, and whether they were related to actual or future traffic.
Fig. 2. Work place of an air traffic controller with pre-defined fixation areas for the encoding
Second step: Statistical analysis. In such simulation sessions, the number of subjects (between 8 and 14) as well as the number of trials (between 20 and 60) is quite limited, due to the availability of the simulation infrastructure and personnel. Moreover, a controller is qualified only for a group of sectors, which further reduces the number of potential subjects. These limitations explain why only descriptive statistics were performed for the analysis, and some results need to be confirmed in further investigations. However, past experience has shown that the analysis performed in such simulation sessions is very valuable for the development and improvement of the working place of air traffic controllers.
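To make the two-step pipeline concrete, the sketch below aggregates encoded events, as one might export them from the annotation software, into the average number of activities per run and the average duration per activity, i.e. the kind of descriptive statistics reported in the next section. The event format, field names and values are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical encoded events: (run_id, activity_type, duration_in_seconds)
events = [
    ("run01", "control", 4.2), ("run01", "planning", 7.9),
    ("run01", "control", 3.1), ("run02", "control", 5.0),
]

def describe(events, activity):
    """Average number per run and average duration per event for one activity."""
    counts, durations = defaultdict(int), defaultdict(float)
    for run, act, dur in events:
        if act == activity:
            counts[run] += 1
            durations[run] += dur
    runs = {r for r, _, _ in events}
    avg_number = sum(counts.values()) / len(runs)
    avg_duration = sum(durations.values()) / max(sum(counts.values()), 1)
    return avg_number, avg_duration

print(describe(events, "control"))   # approximately (1.5, 4.1)
```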
4 Results

Before presenting the results, some explanation of the analyzed variables is needed. It is necessary to distinguish between control activities and planning activities during the use of the Air Situation Window (ASW or radar). Control activities are tasks that modify the course, altitude or speed of an airplane (e.g. activation of a label menu). The controller actively modifies the airspace configuration. Planning activities consist of the tasks that serve the generation of information and the development of alternative solutions. The controllers take in information on the state of one or many airplanes without modifying any parameters (e.g. the distance between two airplanes). For both control and planning activities, a further distinction can be made between activities oriented toward the actual or the future air traffic. The actual traffic is defined as the airplanes that are directly under the responsibility of the controller because they are in his sector. The future traffic is defined as the airplanes on which an activity is accomplished although they are still outside the controlled sector. Another variable that needs to be clarified is the already mentioned difference between a high level sector and a low level sector. In a high level sector, in contrast to a low level sector, the airways are no longer predefined with routes and nodes; rather, the pilots are free to choose the shortest way to reach their objectives. This is called the “direct-routing structure”. Thus, the air traffic in a high level sector is distributed over the complete airspace. The airplanes do not have to cross certain predefined nodes, which should reduce the number of conflicts that the controller has to handle. This decrease should also be reflected in the number of control activities of the controller: the controllers do not need to modify the course, altitude or speed of the airplanes.

Hypothesis flight level and control activities: In the high level sector, the number of control activities will be lower than in the low level sector. As expected, the average number as well as the average duration of control activities are lower for the high level sector than for the low level sector (Figure 3). This shows that a direct-routing structure requires less active intervention of the controller in the airspace configuration.
Fig. 3. Average number (left) and average duration (right) of control activities related to flight level
A further question was whether this effect could be observed for the future as well as for the actual traffic, or whether there were any differences. The results show that for the actual traffic there is a decrease in the high level sector similar to the one observed for the control activities overall, whereas the future traffic displays no major difference between the flight levels.

Discussion: the results confirm that the implementation of a direct-routing structure reduces the proportion of control activities. With this structure the number of interventions and modifications of the airspace configuration is reduced. This is not the case for the future traffic, because the main control activity concerning the future traffic is the activation of a label to “accept” the control of an airplane. This activity is not influenced by a change of the routing structure: all airplanes have to be “accepted” into the controlled sector independently of the route structure.

Hypothesis flight level and planning activities: In the high level sector, the number of planning activities will be higher than in the low level sector. Because the pilots of airplanes in high level sectors decide on their own route, it is more difficult for the air traffic controller to foresee the course of an airplane, although he is still responsible for regulating the traffic such that no airplane enters a conflict zone (defined over the time gap or the distance between two airplanes). To complete their tasks, the controllers have to modify the way they normally work to perform the planning activities. The results in Figure 4 confirm the hypothesis that in the high level sector the controllers perform more planning activities and that their duration is longer than in the low level sector.
Fig. 4. Average number (left) and average duration (right) of planning activities related to flight level
Fig. 5. Average number (left) and average duration (right) of planning activities related to flight level and actual/future air traffic
The analysis of the actual and future traffic confirms that there is an increase of activities in the high level sector, but only for the future traffic (Figure 5). For the actual traffic the values are similar, with slightly lower scores in the high level sector.
Fig. 6. Average number (left) and average duration (right) of control activities related to Data Link
Discussion: with a direct-flight-structure the list of nodes over which the airplanes have to fly disappears, so that the controller cannot foresee, as precisely as needed, where the airplanes are going to enter the controlled sector. The controllers have to acquire more information on the routes of airplanes that are outside their own sector before they can accept them. For the planning activities in high level sectors the controllers have to collect more information on the route to ensure that the minimum time gap and distance limits between airplanes will not be violated.

Another focus of the study was the influence of using Data Link (a digital connection to the pilot) as a communication medium between the air traffic controllers and the pilots, compared to radio-voice communication. Hypothesis Data Link and control activities: The number and duration of control activities will increase with the introduction of Data Link. The results show an increase (about 17%) of control activities when about 35% of the airplanes are equipped with Data Link (Figure 6). Even though the main control activities are not controller-pilot communications, it is possible to conclude that there is a link between the introduction of Data Link and the increase of control activities, although the increase is not as high as expected.

Another effect of the introduction of Data Link was observed when considering the number and duration of direct verbal communication with controllers outside the team. The controllers communicate within the team as well as with teams from neighboring sectors (when the distance allows it) to agree upon or confirm decisions and to prevent or support when conflicts arise. The results show a clear decrease in this communication when 35% of the airplanes are equipped with Data Link (Figure 7). It seems that the transfer of information to the pilot over an electronic medium offers the controllers of neighboring sectors the possibility to directly collect information on the status of changes (open, confirmed, refused) or on approaching potential conflicts that could require support.
Fig. 7. Average number (left) and average duration (right) of direct verbal communication with controllers from the neighboring sectors related to Data Link
To support the controllers in preventing conflicts from arising, a "Problem Display Window" (PDW) was introduced: a small window within the ASW (radar) in which the time- or distance-critical parameters of two airplanes are displayed. The PDW was used on average about 10 times per run. With such a low frequency, it is difficult to make comparisons with the whole number of activities. A second difficulty resides in the very large differences observed between the runs: from 0 glances to more than 40 in a run. No link could be found to the type of run or to the number of critical situations. Individual differences in control strategy or in the learning phase could explain these differences, although it was not possible to discover, in particular due to the small number of glances, which factors (e.g. controller/planner, actual/future air traffic, high level/low level sector, years of experience of the controllers…) influenced these results. Further investigations are needed to understand why such large differences can be observed.
5 Conclusion

Three major results are presented in this paper. Based on the eye-tracking data it could be demonstrated that:

• There are links between the patterns of eye movements and the type of air traffic structure: in the "high level sector", in which the airplane can choose its own route based on a direct-flight-structure, there is a transfer of the controllers' activities.
There are fewer control activities but more planning activities, due to the necessity to more frequently collect information about the routes of the airplanes that are going to enter the controlled sectors.
• The use of "Data Link" by 35% of the controlled airplanes slightly increases the number and the duration of control activities compared to the formerly used "radio contact". However, at the same time, it reduces the number and duration of direct verbal communications with teams from neighboring sectors. Further investigations with a prevalent use of Data Link, analyzing all types of verbal communications in detail, should be performed.
• Related to the use of the PDW, it was not possible to establish a link to factors that could influence the observed results. The use of the PDW seems to depend on individual control strategies, which could be interpreted as a weakness of its design: if only a few controllers are using it, this is probably because it does not provide the right information at the right time.

The analyses confirmed some expected results, but also showed some unexpected findings, which should lead to a further improvement of future tools for air traffic controllers. Additional investigations have to be conducted to ensure that the results are not coincidental, given the small number of subjects, and that the improvements have a positive impact on the working structure.
Behavior Model Based Recognition of Critical Pilot Workload as Trigger for Cognitive Operator Assistance Diana Donath and Axel Schulte Universität der Bundeswehr München (UBM), Department of Aerospace Engineering, Institute of Flight Systems (LRT-13), 85577 Neubiberg, Germany {diana.donath,axel.schulte}@unibw.de
Abstract. Knowledge-based assistant systems are an approach to support operators in complex task situations, especially in vehicle guidance and control. The central idea is to introduce automation functions working in parallel to the human operator instead of replacing him. Like a human team member, an assistant system should be able to support a human operator according to his actual needs. Therefore, it needs capabilities to identify situations in which the human operator is overtaxed, in order to transfer such situations into situations which can be handled normally by the assisted human operator. This paper presents a concept for a human behavior model based approach to subjective workload identification, which uses a recognizable modification of human behavior occurring prior to severe performance decrements or errors. To this end, behavior models of the operator, previously gathered within simulator trials, shall be continuously compared with the actually observed behavior patterns in the same situational context. First results will be presented showing a modification of operator visual scanning behavior.

Keywords: subjective workload, assistant system, eye movements, flight guidance, adaptive automation, behavior model.
1 Introduction

Probably one of the most challenging questions in the context of assistant systems is when a human operator really needs to be supported by the automation. A common approach is to provide adequate assistance functions according to the actual subjective workload level, to prevent unbalanced workload conditions which might otherwise result in unwanted performance decrements or the occurrence of failures. Various approaches have been investigated trying to assess the state of the operators' workload. These approaches range from estimations solely based upon the task load, e.g. caused by the occurrence of critical events, to the on-line assessment of the operators' state by measuring performance or psycho-physiological parameters. Other approaches rely on models of human operator resources. For more information refer to [8, 13, 16]. This paper presents a behavior model based approach. The assumption is that a significant change of human behavior patterns, associated with a critical workload rise, will be observable even prior to severe performance decrements or human erroneous action.

D. Harris (Ed.): Engin. Psychol. and Cog. Ergonomics, HCII 2009, LNAI 5639, pp. 518–528, 2009. © Springer-Verlag Berlin Heidelberg 2009
The following chapter will briefly describe our general approach to cognitive operator assistance. Then the approach to behavior model based workload determination will be presented. Experimental findings conclude the paper.
2 Cognitive Operator Assistance

At the UBM the approach of so-called cognitive operator assistant systems has been followed since the early 1990s. During that long period of research and development, various prototypes have been developed and successfully field tested (cf. CASSY, e.g. [12]; CAMA, e.g. [5]). In [11] the general approach to assistant systems is elaborated in a broader context. Such an assistant system is characterized by having its own situational understanding and working in parallel to and in cooperation with the human operator on the fulfillment of a common work objective. In contrast to conventional automation, which needs to be supervised by a human operator, a cognitive operator assistant system is able to act on its own initiative. Therefore, it continuously assesses and interprets the environment against the background of the overall work objective. For a cooperative interaction with the human operator, [11] suggested some general guidelines for assistant systems:

1. "The assistant system has to be able to present the full picture of the work situation from its own perspective and has to do its best by own initiatives to ensure that the attention of the assisted human operator(s) is placed with priority on the objectively most urgent task or subtask."

2. "If according to requirement 1 the assistant system can securely identify as part of the situation interpretation that the human operator(s) cannot carry out the objectively most urgent task because of overtaxing, then the assistant system has to do its best by own initiatives to automatically transfer this situation into another one which can be handled normally by the assisted human operator(s)."

For the fulfillment of the second basic requirement for human-automation interaction, the on-line determination of the operator workload is crucial. Earlier approaches in the field of knowledge-based pilot assistant systems rely solely on the assumption that the subjective workload rises with an increase of the objectively given task load. This results in proactive assistance functions offering, e.g., automatic mission planning support whenever an external mission event requires re-planning of the mission, no matter whether this supportive function is needed by the pilot in that particular situation or not. Furthermore, assistant systems like CASSY or CAMA so far offer only reactive support, i.e. when an erroneous action of the flight crew has already taken place, resulting in a deviation of the observable course of events from the desired plan. Fig. 1 depicts the general idea of using a pilot behavior model as implemented by [14]. CASSY and CAMA were mostly trying to detect deviations of the aircraft behavior from the desired behavior, although claiming the usage of pilot behavior models. This leads to a design which in general tries to correct already-made errors once they become visible to the assistant system, either by observing the aircraft behavior or, to a minor extent, the pilot control actions.
Fig. 1. Pilot Model and Intent & Error Recognition within CASSY/CAMA as triggering mechanism for assisting functions
In addition, and for the sake of an earlier and more sophisticated intervention of the assistant system, this paper advocates an approach of using pilot behavior models for early excess workload identification. The following chapters will provide an overview of the concept and of the acquisition of empirical behavior models.
3 Concept of Behavior Model Based Workload Determination

In order to justify the proposed approach to more efficient human-automation interaction on the basis of excess-workload-triggered early intervention of an assistant system, a few assumptions have to be made:

• Human performance rapidly degrades at high subjective workload levels, i.e. the probability of human erroneous action goes up when coming close to what some authors call the red-line of workload [1].
• The behavior human operators exhibit in work situations in terms of interactions (information gathering and control, cf. [4]) is, although heavily task dependent, rather stable in normal workload conditions.
• The behavior human operators exhibit under higher subjective workload conditions (i.e. approaching the red-line) changes considerably from normal behavior.

The last point is very much in accordance with an everyday experience familiar to most of us who are expert car drivers ourselves: when riding as the front-seat passenger in someone else's car, we can most likely identify excessive subjective workload situations just by observing the driver's interactions with the car in the given situation. Especially if we already know the driver's normal driving style, the detection of deviations resulting from additional tasks or excessive workload is very obvious.

An influence of workload on the visual scanning behavior of pilots, evoked for example by the introduction of secondary tasks, has already been investigated in several studies. [10] stated that dynamic moment-to-moment workload can be assessed by the analysis of eye movement patterns in different tasks. Their hypothesis was that if the individual's workload increases, time pressure will force a modification of the pattern of visual scan. Changes in such information gathering strategies imply that the operator is load-shedding or otherwise attempting to reduce the overall cognitive workload.
Fig. 2. Using human behavior models within an assistant system
Studies on scan patterns carried out by [9] revealed that increased workload is reflected in longer dwell times in each position and in the use of a smaller number of display elements. In addition, the scan pattern became much more variable between display elements. [3, 6, 18] analyzed the entropy (randomness) of visual scanning behavior. They found that patterns become less random as workload increases. These results were also confirmed by [7] within an experiment on the scanning strategies of air traffic controllers. [2] analyzed the distribution of eye fixations by using the nearest neighbor index as an indirect index of mental workload. He found that the lowest spatial dispersion in a given task situation correlates with the lowest subjective workload perceived by the pilots.

In our concept of behavior model based workload determination, we now try to capture the normal behavior patterns, i.e. the interactions of the human operator with the system. We expect these behavior patterns to be

• heavily task dependent (in different task situations, different interactions will be observable, e.g. concerning eye movement patterns for information gathering or manual interactions for control),
• individually different (in fact, we have indication from experimental findings that there are inter-individual similarities [17], the consideration of which goes beyond the scope of this work at the moment),
• specific to the interaction resources we consider (e.g. visual perception observed by eye-movement measurement).

With such behavior models available within an assistant system, the system gains the ability to intervene in case of a detection of modified behavior. Therefore, the assistant system continuously compares the expected human behavior, represented by a set of human behavior models for dedicated task situations within the database, with the actual human behavior in the form of pilot interactions in the context of the current situation as determined by the Pilot Expert (PE) (see Fig. 2). Here, the following cases may occur:
• Based upon the continuous determination of the current task context, a central function of each assistant system, the assistant system can anticipate the expected interactions of the human operator (e.g. concerning eye movement patterns).
• In parallel, the assistant system should continuously observe the actual operator's interactions (e.g. via eye movement measurement) and try to identify whether or not they are in accordance with the anticipated behavior patterns.
• In case of an identified match between the actual human behavior and the expected behavior, a further differentiation is necessary, considering whether the match belongs to a standard behavior of the pilot within normal workload conditions or to a known, previously assessed workload situation:
− If the match belongs to a standard workload condition, there is no need for an intervention of the assistant system.
− If the match belongs to a known unbalanced, high workload condition, the assistant system should intervene according to the second requirement of assistant systems, for example by a reallocation of tasks between human and machine.
• In case of an identified mismatch between expected and observed behavior, either of the following cases can be assumed. The operator is working
− on a different or additional task not known to the assistant system, potentially resulting in an increase of subjective workload, or
− on the expected task, but suffers from psycho-physiological degradations such as fatigue, resulting in a change of the operator behavior.

For an online detection of modified human behavior triggered by a critical workload situation, the assistant system needs human behavior models which represent human behavior in normal/standard task situations as well as within different kinds of workload conditions, especially critical workload conditions. These models have to be gathered within simulator trials for each individual operator. A sketch of the comparison step described above is given below.
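As an illustration of this comparison step, the following sketch matches an observed fixation-transition matrix against a database of stored behavior models. It assumes each model is summarized as a transition probability matrix P(Sj|Si); the total-variation distance and the threshold are illustrative choices, not the mechanism actually implemented in CASSY/CAMA.

import numpy as np

def tv_distance(p, q):
    # mean total-variation distance between corresponding rows of two
    # transition matrices; 0 = identical, 1 = completely disjoint
    return float(np.abs(p - q).sum(axis=1).mean()) / 2.0

def match_behavior(observed, models, threshold=0.15):
    # models: dict mapping (task, workload_level) -> transition matrix
    best = min(models, key=lambda key: tv_distance(observed, models[key]))
    if tv_distance(observed, models[best]) > threshold:
        return None  # mismatch: unknown task or degraded operator state
    return best      # match: normal or known high-workload condition

A returned key whose workload level marks a known critical condition would then trigger an intervention, whereas None corresponds to the mismatch cases listed above.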
4 Experimental Model Acquisition

In order to obtain task-specific, or even subjective-workload-related, operator behavior models, experiments had to be conducted. During these experiments, subjects were exposed to certain relevant and controllable work situations while their behavior was observed and recorded. In accordance with our application domain of flight guidance, a military helicopter transport mission was taken as an example.

4.1 Experimental Design

The main idea of the experiment was to expose subjects to reproducible task situations, both normal/standard situations and situations characterized by different subjective operator workload conditions. To achieve a change in human operator workload, accompanied by a change in human behavior according to our previously stated assumption, these task situations were slightly varied, for example by increasing the task complexity. In accordance with the prior findings mentioned in chapter 3, our investigations of human behavior were limited to visual scanning behavior. During the experiment the
increase of task complexity was caused by a degradation of meteorological conditions, comprising reduced visibility coupled with heavy turbulence. Furthermore, the pilot was instructed to consider various mission constraints like timing requirements or altitude limitations. For a further increase of the pilot's subjective workload, secondary tasks were introduced. In this context, attention was paid to using secondary tasks which did not themselves directly influence the visual behavior of the human operator. The secondary tasks used were radio communication with ground forces and an audio response selection task, consisting of two different auditory stimuli presented at different intervals, representing radar and missile warning signals.

According to our concept, the following hypotheses were to be evaluated within the experiment:

• Human operator behavior is dependent on the current task situation.
• If the workload perceived by the human operator changes, a change in his behavior (here visual scanning behavior) occurs.

Thus, the task situation was used as the independent variable and the eye-movement behavior as the dependent variable.

4.2 Experimental Procedure

For the experiments, a fixed-base helicopter simulator cockpit was used, equipped with faceLAB, a contact-free, video-based eye movement measurement system. The subject was a military helicopter pilot at the age of 32, with a flying experience of 600 hours as helicopter pilot and 300 hours as commander. Before data collection, the pilot received a mission briefing and was familiarized with the flight dynamics and the cockpit layout. The total time of the experiment was about 45 minutes. During the mission, various performance parameters as well as eye movement data were recorded. To relate the human visual behavior to the subjective workload, the operator completed a NASA-TLX subjective workload rating [Hart & Staveland, 1988] at the end of relevant task sequences, as listed in Fig. 3.
      | fundamental task | increase task complexity                        | secondary tasks
TS1   | enroute flight   | timing constraint                               | -------------
TS2   | enroute flight   | timing constraint, degraded weather conditions  | -------------
TS3   | enroute flight   | timing constraint, degraded weather conditions  | reaction task (1 audio signal)
TS4   | enroute flight   | timing constraint, degraded weather conditions  | reaction task (2 different audio signals)
TS5   | approach         | timing constraint, degraded weather conditions  | reaction task (1 audio signal)
Fig. 3. Workload stimulation within different task situations
5 Results of Experimental Model Acquisition

The experiment was analyzed in order to answer the following questions:

• Does the aforementioned workload stimulation, caused by increasing the task complexity and by the introduction of secondary tasks, result in an observable increase of the perceived human operator workload?
• Does an increase in workload cause a change of human interaction behavior (e.g. visual scanning patterns)?

The following sections provide detailed considerations of these questions on the basis of the experimental recordings.

5.1 Subjective Pilot Workload

During the experiment, the subjective workload of the human operator was assessed at the end of each task situation listed in Fig. 3. Fig. 4 represents the workload perceived by the human operator within these different task situations. Two different basic tasks were considered, enroute flight (TS1-TS4) and approach (TS5), as well as different task situations within the basic task of enroute flight, representing slightly different workloads perceived by the pilot. TS1 reflects the basic load without any additional stimulation. As intended, the workload increases within TS2, which was characterized by an increase of the task complexity through the degraded weather condition. This might be responsible for the perceived increase in physical demand and effort. With the introduction of the audio reaction task (TS3), only a little increase of the perceived pilot workload could be observed. The introduction of a second audio signal the pilot had to react to did not lead to any further increase of the perceived workload. To get information about the effect of different basic tasks (enroute flight, approach) at equal boundary conditions (task complexity/secondary tasks) (TS3 & TS5), an additional NASA-TLX rating was carried out at the end of an approach segment (TS5). Comparing TS3 and TS5 shows only a slight decrease in workload during the approach segment as well as a slightly different composition of the workload. The only finding that seems remarkable regarding the NASA-TLX composition within the approach segment is that the timing constraint of the mission became more important for the pilot.
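For reference, the overall NASA-TLX score is conventionally computed as a weighted mean of the six subscale ratings, with weights derived from the 15 pairwise comparisons of the dimensions. A minimal sketch, assuming ratings on a 0-100 scale:

DIMENSIONS = ("mental", "physical", "temporal",
              "performance", "effort", "frustration")

def nasa_tlx(ratings, pairwise_winners):
    # ratings: dict mapping each dimension to its raw rating (0-100)
    # pairwise_winners: list of the dimensions judged more important in
    # each of the 15 pairwise comparisons; a dimension's weight is the
    # number of comparisons it wins
    assert len(pairwise_winners) == 15
    return sum(ratings[d] * pairwise_winners.count(d)
               for d in DIMENSIONS) / 15.0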
(Figure 4 charts the NASA-TLX profiles over the six dimensions Mental Demand, Physical Demand, Temporal Demand, Performance, Effort and Frustration for TS1-TS5, grouped into enroute flights and approach; overall ratings between 32% and 52% appear in the chart.)
Fig. 4. NASA-TLX Results within Task Situations (TS1-TS5)
5.2 Visual Scanning Behavior

According to our hypotheses mentioned above, we would like to identify a potential influence of workload on human visual scanning behavior. Therefore, the eye movement data related to each of the NASA-TLX ratings depicted above were analyzed. The length of each segment was limited to 90 seconds, in order to obtain a close relationship to the subjectively rated workload. For the analysis of the pilot's visual behavior, fixations were identified using the I-DT algorithm [15]. Afterwards each fixation was assigned to world objects (see Fig. 5).
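The dispersion-threshold identification (I-DT) algorithm of [15] groups consecutive gaze samples into a fixation whenever a window covering at least a minimum duration stays within a maximum spatial dispersion. A minimal sketch (the threshold values to be passed in are placeholders, not the settings used in this study):

def idt_fixations(samples, max_dispersion, min_duration):
    # samples: chronological list of (t, x, y) gaze samples;
    # returns fixations as (t_start, t_end, centroid_x, centroid_y)
    def dispersion(window):
        xs = [p[1] for p in window]
        ys = [p[2] for p in window]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    fixations, i = [], 0
    while i < len(samples):
        # initial window spanning at least the minimum duration
        j = i
        while j < len(samples) - 1 and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if samples[j][0] - samples[i][0] < min_duration:
            break  # not enough data left for another fixation
        if dispersion(samples[i:j + 1]) <= max_dispersion:
            # extend the window until the dispersion threshold is exceeded
            while j < len(samples) - 1 and dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            xs = [p[1] for p in window]
            ys = [p[2] for p in window]
            fixations.append((samples[i][0], samples[j][0],
                              sum(xs) / len(xs), sum(ys) / len(ys)))
            i = j + 1
        else:
            i += 1  # no fixation starting here: slide the window forward
    return fixations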
Fig. 5. World objects in the visual field of the helicopter simulator
In the first step, the transition probabilities between two consecutive fixations, P(Sj|Si), were examined for each task situation. Each bar depicted in Fig. 6 represents the relative frequency of a fixation transition from a world object Si to a world object Sj. Surveying the diagrams (Fig. 6), a decrease in the number of different transitions between world objects, representing a decrease in the variety of world objects scanned by the pilot, accompanied by a considerable increase of only a few transitions, was identified. This means that the more the workload perceived by the pilot increased, the more he focused his scanning behavior on only a few of the available world objects.
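A small sketch of this analysis step, assuming the fixations have already been mapped to world-object labels; the 3 percent cut-off mirrors the selection used for Fig. 7, and the labels are examples taken from the figures:

from collections import Counter

def frequent_transitions(objects, cutoff=0.03):
    # objects: chronological sequence of fixated world-object labels,
    # e.g. ["cv", "map", "cv", "mhdd1", ...]
    pairs = Counter(zip(objects, objects[1:]))
    total = sum(pairs.values())
    # relative frequency of each transition (Si, Sj), filtered by cut-off
    return {pair: n / total for pair, n in pairs.items()
            if n / total > cutoff}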
Fig. 6. All kinds of occurring transition probabilities P(Sj|Si) [< 10%] between two consecutive fixations
(Figure 7 panels: transition diagrams between world objects (cv, map, mhdd1, alt, spd) for the TS1- to TS5-segments; line styles encode transition probabilities of 3%-5%, 5%-10%, and > 10%.)
Fig. 7. Occurring transitions of two consecutive fixations, considering only transition probabilities greater than 3 percent
Within our experiment, the pilot mainly focused on transitions between map and center-vision. Even necessary information about the current speed or the radar altitude was neglected by the pilot. During the approach, transitions to the center-vision as well as to the right-vision increased. If we consider only the transition probabilities of two consecutive fixations, the observed number of possible transitions was found to be considerably reduced. Fig. 7 shows only those transitions whose occurrence was found to be higher than 3 percent. The variety of different occurring fixation chains, representing different scanning patterns of the pilot, decreased considerably with rising perceived workload. The pilot increasingly neglected the scanning of mission-relevant information, such as the current mission time, which is necessary to comply with given timing constraints, followed by an increasing neglect of the primary flight instruments.
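The variety of fixation chains mentioned here can be quantified, for example, as the number of distinct chains (n-grams) of a given length per segment; a minimal sketch:

from collections import Counter

def chain_variety(objects, length=3):
    # objects: chronological sequence of fixated world-object labels;
    # fewer distinct chains indicate more stereotyped scanning
    chains = Counter(tuple(objects[i:i + length])
                     for i in range(len(objects) - length + 1))
    return len(chains)

Comparing chain_variety(segment) across TS1-TS5 would express the reported collapse of scanning variety as a single number per segment.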
6 Conclusions and Perspective

In accordance with our stated hypotheses, we could demonstrate that the visual scanning behavior of the operator varies between different super-ordinate situations such as different flight phases (e.g. enroute, approach), but also between different task situations caused by an increase of the task complexity or the introduction of secondary tasks, which were coupled with an increase in subjective operator workload. Furthermore, a modification in scanning behavior during an increase of perceived workload was observed. This was expressed in a focusing on only a few, essential world objects under higher workload conditions. Future investigations will be undertaken analyzing occurring fixation chains of different chain lengths, to verify the collapse of higher-order chains within increased workload conditions which was discovered within video analysis of the pilot's eye-movement behavior. Furthermore, an analysis will be made which concentrates on task-specific scanning behavior, which may be expressed in different fixation patterns of different chain lengths. In a next step these data will be used for modeling individual visual scanning behavior within different task situations by using hidden Markov analysis.
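As a hedged illustration of the intended hidden Markov analysis: once a discrete model of scanning behavior has been trained for a task situation, a new fixation sequence could be scored with the scaled forward algorithm, and the log-likelihoods under the "normal" and "high workload" models compared. The parameterization below is an assumption for illustration, not the authors' implementation.

import numpy as np

def forward_loglik(obs, pi, A, B):
    # scaled forward algorithm for a discrete hidden Markov model
    # obs: sequence of observed world-object indices
    # pi: (K,) initial state probabilities, A: (K, K) state transitions,
    # B: (K, M) emission probabilities over M world objects
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    loglik = np.log(scale)
    alpha = alpha / scale
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha = alpha / scale
    return float(loglik)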
References
1. de Waard, D.: The Measurement of Drivers' Mental Workload. Thesis, University of Groningen, Netherlands (1996)
2. Di Nocera, F., Camilli, M., Terenzi, M.: A random glance at the flight deck: pilot's scanning strategies and real-time assessment of mental workload. Journal of Cognitive Engineering and Decision Making 1(3), 271–285 (2007)
3. Ephrath, A.R., Tole, J.R., Stephens, A.T., Young, L.R.: Instrument scan – Is it an indicator of the pilot's workload? In: Proceedings of the Human Factors Society Annual Meeting, vol. 24, pp. 257–258 (1980)
4. Flemisch, F.O., Onken, R.: Human Factors Tool caSBAro: Alter Wein in neuen Schläuchen. In: Anthropotechnik gestern-heute-morgen, DGLR-Bericht 98-02, pp. 53–72 (1998)
5. Frey, A., Lenz, A., Putzer, H., Walsdorf, A., Onken, R.: In-Flight Evaluation of CAMA – The Crew Assistant Military Aircraft. Deutscher Luft- und Raumfahrtkongress, Hamburg, September 17–20 (2001)
6. Harris, R.L., Glover, B.L., Spady, A.A.: Analytic techniques of pilot scanning behavior and their application. NASA Technical Paper 2525 (1986)
7. Hilburn, B., Jorna, P.G., Byrne, E.A., Parasuraman, R.: The effect of adaptive air traffic control (ATC) decision aiding on controller mental workload. In: Mouloua, M., Koonce, J. (eds.) Human Automation Interaction: Research and Practice, pp. 84–91. Erlbaum Associates, Mahwah (1997)
8. Kaber, D.B., Prinzel III, L.J., Wright, M.C., Clamann, M.P.: Workload-Matched Adaptive Automation: Support of Air Traffic Controller Information Processing Stages. NASA/TP-2002-211932 (2002)
9. Dick, A.O.: Instrument scanning and controlling: using eye movement data to understand pilot behavior and strategies. NASA CR 3306 (1980)
10. O'Donnell, R.D., Eggemeier, F.T.: Workload Assessment Methodology. In: Boff, K.R., Kaufman, L., Thomas, J.P. (eds.) Handbook of Perception and Human Performance, vol. II: Cognitive Processes and Performance. John Wiley and Sons, Chichester (1986)
11. Onken, R., Schulte, A.: System-ergonomic Design of Cognitive Automation – Dual-Mode Cognitive Design of Vehicle Guidance and Control Work Systems. Springer, Heidelberg (in press)
12. Prévôt, T., Gerlach, M., Ruckdeschel, W., Wittig, T., Onken, R.: Evaluation of intelligent on-board pilot assistance in in-flight field trials. In: 6th IFAC/IFIP/IFORS/IEA Symposium on Analysis, Design and Evaluation of Man-Machine Systems, Massachusetts Institute of Technology, Cambridge (1995)
13. Prinzel III, L.J.: Team-Centered Perspective for Adaptive Automation Design. NASA/TM-2003-212154 (2003)
14. Ruckdeschel, W., Onken, R.: Modelling of Pilot Behaviour Using Petri Nets. In: Valette, R. (ed.) ICATPN 1994. LNCS, vol. 815, pp. 436–453. Springer, Heidelberg (1994)
15. Salvucci, D.D., Goldberg, J.H.: Identifying fixations and saccades in eye-tracking protocols. In: Proceedings of the Eye Tracking Research and Applications Symposium, pp. 71–78. ACM Press, New York (2000)
16. Scerbo, M.W.: Theoretical perspectives on adaptive automation. In: Parasuraman, R., Mouloua, M. (eds.) Automation and human performance: Theory and applications, pp. 37–64. Lawrence Erlbaum Associates, Mahwah (1996) 17. Thomas, L.C., Wickens, C.D.: Eye Tracking and individual differences in off-normal event detection when flying with a Synthetic Vision System Display. In: Proceedings of the 48th Annual Meeting Human Factors and Ergonomics Society, Santa Monica (2004) 18. Tole, J.R., Stephens, A.T., Vivaudou, E.A., Young, L.R.: Visual scanning behavior and pilot workload. NASA Contractor Report 3717. Hampton, Virginia: NASA Langley Research Center (1983)
A Design and Training Agenda for the Next Generation of Commercial Aircraft Flight Deck Don Harris Department of Systems Engineering and Human Factors, School of Engineering, Cranfield University, Cranfield, Bedford MK43 0AL, UK [email protected]
Abstract. To maximize cost efficiencies the design of the modern commercial airliner flight deck must change quite radically. However, these efficiencies cannot be realized unless there are concomitant changes in the rest of the system, and in particular, the training aspect. This paper proposes a radical design agenda for the flight deck and outlines how efficiencies can be gained through a careful re-alignment and re-appraisal of the training requirements to operate this aircraft. Keywords: Flight Deck Design; Human Factors Integration; Training.
1 Introduction

Human Factors Integration (HFI) or Human-System Integration (HSI) is essentially a human-centric acquisition management process. HFI/HSI considers not just the specification, design and development of the user-centric aspects of the system but also takes into account other functions, such as training, personnel skills and availability, and organizational issues. It can broadly be characterized as a socio-technical systems-based approach to the requirements specification, design, development and in-service operation of large pieces of equipment.

This short paper argues that as commercial aircraft are not specified by the end users (they are commercial products – the aircraft most closely matching the requirements of the purchaser is the one that is bought), many of the benefits from a well-managed HFI/HSI procurement process are not available to the airlines. Pilot training, especially its earlier stages, has also become divorced from the initial requirements of the airlines, and it is likely to fall further behind developments in flight deck design and operating concepts. As a result, the training burden on the airlines in converting novice pilots into safe and efficient First Officers is increased: much initial training is wasted as it is not required, and a great deal of desirable instruction is not provided until pilots join the airline. However, this is intended to be a forward-looking discussion suggesting a direction for future flight deck design and pilot training.

Evolutions in flight deck function and layout are hampered to some degree by regulatory requirements which tightly specify many aspects of design where true efficiencies could be achieved. However, perhaps more importantly, future concepts in Air Traffic Control/Air Traffic Management

D. Harris (Ed.): Engin. Psychol. and Cog. Ergonomics, HCII 2009, LNAI 5639, pp. 529–536, 2009. © Springer-Verlag Berlin Heidelberg 2009
(ATC/ATM) will require new functionality to be developed for the flight deck, and hence further new skills and abilities will be required of the crews operating them. The demands on the skill set of pilots could increase considerably.

One of the greatest problems with Human Factors in the civilian domain is that it can be regarded almost as a 'hygiene factor' [1]. It almost goes without saying that a poor user interface will result in a flight deck which is difficult to use and which promotes error. However, providing a 'good' human-system interface does not 'add value', although a failure to provide a user-friendly flight deck does detract from the aircraft's usability. As a result, it is often difficult to make a convincing argument for investing heavily in Human Factors. Consequently, flight deck interface inadequacies become a training or selection issue to be dealt with within the airline.

In the Defense community, though, human performance is put at a premium. Military personnel must be able to use the equipment they are provided with in a range of stressful, high-pressure situations. The military customer has a further advantage in that any new equipment may be tailored precisely around the capabilities of its end users. Dedicated, comprehensive training can also be provided. While the military is unique in these aspects, by drawing upon the experiences of the Defense sector and studying best practice in the acquisition of equipment, a great deal of knowledge can be 'spun out' into the civilian aviation domain.
2 How Did We Get to the Current Status Quo?

There is little impetus to change many aspects of the flight deck. This is largely a result of external constraints and commercial issues unrelated to their design and functionality. As a result, the evolution of commercial flight deck interfaces does not progress as quickly as that of ground-based applications or military aircraft cockpits. Even the interface in modern cars is evolving faster.

Consider the civil aircraft certification requirements. No aspect of the flight deck associated with the control of an aircraft can be installed and operated without the approval of the airworthiness authorities. The airworthiness regulations – e.g. CS/FAR 25.1309 (ACJ 25.1309) – require that systems such as the FMS/FMC (Flight Management System/Flight Management Computer) show a level of reliability (in terms of system failure) in excess of 1 × 10⁻⁷ per flight hour. Attaining and demonstrating this level of reliability in a joint software/hardware system such as the FMS/FMC is no small matter, and it certainly isn't cheap. However, as in the vast majority of the certification regulations, only 'machine' issues are addressed. As many incidents, accidents and much research have demonstrated, the major source of unreliability in a joint cognitive system composed of a pilot and an aircraft lies on the human side of the equation. This is not to say that the pilot is to blame; far from it. The difficulties the pilots experience are a result of the poor design of the human-machine interface, such as it being incompatible with the pilots' working environment, having unclear system logic and/or having to work around shortcomings in the design of the system. As a result, the training provided to use some systems includes almost as much about avoiding error as it does about its actual use.

A further factor in maintaining the status quo is the longevity of commercial aircraft. It is not uncommon for a basic design to be in production for 30 years and its
service life can itself be over 30 years. When it is also taken into account that the design freeze for the flight deck can occur five years before entry into service, it is possible that the basic design of the computers and their interface will have to survive for well over half a century.

Revolutions in flight deck interfaces (or even step changes in their evolutionary process) also cause logistical problems for the operators and hence are not undertaken with great frequency. Take a hypothetical example of the requirements imposed on an airline operator when performing a mid-life update of a commercial aircraft's FMS from the current text-based interface to a more 'modern' graphical user interface (GUI). The airline will need to make investments in equipment to train the pilots, for example developing computer-based training programs for introducing the new GUI and investing in updating part-task flight deck simulators and full-flight simulators. The re-equipment of the simulation facilities will also require approval by the airworthiness authorities. With regard to the training requirements, the airworthiness authorities must approve all training courses. The trainers will also need training. Furthermore, there is also the expense of removing pilots from line flying onto training courses to instruct them in the operation of the new GUI for the FMS/FMC. The new FMS/FMC will also impose other new requirements, for example on the training of maintenance personnel and on spares holdings. To re-iterate: HFI/HSI encompasses all aspects of the system, not just the user-centric design aspects. You cannot separate design issues from training issues (and other aspects of operation not directly involving the primary users).

The list of reasons not to adopt a particular new flight deck interface just goes on and on. Only occasionally do you get a relatively large change on the flight deck (for example, the new Airbus A380 is quite different). But these opportunities happen only rarely, when a completely new type is introduced, and even then the airlines request commonality with other types to speed the process of pilots achieving a new type rating. As a result, interface design progresses slowly and deficiencies become training issues, as this is perceived to be the cheaper solution. But while this may be true in the short term, can this argument be supported in the longer term, taking a through-life costing approach? This is where, through careful design and analysis, HFI/HSI can provide benefits even to an airline. Paying more in the short term may cost less in the longer term.
3 The Future Flight Deck

The modern civil transport aircraft flight deck is still a highly evolved version of the cockpit of the first airliners flown in the 1930s. It is a place from where the pilots exert control over their aircraft. As such, it is still primarily optimized around manual control requirements: one simply has to look at the design of the major controls and the primary flight displays. As will be argued shortly, what pilots will require in the future is the ability to execute their desired 3D path through space (4D if time is also considered); hence they will need graphical flight planning and surveillance tools and the ability to visualize their flight plan relative to the terrain, airways (if they still exist), restricted airspace and other traffic. The manner by which the flight plan is controlled and executed is irrelevant. To illustrate, the Airbus A320 has 10 vertical
navigation (VNAV) modes and seven lateral navigation (LNAV) modes (all modes associated with aircraft control). However, most of these control modes are aircraft-referenced and thus do not relate to the actual flight path. They do not correspond directly to ground-based features (such as terrain) or ground-referenced features (such as airways). Navigation is a ground-referenced problem but aircraft control is an air-referenced problem (e.g. stalling is an issue of a lack of airspeed, not groundspeed; pitch attitude/angle of attack is not related to obstacle clearance or flight path). As a result, the navigation requirement and the control requirement have become separated to some degree. The modern flight deck needs to make these issues congruent again.

A pilot's job is to control an aircraft's flight path. With the exception of take-off, there is no mandatory requirement to fly the aircraft manually. The conventional mode of operation of any commercial airplane is now one of supervisory control. The normal method of exercising control over the aircraft on a minute-to-minute basis is via the autopilot system (typically using the mode control panel); on a strategic level, control is exerted via the FMS/FMC. With the management and configuration of airspace moving increasingly towards a free-flight environment, in which ATM provides a largely supervisory oversight role (rather than positive control), the required function of the flight deck shifts increasingly towards flight planning, communication, navigation and surveillance (CNS). Longer-range flight planning tools optimized for 4D navigation are now needed. With highly aerodynamically efficient aircraft, pilots are now required to plan ahead and manage the aircraft's energy with respect to the desired flight profile: control of airspeed, altitude and rate of descent is no longer enough to achieve optimum control of the aircraft on complex, fuel-efficient flight profiles. Furthermore, with increasingly sophisticated aircraft systems for power, environmental conditioning and even passenger entertainment, their management is also of increasing importance. Basically, the flight deck is now a management and information centre for the supervisory control of the whole flight. The question becomes simply this: does the flight deck properly support these functions? Furthermore: does the pilot's initial training?

The pilot of future generations of highly automated aircraft will still be a pilot, but one with a very different skill set. As Dekker [2] has noted, automation has made most of the dedicated flight crew functions redundant (e.g. the radio operators, navigators and flight engineers) and the pilots have been left to fill any gaps remaining that cannot be adequately covered by the automation. As a result they have been required to attain competencies beyond their original job mandates. Evolution (even revolution) in ATM concepts will change the role of the pilot and the design of the flight deck interfaces even further.

3.1 An Example

Future ATM practices will require aircraft to navigate in a different manner. Direct Routing (or 'Free Flight') will significantly affect the pilots' roles and responsibilities. Responsibility for ATM will be delegated to the flight deck (self-assured separation). Aircraft will fly direct routes and maneuver freely at their optimum speed and altitude, without consultation with ATC.
The impetus to move to such a system is driven by the current inefficient use of airspace and a desire to spend less time in the air and save fuel. However, such changes demand wide-ranging transformations
throughout all other components of the system: both ATC/ATM and aircraft need to be re-equipped with new navigation and surveillance equipment, and crews need to be trained. Such changes in ATM concepts cannot be fully exploited if aircraft are not equipped with suitable display technologies allowing pilots to maneuver to maintain separation from other traffic, avoid weather and undertake other aspects of real-time flight re-planning (CNS functions). Much work is being undertaken developing Cockpit Display of Traffic Information systems. This has principally centered on the real-time representation of 4D traffic information to aid situation awareness and decision-making (e.g. Johnson et al. [3]) and the development of rules for resolving airborne conflicts (e.g. Johnson et al. [4]). However, without automated assistance pilots were found to be inefficient at resolving conflicts, clearly demonstrating that training is also required.

More wide-ranging options are also being considered by Air Traffic providers. For example, one concept would be for the various national/international ATM facilities to provide the airlines directly with quality-assured, de-conflicted routes – the more you pay for your route, the more direct it is! These would be up-linked directly to the aircraft, obviating the need for an airline's flight planning department. The function of the crew on the flight deck would be to supervise the execution of this route. However, consideration of this concept reveals that the functions of the airline/flight deck and the function of the Air Traffic provider have now reversed in several aspects. Flight planning is done by ATM; CNS, originally the core function of ATC, is now undertaken by the crew.

3.2 A Flight Deck Design Option

If the flight crew are required to undertake the CNS function, this implies that design changes will be required on the flight deck. This provides the opportunity for a radical flight deck design solution. For example, why does a flight deck require two highly qualified pilots? Should the future flight deck have a pilot and a CNS specialist (who also has some flight skills)? Why should both sides of the flight deck have the same functions and displays (as they do today)? Why not optimize one side of the flight deck for flight path control and system management, and the other side for the CNS function? All of these options would provide better targeted functionality, optimized controls, displays and computer software, and flight crew with superior knowledge as a result of better targeted training (i.e. specialists, not generalists). It could be argued that this is a step back to a flight deck with a pilot and a navigator. However, it begins to treat the workstations on the flight deck as two components in a distributed air/ground system and not simply as isolated places from which to control the aircraft. Design architectures are already being developed for single-crew commercial aircraft [5] which regard the flight deck as part of a distributed air/ground system. This design solution simply develops this notion in a slightly different direction. The important thing to note, though, is that it does not consider radical hardware/software design options separately from training.
4 Training

The early stages of initial pilot training are concerned almost solely with the control of the aircraft, followed by the development of communication and navigation skills. The new pilot then develops these skills further for use at night and in instrument conditions, followed by an introduction to airways flying. This initial training takes place in a low-powered, piston-engined aircraft with limited performance, simple systems and dated instrumentation. The basic syllabus has not really changed since the 1930s.

Until the 1950s the technology in small aircraft cockpits was similar to that in large aircraft. Large aircraft flew differently simply because they were bigger and they were often slower. With the advent of the jet engine things changed. Flight deck technology had to develop to accommodate the new levels of performance, and the aircraft flown by 'professional' pilots began to diverge from the initial training aircraft.

The transition to a modern, highly automated 'glass cockpit' occurs relatively late in the training of a new pilot, usually after they join an airline. It is also usually concurrent with being introduced to multi-crew and jet-transport flying. Several authors have recommended that, to alleviate problems with this transition, the introduction to 'glass cockpit' technology should be made earlier (e.g. Rignér & Dekker [6]). Higher-technology aircraft have been introduced into the early stages of flight training predicated on the basis that they resemble the future flight deck environment in terms of the type of instrumentation they contain and that they also provide some of the automated functions found in advanced commercial airliners. But this reasoning is over-simplistic. The question needs to be asked: 'are we teaching the right thing?' The syllabus and training concept need revision, not their means of delivery. A full training needs analysis (TNA) needs to be undertaken for the airline pilot operating a modern commercial transport to establish the best lead-in training.

Simple evolution of technology and teaching is ineffective and inefficient. Even a cursory analysis of current training shows many areas of limited utility, for example low-level visual navigation (most large transports don't even have VFR charts in them – and for what area should they carry them?). There is no need to learn the management of an Avgas-fueled piston engine attached to a fixed (or variable) pitch propeller. When transferring to jet transport aircraft with fly-by-wire systems, as a result of the advanced flight control laws employed, the aircraft do not even respond in the same manner to stick inputs as a simple, light aircraft. Even the teaching of navigation using VOR/DME equipment may be questioned. The objective here is not to provide answers but simply to provoke debate and to encourage exploration of the question: 'could this training time be used to better effect?'

There have been some superficial studies evaluating the training effectiveness of introducing higher levels of automation training earlier in the flight training syllabus, but these have addressed slightly the wrong question [7]. For example, Wood & Huddlestone [8] observed that the problem was not an issue of managing the automation interface but rather an issue of understanding what the automation was
doing and how it was trying to control the aircraft. This knowledge is required before it is possible to 'manage' the automation. Teaching automation is not about teaching how to use its interface; it is what lies unseen behind the interface that is important.

Even later in the training process there is still an inappropriate focus. Training (as a result of flight crew licensing requirements) concentrates heavily on technical malfunctions and aircraft control (particularly manual control). However, Thomas [9] observed that the vast majority of day-to-day threats encountered by flight crew during line operations stemmed not from system malfunctions but from other issues such as weather, traffic, terrain, ATC and airport conditions.
5 The HFI/HSI Approach

The design of a radically new flight deck offers the ideal opportunity to re-design the training syllabus so that the two are congruent. Flight deck design commences with a requirements analysis (what functions must the flight deck perform?), which also forms the basis for the TNA. If the functions of the flight deck are split between piloting tasks (flight path control and system management) and CNS, this will allow simplified, less compromised interfaces to be developed and better targeted training to be undertaken, specific to the crew role in question. Simplified, less compromised flight deck equipment is quicker to develop and certificate, cheaper to design and produce, requires less training time, and has significantly reduced error potential. In this way safety and cost benefits may become available to the airlines.

However, HFI/HSI also encompasses organizational issues. Re-design of the flight deck in the manner specified will also create two distinct roles on the flight deck. Other matters will emerge, such as issues concerning career progression and establishing exactly who is in charge of the aircraft. Will CNS flight crew attract lower pay (or vice versa)? The flight deck design revolution isn't simply about the flight deck.
References
1. Harris, D.: Human Factors Integration in Defence. Cognition, Technology & Work 10, 169–172 (2008)
2. Dekker, S.W.A.: On the other side of promise: what should we automate today? In: Harris, D. (ed.) Human Factors for Flight Deck Design, pp. 183–198. Ashgate, Aldershot (2004)
3. Johnson, W.W., Battiste, V., Holland, S.: A cockpit display designed to enable limited flight deck separation responsibility. In: Proceedings of the 1999 SAE/AIAA World Aviation Congress. Society of Automotive Engineers/American Institute for Aeronautics and Astronautics, Anaheim, CA (1999)
4. Johnson, N.H., Canton, R., Battiste, V., Johnson, W.: Distributed air/ground traffic management enroute free maneuvering rules of the road: requirements and implementation for a simulation of en-route self separation. In: Proceedings of the 2005 International Symposium on Aviation Psychology, Oklahoma City, OK. Ohio State University Press, Columbus (2005)
5. Harris, D.: A human-centred design agenda for the development of a single crew operated commercial aircraft. Aircraft Engineering & Aerospace Technology 79, 518–526 (2007)
536
D. Harris
6. Rignér, J., Dekker, S.W.A.: Modern flight training: managing automation or learning to fly? In: Dekker, S.W.A., Hollnagel, E. (eds.) Coping with computers in the cockpit, pp. 145– 151. Ashgate, Aldershot (1999) 7. Casner, S.M.: Learning about cockpit automation: From Piston trainer to jet transport. NASA report NASA/TM-2003-212260. NASA Ames Research Center, Moffett Field CA (2003) 8. Wood, S.J., Huddlestone, J.A.: Requirements for a revised syllabus to train pilots in the use of advanced flight deck automation. Human Factors & Aerospace Safety 6, 359–370 (2007) 9. Thomas, M.J.W.: Improving organisational safety through the integrated evaluation of operational and training performance: an adaptation of the line operations safety audit (LOSA) methodology. Human Factors & Aerospace Safety 3, 25–46 (2003)
Future Ability Requirements for Human Operators in Aviation

Catrin Hasse, Carmen Bruder, Dietrich Grasshoff, and Hinnerk Eißfeldt

German Aerospace Center, Department of Aviation and Space Psychology, Sportallee 54, 22335 Hamburg, Germany
{catrin.hasse,carmen.bruder,dietrich.grasshoff,hinnerk.eissfeldt}@dlr.de
Abstract. The present study addresses the optimal fit between technical innovations in aviation and aircraft operators. Because of the increase in computerization, accurate and efficient monitoring of the automation poses a key challenge to future operators. As the German Aerospace Center's Department of Aviation and Space Psychology is responsible for the personnel selection of pilots and air traffic controllers, our objective for the selection of future personnel is to distinguish good monitoring operators from bad ones. In order to identify good monitoring behavior we developed a simulation tool that represents tasks of pilots and controllers within a dynamic air traffic flow. Participants either monitor the automatic process or control the dynamic traffic manually. Monitoring behavior is measured by recording eye movement parameters. The identification of accurate monitoring behavior enables us to adapt selection profiles to future ability requirements.

Keywords: automation, monitoring behavior, human performance, personnel selection, eye tracking, future ATM.
1 Introduction

Improvements in air traffic management (ATM) and aircraft systems, as well as in organizational structures, have become one of the key challenges of aviation in the 21st century. This is especially important with regard to the considerable increase in air traffic. The key question of DLR's research program Aviator 2030 concerns the changes that pilots and air traffic controllers will face: which modifications of operators' tasks, responsibilities and ability requirements are to be expected?

1.1 Aviator 2030 – Ability-Relevant Aspects of Future ATM Systems

The research project Aviator 2030 (see Fig. 1) focuses on an optimal fit between ATM system design and human operators in future aviation. This will be achieved by adapting selection profiles to future ability requirements. In the first project phase, workshops with experienced pilots and air traffic controllers were conducted in order to develop a concept of future ATM.

Fig. 1. Phases of the project Aviator 2030

Participants were asked to describe their expectations regarding future tasks, roles and responsibilities. Summing up these workshop results, monitoring and teamwork in a highly automated workplace pose a challenge to future aircraft operators [1]. Thus, research should focus on the ability of monitoring as one major topic. The second project phase comprised the development of simulation tools that represent future workplaces in aviation. Experiments with humans operating in these simulated future workplaces serve as the basis for identifying potential changes in ability requirements for pilots and air traffic controllers. The results allow for a timely adjustment of selection profiles and, thereby, for the development of future ability tests.

1.2 Monitoring Automated Systems

Technical developments make it possible to automate many aspects of a human-machine system. Automation is the allocation of functions to machines that would otherwise be allocated to humans. It is the human's job to monitor the automated system and to assume control when the automation fails. There is considerable evidence that automation issues are involved in most accident reports [6]. Modern workplaces in aviation are often complex human-machine systems, in which humans and machines work closely together. The quality of the human-machine interaction determines the reliability of the system. Generally, there are three approaches affecting the interaction between operator and machine: system design, operator training and personnel selection. Our focus is on the third approach: which ability requirements are important for future human-machine interaction if many functions and tasks are automated?

Concerning the requirements of humans interacting with automated systems, the maintenance of situation awareness and adequate trust in automation pose a challenge. Sufficient situation awareness exists if the operator has a picture of the traffic situation, understands the situation and can project what will happen in the future [5]. Human operators may lose their situation awareness when deficits in monitoring occur. Monitoring an automated system includes rapidly processing a complex and dynamic scene on the display of the automatic system. Moreover, manual system handling, in case of a system failure, is an important requirement. Often, the processes of
automated systems are based on complex rules that are difficult for human operators to understand. If operators monitor suboptimally, they do not consider important information in their traffic picture. Hence, operators cannot interpret the perceived information about system behavior. The resulting gaps and misconceptions make it difficult to form proper expectations of system status and behavior. Analyses of pilots' accidents and incidents suggest that these monitoring failures are responsible for breakdowns in pilot-automation coordination [17].

Previous studies have focused on the monitoring behavior and performance of operators in view of system design and degree of automation. In this regard, Mumaw, Nikolic, Sarter and Wickens studied both performance and eye-tracking data from pilots [9]. Whereas they observed pilots flying a challenging scenario in a simulator to obtain performance data, they took pilots' fixations of relevant targets as indicators of monitoring. Pilots appear to monitor flight mode annunciations to a much lesser extent and at a more superficial level than intended and expected by designers and training departments. Mumaw et al. concluded that pilots' monitoring performance should be enhanced through a more adequate system image and the design of more effective automation feedback [9]. In our study, we focus on human ability requirements instead of ergonomic and design aspects in highly automatic environments. Therefore, we are interested in individual differences in monitoring strategies. We assume that individual differences in monitoring lead to differences in learning the underlying principles of the automatic system.

1.3 Individual Differences in Monitoring

Models describing the cognitive processes underlying monitoring behavior provide first indicators for differences in monitoring behavior. Operators differ in their mental representation of the traffic situation. Whitfield and Jackson introduced the term "picture" for the global mental representation of the traffic situation in working memory, which air traffic controllers use to solve their task [14]. Whitfield and Jackson found that experienced and novice controllers differ in their picture: experienced controllers generate it more easily and faster. Furthermore, they are more flexible in switching between aircraft and areas of interest. Additionally, another study pointed out that experienced controllers monitor information about aircraft in accordance with its importance for controlling the traffic [8].

Niessen and Eyferth developed a model of an experienced air traffic controller's mental representation of the traffic situation. It is a domain-specific model of controllers' cognitive abilities [10]. The assumptions of the model are based on comparing novice and experienced controllers. The monitoring cycle of the model distinguishes two phases: data selection to build up the picture of the current situation (phase 1), and updating to refresh it (phase 2). An experiment showed that the representation of a current situation is built up under considerable reduction of information. Thus, controllers selected relevant features such as codes, position and flight direction. The update frequency likewise adjusts to the relevance of the information: highly relevant objects are updated more often than less relevant objects. Additionally, the model includes an anticipation cycle that provides conflict resolutions [10].
Wickens, Helleberg, Goh, Xu and Horrey [15] developed a model called "SEEV", whose components represent the processes governing pilots' allocation of attention to flight-relevant information channels. Unique about this model is its linkage between visual attention and models of cockpit task management. The components of SEEV indicate that the allocation of attention in dynamic situations is driven bottom-up by the capture of salient (S) events, is inhibited by the effort (E) required to move attention, and is also driven by the expectancy (EX) of seeing valuable events at certain locations in the traffic environment. Within aviation, there is a clearly established task priority hierarchy, which defines the importance, or value (V), of areas of interest. In two cross-validation experiments, the model fit increased with expertise, accounting for up to 95% of the variance. The results suggest that well-trained pilots are indeed quite optimal in their allocation of attention. Accordingly, the model can serve as a good standard for attention allocation in different complex environments [15].

Expert-novice comparisons provide additional indicators that differences in monitoring behavior are responsible for differences in the performance of human operators. In the field of driving psychology, there are studies that deal with the impact of skill and experience on visual search and hazard detection. Experienced drivers show increased horizontal variance in fixation locations and shorter gaze durations on dangerous objects compared with novice drivers [2]. Moreover, experienced drivers adjust their scanning patterns to different processing demands, whereas the strategies of inexperienced drivers remain rather inflexible [3]. That is, novice drivers show more stereotypical fixation transitions [13].

Another important factor influencing human-automation interaction is the human's trust in automation [5]. Low levels of trust can lead to disuse when automated systems generate many false alarms [11]. High levels of trust in automation, however, lead to complacency. Singh, Molloy and Parasuraman argue that human operators differ in their complacency potential [12]. Complacent behavior is defined as inaccuracy and delay in detecting changes or failures of an automated system. Furthermore, complacency reflects the strategy of allocating attention to other concurrent tasks. Therefore, eye movement recordings should show that operators scan raw information sources less frequently when using automated systems [11].

Previous research focused on individual differences in monitoring behavior in view of expertise. It was assumed indirectly that an increase in experience accounts for accuracy in monitoring. However, in personnel selection it is often impossible, and indeed undesirable, to select completely trained and skilled experts. In fact, the German Aerospace Center's Department of Aviation and Space Psychology is responsible for the personnel selection of so-called ready entries (ab-initio pilot or air traffic controller trainees). Consequently, our scientific approach goes beyond differences in monitoring due to expertise. Instead, we are interested in abilities that account for differences in monitoring behavior, independent of expertise.
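To make the SEEV formulation described above concrete, the following sketch computes predicted attention allocation across areas of interest (AOIs) using one common additive form of the model, P(A) = s·S − ef·EF + ex·EX + v·V. The coefficient and AOI values are illustrative assumptions of our own, and the function and class names are hypothetical, not taken from the SEEV authors.

```python
# A minimal sketch of the additive SEEV formulation,
# P(A) = s*S - ef*EF + ex*EX + v*V, applied to a set of
# areas of interest (AOIs). All numbers are illustrative.

from dataclasses import dataclass

@dataclass
class AOI:
    name: str
    salience: float    # S: bottom-up conspicuity of events in this AOI
    effort: float      # EF: cost of moving attention to this AOI
    expectancy: float  # EX: expected event rate (bandwidth) of the AOI
    value: float       # V: task importance of the AOI

def seev_weights(aois, s=1.0, ef=1.0, ex=1.0, v=1.0):
    """Return normalized predicted dwell proportions per AOI."""
    raw = [max(0.0, s * a.salience - ef * a.effort
               + ex * a.expectancy + v * a.value) for a in aois]
    total = sum(raw) or 1.0
    return {a.name: r / total for a, r in zip(aois, raw)}

if __name__ == "__main__":
    aois = [
        AOI("outside world", salience=0.6, effort=0.2, expectancy=0.5, value=0.9),
        AOI("primary flight display", 0.4, 0.1, 0.7, 0.8),
        AOI("flight mode annunciator", 0.2, 0.3, 0.3, 0.6),
    ]
    for name, w in seev_weights(aois).items():
        print(f"{name}: predicted dwell proportion {w:.2f}")
```

Under such a formulation, an optimal scanner's observed dwell proportions would approach the model's predicted weights, which is how the cross-validation fits reported above can be interpreted.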
1.4 Monitoring Performance in Future Personnel Selection
Wickens, Mavor, Parasuraman and McGee concluded that automation might affect system performance because of new skills that may be required but for which controllers might not have been adequately selected and trained [16]. Once automation is introduced, it is anticipated that the job of the controller shifts from a tactical one to an automation-supported strategic one. Whereas tactical control refers to aircraft in
one sector, strategic control refers to the flow of aircraft across multiple sectors. Manning and Broach asked experienced controllers to assess the cognitive skills and abilities needed by controllers working with future automation [7]. Controllers agreed that coding (the ability to translate and interpret data) would be extremely important. Furthermore, verbal and spatial reasoning as well as selective attention would be needed in future aviation, particularly when control shifts from the automation to the human operator. Numerical reasoning was rated as less relevant, because the automated system accomplishes numerical transactions. This was supported by a study with German air traffic controllers [4].

With the aim of adapting selection profiles to future ability requirements, we focus on the ability of monitoring, which is of increasing importance to future aircraft operators. As our objective is to distinguish well-performing operators from badly performing operators on the basis of monitoring behavior, we first examine individual differences in monitoring strategies. Normative models of adequate and efficient monitoring behavior, as well as differences between experts and novices, serve as suggestions for the critical monitoring behaviors on which we focus our study. Second, performance data collected after a monitoring phase serve as our criterion for evaluating the "goodness" of individual monitoring behavior. We zoom in on the link between monitoring and performance data, i.e., between individual differences in monitoring behavior and differences in manual system handling. We assume that this link reflects differences in the ability to understand the underlying principles of the automatic system. On this note, we premise that monitoring of the automation predicts manual performance in case of automation failure. In view of all hypotheses, "good monitoring behavior" is associated with adequate and efficient system handling performance.

To summarize, we derived hypotheses from comparisons between novice and expert operators, and from models representing operators' cognitive processes of attention allocation and visual scanning. Concerning the expert-novice comparisons in Section 1.3, we hypothesize that:

• Operators with "good monitoring behavior" do not differ much from one another in their monitoring data, as all of them show a target-oriented scanning strategy that can be predicted from the demands of a given scenario. In contrast, operators with less understanding of the specific demands of a given scenario vary considerably in their scanning behavior, reflecting aimless and random monitoring.
• Operators with good monitoring behavior adapt their scanning behavior to the situation. Therefore, their scanning behavior varies between different scenarios. Operators with inadequate monitoring behavior do not adapt their scanning behavior to the situation.

Furthermore, we derive hypotheses from the models of operators' cognitive processes (reported in Section 1.3):

• Operators with good monitoring behavior start with a data selection phase, in which they scan the whole environment and categorize information as highly or less relevant.
• After data selection, operators with good monitoring behavior update highly relevant information more often than less relevant information. That is, highly dynamic or critical situations are scanned more often.
• Good operators adapt their scanning behavior to situational demands while maintaining a robust mental representation of the whole system. Therefore, they switch faster between different tasks.
• Operators with good monitoring behavior have a lower complacency potential than those with bad monitoring behavior. Complacent behavior is associated with inaccuracy and delay in detecting changes or failures of an automated system.

As performance in manual control serves as the criterion for "good monitoring behavior", i.e., "good monitoring" ensures adequate and efficient performance in manual control in case of system failure, we premise for all hypotheses that "good monitoring" (as described above) is associated with adequate and efficient manual system handling:

• "Good monitoring" operators show accurate, quick and flexible system handling.
2 Simulation Tool

The research project Aviator 2030 targets the investigation of monitoring behavior and human performance in future ATM scenarios. We developed a simulation tool called "Self Separation Airspace" (SSAS) that represents future tasks of pilots and controllers. It is a dynamic simulation that allows performance assessment. The tool consists of two workstations, which can be used separately or together. As our research focuses on general questions, the tool is a simplified and abstract simulation of the basic requirements placed on future flight operators. Consequently, test subjects need no prior experience as a pilot or air traffic controller.

The simulation tool comprises a traffic flow simulation (Fig. 2, above) and a simple flight control simulation (Fig. 2, below). The operator's task is to control the traffic flow between two airports. The airports are connected by airways that transport the traffic between the outbound and inbound sides of the airports. Sometimes aircraft become critical, i.e., they do not fly optimally within the airway. In this case, the operator should switch to the flight control screen and navigate the critical aircraft back onto the optimal pathway. The operator either monitors the automatic process or controls the dynamic traffic manually. In the automatic mode, the system controls the traffic flow automatically. In the manual mode, the human operator controls the traffic using input devices. Both modes can be conducted in the same run, which allows us to investigate the monitoring of an automatic system and the manual control of the traffic separately.

Most parameters of the simulation are modifiable, so that traffic volume, system balance, system feedback and interruptions through a dual task can be configured (a sketch of such a scenario configuration is given below). We designed scenarios that differ in their initial traffic volume and in the variety of traffic in the traffic flow simulation. The traffic flow is balanced if the traffic at the airports and on the airways is similar; it is unbalanced if the traffic flow differs considerably between airports and airways. Additionally, the variety of the traffic flow can be modified by faster clocking of the airways, different target and limit values of the airports, and the blocking of airways during runtime. This allows monitoring behavior and performance to be researched under varying complexity.
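As an illustration of how such scenario parameters might be organized, the following sketch groups them into a single configuration object and instantiates the four experimental scenarios described in Section 3. All class names, field names and values are our own hypothetical reconstruction; the actual SSAS implementation is not published.

```python
# Hypothetical sketch of an SSAS scenario configuration.
# Field names and values are illustrative, not the actual tool's API.

from dataclasses import dataclass, field
from typing import List

@dataclass
class ScenarioConfig:
    name: str
    traffic_volume: str          # "limited" or "extended" initial traffic
    traffic_variety: str         # "little" or "high" variety in the flow
    balanced: bool = True        # similar traffic at airports and airways?
    airway_clock_s: float = 1.0  # update interval; smaller = faster clocking
    airport_targets: List[int] = field(default_factory=lambda: [10, 10])
    airport_limits: List[int] = field(default_factory=lambda: [15, 15])
    blocked_airways: List[str] = field(default_factory=list)

# The four experimental scenarios cross traffic volume with variety.
SCENARIOS = [
    ScenarioConfig("scenario 1", "limited", "little"),
    ScenarioConfig("scenario 2", "limited", "high"),
    ScenarioConfig("scenario 3", "extended", "little"),
    ScenarioConfig("scenario 4", "extended", "high",
                   balanced=False, airway_clock_s=0.5,
                   blocked_airways=["airway B"]),
]
```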
Fig. 2. Simulation tool “SSAS”: Traffic flow simulation (above) and trajectory control (below)
3 Method

In order to identify the core abilities of a future operator, our experimental paradigm focuses on inter-individual differences in monitoring.

Experimental Paradigm: With the objective of varying the complexity and dynamics of the automatic system, we vary the amount of traffic as well as the variety of traffic in the traffic flow simulation. Thus, we developed four scenarios reflecting the four possible combinations of both traffic parameters: a limited amount of traffic with little variety (scenario 1), a limited amount of traffic with a lot of variety (scenario 2), an extended amount of traffic with little variety (scenario 3), and an extended amount of traffic with a lot of variety (scenario 4). Within these scenarios, we test the quality of monitoring behavior as a substantial influence on handling the complex system in case of system failure.

Measurements: As dependent variables, we focus on the establishment and maintenance of system understanding during the monitoring phase. We use eye movement parameters, which act as indicators of the perceptual and cognitive operations involved. As we assume understanding of the system to be a precondition for manual system handling in case of system failure, we combine both eye
movement parameters and performance data as measurements. The eye movement analyses include fixation durations, as an indicator of the time taken to assimilate fixated objects, and the variance of fixation coordinates, to describe the spread of search along both the horizontal and vertical axes (a sketch of these measures is given at the end of this section). Regarding the effect on system handling, we record reaction times and performance parameters that identify the quality of an individual's manual control of the system. Based on individual differences in monitoring behavior and the related individual differences in manual control parameters, we are able to identify the core competencies of future aviators.

Experimental device: Eye movements are recorded with the Eyegaze Analysis System manufactured by LC Technologies. Raw data were managed with the NYAN software developed by Interactive Minds. Subjects were seated in front of a 19-inch LCD computer display at a distance of approximately 60 cm.

Test subjects: Our experiments are conducted with candidates of DFS (Deutsche Flugsicherung GmbH) and DLH (Deutsche Lufthansa AG). This enables us to compare our experimental data on monitoring in future human-machine systems with the abilities measured in personnel selection tests.

Procedure: Participants were tested individually. First, they were given a questionnaire measuring trust in automation, and the instructions for the following experiment. Participants were informed that they would work on four scenarios, each consisting of two phases: an automation phase followed by a manual phase. In the automation phase of each scenario, participants were instructed to monitor the automation with the objective of understanding the rule-based dynamics of the given scenario. In the manual phase, participants were assigned to control the system manually in continuation of the automation, i.e., in terms of the rules and dynamics they had learned from monitoring the scenario under automation. After a short (15 s) calibration phase, which ensures the adjustment of the Eyegaze Analysis System to each participant's individual gaze, the four scenarios were presented, each taking 5 minutes. There was a smooth transition between the automatic mode and the manual mode within each scenario, but pauses were placed between the scenarios. The four scenarios were presented in a fixed order for every subject, beginning with the easiest, scenario 1, and finishing with the most complex, scenario 4.
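The following sketch shows how the two eye movement measures named above, fixation duration and the variance of fixation coordinates, might be computed from a list of detected fixations. The record format and function names are our own assumptions; the actual analyses were carried out in the NYAN software.

```python
# Hedged sketch: computing mean fixation duration and the spread of
# visual search from a list of fixations. The (x, y, duration) record
# format is an assumption; commercial eye trackers export richer data.

from statistics import mean, pvariance
from typing import List, NamedTuple

class Fixation(NamedTuple):
    x: float            # horizontal screen coordinate (pixels)
    y: float            # vertical screen coordinate (pixels)
    duration_ms: float  # fixation duration

def monitoring_measures(fixations: List[Fixation]) -> dict:
    """Mean fixation duration plus horizontal/vertical variance
    of fixation coordinates (the spread of visual search)."""
    return {
        "mean_fixation_ms": mean(f.duration_ms for f in fixations),
        "var_x": pvariance([f.x for f in fixations]),
        "var_y": pvariance([f.y for f in fixations]),
    }

if __name__ == "__main__":
    demo = [Fixation(512, 300, 240), Fixation(640, 320, 180),
            Fixation(200, 500, 310), Fixation(520, 310, 220)]
    print(monitoring_measures(demo))
```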
4 Status Quo and Further Steps

At present, our simulation tool SSAS has been developed and is being investigated in preliminary tests. SSAS represents future tasks of pilots and controllers. By varying the complexity and dynamics of SSAS, different degrees of task difficulty can be realized. Thus, the system allows for the investigation of the human abilities required by future tasks under varying task difficulty. As we are especially interested in the ability to monitor an automated system, the simulation tool is connected to an eye movement tracker. We assume eye movement parameters to reflect the perceptual and cognitive processes involved in monitoring, so our approach centers on identifying good monitoring operators on the basis of eye movement parameters. We further assume that "good monitoring" is associated with accurate manual system handling
in case of automation failure, and we therefore aim at connecting monitoring behavior with manual control behavior. Hence, we implemented within SSAS both an automated system that demands monitoring from a test subject and a manual phase that demands manual control. In this regard, the performance parameters of manual control serve as the criterion for "good monitoring" behavior. We take these performance data to reflect individual differences in the ability to learn the underlying principles of an automatic system while monitoring it.

Ability testing with dynamic simulation on the basis of eye movements is innovative and establishes new approaches to assessing selection profiles. In this regard, SSAS is introduced as an appropriate basic tool for investigating human performance in future ATM scenarios, as well as the underlying ability requirements that allow for human performance in future aviation. Beyond this, fundamental research on other future core abilities is intended, e.g., attention and role shifting, diagnosing the system control state, and communicating with automatic functions. Accordingly, the simulation tool allows a smooth transition from research to future ability testing. Further research addresses failure detection while monitoring a fully automated system. As Wickens mentioned, system reliability may be less than perfect [16]; the human operator must detect system failures and respond to them. We therefore plan a study in which the human operator should be able to detect automation failures during the monitoring phase as well as switch to manual control.
References

1. Bruder, C., Jörn, L., Eißfeldt, H.: Aviator 2030 – when pilots and air traffic controllers discuss their future. In: Proceedings of the EAAP Conference, Valencia, Spain (2008) (in press)
2. Chapman, P., Underwood, G.: Visual search of dynamic scenes: event types and the role of experience in viewing driving situations. In: Underwood, G. (ed.) Eye Guidance in Reading and Scene Perception. Elsevier, Oxford (1998)
3. Crundall, D., Underwood, G., Chapman, P.: Driving experience and the functional field of view. Perception 18, 1075–1087 (1999)
4. Eißfeldt, H., Heintz, A.: Ability requirements for DFS controllers – current and future. In: Eißfeldt, H., Heil, M.C., Broach, D. (eds.) Staffing the ATM System. Ashgate, Burlington (2002)
5. Endsley, M.R., Bolte, B., Jones, D.G.: Designing for Situation Awareness – An Approach to User-Centered Design. Taylor & Francis, New York (2003)
6. Funk, K., Lyall, B., Wilson, J., Vint, R., Niemczyk, M., Suroteguh, C., Owen, G.: Flight deck automation issues. International Journal of Aviation Psychology 9, 109–123 (1999)
7. Manning, C.A., Broach, D.: Identifying ability requirements for operators of future automated air traffic control systems. DOT/FAA/AM-87/26. Federal Aviation Administration, Washington, DC (1992)
8. Mogford, R.H.: Mental models and situation awareness in air traffic control. International Journal of Aviation Psychology 7, 331–341 (1997)
9. Mumaw, R.J., Nikolic, M.I., Sarter, N.B., Wickens, C.D.: A simulator study of pilots' monitoring strategies and performance on modern glass cockpit aircraft. In: Proceedings of the 45th Annual Meeting of the Human Factors and Ergonomics Society, Minneapolis, USA (2001)
10. Niessen, C., Eyferth, K.: A model of the air traffic controller's picture. Safety Science 73, 187–202 (2001)
11. Parasuraman, R., Sheridan, T.B., Wickens, C.D.: Situation awareness, mental workload, and trust in automation: viable, empirically supported cognitive engineering constructs. Journal of Cognitive Engineering and Decision Making 2, 140–160 (2008)
12. Singh, I.L., Molloy, R., Parasuraman, R.: Automation-induced "complacency": development of the complacency-potential rating scale. International Journal of Aviation Psychology 3, 111–122 (1993)
13. Underwood, G., Chapman, P., Brocklehurst, N., Underwood, J., Crundall, D.: Visual attention while driving: sequences of eye fixations made by experienced and novice drivers. Ergonomics 46, 629–646 (2003)
14. Whitfield, D., Jackson, A.: The air traffic controller's picture as an example of mental models. In: Johannsen, G., Rijnsdorp, J.E. (eds.) Proceedings of the IFAC Conference on Analysis, Design, and Evaluation of Man-Machine Systems, pp. 45–52. Pergamon Press, London (1982)
15. Wickens, C.D., Helleberg, J., Goh, J., Xu, X., Horrey, B.: Pilot task management: testing an attentional expected value model of visual scanning (ARL-01-14/NASA-01-7). University of Illinois, Aviation Research Lab, Savoy (2001)
16. Wickens, C.D., Mavor, A.S., Parasuraman, R., McGee, J.P.: The Future of Air Traffic Control: Human Operators and Automation. National Academy Press, London (1998)
17. Woods, D.D., Sarter, N.B.: Learning from automation surprises and "going sour" accidents. In: Sarter, N.B., Amalberti, R. (eds.) Cognitive Engineering in the Aviation Domain, pp. 327–353. LEA, Hillsdale (2000)
The Application of Human Error Template (HET) for Redesigning Standard Operational Procedures in Aviation Operations

Wen-Chin Li 1, Don Harris 2, Yueh-Ling Hsu 3, and Lon-Wen Li 4

1 Psychology Department, National Defense University, Taiwan, R.O.C.
[email protected]
2 Human Factors Department, Cranfield University, United Kingdom
3 Department of Air Transportation, Kainan University, Taiwan, R.O.C.
4 Training Centre, National Defense University, Taiwan, R.O.C.
Abstract. The Human Error Template (HET) is a checklist-style approach for predicting human errors in the cockpit and developing accident prevention strategies. It is applied to each bottom-level task step in a hierarchical task analysis (HTA) of the task in question. This research applies this recent human error prediction technique to predict the potential design-induced human errors in the IDF during the landing phase of flight, and to provide a basis for improving software design and hardware equipment to enhance flight safety. In military operations, the emphasis is on adherence to SOPs in an attempt to prevent incidents/accidents resulting from human factors. By using the scientific approach of HTA to evaluate current SOPs, together with a formal error analysis of the pilot's tasks, interface design and procedures, the air force's combat effectiveness will be improved and a user-friendly cockpit interface can be developed.

Keywords: Aviation Safety, Hierarchical Task Analysis, Human Error Template, Standard Operation Procedure.
1 Introduction

New-generation, modern technology aircraft have implemented highly automated systems and computerized cockpits. However, human factors accidents have become the most significant concern for everyone in the aviation industry. According to accident investigation reports, inappropriate system design, incompatible cockpit display layout, and unsuitable SOPs were the major factors causing accidents [9]. Li and Harris [6, 7] found that 30% of accidents involved 'violations', including intentionally ignoring standard operating procedures (SOPs), neglecting SOPs, applying improper SOPs, and diverting from SOPs. Dekker [1] has proposed that human errors are systematically connected to features of operators' tools and tasks, and that error has its roots in the surrounding system: posing the question as one of human or system failure alone demonstrates an oversimplified view of the roots of failure. The important issue in a human factors investigation is to understand why the pilots' actions made sense to them at the time the accident happened.
The Human Error Template (HET) is a checklist-style approach to error prediction that comes in the form of an error pro forma containing 12 error modes. The HET methodology is applied to each bottom-level task step in a hierarchical task analysis (HTA) of the task in question. The technique requires the analyst to indicate which of the HET error modes are credible for each task step, along with the probability and the criticality of each error, based upon their judgment, in order to develop effective accident prevention strategies [4]. The HET error taxonomy consists of 12 basic error modes that were selected based upon a study of actual pilot error incidents and existing error modes identified in previous research [11]. The 12 HET error modes are: (1) Failure to execute; (2) Task execution incomplete; (3) Task executed in the wrong direction; (4) Wrong task executed; (5) Task repeated; (6) Task executed on the wrong interface element; (7) Task executed too early; (8) Task executed too late; (9) Task executed too much; (10) Task executed too little; (11) Misread information; (12) Other. For each credible error the analyst provides a description of the form that the error would take. The analyst then determines the outcome or consequence associated with the error and estimates the likelihood of the error (low, medium or high) and its criticality (low, medium or high). If an error is given a high rating for both likelihood and criticality, the aspect of the interface involved in that task step is rated as a 'fail', meaning that it is not suitable for certification [8].

The main advantages of the HET method are that it is simple to learn and use, requiring very little training, and that it is designed to be a convenient method to apply in the field. The error taxonomy used is comprehensive, as it is based on existing error taxonomies from a number of HEI methods [12].

The advanced automation in new-generation aircraft has without a doubt offered considerable improvements in safety over the original types; however, new types of error have begun to emerge on these flight decks. This was exemplified by accidents such as the Nagoya Airbus A300-600 accident, where the pilots could not disengage the go-around mode after its inadvertent activation, as a result of a combination of a lack of understanding of the automation and poor design of the operating logic in the autoland system. As a result of such accidents relevant to human error, the US Federal Aviation Administration [3] commissioned an in-depth study of the pilot-aircraft interface in modern cockpits. The report identified several major flight deck design shortcomings and deficiencies in the design process. There were criticisms of the cockpit interfaces, such as pilots' auto-flight mode awareness/indication; energy awareness; confusing and unclear display symbology and nomenclature; and a lack of consistency in FMS interfaces and conventions. HEI techniques should be capable of being used for the revision of SOPs and flight deck design to comply with certification requirements and to enhance the ability to perform Taiwan MOD's priority tasks. In order to enhance safety, there is also a strong economic argument for airlines for the early identification of inadequacies of interface design in the cockpit. Making revisions late in the design of interfaces and/or operational procedures is expensive.
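To illustrate the HET decision logic just described, the sketch below encodes the 12 error modes and the rule that a task step 'fails' when any credible error mode is rated high on both likelihood and criticality. The data structures and names are hypothetical; the published HET pro forma is a paper checklist, not software.

```python
# Hypothetical sketch of the HET pro forma logic: a task step fails
# certification if any credible error mode is rated HIGH on both
# likelihood and criticality.

HET_ERROR_MODES = [
    "Failure to execute", "Task execution incomplete",
    "Task executed in the wrong direction", "Wrong task executed",
    "Task repeated", "Task executed on the wrong interface element",
    "Task executed too early", "Task executed too late",
    "Task executed too much", "Task executed too little",
    "Misread information", "Other",
]

def assess_task_step(ratings):
    """ratings: {error_mode: (credible, likelihood, criticality)}
    with likelihood/criticality in {"low", "medium", "high"}."""
    for mode, (credible, likelihood, criticality) in ratings.items():
        assert mode in HET_ERROR_MODES, f"unknown error mode: {mode}"
        if credible and likelihood == "high" and criticality == "high":
            return "FAIL"  # interface aspect not suitable for certification
    return "PASS"

# Example: a single credible high/high error mode fails the step.
ratings = {m: (False, "low", "low") for m in HET_ERROR_MODES}
ratings["Task executed on the wrong interface element"] = (True, "high", "high")
print(assess_task_step(ratings))  # -> FAIL
```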
This research applies HET to evaluate the pilot interface and to identify potential instances of 'design-induced error' in components such as software system design, human-computer interaction, automation and cockpit layout, together with the associated SOPs for using these systems. The ultimate objective is to enhance pilots' situation awareness, reduce error and improve aviation
safety. The results of this study can be used to improve safety in pilot training, to revise standard operating procedures, and to guide modifications of software and hardware design.
2 Method

Participants: Six participants took part in this research: three pilots with over 2,000 flight hours (including senior IDF instructor pilots), two aviation safety researchers, and one aviation human factors expert with flight experience.

Purpose: This research had two purposes: to evaluate the IDF fighter pilot interface, and to evaluate the IDF standard operating procedures, through the application of Hierarchical Task Analysis and Human Error Template analysis.

Research Design: An HTA was performed from the standard operating procedures and expert de-briefings for the phase from final approach to parking the fighter on the ramp. This required the integration of the IDF SOPs with the knowledge of flight operations of the senior instructors, safety management researchers, and aviation human factors researchers, all of whom participated in the development of the HET analyses.

Procedures for HTA: Hierarchical task analysis is the most popular task analysis method and has become perhaps the most widely used of all HF methods available. Originally developed in response to the need for a greater understanding of cognitive tasks [2], HTA involves describing the activity under analysis in terms of a hierarchy of goals, sub-goals, operations and plans. The end result is an exhaustive description of that task (a sketch of such a goal hierarchy is given below). The HET template was then applied to the bottom-level tasks in the HTA to identify any potential error modes.
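As an illustration of the kind of structure an HTA produces, the sketch below represents a goal hierarchy as nested nodes and extracts the bottom-level task steps to which the HET template is applied. The node labels are abbreviated examples taken from the paper's own sub-goal names (the intermediate label '1.1 Final approach' is our own hypothetical addition), and the data structure itself is a choice of ours, not the authors' tooling.

```python
# Hypothetical sketch: an HTA goal hierarchy as a nested tree, with a
# helper that yields the bottom-level task steps (the inputs to HET).

from dataclasses import dataclass, field
from typing import List

@dataclass
class Goal:
    label: str
    subgoals: List["Goal"] = field(default_factory=list)

def bottom_level_tasks(goal):
    """Yield leaf goals, i.e. the action items assessed with HET."""
    if not goal.subgoals:
        yield goal.label
    else:
        for sub in goal.subgoals:
            yield from bottom_level_tasks(sub)

# Abbreviated fragment of the hierarchy reported in the paper.
hta = Goal("1 Land IDF at CCK airport", [
    Goal("1.1 Final approach", [
        Goal("1.1.2 Check speed brake indicator"),
    ]),
    Goal("1.11 Stop aircraft at check zone", [
        Goal("1.11.2 Check after landing", [
            Goal("1.11.2.6 Navigation equipment off", [
                Goal("1.11.2.6.1 TACAN off"),
            ]),
        ]),
    ]),
])

print(list(bottom_level_tasks(hta)))
```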
3 Results and Discussion

Through the use of HTA, the expert team analyzed the SOPs for landing the IDF and developed the HET evaluation form for landing the IDF at CCK airport. The HTA showed that the goal 'Land IDF at CCK airport' was composed of 11 sub-goals at level 2 (such as '1.11 Stop Aircraft at Check Zone'), 28 sub-goals at level 3 (such as '1.11.2 Check after Landing'), 16 sub-goals at level 4 (such as '1.11.2.6 Navigation Equipment Off'), and 6 sub-goals at level 5 (such as '1.11.2.6.1 TACAN Off'). This demonstrates that the main goal of landing the IDF safely will not be achieved unless all of these 61 sub-goals at levels 2, 3, 4 and 5 are accomplished (Figure 1).

The 61 sub-goals contain 43 bottom-level tasks, shown underlined in Figure 1. These bottom-level tasks are also action items. Although each action item serves a specific goal, it represents an activity performed for a safety reason. The HET evaluation form developed for the IDF was therefore applied to each of the 43 bottom-level tasks identified from the HTA, to diagnose the opportunities for the 12 basic HET error modes to be committed, i.e., the likelihood of the errors 'Failure to execute', 'Task execution
Fig. 1. An example of the HTA of the SOPs for IDF fighter landing
incomplete”, “Task executed in the wrong direction”, “Wrong task executed”, “Task repeated”, “Task executed on the wrong interface element”, “Task executed too early”, “Task executed too late”, “Task executed too much”, “Task executed too little”, and “Misread Information”. Each of these actions was categorized on the basis of its error likelihood (very low, low, medium, high or very high), and the criticality of that error mode for flight safety (very low, low, medium, high or very high). For example, there are several potential error modes associated with the sub-goal of 1.1.2 Check Speed Brake Indicator, such as Forget to check, Not fully extend, Rear cockpit switch not central, Mis-switch weapon-cross switch, operate the S/B (Speed Brake)
Fig. 2. An example of the Human Error Template format for the IDF fighter during landing

Scenario: IDF Landing at CCK AFB
Task step: 1.1.2 Check Speed Brake Indicator

For each of the 12 error modes, the pro forma records whether the error is credible (Yes/No), a description, the outcome, the likelihood (L/M/H), the criticality (L/M/H), and an overall PASS/FAIL judgment. Example entries from the form:

Error mode | Description | Outcome
Fail to execute | Forget to check | Flying with doubt, unsure of the correct S/B position, which affects the speed-up power in a go-around
Task execution incomplete | S/B not fully extended | Aircraft decelerates slowly from high speed, with the throttle at a low position
Task executed in wrong direction | Rear cockpit switch not central | Approach speed increases easily; descent rate is low
Wrong task executed | Mis-switch of the weapon-cross switch | May lead to a low AOA but high speed
Task executed too late | S/B retracted by accident | Abnormal operating procedure, which may compress operation time and increase workload

The remaining error modes (task repeated, task executed on the wrong interface element, too early, too much, too little, misread information, and other) are rated on the same form.
switch instead of the D/F (Dog Fight) switch, accidental S/B retraction, or misread information (see Figure 2). These types of errors will result in operations with an uncertain S/B position, which in turn results in the aircraft decelerating slowly from high speed with the throttle at a low position, an increased approach speed with a low descent rate, or a low AOA with high speed. The restricted time available to perform these operations increases workload and, at the same time, increases the chance of wrong estimates or operations. The expert team then needed to evaluate the likelihood of these errors and their criticality. If an error is given a high rating for both likelihood and criticality, the aspect of the interface involved in the task step is rated as a 'fail', meaning that it is not suitable for certification. Thus, to effectively prevent the occurrence of human errors, more specific training should be implemented, the software/hardware needs to be redesigned, and/or the SOPs need to be updated.

Diaper and Stanton [2] suggested that HTA is the best method for human-machine research, and that it has great potential to be used in the system design and development of aircraft. The findings from this research show a high opportunity for IDF pilots to operate the IDF's speed brake by mistake, because the speed brake is close to the Dog Fight switch; as a result there is a potential flight safety concern. This demonstrates that HTA and HET can identify whether redesign work is needed on the controls and instruments in the cockpit. Stanton [2] suggested that HTA is good for system design and analysis, from design concept to practical application, especially for the purposes of task allocation, procedure design, training syllabus design and interface design. The reason behind HTA's popularity is its effectiveness and flexibility: a great deal of human factors research, such as usability evaluation, error identification, or performance evaluation, is unlikely to be effective without it. The step-by-step output of HTA is practical to use, and researchers are able to gain an in-depth knowledge of the activity under analysis.

However, the disadvantage of the technique is the amount of data collection required and the time that it takes [5]. For example, there were 43 action items in the IDF landing process. The HET error taxonomy consists of 12 basic error modes with 3 variables to assess (severity, frequency and pass/fail). In total, each participant pilot needs to fill in up to 43 × 12 × 3 = 1,548 data cells, which is a considerable amount of work for every participant. In addition, it takes time to become familiar with the technique and to conduct a reliability analysis. Nevertheless, performing a formal error analysis at the early stages of the process of designing the flight deck and its operating procedures is still much cheaper than re-designing the aircraft interfaces once the aircraft has entered service.
4 Conclusion

This research applied the Human Error Template, a recent human error prediction technique based on Hierarchical Task Analysis, to evaluate the current cockpit design and standard operating procedures of IDF fighters. The research aims were to predict the potential design-induced human errors for the IDF during landing, and to improve software design and hardware equipment for flight safety. Together with data from previous incidents/accidents and studies of human factors engineering, HET is an appropriate technique for conducting error prediction for flight
safety. This year, the military emphasis is on adherence to SOPs in an attempt to prevent incidents/accidents resulting from human factors. By the use of a scientific approach using HTA to evaluate current SOPs, together with error analysis, interface design and procedure certification, the air force's combat effectiveness will be enhanced and a user-friendly task environment can be achieved.
References

1. Dekker, S.: The re-invention of human error. Human Factors and Aerospace Safety 1(3), 247–266 (2001)
2. Diaper, D., Stanton, N.A.: Handbook of Task Analysis in Human-Computer Interaction. Lawrence Erlbaum Associates, Mahwah (2004)
3. Federal Aviation Administration: Report on the Interfaces between Flightcrews and Modern Flight Deck Systems. Federal Aviation Administration, Washington, DC (1996)
4. Harris, D., Stanton, N.A., Marshall, A., Young, M.S., Demagalski, J., Salmon, P.M.: Using SHERPA to predict design-induced error on the flight deck. Aerospace Science and Technology 9, 525–532 (2005)
5. Kirwan, B., Ainsworth, L.K.: A Guide to Task Analysis. Taylor and Francis, London (1992)
6. Li, W.C., Harris, D.: Pilot error and its relationship with higher organizational levels: HFACS analysis of 523 accidents. Aviation, Space, and Environmental Medicine 77(10), 1056–1061 (2006)
7. Li, W.C., Harris, D.: Eastern minds in western cockpits: meta-analysis of human factors in mishaps from three nations. Aviation, Space, and Environmental Medicine 78(4), 420–425 (2007)
8. Marshall, A., Stanton, N., Young, M., Salmon, P., Harris, D., Demagalski, J., Waldmann, T., Dekker, S.: Development of the Human Error Template – a new methodology for assessing design induced errors on aircraft flight decks. Final Report of the ERRORPRED Project E!1970. Department of Trade and Industry, London (2003)
9. Stanton, N.A., Baber, C.: Error by design: methods for predicting device usability. Design Studies 23, 363–384 (2002)
10. Stanton, N.A., Harris, D., Salmon, P.M., Demagalski, J.M., Marshall, A., Young, M.S., Dekker, S.W., Waldmann, T.: Predicting design induced pilot error using HET – a new formal human error identification method for flight decks. The Aeronautical Journal 110, 107–115 (2006)
11. Stanton, N.A., Salmon, P.M., Walker, G.H., Baber, C., Jenkins, D.P.: Human Factors Methods: A Practical Guide for Engineering and Design. Ashgate, London (2005)
12. Stanton, N.A., Stevenage, S.V.: Learning to predict human error: issues of reliability, validity and acceptability. Ergonomics 41, 1737–1756 (1998)
Effect of Aircraft Datablock Complexity and Exposure Time on Performance of Change Detection Task

Chen Ling and Lesheng Hua

School of Industrial Engineering, University of Oklahoma, 202 W. Boyd, Room 124, Norman, Oklahoma 73019, USA
{chenling,hua}@ou.edu
Abstract. Air traffic controllers constantly perform the tasks of monitoring the traffic situation and searching for conflicts between aircraft. One requirement for these tasks is being able to detect any changes in the aircraft status presented by the aircraft datablock. In this study, we investigated the effects of aircraft datablock complexity and exposure time on change detection task performance. Two types of datablock, a six-field datablock (6F-DB) and a nine-field datablock (9F-DB), were artificially designed. Ten participants learned the change detection task with aircraft datablocks over four days. Our results showed that datablock complexity and exposure time in the change detection task had direct impacts on task performance. In particular, participants had higher detection accuracy with the less complex 6F-DB than with the more complex 9F-DB. The longer DB exposure times of 1 second and 3 seconds also led to higher detection accuracy than 0.5 seconds. The pattern fields in the datablock were associated with better detection performance than the alphanumeric fields. To optimize the performance of the change detection task in the air traffic control system, we need to consider both datablock complexity and exposure time: for the more complex datablock, a longer exposure time should be provided.

Keywords: Air traffic control display, change detection task, complexity of datablock, exposure time.
1 Introduction

In the air traffic control (ATC) system, the core cognitive tasks that controllers perform include monitoring the air traffic situation and searching for conflicts [1]. These tasks require the controller to detect changes in aircraft status and to maintain situation awareness. The controller's change detection performance might therefore directly impact the effectiveness and safety of the air traffic control system.

The basic information unit used to present aircraft status on the ATC display is the datablock (DB). Each DB contains many fields that present information related to the aircraft, including altitude, speed, heading direction, etc. Because controllers obtain the aircraft information from the DB to perform their tasks, the design of the DB itself might directly affect the controller's task performance. The design of a DB has two dimensions of complexity. The first dimension is related to the number of DB fields: a DB containing more fields has higher objective complexity. To avoid display clutter, there
are usually limits on the number of DB fields that can be conveniently displayed on the ATC display. It is therefore of interest to know how many DB fields are too many for a human to process and still perform the task. The second dimension of complexity stems from the different ways in which the DB fields are encoded. Some DB fields employ visual features that are salient for the visual system to process, such as color, underlining, onset, etc. In a change detection task, if changes are encoded with these visual features, they might be easier for participants to detect; if the changes occur in numbers or letters, they might be harder to detect. We call the DB fields that use salient visual features to encode changes 'pattern fields', and the DB fields encoded with numbers or letters 'alphanumeric fields'. In this study, we aim to investigate the effect of the two DB complexity dimensions, the number of DB fields and the method of DB field encoding, on change detection task performance. It needs to be noted that the DBs used in this study were artificially made for research purposes only and are not in use at real air traffic control facilities.

The change detection task is commonly used to study visual attention and working memory [2]. There are three stages in a change detection task: perceptual encoding, retention of information in visual working memory (VWM), and comparison [3]. Participants first view a stimulus (the 'sample display') for a brief exposure time, encode its visual information perceptually, and store it in VWM. People can hold visual information in VWM for a few seconds [4]. Participants then view another stimulus (the 'test display'), retrieve the memorized pattern of the sample display, and compare it against the test display to indicate any change. If a person fails in a change detection task, it might be due to a failure in the perceptual encoding of the sample display, in its memory storage, or in the comparison between the sample and test displays [3]. The change detection task therefore reflects limitations in both perceptual encoding and memory storage [3].

The exposure time of the sample display used in a change detection task is a critical factor for task performance. A failure in perceptual encoding could result from a short exposure time of the sample display. Most VWM studies use a duration of 500 ms or shorter [3]. While 500 ms may be long enough for participants to perceive simple stimuli, it is insufficient for perceiving a large number of complex stimuli [3]. If the exposure time of the sample display is too short, participants may not have enough time to perceptually encode the information in the stimuli; this is especially true for more complex stimuli. On the other hand, if the exposure time is too long, participants may encode the stimuli in the sample display elaborately: they may start to group similar items together and store them in chunks [3].

Another important factor to consider in a change detection task is the complexity of the stimuli in the sample display. A failure in memory storage may be caused by the complexity of the stimuli. Past literature reports that the storage capacity of VWM is about four objects [4-6]. This might set the memory limit for the change detection task. DBs used in the air traffic control system usually contain more than four fields, and we are interested to see how complexity in the DB stimuli affects change detection performance. Controllers receive systematic training to perform their tasks.
It is valuable to understand their learning process for the change detection task. Can they learn to deal with the complexity embedded in the DB through training? Will their performance
improve with training? Research shows that although many aspects of visual cognition are sensitive to learning, the capacity of VWM is relatively insensitive to general procedural learning [3]. In this study, we kept track of task performance throughout several training sessions and investigated the role of learning in the change detection task. Summing up, two factors could affect the learning process for the change detection task on DBs: the complexity of the DB content, both in terms of the number of fields and the way the fields are encoded, and the exposure time of the sample display available for the participant to detect any change. In this study, we investigated the effect of these factors on the learning process of the DB change detection task.
2 Methodology

2.1 Participants

A total of 10 male participants took part in this experiment. They were college students recruited from the University of Oklahoma campus, with ages ranging from 18 to 25. All of the participants had normal vision of at least 20/20 and normal color vision.

2.2 Experiment Stimuli

Each participant was trained and tested with two types of DB: a six-field datablock (6F-DB) and a nine-field datablock (9F-DB). The DBs were artificially made and partly mimicked a prototype of the DBs in a future en-route radar display. All DBs had three lines. The total numbers of fields that participants needed to monitor for changes were six for the 6F-DB and nine for the 9F-DB. An example of the 9F-DB is shown in Fig. 1. The DBs had two types of fields: pattern fields and alphanumeric fields. The 6F-DB had three pattern fields and three alphanumeric fields, and the 9F-DB had five pattern fields and four alphanumeric fields. The alphanumeric fields are bolded in Fig. 1. Descriptions of the DB fields are given in Table 1.
Fig. 1. Meaning of the DB Fields in 9F-DB
Table 1. Description of DB fields in the two types of DB

Field name         | Change range (a)                                     | 6F-DB (b) | 9F-DB (b) | Field type (c)
Alert sign         | -- ↔ CA (conflict alert) or LA (low-altitude alert)  | x | x | P
Call sign          | UAL123 ↔ UAL123                                      | x | x | P
Loss of separation | -- ↔ Æ3                                              |   | x | P
Planned altitude   | number changes randomly                              | x | x | A
Vertical status    | -- ↔ UP or DW                                        | x | x | P
Reported altitude  | 290 ↔ 290#                                           |   | x | P
Flight speed       | number changes randomly                              | x | x | A
Heading direction  | number changes randomly                              | x | x | A
Aircraft type      | R, J ↔ other or no letters                           |   | x | A

(a) The two states between which the content of the field can change in the change detection task.
(b) An 'x' means that the DB type contains that field.
(c) P = pattern field; A = alphanumeric field.
2.3 Equipment

An Optiplex GX620 Dell computer with 2 GB of RAM and a Pentium D Smithfield (3.2 GHz) processor was used to control the presentation of the DB stimuli. The stimuli were presented on a 19-inch Dell color monitor with a resolution of 1024 × 768. A MATLAB program was used for visual stimulus presentation and data collection. The experiments were conducted in a quiet room.

2.4 Procedure

Before the experiment started, participants took a computer-based visual acuity test to ensure that they had normal vision. The participants then learned the meaning of all DB fields through a PowerPoint slide presentation. Afterwards, the participants were given a short verbal quiz to make sure that they understood the meanings of the DB fields and their change ranges. As depicted in Fig. 2, during the experiment a DB sample display was presented briefly on the computer screen, 240 pixels to the left of the screen center, for 3 s, 1 s, or 0.5 s, followed by a DB test display presented at the center of the screen. The participant made a judgment of 'change' or 'no change' and responded by pressing the corresponding button on the lower half of the screen. The time between the presentation of the test display and the button press was recorded by the MATLAB program as the reaction time.
Fig. 2. Experiment Paradigm for Change Detection Task
The experiment took place on four consecutive days, with the 6F-DB on the first two days and the 9F-DB on the last two days. The participant's task was to tell whether the two DBs shown consecutively on the screen had changed or not. If participants indicated that the DB had changed, they also needed to choose which DB field had changed. Participants underwent six training sessions for each type of DB, with the first three sessions on the first day and the second three sessions on the second day. Each training session contained three training blocks with different exposure times: the exposure time for the sample display was 3s in one block, 1s in another, and 0.5s in the third. Each training block consisted of 40 test trials; the numbers of test trials for both types of DB were kept the same to control the training length. In the change detection task, the DB stimuli were designed either with or without a change. In trials where the test display differed from the sample display, the change could occur in different DB fields. For the 6F-DB, 16 of the 40 test trials in each training block were designed as control trials in which no change occurred between sample and test display; in the remaining 24 trials, changes occurred in four trials for each of the six DB fields. For the 9F-DB, 14 of the 40 trials were control trials without any change; in the remaining 26 trials, changes occurred in two trials for each of the five pattern fields and in four trials for each of the four alphanumeric fields. The accuracy of the change detection task for each session was calculated by dividing the number of DBs that were correctly judged in a session by the total number of trials in the session, and was tracked by the MATLAB program. At the end of each day's training, participants filled out the NASA-TLX instrument to report their subjective mental workload.
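A minimal sketch of the block composition and the per-session accuracy measure described above. The counts are taken from the text (16 + 4x6 trials for the 6F-DB, 14 + 2x5 + 4x4 for the 9F-DB); the assignment of fields to the two DB types follows Table 1 and should be treated as illustrative.

```python
import random

# Per-block trial composition described in the text (40 trials per block).
DESIGNS = {
    "6F-DB": {"control": 16,
              "changes": {field: 4 for field in [
                  "alert sign", "call sign", "vertical status",
                  "planned altitude", "flight speed", "heading direction"]}},
    "9F-DB": {"control": 14,
              "changes": dict(
                  [(f, 2) for f in ["alert sign", "call sign",
                                    "loss of separation", "vertical status",
                                    "reported altitude"]] +        # pattern fields
                  [(f, 4) for f in ["planned altitude", "flight speed",
                                    "heading direction",
                                    "aircraft type"]])},           # alphanumeric fields
}

def make_block(db_type):
    design = DESIGNS[db_type]
    trials = [("no change", None)] * design["control"]
    for field, n in design["changes"].items():
        trials += [("change", field)] * n
    assert len(trials) == 40          # 16 + 4*6, or 14 + 2*5 + 4*4
    random.shuffle(trials)
    return trials

def session_accuracy(correct_flags):
    # Accuracy = correctly judged trials / total trials in the session.
    return sum(correct_flags) / len(correct_flags)
```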
3 Results
3.1 Effect of DB Type, Exposure Time, and Training Session on Performance
The accuracy achieved in the change detection task is plotted in Figs. 3 and 4 as a function of training session and exposure time for the 6F-DB and 9F-DB, respectively. An ANOVA was performed on accuracy with DB type (6F-DB, 9F-DB), exposure time (0.5s, 1s, 3s), and training session (one through six) as independent variables. The analysis revealed a main effect of DB type, F(1, 9) = 16.69, p = .003, a
Fig. 3. Detection Accuracy across Six Training Sessions for 6F-DB (n=10, Error bars represent ± 1 SEM)
Fig. 4. Detection Accuracy across Six Training Sessions for 9F-DB (n=10, Error bars represent ± 1 SEM)
main effect of exposure time, F(2, 90) = 217.22, p = .0001, and a main effect of training session, F(5, 45) = 13.19, p < .0001. There were also significant interaction effects between DB type and training session, F(5, 45) = 5.39, p = .0006, and between DB type and exposure time, F(2, 90) = 6.17, p = .003. Participants' accuracy in the change detection task was higher with the 6F-DB (M = 0.85, SD = 0.11) than with the 9F-DB (M = 0.80, SD = 0.10). The accuracies associated with the three exposure times, 0.5s (M = 0.75, SD = 0.09), 1s (M = 0.83, SD = 0.10), and 3s (M = 0.89, SD = 0.08), were all significantly different from each other: longer exposure time led to higher detection accuracy. A Tukey test also showed that training sessions one (M = 0.78, SD = 0.11) and two (M = 0.79, SD = 0.10) were significantly different from sessions three (M = 0.82, SD = 0.10), four (M = 0.85, SD = 0.11), five (M = 0.85, SD = 0.09), and six (M = 0.85, SD = 0.10), but not from each other. The significant DB type by training session interaction is illustrated in Figs. 3 and 4. An obvious learning effect for the 6F-DB is depicted in Fig. 3: accuracy trends upward across sessions one through six for all three exposure times. However, no such increasing trend is observed in the learning curve for the 9F-DB (Fig. 4). Instead, the detection accuracy remained at the same level with some
fluctuations throughout the sessions. This difference in learning between the 6F-DB and the 9F-DB is likely caused by the difference in the complexity of the DBs: the extra fields in the 9F-DB may have made it much harder to learn than the 6F-DB. Figs. 3 and 4 also demonstrate the significant interaction between DB type and exposure time. In Fig. 3, the difference in accuracy between 3s and 1s is not very obvious, but it is more obvious in Fig. 4. This indicates that because the 6F-DB is less complex, 1s is still sufficient for participants to achieve a similar level of performance as with 3s; with the increased complexity of the 9F-DB, however, there is a difference between 3s and 1s, and 1s is no longer long enough for participants to perform the change detection task.
3.2 Comparison between Pattern and Alphanumeric Fields
We further examined the performance associated with the two types of DB fields, pattern and alphanumeric, and investigated the difference between them in the 6F-DB and the 9F-DB, respectively. The average detection accuracy across all training sessions for all DB fields was calculated for the two DB types and plotted in Figs. 5 and 6 (with alphanumeric fields circled in red). The obvious trend in both figures is that detection accuracies were lower for the alphanumeric fields than for the pattern fields. Also, the differences among exposure times were more pronounced for the alphanumeric fields than for the pattern fields. An ANOVA was performed on the 6F-DB accuracy data with DB field as one factor and exposure time (0.5s, 1s, 3s) as the other. The analysis revealed a main effect of DB field, F(5, 45) = 16.49, p < .0001, a main effect of exposure time, F(2, 18) = 46.75, p < .0001, and an interaction between field and exposure time, F(10, 90) = 7.35, p < .0001. A post-hoc Tukey test showed that exposure times 3s (M = 0.89, SD = 0.12) and 1s (M = 0.85, SD = 0.17) were not significantly different from each other, but both were significantly different from 0.5s (M = 0.74, SD = 0.27). The Tukey test also showed significant differences between pattern and alphanumeric fields. The interaction effect was mainly due to the larger differences among exposure times for the alphanumeric fields and the smaller differences for the pattern fields.
Fig. 5. Comparison of Task Performance for 6F-DB Fields
Fig. 6. Comparison of Task Performance for 9F-DB Fields
Similarly, an ANOVA performed on the 9F-DB accuracy data revealed a main effect of DB field, F(8, 72) = 36.66, p < .0001, a main effect of exposure time, F(2, 18) = 47, p < .0001, and an interaction between field and exposure time, F(16, 144) = 7.97, p < .0001. A post-hoc Tukey test on exposure time showed that 3s (M = 0.86, SD = 0.22), 1s (M = 0.78, SD = 0.28), and 0.5s (M = 0.73, SD = 0.33) were all significantly different from each other. The Tukey test also showed significant differences between pattern and alphanumeric fields; in particular, there was no significant difference among the pattern fields. Within the alphanumeric fields, accuracy for the aircraft type field, which used letters to encode changes, was significantly lower than for the flight speed and heading direction fields, both of which used numbers. The significant interaction between exposure time and field was again due to the larger differences among exposure times for the alphanumeric fields and the smaller differences for the pattern fields. The larger differences suggest that the alphanumeric fields were more difficult for the participants to process: when the exposure time became shorter, the participants were not able to detect the changes presented in the alphanumeric fields. On the other hand, all the pattern fields showed relatively similar performance, above 90% in every case (see Figs. 5 and 6). The results suggest that the salient visual features used to encode changes in the pattern fields took much less time to process; even 0.5s was sufficient for detecting such changes.
3.3 Mental Workload
Participants' mental workload was measured with the NASA-TLX at the end of each day's training. An ANOVA was performed on the mental workload ratings with DB type (6F-DB, 9F-DB) as one factor and day of training (first day, second day) as the other. The analysis revealed a significant main effect of DB type, F(1, 10) = 6.63, p = .03; no effect of day of training and no interaction were found. The 9F-DB (M = 64.22, SD = 14.20) was associated with higher mental workload than the 6F-DB (M = 58.18, SD = 13.15).
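The paper does not name the software used for these ANOVAs. As an illustration, an equivalent repeated-measures analysis could be run in Python with statsmodels, assuming a long-format table with one accuracy value per participant x DB type x exposure x session cell; the file and column names below are hypothetical.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Long-format data assumed: columns participant, db_type (6F/9F),
# exposure (0.5/1/3), session (1-6), accuracy.
df = pd.read_csv("change_detection_accuracy.csv")  # hypothetical file

result = AnovaRM(
    data=df,
    depvar="accuracy",
    subject="participant",
    within=["db_type", "exposure", "session"],
).fit()
print(result)   # F and p values for main effects and interactions
```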
4 Discussion
Compared to the 6F-DB, the higher objective complexity of the 9F-DB posed more challenges for the participants in performing the change detection task. In view of the similar level of performance on the pattern fields for the 6F-DB and the 9F-DB, we speculate that the difference was mainly driven by the extra alphanumeric fields in the 9F-DB: human operators need more attentional resources and time to encode alphanumeric information. With five pattern fields and four alphanumeric fields, the 9F-DB imposed an excessive perceived complexity on the human operator. The superior performance on pattern fields in both the 6F-DB and the 9F-DB suggests that this encoding technique is more effective in alerting controllers to important changes in the DB; participants noticed changes presented in these fields more readily. On the other hand, it was much harder to notice changes in the alphanumeric fields. It might not be advisable to rely solely on numbers or letters to present critical changes in information within a short timeframe. Instead, symbolic representations might help alert controllers to changes occurring in these fields. In terms of exposure time, longer exposure of the DB sample display was associated with higher accuracy, with significant differences among 3s, 1s, and 0.5s for both the 6F-DB and the 9F-DB. The exposure time of 0.5s was generally associated with lower detection performance, suggesting that participants did not have enough time to perceptually encode the presented information. The exposure time of 1s was acceptable for the 6F-DB but was not long enough for the 9F-DB. The combination of stimulus complexity and exposure time produced different difficulty levels for the change detection task. Learning occurred for the 6F-DB, as indicated by significant differences among training sessions. But when the DB became too complex, the participants were not able to improve their performance with more training: in the case of the 9F-DB, there was no significant difference among sessions, and change detection performance merely fluctuated. This suggests that the 9F-DB was associated with a complexity level too high for the participants to improve their performance. In this study, we investigated the effects of DB complexity and exposure time on learning of the change detection task. Our results showed that both factors had direct impacts on task performance. In particular, the less complex 6F-DB led to higher change detection accuracy than the more complex 9F-DB; longer exposure of the DB sample display led to higher detection accuracy; and changes in the pattern fields were easier to detect than those in the alphanumeric fields. To optimize change detection performance in the air traffic control system, both DB complexity and exposure time need to be considered: for a more complex DB, longer exposure time should be provided, and it is advisable to encode changes with salient visual features such as symbols, colors, and onsets rather than alphanumerically. Several differences existed between our experiment and the traditional change detection paradigm. First, our experiment used more complex stimuli, similar to those in the real-life operational environment, whereas traditional change detection experiments mostly used oversimplified stimuli. Second, our experiment showed the sample display and test display at different locations instead of maintaining them at the same location and inserting a retention period.
The reason for this design was to make the results applicable to the task scenario that air traffic controllers face, in which aircraft targets keep moving on the display. With the above-mentioned differences in our design, our results are more readily applicable to the air traffic control domain.
Acknowledgement
This research was supported by the Federal Aviation Administration (FAA) Civil Aerospace Medical Institute (CAMI), Oklahoma City, under the grant entitled "Investigating Information Complexity in Three Types of Air Traffic Control (ATC) Displays," grant number FAA 06-G-013. The FAA grant monitor, Dr. Jing Xing, initiated the study, helped design the experiment, and contributed to the data analysis.
References
1. EATMP: Integrated Task and Job Analysis of Air Traffic Controllers – Phase 3 – Baseline Reference of Air Traffic Controller Tasks and Cognitive Processes in the ECAC Area (HUM.ET1.ST01.1000-REP-0), 1.0 edn. EUROCONTROL, Brussels (2000)
2. Jiang, Y., Song, J.H.: Spatial context learning in visual search and change detection. Perception & Psychophysics 67(7), 1128–1139 (2005)
3. Eng, H.Y., Chen, D., Jiang, Y.: Visual working memory for simple and complex visual stimuli. Psychonomic Bulletin & Review 12(6), 1127–1133 (2005)
4. Luck, S.J., Vogel, E.K.: The capacity of visual working memory for features and conjunctions. Nature 390, 279–281 (1997)
5. Bundesen, C.: A theory of visual attention. Psychological Review 97, 523–547 (1990)
6. Vogel, E.K., Woodman, G.F., Luck, S.J.: Storage of features, conjunctions, and objects in visual working memory. Journal of Experimental Psychology: Human Perception & Performance 27, 92–114 (2001)
A Regulatory-Based Approach to Safety Analysis of Unmanned Aircraft Systems
James T. Luxhøj and Ahmet Öztekin
Department of Industrial and Systems Engineering, Rutgers University, Piscataway, NJ, USA
[email protected], [email protected]
Abstract. Unmanned Aircraft Systems (UAS), the new frontier in civil aviation, add another dimension to the ever-increasing complexity of the current National Airspace System (NAS) in the United States. The future inclusion of private and commercial UAS operations in the NAS unavoidably raises safety concerns. As the NAS becomes increasingly more complex and constrained, the associated hazard and safety risk modeling must also mature in sophistication. Thus, there is a need for advanced studies focusing on risk-based system safety analysis of emergent UAS operations. This paper presents a regulatory-based integrated approach to system safety and risk analysis of UAS operations and their interaction with the current NAS and the future Next Generation (NextGen) Airspace.
1 Introduction
The National Airspace System (NAS) in the United States is increasingly becoming a complex array of commercial and general aviation aircraft, unmanned aerial systems, reusable launch vehicles, rotorcraft, airports, air traffic control, weather services, and maintenance operations, among others. This increased system complexity necessitates the application of systematic safety risk analysis methods to understand and, where possible, eliminate, reduce, and/or mitigate risk factors. As the NAS becomes increasingly more complex and constrained, the associated hazard and safety risk modeling must also mature in sophistication. Thus, there is a need for advanced studies focusing on risk-based system safety analysis of emergent Unmanned Aircraft Systems (UAS) operations. This paper presents a regulatory-based integrated approach to system safety and risk analysis of UAS operations and their interaction with the current NAS and the future Next Generation (NextGen) Airspace. Four distinct yet closely related areas of analysis comprise the main thrust of the proposed approach: taxonomy development, causal factor identification, database development, and modeling complex uncertainty. Safe integration of UASs into the NAS presents significant challenges to all parties in the aviation community. Although the main thrust of this emerging technology originates from entrepreneurs, both civilian and military, the burden of safe integration arguably lies on the shoulders of regulatory agencies such as the Federal Aviation Administration (FAA). The question of safety associated with this integration arises
principally due to the unknowns of potential hazards and associated risks while operating in the NAS and interacting with existing NAS users. Formally, a UAS is defined as "a device used or intended to be used for flight in the air that has no onboard pilot." This is a clarification of the existing aircraft definition in 14 CFR §1.1 and indicates that UAS operations are governed by the existing regulations. In the past, when incidents or accidents occurred, a forensic approach was undertaken in the hazard analysis phase of the system safety approach. Heinrich suggested the "domino" theory of accidents [2]. This view shifted the focus of accident investigations toward the events involved, rather than the conditions surrounding the accident environment. The objective is for analysts and investigators to understand the accident phenomenon on the basis of the chain of events that occurred. The National Transportation Safety Board (NTSB) uses a variant of this sequence-of-events approach in its analysis of aircraft accidents. While such a forensic approach has merit and has been important and useful to system safety analysts in the past, it is very scenario driven and quite dependent on the contextual factors involved in the specific incident or accident. As such, the safety recommendations emerging from a forensic analysis may be quite specific to an aircraft type or airport. The system safety approach involves an identify-analyze-control method of safety as opposed to a "fly-fix-fly" approach [1]. One key hypothesis of this paper is that the system safety approach is better suited to safety analysis of new classes of aircraft for which data are sparse and operations are limited. In particular, UAS represent a new class of aircraft that is emerging in the current NAS and that will most likely be an integral component of the U.S. Next Generation (NextGen) Air Transport System and the Single European Sky (SESAR). The basic building block for both NextGen and SESAR involves creating an operating environment in four dimensions: latitude, longitude, altitude, and time [3]. It is envisioned that NextGen will facilitate the move to "free flight," where satellite-derived positioning data are transmitted via a digital data link to the ground and every aircraft in the sky selects its own flight path for optimal speed, fuel consumption, and turbulence avoidance [3].
2 System Safety Methodology
A second hypothesis of this paper is that safety hazards may be derived top-down as opposed to bottom-up. Rather than collecting hazard information from a case-by-case or scenario approach, especially for novel aircraft systems where such data are usually not available, the conjecture is that the Title 14 Code of Federal Regulations (CFRs) may be used to derive hazards, as well as their underlying causal factors, by utilizing a systems analysis approach. Thus, this paper proposes a regulatory-based integrated approach to system safety and risk analysis of UAS operations and their interaction with the current NAS and the future NextGen Airspace. The next sections describe the elements of the proposed approach in more detail.
2.1 Development of a System-Level Taxonomy for Categorization of UAS Hazards
One of the first steps in the proposed UAS system safety analysis is hazard identification and analysis. To that end, a new hazard taxonomy was developed. This taxonomy, termed the Hazard Classification and Analysis System (HCAS), identifies four main hazard system sources: Airmen, UAS, Operations, and Environment. The basic framework of the proposed taxonomy is closely based on the FAA regulatory perspective (i.e., the Title 14, Code of Federal Regulations (14 CFR) chapters on Aircraft, Airmen, Certification/Airworthiness, Flight Operations, etc.). Such an approach uniquely distinguishes the HCAS taxonomy from other UAS hazard analyses being performed by the Department of Defense (DoD), the RTCA Special Committee (SC) 203 [4, 5], etc. Safety analysis has a fundamental role to play in the identification of hazard source potentials, the understanding of the underlying causal factors, the likelihood assessment of these factors, the severity evaluation of the potential consequence(s) of mishaps, and the prioritization of mitigations. A sound system-level safety analysis relies heavily on properly identifying the key components of the area of interest. In particular, the identification of potential hazard sources and sub-sources within the systemic structure of the problem domain should be considered a fundamentally important step in system safety analysis. Furthermore, since semantics play a crucial role in defining the domain variables, a systematic taxonomy that balances fidelity and generalization provides a solid foundation for a meaningful and relevant system safety analysis. Within this context, we present the HCAS taxonomy, specifically designed and developed to identify and categorize individual system-level hazard sources for UAS operations. HCAS categorizes UAS hazards consistent with the 14 CFR Subchapters, thereby establishing the taxonomy on the FAA regulatory framework. The advantage of the proposed approach is that it allows direct association of identified hazards with regulatory requirements, and vice versa. Once the system is established, it will not only provide the FAA and the UAS community with tools to determine the safety and regulatory implications of UAS operating in the NAS, but will also fall directly under the FAA Safety Management System (SMS) doctrine. In particular, as described in [6], the taxonomy was uniquely developed but was inspired by the research of Hayhurst et al. [7] and the RTCA Special Committee 203 [5]. A set of 208 hypothesized UAS mishap scenarios provided by the FAA was employed to verify and test the robustness of the taxonomy. These hypothesized scenarios were not detailed or specific operational scenarios but were more akin to "thought experiments" of possible UAS mishaps due to their inclusion in the NAS. de Jong et al. [8] present an approach to pushing the boundary between imaginable and unimaginable hazards that keeps the hazard identification process separate from the hazard analysis and hazard mitigation processes, so the idea of developing "UAS scenario themes" is consistent with the de Jong method for hazard identification. These hypothesized UAS scenarios supported "analytic generalization" and were primarily used to develop concise terminology for system and sub-system hazard sources. The semantics of the hazards were aligned in a general way with the wording of the main CFR chapters and also vetted with industry subject matter experts.
[Figure 1 reproduces the UAS Hazard Classification and Analysis System (HCAS), version 3.5: the four primary hazard sources AIRMEN, UAS, OPERATIONS, and ENVIRONMENT, each decomposed into subsystem hazard source groups (e.g., for UAS: aircraft, control station, data link, and organizational human factors; for ENVIRONMENT: weather, terrain, wildlife hazards, traffic, and external influences).]
Fig. 1. The current version of the HCAS taxonomy
The HCAS taxonomy was created using the definition of a hazard adapted from Leveson [9]: a hazard is a state or set of conditions of a system that, together with other conditions in the environment of the system, may lead to an accident (loss event). In the current version of the HCAS, depicted in Figure 1, four primary hazard sources are identified: UAS, Airmen, Operations, and Environment. For each of the system hazard source potentials, subsystem elements are also identified. For example, for the system hazard source "UAS," the subsystem hazard source groups of aircraft, control station, data link, and organizational human factors are included. The notion of a "hazard source" is consistent with the "hazardous element" of Ericson's "hazard triangle" and also recognizes that a hazard needs a trigger or initiator to move it from a dormant to an active state, thus focusing on the hazard's "potential" to do harm. Summary papers on HCAS versions 1, 2, and 3 are presented in Öztekin, Luxhøj, and Allocco [6] and in Öztekin and Luxhøj [10]. The resulting influence diagrams (see below) may then be used to study the interactions among the various causal factors associated with the hazards. Conceptually, HCAS represents a hierarchical structure for UAS hazard sources: at the very top are system-level hazard sources, which, at lower levels, are decomposed into their subsystem-level hazard sources. Since civil UAS operations are relatively new and emergent, databases of mishaps are not readily available.
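To make the hierarchy concrete, a fragment of HCAS can be represented as a simple nested mapping, and the modifier device described in the next paragraph ("inappropriate," "inadequate") can then be applied mechanically to enumerate candidate influence-diagram nodes. The sketch below abridges the element lists of Figure 1 and is purely illustrative.

```python
# A fragment of the HCAS hierarchy as a nested mapping (hazard source ->
# subsystem hazard source groups). Lists abridged from Figure 1.
HCAS = {
    "UAS": ["Aircraft", "Control Station", "Data Link",
            "Organizational Human Factors"],
    "Airmen": ["Pilot", "Maintenance", "Service and Support Personnel",
               "Individual Human Factors"],
    "Operations": ["Flight Operations", "Operational Control",
                   "Continued Airworthiness", "ATC Communications", "Airspace"],
    "Environment": ["Weather", "Terrain", "Wildlife Hazards", "Traffic",
                    "External Influences"],
}

MODIFIERS = ["inappropriate", "inadequate"]

def influence_nodes(hcas, modifiers):
    """Attach a modifier to each subsystem hazard source to enumerate
    candidate influence-diagram nodes, e.g. 'inadequate Data Link (UAS)'."""
    for source, subsources in hcas.items():
        for sub in subsources:
            for mod in modifiers:
                yield f"{mod} {sub} ({source})"

print(next(influence_nodes(HCAS, MODIFIERS)))  # 'inappropriate Aircraft (UAS)'
```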
The proposed taxonomy depicted in Figure 1 may also be used to construct influence/causal factor diagrams representing hypothetical or notional UAS outcome scenarios. Modifiers placed on the HCAS taxonomy elements, such as "inappropriate" or "inadequate," may be used to create such an influence diagram.
2.2 Identifying the Causal Factors – A Regulatory-Based Approach
Hazards are not causal factors. The decomposition of hazards into their constituent causal factors is another important step in the development of a comprehensive scheme for UAS safety risk modeling. The underlying causes of the hazards, such as failure modes, operator and software errors, design flaws, etc., need to be identified in order to eventually determine the mishap risk and the hazard mitigations. However, HCAS is not a taxonomy of causal factors. Although the resulting taxonomy of UAS hazard sources is intended to be generic and inclusive, it represents an inductive reasoning approach with particular emphasis on a given set of UAS hazard scenarios. Hence, to determine a taxonomy of UAS causal factors, which are, strictly speaking, hierarchically at a lower level than hazard sources, we chose to employ deductive reasoning and based our analysis on the current FAA regulations for commercial civil aviation. Knowledge elicitation sessions with subject matter experts were heavily utilized throughout this process. Subsequently, individual causal factors were mapped to the taxonomy of UAS hazard sources, resulting in a seamless analysis that is generic enough to cover most possible UAS operational scenarios yet provides the necessary level of fidelity to map their prominent features into a database. At the crux of our regulatory-based approach lie the following assumptions:
• UAS integration will impact the entire NAS because of the wide range of UAS sizes, weights, performance characteristics, airspace access, and unique operational issues;
• There are insufficient data and no proven methods to perform UAS safety analysis with the traditional event-driven approach;
• The regulations provide the essential safety net for NAS safety;
• There exists a set of identifiable causal factors associated with each relevant regulatory section;
• With proper descriptions of the causal factors, the inter-dependencies (linkages) among them can be demonstrated;
• These linkages form the basis for UAS safety risk analyses, performed by applying the current event-driven approach through the Hazard Classification and Analysis System (HCAS) model.
Conceptually, our proposed approach to identifying causal factors based on the existing regulatory structure represents a hierarchical framework. At the very top, covering the whole NAS, the Federal Aviation Regulations (FARs) provide the minimum requirements for safe operations. Within the context of the FAR Subchapters, functional models provide the fidelity to conceptualize the risk associated with proposed UAS operations. Consequently, groups of causal factors are identified to outline the underpinnings of each UAS-related risk. However, unlike conventional hierarchical methodologies such as Fault Trees, the proposed framework, illustrated in Figure 2, also emphasizes the interactions and connectivity among the various components and compartments comprising the whole domain.
[Figure 2 depicts the hierarchy Federal Aviation Regulations → FAR Subchapters (e.g., C – Aircraft, D – Airmen) → UAS Functional Models → Risks → Causal Factors, with interactions across the branches.]
Fig. 2. Hierarchy and interactions within the proposed regulatory-based framework
[Figure 3 maps FAR Subchapters (C – Aircraft, D – Airmen) to causal factor groups such as Airworthiness, Maintenance, and Parts (each with a description, definition, keywords, and interactions), and links these to HCAS hazard source elements and UAS scenarios.]
Fig. 3. Causal Factors are the link between regulations and HCAS taxonomy
We introduce the notion of a "causal factor cohort" or "grouping" to maintain a system-level approach while structuring the UAS causal factor taxonomy and to establish a viable linkage between HCAS and individual causal factors. The grouping emanates from the Title 14 CFR Subchapters. In some cases, a cohort title may coincide with the exact title of a subchapter, such as "Airspace" or "Airmen." In
other cases, the title of a cohort may not coincide with the title of a subchapter and may instead be an "analytic construct" resulting from a creative dialogue by the UAS safety team. In the latter case, cohort terms such as "Design" or "Documentation" may emerge. Since the development of cohort terminology will be a creative process by the UAS safety team, the cohort titles will need some vetting within the aviation community. For each causal factor cohort or grouping, the causal factors within that grouping will be identified, described, defined, and given keywords. Any interactions with other causal factor cohorts or groups will also be identified. Figure 3 is a notional diagram depicting the causal factor cohort or grouping strategy.
2.3 UAS Safety Database
The development of the database of UAS causal factors constitutes a key component of the proposed integrated approach and focuses on identifying keywords associated with individual causal factors. Since the whole research effort has a regulatory skew, the proposed analysis aims to define some initial boundary conditions for understanding the safety requirements of an emerging technology with possible hazardous impact on current aviation operations. Within the context of a UAS scenario, which could be an accident/incident or an operational application, a collection of possible causal factors can be considered the feature set defining that particular scenario. For the purposes of text/data mining, keywords will be used to map individual UAS scenarios to representative sets of causal factors. These sets of causal factors could also be employed by a regulatory agency to identify possible areas of safety concern associated with a specific application for UAS operations. The development of the proposed safety database is currently in its early stages.
2.4 Causal Probabilistic Modeling of Complex Systems
Another significant challenge in modern aviation system safety practice is the analytical modeling of emergent operations in the NAS that include the use of a new generation (NextGen) of advanced aircraft and supporting systems, such as very light jets, reusable launch vehicles, and unmanned aircraft systems, among others. Since these operations are new, accident and incident data are extremely rare, and alternative modeling approaches to conventional fault tree logic are required to understand the impact of the introduction of these operations into the NAS. Many real-world complex systems are naturally represented by hybrid models, which contain both discrete and continuous variables. UAS operations within the NAS represent such a complexity. However, current methods for quantifying complex uncertainty demonstrate topological and algorithmic limitations when addressing the interactions of these variables. Bayesian Networks (BNs) are tools to model uncertainty in the form of a probability distribution imposed by a directed acyclic graph representing the domain of interest. However, uncertainty in a typical real-world application has three dimensions: vagueness, ambiguity, and randomness [11], and BNs, being solidly anchored to probability theory, only address one of these dimensions, namely randomness. For instance,
consider that there is ambiguity regarding the observed evidence associated with some variable in a given Bayesian Network. Such ambiguity can be modeled by Fuzzy Sets. More specifically, Fuzzy Set theory, introduced by Zadeh in 1965 [12], proposes a framework to deal with poorly defined concepts in a coherent and structured way. Examples of poorly defined concepts suitable for the application of Fuzzy logic are semantic variables such as heavy workload, inadequate training, fast, slow, tall, short, etc. Within the context of our current research, Fuzzy Sets present two important features worth further exploration:
• Fuzzy Sets provide a complete set of tools to partition continuous domains into overlapping membership regions, which results in a much more realistic discretization of the continuous domain in question.
• Uncertainty regarding any empirical observation can be represented as a Fuzzy measure.
Many real-world complex systems include both discrete and continuous variables, which poses an additional challenge when representing the uncertainty associated with the system in an analytical model. Hybrid Bayesian Networks (HBNs), which include both continuous and discrete variables, are a generalization of discrete-only Bayesian Networks. HBNs are inherently more suitable for modeling complex systems, such as visual target tracking in "see and avoid" applications, where the variables defining the location and speed of the target are inherently continuous, and speech recognition, where the processed audio signals are often continuous. However, HBNs, as the generalization of discrete BNs, have their own shortcomings, which arise when we would like to perform exact inferencing on them: exact inferencing on general HBNs imposes restrictions on the network structure. The state-of-the-art exact inferencing algorithm for HBNs, the Lauritzen algorithm, requires that the network satisfy the constraint that no continuous variable have discrete children [13]. As one would expect, this restriction places quite a burden on the generalization claim of HBNs. We believe that Fuzzy Set theory offers a comprehensive structure for introducing the ambiguity dimension of uncertainty into the existing framework of Hybrid Bayesian Networks, and within this context we are currently researching the development of a complete formalism that combines Fuzzy Sets and Bayesian Networks for reasoning about complex problems such as modeling the safety risk in UAS applications. Our proposed framework builds on the Lauritzen algorithm to generate a hybrid exact inferencing algorithm for general HBNs [14]. Conceptually, regardless of the resulting topological structure of the model of the complex system in question, the crisp-continuous variables in the hybrid Bayesian network are converted into their Fuzzy-discrete counterparts. This conceptual transformation of the model domain is depicted in Figure 4. Consequently, the resulting Fuzzy-Bayesian network is propagated using the proposed probabilistic inferencing algorithm. Notwithstanding the particular emphasis on system safety analysis of UAS operations, the proposed hybrid methodology not only provides a tool set for realistic causal modeling of complex systems but also offers a calculus for quantifying complex uncertainty in real-world applications.
[Figure 4 shows an example network with nodes Speed, Thrust, Sensor, and Data Link, first as a Hybrid Bayesian Network with crisp-continuous variables and then as the corresponding Fuzzy Bayesian Network.]
Fig. 4. The crisp domain of a HBN is transferred to the Fuzzy domain of the proposed FBN
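The full fuzzy-Bayesian formalism is developed in [14] and is beyond the scope of this paper. The self-contained sketch below illustrates only the first step described above: partitioning a crisp continuous variable (speed) into overlapping fuzzy membership regions and entering the resulting membership degrees as soft (virtual) evidence in a minimal two-node discrete network. The network structure and all numbers are invented for illustration and are not the authors' model.

```python
import numpy as np

def tri(x, a, b, c):
    # Triangular membership function with support [a, c] and peak at b.
    return float(np.clip(min((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0))

def fuzzify(speed):
    # Overlapping fuzzy partition of "speed" (valid for 0 <= speed < 400).
    mu = np.array([tri(speed, -1, 0, 60),       # slow
                   tri(speed, 40, 80, 120),     # moderate
                   tri(speed, 100, 160, 400)])  # fast
    return mu / mu.sum()                        # normalized membership degrees

# Two-node discrete network: hazard state H -> discretized speed S.
# All probabilities are invented for illustration.
P_H = np.array([0.9, 0.1])                      # P(H): nominal, hazardous
P_S_given_H = np.array([[0.2, 0.6, 0.2],        # P(S | H = nominal)
                        [0.5, 0.1, 0.4]])       # P(S | H = hazardous)

def posterior_H(speed):
    # Enter the crisp observation as fuzzy soft (virtual) evidence on S,
    # instead of forcing a single hard discrete state.
    soft = fuzzify(speed)
    likelihood = P_S_given_H @ soft             # P(evidence | H)
    post = likelihood * P_H
    return post / post.sum()

print(posterior_H(55.0))    # a speed near the slow/moderate boundary
```

The design point is that a reading near a partition boundary contributes to both neighboring discrete states in proportion to its memberships, rather than being forced into one bin, which is the "more realistic discretization" the bulleted features above refer to.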
3 Conclusions
This paper outlines a conceptual framework for a regulatory-based system safety analysis of UAS integration into the NAS, with particular emphasis on human factors in defining the building blocks of the proposed hazard and causal factor taxonomy. Within this context, four research components are proposed: the HCAS taxonomy, regulatory-based causal factor identification, database development, and uncertainty modeling of hybrid complex systems. HCAS provides a system-level hazard taxonomy developed for unmanned aircraft. It presents a structured framework to identify and classify or categorize both system and sub-system hazard sources for UAS operations. However, hazard sources are not causal factors. In order to determine a taxonomy of UAS causal factors, we employed deductive reasoning and based our analysis on the current FAA regulations for commercial civil aviation. This taxonomy of causal factors will, in turn, constitute the seed for a database to facilitate safety analysis of emerging UAS operations. Finally, the concept of a hybrid fuzzy-Bayesian approach is outlined, which is being developed to handle both discrete and continuous variables when uncertainty and vagueness may co-exist in the safety risk analysis. Future research involves further development of the UAS causal factor taxonomy with the help of subject matter experts, detailed construction of the database using commercially available text mining tools, and further mathematical development of the hybrid methodology and its applications to the unmanned aircraft contextual domain.
Acknowledgement. This research is supported by Federal Aviation Administration grant number 06-G-008. The contents of this paper reflect the views of the authors, who are solely responsible for the accuracy of the facts, analyses, conclusions, and recommendations represented herein, and do not necessarily reflect the official view
or policy of the Federal Aviation Administration. The authors acknowledge the support and participation of Dr. Xiaogong Lee, Mr. Michael Allocco, Mr. Steve Swartz, Mr. Robert Anoll, and the FJ Leonelli Group, Inc. in the Rutgers research.
References
1. Roland, H.E., Moriarty, B.: System Safety Engineering and Management, 2nd edn. John Wiley & Sons, Inc., New York (1990)
2. Heinrich, H.W.: Industrial Accident Prevention. McGraw-Hill, New York (1936)
3. Rosenberg, B.: Next-Gen Nav/Comm. Aerospace Engineering and Manufacturing, 26–29 (2008)
4. RTCA: Detect, Sense, and Avoid Safety Metrics. RTCA Special Committee (SC) 203 Working Group 3, RTCA, Inc., Washington, DC, April 5 (2007)
5. RTCA: Guidance Material and Considerations for Unmanned Aircraft Systems (DO-304). RTCA Special Committee (SC) 203, RTCA, Inc., Washington, DC, March 22 (2007)
6. Öztekin, A., Luxhøj, J.T., Allocco, M.: A General Framework for Risk-Based System Safety Analysis of the Introduction of Emergent Aeronautical Operations into the National Airspace System. In: Proceedings of the 25th International System Safety Conference, Baltimore, MD, August 13–17 (2007)
7. Hayhurst, K.J., Maddalon, J.M., Miner, P.S., DeWalt, M.P., McCormick, G.F.: Unmanned Aircraft Hazards and Their Implications for Regulation. In: 25th Digital Avionics Systems Conference, vol. 12, pp. 5B1-1 – 5B1-12 (2006)
8. de Jong, H.H., Blom, H., Stroeve, S.H.: How to Identify Unimaginable Hazards? In: Proceedings of the 25th International System Safety Conference, Baltimore, MD, August 13–17 (2007)
9. Leveson, N.G.: Safeware: System Safety and Computers. Addison-Wesley Publishing Company, New York (1995)
10. Öztekin, A., Luxhøj, J.T.: Hazard, Safety Risk, and Uncertainty Modeling of the Integration of Unmanned Aircraft Systems into the National Airspace. In: 26th Congress of the International Council of the Aeronautical Sciences, Anchorage, Alaska, September 14–19 (2008)
11. Ross, T.: Fuzzy Logic with Engineering Applications. McGraw-Hill, New York (1995)
12. Zadeh, L.: Fuzzy sets. Information and Control 8, 338–353 (1965)
13. Lauritzen, S.: Propagation of probabilities, means, and variances in mixed graphical association models. Journal of the American Statistical Association 87(420), 1089–1108 (1992)
14. Öztekin, A.: A Generalized Hybrid Fuzzy-Bayesian Methodology for Modeling Complex Uncertainty. Ph.D. Proposal, Department of Industrial and Systems Engineering, Rutgers University (August 2008)
Using Acoustic Sensor Technologies to Create a More Terrain Capable Unmanned Ground Vehicle
Siddharth Odedra1, Stephen D. Prior1, Mehmet Karamanoglu1, Mehmet Ali Erbil1, and Siu-Tsen Shen2
1 Department of Product Design and Engineering, School of Engineering and Information Sciences, Middlesex University, London N14 4YZ, United Kingdom
2 Department of Multimedia Design, National Formosa University, 64 Wen-Hua Rd, Hu-Wei 63208, Taiwan
[email protected]
Abstract. Unmanned Ground Vehicles (UGVs) have to cope with the most complex range of dynamic and variable obstacles and therefore need to be highly intelligent in order to navigate in such a cluttered environment. When traversing different terrains, a vehicle (whether a UGV or a commercial manned vehicle) needs different drive styles and configuration settings in order to travel successfully over each terrain type. In manned systems these settings are usually selected by a human operator based on what they assume the ground conditions to be, but how can an autonomous UGV 'sense' changes in terrain or ground conditions? This paper investigates non-contact acoustic sensor technologies and how they can be used to detect different terrain types by listening to the interaction between the wheel and the terrain. The results can then be used to create a terrain classification list for the system, so that in future missions it can use the sensor technology to identify the terrain type it is trying to traverse, creating a more autonomous and terrain-capable vehicle. The technology would also benefit commercial driver-assistive technologies. Keywords: Unmanned Ground Vehicles, Terrain Sensing, Situational Awareness, Tyre Noise.
1 Introduction
Unmanned systems are being used for many applications in nearly every industry in which humans are either unwilling or unable to operate. They operate in many different environments: on the ground, in the air, under the sea, and even out in space. Each of these environments has a range of conditions and obstacles that make it difficult for an unmanned system to operate; for example, wind speed is a key issue for the Unmanned Aerial Vehicle (UAV), as is keeping electronic components from getting wet for the Unmanned Underwater Vehicle (UUV). However, because of the number of variables on the ground, the Unmanned Ground Vehicle
(UGV) has the hardest job in terms of navigating its environment. Ground conditions are the most difficult to operate in because they usually include dynamic and variable obstacles over a range of different terrain types, and systems usually have to operate in unknown, unstructured environments with a large number of unpredictable variables, making the seemingly simple task of traversing very hard. Most vehicles that travel on the ground, whether manned or unmanned, are generally designed to drive over a flat, structured road; however, when the road conditions change with bad weather or the vehicle is required to go off-road, the vehicle needs to be capable of coping with a larger range of conditions. To start with, the vehicle has to have a very high degree of mobility (such as 4x4 or tracked vehicles). In the case of an autonomous system that does not have the decision making of a human operator, the vehicle must also have some intelligence to help it understand the conditions of the local environment, informing an otherwise oblivious system of its situation so that it can decide what to do and where to go next in order to complete the mission without becoming stuck.
2 Vehicle Assistive Systems
Nowadays even commercial vehicles are equipped with some of these intelligent technologies to assist the driver when conditions become difficult to drive in, giving more control and decision-making responsibility to the vehicle. These technologies are added to our vehicles not only to make them more desirable and more enjoyable to drive but, most importantly, to make them safer by making them more capable. Safety systems have been highly developed over the years, and most have gone from being available as optional extras to becoming standard safety equipment; Anti-lock Braking Systems (ABS) and traction control are two that can now be found as standard on modern vehicles. With the advancement and affordability of technology, much more intelligent systems are being introduced that offer assistance to the driver in difficult driving conditions such as bad weather and/or off-road terrain. This section discusses some of the intelligent systems available on vehicles today.
2.1 BMW - iDrive
With iDrive, BMW has opened up a new development in driver orientation and vehicle control [1]. Certain top-of-the-range BMW vehicles come equipped with iDrive, which is now in its third generation. iDrive is an onboard computer that allows the user to control the vehicle's features using a control knob. Some features can be seen as luxuries, such as climate control and satellite navigation; however, most features are driver-assistive, such as the suspension settings and traction control. The most impressive advanced assistive technology on the iDrive menu is the night vision mode, which uses a front-facing infrared camera to show potential obstacles in the dark on the iDrive's LCD screen (Figure 1). This assists the driver in avoiding obstacles or seeing pedestrians that could otherwise go unseen in the dark.
Fig. 1. BMW iDrive’s night vision (BMW, 2008)
2.2 Land Rover - Terrain Response
Since its release in 2006, the Land Rover Discovery 3 has come complete with an assistive system known as Terrain Response, which allows the driver to select the type of configuration, via an in-car dial (Figure 2), according to the terrain type they are about to travel over. The dial has five settings: a general mode for everyday driving; grass, gravel and snow driving; one for travelling through mud and ruts; sand; and finally a rock-crawling mode. For each setting the system adjusts the vehicle's characteristics to best suit the terrain type. ABS, traction control, stability control, suspension, the shift schedule of the transmission, the 4WD differentials, and even the engine's throttle response are just some of the settings adjusted by the system [2].
Fig. 2. Terrain Response available on the Land Rover Discovery 3 (Land Rover, 2006)
2.3 Citroen - Snow Motion
The 2009 Citroen C5 comes equipped with an assistive system known as Snow Motion. Citroen, who collaborated with Bosch on this system, say it eliminates the
apprehension of non-informed drivers [3]. Snow Motion, unlike other anti-skid systems, does not stop the traction of the wheels but instead assists them independently in order to keep the vehicle driving through bad conditions. It works by analysing the situation using information from the vehicle's acceleration, the angle of the wheels, the slope angle, grip, and the condition of the road. If conditions are detected to be adverse, the computer first authorises alternative spinning of the driving wheels to handle the condition of the road; the system then adjusts the drive settings by measuring the conditions, evaluating the actions of the driver and the effects of the computer's instructions to the car, aiming to reconcile what the driver wants to do with the reality of the road conditions.
2.4 Nissan - All Mode 4x4-i
Nissan is another vehicle manufacturer that offers superiority when driving in adverse and off-road conditions. Their solution is the All Mode 4x4-i available on the latest X-Trail, a fully integrated system highly publicised as being able to analyse the terrain as you drive [4]. The system uses information from its network of sensors to adjust the torque distribution to the wheels, the four-wheel drive, and the ABS settings. There are four main modes - uphill start, downhill drive support, automatic torque distribution, and finally the Electronic Stability Program (ESP), which works to match the vehicle's response to the input at the steering wheel. Nissan believes that the correct adjustments to the settings mentioned previously can assist the driver and make the vehicle highly capable of driving successfully in any condition, such as snow, water, rock, and sand, whether on or off the road.
3 Terrain Sensing
The systems discussed in the previous section highlight the need for vehicles to be more intelligent and more aware of their environment. They need the intelligence to make the correct adjustments to the drive settings; however, they use information from internal sensors to infer the external conditions and lack the ability to sense the environment directly. There is therefore a need to sense the terrain and its conditions directly. A system with this ability could eliminate the need to guess the external conditions and make the system more aware of the real-time terrain conditions. Such a system could also automatically adjust the drive settings to best suit the conditions without troubling an unaware driver or, in the case of the autonomous UGV, give it the added awareness that a human driver would have. To sense the terrain directly, the actual interaction at the wheel needs to be measured, which can be done either actively or passively. To do this actively, the contact of the tyre with the terrain would need to be measured, requiring some type of contact sensor [5]; to do it passively, the system must detect a passive result of the interaction: if contact is an active result of the interaction, then noise can be identified as a passive result.
3.1 Tyre Noise
As mentioned, tyre noise is the passive result of the interaction between tyre and terrain, and it will change if either of the two elements in the interaction is changed.
Fig. 3. Graph showing the change in tyre noise levels with a change in air voids (NCAT, 2004)
If different tyres produce different sounds over the same terrain, then the same type of tyre will make different sounds over different terrains, and if that difference can be measured then the terrain type can potentially be identified. The noise radiated from the tyre surface is produced by several mechanisms, including the vibration of the tyre surface, vibrations of the tread blocks, and resonances of the air cavities in the contact patch between the tyre and the road surface [6], and these can all help to detect the terrain type. The sound caused by the deformation of the tread gives a distinctive signature that can help determine the hardness of the surface; noise from tyre vibrations helps determine the roughness of the surface; and the sound of the air passing through the tyre cavities changes when they are closed, which can help determine whether the surface is covered, possibly by water (see Figure 3). Other sounds made by the tyre-terrain interaction that could help determine the terrain type are stick-slip and stick-slap sounds, which can help to differentiate the density of the surface [7]. To sense all these different sounds and the changes in them, an advanced acoustic sensor (microphone) will need to be used. Further work will be done on testing tyre noise in order to find distinct differences in sound over different terrain conditions.
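The signal-processing details are left to the future work mentioned above. One plausible realization of the proposed terrain classification list, sketched below under assumed parameters (sample rate, band count), is to summarize each microphone frame by its log energy in a few frequency bands (roughly capturing the tread, vibration, and air-cavity components at different frequencies) and to match new frames against stored per-terrain templates by nearest centroid.

```python
import numpy as np

FS = 44_100        # microphone sample rate in Hz (assumed)
N_BANDS = 8        # number of coarse spectral bands used as features

def band_energies(frame, n_bands=N_BANDS):
    # Log energy in n_bands equal-width frequency bands of one audio frame.
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    return np.log([band.sum() + 1e-12
                   for band in np.array_split(spectrum, n_bands)])

def train_centroids(labelled_frames):
    # labelled_frames: dict mapping terrain name -> list of 1-D audio frames.
    return {terrain: np.mean([band_energies(f) for f in frames], axis=0)
            for terrain, frames in labelled_frames.items()}

def classify(frame, centroids):
    # Nearest-centroid match of a new frame against the stored terrain list.
    features = band_energies(frame)
    return min(centroids, key=lambda t: np.linalg.norm(features - centroids[t]))
```

In use, train_centroids would be run once over labelled recordings from each terrain type to build the classification list; classify then labels each incoming frame as the vehicle drives.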
4 Conclusion
Most commercial vehicles, especially those designed to drive off-road or in adverse conditions, come equipped with intelligent systems, as discussed earlier. The development and availability of these systems is increasing with the affordability of technology and the increased need for more capable vehicles. The existence of these
systems highlights a number of points. The first is the need for vehicles to be more intelligent, so that they are more aware of their environment and able to cope with changes in terrain conditions. Once the vehicle has information about its situation, it selects the correct drive configuration, highlighting the next point: different terrain types and weather conditions require different drive styles and settings, and vehicles must therefore be adjustable in order to adapt to the situation, leading to a more capable system. These intelligent systems are being offered on manned vehicles; systems developed for autonomous UGVs need to be even more intelligent, as they do not have the decision making of a human operator and have sole responsibility for selecting the correct settings to cope with the situation. This shows that an autonomous system needs more information and intelligence to understand its environment, and sensing terrain conditions will therefore give the system more situational awareness of the actual environmental conditions, making for a more knowledgeable and terrain-capable system. Terrain sensing is not limited to UGVs; as mentioned earlier, it can also assist current manned systems, providing more information to help understand what the actual terrain conditions are and making the driver more aware of the external conditions.
References
1. Quain, J.R.: For iDrive 4.0, BMW Brings Back a Few Buttons. New York Times (2008)
2. Vanderwerp, D.: What Does Terrain Response Do? (2005), http://www.caranddriver.com/features/9026/what-does-terrain-response-do.html
3. Citroen: Technology - Snow Motion (2009), http://www.citroen.co.uk/technology/safety/snow-motion/
4. Nissan: New X-Trail ALL MODE - Nissan's new 4x4 technology (2007), http://www.theallnewxtrail.info/mumfords/allmode4x4/
5. Odedra, S., Prior, S.D., Karamanoglu, M.: Improving the Mobility Performance of Autonomous Unmanned Ground Vehicles by Adding the Ability to 'Sense/Feel' their Local Environment. In: Human Computer Interaction International 2007, Beijing International Convention Center (2007)
6. University of Cambridge: Tyre/road interaction noise - Road surfaces available for experimentation. Department of Engineering (2006)
7. Hanson, D.I., James, R.S., NeSmith, C.: Tire/Pavement Noise Study. National Center for Asphalt Technology, Auburn University, Alabama (2004)
Critical Interaction Analysis in the Flight Deck

Chiara Santamaria Maurizio1, Patrizia Marti2, and Simone Pozzi1

1 DeepBlue s.r.l., Rome, Italy
2 Communication Science Department, University of Siena, Italy
[email protected], [email protected], [email protected]
Abstract. The paper describes experimental work conducted within the HILAS (Human Integration into the Lifecycle of Aviation Systems, http://www.hilas.info/mambo/) project. The objective of HILAS is to develop a model of good practices for Human Factors (HF) integration throughout the life-cycle of aviation systems. The project developed a toolkit of HF tools for the evaluation of new technologies for the flight deck. CRIA (Critical Interaction Analysis) is one of the HF tools included in the HILAS toolkit. This paper reports the results of a real-time simulation, held at NLR in the GRACE simulator, where CRIA was applied to assess the HF issues implied in the replacement of the current radio panel in the flight deck with an Interseat Touch Screen (ITS), implemented by GE Aviation.

Keywords: Human Factors, flight deck, real time simulation, aviation, system evaluation, systemic approach.
1 Introduction

HILAS (June 2005 – June 2009) is an IP-FP6 project co-funded by the EC and carried out by a consortium of 40 partners from the aeronautic industry, universities, airlines, research institutes and SMEs. This paper describes the evaluation process and the results of the second simulation experiment, where CRIA was applied to evaluate the HF aspects implied in the replacement of the current radio panel in the flight deck with the interseat haptic touch screen (ITS). CRIA provided insights into: (1) how the interseat haptic touch screen is accepted and appreciated by pilots; (2) what the impact of the new technology on pilots' workload and situational awareness is; (3) what the safety, usability and domain suitability issues are. The aim of the approach is to address a wide unit of analysis where knowledge is distributed among humans, procedures and tools and continually evolves through use.
2 CRIA

CRIA is an HF approach applied to the analysis and evaluation of complex systems such as avionics systems and air traffic control systems. By using CRIA the Human Factors Expert (HFE) is able to carry out an in-depth analysis of the critical
interactions that take place among human operators and other components (e.g. technologies/applications, procedures and environment) in a given work setting as a consequence of the introduction of a change in a safety-critical system. The CRIA approach enables the HFE to identify and assess the impact produced by a system change on human activity, taking a closer look at the implications at system level. CRIA is neither a new methodology nor a new theory. It is a toolkit of methods, strategies and tools developed by a pool of experts who collected and systematized the best practices and methodological tools to be applied during the entire evaluation process. CRIA is inspired by the SHEL model [2] and the SHELL model [3], widely known and accepted in the ATM domain, and it is based on the assumption that each work process requires a specific combination of resources provided by different and complementary components. These resources include: rules, procedures and informal and formal practices of work (Software – S; the term "Software" in the context of CRIA should not be confused with the term "software" relating to computer applications), physical artefacts (Hardware – H), communication flows and interrelations among workers (Liveware – L), and social, political and economic variables (Environment – E). The combination of these resources continually changes during the system life cycle, and any productive process is always defined by a specific combination of Hardware, Software and Liveware resources. There are no processes that can be carried out by one of these components in isolation. With Hawkins's model, CRIA shares the focus on the pivotal role of the human component and the importance of exploring all the system interactions starting from the human component [3]. CRIA is also rooted in Activity Theory [5], which implies an understanding of human activity as mediated by external artifacts (both material and immaterial). Activity Theory holds that when individuals engage and interact with their environment, they produce tools. These tools are "exteriorized" forms of mental processes, and as these mental processes are manifested in tools, they become more readily accessible and communicable to other people, thereafter becoming useful for social interaction. Furthermore, CRIA encourages user-centred design, which puts the human being at the centre of the design activity.

CRIA Evaluation Process and CRIA Toolkit. The CRIA evaluation process is articulated in five main phases:
• Hypothesis Definition
• Scenario Design
• Simulation Run
• Data Analysis
• Reporting
In the evaluation process, CRIA carries its own Toolkit, which assists the HFE in executing the different phases of the process described above. The CRIA Toolkit is a set of human factors tools built up in accordance with the CRIA analysis structure and tailored to fit the evaluation hypotheses. The CRIA Toolkit is made up of:
• A repository
It is a repository of examples from previous projects (simulation scenarios, questionnaires, evaluation hypotheses). It is a useful tool to document and share best practices and to get inspiration for future studies.
• Hypotheses Identification Grid
The Hypotheses Identification Grid aims to assist the HFE in defining the evaluation hypotheses through a stepwise process. Key features of the new technology are analysed with respect to the L – S – H components: e.g. which procedures, practices or norms are affected by the use of the radio frequency textual identifier? Which actors use that feature? Which tools/interfaces utilise that feature? As a result of this process the experimental hypotheses are defined: e.g. "The ITS might impact the pilot's performance when receiving the clearance to contact two different frequencies at the same time (S – H interaction)."
• Scenario Grid
The Scenario Grid supports the HFE in defining the scenarios of the simulation and in identifying potentially critical events that can occur during the simulation. Each scenario is made up of different elements and is related to a specific evaluation hypothesis.
• HFE Observation Grid
The HFE Observation Grid supports the HFE in observing, during the simulation, any critical interaction between the system components liveware, hardware and software.
• SME Observation Grid
The SME Observation Grid supports the Subject Matter Expert during the simulation in observing the activity, thus allowing the HFE to gather comments from an operational perspective.
• PEQ Grid
The PEQ is a Post Experiment Questionnaire to gather quick and synthetic comments from the large pool of people involved in the simulation. The results of the questionnaires are investigated and discussed during the debriefing session and further analysed after the experiment.
• Debriefing Grid
The Debriefing Grid is a tool supporting the HFE in conducting the debriefing after the simulation. On the basis of evidence collected during the simulation, the Debriefing Grid supports the HFE in conducting the discussion on the critical interactions among the system components. CRIA Debriefings take place right after the experiment in order to get the actors' on-the-spot impressions. Unclear situations and/or behaviour observed during the experiment, and unclear answers to the questionnaires which are subject to interpretation, are further investigated and clarified during the Debriefings. Issues related to a single system interaction that were previously analysed in isolation are investigated during the Debriefings by analysing the impact that an interaction produces on the other system components.
The starting point of the CRIA evaluation process is the Hypothesis Definition, that is, the understanding of the problem and the quality of the proposed solutions in order to identify hypotheses to test during the simulation. In this phase interview sessions are carried out with the technology provider and the experimental leader in
order to get a deep understanding of the technology and its functionalities, to analyse the initial objectives of the technology provider in developing the technology, the expected benefits, and any constraint that may affect the evaluation. This initial exploration is followed by activity analysis and envisioning sessions. During the envisioning sessions the operators try to imagine future scenarios of use of the new technology. Potentially critical events are also envisioned in this phase in order to feed the next phase of the process, the scenario design phase. In order to assist the HFE in developing scenarios for the simulation, the Hypotheses Identification Grid is adopted.
The Scenario Design Phase represents an important phase of the CRIA validation process. The scenarios result from the collection and elaboration of rich information coming from different sources (activity observations, manuals, interviews and the above-mentioned envisioning sessions) and different actors. They are used to:
• propose realistic situations of what could happen in a real work setting (e.g. realistic procedures, realistic conditions) using the new technology [1];
• cover the main safety issues that emerged during the analysis phases;
• investigate potentially critical interactions among the system components (liveware, software and hardware).
For example, in the HILAS real-time simulation, different scenarios providing a representation of the use of the radio and of the ITS in different flight phases were developed. Contingencies and unexpected events were integrated in the scenarios to assess the emergence of critical interactions among system components and potentially hazardous situations in the use of the ITS. The collection of such scenarios represented a set of safety-related breakdowns of the current activity. For each scenario we identified critical interactions between the pilots and the other system components (L, S, H) when using the ITS. In order to assist the HFE in developing scenarios for the simulation, the Scenario Grid is adopted.
The Simulation Run Phase. During the experiment different kinds of data are gathered and observation sessions are carried out by the HFE and by the Subject Matter Expert (SME). The role of the SME is to support the HFE's analysis in assessing the new technology from the operational perspective. Two observation grids are available from the CRIA toolkit in this phase, one for the HFE and the other for the SME, to guide their observations towards critical interactions occurring during the simulation. Right after the exercise, the pilots are requested to fill in Post Exercise Questionnaires (PEQ). The PEQ Grid assists the HFE in building up the questionnaires. A Debriefing session follows. The Debriefing Grid assists the HFE in structuring the Debriefing to meet the objectives of the evaluation process.
The Data Analysis Phase. A preliminary data analysis is carried out right after the simulation experiment on the basis of annotations contained in the observation grids, the questionnaires and other available data (e.g. data logs or video clips). This preliminary analysis helps in conducting the interviews and allows the HFE to clarify and investigate, together with the pilots, aspects that arose during the exercise.
A deeper analysis is carried out later by taking a closer look at the data gathered during the experiment, including those gathered during the Debriefing session. The critical interactions that arose during the simulation experiment are analysed. Hypotheses stated at the beginning of the evaluation process are then assessed together with the analysis of potentially critical interactions among system components (L – S – H).
The Data Reporting Phase. The results of the evaluation process are reported and redesign recommendations are provided.
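To make the structure of the Hypotheses Identification Grid concrete, the following is a minimal sketch, under our own assumptions, of how one grid entry could be represented in code. The class and field names are hypothetical, not part of CRIA; the example content echoes the radio frequency textual identifier discussed in the case study below.

```python
# A minimal sketch, under our own assumptions, of a Hypotheses Identification
# Grid entry: a key feature of the new technology mapped to the Liveware,
# Software and Hardware elements it touches, with candidate hypotheses
# attached. All class and field names here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class GridEntry:
    feature: str                                          # feature under study
    liveware: list[str] = field(default_factory=list)     # actors involved
    software: list[str] = field(default_factory=list)     # procedures/practices
    hardware: list[str] = field(default_factory=list)     # tools/interfaces
    hypotheses: list[str] = field(default_factory=list)   # hypotheses to test

entry = GridEntry(
    feature="Radio frequency textual identifier",
    liveware=["Pilot Flying", "Pilot Not Flying", "ATCo Tower"],
    software=["Frequency Input Procedure", "Frequency Change Procedure"],
    hardware=["ITS stand-by frequency box", "ITS active frequency box"],
    hypotheses=["H1: the identifier reduces time to detect digit errors"],
)
print(entry.feature, "->", entry.hypotheses[0])
```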
3 The HILAS Case Study

In the following paragraphs we report a case study carried out within the HILAS project to assess the HF issues implied in the replacement of the current radio panel in the flight deck with an Interseat Haptic Touch Screen (ITS). The ITS is a touch screen display made up of two main parts, one for the Captain and the other for the First Officer. The ITS is located in the pedestal of the GRACE simulator flight deck, between the Captain's and First Officer's seats, as shown in Fig. 1.
Fig. 1. ITS hardware and location in the flight deck (picture courtesy of NLR, taken in NLR's Generic Research And Cockpit Environment – GRACE)
During the Hypotheses Definition phase, the ITS was analysed to understand the potential problems and the proposed solutions of the ITS technology. The following issues arose:
• It is not possible for the Pilot Flying (PF) and Pilot Not Flying (PNF) to interact with the radio at the same time. This issue was considered critical.
• The ITS is slow.
• The confirm button was identified as an extra step.
• The key 1 of the keypad was identified as an extra step, since all the com frequencies start with 1.
• The haptic feedback was considered difficult to perceive during a flight.
• The textual identifier of the frequency displayed below the frequency number was highlighted as relevant and as additional useful information.
The Envisioning Session with an expert pilot revealed that:
• The use of the radio can be critical during short flights with lots of frequency changes. This information provides requirements to select the scenario.
• In comparison to the radio currently used, in the ITS the information is split into different tabs. This aspect was highlighted as critical because information needed at the same moment could fall under different tabs.
• The haptic touch screen functionality was identified as potentially critical in a cockpit environment (turbulence). A knob is easier to manage in those situations than a push button.
• The display is located in an uncomfortable position. It could be a problem in case of low visibility.
• A textual identifier of the frequency (textual name in Fig. 2) is displayed below the frequency number. This information was immediately highlighted as relevant and as additional useful information.
One of the findings that arose at the beginning of the evaluation process, the additional information provided by the textual identifier of the frequency, is reported as an example to describe the entire CRIA process. The VHF Tab of the ITS is made up of three VHF boxes for the Captain and three VHF boxes for the First Officer. Each VHF box is composed of the elements described in Fig. 2: transmit control, radio name, listen control, active frequency with textual name, standby frequency with textual name, a swap control for the standby and active frequencies, volume increase/decrease controls with a volume level indicator, and controls to increase or decrease the frequency by 25 kHz.

Fig. 2. ITS interface reporting the additional information, the frequency textual identifier (picture courtesy of GE Aviation, the ITS technology provider)
The Hypotheses Identification Grid was used to identify the hypotheses linked to this additional information. For each of the system components (L – S - H) the elements related to this additional information are listed. For the selected topic the following relations with the other components of the system were identified:
• Software: Frequency Input Procedure and Frequency Change Procedure
• Hardware: ITS's stand-by frequency box and ITS's active frequency box
• Liveware:
− Primary Actors: Pilot Flying (PF) and Pilot Not Flying (PNF)
− Secondary Actors: ATCo Tower and ATCo Departure
According to this analysis the following event was inserted in the scenario: the accomplice ATCo provides the wrong frequency by mistake, without realizing the error even after the pilots' read-back. The frequency provided by the ATCo is not in the ITS database, so the text identifier is not displayed. When the pilot inserts the frequency in the stand-by frequency box, an empty box is visualised below the stand-by frequency, and after the swap an empty box is visualised below the active frequency. When the crew tries to contact the frequency there is no reply. The hypotheses to be verified during the course of the experiment, resulting from the Hypotheses Identification Grid, were the following:
H1: The textual identifier of the frequency might reduce the time to detect pilots' digit errors and the ATCo's errors, and might increase pilots' situational awareness.
Fig. 3. The possible pilot behaviour identified before the analysis. Step 1: the ATCo in contact provides the pilots with the wrong next frequency by mistake. Step 2: the pilots insert the wrong frequency in the stand-by frequency box; if they notice the empty box visualized below the stand-by frequency box, the error is detected here. Otherwise, Step 3: they swap from the stand-by frequency to the active frequency; if they notice the empty box visualized below the active frequency box, the error is detected here. Otherwise, Step 4: they contact the frequency and do not receive any reply, and Step 5: they swap back to the previous frequency and ask for confirmation. The hypothesis is that the additional information reported below the frequency, if used by the pilots, can help them to skip three steps (Steps 3, 4 and 5) if the error is detected at the early stages, or at least two steps (Steps 4 and 5) if detected later.
H2: The ITS provides additional useful information to double-check the frequency (e.g. when pilots enter a frequency in advance, anticipating the insertion of a frequency that can be easily guessed).
During the Simulation Run Phase, the following pilot behaviour was observed when the event occurred:
The ATCo on 125.75 gives the clearance to contact the company frequency on 131.65 and immediately after to contact Brussels Radar on frequency 135.885. Priority is given to the ATC. The PNF does the read-back of the frequency and inserts the frequency number in the ITS. The PNF realises that there is no textual identifier reported below the frequency and inserts a new frequency, 135.850. The PNF also realises that the new frequency does not have a textual identifier reported below it, so he tries to contact the frequency but does not get any answer. He then goes back to the previous frequency and asks for confirmation. During the Data Analysis Phase the debriefing was carried out investigating, together with the pilots, their behaviour during the exercise and questioning how this additional information could change the current interaction between pilots and air traffic controllers (L – L).
4 Results and Discussion

The scenario showed that the textual identifier of the frequency is useful, not only in detecting ATCo errors in providing the frequency and pilots' digit errors, but also in rapidly recovering pilots' errors and in double-checking the frequency. During the Debriefing, it was discussed how the textual identifier of the frequency could change the current interaction between pilots and air traffic controllers (L – L). Different airlines apply different practices (L – S); basically, two different practices were identified. Most of the crews that took part in the simulation assumed what the ATCo said to be correct. The pilots highlighted: "More effort is needed to verify the information rather than to try to contact the frequency. In case there is no reply, then it is easier to go back to the previous frequency". So for these crews the interaction with the ATCo is basically one-way (Controller --> Pilot); these crews rarely ask for confirmation. These crews found the textual identifier very useful because it provides information on their position, thus increasing their situational awareness. Other crews found the textual identifier useful both in supporting them in detecting errors and as a confirmation of what they heard. It can help in supporting the pilots with ground-air-ground communication misunderstandings. The pilots highlighted: "There are places where you are not always sure that you have caught the right frequency. Indeed sometimes the controller doesn't speak good English and sometimes it is quite difficult to understand what they are saying. It is helpful if you have a backup that gives you the name of the frequency". It is also useful to increase the pilots' awareness of their position and of who they are speaking to. For these crews, the interaction with the ATC is bidirectional (Controller --> Pilot and Pilot --> Controller): it is possible for them to ask for confirmation if they have doubts about the frequency provided by the ATCo. This additional information is considered interesting and useful, even if it is likely to increase Pilot–Controller communications. During the CRIA Debriefing, pilots also highlighted that sometimes, through the radio communication, they know more or less which airplane is in front of them. If they hear that the airplane in front is cleared to contact, for example, Brussels on 135.5, they know that their next frequency will be 135.5. They start to
pre-select this frequency in the stand-by box. When the ATCo calls the pilots and says to switch to 135.5, they have the frequency already there. In these situations the information reported below the frequency can help in double-checking it in advance.
Redesign Recommendations and Suggestions for Improvement. CRIA also provided several recommendations and suggestions for improvement, which result from pilots' comments gathered during the debriefing sessions and from the interpretation of the results. The following aspects were highlighted by the pilots:
• The empty box can be confusing. Pilots highlighted that the name of the sector is useful to detect an error, but when they see an empty box they get disoriented, so they suggested using another representation when the frequency is wrong. The pilots stated: "I would suggest not to leave the text identifier box empty when the frequency inserted is wrong, but to use another representation. It needs to be something that says it is wrong. If you leave it blank, it can mean that maybe yes or maybe no".
• The position of the screen does not allow use of this information because it is too far back. Some pilots did not notice the blank box, and during the Debriefing they said that the cause was that the panel was far away. They suggested having a repeater in front of their view that, for example, says 118.9 Amsterdam Approach. In that case the additional information would be really useful and usable. Pilots also said that they did not notice the blank box because the technology was new and they were not trained enough.
5 Conclusions

The paper presented the use of CRIA, an HF approach to the evaluation of complex systems such as avionics systems and air traffic control systems. Through the description and discussion of the case study, we argue that an in-depth analysis of the critical interactions among the liveware, software and hardware components is fundamental to assess and anticipate problems that may arise from the integration of a new technology in the operational context. The CRIA approach relies significantly on the use of a toolkit that supports the HFE in conducting a systematic analysis of the critical interactions that may occur among the system components. Among other tools, the CRIA approach makes use of structured scenarios, which proved to be an appropriate means to recreate realistic situations where the system components are subject to the full variability of input data and situations that may occur in the real world. In our approach, the structuring of scenarios implies an articulated process where the potential interactions between the different system components are represented to provide a meaningful context for the activity during the simulation. Indeed, the complexity of any socio-technical system is not a simple function of the number of interacting elements that compose the whole system. Rather, the complexity depends on the nature of the interactions between those elements and the degree of knowledge held by the subjects involved in these interactions. Structured scenarios present situations in
which the systemic components may serve different functions, so that each component interacts with the other components in different modes, leading to unplanned states or reactions. This can highlight complex interactions among components that are either not visible or not immediately comprehensible to the operators, leading to hazardous situations. The case study presented in the paper provides a clear example of this: the two practices emerging from the use of the textual identifier may result in an increase in the communication exchanges between the pilot and the air traffic controller. While on the one hand this can be beneficial for improving the pilot's situation awareness, on the other hand it can increase the pilot's workload due to the time spent communicating and coordinating with the air traffic controller. Such potential problems are more likely to emerge when applying a systematic approach like the one offered by CRIA.
Acknowledgments. The collaboration with the pilots that participated in the experiment, with the experimental leader (NLR) and with the other partners of the HILAS project was very important to apply CRIA exhaustively throughout the entire evaluation process.
References

1. Carroll, J.M.: Scenario-based Design. Wiley & Sons, New York (1995)
2. Edwards, E.: Man and machine: Systems for safety. In: Proceedings of British Airline Pilots Associations Technical Symposium, pp. 21–36. British Airline Pilots Associations, London (1972)
3. Hawkins, F.H.: Human Factors in Flight, 2nd edn. Ashgate, Aldershot (1987)
4. Marti, P., Scrivani, P.: The representation of context in the simulation of complex systems. Cognitive Technologies 8(1), 32–42 (2003)
5. Nardi, B. (ed.): Context and Consciousness: Activity Theory and Human-Computer Interaction. MIT Press, Cambridge (1996)
Understanding the Impact of Rail Automation

Sarah Sharples, Nora Balfe, David Golightly, and Laura Millen

Human Factors Research Group, Faculty of Engineering, University of Nottingham, University Park, Nottingham, NG7 2RD
[email protected], [email protected], {David.Golightly,laura.millen}@nottingham.ac.uk
Abstract. Over the past ten years a number of studies have been conducted to understand the way in which rail signalling operations are completed. This paper reviews some of the themes that have emerged from this body of work and considers two principal questions: firstly, how should we apply human factors methods to develop our understanding of the impact of automation, and secondly, how can we link the data we have collected from the rail work to theoretical concepts that will help us to design future automation systems?

Keywords: automation, rail, workload, observation, simulation, performance.
1 Introduction

Over the past ten years a number of studies have been conducted by researchers in the Centre for Rail Human Factors (CRHF) in the Human Factors Research Group (HFRG) at the University of Nottingham to attempt to understand the way in which rail signalling operations are completed. Through this work we aim to enable current and future system design to be informed by human factors knowledge. This paper reviews some of the themes that have emerged from this body of work and considers two principal questions – firstly, how should we apply human factors methods to develop our understanding of the impact of automation and secondly, how can we link the data we have collected from the rail work to theoretical concepts that will help us to design future automation systems? This paper considers a set of methods that have been applied to examine rail signalling automation and identifies the situations in which they are particularly valuable in yielding insight into the way automation affects the work of a signaller. It then presents a set of challenges that have been encountered during our research to understand the impact of automation in rail and considers how these will affect the way in which we can inform the design of future rail automation systems.
2 Automation in Rail

The majority of railway signallers currently use one of three types of technology: manual lever frames, NX panel-based or VDU-based systems. An NX ("entry-exit") panel is a "hard-wired" control system, with a physical mimic of the track
layout, signals and trains, indicated by LED displays in an overview panel and operated by switch presses. VDU-based systems represent the tracks and signals on a suite of computer monitors, and the signaller sets routes by using a tracker ball and keyboard. Automatic Route Setting (ARS) technology can also be incorporated within the VDU-based system; this automatically sets routes for passenger trains, although the signaller can still easily intervene to set routes manually. Our research has primarily focused on two types of signalling systems: NX panels and VDU-based systems that generally include Automatic Route Setting technology. It is anticipated that the number of VDU-based systems will increase in the future, and it is likely that the implementation of automatic route setting systems will also increase. In addition to the explicit automation technology of ARS, rail signalling is underpinned by interlocking systems that support safe operation by, for example, preventing two trains from being set on a collision path.
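As a rough illustration of the interlocking principle just mentioned, the sketch below checks a requested route against already-locked routes and refuses it if any track section is shared. The track model and function names are our own illustrative assumptions; real interlockings are safety-certified systems, not application code like this.

```python
# An illustrative sketch of an interlocking-style check: a requested route is
# granted only if it shares no track section with a route already locked for
# another movement. Section names and functions are hypothetical.
def routes_conflict(route_a: set[str], route_b: set[str]) -> bool:
    """Two routes conflict if they share any track section."""
    return not route_a.isdisjoint(route_b)

def try_set_route(requested: set[str], locked: list[set[str]]) -> bool:
    """Grant and lock a route only if it conflicts with no locked route."""
    if any(routes_conflict(requested, r) for r in locked):
        return False               # refused: would create a collision path
    locked.append(requested)       # lock the sections for this movement
    return True

locked_routes: list[set[str]] = []
print(try_set_route({"S1", "S2", "S3"}, locked_routes))  # True: route locked
print(try_set_route({"S3", "S4"}, locked_routes))        # False: S3 in use
```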
3 Human Factors Methods to Understand Automation

We have applied a number of different types of methods to help in our understanding of automation – but what do these methods tell us, and how do they help us to advise on the design of future automation systems? The following section of this paper describes elements of the research approach that have evolved and considers a number of different methods that we have applied.

3.1 Observation

Direct observation of signallers in situ is essential and is a usual first step to enable a researcher to develop a good understanding of the way in which signallers complete their tasks and use automation. It has also been used in previous research to analyse the way in which transport control tasks are executed (e.g. [1]). Observation has proved to be very valuable in identifying the active elements of controlling the trains (e.g. setting routes, making telephone calls) but does not capture some of the less observable cognitive elements of the task, such as an operator deciding which train to route first, or predicting the route that the automatic route setting system will set a train upon. For a researcher to fully understand the actions being completed, we have found that they either need to undergo some form of signalling training themselves, or question either the signaller or an accompanying signalling manager. In addition, qualitative observations can yield a large amount of detailed information which, whilst rich and valuable for analytical insight into specific issues (e.g. expertise [2]), is difficult to use to obtain generalisable findings and to compare data from different sites or sources. Therefore Balfe et al. [3] have developed a structured framework for observation that enables the different activities of operators of different types of automation to be compared. However, in order to limit observer bias, interruptions must be kept to a minimum, and this prohibits the collection of qualitative data from the signaller under observation. This limitation could potentially be countered by the use of retrospective video analysis with the signaller, but our experience has shown us that whilst video analysis is useful for recording interactions, it is easier to apply to the physical
interactions with the NX panels than to the interactions via tracker ball on a VDU-type panel, primarily due to the difficulty in picking up the fine movements across multiple screens made by VDU signallers. A final consideration is that live field observations are at the mercy of events outside the control of the researchers. While this increases the overall knowledge and depth of data gained from the method, it means that the results can be more difficult to compare and generalize.

3.2 Simulation

Increasingly, high-fidelity simulators are being developed to support signaller training, and these can be used as an additional resource for human factors researchers. The advantages of the use of simulators include the ability to control simulated scenarios and gather data on these from multiple signallers. This standardization of a scenario is almost impossible in the live environment due to the highly complex and dynamic nature of rail signalling, which means that two identical situations are highly unlikely to occur. Use of a simulated environment for research also removes safety and performance concerns which may arise in the live environment due to the potential distraction of the signaller. However, there are a number of issues surrounding the use of simulation with regard to realism. It can prove difficult to match an appropriate scenario within the simulation to the level of complexity and workload that the signaller is exposed to in live situations. In addition to this, signallers often feel that their actions and procedures are under scrutiny by the researcher due to the safety-critical nature of the tasks and the rules and regulations of the domain. The signaller also has a degree of face-to-face interaction with other signallers on adjacent panels/workstations. This interaction is difficult to mimic within a simulated scenario and, although more pronounced on NX panels due to the overlapping of control areas, it is also important in VDU-based boxes. This therefore removes an important source of additional information, or of distractions that might influence workload. Finally, whereas VDU-based simulators are essentially an arrangement of PCs and monitors which can be configured to mimic any VDU-based workstation, it is too costly to develop a bespoke NX simulator for each real NX panel. Hence, whereas VDU simulation is geospecific, most NX simulation is currently geotypical, requiring the signaller to learn a new, generic track layout and traffic pattern. This further reduces the realism of this type of simulation.

3.3 Interviews

Interviews with operational staff have proven to be a very effective method of gaining insight into signaller attitudes and strategies. However, the signalling domain is complex and requires a certain amount of expert knowledge on the part of the interviewer to correctly target the interview and probe for additional information. One method which has proven effective is to hold the interview at the signaller's workstation, enabling them to use the workstation itself to illustrate points to the interviewer. The disadvantage of interviews is the subjective nature of the data gathered, although the risks from this can be minimized by increasing the sample size and by expert analysis and interpretation of the data gathered. However, such interviews often provide a vast amount of data for analysis, which can be very
time-consuming and difficult to interpret and present. The other difficulty, common to all skilled tasks, is that much activity is highly proceduralised and therefore implicit, even to the signallers themselves.

3.4 Quantitative Data Collection

Finally, it may be possible to collect quantifiable data either as an alternative to, or to accompany, qualitative methods. By being able to quantify performance or other measures of task characteristics, we are able to empirically and objectively compare across different types of system and conditions, including those influenced by automation. First, direct measurements of signaller activity can be used to generate quantitative activity data. Second, most simulators now automatically generate performance statistics, such as the number of trains held at red lights or experiencing delays. Finally, it is possible to capture data relating to task characteristics such as workload and situation awareness. These measures may come from outside the rail industry, but more commonly we find it useful to develop bespoke methods for rail signalling. Currently, we use the Integrated Workload Scale (IWS) [4], and are working towards similar measures for SA. Typically, such methods involve a level of intrusion and are primarily used in simulation. It is, however, sometimes possible to capture data in parallel with live operations as long as the measure does not influence, or impair, the signaller's ability to do their work.

3.5 Summary of Methods

Table 1 shows a summary of the methods and their advantages and disadvantages.

Table 1. Summary of Methods
Direct Observation
Advantages: unobtrusive and "real life"; structured data for comparison; comprehensive description of signaller activities
Disadvantages: requires expertise from observer and/or accompanying subject matter expert; limited insight into reasons behind observed behaviours

Simulation
Advantages: ability to control and manipulate variables; standardisation
Disadvantages: reduced realism; not all elements of task simulated

Interviews
Advantages: rich data on signaller strategies
Disadvantages: subjective (both data and analysis); time-consuming analysis; implicit knowledge

Measurement
Advantages: quantifiable data, for comparison and empirical testing; objective
Disadvantages: measures often need to be bespoke for rail; often intrusive, can only be used in simulation
4 How Do We Link These Data to Theoretical Concepts?

Once we have applied the set of methods to further our understanding of automation, we need to consider how to draw inferences that will allow us to develop our understanding of the underlying theoretical concepts related to automation use. This will enable us to contribute to the development of theory and also provide consistent guidance that can inform the design of future automation technologies. However, we
face a number of challenges, observed in the context of rail but also relevant to other contexts, in making this transition.

4.1 Challenge 1: Different Signallers Use Different Strategies to Complete Tasks
One of our aims is to draw generalisable inferences that will allow us to predict the way in which automation should be designed in future systems. However, we have identified that there is a large level of variation in the way in which signallers complete their tasks. Data collected by Balfe et al (2008)[3] illustrates this clearly. Figure 1 shows data collected from three observations of separate signallers at each of six different VDU based signalling workstations with automatic route setting. All data was collected at the same time of day (and so should have had a roughly equivalent level of train movements and required interactions). All of the operations were completed efficiently, and to the required safety and performance standards. However, it can be seen that individual signallers varied considerably in the proportion of time they spent on different types of activities – for example, the amount of observable behaviour classified as monitoring varied from over 40% for the observation at workstation IL1, to less than 5% at the third observation of the same workstation (IL3). This variation is indicative of a number of challenges. It is difficult to control the events when collecting data in the real world, and small events may dramatically change the way in which an operator works. In addition, individual signallers differ in the way they use signalling systems – some preferring to actively anticipate the ARS intervention and set routes manually, whereas others prefer to observe ARS and understand the way in which it is working, to enable them to intervene if required.
Fig. 1. Observation data collected from six different signalling workstations with automatic route setting (workstations AS, NK, IL, SF, LE and YS, three observations each; x-axis: percentage of time spent on Monitoring, Intervention, Planning, Communications and Quiet Time)
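Structured observation data like those behind Fig. 1 are straightforward to aggregate once the logs are coded. The following is a minimal sketch, under our own assumptions about the log format (only the activity category names follow the figure legend; the durations are invented for illustration), of how percentage-of-time profiles per workstation could be computed:

```python
# A small sketch of reducing structured observation logs to percentage-of-time
# profiles per workstation. The log format and durations are invented; the
# category names follow the legend of Fig. 1.
from collections import defaultdict

log = {  # workstation -> list of (activity category, duration in seconds)
    "IL1": [("Monitoring", 820), ("Intervention", 310), ("Planning", 240),
            ("Communications", 330), ("Quiet Time", 100)],
    "IL3": [("Monitoring", 80), ("Intervention", 700), ("Planning", 400),
            ("Communications", 500), ("Quiet Time", 120)],
}

for workstation, events in log.items():
    totals: dict[str, float] = defaultdict(float)
    for category, seconds in events:
        totals[category] += seconds
    observed = sum(totals.values())
    shares = {c: 100 * s / observed for c, s in totals.items()}
    print(workstation, {c: f"{p:.0f}%" for c, p in shares.items()})
```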
4.2 Challenge 2: The Impact of Automation Is Different for Different Workload Levels

Figure 2 shows the hypothetical relationship between perceived workload and level of demand for different levels of automation. It illustrates a phenomenon that we have observed: that automation has a particular impact when workload is high. This presents several challenges. Firstly, it is hard to predict when workload will be high when collecting real-world data, and when workload is high, signallers are reluctant to be questioned on their perceived workload levels, or even to be observed. Therefore, data from a high-workload scenario are most easily collected in a simulator environment, thus sacrificing some elements of realism. Secondly, the interaction between level of demand and impact of automation on perceived workload means that it may be appropriate to recommend different types of automation for different levels of demand. Finally, one of the ironies of automation [5] – that a designer who tries to eliminate the operator still leaves the operator to do the tasks which the designer cannot think how to automate – is likely to come to the fore in this situation, as in high-demand situations it is likely that something unusual or difficult to predict has happened.
Fig. 2. Hypothetical relationship between perceived workload and level of demand placed for different levels of automation

Fig. 3. Perceived workload, rated on the Integrated Workload Scale (average IWS score), for a 30-minute simulated scenario comparing manual operation and Automatic Route Setting across three levels of scenario complexity (low complexity; medium complexity, normal running; medium complexity, disrupted running)
Figure 3 shows a data set combined from those collected by Balfe et al. [6] and Golightly and Millen [7] in a simulated environment comparing three levels of automation within a VDU signalling workstation, with workload measured using the IWS. The three datasets were illustrative of a low-complexity environment and two medium-complexity environments, one of which was manipulated to include a disruption to rail traffic and thus increase demand on the signaller. The graph illustrates a general trend of increasing workload for the manual conditions, but for the condition including automation the workload only appears to increase once the disruption is introduced. This preliminary data appears to support the hypothetical relationship proposed in Figure 2.

4.3 Challenge 3: Observed Activity Does Not Always Relate to Perceived Workload

The final challenge, which is particularly relevant to our approach of collecting data in the field as much as possible, is that it is easier to observe active interaction with physical systems than the cognitive effort that may be needed to remain completely "in the loop" with a VDU-based system with automation. There are two factors to consider here: firstly, it is easier to observe the physical action of pressing buttons on a control panel than to monitor use of a tracker ball on a VDU; and secondly, it is hard to draw inferences about cognitive work from observable behaviour. Figure 4 illustrates the anticipated relationship between observable behaviour and perceived workload for different automation levels. This graph illustrates the suggestion that there may be the same level of workload for the different automation types, but the nature of that workload and of the cognitive work being completed by the individuals may differ depending on the automation type. For example, in a high-automation scenario, a signaller may have to work harder to maintain situation awareness, as they not only need to keep track of the movements of trains on the screen but must also maintain an understanding of the way in which the automation is controlling those movements. In a low-automation scenario, the signaller may have more required interactions with the systems (and thus a higher level of observable activity) but not the same level of complexity in working to understand the way in which the system is making decisions to automatically control train movements. Figures 5 and 6 illustrate data collected from field observations of an NX (low automation) and a VDU (higher automation) signalling system [8]. Each bar on the graph indicates the subjective rating of workload (given on an adapted version of the AFFTC [9] workload scale) for a period of fifteen minutes in which the number of observable activities was recorded. Figure 7 shows data more recently collected in a VDU environment [10] which confirms the trend illustrated by the small data set in Figure 6, this time measuring workload using the IWS. It can be seen that, even for this small set of data, in the lower automation scenario there is more of an observable trend between increased perceived workload and the number of observable behaviours, compared with the VDU context, where there does not seem to be a clear trend.
Fig. 4. Anticipated relationship between perceived workload and observed activity for different levels of automation
Fig. 5. Observed behaviours (number of observed behaviours in 15 minutes) and workload rating (adapted AFFTC rating) collected from an NX signalling environment [8]
Fig. 6. Observed behaviours and workload rating collected from a VDU signalling environment with ARS [8]
Fig. 7. Observed behaviours (number of observed behaviours in 5 minutes) and workload rating (IWS rating) collected from a VDU signalling environment with ARS [10]
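One hedged way to quantify the trends read off Figs. 5–7 is to correlate, per observation interval, the count of observed behaviours with the concurrent workload rating. The sketch below does this with a Spearman rank correlation, appropriate for ordinal ratings; the interval data are invented for illustration and do not reproduce the published figures.

```python
# A sketch of the comparison behind Figs. 5-7: correlating, per observation
# interval, counts of observed behaviours with concurrent workload ratings.
# The data below are invented; Spearman is used because ratings are ordinal.
from scipy.stats import spearmanr

nx_intervals = [(5, 2), (12, 3), (20, 5), (28, 6), (35, 8)]   # low automation
vdu_intervals = [(4, 3), (6, 7), (9, 4), (5, 6), (8, 5)]      # ARS in use

for name, data in [("NX", nx_intervals), ("VDU+ARS", vdu_intervals)]:
    behaviours, ratings = zip(*data)
    rho, p = spearmanr(behaviours, ratings)
    print(f"{name}: Spearman rho = {rho:.2f} (p = {p:.2f})")
```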
5 Conclusions

Automation is already an important, and ever-growing, aspect of signalling. Yet its implications, both currently and for future design, are not fully understood. In this paper we have presented a number of characteristics of the rail signalling domain that need to be considered when investigating automation. Understanding the influence that automation can have on signalling requires an appreciation of the strengths and weaknesses of the human factors methods that can be brought to bear. It also requires an appreciation of the complexities of the domain, and the need to remain cautious of generalisations in the light of these complexities. Our current aim is to expand the set of qualitative and quantitative data on the implications of automation in rail signalling in order to develop design and implementation guidance. At the same time, our intention is that these data will help to clarify some of the ongoing theoretical issues relating to the human factors of automation.
References

1. Heath, C., Luff, P.: Collaboration and Control: Crisis management and multimedia technology in London Underground Line Control Rooms. Computer Supported Cooperative Work (CSCW) 1(1-2), 69–94 (1992)
2. Farrington-Darby, T., Wilson, J., Norris, B., Clarke, T.: A naturalistic study of railway controllers. Ergonomics 49(12-13), 1370–1394 (2006)
3. Balfe, N., Wilson, J.R., Sharples, S., Clarke, T.: Structured Observations of Automation Use. In: Bust, P. (ed.) Contemporary Ergonomics, pp. 552–557 (2008)
4. Pickup, L., Wilson, J.R., Norris, B.J., Mitchell, L., Morrisroe, G.: The Integrated Workload Scale (IWS): A new self-report tool to assess railway signaller workload. Applied Ergonomics 36(6), 681–693 (2005)
5. Bainbridge, L.: Ironies of automation. In: Rasmussen, J., Duncan, K., Leplat, J. (eds.) New Technology and Human Error, pp. 271–283. Wiley, Chichester (1983)
6. Balfe, N., Wilson, J.R., Sharples, S., Clarke, T.: Effects of level of signalling automation. In: Third International Conference on Rail Human Factors, Lille (March 2009)
7. Golightly, D., Millen, L.: Situation Awareness Pilot Study. IOE Report prepared for Network Rail, IOE/RAIL/09/04/DR (2009)
8. Nichols, S., Bristol, N., Wilson, J.R.: Workload assessment in Railway Control. In: Harris, D. (ed.) Engineering Psychology & Cognitive Ergonomics (2001)
9. Ames, L.L., George, E.J.: Revision and verification of a seven-point workload scale (AFFTC-TIM-93-01). Air Force Flight Test Center, Edwards AFB, CA (1993)
10. Balfe, N.: Personal Communication, February 19 (2009)
Cognitive Workload as a Predictor of Student Pilot Performance

Nathan F. Tilton1 and Ronald Mellado Miller2

1 Embry-Riddle Aeronautical University, Worldwide Campus, PO Box 31252, Honolulu, HI 96820, USA
2 Brigham Young University-Hawaii, 55-220 Kulanui St. #1896, Laie, HI 96762, USA
[email protected], [email protected]
Abstract. This study examined the relationship between cognitive task load and performance in a civilian pilot training program. It was found that the NASA Task Load Index was indicative of training success, with the most successful pilot trainees showing the highest cognitive task load and the poorest performers the lowest. The implications of this finding are discussed, as is its relation to possible advantages of military pilot trainees over their civilian counterparts.

Keywords: task load, cognition, TLX, aviation, training, civilian, military.
1 Introduction

Flight researchers define "mental" (or cognitive) workload as "the mental cost placed on the pilot by performing the necessary mental processing to accomplish the mission" (Vidulich, 2003, p. 117). Maintaining control of the aircraft places large workload demands on any pilot. Unfortunately, cognitive tunneling can occur when a pilot focuses on primary flight tasks to the exclusion of all other task requirements (Vidulich, 2003). Today, flight training is taking place in more Technologically Advanced Aircraft (TAA) (Craig, Bertrand, Dornan, Gossett, & Thorsby, 2005). Glass cockpits integrate the standard instruments, avionics, and GPS. The result is that the majority of instruments in a conventional cockpit are reduced to a 10-inch Primary Flight Display. TAAs also include a Multi-Function Display that shows the engine and navigational displays while serving as a backup if the Primary Flight Display (PFD) fails (Craig, Bertrand, Dornan, Gossett, & Thorsby, 2005). Furthermore, pilots must also communicate, fly the aircraft, and follow all applicable regulations. Students have to satisfy the requirements of the flight syllabus to advance. Part 61 and Part 141 flight programs require that the student make adequate progress. If the student fails to do so in a Part 141 program, where college credit is often granted, course failure can occur. It may be assumed that students who repeatedly perform poorly will suffer from increased stress on subsequent trials due to the prospect of failure, even if that same stress motivates them in the short term.
Interestingly, the history of psychologically based evaluations for predicting pilot training performance shows that there is some confusion over their utility and predictiveness. For example, one of the earlier studies, by Jessup and Jessup (1971), indicated that, in terms of one measure (the Eysenck Personality Inventory), the candidate most likely to pass Royal Air Force (RAF) pilot training was identifiable based on psychological criteria. They found that a "stable introvert" was the most likely to pass while "neurotic introverts" were the least likely. The explanation for this difference was that the "neurotic introvert" is more easily aroused, more likely to be aroused beyond the learning threshold and, thus, likely to under-learn and underperform. However, other, more recent studies (Hedge, Bruskiewicz, Borman, Hanson, Logan, & Siem, 2000) have found that cognitive, psychomotor, and biodata instruments have been among the best predictors of pilot performance, whereas personality measures have tended to be less predictive statistically. On the other hand, psychology and performance are clearly linked. Bor and colleagues (Bor, Field, & Scragg, 2002) found that among those RAF pilots who have ejected from their aircraft, 40% reported prolonged psychological disturbances. These were seen as the result of the pilots being active, self-confident, and competitive, with a tendency towards perfectionism. The disturbances themselves manifested as addictive behavior, anxiety, marital conflict, somatization, depression, and phobic reactions. Even the more recent undergraduate pilot training (UPT) programs, which include perceptual and motor tests to evaluate potential pilots, utilize some psychological measures. For example, the Basic Attributes Test (BAT), a measure of potential pilot performance that the U.S. Air Force used prior to 2006, includes tests on item recognition to measure short-term memory, self-crediting word knowledge to examine self-confidence, and an activities interest inventory to determine attitudes toward risk taking (Carretta, 2002). Such measures have increased UPT success to roughly 75% of those chosen (Carretta, 2002), a high degree of success by any measure. The main concern regarding student pilots is that sustained cognitive workloads can hamper the development and progression of the student. A key issue to consider for effective flight training is how far the student wants to progress in their career. Patrick (2003) states that the more the training will impact their future career, the more motivated the students will be. One area that affects a student pilot's motivation in flight training is how useful that training is to them. Patrick (2003) defined four areas of training evaluation, namely: goal orientation; cost effectiveness or cost benefit of the training; a research-oriented approach using variations of the scientific method; and finally, qualitative analysis of flight training, which is not dissimilar from the first method of training evaluation. For those students in FAA-approved collegiate flight training programs, the motivation is to complete the course in the allotted time of one college semester. If workload is high for too long, motivation for flight training will decrease, as observed in a reduction in flights per week or voluntary removal from the program.
Based on individual performance, an increase in workload will occur at different times; however, in certain students who have low motivation, workload should peak at a predictable level across the board, independent of lesson number. If high workload levels can be predicted, measured, and addressed, student pilots should remain in the program (Haber & Haber, 2003).
2 Method

2.1 Participants

Participants in this study were twenty Part 61 students enrolled at Honolulu Community College (HCC) for credit. The majority of students were in their early twenties.

2.2 Materials

The NASA Task Load Index (TLX) was used to measure cognitive workload (Hart, 2007). The NASA TLX is a scale of six factors: Mental Demand, Temporal Demand, Physical Demand, Effort, Frustration, and Performance (Hart, 2007). The NASA TLX is considered to be highly reliable and is currently used to validate other workload scales. The NASA TLX is considered valid and has a longitudinal Cronbach's alpha greater than 0.80 (Hart, 2007).

2.3 Procedure

The data were collected over time after each flight, and the results were recorded along with the syllabus flight number, flights per week, instructors, GPA, and course load. The data were analyzed with StatSoft Statistica 8 for trends in flight training. Flight training was divided into three blocks. Block one prepared students for the first solo and served as the current test.
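Since raw TLX scores (as used in the Results below, following Byers, Bittner, and Hill, 1989) are simply the unweighted mean of the six subscale ratings, a minimal sketch of that scoring is given here. The subscale names follow the standard NASA TLX; the ratings themselves are invented for illustration.

```python
# A minimal sketch of "raw TLX" scoring: the six subscale ratings are averaged
# without the pairwise weighting step of the full TLX procedure. The ratings
# below are invented for illustration.
from statistics import mean

SUBSCALES = {"Mental Demand", "Physical Demand", "Temporal Demand",
             "Performance", "Effort", "Frustration"}

def raw_tlx(ratings: dict[str, float]) -> float:
    """Raw TLX = unweighted mean of the six subscale ratings (0-100)."""
    assert set(ratings) == SUBSCALES, "one rating per subscale required"
    return mean(ratings.values())

after_flight = {"Mental Demand": 75, "Physical Demand": 30,
                "Temporal Demand": 65, "Performance": 40,
                "Effort": 70, "Frustration": 55}
print(f"Raw TLX workload: {raw_tlx(after_flight):.1f}")   # 55.8
```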
3 Results

Out of the 25 students in HCC's program, 3 soloed on time and only 5 completed the course. These 5 students were reported by their flight instructors to make fewer pilot errors. In addition, those students showed higher mental, temporal, effort, and frustration scores as measured by the NASA TLX. The raw TLX scores are reported here, as recommended by Byers, Bittner, and Hill (1989). Over 66% of the total body of students had to repeat multiple lessons due to task saturation. These students reported being overwhelmed. One subject was unable to complete the first six lessons without severe shaking. One student dropped the program before completing block one.
4 Discussion

The scores on the NASA TLX ratings for mental demand, temporal demand, and performance showed the greatest differences between those who passed and those who did not. Marginally, those who passed had slightly higher TLX scores for effort and frustration. This result suggests that although pilot training is often thought of as task based, the emotional characteristics of the training should be taken into account in order to make training more effective. This may be much more the case in civilian pilot training, as here, because these pilots may lack the physical and emotional support network military pilots share.

Fig. 1. Mean TLX scores for the pilot trainees. Those who passed had higher TLX mental demand and temporal demand scores and marginally higher TLX effort and frustration scores, as well as better performance scores, than those who did not pass.

While those who passed noted the mental and temporal difficulty involved, this may have been indicative of their focus and drive, which in turn may also have led to their more positive appraisals of their performance. However, that only 12% of students soloed on time, 20% completed the course, and two-thirds had to repeat portions is noteworthy, not only because of the high failure rate on this task but because the military success rate for the program (75% according to Carretta, 2002) is so high. Without a community of support (as in the military), it appears that the civilian trainees had a harder time focusing on the tasks needed to succeed.

The kinds of support that military personnel receive are extensive. Commanders and flight mates provide mentorship and a community to support the student pilot. This sense of belonging to a community that is either going through similar training or has done so in the past should not be overlooked in terms of its motivational and inspirational utility for student pilots struggling to pass. Similarly, while civilian pilots may feel excited to fly, military pilots are often seen as "elite" in the sense that their aircraft are typically among the most expensive and fastest, and are used to defend their country in potential combat. Coupled with the sense of community, these could serve as powerful strengths to the military trainee's psychology. On the other hand, the military also offers a great deal to those who do feel overwhelmed and stressed. In the Air Force, there are service-wide mental health
organizations as well as chaplain-based services providing stress counseling, life skills training, and the like. Moreover, there is an understanding of the complexity of the lifestyle of pilots. For the pilot's family, there are wives' clubs and Heartlink programs, which sponsor social events and give emotional support. Lastly, the military pilot is paid to train and become the best pilot possible, allowing a level of focus uncommon in the civilian world, where a student pilot may need to work other jobs, take other classes, etc., which distract from the ability to accomplish the difficult task of pilot training. Each of these advantages for the military has a counterpart in the civilian world, but in totality, they seem to speak to a level of support and understanding that assists the military pilot to succeed in greater measure than their civilian counterparts. Further study seems warranted to determine the nature and extent of this hypothesized relationship. If it is confirmed, perhaps modifications to current civilian student pilot training programs could be implemented in order to correct this imbalance and allow a greater number of qualified pilots to serve their communities and nations.
References
1. Bolstad, C.A., Endsley, M.R., Howell, C.D., Costello, A.M.: General Aviation Pilot Training for Situational Awareness: An Evaluation. In: Proceedings of the 46th Annual Meeting of the Human Factors and Ergonomics Society, Santa Monica, CA, pp. 21–25 (2002)
2. Bor, R., Field, G., Scragg, P.: Counselling Psychology Quarterly, vol. 15, pp. 239–256 (2002)
3. Byers, J.C., Bittner, A.C., Hill, S.G.: Traditional and raw Task Load Index (TLX) correlations: Are paired comparisons necessary? In: Mital, A. (ed.) Advances in Industrial Ergonomics and Safety, vol. I, pp. 481–485. Taylor and Francis, London (1989)
4. Carretta, T.R.: Understanding the relation between selection factors and pilot training performance: does the criterion make a difference? International Journal of Aviation Psychology 2, 95–105 (2002)
5. Craig, P.A., Bertrand, J.E., Dornan, W., Gossett, S., Thorsby, K.K.: Ab initio training in the glass cockpit era: new technology meets new pilots. In: Proceedings of the 13th International Symposium on Aviation Psychology, Columbus, Ohio (2005)
6. Haber, R.N., Haber, L.: Principles and Practice of Aviation Psychology. Lawrence Erlbaum Associates, Inc., Mahwah (2003)
7. Hart, S.G.: NASA-Task Load Index (NASA TLX). In: Proceedings of the Human Factors and Ergonomics Society 50th Annual Meeting, pp. 904–908. Human Factors and Ergonomics Society, Santa Monica (2007)
8. Hedge, J.W., Bruskiewicz, K.T., Borman, W.C., Hanson, M.A., Logan, K.K., Siem, F.M.: Selecting pilots with crew resource management skills. International Journal of Aviation Psychology 10, 377–392 (2000)
9. Jeppesen: Guided Flight Discovery: Private Pilot. Jeppesen, Englewood, CO (2007)
10. Jessup, G., Jessup, H.: Validity of the Eysenck Personality Inventory in Pilot Selection. Occupational Psychology 45, 111–123 (1971)
11. Li, G., Grabowski, J.G., Baker, S.P., Rebok, G.W.: Pilot Error in Air Carrier Accidents: Does Age Matter? Aviation, Space, and Environmental Medicine 77(7), 737–741 (2006)
12. O'Hare, D.: In: Tsang, P.S., Vidulich, M.A. (eds.) Principles and Practice of Aviation Psychology. Lawrence Erlbaum Associates, Inc., Mahwah (2003)
13. Ottati, W.L., Hickox, J.C., Richter, J.: Eye scan patterns of experienced and novice pilots during Visual Flight Rules (VFR) navigation. In: Proceedings of the Human Factors and Ergonomics Society 43rd Annual Meeting, pp. 66–70. Human Factors and Ergonomics Society, Santa Monica (1999)
14. Patrick, J.: In: Tsang, P.S., Vidulich, M.A. (eds.) Principles and Practice of Aviation Psychology. Lawrence Erlbaum Associates, Inc., Mahwah (2003)
15. U.S. Department of Transportation: Federal Aviation Regulations / Aeronautical Information Manual. Aviation Supplies & Academics, Inc., Newcastle (2008)
16. Vidulich, M.A.: In: Tsang, P.S., Vidulich, M.A. (eds.) Principles and Practice of Aviation Psychology. Lawrence Erlbaum Associates, Inc., Mahwah (2003)
Direct Perception Displays for Military Radar-Based Air Surveillance Oliver Witt, Morten Grandt, and Heinz Küttelwesch FGAN-Research Institute for Communication, Information Processing, and Ergonomics Neuenahrer Str. 20, D-53343 Wachtberg, Germany {witt,grandt,kuettelwesch}@fgan.de
Abstract. Air surveillance is among the time-critical and highly prioritized tasks of naval ships, and one in which the human operator will remain the decision maker in the future. User-oriented human-systems integration requires the provision of ergonomically optimized user interfaces. Based on functional system descriptions in the form of abstraction hierarchies, direct perception displays were developed for air surveillance that constitute an advancement over the principally alphanumerical displays used so far, supporting the operator's decision-making processes with improved situation awareness. Specifically, these are displays for the tactical situation picture, for explicit information about airborne contacts, and for the condition and configuration of the system state, especially regarding the radar equipment. Keywords: Abstraction hierarchy, user interface, polar diagram, military combat direction systems.
1 Introduction

Naval platforms that are earmarked for a versatile spectrum of military missions are characterized by combat direction systems that are technologically state-of-the-art and equipped with high-capacity sensors and effectors. These shall guarantee that the crew is able to adequately master so-called "naturalistic situations," which may be characterized by uncertainty, dynamic environments, varying and undefined users, competing aims, time pressure and a high risk of decision failures [1]. Air surveillance in particular is among the eminently time-critical and security-sensitive tasks for naval platforms because of the kinematic qualities of airborne contacts, e.g., aircraft or missiles. Because of today's primarily asymmetric threat, however, the combat direction systems that are operated with a high degree of automation do not guarantee total reliability with regard to the identification and classification of such contacts. Therefore, the human operator will remain the final decision maker in the future. Even when computer-based decision support is used, the focus should consequently lie on optimized human-systems integration when designing complex military human-machine systems and the corresponding human-machine interfaces. In order to avoid operator-out-of-the-loop problems, the operator should be included profoundly in the situation and the system (human-in-the-loop).
Looking at today's combat direction systems, however, one finds that information is predominantly offered to the operator on text-based (alphanumerical) and separated displays. For instance, crucial information about tracks, such as course, speed, altitude, position, etc., must be compared with given identification criteria by the operator, and the individual results must then be mentally integrated in order to verify hypotheses concerning identities and intents. The operator thus has to complete multiple n-way comparisons and integrations with respect to all targets he observes, as well as perform multiple hypothesis tests. The actual system state, related, e.g., to the surrounding areas that are covered by radar, is not displayed. Instead, such information must be communicated verbally and henceforth held in the operator's working memory in order to be included in the decision-making process. It is obvious that a very high mental demand can hence result when complex decisions need to be made, which may entail performance deficits and possibly fatal incorrect decisions in case of overload. The design of user interfaces of complex systems with a mission spectrum characterized by unpredictability is increasingly carried out based on abstraction hierarchies, both in civil [2][3] and in military areas [4]. This contribution deals with direct perception displays for air surveillance, using the example of the class 124 frigates of the German Navy. The displays shall support the operator in his decision-making processes through improved situation awareness with regard to the system state and the external tactical situation. In detail, it is about the design of displays for the condition and configuration of the applied radar equipment, which affects the sensed tactical air picture, and furthermore about the display design regarding decision-relevant information about airborne contacts.
2 Knowledge Representation with Abstraction Hierarchies

Within greater military units like task groups or task forces, the class 124 frigates of the German Navy are primarily responsible for anti-air warfare. The radar SMART-L (Signaal Multibeam Acquisition Radar for Tracking, L band, today's D band) is the main sensor for long-range detection, localization and tracking of airborne contacts. With it, objects with a high radar cross section, such as air carriers or bombers, can be captured up to a distance of approximately 400 km. Within the computer-aided information processing, the data provided by several sensors of the ownship and other (linked) platforms are fused in the process phase Sensor Data Fusion and subsequently undergo the process phases Identification and Classification. SMART-L as well as the mentioned software processes possess a multitude of settings (characteristics of the radar beam, degree of automation, etc.). The adjustments of these system parameters are made by a Doctrine Management Officer (DMO) in the combat information centers (CIC) of the frigates. The data provided by the sensor are used by several operators, e.g., the Anti Air Warfare Officer (AAWO), in air surveillance. Consequently, the system state configured by the DMO has a significant influence on the tactical situation picture offered to the AAWO, who deals with tactical situation evaluation. Depending on their roles, diverse operators have different information needs with regard to the system state and specific demands as to the interaction
functionalities. These user requirements can be collected and structured by means of abstraction hierarchies, which are functional descriptions of the system and of the work domain, respectively, that are "independent of a particular worker, automation, event, task, goal or interface" (after [6]). Abstraction hierarchies were developed based on the analysis of system specifications and manuals, interviews with operators of the German Navy and developers of radar equipment, as well as the observation of team trainings [5,6]. The analysis resulted in the abstraction hierarchy pictured in excerpts in Fig. 1. The first of five levels (functional purpose, FP) describes the aim with which the work domain was developed. The second level (abstract function, AF) provides the underlying regularities and principles. The third level (generalized function, GF) covers the involved processes. The fourth level (physical function, PFu) defines the involved entities and their availability. The fifth level (physical form, PF) contains the physical appearance and local arrangement of entities. Chen et al. [7] carried out this form of analysis for the interface design of sonobuoys. The differences between the deployment of sonobuoys and of radar equipment such as SMART-L lie, e.g., in the medium (water versus air), the time factor (critical versus uncritical), the kind of radiation (acoustic waves versus electromagnetic waves), the type of detectable contact (submarines versus aircraft, missiles), the contact details (direction, distance, depth, acoustic data versus position, speed, IFF, ESM) and the location of the sensors (mobile versus ship-based).
Fig. 1. Abstraction hierarchy for military radar-based air surveillance (excerpt)
Chen et al. [7] distinguish between sensor management and tactical situation awareness. In Fig. 1, elements for radar system management are framed dashed, those for tactical situation awareness dotted, and common elements solid. The functional aim is the fast supply of a complete and accurate list of track attributes of airborne contacts under minimization of radar emission. This track attribute list covers distance, azimuth angle, elevation angle, speed, etc. So-called jammers, which try to constrain or inhibit the use of the D band with active blockage, belong to the objects of interest as well. The depiction of radar echoes, the so-called radar video, additionally serves for the observation of contacts. In situations with a high degree of sensor load, the completeness and accuracy of the data is not simultaneously guaranteed for all contacts because of the physical and operational restrictions depicted on the level "abstract function." Therefore there must be a balancing depending on priority. The required resource management and energy input depend on physical regularities such as the law of conservation of energy. Operational restrictions, like the so-called Rules of Engagement, determine the operational options; e.g., they may restrict the usage of radar systems in order to avoid radar emissions in certain parts of the operational area. Located on the level Generalized Function are the processes for the configuration of the radar equipment that are necessary for the transmission of signals as well as the reception of echoes, e.g., signal generation, amplification, transmission, reception and processing. For tactical situation awareness, the processes for the detection of signals, the localization of signal sources and the tracking of contacts are given. The level Physical Function contains radar equipment such as APAR, SMART-L and IFF on the one hand and the signal sources on the other hand. On the level Physical Form, the elements of the level Physical Function as well as the applied radar equipment are described by their location, form, dimension, material, etc. The attributes of the airborne contacts and their kinematic data stand on this level as well. The different layers of the hierarchy are connected by means-end relationships (symbolized by lines in Fig. 1), i.e., each layer provides the means to reach the aims of the overlying layer, whereby each layer in its particular form contains a complete system description. A central aspect of the visualization is, as mentioned above, the adjustment of radar coverage (hatched in Fig. 1), normally carried out by the DMO, as it plays a decisive role in the interpretation of the tactical situation picture and the enclosed contacts.
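The means-end structure described above maps naturally onto a small linked data model. The Python sketch below is an assumed illustration of how such a hierarchy could be represented programmatically; the node names are abbreviated paraphrases of Fig. 1 rather than the authors' complete model, and the paper itself prescribes no particular implementation.

```python
from dataclasses import dataclass, field
from typing import List

# The five levels of the abstraction hierarchy, top to bottom.
LEVELS = ["functional purpose", "abstract function",
          "generalized function", "physical function", "physical form"]

@dataclass
class Node:
    name: str
    level: str                                         # one of LEVELS
    ends: List["Node"] = field(default_factory=list)   # nodes one level up

    def link_to_end(self, end: "Node") -> None:
        # A means-end link may only point to the directly overlying level.
        assert LEVELS.index(end.level) == LEVELS.index(self.level) - 1
        self.ends.append(end)

# Abbreviated, paraphrased cells from Fig. 1 (illustrative only).
fp  = Node("fast, complete, accurate track attribute list", LEVELS[0])
af  = Node("balancing sensor resources under restrictions", LEVELS[1])
gf  = Node("signal generation, amplification, transmission", LEVELS[2])
pfu = Node("SMART-L radar equipment", LEVELS[3])
pf  = Node("antenna location, form, dimension, material", LEVELS[4])

for lower, upper in [(af, fp), (gf, af), (pfu, gf), (pf, pfu)]:
    lower.link_to_end(upper)   # each layer is the means to the layer above
```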
3 Design of the User Interface

The abstraction hierarchy provided the contents for the human-machine interfaces to be designed, by deriving the information demands necessary for the operator from the individual cells of the abstraction hierarchy. The radar coverage, e.g., informs whether the functional purpose of the provision of contact data is achieved by SMART-L in a defined sector. Conclusions concerning the completeness, accuracy and currency of the data would indeed be desirable, but are presently not provided by SMART-L.
For the information needs emanating from the elements of the abstraction hierarchy highlighted in grey (Fig. 1), displays were designed that will be explained individually in the following paragraphs. This concerns the visualization of the control parameters of the radar equipment and the master display of the state of SMART-L (light grey), the visualization of the tactical air picture (medium grey) and detailed contact information (dark grey). In contrast to the DMO, who has to configure the combat direction system, as explained above, in relation to the qualities of the radar sensors and who needs a detailed display of the system state for this purpose, the AAWO has the task of carrying out a threat analysis based on the air picture shown on a tactical situation display and of initiating necessary activities, if applicable. The available radar equipment is the primary source for the construction of the air picture. It can be configured by the DMO in such a way that it covers the entire airspace or only certain sectors, with certain degrees of intensity. For his task, the AAWO essentially needs displays that inform him about the position and the qualities of the contacts picked up via the sensors. Nevertheless, it is also relevant for him which areas in the surroundings of the ownship are covered by which sensors. In this way he can see, on the one hand, in which sectors airborne contacts may actually be detected and, on the other hand, in which sectors the ship can be reconnoitered by other target objects due to the emitted radar radiation. Thus, both the DMO and the AAWO need information about the system state, albeit on different levels of aggregation.

3.1 Visualization of Control Parameters

For the visualization of control quantities and their influence on the system functionality, a form following the designs of civil process control systems was chosen. Fig. 2 shows the interrelation of the radar system components while transmitting, i.e., the hardware structure (PFu) is shown according to the information flow (GF) from signal generation (Fig. 2, left) to signal transmission (Fig. 2, right) for the transmitter of the SMART-L radar system. The different components contain aggregated displays of the availability of the subsystems and the corresponding operational condition, where green symbolizes availability, yellow partial availability and red unavailability of the respective component. The displays of the Frequency Control Unit (FCU), which generates the radar radiation, cover, among other things, the current operational state (online operational) as well as the FCU's availability. SMART-L possesses 8 frequency sub-bands in the RF area. Each frequency sub-band can be approved for use separately. If none of these sub-bands has been selected, the aggregated display of the module additionally shows a warning symbol indicating that the generation of transmitting energy is impossible at present. The signals generated in the FCU are amplified in the Amplifier Unit (AU). Among other things, the lower part of the display points up the relation between the adjusted scan range and the number (4, 8 or 32) of needed amplifiers. Slide-in modules that are currently not installed are highlighted in dark. A precondition for an undisturbed operation of the transmitting antenna is its coaction with the B-Drive and the climate system. The B-Drive state shows whether
Fig. 2. Visualization of control parameters
the SMART-L rotates. The availability of the antenna's rotating part is aggregately displayed as a link. In case of a blockage, the selection of this link delivers the cause in a separate overlay window. The positioning indicator of the B-Drive contains a display showing the momentary position of the antenna (grey marking) and a display, as well as a graphic input option, for the position that the antenna shall take up after the halt (black marking). Because of the multitude of state displays, the animated, iconized depiction of a revolving or non-revolving B-Drive clarifies the two basic settings of the mechanical part of SMART-L, shown at the top right corner of the module.

3.2 Master Display of the Condition of SMART-L

The monitoring display in Fig. 3 provides, on the one hand, relevant information primarily attuned to the functional aim, and allows, on the other hand, fast access to detailed information of the underlying levels. Three iconized state indicators show whether SMART-L radiates (radiation), whether the transmitting energy is loaded into the artificial antenna (load) and whether the antenna rotates (B-Drive). In addition to the basic states "radiate" and "not radiate," the radiation symbol also displays the state "listen" (ear symbol), which allows, e.g., the detection of jammer tracks if the antenna rotates without active radiation. If the DMO has defined an azimuthally limited radar coverage, "radar sectors" is displayed as "selected" and the radar coverage angles are displayed in the Tactical Situation Display (see Fig. 4).
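The sector logic made visible in the Tactical Situation Display can be illustrated in a few lines of code. The following Python sketch is an assumption for illustration and is not part of the system described here: it tests whether a contact's bearing falls inside any active radar sector, including sectors that wrap through north, using the sector limits shown in Fig. 4.

```python
def bearing_in_sector(bearing, sector):
    """True if a bearing (degrees) lies inside a (start, end) azimuth sector."""
    start, end = (s % 360 for s in sector)
    bearing %= 360
    if start <= end:
        return start <= bearing <= end
    return bearing >= start or bearing <= end   # sector wraps through 0 deg

active_sectors = [(10, 80), (275, 290)]   # the coverage configured in Fig. 4

def is_covered(bearing):
    return any(bearing_in_sector(bearing, s) for s in active_sectors)

# 45 deg lies in an active sector; 118 deg (the hostile ESM bearing in
# Fig. 4) lies in an uncovered area, so only a passive bearing is available.
print(is_covered(45), is_covered(118))   # True False
```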
Fig. 3. SMART-L master display
The lower part of the master display is based on a distinction between the transmitter and the receiver part. In the area "status / transmit," the different functional components of "SMART-L Status / Transmit" shown in Fig. 2 are aggregately represented. For instance, the Frequency Control Unit is not ready for use at present. The detailed state display can be navigated to from the aggregated master display. At the same time, Fig. 3 shows on the right-hand side which kind of track information SMART-L provides for the following process phase "Sensor Data Fusion" based on the current system configuration; in this case it is exclusively Jammer Tracks.

3.3 Visualization of the Tactical Air Picture

For tactical situation analysis and threat evaluation, the Tactical Situation Display (TSD), realized here as an overview display, shows all geo-referenced airborne, surface and subsurface contacts detected by the ownship's sensor systems or by other platforms linked to the combat direction system, in a bird's-eye perspective (Fig. 4). A vertical section allows the direct allocation of the contacts to altitude bands. Thus, potentially threatening contacts flying at low altitude can be recognized faster. In addition, the display shows which sectors are covered by SMART-L. Consequently, it is directly apparent to the AAWO why the areas not covered by radar yield no contacts, or only electro-magnetic bearings. In order to allow a fast survey of the track attributes, so-called polar displays were integrated in the direct vicinity of the Tactical Situation Display (see Fig. 4), as an addition to the detail displays for track evaluation, which cannot be examined more closely here for lack of space. The advantages of polar displays for the support of supervisory control tasks were pointed out using the example of nuclear power plants [8]. In this military application they present, in integrated form, the track attributes that are crucial for identification, such as distance to the ownship (DST), altitude (ALT), speed (SPD), course (CRS), IFF information (IFF) and ESM emissions (ESM). The activation of the individual polar displays occurs threat-triggered in the upper area and event-triggered in the lower area. For a single track attribute, the polar display establishes display proximity between the current attribute value and, taking into account predefined identification criteria, an uncritical attribute value on the attribute's parameter beam. The symmetric figure in the background, the so-called normal range, represents the a priori defined scenario knowledge. For instance, it is known in advance which friendly, neutral
Fig. 4. Tactical Situation Display with a depiction of the radar coverage configured by the DMO (accentuation of the active areas 10°–80° and 275°–290° at the compass rose) and polar displays in the left part of the figure. In the area 100°–140°, three electro-magnetic bearings are displayed, among them a hostile emission (118°).
(civil) and hostile radar emissions (ESM emissions) are to be expected. Similarly, friendly and neutral IFF (identification friend or foe) codes are defined a priori. If potentially threatening attribute values are detected, the respective value indicator deflects. The normal range of kinematic attributes like speed or altitude is defined in advance as a tolerance area. For instance, high velocities can be reached only by (hostile or friendly) military fighter aircraft. Additionally, the variation of kinematic attributes, e.g., a sudden significant change of speed or altitude, is untypical for non-military aircraft and therefore causes a deflection of the respective value indicator. By connecting the current indicator values of the single attributes, a figure is generated which integrates the single pieces of information on a higher level of abstraction. This figure, by means of its symmetry or asymmetry, forms a so-called emergent feature which helps to transfer the interpretation of information content to the perception
phase of human information processing, i.e., direct perception: a symmetric figure (Fig. 4, lower polar display #4113) indicates an uncritical airborne contact. In contrast, the easily perceived asymmetry of the resulting figure in polar display #4112 (shown in the upper left part of Fig. 4) indicates that this contact is critical with regard to its ESM activity. Emergent features, e.g., symmetry, alignment, parallelism, emerge from the relative constellation of multiple displays to each other. The most relevant advantage of polar displays is that, in spite of the graphical aggregation of the individual attributes, they ensure that the single pieces of information remain noticeable and perceivable. Thus, in contrast to classical alarm displays, polar displays are alarming and diagnostic at the same time, because the possibly symptomatic characteristic of a single parameter value can be noticed easily. Furthermore, under different parameter constellations, the figure-forming aggregation of single attributes allows the direct derivation of higher-level, task-related manifestations. In contrast, the notification of several pieces of information on separated displays, as mentioned above, requires multiple mental transformations and comparisons.
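One way to make the emergent-feature idea concrete is to treat each polar display as a vector of normalized deflections and to score the asymmetry of the resulting figure. The Python sketch below is an invented approximation of that reading, not the authors' algorithm: deflections run from 0 (on the normal range) to 1 (fully critical), and asymmetry is scored as the largest deviation from the mean deflection.

```python
ATTRS = ("DST", "ALT", "SPD", "CRS", "IFF", "ESM")

def asymmetry(deflections):
    """0 for a perfectly symmetric figure; larger as one axis deflects alone."""
    values = [deflections[a] for a in ATTRS]
    mean = sum(values) / len(values)
    return max(abs(v - mean) for v in values)

# Invented deflections mimicking the two tracks discussed in Fig. 4.
track_4113 = dict.fromkeys(ATTRS, 0.1)                  # symmetric figure
track_4112 = {**dict.fromkeys(ATTRS, 0.1), "ESM": 0.9}  # lone ESM deflection

print(asymmetry(track_4113))  # 0.0   -> uncritical contact
print(asymmetry(track_4112))  # ~0.67 -> critical, driven by ESM activity
```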
4 Conclusion and Outlook

Based on functional system analyses by means of abstraction hierarchies, visual displays for system configuration and tactical situation analysis in the context of air surveillance have been developed which support the human operator in decision-making by providing enhanced situation awareness. For instance, interrelations between the system configuration and the sensed tactical situation have been analytically determined and modeled, and were integrated within the Tactical Situation Display. In several evaluation phases these visualizations have been found to be a significant improvement regarding effectiveness and efficiency in comparison to displays known from current German naval platforms. They were rated as having better usability, too. Thus, the benefits of applying abstraction hierarchies and a model-based visualization of complex information have been shown. In the next step, a further integration of displays and an optimized user guidance shall be realized, which should take into account requirements arising from both the users and the tasks. In doing so, combat direction systems shall be improved in order to ensure the safe and efficient decision-making of human operators, which is crucial in the face of current and anticipated scenarios and operational conditions.
References 1. Orasanu, J., Connolly, T.: The reinvention of decision making. In: Klein, G.A., Orasanu, J., Calderwood, R., Zsambok, C.E. (eds.) Decision Making in Action: Models and Methods, pp. 3–21. Ablex Publishing Corporation, NJ (1993) 2. Jamieson, G.A., Vicente, K.J.: Ecological interface design for petrochemical applications: supporting operator adaptation, continuous learning, and distributed, collaborative work. Computers and Chemical Engineering 25, 1055–1074 (1999)
3. Yamaguchi, Y., Tanabe, F.: Creation and Evaluation of an Ecological Interface System for the Operation of a Nuclear Reactor System. In: Proceedings of the XIVth Triennial Congress of the International Ergonomics Association and 44th Annual Meeting of the Human Factors and Ergonomics Society, vol. 3, pp. 571–574 (2000)
4. Burns, C.M., Bryant, D.J., Chalmers, B.A.: Boundary, Purpose, and Values in Work-Domain Models: Models of Naval Command and Control. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans 35(5), 603–616 (2005)
5. Rasmussen, J., Pejtersen, A.M., Goodstein, L.P.: Cognitive Systems Engineering. Wiley, New York (1994)
6. Vicente, K.J.: Cognitive Work Analysis: Toward Safe, Productive, and Healthy Computer-Based Work. Erlbaum, Mahwah (1999)
7. Chen, H.Y.W., Burns, C.M., Lamoureux, T.: Work Domain Analysis for the Interface Design of a Sonobuoy System. In: 51st Annual Meeting of the Human Factors and Ergonomics Society, Baltimore, MD (2007)
8. Wickens, C.D.: Engineering Psychology and Human Performance. Harper Collins, New York (1992)
A Selection of Human Factors Tools: Measuring HCI Aspects of Flight Deck Technologies Rolf Zon and Henk van Dijk National Aerospace Laboratory NLR, P.O. Box 90502, 1006 BM Amsterdam, The Netherlands [email protected], [email protected]
Abstract. Within the HILAS project, two experiments in high fidelity flight simulators were performed. In the current paper, results from one of those experiments are discussed. The discussion focuses on the added value of using a set of HF tools rather than individual tools, and on a number of lessons that were identified from this experiment. The set of Human Factors tools that was applied in this experiment might be helpful for manufacturers of flight deck technologies or aviation authorities to establish whether new technologies should receive the predicate "Human Factors certified." Keywords: HILAS, Human Factors, HF, flight deck, experiment, flight simulation, HF tools registry, certification.
1 Introduction

1.1 The HILAS Project

HILAS¹ stands for "Human Integration into the Lifecycle of Aviation Systems." It is an international research initiative with 40 partners from across the aviation industry and academia in Europe and beyond. The HILAS project (HILASa) develops a model of good practice for the integration of Human Factors (HF) across the full life-cycle of aviation systems. The project contains four parallel strands of work: the integration and management of HF knowledge; the flight operations environment and performance; the evaluation of new flight deck technologies; and the monitoring and assessment of maintenance operations.

1.2 The Flight Deck Technologies Strand

Each of these four strands focuses on different aspects of HF in aviation. Within the HILAS Flight Deck Technologies strand, two high fidelity flight simulator experiments (Roerdink and Zon, 2006; Kooi et al., 2007; Van Dijk and Zon, 2008a and 2008b) were performed. The general aim of these experiments was to select a set of HF tools for the measurement of HCI aspects of new technologies (Zon and Roerdink, 2006).
¹ The HILAS project runs from June 2005 until June 2009 and was funded by the European Communities as part of the 6th Framework Programme.
The plans for the first experiment were presented at HCI International 2007 in Beijing (Zon and Roerdink, 2007). In that first experiment, the HILAS Flight Deck Technologies strand partners installed and evaluated HF tools² and new flight deck technologies in a high fidelity flight simulator. In the second experiment, the lessons that were identified from the first were taken into account. This resulted in an adjusted set of improved HF tools and a more integrated approach to the selection of tools, to the experimental design and to the data analysis.

1.3 Focus of This Paper

In this paper the focus is on the added value of having a set of HF tools, instead of using these HF tools individually. The HF tools are described briefly in the current paper; a more detailed description is provided in Zon and Roerdink (2007), and the most up-to-date description of the individual HF tools may be found at HILASb. Besides describing the added value of applying the HF tools as a set, the current paper identifies a number of lessons from the HILAS experiments. These form a second point of focus of this paper.
2 The Experiment

Two newly developed flight deck technologies were the vehicles for validation of the HF tools. These technologies were the "dual layer display" and the "interseat haptic touch screen." The HF tools were used to verify the hypotheses regarding the technologies. Furthermore, the toolset was applied to study pilot behaviour independent of the specific technologies that were being used. Examples of human behaviour in this context are pilot "mental workload" and "Situational Awareness (SA)."

2.1 Experimental Design

Seven crews, each comprising two airline pilots (a captain and a first officer), participated in the experiment for two consecutive days. The experiment consisted of a total of 11 experimental runs per crew, which were flown in pseudo-randomised order. Runs focussed either on the use of the "dual layer display" and the "interseat haptic touch screen," or on a particular construct that the HF tools could measure. Examples of such constructs are mental workload or SA. The approach followed (i.e. a number of short flight runs in one experiment) allowed a great number of HF tools to be compared in a systematic way. More information about the exact content of the scenarios and the procedure in general may be found in Van Dijk and Zon (2008a and b), Zon and Van Dijk (2009) and Van Dijk and Zon (2009). More information about the Generic Research And Cockpit Environment (GRACE), the high fidelity flight simulator in which the experiment was performed, may be found in Egter van Wissekerke (2004). Photographs of GRACE are displayed in Figures 1 and 2.

² In the current paper the words "HF tools" and "flight deck technologies" are frequently used. Both have a clearly different meaning. In the current paper, HF tools are those tools that researchers use to study the interaction between pilot and flight deck, while flight deck technologies refer to technologies that are installed on the flight deck and that are meant to assist pilots in performing their tasks.

Fig. 1. Generic Research And Cockpit Environment (GRACE)

Fig. 2. Combining data streams: pilot in the simulator while several tools (e.g. eye tracker IR camera) are registering data

2.2 HF Tools

The HF tools that were used in the second high fidelity simulator experiment were:

• Questionnaires and rating scales were offered to the crews via an Electronic Flight Bag (EFB) in the cockpit, and via a desktop PC outside the cockpit for the longer
questionnaires. The open and closed questions were formulated by project partners and standardised by the University of Groningen. Two of the rating scales that were used were the Crew Awareness Rating Scale (CARS) for crew SA and the Rating Scale Mental Effort (RSME) for pilot mental workload.
• (Debriefing) interviews, based on knowledge of the scenarios, pilot performance during the experiments and answers to the questionnaires and rating scales, were performed by specialists from NLR, i.e., the Human Factors Expert Administered Debriefing Survey (HEADS), and from Deep Blue, i.e., CRitical Interaction Analysis (CRIA).
• ASL head mounted eye trackers with optical head trackers, brought into the experiment by NLR, were used to record the crews' eye scanning behaviour, on video as well as in databases.
• Heart rate variability and respiration rate were recorded as psychophysiological indices of mental workload by the University of Groningen, while TNO added facial temperature as another psychophysiological measure of mental workload, for reasons of comparison (a sketch of a common heart rate variability index is given at the end of this section).
• All crew behaviour in the cockpit was recorded on video and audio by NLR.
• A great number of simulator parameters were recorded by NLR. There were basically two kinds of parameters: the pilot inputs to the aircraft and the aircraft performance itself.
• Two software applications were used for quicker and easier data analysis: from BAE SYSTEMS, Gwylio, and from Noldus Information Technology, The Observer.

2.3 Technologies

The two new cockpit technologies used in the experiment were:

• The dual layer display was brought into the experiment by TNO. Two of these displays were installed to replace the navigation displays. The information that is normally presented on the navigation displays was split over two layers, with all fixed information (i.e. terrain and beacons) presented on the farther layer and the moving information (i.e. other traffic) on the nearer layer. For more detail about the dual layer display in general, see Kooi and Toet (2003). The role of the dual layer display in the current experiment will be described in more detail in Zon et al. (2009).
• The interseat haptic touch screen was brought into the experiment by GE Aviation Systems. This touch screen gives the sensation of pressing a physical button when touched. It replaced the radio panel and was mounted on the pedestal behind the throttles. For more information about the interseat haptic touch screen, the reader is referred to Lewis et al. (2009). The role of the interseat haptic touch screen in the current experiment will be described in more detail in Zon et al. (2009).
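As a pointer for the heart rate variability measure referred to in Section 2.2, the sketch below computes RMSSD, a common time-domain HRV index, from inter-beat (RR) intervals. This is a generic, assumed illustration; the paper does not describe the University of Groningen's analysis pipeline, and the interval values are invented.

```python
import math

def rmssd(rr_intervals_ms):
    """Root mean square of successive differences of RR intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Invented RR intervals; reduced RMSSD is commonly read as a sign of
# increased mental workload, which is why HRV is recorded alongside
# subjective ratings such as the RSME.
rr = [812, 798, 830, 805, 790]
print(round(rmssd(rr), 1))  # ~22.7 ms
```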
3 Lessons Identified

The most generic lesson that was identified, and to which a number of the lessons described below are related, is that the thread, or critical path, of the experiment should not be interfered with by other, apparently urgent aspects of the experiment. This is especially true because creating good HF scenarios takes time, and this time should really be available to fine-tune the experiment. Only then can high quality HF experiments be performed.

3.1 Maturity of Technologies

In the first high fidelity simulator experiment, the emphasis was supposed to be on the HF tools. However, new cockpit technologies (i.e. new displays or panels that the pilots can use in the cockpit to fly the aircraft more easily, safely or efficiently) were used as vehicles to validate the HF tools. It was assumed that less mature technologies might be helpful for the validation of HF tools, because there need to be flaws in the technologies in order to enable the HF tools to demonstrate that they can identify such flaws. Basically that is true, but evaluating technologies in a high fidelity simulator still requires that the technologies themselves have reached a certain level of maturity as well. Otherwise other environments, and therefore other HF tools, are more appropriate for validation. The consortium evaluated, for each cockpit technology that was installed in the simulator, whether it had the right level of maturity for validation in a high fidelity flight simulator. Two of the technologies that were already applied in the previous experiment, and that were developed further in the period between the two experiments, had reached that level and were installed. The dual layer display was available for both sides of the cockpit, and the content (the navigation display in the second experiment instead of the primary flight display in the first experiment) was more appropriate to validate the potential of a dual layer display (i.e. being able to display more information at once before the pilots perceive the display as too cluttered). The interseat haptic touch screen was different from the previous experiment because it was indeed haptic now, while in the first experiment it was just a touch screen. As such, pilots could actually feel the display vibrate when they touched a
button on it. Furthermore, the application that was running on the display had a number of complementary features with added value compared to the ordinary radio panel.

3.2 Spend Time on Scenarios in Order to Get the Most Out of the HF Tools

In order to avoid getting carried away with installing and fine-tuning flight deck technologies at the expense of focus on the HF tools, the consortium decided to perform some smaller-scale, technology-related experiments in other simulators. The (high fidelity simulator) time that was saved by that change of plans was used to create scenarios aimed at manipulating the constructs that the HF tools were designed to measure. There were scenarios in which mental workload slowly increased and in which some moments were built in that would generate peaks of workload. These scenarios were approaches to Sion airport in Switzerland. Those flights started at cruise level with relatively low workload, but due to a low cloud base, a visual approach, mountainous terrain and a relatively small runway, the workload progressively increased while approaching the runway. During the flight, specific requests from ATC increased workload at particular moments. The subjective workload ratings, the heart rate, the respiration rate and the facial temperature were all recorded during these flights, so that afterwards it could be studied to what extent these measures are redundant and whether they are also complementary to each other. There were also scenarios in which SA was manipulated. By doing that, one can say that the pilots have decreased SA, and researchers can study the information that the HF tools provide about those situations. In one scenario, an indicated air speed (IAS) discrepancy was simulated: on the left and right side of the cockpit, the speed tapes gave different information. It was the pilots' task first to find out that this was happening, and second to find out which of the speed tapes gave the accurate IAS. With external observers, questionnaires and eye trackers, the researchers formed an impression of pilot SA and of the added value of each of the HF tools that were applied. In another SA-related scenario, the crew was informed after a break that 'something' would be changed after the break. By doing so, the researchers forced the crew to be aware of a decreased SA. In fact a fuel leak was simulated, and the researchers used the same HF tools as in the other SA scenario to study how the pilots regained their SA. Because the researchers controlled and manipulated mental workload and SA in these scenarios, this kind of scenario was really aimed at the validation of the HF tools, rather than at using the HF tools to validate new cockpit technologies.

3.3 Predicting How Scenarios Will Work Out Is Not Easy

Even though the task of flying an aircraft has many procedural aspects, it still offers pilots a great deal of freedom in how to operate in particular situations. Because of that freedom, it is difficult to create scenarios that will work out the same way for every crew that participates in the experiment, with the major disadvantage that not all
data from every crew can be compared with the data from all the other crews. Especially responses to off-nominal events, like TCAS traffic advisories (TAs) or unruly passengers, are not the same for all crews.

3.4 Subject Matter Experts Are Needed

It was efficient to record large amounts of data, like eye tracker output and psychophysiological data, automatically. However, not all of these automatically recorded data are easy to interpret without background knowledge. It turned out that at least three different kinds of experts were needed to interpret the results of the experiments:

1. Simulator experts, who know the differences between high fidelity flight simulators and the real aircraft.
2. Pilots, who can explain why subjects in the experiment make particular decisions.
3. Human Factors experts, who are familiar with the kinds of data that are recorded and are able to state whether a seemingly different result is truly different or just an artefact.

Specialist knowledge is needed not only for the interpretation of results but also in the design processes of technologies and scenarios. A number of partners from the Flight Deck Technologies strand needed either more knowledge about HF or about the aviation domain. Even though pilots tend to report a lot about why they made certain decisions and how they felt at that moment, they are not fully aware of everything that is relevant and takes place around them. For researchers it is relevant as well to understand whether there is information that the pilots have missed. Subject matter experts who are pilots and are aware of all aspects of the simulated scenarios can evaluate the pilot behaviour. However, not just in the evaluation but also in earlier stages, like the experimental design, subject matter experts can play crucial roles. Pilots have experience of situations that they had to deal with themselves in their daily work that might be interesting to simulate. They can help in designing the scenarios in such a way that they will really work out as the researchers intended.
4 Added Value of a Set of Tools

4.1 Identify a Set of HF Tools

The focus of the experiment itself was on the HF tools. In the period between the first and second experiment, a number of the HF tools, and the ways in which they were applied in the experiments, were developed further. This resulted in a better, more refined set of HF tools. The most important development is that applying the tools together as a set results in quicker and easier access to the data. Tools that allow all data to be stored in one database, and that allow all data streams to be studied in the context of the others, are a major improvement compared to analysing individual tools and comparing the outcomes of the different tools afterwards.
4.2 Converging Evidence Principle

Combining data streams (see Figure 2) greatly increases researchers' insight and makes the current HF performance measures more objective. It turned out that the quality of interpretations of what has happened during the experiments is better when, in a holistic way, all data sources are included, for example by comparing psychophysiologically measured mental workload with the mental workload reported by the pilots on rating scales. This argues for (software) tools that enable a quick and intuitive fusion of data streams, so that researchers are better able to get an overview of each data stream in the context of all the other data sources that were recorded at the same time (a minimal sketch of this alignment step follows at the end of this section). For a number of HF tools it is clear that they measure aspects of an underlying concept. For example, a set of HF tools was applied that all indicate mental workload. A number of these tools are sensitive to other concepts than mental workload as well; some of them are sensitive to psychological stress, coffee intake, etcetera. By comparing the data from different tools and deducing what most of them indicate, it becomes more likely that the deduced trend is indeed true and not an artefact. Such artefacts may result from the fact that a tool is sensitive to other concepts than mental workload. This is what is called the converging evidence principle. Two of the HF tools that were applied are Gwylio and The Observer. Both of these tools offer the opportunity to store data from a number of data sources in one database. This enables the integration and synchronized display of multiple full-resolution video streams, eye tracking data, psychophysiological signals and event data from the high fidelity flight simulator, and eventually makes quick and easy comparisons between the different data streams possible. Therefore these tools contribute significantly to applying the converging evidence principle.

4.3 Situational Awareness Is Hard to Measure

The concept of SA is complex and comprises many aspects; numerous researchers have tried to define it. As such, it is not straightforward to measure SA. The best thing to do is to use a number of measures and see if they all converge in the same direction: for example, ask pilots to rate their own SA and compare that with the ratings from a subject matter expert (e.g. another pilot) who monitored the flight on video. In definitions of SA (e.g. Endsley, 1988) it is often stated that pilots first have to notice a particular phenomenon in order to become aware of it, understand it and project it into the future. That first step, noticing, and also giving attention to something, can be measured by eye tracking. The eye tracker shows where the pilots focus, which under certain circumstances may be interpreted as attending to something. As such, an eye tracker is a helpful tool to measure the 'basis' for SA. Eye trackers, rating scales and expert observations together provide a more coherent impression of pilot SA than any of those tools does individually.

4.4 Certification

The selected set of HF tools may eventually be used by authorities and industry as a structured way of measuring HF and HCI aspects of new technologies and applications. Besides the evaluation of new technologies and applications, this approach may also be used as an HF certification instrument.
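The converging evidence principle depends on getting independently recorded streams onto one timebase. The following Python sketch is a minimal, assumed illustration of that alignment step and is not a description of Gwylio or The Observer: it resamples several timestamped recordings (all values invented) so that, for instance, a subjective workload rating can be read next to heart rate and eye-tracking data for the same moment.

```python
import bisect

def sample_at(stream, t):
    """Last-known value of a [(timestamp, value), ...] stream at time t."""
    times = [ts for ts, _ in stream]
    i = bisect.bisect_right(times, t) - 1
    return stream[i][1] if i >= 0 else None

# Invented recordings with different native sampling moments (seconds).
heart_rate  = [(0.0, 72), (5.0, 78), (10.0, 91)]      # beats per minute
rsme        = [(0.0, 30), (10.0, 70)]                 # workload rating
gaze_on_pfd = [(0.0, 0.4), (5.0, 0.6), (10.0, 0.8)]   # dwell fraction

timebase = [0.0, 5.0, 10.0]
fused = [(t, sample_at(heart_rate, t), sample_at(rsme, t),
          sample_at(gaze_on_pfd, t)) for t in timebase]
print(fused)  # rows in which all measures can be compared side by side
```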
Acknowledgements

This paper was prepared with information from the HILAS Flight Deck Technologies strand project partners³.
References
1. van Dijk, H., Zon, G.D.R.: HILAS D3.3.3.1 – Description of conditions to be varied and the applicable scenarios, GRACE experiment, NLR-CR-2008-539. NLR, Amsterdam (2008a)
2. van Dijk, H., Zon, G.D.R.: HILAS D3.3.3.2 – Briefing guide GRACE experiment, NLR-CR-2008-478. NLR, Amsterdam (2008b)
3. van Dijk, H., Zon, G.D.R.: Situational Awareness assessment in flight simulator experiment. In: Proceedings of the International Symposium on Aviation Psychology (ISAP), Dayton, Ohio, April 27–30 (to be published, 2009)
4. Egter van Wissekerke, R.F.: GRACE – Generic Research Aircraft Cockpit Environment. NLR-Memorandum ATTH-2004-014. NLR, Amsterdam (2004)
5. Endsley, M.R.: Situation Awareness Global Assessment Technique (SAGAT). IEEE CH2596-5/88/0000-0789, pp. 789–795 (1988)
6. HILASa, http://www.hilas.info/mambo/
7. HILASb, http://www.hilas.info/toolsregistry/
8. Human Factors Working Group for U.S. Department of Transportation Federal Aviation Administration: Advisory Circular (2004)
9. Kooi, F.L., Santamaria Maurizio, C., et al.: HILAS D3.2.4.1 – Report with the analysis results and the comparison with the results of the design analysis (2007)
10. Kooi, F.L., Toet, A.: Additive and subtractive transparent depth displays. In: Verly, J.G. (ed.) Enhanced and Synthetic Vision 2003, SPIE-5081, pp. 58–65. The International Society for Optical Engineering, Bellingham (2003)
11. Catton, L., Starr, A., Noyes, J., Williams, D.: The use of low cost simulators as a predictor of human performance. In: International Ergonomics Association Conference, Beijing, China, August 9–14 (2009)
12. Roerdink, M.I., Zon, G.D.R.: Briefing guide GRACE experiment, NLR-CR-2006-806. NLR, Amsterdam (2006)
13. Zon, G.D.R., Roerdink, M.I.: HCI Testing in Flight Simulator: Set Up and Crew Briefing Procedures – Unique Design and Test Cycle. In: Proceedings of HCI International 2007 (2007)
14. Zon, G.D.R., Roerdink, M.I.: HILAS D3.1.2.1 – Assessment methodology and tools, NLR-CR-2006-275. NLR, Amsterdam (2006)
15. Zon, G.D.R., van Dijk, H., et al.: HILAS D3.4.1.1 – Report with the results of the evaluation experiments (to be published in August 2009)
16. Zon, G.D.R., de Waard, D., Heffelaar, T.: Human Factors measurement and analysis tools for cockpit evaluation and pilot behaviour. In: Workshop at the "Measuring Behavior" Conference, Maastricht, The Netherlands, August 26–29 (2008), http://www.noldus.com/mb2008/
³ The project partners in the HILAS Flight Deck Technologies strand are: GE Aviation and BAE SYSTEMS from the UK; NLR, TNO, Noldus Information Technology and the University of Groningen from the Netherlands; Elbit Systems from Israel; Selex Galileo and Deep Blue from Italy; Dyoptyka from Ireland; Lufthansa Systems from Germany; and Avitronics Research from Greece.
Author Index
Abendroth, Bettina 367; Allman-Ward, Mark 386; Alm, Håkan 349; Andersson, Dennis 326
Balcisoy, Selim 149; Baldanzini, Niccolò 358; Balfe, Nora 590; Bazen, Gideon 111; Bencini, Giacomo 358; Birrell, Stewart A. 477; Bodnár, Éva 179; Bojic, Miroslav 111; Brown, Timothy 339; Bruder, Carmen 537; Bruder, Ralph 367, 508
Cain, Rebecca 386; Cao, Yujia 3; Chang, Chih-Lin 139; Chen, John J.J. 167; Chen, Wenfeng 206; Chui, Yoon Ping 317; Clarke, David 22; Cobanoglu, Murat Can 149; Cobb, Sue 22; Coppin, Gilles 489; Cosand, Louise 243; Courtney, Christopher 243; Cromie, Sam 32; Csillik, Olga 179
Davidsson, Staffan 349; De Ambroggi, Massimiliano 32; Dehais, Frédéric 498; Didier, Muriel 508; Diederichs, J.P. Frederik 358; Dingus, Thomas 404; Doi, Shun'ichi 376; Donath, Diana 518; Dozier, Sean 262; Dunne, Garry 386
Eißfeldt, Hinnerk 537; Enomoto, Kenji 101; Erbil, Mehmet Ali 120, 574
Fontana, Marco 358; Foran, Tom 120; Forsythe, Alexandra 62, 158; Fu, Xiaolan 206; Fuchs, Klaus 367; Fukuda, Katsuyuki 223; Fukuoka, Mamiko 376
Gao, Yashuang 167; Giudice, Sebastiano 386; Golightly, David 590; Gramß, Denise 297; Grandt, Morten 606; Grasshoff, Dietrich 537
Häkkinen, Jukka 13; Hale, Kelly 279; Hamada, Hiroto 223; Hankey, Jonathan 404; Harris, Don 529, 547; Hartvigsen, Gunnar 233; Hasse, Catrin 537; Hautus, Michael J. 167; Hedström, Johan 326; Heinilä, Juhani 13; Hercegfi, Károly 179; Hirai, Nobuhide 434; Hosseini, S.M. Hadi 187; Hsu, Tai-Yen 139; Hsu, Yueh-Ling 547; Hua, Lesheng 554; Humphreys, Louise 386
Inoue, Yuichi 451; Inoue, Yumiko 414; Isherwood, Sarah 62, 197; Ishii, Hirotake 101; Iwakawa, Mikio 101; Iyer, Arvind 243; Izsó, Lajos 179
Jennings, Paul 386; Jones, David 279; Jou, Yung-Tsan 139
Kageyama, Ichiro 396; Kalakoski, Virpi 13; Kallinen, Kari 13; Kamakura, Yoshiyuki 414; Karamanoglu, Mehmet 120, 574; Kato, Satoshi 434; Kawashima, Ryuta 187; Kay, Alison Margaret 32; Keus, Danielle 213; Khoshnevis, Behrokh 253; Kikuchi, Senichiro 434; Kimura, Takahiko 376; Kindiroglu, Ahmet Alp 149; Kolski, Christophe 52; Kontogiannis, Tom 32; Kristiansen, Kari-Ann 233; Kuriyagawa, Yukiyo 396; Küttelwesch, Heinz 606
Laarni, Jari 13; Laeng, Bruno 233; Laquai, Florian 424; Lawson, Glyn 22; Legras, François 489; Leva, Maria Chiara 32; Li, Kai-Way 139; Li, Lon-Wen 547; Li, Wen-Chin 547; Lif, Patrik 326; Lindahl, Björn 326; Ling, Chen 554; Liu, Tianwei 206; Liu, Ying-Chieh 43; Löppönen, Paula 13; Lu, Su-Ju 43; Lukander, Kristian 13; Luxhøj, James T. 564
Mahatody, Thomas 52; Marck, Jan-Willem 213; Marshall, Dawn 339; Marti, Patrizia 580; Mattei, Fabio 32; McDougall, Siné 62, 71; McEneaney, John E. 81; McLaughlin, Shane 404; Meitinger, Claudia 91; Mercier, Stephane 498; Millen, Laura 590; Miller, Ronald Mellado 600; Mitchell, Diane 279; Miura, Naoki 187; Miura, Toshiaki 376; Miwakeichi, Fumikazu 434; Miyagi, Kazune 101; Moeckli, Jane 339; Montanari, Roberto 358; Mühlhausen, Susi 297; Murata, Hiroshi 434
Neef, Martijn 213; Nijholt, Anton 3; Nikolaou, Stella 358; Noguchi, Yoshihiro 414; Nonomura, Tomohide 451
Obinata, Goro 223; Odedra, Siddharth 120, 574; Ohsuga, Mieko 396, 414; Olsen, Bernt Ivar 233; Oomes, Augustinus H.J. 111; Öztekin, Ahmet 564
Palomäki, Tapio 13; Pan, Hsu-Chang 139; Pan, Yan 289; Parsons, Thomas D. 243; Petocz, Agnes 62, 126; Placencia, Greg 253; Playfoot, David 71; Poitschke, Tony 424; Pozzi, Simone 580; Prior, Stephen D. 120, 574
Rahimi, Mansour 253; Rau, Pei-Luen Patrick 471; Ravaja, Niklas 13; Reppa, Irene 62, 71; Rigoll, Gerhard 424; Rizzo, Albert A. 243; Rostami, Maryam 187; Ryu, Young Sam 262
Sagar, Mouldi 52; Saget, Sylvie 489; Saikayasit, Rose 269; Salvendy, Gavriel 471; Samms, Charneta 279; Sandberg, Karl W. 289; Santamaria Maurizio, Chiara 580; Sass, Judit 179; Sato, Masanao 434; Savioja, Paula 13; Schulte, Axel 91, 518; Schweizer, Karin 297; Sharples, Sarah 22, 269, 590; Shen, I-Hsuan 307; Shen, Siu-Tsen 120, 574; Shieh, Kong-King 307; Shimada, Keiji 414; Shimizu, Shunji 434; Shimoda, Hiroshi 101; Smith, Gary 71; Spadoni, Andrea 358; Stanton, Neville A. 477; Stevens, Catherine 62, 126; Sugiura, Motoaki 187; Suh, Taewon 262; Sulistyawati, Ketut 317; Svenmarck, Peter 326
Takahashi, Hiroshi 441; Takahashi, Makoto 187; Taylor, Mark P. 167; Tei, Shoyo 451; Terano, Masaaki 101; Tessier, Catherine 498; Theune, Mariët 3; Tilton, Nathan F. 600; Tokuda, Satoru 223
Ueno, Akinori 451
Väätänen, Antti 13; van der Voort, Mascha 461; van Dijk, Henk 616; van Rijn, Martin 213; van Waterschoot, Boris 461; Vogel-Heuser, Birgit 297; von Wilamowitz-Moellendorff, Margeritta 508
Wahlberg, Olof 289; Wang, Pei 471; Watanabe, Eiju 434; White, Anthony S. 120; Widlroither, Harald 358; Witt, Oliver 606
Xuan, Yuming 206
Yoshizawa, Yasuhito 434; Young, Mark S. 477
Zon, Rolf 616