Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board

David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
5620
Vincent G. Duffy (Ed.)
Digital Human Modeling
Second International Conference, ICDHM 2009
Held as Part of HCI International 2009
San Diego, CA, USA, July 19-24, 2009
Proceedings
Volume Editor

Vincent G. Duffy
Purdue University, School of Industrial Engineering
315 North Grant Street, Grissom Hall
West Lafayette, IN 47907-2023, USA
E-mail: [email protected]
Library of Congress Control Number: Applied for
CR Subject Classification (1998): H.5, H.1, H.3, H.4.2, I.2-6, J.3
LNCS Sublibrary: SL 3 – Information Systems and Applications, incl. Internet/Web and HCI
ISSN: 0302-9743
ISBN-10: 3-642-02808-X Springer Berlin Heidelberg New York
ISBN-13: 978-3-642-02808-3 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2009 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12712076 06/3180 543210
Foreword
The 13th International Conference on Human–Computer Interaction, HCI International 2009, was held in San Diego, California, USA, July 19–24, 2009, jointly with the Symposium on Human Interface (Japan) 2009, the 8th International Conference on Engineering Psychology and Cognitive Ergonomics, the 5th International Conference on Universal Access in Human–Computer Interaction, the Third International Conference on Virtual and Mixed Reality, the Third International Conference on Internationalization, Design and Global Development, the Third International Conference on Online Communities and Social Computing, the 5th International Conference on Augmented Cognition, the Second International Conference on Digital Human Modeling, and the First International Conference on Human Centered Design. A total of 4,348 individuals from academia, research institutes, industry and governmental agencies from 73 countries submitted contributions, and 1,397 papers that were judged to be of high scientific quality were included in the program. These papers address the latest research and development efforts and highlight the human aspects of the design and use of computing systems. The papers accepted for presentation thoroughly cover the entire field of human–computer interaction, addressing major advances in knowledge and effective use of computers in a variety of application areas. This volume, edited by Vincent G. Duffy, contains papers in the thematic area of Digital Human Modeling, addressing the following major topics:
• Face, Head and Body Modeling
• Modeling Motion
• Modeling Behavior, Emotion and Cognition
• Human Modeling in Transport Applications
• Human Modeling Applications in Health and Rehabilitation
• Ergonomic and Industrial Applications
• Advances in Digital Human Modeling
The remaining volumes of the HCI International 2009 proceedings are:
• Volume 1, LNCS 5610, Human–Computer Interaction––New Trends (Part I), edited by Julie A. Jacko
• Volume 2, LNCS 5611, Human–Computer Interaction––Novel Interaction Methods and Techniques (Part II), edited by Julie A. Jacko
• Volume 3, LNCS 5612, Human–Computer Interaction––Ambient, Ubiquitous and Intelligent Interaction (Part III), edited by Julie A. Jacko
• Volume 4, LNCS 5613, Human–Computer Interaction––Interacting in Various Application Domains (Part IV), edited by Julie A. Jacko
• Volume 5, LNCS 5614, Universal Access in Human–Computer Interaction––Addressing Diversity (Part I), edited by Constantine Stephanidis
• Volume 6, LNCS 5615, Universal Access in Human–Computer Interaction––Intelligent and Ubiquitous Interaction Environments (Part II), edited by Constantine Stephanidis
• Volume 7, LNCS 5616, Universal Access in Human–Computer Interaction––Applications and Services (Part III), edited by Constantine Stephanidis
• Volume 8, LNCS 5617, Human Interface and the Management of Information––Designing Information Environments (Part I), edited by Michael J. Smith and Gavriel Salvendy
• Volume 9, LNCS 5618, Human Interface and the Management of Information––Information and Interaction (Part II), edited by Gavriel Salvendy and Michael J. Smith
• Volume 10, LNCS 5619, Human Centered Design, edited by Masaaki Kurosu
• Volume 12, LNCS 5621, Online Communities and Social Computing, edited by A. Ant Ozok and Panayiotis Zaphiris
• Volume 13, LNCS 5622, Virtual and Mixed Reality, edited by Randall Shumaker
• Volume 14, LNCS 5623, Internationalization, Design and Global Development, edited by Nuray Aykin
• Volume 15, LNCS 5624, Ergonomics and Health Aspects of Work with Computers, edited by Ben-Tzion Karsh
• Volume 16, LNAI 5638, The Foundations of Augmented Cognition: Neuroergonomics and Operational Neuroscience, edited by Dylan Schmorrow, Ivy Estabrooke and Marc Grootjen
• Volume 17, LNAI 5639, Engineering Psychology and Cognitive Ergonomics, edited by Don Harris
I would like to thank the Program Chairs and the members of the Program Boards of all thematic areas, listed below, for their contribution to the highest scientific quality and the overall success of HCI International 2009.
Ergonomics and Health Aspects of Work with Computers
Program Chair: Ben-Tzion Karsh
Arne Aarås, Norway; Pascale Carayon, USA; Barbara G.F. Cohen, USA; Wolfgang Friesdorf, Germany; John Gosbee, USA; Martin Helander, Singapore; Ed Israelski, USA; Waldemar Karwowski, USA; Peter Kern, Germany; Danuta Koradecka, Poland; Kari Lindström, Finland; Holger Luczak, Germany; Aura C. Matias, Philippines; Kyung (Ken) Park, Korea; Michelle M. Robertson, USA; Michelle L. Rogers, USA; Steven L. Sauter, USA; Dominique L. Scapin, France; Naomi Swanson, USA; Peter Vink, The Netherlands; John Wilson, UK; Teresa Zayas-Cabán, USA
Human Interface and the Management of Information
Program Chair: Michael J. Smith
Gunilla Bradley, Sweden; Hans-Jörg Bullinger, Germany; Alan Chan, Hong Kong; Klaus-Peter Fähnrich, Germany; Michitaka Hirose, Japan; Jhilmil Jain, USA; Yasufumi Kume, Japan; Mark Lehto, USA; Fiona Fui-Hoon Nah, USA; Shogo Nishida, Japan; Robert Proctor, USA; Youngho Rhee, Korea; Anxo Cereijo Roibás, UK; Katsunori Shimohara, Japan; Dieter Spath, Germany; Tsutomu Tabe, Japan; Alvaro D. Taveira, USA; Kim-Phuong L. Vu, USA; Tomio Watanabe, Japan; Sakae Yamamoto, Japan; Hidekazu Yoshikawa, Japan; Li Zheng, P.R. China; Bernhard Zimolong, Germany
Human–Computer Interaction
Program Chair: Julie A. Jacko
Sebastiano Bagnara, Italy; Sherry Y. Chen, UK; Marvin J. Dainoff, USA; Jianming Dong, USA; John Eklund, Australia; Xiaowen Fang, USA; Ayse Gurses, USA; Vicki L. Hanson, UK; Sheue-Ling Hwang, Taiwan; Wonil Hwang, Korea; Yong Gu Ji, Korea; Steven Landry, USA; Gitte Lindgaard, Canada; Chen Ling, USA; Yan Liu, USA; Chang S. Nam, USA; Celestine A. Ntuen, USA; Philippe Palanque, France; P.L. Patrick Rau, P.R. China; Ling Rothrock, USA; Guangfeng Song, USA; Steffen Staab, Germany; Wan Chul Yoon, Korea; Wenli Zhu, P.R. China
Engineering Psychology and Cognitive Ergonomics
Program Chair: Don Harris
Guy A. Boy, USA; John Huddlestone, UK; Kenji Itoh, Japan; Hung-Sying Jing, Taiwan; Ron Laughery, USA; Wen-Chin Li, Taiwan; James T. Luxhøj, USA; Nicolas Marmaras, Greece; Sundaram Narayanan, USA; Mark A. Neerincx, The Netherlands; Jan M. Noyes, UK; Kjell Ohlsson, Sweden; Axel Schulte, Germany; Sarah C. Sharples, UK;
Neville A. Stanton, UK; Xianghong Sun, P.R. China; Andrew Thatcher, South Africa; Matthew J.W. Thomas, Australia; Mark Young, UK
Universal Access in Human–Computer Interaction
Program Chair: Constantine Stephanidis
Julio Abascal, Spain; Ray Adams, UK; Elisabeth André, Germany; Margherita Antona, Greece; Chieko Asakawa, Japan; Christian Bühler, Germany; Noelle Carbonell, France; Jerzy Charytonowicz, Poland; Pier Luigi Emiliani, Italy; Michael Fairhurst, UK; Dimitris Grammenos, Greece; Andreas Holzinger, Austria; Arthur I. Karshmer, USA; Simeon Keates, Denmark; Georgios Kouroupetroglou, Greece; Sri Kurniawan, USA; Patrick M. Langdon, UK; Seongil Lee, Korea; Zhengjie Liu, P.R. China; Klaus Miesenberger, Austria; Helen Petrie, UK; Michael Pieper, Germany; Anthony Savidis, Greece; Andrew Sears, USA; Christian Stary, Austria; Hirotada Ueda, Japan; Jean Vanderdonckt, Belgium; Gregg C. Vanderheiden, USA; Gerhard Weber, Germany; Harald Weber, Germany; Toshiki Yamaoka, Japan; Panayiotis Zaphiris, UK
Virtual and Mixed Reality
Program Chair: Randall Shumaker
Pat Banerjee, USA; Mark Billinghurst, New Zealand; Charles E. Hughes, USA; David Kaber, USA; Hirokazu Kato, Japan; Robert S. Kennedy, USA; Young J. Kim, Korea; Ben Lawson, USA; Gordon M. Mair, UK; Miguel A. Otaduy, Switzerland; David Pratt, UK; Albert “Skip” Rizzo, USA; Lawrence Rosenblum, USA; Dieter Schmalstieg, Austria; Dylan Schmorrow, USA; Mark Wiederhold, USA
Internationalization, Design and Global Development
Program Chair: Nuray Aykin
Michael L. Best, USA; Ram Bishu, USA; Alan Chan, Hong Kong; Andy M. Dearden, UK; Susan M. Dray, USA; Vanessa Evers, The Netherlands; Paul Fu, USA; Emilie Gould, USA;
Sung H. Han, Korea; Veikko Ikonen, Finland; Esin Kiris, USA; Masaaki Kurosu, Japan; Apala Lahiri Chavan, USA; James R. Lewis, USA; Ann Light, UK; James J.W. Lin, USA; Rungtai Lin, Taiwan; Zhengjie Liu, P.R. China; Aaron Marcus, USA; Allen E. Milewski, USA; Elizabeth D. Mynatt, USA; Oguzhan Ozcan, Turkey; Girish Prabhu, India; Kerstin Röse, Germany; Eunice Ratna Sari, Indonesia; Supriya Singh, Australia; Christian Sturm, Spain; Adi Tedjasaputra, Singapore; Kentaro Toyama, India; Alvin W. Yeo, Malaysia; Chen Zhao, P.R. China; Wei Zhou, P.R. China
Online Communities and Social Computing
Program Chairs: A. Ant Ozok, Panayiotis Zaphiris
Chadia N. Abras, USA; Chee Siang Ang, UK; Amy Bruckman, USA; Peter Day, UK; Fiorella De Cindio, Italy; Michael Gurstein, Canada; Tom Horan, USA; Anita Komlodi, USA; Piet A.M. Kommers, The Netherlands; Jonathan Lazar, USA; Stefanie Lindstaedt, Austria; Gabriele Meiselwitz, USA; Hideyuki Nakanishi, Japan; Anthony F. Norcio, USA; Jennifer Preece, USA; Elaine M. Raybourn, USA; Douglas Schuler, USA; Gilson Schwartz, Brazil; Sergei Stafeev, Russia; Charalambos Vrasidas, Cyprus; Cheng-Yen Wang, Taiwan
Augmented Cognition
Program Chair: Dylan D. Schmorrow
Andy Bellenkes, USA; Andrew Belyavin, UK; Joseph Cohn, USA; Martha E. Crosby, USA; Tjerk de Greef, The Netherlands; Blair Dickson, UK; Traci Downs, USA; Julie Drexler, USA; Ivy Estabrooke, USA; Cali Fidopiastis, USA; Chris Forsythe, USA; Wai Tat Fu, USA; Henry Girolamo, USA; Marc Grootjen, The Netherlands; Taro Kanno, Japan; Wilhelm E. Kincses, Germany; David Kobus, USA; Santosh Mathan, USA; Rob Matthews, Australia; Dennis McBride, USA; Robert McCann, USA; Jeff Morrison, USA; Eric Muth, USA; Mark A. Neerincx, The Netherlands; Denise Nicholson, USA; Glenn Osga, USA;
Dennis Proffitt, USA; Leah Reeves, USA; Mike Russo, USA; Kay Stanney, USA; Roy Stripling, USA; Mike Swetnam, USA; Rob Taylor, UK; Maria L. Thomas, USA; Peter-Paul van Maanen, The Netherlands; Karl van Orden, USA; Roman Vilimek, Germany; Glenn Wilson, USA; Thorsten Zander, Germany
Digital Human Modeling
Program Chair: Vincent G. Duffy
Karim Abdel-Malek, USA; Thomas J. Armstrong, USA; Norm Badler, USA; Kathryn Cormican, Ireland; Afzal Godil, USA; Ravindra Goonetilleke, Hong Kong; Anand Gramopadhye, USA; Sung H. Han, Korea; Lars Hanson, Sweden; Pheng Ann Heng, Hong Kong; Tianzi Jiang, P.R. China; Kang Li, USA; Zhizhong Li, P.R. China; Timo J. Määttä, Finland; Woojin Park, USA; Matthew Parkinson, USA; Jim Potvin, Canada; Rajesh Subramanian, USA; Xuguang Wang, France; John F. Wiechel, USA; Jingzhou (James) Yang, USA; Xiu-gan Yuan, P.R. China
Human Centered Design
Program Chair: Masaaki Kurosu
Gerhard Fischer, USA; Tom Gross, Germany; Naotake Hirasawa, Japan; Yasuhiro Horibe, Japan; Minna Isomursu, Finland; Mitsuhiko Karashima, Japan; Tadashi Kobayashi, Japan; Kun-Pyo Lee, Korea; Loïc Martínez-Normand, Spain; Dominique L. Scapin, France; Haruhiko Urokohara, Japan; Gerrit C. van der Veer, The Netherlands; Kazuhiko Yamazaki, Japan
In addition to the members of the Program Boards above, I also wish to thank the following volunteer external reviewers: Gavin Lew from the USA, Daniel Su from the UK, and Ilia Adami, Ioannis Basdekis, Yannis Georgalis, Panagiotis Karampelas, Iosif Klironomos, Alexandros Mourouzis, and Stavroula Ntoa from Greece. This conference could not have been possible without the continuous support and advice of the Conference Scientific Advisor, Prof. Gavriel Salvendy, as well as the dedicated work and outstanding efforts of the Communications Chair and Editor of HCI International News, Abbas Moallem.
I would also like to thank the members of the Human–Computer Interaction Laboratory of ICS-FORTH, and in particular Margherita Antona, George Paparoulis, Maria Pitsoulaki, Stavroula Ntoa, and Maria Bouhli, for their contribution toward the organization of the HCI International 2009 conference.

Constantine Stephanidis
HCI International 2011
The 14th International Conference on Human–Computer Interaction, HCI International 2011, will be held jointly with the affiliated conferences in the summer of 2011. It will cover a broad spectrum of themes related to human–computer interaction, including theoretical issues, methods, tools, processes and case studies in HCI design, as well as novel interaction techniques, interfaces and applications. The proceedings will be published by Springer. More information about the topics, as well as the venue and dates of the conference, will be announced through the HCI International Conference series website: http://www.hci-international.org/
General Chair

Professor Constantine Stephanidis
University of Crete and ICS-FORTH
Heraklion, Crete, Greece
Email: [email protected]
Table of Contents
Part I: Face, Head and Body Modeling

Static and Dynamic Human Shape Modeling . . . . . . . . . . . . . . . . . . . . . . . . Zhiqing Cheng and Kathleen Robinette
3
An Advanced Modality of Visualization and Interaction with Virtual Models of the Human Body . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lucio T. De Paolis, Marco Pulimeno, and Giovanni Aloisio
13
3D Body Scanning’s Contribution to the Use of Apparel as an Identity Construction Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Marie-Eve Faust and Serge Carrier
19
Facial Shape Analysis and Sizing System . . . . . . . . . . . . . . . . . . . . . . . . . . . . Afzal Godil
29
Facial Gender Classification Using LUT-Based Sub-images and DIE . . . . Jong-Bae Jeon, Sang-Hyeon Jin, Dong-Ju Kim, and Kwang-Seok Hong
36
Anthropometric Measurement of the Hands of Chinese Children . . . . . . . Linghua Ran, Xin Zhang, Chuzhi Chao, Taijie Liu, and Tingting Dong
46
Comparisons of 3D Shape Clustering with Different Face Area Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jianwei Niu, Zhizhong Li, and Song Xu
55
Block Division for 3D Head Shape Clustering . . . . . . . . . . . . . . . . . . . . . . . . Jianwei Niu, Zhizhong Li, and Song Xu
64
Joint Coupling for Human Shoulder Complex . . . . . . . . . . . . . . . . . . . . . . . . Jingzhou (James) Yang, Xuemei Feng, Joo H. Kim, Yujiang Xiang, and Sudhakar Rajulu
72
Part II: Modeling Motion

Development of a Kinematic Hand Model for Study and Design of Hose Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Thomas J. Armstrong, Christopher Best, Sungchan Bae, Jaewon Choi, D. Christian Grieshaber, Daewoo Park, Charles Woolley, and Wei Zhou
85
Generation of Percentile Values for Human Joint Torque Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Florian Engstler and Heiner Bubb
95
Adaptive Motion Pattern Recognition: Implementing Playful Learning through Embodied Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Anja Hashagen, Christian Zabel, Heidi Schelhowe, and Saeed Zare
105
A Multi-functional Visualization System for Motion Captured Human Body Based on Virtual Reality Technology . . . . . . . . . . . . . . . . . . . . . . . . . . Qichang He, Lifeng Zhang, Xiumin Fan, and Yong Hu
115
Augmented Practice Mirror: A Self-learning Support System of Physical Motion with Real-Time Comparison to Teacher’s Model . . . . . . . . . . . . . . Itaru Kuramoto, Yoshikazu Inagaki, Yu Shibuya, and Yoshihiro Tsujino
123
Video-Based Human Motion Estimation System . . . . . . . . . . . . . . . . . . . . . Mariofanna Milanova and Leonardo Bocchi
132
Virtual Human Hand: Grasping and Simulation . . . . . . . . . . . . . . . . . . . . . . Esteban Peña-Pitarch, Jingzhou (James) Yang, and Karim Abdel-Malek
140
Harmonic Gait under Primitive DOF for Biped Robot . . . . . . . . . . . . . . . . Shigeki Sugiyama
150
Problems Encountered in Seated Arm Reach Posture Reconstruction: Need for a More Realistic Spine and Upper Limb Kinematic Model . . . . . Xuguang Wang
160
Intelligent Motion Tracking by Combining Specialized Algorithms . . . . . . Matthias Weber
170
Part III: Modeling Behavior, Emotion and Cognition

Ambient Compass: One Approach to Model Spatial Relations . . . . . . . . . Petr Aksenov, Geert Vanderhulst, Kris Luyten, and Karin Coninx
183
A Comprehension Based Cognitive Model of Situation Awareness . . . . . . Martin R.K. Baumann and Josef F. Krems
192
A Probabilistic Approach for Modeling Human Behavior in Smart Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Christoph Burghardt and Thomas Kirste
202
PERMUTATION: A Corpus-Based Approach for Modeling Personality and Multimodal Expression of Affects in Virtual Characters . . . . . . . . . . . Céline Clavel and Jean-Claude Martin
211
Workload Assessment in Field Using the Ambulatory CUELA System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rolf Ellegast, Ingo Hermanns, and Christoph Schiefer
221
Computational Nonlinear Dynamics Model of Percept Switching with Ambiguous Stimuli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Norbert Fürstenau
227
A Computational Implementation of a Human Attention Guiding Mechanism in MIDAS v5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Brian F. Gore, Becky L. Hooey, Christopher D. Wickens, and Shelly Scott-Nash
237
Towards a Computational Model of Perception and Action in Human Computer Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pascal Haazebroek and Bernhard Hommel
247
The Five Commandments of Activity-Aware Ubiquitous Computing Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nasim Mahmud, Jo Vermeulen, Kris Luyten, and Karin Coninx
257
What the Eyes Reveal: Measuring the Cognitive Workload of Teams . . . . Sandra P. Marshall
265
User Behavior Mining for On-Line GUI Adaptation . . . . . . . . . . . . . . . . . . Wei Pan, Yiqiang Chen, and Junfa Liu
275
Modeling Human Actors in an Intelligent Automated Warehouse . . . . . . . Davy Preuveneers and Yolande Berbers
285
Bridging the Gap between HCI and DHM: The Modeling of Spatial Awareness within a Cognitive Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . Bryan Robbins, Daniel Carruth, and Alexander Morais
295
Behavior-Sensitive User Interfaces for Smart Environments . . . . . . . . . . . . Veit Schwartze, Sebastian Feuerstack, and Sahin Albayrak
305
Non-intrusive Personalized Mental Workload Evaluation for Exercise Intensity Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. Luke Thomas, Yingzi Du, Tron Artavatkun, and Jin-hua She
315
Incorporating Cognitive Aspects in Digital Human Modeling . . . . . . . . . . Peter Thorvald, Dan Högberg, and Keith Case
323
Workload-Based Assessment of a User Interface Design . . . . . . . . . . . . . . . Patrice D. Tremoulet, Patrick L. Craven, Susan Harkness Regli, Saki Wilcox, Joyce Barton, Kathleen Stibler, Adam Gifford, and Marianne Clark
333
Part IV: Human Modeling in Transport Applications

A Simple Simulation Predicting Driver Behavior, Attitudes and Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Aladino Amantini and Pietro Carlo Cacciabue
345
Nautical PSI - Virtual Nautical Officers as Test Drivers in Ship Bridge Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ulrike Brüggemann and Stefan Strohschneider
355
Determining Cockpit Dimensions and Associative Dimensions between Components in Cockpit of Ultralight Plane for Taiwanese . . . . . . . . . . . . . Dengchuan Cai, Lan-Ling Huang, Tesheng Liu, and Manlai You
365
Multilevel Analysis of Human Performance Models in Safety-Critical Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jeronimo Dzaack and Leon Urbas
375
Development of a Driver Model in Powered Wheelchair Operation . . . . . . Takuma Ito, Takenobu Inoue, Motoki Shino, and Minoru Kamata
384
A Model of Integrated Operator-System Separation Assurance and Collision Avoidance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Steven J. Landry and Amit V. Lagu
394
Modeling Pilot and Driver Behavior for Human Error Simulation . . . . . . Andreas Lüdtke, Lars Weber, Jan-Patrick Osterloh, and Bertram Wortelen
403
Further Steps towards Driver Modeling According to the Bayesian Programming Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Claus Möbus and Mark Eilers
413
Probabilistic and Empirical Grounded Modeling of Agents in (Partial) Cooperative Traffic Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Claus Möbus, Mark Eilers, Hilke Garbe, and Malte Zilinski
423
A Contribution to Integrated Driver Modeling: A Coherent Framework for Modeling Both Non-routine and Routine Elements of the Driving Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Andreas Mihalyi, Barbara Deml, and Thomas Augustin
433
The New BMW iDrive – Applied Processes and Methods to Assure High Usability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bernhard Niedermaier, Stephan Durach, Lutz Eckstein, and Andreas Keinath
443
Method to Evaluate Driver’s Workload in Real Road Context . . . . . . . . . . Annie Pauzié
453
Intelligent Agents for Training On-Board Fire Fighting . . . . . . . . . . . . . . . Karel van den Bosch, Maaike Harbers, Annerieke Heuvelink, and Willem van Doesburg
463
Part V: Human Modeling Applications in Health and Rehabilitation

Eprescribing Initiatives and Knowledge Acquisition in Ambulatory Care . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ashley J. Benedict, Jesse C. Crosson, Akshatha Pandith, Robert Hannemann, Lynn A. Nuti, and Vincent G. Duffy
475
Using 3D Head and Respirator Shapes to Analyze Respirator Fit . . . . . . Kathryn M. Butler
483
Hyperkalemia vs. Ischemia Effects in Fast or Unstable Pacing: A Cardiac Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ioanna Chouvarda and Nicos Maglaveras
492
Learning from Risk Assessment in Radiotherapy . . . . . . . . . . . . . . . . . . . . . Enda F. Fallon, Liam Chadwick, and Wil van der Putten
502
Simulation-Based Discomfort Prediction of the Lower Limb Handicapped with Prosthesis in the Climbing Tasks . . . . . . . . . . . . . . . . . . Yan Fu, Shiqi Li, Mingqiang Yin, and Yueqing Bian
512
Application of Human Modeling in Health Care Industry . . . . . . . . . . . . . Lars Hanson, Dan Högberg, Daniel Lundström, and Maria Wårell
521
A Simulation Approach to Understand the Viability of RFID Technology in Reducing Medication Dispensing Errors . . . . . . . . . . . . . . . . Esther Jun, Jonathan Lee, and Xiaobo Shi
531
Towards a Visual Representation of the Effects of Reduced Muscle Strength in Older Adults: New Insights and Applications for Design and Healthcare . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . David Loudon and Alastair S. Macdonald
540
A Novel Approach to CT Scans’ Interpretation via Incorporation into a VR Human Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sophia Sakellariou, Vassilis Charissis, Ben M. Ward, David Chanock, and Paul Anderson
550
The Performance of BCMA-Aided Healthcare Service: Implementation Factors and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Renran Tian, Vincent G. Duffy, Carol Birk, Steve R. Abel, and Kyle Hultgren
560
On Improving Provider Decision Making with Enhanced Computerized Clinical Reminders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sze-jung Wu, Mark Lehto, Yuehwern Yih, Jason J. Saleem, and Bradley Doebbeling
569
Facial Shape Variation of U.S. Respirator Users . . . . . . . . . . . . . . . . . . . . . . Ziqing Zhuang, Dennis Slice, Stacey Benson, Douglas Landsittel, and Dennis Viscusi
578
Part VI: Ergonomic and Industrial Applications

Method for Movement and Gesture Assessment (MMGA) in Ergonomics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Giuseppe Andreoni, Marco Mazzola, Oriana Ciani, Marta Zambetti, Maximiliano Romero, Fiammetta Costa, and Ezio Preatoni
591
Complexity of Sizing for Space Suit Applications . . . . . . . . . . . . . . . . . . . . . Elizabeth Benson and Sudhakar Rajulu
599
Impact of Force Feedback on Computer Aided Ergonomic Analyses . . . . . H. Onan Demirel and Vincent G. Duffy
608
A Methodology for Modeling the Influence of Construction Machinery Operators on Productivity and Fuel Consumption . . . . . . . . . . . . . . . . . . . . Reno Filla
614
Human Head 3D Dimensions Measurement for the Design of Helmets . . . Fenfei Guo, Lijing Wang, and Dayong Dong
624
Realistic Elbow Flesh Deformation Based on Anthropometrical Data for Ergonomics Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Setia Hermawati and Russell Marshall
632
Database-Driven Grasp Synthesis and Ergonomic Assessment for Handheld Product Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Keisuke Kawaguchi, Yui Endo, and Satoshi Kanai
642
Within and Between-Subject Reliability Using Classic Jack for Ergonomic Assessments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Brian McInnes, Allison Stephens, and Jim Potvin
653
Human Head Modeling and Personal Head Protective Equipment: A Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jingzhou (James) Yang, Jichang Dai, and Ziqing Zhuang
661
Part VII: Advances in Digital Human Modeling

HADRIAN: Fitting Trials by Digital Human Modeling . . . . . . . . . . . . . . . . Keith Case, Russell Marshall, Dan Högberg, Steve Summerskill, Diane Gyi, and Ruth Sims
673
The Pluses and Minuses of Obtaining Measurements from Digital Scans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ravindra S. Goonetilleke, Channa P. Witana, Jianhui Zhao, and Shuping Xiong
681
Auto-calibration of a Laser 3D Color Digitization System . . . . . . . . . . . . . Xiaojie Li, Bao-zhen Ge, Dan Zhao, Qing-guo Tian, and K. David Young
691
Virtual Task Simulation for Inclusive Design . . . . . . . . . . . . . . . . . . . . . . . . Russell Marshall, Keith Case, Steve Summerskill, Ruth Sims, Diane Gyi, and Peter Davis
700
Data Mining of Image Segments Data with Reduced Neurofuzzy System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Deok Hee Nam and Edward Asikele
710
The Impact of Change in Software on Satisfaction: Evaluation Using Critical Incident Technique (CIT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Akshatha Pandith, Mark Lehto, and Vincent G. Duffy
717
Validation of the HADRIAN System Using an ATM Evaluation Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Steve J. Summerskill, Russell Marshall, Keith Case, Diane E. Gyi, Ruth E. Sims, and Peter Davis
727
A 3D Method for Fit Assessment of a Sizing System . . . . . . . . . . . . . . . . . . Jiang Wu, Zhizhong Li, and Jianwei Niu
737
Analyzing the Effects of a BCMA in Inter-Provider Communication, Coordination and Cooperation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gulcin Yucel, Bo Hoege, Vincent G. Duffy, and Matthias Roetting
744
Fuzzy Logic in Exploring Data Effects: A Way to Unveil Uncertainty in EEG Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fang Zheng, Bin Hu, Li Liu, Tingshao Zhu, Yongchang Li, and Yanbin Qi
754
765
Static and Dynamic Human Shape Modeling

Zhiqing Cheng1 and Kathleen Robinette2

1 Infoscitex Corporation, 4027 Colonel Glenn Highway, Suite 210, Dayton, OH 45431, USA
2 711th Human Performance Wing, Air Force Research Laboratory, 2800 Q Street, Wright-Patterson AFB, OH 45433, USA
{zhiqing.cheng,kathleen.robinette}@wpafb.af.mil
Abstract. Recent developments in static human shape modeling based on range scan data and dynamic human shape modeling from video imagery are reviewed. The topics discussed include shape description, surface registration, hole filling, shape characterization, and shape reconstruction for static modeling and pose identification, skeleton modeling, shape deformation, motion tracking, dynamic shape capture and reconstruction, and animation for dynamic modeling. A new method for human shape modeling is introduced.

Keywords: Human body, shape modeling, pose, animation.
1 Introduction

From the perspective of the motion status of the subject to be modeled, human shape modeling can be classified as either static or dynamic. Static shape modeling creates a model that describes the human shape at a particular pose and addresses shape description, registration, hole filling, shape variation characterization, and shape reconstruction. Dynamic shape modeling addresses the shape variations due to pose changes (pose identification, skeleton modeling, and shape deformation) or while the subject is in motion (motion tracking, shape capture, shape reconstruction, and animation). Extensive investigations have been performed on human shape modeling [1-10]. Recent developments in human shape modeling, in particular static shape modeling based on range scan data and dynamic shape modeling from video imagery, are reviewed in this paper. A new method for human shape modeling based on body segmentation and contour lines is introduced.
2 Static Shape Modeling

2.1 Shape Description

Shape description is a fundamental problem in human shape modeling. Traditional anthropometry is based on a set of measurements corresponding to linear distances between anatomical landmarks and circumference values at predefined locations. These measurements provide limited information about the human body shape [11]. With the advances in surface digitization technology, a 3-D surface scan of the whole body can be acquired in a few seconds. While a whole-body 3-D surface scan provides
a very detailed description of the body shape, the verbose scan data cannot be used directly for shape analysis. Therefore, it is necessary to convert 3-D scans to a form of compact representation. For searching and mining from a large 3-D scan database, Robinette [12] investigated 3-D shape descriptors, examining in detail the Paquet Shape Descriptor (PSD) developed by Paquet and Rioux [13]. While the PSD is able to discriminate or characterize different human shapes, it is not invertible; in other words, it is impossible to reconstruct a human shape from the PSD. An ideal human shape descriptor should be concise, unique, and complete for human shape description, efficient for shape indexing and searching, and invertible for shape reconstruction. Finding such a descriptor remains a challenge. Alternatively, various graphic elements or graphic representation methods can be used to describe the human shape. For instance, Allen et al [2] and Anguelov et al [7] dealt directly with the vertices or polygons of a scanned surface for shape description. Allen et al [1] used a subdivision surface in their pose modeling. Ben Azouz et al [6] utilized a volumetric representation to convert vertices to voxels in their human shape modeling. While these methods guarantee reconstruction, they are not efficient for shape identification, discrimination, and searching.

2.2 Surface Registration

Surface registration, or point-to-point correspondence among the scan data of different subjects, is essential to many problems, such as the study of human shape variability [2, 14] and pose modeling and animation [1, 7], where multiple subjects or multiple poses are involved. One methodology for establishing point-to-point correspondence among different scan data sets or models is usually called non-rigid registration. Given a set of markers between two meshes, non-rigid registration brings the meshes into close alignment while simultaneously aligning the markers. Allen et al [1, 2] solved the correspondence problem between subjects by deforming a template model, a hole-free, artist-generated mesh, to fit individual scans. The resulting individually fitted scans or individual “models” all have the same number of triangles and point-to-point correspondences. The fitting process relies on a set of anthropometric landmarks provided in the CAESAR (Civilian American and European Surface Anthropometry Resource) database [15]. Anguelov et al [16] developed an unsupervised algorithm for registering 3-D surface scans of an object among different poses undergoing significant deformations. The algorithm, called Correlated Correspondence (CC), does not use markers, nor does it assume prior knowledge about object shape, the dynamics of its deformation, or scan alignment. The algorithm registers two meshes with significant deformations by optimizing a joint probabilistic model over all point-to-point correspondences between them. This model enforces preservation of local mesh geometry as well as global constraints that capture the preservation of geodesic distance between corresponding point pairs. To obtain the markers for non-rigid registration, Anguelov et al [7] then used the CC algorithm to compute the consistent embedding of each instance mesh (the mesh of a particular pose) into the template mesh (the mesh of a reference pose). Ben Azouz et al [14] used a volumetric representation of the human 3-D surface to establish correspondences between the scan data of different subjects.
By converting their polygonal
mesh descriptions to a volumetric representation, the 3-D scans of different subjects are aligned inside a volume of fixed dimensions, which is sampled to a set of voxels. A human 3-D shape is then characterized by an array of signed distances between the voxels and their nearest points on the body surface. Correspondence is achieved by comparing, for each voxel, the signed distances attributed to different subjects, without using anatomical landmarks.

2.3 Hole Filling

Surfaces acquired with scanners are typically incomplete and contain holes. Filling a hole is a challenging problem in its own right, as discussed by Davis et al [18]. A common way to complete a hole is to fill it with a smooth surface patch that meets the boundary conditions of the hole. While such methods fill holes in a smooth manner, which is reasonable in some areas such as the top of the head and possibly the underarm, other areas should not be filled smoothly. Therefore, Allen et al [2] developed a method that maps a surface from a template model to the hole area. Alternatively, hole filling can be based on the contour lines of a scan surface [14].

2.4 Shape Variation Characterization

The human body comes in all shapes and sizes. Characterizing human shape variation is traditionally the subject of anthropometry, the study of human body measurement. The sparse measurements of traditional anthropometric shape characterization curtail its ability to capture the detailed shape variations needed for realism. While characterizing human shape variation based on 3-D range scans can capture the details of shape variation, the method relies on three conditions: noise elimination, hole filling and surface completion, and point-to-point correspondence. Also, whole-body scanners generate verbose data that cannot be used directly for shape variation analysis. Therefore, it is necessary to convert 3-D scans to a compact representation that retains information about the body shape. Principal components analysis (PCA) is a potential solution to this problem. Allen et al [2] captured the variability of human shape by performing PCA over the displacements of the points from the template surface to an instance surface. Anguelov et al [7] also used PCA to characterize shape deformation and then used the principal components for shape completion. Ben Azouz et al [14] applied PCA to volumetric models where the vector is formed by the signed distances from the voxels to the surface of the model. In order to explore the variations of the human body with intuitive control parameters (e.g., height, weight, age, and sex), Allen et al [2] showed how to relate several variables simultaneously by learning a linear mapping between the control parameters and the PCA weights. Ben Azouz et al [6, 14, 21] attempted to link the principal modes to some intuitive body shape variations by visualizing the first five modes of variation and gave interpretations of these modes. While PCA is shown to be effective in characterizing global shape variations, it may smear local variations, for which other methods (e.g., wavelets) may be more effective.
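A minimal sketch of this kind of analysis is given below, assuming the scans have already been registered so that every subject is described by the same number of corresponding points; the array shapes and function name are illustrative choices, not taken from the cited papers.

```python
import numpy as np

def build_shape_space(shapes):
    """PCA over registered body shapes.

    shapes: (n_subjects, n_points, 3) array of corresponding vertex
    positions, one registered scan per row. Returns the mean shape,
    the principal modes of variation, and the per-subject weights
    (the coordinates of each subject in eigen-space).
    """
    n_subjects = shapes.shape[0]
    X = shapes.reshape(n_subjects, -1)   # flatten to (n_subjects, 3 * n_points)
    mean = X.mean(axis=0)
    Xc = X - mean                        # center the data
    # SVD of the centered data matrix yields the principal modes
    # without explicitly forming the very large covariance matrix.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    modes = Vt                           # each row is one mode of variation
    weights = U * S                      # subject coordinates in eigen-space
    return mean, modes, weights
```

Each subject is recovered exactly as `mean + weights[i] @ modes`, and truncating to the first few modes gives the kind of compact representation discussed above.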
2.5 Shape Reconstruction

Given a number of scan data sets of different subjects, a novel human shape can be created that resembles the samples but is not an exact copy of any existing one. This can be realized in four ways, and a sketch of the second and third routes follows the list.
• Interpolation or morphing. One shape can be gradually morphed into another by interpolating between their vertices or other graphic entities [2]. In order to create a faithful intermediate shape between two individuals, it is critical that all features are well aligned; otherwise, features will cross-fade instead of moving.
• Reconstruction from eigen-space. After PCA, the features of the sample shapes are characterized by eigen-vectors or eigen-persons, which form an eigen-space. Any new shape model can be generated from this space by combining a number of eigen-models with appropriate weighting factors [14].
• Feature-based synthesis. Once the relationship between human anthropometric features and eigen-vectors is established, a new shape model can be constructed from the eigen-space with desired features by editing multiple correlated attributes, such as height and weight [2] or fat percentage and hip-to-waist ratio [4].
• Marker-only matching. Marker-only matching can be considered a way of reconstruction from provided markers [2]. This is important for many applications, such as deriving a model from video imagery, since marker data can be obtained using less expensive equipment than a laser range scanner.
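To illustrate the second and third routes concretely, the sketch below generates a new body from the eigen-space produced by the PCA listing above and learns a linear mapping from control parameters (e.g., height and weight) to PCA weights; the least-squares mapping is one plausible realization of the approach in [2], not the authors' exact formulation.

```python
import numpy as np

def reconstruct(mean, modes, w, n_points):
    """Reconstruction from eigen-space: mean shape plus weighted modes."""
    X = mean + w @ modes[:len(w)]
    return X.reshape(n_points, 3)

def fit_feature_mapping(features, weights):
    """Learn a linear (affine) map from control parameters to PCA weights.

    features: (n_subjects, n_features) matrix, e.g. height and weight;
    weights:  (n_subjects, n_modes) PCA coordinates of the same subjects.
    """
    F = np.hstack([features, np.ones((len(features), 1))])  # affine term
    M, *_ = np.linalg.lstsq(F, weights, rcond=None)
    return M

def synthesize(mean, modes, M, feature_values, n_points):
    """Feature-based synthesis: control parameters -> weights -> shape."""
    f = np.append(np.asarray(feature_values, dtype=float), 1.0)
    return reconstruct(mean, modes, f @ M, n_points)
```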
3 Pose Change Modeling

During pose changing or body movement, muscles, bones, and other anatomical structures continuously shift and change the shape of the body. For pose modeling, scanning the subject in every pose is impractical; instead, the body shape can be scanned in a set of key poses, and the body shapes corresponding to intermediate poses are then determined by smoothly interpolating among these poses. The issues involved in pose modeling include pose definition and identification, skeleton model derivation, shape deformation (skinning), and pose mapping.

3.1 Pose Definition and Identification

The human body can assume various poses. In order to have a common basis for pose modeling, a distinct, unique description of different poses is required. Since it is impossible to collect the data or create template models for all possible poses, it is necessary to define a set of standard, typical poses. This is pose definition. A convention for pose definition is yet to be established. One approach is to use joint angle changes as the measures characterizing human pose changes and gross motion. This means that poses can be defined by joint angles. By defining poses and motion in such a way, the body shape variations caused by pose changes and motion consist of both rigid and non-rigid deformation. Rigid deformation is associated with the orientation and position of the segments that connect joints. Non-rigid deformation is related to the changes in shape of the soft tissues associated with segments in motion, which, however, excludes local deformation caused by muscle action alone. One
method for measuring and defining joint angles is to use a skeleton model. In such a model, the human body is divided into multiple segments according to the major joints of the body, each segment is represented by a rigid linkage, and an appropriate joint is placed between the two corresponding linkages. Given a set of scan data, imagery, or photos, the corresponding pose can be determined or identified by fitting a skeleton model to the data set. Skeleton model derivation is discussed in the following section. Alternatively, there are several methods for pose identification that are not based on skeleton models. Mittal et al [22] studied human body pose estimation using silhouette shape analysis. Cohen and Li [23] proposed an approach for inferring body posture using a 3-D visual hull constructed from a set of silhouettes.

3.2 Skeleton Model

Allen et al [1] constructed a kinematic skeleton model to identify the pose of a scan data set using markers captured during range scanning. Anguelov et al [24] developed an algorithm that automatically recovers from 3-D range data a decomposition of the object into approximately rigid parts, the location of the parts in the different poses, and the articulated object skeleton linking the parts. Robertson and Trucco [25] developed an evolutionary approach to estimating upper-body posture from multi-view markerless sequences. Sundaresan et al [26] proposed a general approach that uses Laplacian eigen-maps and a graphical model of the human body to segment 3-D voxel data of humans into different articulated chains.

3.3 Body Deformation Modeling

Body deformation modeling is also referred to as skinning in animation. The two main approaches to modeling body deformations are anatomical modeling and example-based modeling. Anatomical modeling is based on an accurate representation of the major bones, muscles, and other interior structures of the body [27]. These structures are deformed as necessary when the body moves, and a skin model is wrapped around the underlying anatomy to obtain the final geometry of the body shape. The finite element method is the primary modeling technique used for anatomical modeling. In the example-based approach, a model of some body part in several different poses with the same underlying mesh structure can be generated by an artist. These poses are correlated to various degrees of freedom, such as joint angles. An animator can then supply values for the degrees of freedom of a new pose, and the body shape for that new pose is interpolated appropriately. Lewis et al [28] and Sloan et al [29] developed similar techniques for applying example-based approaches to meshes. Instead of using artist-generated models, recent work on example-based modeling uses range-scan data. Allen et al [1] presented an example-based method for calculating skeleton-driven body deformations. Their example data consist of range scans of a human body in a variety of poses. Using markers captured during range scanning, a kinematic skeleton is constructed first to identify the pose of each scan. Then a mutually consistent parameterization of all the scans is constructed using a posable subdivision surface template. Anguelov et al [7] developed a method that incorporates both articulated
and non-rigid deformations. A pose deformation model was constructed from training scan data that derives the non-rigid surface deformation as a function of the pose of the articulated skeleton. A separate model of shape variation was also derived from the training data. The two models were combined to produce a 3-D surface model with realistic muscle deformation for different people in different poses. The method (model) is referred to as SCAPE (Shape Completion and Animation for People). For pose modeling, it is impossible to acquire the pose deformation for each person at each pose. Instead, pose deformation can be transferred from one person to another for a given pose. Anguelov et al [7] addressed this issue by integrating a pose model with a shape model reconstructed from eigen-space. As such, they were able to generate a mesh for any body shape in their PCA space in any pose.
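SCAPE couples a learned pose-deformation model with the shape space and is considerably more involved; as a simpler point of reference, the sketch below implements plain linear blend skinning, the baseline that the example-based methods above improve upon (the bone transforms and skinning weights are assumed to be given).

```python
import numpy as np

def linear_blend_skinning(rest_verts, skin_weights, bone_transforms):
    """Deform a rest-pose mesh by blending per-bone rigid transforms.

    rest_verts:      (n_verts, 3) rest-pose vertex positions.
    skin_weights:    (n_verts, n_bones) weights; each row sums to 1.
    bone_transforms: (n_bones, 4, 4) homogeneous transforms taking each
                     bone from the rest pose to the current pose.
    """
    n_verts = rest_verts.shape[0]
    homo = np.hstack([rest_verts, np.ones((n_verts, 1))])        # (n_verts, 4)
    # Position of every vertex under every bone's rigid transform.
    per_bone = np.einsum('bij,vj->bvi', bone_transforms, homo)[..., :3]
    # Weighted blend across bones.
    return np.einsum('vb,bvi->vi', skin_weights, per_bone)
```

The familiar collapsing and candy-wrapper artifacts of this formulation are precisely the non-rigid effects that the scan-trained pose-deformation models of [1] and [7] are designed to correct.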
4 Shape Modeling of Humans in Motion

4.1 Motion Tracking

Human motion tracking or capturing is an area that has attracted much study and investigation. Today's off-the-shelf computer hardware enables marker-free, non-intrusive optical tracking of the human body. For example, Theobalt et al [30] developed a system to capture human motion at interactive frame rates without the use of markers or other devices introduced into the scene. Algorithms for 2-D computer vision and 3-D volumetric scene reconstruction were applied directly to the image data. A person was recorded by multiple synchronized cameras, and a multi-layer hierarchical kinematic skeleton was fitted into each frame in a two-stage process.

4.2 Dynamic Shape Capture

During dynamic activities, the surface of the human body moves in many subtle but visually significant ways: bending, bulging, jiggling, and stretching. Park and Hodgins [8] developed a technique for capturing and animating these motions using a commercial motion capture system with approximately 350 markers. Supplemented with a detailed, actor-specific surface model, the motion of the skin was then computed by segmenting the markers into the motion of a set of rigid parts and a residual deformation. Sand et al [5] developed a method (a needle model) for the acquisition of deformable human geometry from silhouettes. Their technique uses a commercial tracking system to determine the motion of the skeleton and then estimates geometry for each bone using constraints provided by the silhouettes from one or more cameras.

4.3 Shape Reconstruction from Imagery Data

• From Photos. Seo et al [31] presented a data-driven shape model for reconstructing human body models from one or more 2-D photos. A data-driven, parameterized deformable model acquired from a collection of range scans of a real human body is used to complement the image-based reconstruction by leveraging the quality, shape, and statistical information accumulated from multiple shapes of range-scanned people.
• From Video Sequences. Balan et al [10] proposed a method for recovering human shape models directly from images. Specifically, the human body shape is represented by SCAPE [7], and the parameters of the model are estimated directly from image data. A cost function between image observations and a hypothesized mesh is defined, and the problem is formulated as an optimization.

4.4 Animation

The animation of the subject can be realized by displaying a series of human shape models for a prescribed sequence of poses. Hilton et al [3] built a framework for the construction of animated models from the captured surface shape of real objects. Seo et al [4] developed a synthesizer where, for any synthesized model, the underlying bone and skin structure is properly adjusted so that the model remains completely animatable using the underlying skeleton. Aguiar et al [9] developed a versatile, fast, and simple framework to generate high-quality animations of scanned human characters from input motion data. The method is purely mesh-based and can easily transfer motions between human subjects of completely different shapes and proportions.
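A recurring ingredient of such pipelines, and of the key-pose interpolation mentioned in Sect. 3, is smooth interpolation between poses. The sketch below interpolates per-joint rotations with quaternion slerp; representing a pose as a dictionary of unit quaternions is an assumption made for illustration, not the scheme used in [3], [4], or [9].

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions."""
    q0 = q0 / np.linalg.norm(q0)
    q1 = q1 / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:                  # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:               # nearly parallel: linear fallback
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1.0 - t) * theta) * q0
            + np.sin(t * theta) * q1) / np.sin(theta)

def interpolate_pose(key0, key1, t):
    """Blend two key poses given as {joint name: unit quaternion}."""
    return {joint: slerp(key0[joint], key1[joint], t) for joint in key0}
```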
5 A New Method

In static human shape modeling based on 3-D laser scan data, polygons/vertices are usually used as the basic graphic entities for the representation of a human body shape. Approximately 20,000 to 500,000 vertices are required to describe a full body shape, depending upon surface resolution. This way of representing the surface incurs a large computational cost and cannot ensure point-to-point correspondence among the scans of different subjects. We therefore developed a new method that uses contour lines as the basic entities for shape modeling. The entire procedure of the method is as follows (a code sketch of steps (4) and (5) follows the list).

(1) Joint center calculation. The human body is treated as a multi-segment system where segments are connected to each other by joints, which in turn are defined by respective landmarks.
(2) Skeleton model building. A skeleton model is formed by connecting respective joint centers to represent the articulated structure and segments of the human body, as shown in Fig. 1.
(3) Segmentation. The entire body scan is divided into segments according to the skeleton model, with some special treatment in certain body areas.
(4) Slicing. The scan of each segment is sliced along the main axis of the segment at fixed intervals, which produces the contour lines of the segment. Figure 2 displays the segmentation and slicing of a whole-body scan.
(5) Discretizing. Each contour line is discretized with respect to a polar angle. As such, the two-dimensional contour curve is represented by a vector.
(6) Hole filling. Hole filling is performed on the contour lines of each segment. Figure 3 shows the original surface and filled surface of the abdomen segment.
(7) Parameterization. The vector of each discretized contour line is represented by a set of wavelet coefficients.
(8) Registration. The point-to-point correspondence between the scans of two bodies is established with respect to the contour lines of each segment.
(9) Shape description and PCA. The assembly of the wavelet coefficients of all segments is used as the shape description vector. Principal component analysis (PCA) is performed on a selection of subjects from the CAESAR database.
(10) Shape reconstruction. A 3-D human shape model is reconstructed in the following way: (a) from the shape description vector to wavelet coefficients; (b) from wavelet coefficients to contour lines; (c) from contour lines to a 3-D surface model; and (d) part blending as needed.
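Steps (4) and (5) can be made concrete as in the sketch below; this is a simplified illustration that assumes the points of each segment are already expressed in a frame whose z axis is the segment's main axis, and the slice interval, angular resolution, and Haar step are arbitrary choices rather than the parameters actually used.

```python
import numpy as np

def slice_segment(points, interval=10.0):
    """Step (4): group a segment's scan points into slices along its z axis."""
    z = points[:, 2]
    bins = np.floor((z - z.min()) / interval).astype(int)
    return [points[bins == b] for b in range(bins.max() + 1)]

def discretize_contour(slice_points, n_angles=64):
    """Step (5): represent one contour as radii at fixed polar angles."""
    center = slice_points[:, :2].mean(axis=0)
    d = slice_points[:, :2] - center
    angles = np.arctan2(d[:, 1], d[:, 0])
    radii = np.hypot(d[:, 0], d[:, 1])
    grid = np.linspace(-np.pi, np.pi, n_angles, endpoint=False)
    order = np.argsort(angles)
    # Interpolate radius over angle; the period argument makes the
    # interpolation wrap correctly around the closed contour.
    return np.interp(grid, angles[order], radii[order], period=2 * np.pi)

def haar_step(v):
    """One level of a Haar decomposition, a flavor of the wavelet
    parameterization of step (7); v must have even length."""
    return (v[0::2] + v[1::2]) / 2.0, (v[0::2] - v[1::2]) / 2.0
```

Because every contour is sampled at the same polar angles, the point-to-point correspondence of step (8) comes essentially for free, and the wavelet coefficients of all segments can be concatenated into the shape description vector of step (9).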
Fig. 1. Landmarks, joint centers, and the skeleton model
Fig. 2. Segmentation and slicing
Fig. 3. Hole filling based on contour lines
6 Concluding Remarks

Human shape modeling spans various research areas, from anthropometry, computer graphics, and computer vision to machine intelligence and optimization. It would not be possible to present a full survey of the related work here; instead, this paper is intended to provide an indication of the current state of the art. In addition to traditional uses, human modeling is finding many new applications with great challenges, such as virtual environments, human identification, and human-borne threat detection.
References

1. Allen, B., Curless, B., Popovic, Z.: Articulated Body Deformation from Range Scan Data. In: ACM SIGGRAPH 2002, San Antonio, TX, USA, pp. 21–26 (2002)
2. Allen, B., Curless, B., Popovic, Z.: The space of human body shapes: reconstruction and parameterization from range scans. In: ACM SIGGRAPH 2003, San Diego, CA, USA, 27–31 July (2003)
3. Hilton, A., Starck, J., Collins, G.: From 3D Shape Capture to Animated Models. In: Proceedings of the First International Symposium on 3D Data Processing Visualization and Transmission, pp. 246–255 (2002)
4. Seo, H., Cordier, F., Thalmann, N.M.: Synthesizing Animatable Body Models with Parameterized Shape Modifications. In: Eurographics/SIGGRAPH Symposium on Computer Animation (2003)
5. Sand, P., McMillan, L., Popovic, J.: Continuous Capture of Skin Deformation. ACM Transactions on Graphics 22(3), 578–586 (2003)
6. Ben Azouz, Z., Rioux, M., Shu, C., Lepage, R.: Analysis of Human Shape Variation using Volumetric Techniques. In: Proc. of the 17th Annual Conference on Computer Animation and Social Agents, Geneva, Switzerland (2004)
7. Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J., Davis, J.: SCAPE: Shape Completion and Animation of People. ACM Transactions on Graphics 24(3) (2005)
8. Park, S.I., Hodgins, J.K.: Capturing and Animating Skin Deformation in Human Motion. ACM Transactions on Graphics (SIGGRAPH 2006) 25(3), 881–889 (2006)
9. Aguiar, E., Zayer, R., Theobalt, C., Magnor, M., Seidel, H.P.: A Framework for Natural Animation of Digitized Models. MPI-I-2006-4-003 (2006)
10. Balan, A., Sigal, L., Black, M., Davis, J., Haussecker, H.: Detailed Human Shape and Pose from Images. In: IEEE Conf. on Computer Vision and Pattern Recognition (2007)
11. Robinette, K.M., Vannier, M.W., Rioux, M., Jones, P.: 3-D surface anthropometry: Review of technologies. In: North Atlantic Treaty Organization Advisory Group for Aerospace Research & Development, Aerospace Medical Panel (1997)
12. Robinette, K.M.: An Investigation of 3-D Anthropometric Shape Descriptors for Database Mining. Ph.D. Thesis, University of Cincinnati (2003)
13. Paquet, E., Rioux, M.: Content-based access of VRML libraries. In: Ip, H.H.-S., Smeulders, A.W.M. (eds.) MINAR 1998. LNCS, vol. 1464. Springer, Heidelberg (1998)
14. Ben Azouz, Z., Shu, C., Lepage, R., Rioux, M.: Extracting Main Modes of Human Body Shape Variation from 3-D Anthropometric Data. In: Proceedings of the Fifth International Conference on 3-D Digital Imaging and Modeling (2005)
15. Robinette, K., Daanen, H., Paquet, E.: The CAESAR Project: A 3-D Surface Anthropometry Survey. In: Second International Conference on 3-D Digital Imaging and Modeling (3DIM 1999), Ottawa, Canada, pp. 380–386 (1999)
16. Anguelov, D., Srinivasan, P., Pang, H.C., Koller, D., Thrun, S., Davis, J.: The correlated correspondence algorithm for unsupervised registration of nonrigid surfaces. Advances in Neural Information Processing Systems 17, 33–40 (2005)
17. Ben Azouz, Z., Shu, C., Mantel, A.: Automatic Locating of Anthropometric Landmarks on 3D Human Models. In: Proceedings of the Third International Symposium on 3D Data Processing, Visualization, and Transmission (2006)
18. Davis, J., Marschner, S., Garr, M., Levoy, M.: Filling holes in complex surfaces using volumetric diffusion. In: Proceedings of the First International Symposium on 3D Data Processing, Visualization and Transmission, Padua, Italy (2002)
19. Curless, B., Levoy, M.: A volumetric method for building complex models from range images. In: Proceedings of SIGGRAPH 1996, pp. 303–312 (1996)
20. Liepa, P.: Filling holes in meshes. In: Proc. of the Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, pp. 200–205 (2003)
21. Ben Azouz, Z., Rioux, M., Shu, C., Lepage, R.: Characterizing Human Shape Variation Using 3-D Anthropometric Data. International Journal of Computer Graphics 22(5), 302–314 (2005)
12
Z. Cheng and K. Robinette
22. Mittal, A., Zhao, L., Davis, L.S.: Human Body Pose Estimation Using Silhouette Shape Analysis. In: Proceedings of the IEEE Conference on Advanced Video and Signal Based Surveillance (2003) 23. Cohen, I., Li, H.X.: Inference of Human Postures by Classification of 3D Human Body Shape. In: Proceedings of the IEEE International Workshop on Analysis and Modeling of Faces and Gestures (2003) 24. Anguelov, D., Koller, D., Pang, H.C., Srinivasan, P., Thrun, S.: Recovering Articulated Object Models from 3D Range Data. In: Proceedings of the 20th conference on Uncertainty in artificial intelligence, pp. 18–26 (2004) 25. Robertson, C., Trucco, E.: Human body posture via hierarchical evolutionary optimization. In: BMVC 2006 (2006) 26. Sundaresan, A., Chellappa, R.: Model driven segmentation of articulating humans in Laplacian Eigenspace. IEEE Transaction: Patter Analysis and Machine Intelligence (2007) 27. Aubel, A., Thalmann, D.: Interactive modeling of the human musculature. In: Proc. of Computer Animation (2001) 28. Lewis, J.P., Cordner, M., Fong, N.: Pose space deformations: A unified approach to shape interpolation and skeleton-driven deformation. In: ACM SIGGRAPH, pp. 165–172 (2000) 29. Sloan, P.P., Rose, C., Cohen, M.F.: Shape by example. In: Proceedings of 2001 Symposium on Interactive 3D Graphics (2001) 30. Theobalt, C., Magnor, M., Schüler, P., Seidel, H.P.: Combining 2D Feature Tracking and Volume Reconstructions for Online Video-Based Human Motion Capture. International Journal of Image and Graphics 4(4), 563–583 (2004) 31. Seo, H., Yeo, Y.I., Wohn, K.: 3D Body Reconstruction from Photos Based on Range Scan. In: Pan, Z., Aylett, R.S., Diener, H., Jin, X., Göbel, S., Li, L. (eds.) Edutainment 2006. LNCS, vol. 3942, pp. 849–860. Springer, Heidelberg (2006)
An Advanced Modality of Visualization and Interaction with Virtual Models of the Human Body

Lucio T. De Paolis 1,3, Marco Pulimeno 2, and Giovanni Aloisio 1,3

1 Department of Innovation Engineering, Salento University, Lecce, Italy
2 ISUFI, Salento University, Lecce, Italy
3 SPACI Consortium, Italy
{lucio.depaolis,marco.pulimeno,giovanni.aloisio}@unile.it
Abstract. The developed system is the first prototype of a virtual interface designed to avoid contact with the computer so that the surgeon is able to visualize models of the patient's organs more effectively during the surgical procedure. In particular, the surgeon will be able to rotate, translate and zoom in on 3D models of the patient's organs simply by moving his finger in free space; in addition, it is possible to choose to visualize all of the organs or only some of them. All of the interactions with the models happen in real time using the virtual interface, which appears as a touch-screen suspended in free space in a position chosen by the user when the application is started up. Finger movements are detected by means of an optical tracking system and are used to simulate touch with the interface and to interact by pressing the buttons present on the virtual screen. Keywords: User Interface, Image Processing, Tracking System.
1 Introduction

The visualization of 3D models of the patient's body emerges as a priority in surgery, both in pre-operative planning and during surgical procedures. Current input devices tether the user to the system by restrictive cabling or gloves. The use of a computer in the operating room requires the introduction of new modalities of interaction designed to replace the standard ones and to enable non-contact doctor-computer interaction. Gesture tracking systems provide a natural and intuitive means of interacting with the environment in an equipment-free and non-intrusive manner. Greater flexibility of action is provided since no wired components or markers need to be introduced into the system. In this work we present a new interface, based on the use of an optical tracking system, which interprets the user's gestures in real time for the navigation and manipulation of 3D models of the human body. The tracked movements of the finger provide a more natural and less restrictive way of manipulating 3D models created using the patient's medical images. Various gesture-based interfaces have been developed; some of these are used in medical applications.
Grätzel et al. [1] presented a non-contact mouse for surgeon-computer interaction in order to replace standard computer mouse functions with hand gestures. Wachs et al. [2] presented "Gestix", a vision-based hand gesture capture and recognition system for navigation and manipulation of images in an electronic medical record database. GoMonkey [3] is an interactive, real-time gesture-based control system for projected output that combines conventional PC hardware with a pair of stereo tracking cameras, gesture recognition software and a customized content management system. O'Hagan and Zelinsky [4] presented a prototype interface based on a tracking system where a finger is used as a pointing and selection device; the focus of their discussion is how the system can be made to perform robustly in real time. O'Hagan et al. [5] implemented a gesture interface for navigation and object manipulation in the virtual environment.
2 Technologies Used

In the developed system we have utilized OpenSceneGraph for the construction of the graphic environment and 3D Slicer for building the 3D models starting from the real patient's medical images. OpenSceneGraph [6] is an open-source, high-performance 3D graphics toolkit used by application developers in fields such as visual simulation, computer games, virtual reality, scientific visualization and modeling. The toolkit is a C++ library and is available on multiple platforms including Windows, Linux, IRIX and Solaris. 3D Slicer [7] is a multi-platform open-source software package for visualization and image analysis, aimed at computer scientists and clinical researchers. The platform provides functionality for segmentation, registration and three-dimensional visualization of multi-modal image data, as well as advanced image analysis algorithms for diffusion tensor imaging, functional magnetic resonance imaging and image-guided therapy. Standard image file formats are supported, and the application integrates interface capabilities with biomedical research software and image informatics frameworks. The optical tracking system used in this application is the Polaris Vicra from NDI. The Polaris Vicra is an optical system that tracks both active and passive markers and provides precise, real-time spatial measurements of the location and orientation of an object or tool within a defined coordinate system. The system tracks wired active tools with infra-red light-emitting diodes and wireless passive tools with passive reflective spheres. With passive and active markers, the position sensor receives light from marker reflections and marker emissions, respectively. The Polaris Vicra uses a position sensor to detect infrared-emitting or retroreflective markers affixed to a tool or object; based on the information received from the markers, the position sensor is able to determine the position and orientation of tools within a specific measurement volume. In this way each movement of the marker, or marker geometry, attached to the specific tool in the real environment is replicated in the corresponding virtual environment. Markers outside of the measurement volume are not detected.
The system is able to track up to 6 tools (maximum 1 active wireless) with a maximum of 32 passive markers in view, and the maximum update rate is 20 Hz. The system can be used in a variety of surgical applications, delivering accurate, flexible, and reliable measurement solutions that are easily customized for specific applications.
3 The Developed Application

The developed system is the first prototype of a virtual interface designed to avoid contact with the computer so that the surgeon can visualize models of the patient's organs more effectively during the surgical procedure. A 3D model of the abdominal area, reconstructed from CT images, is shown in Figure 1. The patient suffers from a pathology in the liver which causes notable swelling.
Fig. 1. A 3D model reconstructed from CT images
In order to build the 3D model from the CT images, some segmentation and classification algorithms were utilized. The Fast Marching algorithm was used for the image segmentation; some fiducial points were chosen in the area of interest and used in the growing phase. After a first semi-automatic segmentation, a manual segmentation was carried out. All of the interactions with the models happen in real time using the virtual interface, which appears as a touch-screen suspended in free space in a position chosen by the user when the application is started up.
When starting, the user has to define the area of space where the interface is located and decide the positions of the four vertices of the virtual screen. In this way a reference system is also defined; this is necessary to fix the interaction plane. The marker is moved around in front of this region and, in order to choose the different interaction modalities and the organs to be visualized, the user presses the virtual buttons present on the interface. In addition, a scaling operation is carried out in order to adapt the size of the virtual interface to the real screen of the computer. Finger movements are detected by means of an optical tracking system and are used to simulate touch with the interface where some buttons are located. Figure 2 shows the interaction with the user interface by means of the tracking system.
Fig. 2. The interaction with the virtual user interface
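To make the geometry concrete, the sketch below shows one way a tracked 3D finger position can be mapped to 2D coordinates on a virtual screen defined by its corner points. This is not the authors' implementation, only a minimal Python/NumPy illustration of the interaction plane described above; the corner ordering, the millimeter units and the 10 mm touch threshold are assumptions.

```python
import numpy as np

def make_screen(bl, br, tl):
    """Build the virtual-screen reference system from three of the four
    vertices chosen by the user: bottom-left, bottom-right, top-left."""
    u = br - bl                       # horizontal screen edge
    v = tl - bl                       # vertical screen edge
    n = np.cross(u, v)                # normal of the interaction plane
    return bl, u, v, n / np.linalg.norm(n)

def finger_to_screen(p, screen, touch_mm=10.0):
    """Project the tracked finger position p onto the interaction plane
    and return normalized (x, y) in [0, 1] plus a 'touching' flag."""
    origin, u, v, n = screen
    dist = np.dot(p - origin, n)      # signed distance to the plane
    q = p - dist * n                  # orthogonal projection onto it
    x = np.dot(q - origin, u) / np.dot(u, u)
    y = np.dot(q - origin, v) / np.dot(v, v)
    return x, y, abs(dist) < touch_mm

# Example: a 300 mm x 200 mm virtual screen in tracker coordinates.
screen = make_screen(np.array([0.0, 0.0, 0.0]),
                     np.array([300.0, 0.0, 0.0]),
                     np.array([0.0, 200.0, 0.0]))
print(finger_to_screen(np.array([150.0, 100.0, 4.0]), screen))
# -> (0.5, 0.5, True): the finger is "pressing" the center of the screen
```

Scaling the normalized coordinates by the pixel resolution of the real screen then corresponds to the scaling operation mentioned in the text.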
The interaction with the virtual screen happens by pressing these buttons, which make it possible to visualize the different organs present in the built 3D model (buttons on the right) and to choose the possible operations allowed on the selected model (buttons on the left). For this reason, when using this graphical interface, the surgeon is able to rotate, translate and zoom in on the 3D models of the patient's organs simply by moving his finger in free space; in addition, he can select the visualization of all of the organs or only some of them. At the bottom of the screen the chosen interaction modality is visualized, and in the top left-hand corner the cursor position is shown in the defined reference system. Figure 3 shows the virtual user interface.
Fig. 3. The virtual user interface (labeled regions: cursor position, action buttons, selection buttons, interaction modality)
To build the virtual scene a scene graph has been used; 2D and 3D environments are included. The 2D environment allows the cursor, some text and the buttons to be visualized, updating the active interaction modality and the cursor position. The 3D environment allows the model of the organs to be visualized and provides the interaction operations. The lighting conditions are important and can cause problems, because external light could be interpreted as additional IR reflectors, creating false cursors in the scene.
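As a generic illustration of this 2D/3D split (not the authors' OpenSceneGraph code), a scene graph is a tree of nodes that is traversed every frame; the hypothetical Python sketch below mirrors the structure described above, with a 2D overlay branch and a 3D organ-model branch (only the liver is mentioned in the text; the other node names are assumptions).

```python
class Node:
    """Minimal scene-graph node: a name plus a list of children."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def traverse(self, visit, depth=0):
        """Depth-first traversal, as a renderer would do each frame."""
        visit(self, depth)
        for child in self.children:
            child.traverse(visit, depth + 1)

# 2D branch: cursor, status text and buttons; 3D branch: organ models.
root = Node("root", [
    Node("hud2D", [Node("cursor"), Node("modalityText"),
                   Node("actionButtons"), Node("selectionButtons")]),
    Node("world3D", [Node("liver"), Node("otherOrgans")]),
])

root.traverse(lambda n, d: print("  " * d + n.name))
```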
4 Conclusions and Future Work

The described application is the first prototype of a virtual interface which provides a very simple form of interaction for navigation and manipulation of 3D virtual models of the human body. The virtual interface created provides an interaction modality with models of the human body which is similar to the traditional one using a touch screen, but in this interface there is no contact with the screen and the user's finger moves through open space. By means of an optical tracking system, the position of the finger tip, where an IR reflector is located, is detected and utilized first to define the four vertices of the virtual interface and then to manage the interaction with it. The optical tracker is already in use in computer-aided systems and, for this reason, the developed interface can easily be integrated in the operating room. Taking into account a possible use of the optical tracker in the operating room during surgical procedures, the problem of possible undesired interferences due to the detection of false markers (phantom markers) will be evaluated.
The introduction of other functionalities of interaction with the models is in progress, after further investigation and consideration of surgeons' requirements. Another improvement could be to provide the visualization of CT images in addition to the 3D models and to give surgeons the opportunity of navigating through the set of CT slices. In this way surgeons are provided with the traditional visualization modality as well as the new one and are able to compare them.
References
1. Grätzel, C., Fong, T., Grange, S., Baur, C.: A Non-Contact Mouse for Surgeon-Computer Interaction. Technology and Health Care Journal 12(3) (2004)
2. Wachs, J.P., Stern, H.I., Edan, Y., Gillam, M., Handler, J., Feied, C., Smith, M.A.: Gesture-based Tool for Sterile Browsing of Radiology Images. The Journal of the American Medical Informatics Association 15(3) (2008)
3. GoMonkey, http://www.gomonkey.at
4. O'Hagan, R., Zelinsky, A.: Finger Track - A Robust and Real-Time Gesture Interface. In: Sattar, A. (ed.) Canadian AI 1997. LNCS, vol. 1342, pp. 475–484. Springer, Heidelberg (1997)
5. O'Hagan, R., Zelinsky, A., Rougeaux, S.: Visual Gesture Interfaces for Virtual Environments. Interacting with Computers 14, 231–250 (2002)
6. OpenSceneGraph, http://www.openscenegraph.org
7. 3D Slicer, http://www.slicer.org
3D Body Scanning's Contribution to the Use of Apparel as an Identity Construction Tool

Marie-Eve Faust 1 and Serge Carrier 2

1 The Hong Kong Polytechnic University, Institute of Textiles & Clothing, Hung Hom, Kowloon, Hong Kong, China
2 Université du Québec à Montréal, C.P. 8888, Succ. Centre-ville, Montréal, Québec, H3P 3P8, Canada
[email protected], [email protected]
Abstract. Humans use apparel as an artifact to construct their identities and present them to the outside world. Beyond textiles and clothing style, garment fit contributes to this image presentation. This research, conducted on the Hong Kong market, shows that women view 3D body scanning technology positively and that it could therefore prove an effective and efficient tool, both from a consumer's and from a seller's point of view, in facilitating body image creation. Keywords: Body image, 3D body scan, apparel, fashion.
1 Introduction

Throughout the ages clothing has not only fulfilled a need for protection but has also played a role in defining the wearer's personality and determining his or her status in society. Yet selecting a garment that advantages one's silhouette and projects the right image often proves, for most, a difficult if not impossible task. Style, color and textile are, to a large extent, subjective decisions: de gustibus et coloribus non est disputandum. Yet fit is a very objective criterion. Until a few years ago, the only way to identify a fitting garment was to try it on. The invention of 3D body scanning technology is rapidly changing this situation. Not only is it a possible first step toward mass customization, but it may also be used by retailers as an added customer service tool, helping them identify the best-fitting garments and thereby improving customers' satisfaction with the shopping experience.
2 Literature Review

The following section first discusses the importance of clothing in the individual's image formation. It then proceeds to a brief presentation of 3D body scanning technology and its potential use in helping women select the best fitting and most advantaging pieces of clothing.

2.1 Personal Identity and Styling

Clothing is worn daily, yet it serves more than a mere protection need. Leroi-Gourhan [11], Wolfe [18] and Boucher [3] state that clothing has always served the same three
basic human needs: (1) protection (physical need), (2) adornment and identification (psychological needs), and (3) modesty and status (social needs or role). For Kefgen and Touchie-Specht [9], clothing forms a nonverbal communication: the way people dress tells others what kind of a person they are or would like to be perceived as. Johnson and Foster [7] point out that clothing has a language of its own. It may not be appropriate to judge a book by its cover, but many argue that the cover certainly helps in selecting the book. Body image can be positive or negative, accurate or inaccurate, particularly as we form this image in comparison with others and in relation to cultural views of fashion. The evaluative dimension of body image is known as body cathexis [8]: the indication of an individual's satisfaction or dissatisfaction with their different body parts [10; 16]. Body cathexis is closely related to the person's global self-image, self-esteem, and self-concept [16; 17]. Rasband [14] showed that an accurate and objective body image is necessary, as it plays a significant part in clothing selection and appearance. Clothing becomes part of your body image, a second skin establishing new physical boundaries for yourself. Over the last few decades, various studies have focused on five elements of a garment important in achieving a better clothing message: line, shape, color, texture and pattern. In order to understand how clothing can impact the way someone looks, body figures must be understood, even if standards and beliefs change with the times and with who makes the decision. Authors generally recognize eight more or less standard female body shapes:
• Ideal figure: the shoulders and hips have similar width, the bust size is medium and the waist is small. The abdomen is flat to slightly curved, the buttocks are moderately curved and the thighs are slim. The figure is well balanced. The weight is just enough to cover the bones.
• Hourglass figure type: the hourglass shape appears full-rounded in the bust and hip, with a small waist. The bust is more often larger than average, as are the hips. The waist is well indented. Hips and buttocks are smoothly rounded.
• Triangular figure type: the triangular figure seen from the front looks narrower above the waist and wider below. The excess weight appears on the buttocks, the low hips, and the thighs. Women with this type of figure appear unbalanced from top to bottom, with the shoulders narrower than the hips. The bust and the waist are usually small to medium.
• Inverted triangular figure type: the inverted triangle gives the opposite look. It appears wider above the waist and narrower below. The shoulders, the upper back, and the bust look prominent.
• Rectangular figure type: women with a so-called rectangular figure seem to have nearly the same width at shoulders, waist, and hips. Their waistline doesn't seem well defined and their body lines look straight.
• Tubular figure type: similar to the rectangular, but the weight is considerably below the average or ideal range.
• Diamond figure type: points up, with narrow shoulders and hips in combination with a wide midriff and waist. The midriff and upper hips do not appear to taper inward towards the waist.
• Oval or full-rounded figure: the weight is noticeably above the average and larger throughout the figure, where body lines are full-round curves [15].
Numerous fashion articles are written every year focusing either on the body or parts of the body, such as "C'est moi, ma personnalité, mon style" [5] or InStyle [1]. Each describes and shows two-dimensional figures and ways to improve upon them. InStyle, for example, talks about "curvy women", stating that these women should showcase their waist and curves without over-emphasizing them, and steer clear of anything too tight, clothes that are cut straight up-and-down, and fabrics that are thin. It provides tips to select garments that flatter the body, or part of it, for each body figure: short, narrow shoulders or broad shoulders, full bust or small bust, heavy arms, well defined tummy, short-waisted/long legs or long-waisted/short legs, bottom heavy, etc. For Rasband and Liechty [15], a garment can change the visual appearance of the body figure, even in areas where it may appear difficult. According to Rasband and Liechty [15], a garment line creates shape and form. Yet to fully take advantage of this wisdom, women need to know their body shapes.

2.2 3D Body Scanner

A 3D body scanner is the size of a fitting room. It uses cameras or safe lasers to capture up to 300,000 data points for each person's scan. The scanning process takes only a few seconds. Within a few minutes the software automatically extracts hundreds of body measurements. Data on body shape and body volume can also be automatically extracted. The resolution of the final scan is quite accurate. Data can be transferred directly from the scanner over local networks or the web (Shape Analysis Limited, 2008).
In the early stages of the 3D body scanning technology, many argued that this technology would mostly be used to provide custom fitting services. Many thought that it would bring consumers into the design and production stages, resulting in well-fitting, made-to-measure garments at competitive prices and turnaround times. Although it has not quite reached this point yet, 3D scanning has come to play an important role for some apparel retailers and producers. It contributes to mass customization by enabling retailers to rapidly collect three-dimensional (3D) data and to send it to manufacturers who tailor the garment to fit individuals [2]. In addition to custom fitting, 3D body scanning technology also improves the body measurement data used in traditional mass production [4]. Industry and academic researchers are beginning to use large amounts of anthropometric (body measurement) data captured by body scanners to adjust the sizing systems of ready-to-wear clothing lines in order to provide better fitted garments ([TC]2, 2004).
Another application of 3D body scanning is the virtual try-on. Consumers can now virtually try garments on. An individual's scan is visualized on a computer while clothing of various sizes is superimposed (in 3D) on a rotatable image (http://www.bodyscan.human.cornell.edu/scene0037.html, [2]). The computer application highlights areas of good and bad fit, helping the user to select the most appropriate product according to his or her body size and shape. Body scanning data will also increase the number and accuracy of measurements used in size prediction (the match between one's body and garments on offer). The
combination of virtual try-on with size prediction not only provides consumers with the brands and sizes that fit their measurements and proportions best, but also lets them virtually view garments on their scan and choose the design they prefer. This process combines objective fit information with fit preference.
Locker and Ashdown [12] noticed that commercial applications of body scanning (mass-customized clothing, improved ready-to-wear sizing systems, and virtual try-on) will only be viable if consumers agree to being scanned. They surveyed a group of women they scanned in the course of one of their studies, enquiring about their level of comfort with, and interest in, body scanning. The answer was resoundingly positive on both counts, regardless of size, age, or their satisfaction with the fit of available ready-to-wear pants. Almost all were willing to be scanned again and many were willing to be scanned every year or whenever their weight changed. They also found very positive reactions to commercial applications and research using body scan data. Participants found the virtual try-on application more appealing than custom-fit clothing or patterns, size prediction, or personal shopper applications. Women also selected virtual try-on as the most likely to influence them to buy more clothing on the Internet. Virtual try-on, custom-fitted clothing, and the creation of a "personal shopper" were rated highest in their potential contribution to finding clothing that looks good on the body; custom-fit and size prediction were rated highest in helping to find clothing that fits best. Participant confidence was also extremely high in the body scan data's applications as an effective way to obtain body measurements, an effective means to arrive at a good fit, and in improving the trustworthiness of an online screen image (Figure 1) of their own body over an idealized body shape (avatar).
Fig. 1. Adapted from Cornell Body Scan Research Group (http://www.bodyscan.human.cornell.edu/scene0037.html)
Another approach to provide the consumer with a "personal shopper" is the avatar, such as those offered by Lands' End and My Virtual Model (Figure 2). Lands' End customers enter their body measurements and select a virtual model with a similar body shape in order to visualize clothing styles through their on-line store. My Virtual Model supplies the virtual try-on web interface for Lands' End, Levi's, Kenneth Cole, and other on-line retailers (My Virtual Model Inc., 2008). According to Istook and Hwang [6], there is no doubt that scanners will become an important component of the shopping experience.
Fig. 2. My Virtual Model. Adapted from: http://www.mvm.com/brandme.php?id=10&lang_id=en
2.3 Styling Service Online

Styling services are now offered by cutting-edge Internet companies such as myShape.com, which boasts of having more than 20,000 women's measurements on file. This company offers women the possibility to shop from personalized clothing collections matching their style, fit preferences, and body shapes and sizes (myShape, 2008).
Fig. 3. The 7 body shapes. Adapted from: www.myshape.com
In Wannier (2006), myShape's chief executive states that the method seems to be working, particularly among women 35 and older. On the other hand, Mulpuru (2006), an analyst from Forrester Research, states that myShape's approach fails to gain a mass audience because the measuring process is too complicated. Only a small percentage of women would accept the site's offer to mail them a free tape measure, and fewer would go through the process of taking the measurements and logging them into the myShape system. One thing that might help myShape reach a mass audience would be if the company offered interested people a 3D body scan in malls or other locations, saving them the trouble of measuring themselves [13].
3 Methodology

Much previous research has looked into consumers' reactions to 3D body scanning, yet few studies combining 3D body scanning with a styling
service were found. Moreover, most were based solely on Western countries. A questionnaire was therefore developed to determine whether a potential market combining these two areas may exist in Asia, and more specifically Hong Kong.

3.1 Questionnaire Design

The questionnaire comprised 26 questions divided into seven sections. The first section looked into consumers' expectations of clothing; a better understanding of their thoughts and behavior provided a first input as to the need for the type of service this research is interested in. The second section tried to evaluate consumers' awareness and knowledge of their body measurements and figure. In the third section, the questionnaire focused on time as a factor in clothing selection. The fourth section evaluated consumers' difficulties in selecting clothing. The next section dealt with shopping habits. The next-to-last section investigated consumers' interest in using the 3D scanning-styling service should it be offered. Finally, the seventh section pertained to our respondents' socio-demographic characteristics.

3.2 Sampling and Data Analysis

A total of 128 women answered our questionnaire. The sample was a convenience one, as the questionnaire was distributed to teachers, classmates, friends and relatives over the first 3 months of 2008. SPSS and Excel were used to process and analyze the data collected. Besides using descriptive statistics and frequency distributions to describe the sample population, cluster analyses were used to break it into smaller, more homogeneous groups.
4 Results and Findings

The following section presents some of our findings on women's purchases and perception of 3D body scanning technology.

4.1 Women's Garment Purchases

As our literature review revealed, clothing serves different purposes. Figure 4 shows that, second to fulfilling a basic need, the HK women who participated in our survey stated that clothing should reflect their personalities and help them look beautiful. More than half of them believe that clothing helps build their self-esteem. This validates Leroi-Gourhan's [11] and Wolfe's [18] finding that people use clothes for three major reasons: physical, psychological and social. It also confirms that, as is the case with Westerners, clothing not only fulfills a need but also answers a "want" [15]. When asked where they purchase their clothes, 99 women chose mall stores and boutiques, and 68 answered that they bought them in stand-alone stores. None of them mentioned on-line shopping (the impossibility of feeling the material and seeing the garments was mentioned as the main reason, while the long store hours in HK reduced the need for on-line shopping).
Fig. 4. Justification for clothing purchases (responses to Q1, "What purpose does clothing serve?", multiple selection; "a basic need" was the most frequent answer at 105 responses)

Fig. 5. Garment use (responses to Q5, "Did you ever purchase a garment that you never or hardly ever wore?": Yes 35%, No 65%)
Interestingly, 35% of the women surveyed admitted to having purchased a garment which they hardly ever wore, as they felt it did not look good on them, did not advantage their body figure, or they did not feel confident when wearing it. All concerns mentioned had to do with image and psychological needs; none mentioned concerns about fit or comfort. While 80% of our sample admitted to searching for the garments most advantageous to their silhouette, 70% admitted they found it difficult to identify the style that accomplished this objective. To the question of the perceived time consumption of finding fitting clothes (1 being the lowest and 5 the highest), over 50% of women scored a 4 or 5 and 37% scored a 3; a result which clearly shows that women find the process time consuming. When asked how much time they spend choosing a garment, 49% stated that they spend 5 to 15 minutes to choose a casual garment and 28% that they spend 15 to 30 minutes. For the selection of party clothes, 49% of our sample stated spending between 15 and 30 minutes and 23% between 30 and 60 minutes. Almost 40% of the women stated they take between 15 and 30 minutes to decide what to wear for an interview. As one could expect, the time to choose a garment for a special event (such as a wedding) increases, with 56% spending two hours and 18% up to three hours. When trying to identify whether women were concerned with their body measurements and shapes (body figure), we found 50% of women scoring 4 or 5 (on a Likert scale where 1 identified a small extent and 5 a great extent). Our results also revealed that 62% of our sample had never heard about the "standard" body shapes before being shown the pictures taken from the literature review. Surprisingly, 77% were not sure which body shape represented them best.
4.2 3D Body Scanning

As we expected, women were curious about body figures and body scanning, although we were not sure whether Hong Kong females would be as "open" as Westerners and willing to be scanned. To the question as to how they would react to the possibility of being scanned in a retail store, 64% of women stated they would accept, in order to then compare their body shape to the "standard" ones. Nearly 35% of our sample stated that they would visit a retail store more often if it offered styling recommendations. Nearly 40% believed that styling recommendations would reduce the risks of buying "unsuitable" garments.
Through a clustering analysis, we found that 50.8% of our sampled women fell in the first group (aged 19 to 25, single, low income level, secondary or tertiary education level), which cared the most about their own body measurements. Our second group (15.8% of our sample, aged 26 to 32, single or married without children, middle income level, secondary or tertiary education level) cared much less. The third group (12.5% of our sample, aged 33 to 40, married with or without children, high income level, tertiary education level) did not seem to care about their body measurements. Lastly, the fourth group (20.8% of the sample, aged 26 to 35, married with children, high income level, tertiary education level) cared only marginally about their own body measurements.
A clustering analysis on the perceived time consumption of apparel shopping showed that 36.7% did not perceive apparel shopping as particularly time consuming (aged 19 to 25, single, low income level, secondary or tertiary education). The second group (29.2% of sample, aged 19 to 30, single or married without children, middle income level, tertiary education) stated that apparel shopping was time consuming. A third group (20.8% of sample, aged 26 to 40, married with children, high income level, tertiary education level) also found apparel shopping to be time consuming. The fourth group (13.3% of sample, aged 19 to 32, married with or without children, high income, tertiary education) found it highly time consuming.
A third clustering analysis was performed to identify willingness to go through a scanning process. In this case we identified a first group (39.2% of sample, aged 19 to 32, single, low income, secondary to tertiary education) that expressed no interest in trying the 3D body scanner. The second group (17.5% of sample, aged 19 to 25, single, low income level, tertiary education) expressed willingness to try the 3D body scanner. A third group (30.8% of sample, aged 26 to 40, married with or without children, high income, secondary to tertiary education) also expressed interest. The last group (12.5% of sample, aged 19 to 32, married with or without children, high income level, tertiary education) also expressed interest in trying the 3D body scanner.
Our fourth clustering analysis focused on the interest of our sample in paying for styling recommendations. Only one group (17.5% of sample, aged 26 to 40, married with or without children, high income, and tertiary education) stated that the provision of styling recommendations by a retail store would not influence their shopping patterns.
Lastly, we conducted a clustering analysis to try to understand the relationship between the wish for styling recommendations and the willingness to try the 3D body scan. A first group (39.2% of sample) finds apparel shopping time consuming and
spends relatively little on fashion; it would like to be offered styling recommendations but expresses no interest in 3D body scanning. A second group (10% of sample) does not perceive shopping as being time consuming and spends very little on fashion, yet is interested in trying the body scanner as well as being offered styling recommendations. A third group (25% of sample) was comprised of those who find shopping time consuming, spend moderately on fashion products, and are interested in trying the body scanner but will patronize a retail store because it offers styling recommendations. The last group (25.8% of sample) finds shopping time consuming yet spends relatively more on fashion; they are uncertain about their interest in trying the body scanner but will patronize a retailer offering styling recommendations.
5 Conclusions and Recommendations

Our results show that the group of consumers most interested in the body scan technology and in patronizing stores offering styling recommendations is comprised of individuals at the lower end of the fashion spending spectrum. This finding begs the question: is it worthwhile investing in 3D scanning technology and in providing styling recommendations? Unfortunately, our research does not enable us to determine whether a "free body scanning / styling recommendations" offer would impact the amount of money these customers spend on fashion items. Yet it clearly indicates that 82.5% of women would appreciate styling recommendations and 35% would agree to an in-store 3D body scan. This clearly represents an opportunity which should be investigated further.
References
1. Arbetter, L.: Style, Secrets of Style. In: Style (eds.) The complete guide to dressing your best every day, p. 191. Melcher Media, New York (2005)
2. Ashdown, S.P.: Research Group, Cornell University, About the Body Scanner (2006), http://www.bodyscan.human.cornell.edu/scene60df.html
3. Boucher, F.: Histoire du Costume en Occident des origines à nos jours, p. 478. Flammarion, Paris (1996)
4. Faust, M.-E., Carrier, S.: Discard one size fits all labels! New Size and Body Shapes labels are coming! Way to achieve Mass Customization in the apparel industry. In: Extreme Customization Mass Customization World Conference, (MIT) Cambridge/Boston & (HEC) Montreal (October 2007) (Book chapter TBP, 2009)
5. Hamel, C., Salvas, G.: C'est moi, ma personnalité, mon style, p. 310. Éditions Communiplex, Québec (1992)
6. Istook, C.L., Hwang, S.-J.: 3D body scanning systems with application to the apparel industry. Journal of Fashion Marketing and Management 5(2), 120–132 (2001)
7. Johnson, J.G., Foster, A.G.: Clothing image and impact. South-Western Publishing Co. (1990)
8. Jourard, S.M.: Personal adjustment; an approach through the study of healthy personality. Macmillan, New York (1958)
9. Kefgen, M., Touchie-Specht, P.: Individuality in clothing selection and personal appearance, 3rd edn. MacMillan Publishing Company, Basingstoke (1986)
10. LaBat, K.L., Delong, M.R.: Body Cathexis and Satisfaction with Fit of Apparel. Clothing and Textiles Research Journal 8(2), 43–48 (1990)
11. Leroi-Gourhan, A.: Milieu et techniques, Évolution et Techniques, p. 475, 198–241. Éditions Albin Michel, Paris (1973)
12. Locker, S., Cowie, L., Ashdown, S., Lewis, V.D.: Female consumers' reactions to body scanning. Clothing and Textiles Research Journal 22(4), 151–160 (2004)
13. Powell, T.: Body-scanning kiosk wows apparel shoppers. Selfserviceworld.com (2006), http://www.selfserviceworld.com/article.php?id=16541
14. Rasband, J.: Fabulous Fit, p. 176. Fairchild Publications, New York (1994)
15. Rasband, J.A., Liechty, E.L.G.: Fabulous Fit: Speed Fitting and Alteration, 2nd edn., p. 432. Fairchild Publications, New York (2006)
16. Secord, P.F., Jourard, S.M.: The appraisal of body-cathexis: Body-cathexis and the self. Journal of Consulting Psychology 17(5), 343–347 (1953)
17. Wendel, G., Lester, D.: Body-cathexis and self-esteem. Perceptual and Motor Skills, p. 538 (1988)
18. Wolfe, M.G.: Fashion! The Goodheart-Willcox Company, Inc., West Chester, Pennsylvania (2002)
Websites
− INTELLIFIT (2007). http://www.it-fits.info/IntellifitSystem.asp
− MyShape (2008). http://www.myshape.com/content/body_shapes
− My Virtual Model (2008). http://www.mvm.com/cs/
− Selfserviceworld.com (2006). http://www.selfserviceworld.com/article.php?id=16541
− Shape Analysis Limited (2008). http://www.shapeanalysis.com/prod01.htm
− Wannier (2006). http://www.myshape.com/content/body_shapes
Facial Shape Analysis and Sizing System

Afzal Godil

National Institute of Standards and Technology, 100 Bureau Dr, MS 8940, Gaithersburg, MD 20899, USA
[email protected]
Abstract. The understanding of the shape and size of human heads and faces is vital for the design of facial wear products, such as respirators, helmets and eyeglasses, and for ergonomic studies. 3D scanning is used to create 3D databases of thousands of humans from different demographic backgrounds. 3D scans have been used for the design and analysis of facial wear products, but have not been very effectively utilized for sizing systems. The 3D scans of human bodies contain hundreds of thousands of grid points. To be used effectively for analysis and design, these human heads require a compact shape representation. We have developed compact shape representations of head and facial shapes. We propose a sizing system based on cluster analysis along with compact shape representations to come up with different sizes for different facial wear products, such as respirators, helmets, eyeglasses, etc. Keywords: Anthropometry, shape descriptor, cluster analysis, PCA.
1 Introduction

The understanding of the shape and size of human heads and faces is vital for the design of facial wear products, such as respirators, helmets and eyeglasses, and for ergonomic studies. With the emergence of 3D laser scanners, there have been large-scale surveys of humans around the world, such as the CAESAR anthropometric database. The 3D scans of human bodies contain hundreds of thousands of grid points. To be used effectively for analysis and design, these human bodies require a compact shape representation. We have developed two such compact representations of human head shape: in the first, Principal Component Analysis is applied to the facial surface; in the second, the whole head is transformed to a spherical coordinate system and expanded in a basis of spherical harmonics. Then we use cluster analysis on these shape descriptors, along with a chosen number of clusters, to come up with a sizing system for different products, such as facial respirators, eyeglasses, helmets, and so on.
Cluster analysis is a technique for extracting implicit relationships or patterns by grouping related shape descriptors. A cluster is a collection of objects that are similar to one another and dissimilar to the objects in other clusters. There are a number of clustering techniques, but we have only tried the k-means and k-medians techniques. Paquet et al. [9] have used cluster analysis for adjusting the sizes of virtual mannequins using anthropometric data.
Facial respirators are used by millions of people around the world to reduce their risk from diseases and harmful or hazardous airborne agents. At the heart of their effectiveness is the seal of the respirator, which mainly depends on the fit and prevents harmful gases and particulates from entering the wearer's respiratory system. The Los Alamos National Laboratory (LANL) fit test panel, developed in the 1970s, is based on an anthropometric survey conducted in 1967 of Air Force personnel and is still the standard for today's respirator fit tests. The National Institute for Occupational Safety and Health (NIOSH) conducted a new survey in 2001, entitled the NIOSH Head-and-Face Anthropometric Survey of U.S. Respirator Users, to produce a more accurate picture of the civilian workforce. Subsequent analysis of the survey revealed that the LANL panels were in fact not representative of most respirator users in the U.S.: out of the 3997 respirator users in the survey, 15.3% were outside of the LANL fit test panel. Although the fit test panel is in the process of being updated, the core of these fit tests is still traditional anthropometric measures, which simplify the complexity of the shape of the human face. Today most manufacturers supply half and full facial mask respirators based on facial groupings derived from the above surveys. However, many researchers have shown that there is little or no correlation between facial dimensions and the fit of half-mask respirators [10]. Hence the respirator shape with the best seal fit can only be achieved by using the full 3D facial data.
In this paper, we first describe the CAESAR database, then the different compact shape descriptors used to represent facial and head shape. Finally, we discuss cluster analysis with these shape descriptors as the basis of a sizing system for facial wear products.
2 CAESAR Database

The CAESAR (Civilian American and European Surface Anthropometry Resource) project has collected 3D scans, seventy-three anthropometric landmarks, and traditional measurement data for each of its 5000 subjects. The objective of this study was to represent, in three dimensions, the anthropometric variability of the civilian populations of Europe and North America, and it was the first successful anthropometric survey to use 3D scanning technology. The CAESAR project employs both 3D scanning and traditional tools for body measurements for people aged 18–65. A typical CAESAR body is shown in Figure 1. The seventy-three anthropometric landmark points were extracted from the scans as shown in Figure 2. These landmark points are pre-marked by pasting small stickers on the body and are automatically extracted using landmark software. There are around 250,000 points in each surface grid on a body, and the points are distributed uniformly.
Fig. 1. A CAESAR body with three postures
Fig. 2. A CAESAR body with landmark numbers and positions
3 Head Shape Descriptor

We now describe two methods for creating descriptors based on human head shape.

3.1 PCA Based

The first shape descriptor is based on applying principal component analysis (PCA) to the 3D facial grid; the most significant eigenvectors form the shape descriptor. PCA is a statistical technique to reduce the dimensionality of a data set, and it has also been applied to face recognition. First we use four anthropometric landmark points on the face from the database to properly position and align the face surface and then interpolate the surface information
on a regular rectangular grid whose size is proportional to the distance between the landmark points. The grid size is 128 in both directions. Next we perform principal component analysis (PCA) on the set of regular 3D surface grids to create the PCA-based shape descriptor. The facial grid is cut from the whole CAESAR body grid using landmark points 5 and 10, as shown in Figure 3 and listed in Table 1. Table 1 lists the numbers and names of all landmark points used in our 3D face shape descriptor. The newly generated facial grid for some of the subjects, from two different views, is shown in Figure 4. The facial grid is very coarse for some of the subjects in the seated pose.
Fig. 3. Landmark points 1, 2, 3, 4, 5 and 10. Vertical and horizontal lines are the cutting plane.
Table 1. Numbers and names of landmark points used in our 3D face

1  Sellion             2  Rt. Infraorbitale
3  Lt. Infraorbitale   4  Supramenton
5  Rt. Tragion         6  Rt. Gonion
7  Lt. Tragion         8  Lt. Gonion
10 Rt. Clavicale       12 Lt. Clavicale
Next, we use four anthropometric landmark points (L1, L2, L3, L4), located on the facial surface and shown in Figure 3, to properly position and align the face surface using an iterative method. There is some error in alignment and position because of error in the measurement of the positions of these landmark points. Then we interpolate the facial surface information on a regular rectangular grid whose size is proportional to the distance between the landmark points L2 and L3 (d = |L3 − L2|) and whose grid size is 128 in both directions. We use cubic interpolation and handle missing values with the nearest-neighbor method when there are voids in the original facial grid. For some of the subjects there are large voids in the facial surface grids. Figure 4 shows the facial surface and the new rectangular grid.
Fig. 4. The new facial rectangular grid for two subjects
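The resampling step just described can be sketched with SciPy's scattered-data interpolation. This is not the authors' code, and the toy data below merely stands in for an aligned facial scan; cubic interpolation leaves NaNs in the voids, which are then filled with the nearest-neighbor method as in the text.

```python
import numpy as np
from scipy.interpolate import griddata

def resample_face(points_xy, depth, d, n=128):
    """Interpolate scattered facial points onto a regular n x n grid
    whose extent is proportional to the landmark distance d."""
    xs = np.linspace(-d, d, n)
    grid_x, grid_y = np.meshgrid(xs, xs)
    z = griddata(points_xy, depth, (grid_x, grid_y), method='cubic')
    holes = np.isnan(z)               # voids left by cubic interpolation
    if holes.any():
        z[holes] = griddata(points_xy, depth,
                            (grid_x[holes], grid_y[holes]),
                            method='nearest')
    return z

# Toy usage: random scattered points standing in for a scan.
rng = np.random.default_rng(0)
pts = rng.uniform(-100.0, 100.0, size=(5000, 2))
z = np.sin(pts[:, 0] / 30.0) * np.cos(pts[:, 1] / 30.0)
print(resample_face(pts, z, d=100.0).shape)   # (128, 128)
```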
We properly positioned and aligned the facial surface and then interpolated the surface information on a regular rectangular grid whose size is proportional to the distance between the landmark points. Next we perform Principal Component Analysis (PCA) on the 3D surface, and similarity-based descriptors are created. In this method the head descriptor is based only on the facial region. The PCA recognition method is a nearest-neighbor classifier operating in the PCA subspace. To test how well the PCA-based descriptor performs, we studied the identification between 200 standing and sitting subjects. The CMC at rank 1 for the study is 85%. More details about this descriptor are given in [3, 5].

3.2 Spherical Harmonics Based

In the second method the 3D triangular grid of the head is transformed to a spherical coordinate system by a least-squares approach and expanded in a spherical harmonic basis, as shown in Figure 5. The main advantage of the spherical harmonics based head descriptor is that it is orientation and position independent. The spherical harmonics based descriptor is then used with the L1 and L2 norms to create a similarity measure. To test how well the spherical harmonics based head descriptor performs, we studied the identification of the human head between 220 standing and sitting subjects. The CMC at rank 1 for the study is 94%.
Fig. 5. 3D head grid is mapped into a sphere
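A minimal sketch of a PCA-based descriptor of the kind used in Sect. 3.1 is given below. It is our own illustration, not the code used in the paper, and the choice of k = 20 components is an assumption: each aligned 128 x 128 facial grid is flattened, the data is centered, and the projections onto the top-k principal directions serve as the compact descriptor.

```python
import numpy as np

def pca_descriptors(grids, k=20):
    """grids: (n_subjects, 128, 128) aligned facial depth grids.
    Returns the mean face, top-k eigenfaces and per-subject descriptors."""
    n = grids.shape[0]
    X = grids.reshape(n, -1).astype(float)     # flatten each grid
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigenfaces = Vt[:k]                        # principal directions
    descriptors = Xc @ eigenfaces.T            # (n_subjects, k)
    return mean, eigenfaces, descriptors

# Toy usage with synthetic grids standing in for CAESAR faces.
rng = np.random.default_rng(1)
grids = rng.normal(size=(200, 128, 128))
_, _, desc = pca_descriptors(grids)
print(desc.shape)  # (200, 20)
```

Nearest-neighbor identification then amounts to comparing these k-dimensional descriptors with a distance such as the Euclidean norm.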
4 Cluster Analysis

We have used the compact face descriptors for clustering. Clustering is the process of organizing a set of faces/heads into groups in such a way that the faces/heads within a group are more similar to each other than they are to those belonging to different clusters. We use k-means cluster analysis, along with the shape descriptors and the chosen number of clusters, to come up with sizes for product designs such as respirators. The k-means algorithm clusters n objects, based on their attributes, into k partitions, k < n. It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that both attempt to find the centers of natural clusters in the data. It minimizes the total intra-cluster variance

V = \sum_{i=1}^{k} \sum_{x_j \in S_i} (x_j - \mu_i)^2    (1)

where S_1, ..., S_k are the clusters and \mu_i is the centroid of the points x_j in S_i.
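For illustration, a plain k-means loop over descriptor vectors looks as follows. This sketch is not the authors' implementation (they may well have used a statistics package), and the toy data is random.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's k-means on descriptors X of shape (n, d); it minimizes
    the intra-cluster variance of Eq. (1)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign every descriptor to its nearest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each centroid as the mean of its members.
        new = np.array([X[labels == i].mean(axis=0)
                        if np.any(labels == i) else centroids[i]
                        for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Toy usage: 200 twenty-dimensional descriptors grouped into four sizes.
rng = np.random.default_rng(2)
labels, _ = kmeans(rng.normal(size=(200, 20)), k=4)
print(np.bincount(labels))
```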
5 Results

For this initial study, we have used the facial surfaces of the first 200 standing subjects from the CAESAR database. The PCA-based shape descriptor is calculated for these faces and then k-means clustering is applied, with the number of clusters set to four. Figure 6 shows 40 of the 200 faces that were used for clustering; the four different colors show the different clusters. When doing cluster analysis with the facial shape descriptor, we can vary the emphasis on shape versus size. We should emphasize that these results based on cluster analysis are preliminary.
Fig. 6. 40 of the 200 faces that were used for clustering. The four different colors show the different clusters.
6 Conclusion

We have developed compact shape representations of head and facial shape. We have proposed a sizing system based on cluster analysis along with compact shape representations to come up with different sizes for different facial wear products, such as respirators, helmets, eyeglasses, etc. We also present our preliminary results based on clustering analysis.
Disclaimer. The identification of any commercial product or trade name does not imply endorsement or recommendation by the National Institute of Standards and Technology (NIST). Also, the findings and conclusions in this report are those of the authors and do not necessarily represent the views of NIST.
References
1. Allen, B., Curless, B., Popovic, Z.: Exploring the space of human body shapes: data-driven synthesis under anthropometric control. In: Proc. Digital Human Modeling for Design and Engineering Conference, Rochester, MI, June 15-17. SAE International (2004)
2. CAESAR: Civilian American and European Surface Anthropometry Resource web site, http://www.hec.afrl.af.mil/cardlab/CAESAR/index.html
3. Godil, A., Ressler, S.: Retrieval and Clustering from a 3D Human Database based on Body and Head Shape. In: SAE Digital Human Modeling Conference, Lyon, France (2006)
4. Godil, A., Grother, P., Ressler, S.: Human Identification from Body Shape. In: Proceedings of 4th IEEE International Conference on 3D Digital Imaging and Modeling, Banff, Canada, October 6-10 (2003)
5. Godil, A., Ressler, S., Grother, P.: Face Recognition using 3D surface and color map information: Comparison and Combination. In: SPIE Symposium on Biometrics Technology for Human Identification, Orlando, FL, April 12-13 (2004)
6. Ip, H.H.S., Wong, W.: 3D Head Model Retrieval Based on Hierarchical Facial Region Similarity. In: Proc. of 15th International Conference on Visual Interface (VI 2002), Canada (2002)
7. Paquet, E.: Exploring Anthropometric Data Through Cluster Analysis. In: Digital Human Modeling for Design and Engineering (DHM), Oakland University, Rochester, Michigan, USA, NRC 46564, June 15-17 (2004)
8. Paquet, E., Rioux, M.: Anthropometric Visual Data Mining: A Content-Based Approach. In: IEA 2003 - International Ergonomics Association XVth Triennial Congress, Seoul, Korea, NRC 44977 (Submitted, 2003)
9. Paquet, E., Viktor, H.L.: Adjustment of Virtual Mannequins through Anthropometric Measurements, Cluster Analysis and Content-based Retrieval of 3-D Body Scans. IEEE Transactions on Instrumentation and Measurement 56(5), 1924–1929 (2007), NRC 48821
10. Yang, L., Shen, H.: A pilot study on facial anthropometric dimensions of the Chinese population for half-mask respirator design and sizing. International Journal of Industrial Ergonomics 38(11-12), 921–926 (2008)
11. Zheng, R., Yu, W., Fan, J.: Development of a new Chinese bra sizing system based on breast anthropometric measurements. International Journal of Industrial Ergonomics 37(8), 697–705 (2007)
Facial Gender Classification Using LUT-Based Sub-images and DIE

Jong-Bae Jeon, Sang-Hyeon Jin, Dong-Ju Kim, and Kwang-Seok Hong

School of Information and Communication Engineering, Sungkyunkwan University, 300, Chunchun-dong, Jangan-gu, Suwon, Kyungki-do, 440-746, Korea
[email protected], [email protected], [email protected], [email protected]
Abstract. This paper presents a gender classification method using LUT-based sub-images and DIE (Difference Image Entropy). The proposed method consists of three major steps: extraction of facial sub-images, construction of a LUT (Look-Up Table), and calculation of DIE. Firstly, extraction of sub-images of the face, right eye, and mouth from face images is conducted using Haar-like features and AdaBoost, as proposed by Viola and Jones. Secondly, the sub-images are converted using a LUT; LUT-based sub-regions are constructed by calculations on each pixel and its neighboring pixels. Finally, sub-images are classified as male or female using DIE. The DIE value is computed with the histogram levels of a grayscale difference image, which has peak positions from -255 to +255, to prevent information sweeping. The performance evaluation is conducted using five standard databases, i.e., the PAL, BioID, FERET, PIC, and Caltech facial databases. The experimental results show good performance in comparison with earlier methods. Keywords: Gender Classification, Difference Image Entropy.
1 Introduction

Biometrics such as facial structure, fingerprints, iris structure, and voice can be used in many applications in fields such as human-computer interaction, multimedia, security systems, and gate entrances. Over time, facial image processing has become a focal point of many researchers' attention. Facial images carry a lot of information, including information about gender, age, expression, and ethnic origin. In this paper a method for gender classification is proposed. Not surprisingly, a lot of research on facial gender classification has been done by researchers in the field of computer science.
Our gender classification method consists of three major steps: extraction of facial sub-images, construction of a LUT, and calculation of DIE. We propose a new gender classification system using Shannon's entropy-based DIE and LUT-based sub-images. The difference images are computed by pixel subtraction between input images and average images derived from reference images. For the performance evaluation of the proposed method, we use five standard facial databases, i.e., PAL [1], BioID [2], FERET [3], PIC [4], and Caltech [5]. In addition, the proposed method is compared to a method using Euclidean distance with PCA and to a method using the Sobel image among the edge detection methods.
This paper is organized as follows. In Section 2, we review related work on gender classification. Section 3 describes the extraction of sub-images of the face, eye, and mouth, the basic concepts of Difference Image Entropy (DIE) and LUT-based sub-images, and the proposed facial gender classification method. The experimental results are described in Section 4. Finally, we draw conclusions in Section 5.
2 Related Work

Earlier work on gender classification mainly originated in psychology and cognition research. Recently, the problem has been considered more technically, and several methods have been proposed for solving it; among them, systems based on neural networks, PCA, decision trees, SVM, and AdaBoost classifiers can be mentioned [6]. Shakhnarovich et al. [7] proposed an AdaBoost-based gender classification method that achieved even better performance than SVM. In [8], a gender classification system is proposed based on the SVM classifier. Other work includes Wu et al.'s LUT-based AdaBoost method, which implemented a real-time gender classification system with comparable performance [9]. In 1948, Shannon introduced a general uncertainty measure on random variables that takes the different probabilities among states into account [10]. Given events occurring with probabilities p_i, the Shannon entropy is defined as Eq. (1):

H = \sum_{i=1}^{m} p_i \log \frac{1}{p_i} = -\sum_{i=1}^{m} p_i \log p_i    (1)
Shannon's entropy can also be computed for an image, where the probabilities of the gray-level distribution are used in the Shannon entropy formula. A probability distribution of gray values can be estimated by counting the number of times each gray value occurs in the image and dividing by the total number of occurrences. In [11], Shannon entropy is likewise used as a measure of dispersion of a probability distribution, although that system applies entropy to face localization. Recently, we proposed DIE-based teeth verification [12] and DIE-based teeth recognition [13].
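As a small illustration of Eq. (1) applied to an image (a minimal sketch, not code from the paper; `img` is assumed to be any 8-bit grayscale array):

```python
import numpy as np

def shannon_entropy(img: np.ndarray) -> float:
    """Shannon entropy of the gray-level distribution of an 8-bit image."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()      # estimated gray-level probabilities
    p = p[p > 0]               # terms with p = 0 contribute nothing
    return float(-np.sum(p * np.log2(p)))
```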
3 A Proposal for Facial Gender Classification

The architecture of the DIE-based facial gender classification system consists of three steps. First, sub-images are extracted using Haar-like features and the AdaBoost algorithm. Second, the extracted sub-images are converted into LUT-based sub-images. Third, DIE is computed between the input sub-image and the male and female average sub-images. Finally, the gender with the minimum DIE value is selected by comparing the DIE values, and the result is returned to the user. The system flow chart of the DIE-based verification system using LUT-based images is shown in Fig. 1.
Fig. 1. Block Diagram of gender classification
3.1 Extraction of Sub-images from Original Image

In this paper, we extract three sub-regions from facial images, as illustrated in Fig. 2. The detection of face, right eye and mouth regions uses Haar-like features and the AdaBoost method introduced by Viola and Jones. The extracted sub-images are resized: the face to 80 × 80 pixels, the right eye to 40 × 40 pixels, and the mouth to 50 × 30 pixels.

3.2 LUT-Based Sub-images

A LUT is a data structure, usually an array or associative array, used to replace a runtime computation with a simpler array indexing operation. The LUT used in the proposed method is defined by Eqs. (2)-(5); it is computed by pixel subtraction between the grayscale value of each pixel and the average value of three pixels around it.
Fig. 2. Extraction of three sub-images from original image
LUT[y][x] = OI[y][x] - (OI[y][x+1] + OI[y+1][x] + OI[y+1][x+1]) / 3,  0 \le x \le w-2, 0 \le y \le h-2    (2)

LUT[y][w-1] = OI[y][w-1] - (OI[y][w-2] + OI[y+1][w-2] + OI[y+1][w-1]) / 3,  0 \le y \le h-2    (3)

LUT[h-1][x] = OI[h-1][x] - (OI[h-2][x] + OI[h-2][x+1] + OI[h-1][x+1]) / 3,  0 \le x \le w-2    (4)

LUT[h-1][w-1] = OI[h-1][w-1] - (OI[h-2][w-2] + OI[h-2][w-1] + OI[h-1][w-2]) / 3    (5)
In the above equations, OI is the original sub-image, y and x are the height and width indexes, and w and h are the width and height of the sub-image, respectively. Finally, a gray value of 100 is added to each pixel of the LUT-based sub-image, as shown in Eq. (6); this 100-gray offset is an experimental value found to be most suitable for gender classification.

Img[y][x] = LUT[y][x] + 100,  0 \le x \le w-1, 0 \le y \le h-1    (6)
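A minimal sketch of Eqs. (2)-(6): each LUT pixel is the original pixel minus the mean of its right, lower, and lower-right neighbours (with the mirrored neighbourhoods of Eqs. (3)-(5) on the last row and column), shifted by the 100-gray offset. This is an illustration under the assumption that `oi` is a 2-D 8-bit array; integer division and the final clipping are implementation choices, not from the paper.

```python
import numpy as np

def lut_image(oi: np.ndarray) -> np.ndarray:
    oi = oi.astype(np.int32)
    h, w = oi.shape
    lut = np.empty_like(oi)
    # Eq. (2): interior pixels
    lut[:h-1, :w-1] = oi[:h-1, :w-1] - (
        oi[:h-1, 1:] + oi[1:, :w-1] + oi[1:, 1:]) // 3
    # Eq. (3): last column
    lut[:h-1, w-1] = oi[:h-1, w-1] - (
        oi[:h-1, w-2] + oi[1:, w-2] + oi[1:, w-1]) // 3
    # Eq. (4): last row
    lut[h-1, :w-1] = oi[h-1, :w-1] - (
        oi[h-2, :w-1] + oi[h-2, 1:] + oi[h-1, 1:]) // 3
    # Eq. (5): bottom-right corner
    lut[h-1, w-1] = oi[h-1, w-1] - (
        oi[h-2, w-2] + oi[h-2, w-1] + oi[h-1, w-2]) // 3
    # Eq. (6): add the 100-gray offset, clipped to the valid range
    return np.clip(lut + 100, 0, 255).astype(np.uint8)
```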
3.3 Difference Image Entropy

Difference Image Entropy is computed from the histogram levels of a grayscale difference image, which range from -255 to +255, to prevent information loss. The average image of the M reference sub-images is given in Eq. (7):

S_{average} = \frac{1}{M} \sum_{m=1}^{M} S_m(x, y)    (7)

In Eq. (7), S_m(x, y) denotes the m-th reference image. The difference image D_{diff} is defined in Eq. (8); it is computed by pixel subtraction between the input sub-image I_{input} and the average sub-image S_{average} built from randomly collected gender reference images:

D_{diff} = I_{input} - S_{average}    (8)

The DIE E_g is defined in Eq. (9), where P_k denotes the probability of the k-th histogram level of the difference image:

E_g = -\sum_{k=-255}^{255} P_k \log_2 P_k = \sum_{k=-255}^{255} P_k \log_2 \frac{1}{P_k}    (9)

The probability P_k is defined in Eq. (10), where a_k is the frequency of histogram level k (from -255 to +255), and G(T), the total count over all histogram levels of the difference image, is given in Eq. (11):

P_k = \frac{a_k}{G(T)}    (10)

G(T) = \sum_{k=-255}^{255} a_k    (11)
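A minimal sketch of Eqs. (7)-(11) and the minimum-DIE decision rule of Fig. 1 (illustrative only; function and variable names are assumptions, not from the paper):

```python
import numpy as np

def average_image(refs):
    """Eq. (7): pixel-wise average of the M reference sub-images."""
    return np.mean(np.stack([r.astype(float) for r in refs]), axis=0)

def die(input_img, avg_img):
    """Eqs. (8)-(11): DIE of the difference image over levels -255..+255."""
    diff = np.rint(input_img.astype(float) - avg_img).astype(int)  # Eq. (8)
    hist = np.bincount((diff + 255).ravel(), minlength=511).astype(float)
    p = hist / hist.sum()                 # Eqs. (10)-(11)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))  # Eq. (9)

def classify_gender(input_img, male_avg, female_avg):
    """The class whose average sub-image yields the smaller DIE wins."""
    return "male" if die(input_img, male_avg) < die(input_img, female_avg) else "female"
```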
4 Experiments and Results

4.1 Facial Databases

Experiments for gender classification involved five standard databases of facial images, i.e., FERET, PIC, BioID, Caltech, and PAL. Sample frontal-face images are shown in Fig. 3. We used a total of 3,655 frontal face images: 2,078 males and 1,577 females. We used 660 images from PIC (438 males, 222 females), 705 images from FERET (400 males, 305 females), 1,270 images from BioID (746 males, 524 females), 440 images from Caltech (266 males, 174 females), and 580 images from PAL (228 males, 352 females).
Fig. 3. Samples from facial databases: (a) PAL, (b) FERET, (c) PIC, (d) BioID, (e) Caltech
To make the average images for males and females, we used 1,040 male and 790 female images, respectively. Also, we used 1,038 male images and 787 female images to evaluate the performance of the gender classification system. Extraction of face, right eye, and mouth regions from facial images is performed using Haar-like features and the AdaBoost algorithm. Facial regions are detected using the frontal face cascade; this face detector is scaled to 24 × 24 pixels. The right-eye detector was trained by ourselves: the training process was implemented in a Microsoft Visual C++ 6.0 environment and simulated on a Pentium 2.6 GHz machine, and the resulting detector is scaled to 24 × 12 pixels. We used 5,254 positive images of the right eye region and 10,932 negative images of the background. The training set consisted of the five face databases and our own facial images acquired by webcam, as shown in Fig. 4.
Fig. 4. Left: Right eye images (positive). Right: Non-right eye images (negative).
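The detection step can be sketched with OpenCV's stock Viola-Jones cascades. Note the assumptions: the authors' own right-eye and mouth detectors are not publicly available, so the stock face and eye cascades below are stand-ins, and the resizing follows the sizes given in Sect. 3.1.

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def extract_subimages(bgr_image):
    """Return the 80x80 face and 40x40 eye sub-images, or None if no face."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    roi = gray[y:y + h, x:x + w]
    face = cv2.resize(roi, (80, 80))
    eyes = eye_cascade.detectMultiScale(roi)
    eye = None
    if len(eyes) > 0:
        ex, ey, ew, eh = eyes[0]
        eye = cv2.resize(roi[ey:ey + eh, ex:ex + ew], (40, 40))
    return face, eye
```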
In order to perform the gender classification experiments, the original facial images must be preprocessed. The preprocessing step is described below. The extracted sub-images are converted into grayscale images and normalized using histogram equalization to minimize the effect of illumination. Next, Sobel edge images and LUT-based images are generated in gray level. Figure 5 shows male and female grayscale sub-images together with the corresponding Sobel edge detected and LUT-based sub-images.
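A sketch of these three preprocessing variants (OpenCV calls only; `lut_image` refers to the Eq. (2)-(6) sketch given after Sect. 3.2, and the Sobel magnitude computation is one common choice, not specified by the paper):

```python
import cv2
import numpy as np

def preprocess(sub_image_gray):
    """Return grayscale, Sobel edge, and LUT-based versions of a sub-image."""
    gray = cv2.equalizeHist(sub_image_gray)   # normalize illumination
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    sobel = np.uint8(np.clip(np.hypot(gx, gy), 0, 255))  # edge magnitude
    return gray, sobel, lut_image(gray)       # lut_image: Sect. 3.2 sketch
```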
Fig. 5. The preprocessed sub-images: (a) grayscale images, (b) Sobel edge detected images, (c) LUT-based images
Figure 6 shows the average sub-images for grayscale, Sobel edge detected and LUT-based images. These average images are used to compute the DIE values with the input images.
Fig. 6. The average images of sub-images: (a) grayscale images, (b) Sobel edge detected images, (c) LUT-based images
4.2 Comparative Experiments

We tested four methods of facial gender classification. The first method used grayscale-based sub-images and DIE for each sub-region. The second method used Sobel edge images of the detected sub-images and DIE. Third, the LUT-based sub-images and DIE method proposed in this paper was tested for each sub-region. Lastly, a method using Principal Component Analysis (PCA) with Euclidean distance for each sub-region was conducted as a comparative experiment. The results of facial gender classification for the sub-images are shown in Tables 1-3: Table 1, Table 2, and Table 3 give results for the facial region, right eye region, and mouth region, respectively.
Table 1. Gender classification results for facial images

Method                           Actual gender   Classified male      Classified female    Total
Grayscale image + DIE            Male            646/1038 (62.23%)    392/1038 (37.77%)    71.6%
                                 Female          126/787 (16.1%)      661/787 (83.9%)
Sobel edge detected image + DIE  Male            203/1038 (19.5%)     835/1038 (80.5%)     52.7%
                                 Female          27/787 (3.5%)        760/787 (96.5%)
LUT-based image + DIE            Male            815/1038 (78.5%)     223/1038 (21.5%)     74.3%
                                 Female          246/787 (31.3%)      541/787 (68.7%)
PCA + Euclidean                  Male            487/1038 (46.9%)     551/1038 (53.1%)     61.6%
                                 Female          150/787 (19.1%)      637/787 (80.9%)
Table 2. Gender classification results for right eye images

Method                           Actual gender   Classified male      Classified female    Total
Grayscale image + DIE            Male            743/1011 (73.4%)     268/1011 (26.6%)     64.9%
                                 Female          354/765 (46.2%)      411/765 (53.8%)
Sobel edge detected image + DIE  Male            870/1011 (86.5%)     141/1011 (13.5%)     55%
                                 Female          658/765 (86%)        107/765 (14%)
LUT-based image + DIE            Male            691/1011 (68.3%)     320/1011 (31.7%)     61.2%
                                 Female          369/765 (48.2%)      396/765 (51.8%)
PCA + Euclidean                  Male            589/1011 (58%)       422/1011 (42%)       56.5%
                                 Female          350/765 (45.7%)      415/765 (54.3%)
The experimental results demonstrate the advantage of DIE. LUT-based sub-images are also better than grayscale sub-images and Sobel edge detected sub-images: this method achieved an overall classification rate of 74.3% for the facial region, while the first, second, and fourth methods showed classification rates of 71.6%, 52.7%, and 61.6% for the facial region, respectively. For the right eye region and mouth region, the grayscale-based DIE method performed best, at 64.9% and 64.8%, respectively. We can confirm two main results. First, the facial region gave better performance than the right eye region and the mouth region. Second, the proposed LUT-based sub-images with DIE are generally better than the other three methods.
Table 3. Gender classification results for mouth images

Method                           Actual gender   Classified male      Classified female    Total
Grayscale image + DIE            Male            606/1038 (58.3%)     432/1038 (41.7%)     64.8%
                                 Female          209/787 (26.5%)      578/787 (73.5%)
Sobel edge detected image + DIE  Male            1030/1038 (99.2%)    8/1038 (0.8%)        56.9%
                                 Female          777/787 (98.7%)      10/787 (1.3%)
LUT-based image + DIE            Male            241/1038 (23.2%)     797/1038 (76.8%)     38.7%
                                 Female          320/787 (40.6%)      467/787 (59.4%)
PCA + Euclidean                  Male            473/1038 (45.5%)     565/1038 (54.5%)     59.5%
                                 Female          174/787 (22%)        613/787 (78%)
5 Conclusions

In this paper, a method to classify whether an input face image is male or female was proposed using DIE and LUT-based sub-images. We conducted gender experiments on sub-images with four gender classification methods. In the comparative experiments, the proposed LUT-based sub-images with DIE showed better performance than the remaining methods, with a classification rate of 74.3% for the facial region. From this result, we confirm that DIE-based methods give reliable gender classification results. The gender classification system is expected to be applied to live applications in the field. In the future, it will be necessary to research more robust gender classification techniques covering more variation in rotation, illumination, and other factors. Although we discussed DIE for gender classification, it can also be applied to facial expression and age classification. Also, more effort should be devoted to the combination of DIE with other pattern recognition algorithms.
Acknowledgment This research was supported by MIC, Korea under ITRC IITA-2008-(C1090-08010046), and the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korean government (MEST) (No. 2008-000-10642-0).
References
1. Minear, M., Park, D.C.: A lifespan dataset of adult facial stimuli. Behavior Research Methods, Instruments & Computers 36(4), 630–633 (2004)
2. http://www.bioid.com/downloads/facedb/index.php
3. http://www.frvt.org/FERET/default.htm
4. http://PICS.psych.stir.ac.uk
5. http://www.vision.caltech.edu/html-files/archive.htm
6. Amin, T., Hatzinakos, D.: A Correlation Based Approach to Human Gait Recognition. In: Biometrics Symposium, pp. 1–6 (2007)
7. Moghaddam, B., Ming-Hsuan, Y.: Learning Gender with Support Faces. IEEE Trans. Pattern Analysis and Machine Intelligence 24(5), 707–711 (2002)
8. Shakhnarovich, G., Viola, P., Moghaddam, B.: A Unified Learning Framework for Real Time Face Detection and Classification. In: IEEE Conf. on Automatic Face and Gesture Recognition 2002, pp. 14–21 (2002)
9. Wu, B., Ai, H., Huang, C.: LUT-based AdaBoost for Gender Classification. In: Kittler, J., Nixon, M.S. (eds.) AVBPA 2003. LNCS, vol. 2688, pp. 104–110. Springer, Heidelberg (2003)
10. Shannon, C.E.: A Mathematical Theory of Communication. The Bell System Technical Journal 27, 379–423 (1948)
11. Alirezaee, S., Aghaeinia, H., Faez, K., Askari, F.: An Efficient Algorithm for Face Localization. International Journal of Information Technology 12(7), 30–36 (2006)
12. Jeon, J.-B., Kim, J.-H., Yoon, J.-H., Hong, K.-S.: Teeth-Based Biometrics and Image Selection Method Using Difference Image Entropy. In: The 9th International Workshop on Information Security Applications (2008)
13. Jeon, J.-B., Kim, J.-H., Yoon, J.-H., Hong, K.-S.: Performance Evaluation of Teeth Image Recognition System Based on Difference Image Entropy. In: IEEE Conf. on ICCIT 2008, vol. 2, pp. 967–972 (2008)
Anthropometric Measurement of the Hands of Chinese Children

Linghua Ran, Xin Zhang, Chuzhi Chao, Taijie Liu, and Tingting Dong

China National Institute of Standardization, Zhichun Road, 4, Haidian District, Beijing 100088, China
[email protected]
Abstract. This paper presents the results of a nationwide anthropometric survey conducted on children in China. Eight hand anthropometric dimensions were measured for 20,000 children aged from 4 to 17 years old. Mean values, standard deviations, and the 5th and 95th percentiles of each dimension were estimated. Differences between age groups and genders, as well as differences between Chinese and Japanese children, were analyzed. It was found that the mean values of the dimensions increased gradually with age. The dimensions showed no significant gender difference for children from 4 to 12, but the difference became significant for children from 13 to 17. Comparison between Chinese and Japanese children showed that Chinese children tend to have relatively longer and broader hands than Japanese children. These data, previously lacking in China, can benefit the design of children's products. Keywords: Hand; anthropometric measurement; Chinese children.
1 Introduction

Anthropometric data are essential for the correct design of various facilities; without such data, designs cannot fit people properly. This is especially true for children. The comfort and functional utility of workspaces, equipment and products designed based on anthropometric data are related to children's health and safety. Many anthropometric studies have been undertaken to determine the size of children [1][2][3][4]. In China, a nationwide anthropometric survey project for children from 4 to 17 was completed from 2005 to 2008. This survey measured more than 100 anthropometric dimensions, including body size as well as head, foot and hand size. The hand anthropometric data for the children are presented in this paper. The purpose is to determine hand dimensions in different age groups to facilitate the design of such products as toys, gloves and other components of their daily life.
2 Methods

2.1 Subjects

China is a vast country with an area of over 9.6 million square kilometers. Children in different regions differ greatly in body development status and body shape. To
make the anthropometric survey more representative, a stratified cluster sampling method was used to determine the distribution of the samples. The whole country was divided into six geographical areas, in accordance with the adult anthropometric survey of 1988 [5]: the north and northeast area, the central and western area, the lower reaches of the Changjiang River area, the middle reaches of the Changjiang River area, the Guangdong-Guangxi-Fujian area, and the Yunnan-Guizhou-Sichuan area. From the statistical point of view, people within each area have similar body shape and body size, while people in different areas differ from each other. The sample size in each area was determined based on the distribution of the children's population reported by the China National Bureau of Statistics [6]. One or two cities in each area were selected, and kindergartens, primary schools and high schools were taken from these cities. Within each selected kindergarten, primary school or high school, a number of classes were taken, and all the children in them were measured until the desired number of children in each age group was met. According to the Report on the Physical Fitness and Health Surveillance of Chinese School Students (2000) [7] and the Report on the Second National Physical Fitness Surveillance (2000) [8], the children were subdivided into five age groups: preschool (4-6), lower primary (7-10), upper primary (11-12), middle school (13-15), and high school (16-17). In this survey, for example, 10 years old means an age from 9.5 to 10.5 years. The sample size in each age group was distributed according to the children's body development status: the sample size of the preschool age group could be smaller; within the lower primary and middle school age groups the sample size was increased; and for the upper primary and high school age groups the sample size was reduced appropriately. Based on this sampling plan, body dimension data were obtained from more than 20,000 children in ten provinces distributed over the six geographical areas.

2.2 Dimension Measurements

Instead of a traditional Martin-type anthropometer, a two-dimensional color scanner was adopted for the hand anthropometric survey. The ratio of image size to real hand size was 1:1, with a resolution of 150, and the images were kept in BMP format. The advantages of such a system are that it is much faster than the Martin method of collecting hand data, it is applicable to a large-scale anthropometric survey, and it provides a permanent record from which any measurement dimension can be taken as needed. To achieve greater scientific uniformity, measurements were always carried out on the right hand. Every subject was scanned in two hand postures: the first with the four fingers closed together and the thumb naturally outstretched, placed lightly on the scanning plane; the second with the five fingers spread as far as possible, placed lightly on the scanning plane. After each scan, the scanning result was reviewed to prevent failures caused by finger shifting. In each area, before starting the survey, the measurement team was specially trained in anthropometric techniques and checked for consistency in their procedures to ensure data reliability. The parents or teachers were asked to fill in a form including their
child's name, sex, birth date and place, nationality, school and grade, etc. The whole survey was completed over a period of about two years.

2.3 Data Processing and Statistical Analysis

Hand Dimension Calculating Computer Software was developed for this survey. The programme allows the user to select anatomical points on both hand images on screen by means of a cursor. Once the points in each image have been identified, the programme calculates hand length and breadth dimensions automatically. In this paper, eight anthropometric measurement dimensions were taken: hand length, hand breadth at metacarpals, palm length perpendicular, index finger length, thumb length, middle finger length, index finger breadth (proximal) and index finger breadth (distal). Except for thumb length and middle finger length, the definitions of the six hand dimensions were taken from ISO 7250:2004 [9]. The dimension values obtained were categorized according to sex and age group, and the data were examined for abnormalities: extreme outliers and unreasonable results were identified and eliminated carefully using the 3σ test, peak value test and logical value test. The Statistical Package for the Social Sciences (SPSS) for Windows, version 16.0, was used in the subsequent statistical analysis. The descriptive statistics, including arithmetic means (M), standard deviations (SD), and percentiles (5th and 95th) of the above measurements, were calculated for both boys and girls.
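The descriptive statistics reported in the tables below can be sketched as follows (a numpy stand-in for the SPSS analysis; `values` is a hypothetical array holding one hand dimension, in mm, for one sex and age group):

```python
import numpy as np

def describe(values):
    v = np.asarray(values, dtype=float)
    return {
        "M": v.mean(),                # arithmetic mean
        "SD": v.std(ddof=1),          # sample standard deviation
        "P5": np.percentile(v, 5),    # 5th percentile
        "P95": np.percentile(v, 95),  # 95th percentile
    }
```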
3 Results

The statistical data for the eight hand anthropometric dimensions are presented in Tables 1-5, organized by gender (boys and girls) and age (4 to 17 years old), together with the number of subjects. Estimates of the mean, standard deviation (SD) and the 5th and 95th percentiles are included. All dimensions are reported in mm.
4 Discussion

4.1 Differences between Age Groups

From Tables 1-5, it can be seen that all mean values of the eight dimensions increase gradually with age. Because hand length and breadth are the basis for establishing a hand sizing system [10], these two dimensions are further analyzed to show the differences between age groups. Tables 6 and 7 show the interclass increases and relative odds ratios of the mean values. Both length and breadth show a significant increase with age in boys and girls, and there are clear differences between the five age groups. For boys, the difference in hand length between the (4-6) and (7-10) age groups is 20.1 mm; from (7-10) to (11-12) it increases by 16.8 mm, and from (11-12) to (13-15) by 18.1 mm. For girls, the increases in mean hand length are 20.7 mm, 18.6 mm, 7.8 mm and 0.9 mm, respectively, across the age groups from (4-6) up to (16-17).
Table 1. The statistical values of hand anthropometric dimensions (4-6 years old)

Dimensions (mm)                  Boys (N=1138)                 Girls (N=1140)
                                 M      SD   P5     P95        M      SD   P5     P95
Hand length                      124.1  9.3  110.1  138.7      122.0  8.4  108.2  136.5
Hand breadth at metacarpals      58.4   4.0  52.3   64.7       56.5   3.7  50.7   62.6
Palm length perpendicular        71.0   5.6  62.7   80.4       69.3   5.2  61.0   78.2
Index finger length              48.2   4.0  42.2   55.1       48.0   3.8  41.6   53.9
Thumb length                     39.2   3.7  33.9   45.2       38.5   3.5  32.9   44.3
Middle finger length             53.8   4.5  47.2   61.3       53.6   4.1  47.0   60.5
Index finger breadth, proximal   14.3   1.4  12.1   16.5       13.8   1.3  11.7   16.0
Index finger breadth, distal     12.7   1.3  10.8   14.9       12.3   1.2  10.5   14.5
Table 2. The statistical values of hand anthropometric dimensions (7-10 years old)

Dimensions (mm)                  Boys (N=2239)                 Girls (N=2115)
                                 M      SD   P5     P95        M      SD    P5     P95
Hand length                      144.2  9.9  128.7  161.9      142.7  10.5  126.3  161.2
Hand breadth at metacarpals      65.5   4.5  58.5   73.2       63.4   4.3   56.6   71.1
Palm length perpendicular        82.3   6.0  72.8   92.8       80.7   6.2   70.8   91.8
Index finger length              56.0   4.4  49.0   63.8       56.2   4.6   48.9   64.2
Thumb length                     45.9   4.0  39.9   52.9       45.9   4.3   39.1   53.4
Middle finger length             62.4   4.8  54.9   70.8       62.6   5.0   54.9   71.5
Index finger breadth, proximal   15.7   1.3  13.6   18.0       15.1   1.3   13.1   17.4
Index finger breadth, distal     14.1   1.2  12.2   16.3       13.6   1.1   11.9   15.7
Tables 6 and 7 also reveal that for both boys and girls there is a stage in which the hands grow relatively fast: for boys it is from 4 to 15 years old, and for girls from 4 to 12 years old. When boys reach 15 and girls reach 12, the hand growth rate slows down. According to the Report on the Physical Fitness and Health Surveillance of Chinese School Students (2000), children undergo a sudden increase during puberty, when their physical size changes markedly; in that report, the periods are 12-14 for boys and 10-12 for girls. There is thus a certain degree of correlation between the hand dimension changes and age group; this exact relationship may be verified through future research.
Table 3. The statistical values of hand anthropometric dimensions (11-12 years old)

Dimensions (mm)                  Boys (N=2098)                  Girls (N=2019)
                                 M      SD    P5     P95        M      SD   P5     P95
Hand length                      161.0  10.9  144.6  180.7      161.3  9.3  145.9  176.5
Hand breadth at metacarpals      71.8   5.1   64.3   81.1       70.0   4.1  63.5   76.9
Palm length perpendicular        91.8   6.5   81.6   103.6      90.9   5.6  81.5   100.1
Index finger length              62.3   4.7   55.0   70.6       63.5   4.5  56.3   71.0
Thumb length                     51.7   4.4   45.0   59.6       52.3   4.0  45.9   59.2
Middle finger length             69.6   5.3   61.7   79.3       70.9   4.8  62.9   78.8
Index finger breadth, proximal   17.0   1.5   14.6   19.5       16.4   1.4  14.3   18.7
Index finger breadth, distal     15.1   1.3   13.1   17.5       14.8   1.3  12.8   17.0
Table 4. The statistical values of hand anthropometric dimensions (13-15 years old)

Dimensions (mm)                  Boys (N=2942)                  Girls (N=2795)
                                 M      SD    P5     P95        M      SD   P5     P95
Hand length                      179.1  10.9  159.6  196.1      169.1  7.8  156.4  181.7
Hand breadth at metacarpals      79.5   5.2   70.5   87.8       73.0   3.7  67.1   79.2
Palm length perpendicular        101.6  6.5   90.4   112.1      95.2   5.0  87.2   103.5
Index finger length              69.3   5.1   60.7   77.5       66.8   4.0  60.5   73.2
Thumb length                     57.5   4.6   49.9   64.9       54.4   3.6  48.6   60.6
Middle finger length             77.7   5.5   68.2   86.6       74.2   4.2  67.5   81.3
Index finger breadth, proximal   18.6   1.7   15.7   21.3       17.3   1.4  15.2   19.6
Index finger breadth, distal     16.4   1.5   14.0   18.8       15.4   1.2  13.4   17.5
4.2 Gender Differences

The differences between boys and girls can be seen in Tables 1-5. In Tables 1-3, most of the boys' dimensions are slightly larger than the girls', but the differences are not obvious: the differences in mean values range from -1.3 mm (index finger length and middle finger length in the 11-12 age group) to 2.1 mm (hand length in the 4-6 age group and hand breadth in the 7-10 age group). In Tables 4 and 5, the gender differences become significant. In the (13-15) age group, the mean differences range from 1.0 mm (index finger breadth, distal) to 10.0 mm (hand length). The differences keep increasing in the (16-17) age group, ranging from 1.3 mm (index finger breadth, distal) to 14.7 mm (hand length).
Table 5. The statistical values of hand anthropometric dimensions (16-17 years old)

Dimensions (mm)                  Boys (N=1840)                 Girls (N=1910)
                                 M      SD   P5     P95        M      SD   P5     P95
Hand length                      184.7  8.9  170.2  198.8      170.0  8.0  157.2  183.2
Hand breadth at metacarpals      82.0   4.4  75.2   89.2       73.4   3.6  67.4   79.4
Palm length perpendicular        105.0  5.8  95.7   114.9      96.0   5.1  88.2   104.5
Index finger length              71.7   4.2  65.1   78.6       67.0   3.9  60.5   73.8
Thumb length                     59.2   4.0  52.8   66.2       54.4   3.8  48.2   60.6
Middle finger length             80.2   4.6  72.6   87.5       74.4   4.3  67.5   81.7
Index finger breadth, proximal   19.2   1.5  16.8   21.5       17.6   1.3  15.5   19.7
Index finger breadth, distal     16.8   1.4  14.7   19.2       15.6   1.2  13.7   17.6
Table 6. Mean value increase of hand length and breadth in different age groups (for boys)

                      Hand length                                        Hand breadth at metacarpals
Age group             Interclass increase (mm)  Relative odds ratio (%)  Interclass increase (mm)  Relative odds ratio (%)
(4-6) to (7-10)       20.1                      116.2                    7.1                       112.2
(7-10) to (11-12)     16.8                      111.7                    6.3                       109.6
(11-12) to (13-15)    18.1                      111.2                    7.7                       110.7
(13-15) to (16-17)    5.6                       103.1                    2.5                       103.1
Table 7. Mean value increase of hand length and breadth in different age groups (for girls)

                      Hand length                                        Hand breadth at metacarpals
Age group             Interclass increase (mm)  Relative odds ratio (%)  Interclass increase (mm)  Relative odds ratio (%)
(4-6) to (7-10)       20.7                      117.0                    6.9                       112.2
(7-10) to (11-12)     18.6                      113.0                    6.6                       110.4
(11-12) to (13-15)    7.8                       104.8                    3.0                       104.3
(13-15) to (16-17)    0.9                       100.5                    0.4                       100.5
The significance of the differences between boys and girls was also examined by Mollison's method [11][12] across age groups. The formula is as follows:

S = \frac{A_1 - A_{11}}{S_{A_{11}}} \times 100    (1)

where A_1 is the arithmetic mean of boys in each age group, A_{11} is the arithmetic mean of girls in each age group, and S_{A_{11}} is the standard deviation of girls in each age group.
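A one-line sketch of this indicator (a hypothetical helper, not the authors' code; the example plugs in the (16-17) hand-length values from Table 5):

```python
def mollison_indicator(boys_mean: float, girls_mean: float, girls_sd: float) -> float:
    """Eq. (1): percentage deviation of the boys' mean from the girls' mean."""
    return (boys_mean - girls_mean) / girls_sd * 100.0

# (16-17) hand length: boys' mean 184.7 mm, girls' mean 170.0 mm, girls' SD 8.0 mm
# -> (184.7 - 170.0) / 8.0 * 100 = 183.75 > 100, i.e. a significant difference
print(mollison_indicator(184.7, 170.0, 8.0))
```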
Differences between the means of boys and girls are expressed for each measurement as a percentage deviation. When the indicator of mean deviation is positive, the mean of boys is larger than the mean of girls; the situation is reversed when the indicator is negative. If the result exceeds 100, there is a significant difference between the two groups. The indicator of mean deviation was calculated. The results showed that from 4 to 12, no significant differences were found between boys and girls in any of the eight hand dimensions. In the (13-15) age group, the differences were significant in hand length, hand breadth and palm length. In the (16-17) age group, all eight dimensions differed significantly between boys and girls, especially hand length, breadth and palm length. These results show that hand dimensions differ very little between boys and girls from 4 to 12 years old, which may imply that it is not necessary to consider gender in the design of some hand-related products for children younger than 12 years old; for children older than 12, the difference should be taken into consideration.

4.3 Differences between Chinese and Japanese Children

China and Japan are both in eastern Asia, with similar ethnic characteristics and cultural traditions. It is meaningful to find out whether there are significant differences in the body dimensions of children in these two groups.

Table 8. Comparison of mean values between China and Japan in four age groups (boys)
Dimensions (mm)                  (7-10)          (11-12)         (13-15)         (16-17)
                                 China   Japan   China   Japan   China   Japan   China   Japan
Hand length                      144.2   141.6   161.0   157.0   179.1   175.8   184.7   182.6
Hand breadth at metacarpals      65.5    61.7    71.8    68.5    79.5    76.1    82.0    79.2
Palm length perpendicular        82.3    80.2    91.8    88.4    101.6   99.3    105.0   103.5
Index finger length              56.0    54.1    62.3    60.4    69.3    67.5    71.7    69.8
Thumb length                     45.9    46.0    51.7    51.5    57.5    58.0    59.2    60.2
Middle finger length             62.4    61.4    69.6    68.6    77.7    76.5    80.2    79.2
Index finger breadth, proximal   15.7    16.1    17.0    17.3    18.6    18.7    19.2    19.8
Index finger breadth, distal     14.1    14.0    15.1    15.0    16.4    16.2    16.8    17.3
Table 9. Comparison of mean values between China and Japan in four age groups (girls)

Dimensions (mm)                  (7-10)          (11-12)         (13-15)         (16-17)
                                 China   Japan   China   Japan   China   Japan   China   Japan
Hand length                      142.7   140.9   161.3   158.8   169.1   167.5   170.0   167.7
Hand breadth at metacarpals      63.4    60.4    70.0    67.5    73.0    70.1    73.4    70.8
Palm length perpendicular        80.7    79.5    90.9    89.1    95.2    94.0    96.0    94.3
Index finger length              56.2    54.3    63.5    61.9    66.8    65.9    67.0    65.6
Thumb length                     45.9    45.4    52.3    51.5    54.4    54.3    54.4    55.3
Middle finger length             62.6    61.5    70.9    69.6    74.2    73.4    74.4    73.4
Index finger breadth, proximal   15.1    15.5    16.4    16.9    17.3    17.6    17.6    17.8
Index finger breadth, distal     13.6    13.5    14.8    14.6    15.4    15.1    15.6    15.3
The Japanese data were collected from 1992 to 1994 by the Institute of Human Engineering for Quality of Life (HQL). More than 5,000 children from 7 to 17 years old were included in that survey. Because there are no hand data for Japanese children from 4 to 6, only the mean hand values in four age groups are displayed in Tables 8 and 9. Both Chinese boys and girls have greater values in hand length, hand breadth and palm length in all four age groups, so Chinese children appear to have longer and broader hands than Japanese children. As for the three finger length dimensions, most of the Chinese children have relatively higher values than Japanese children. Only the index finger breadth dimensions were compared, and they showed that Japanese children had a wider 2nd joint and narrower 1st joint than Chinese children. To determine whether there are differences in the breadth of the other fingers, more data would have to be extracted from the Chinese children's hand images.
5 Conclusion

This study was conducted to provide hand anthropometric information on Chinese children from 4 to 17 years old, which can be used for the ergonomic design of workspaces and products. A total of eight hand anthropometric dimensions extracted from 20,000 children are listed in the form of mean, standard deviation and percentile values, and the differences among age groups, between boys and girls, and between Chinese and Japanese children are discussed. The results showed that the differences between the age groups were significant. In the (13-15) age group, the gender difference was significant in hand length, hand breadth and palm length, and in the (16-17) age group all eight dimensions differed significantly between boys and girls. Chinese children had longer and broader hands than Japanese children, whereas Japanese children had a wider 2nd joint and narrower 1st joint than Chinese children. In
this study, the hand dimensions were extracted from 2-D images. The thickness and girth data about hands and fingers have not been obtained. Nevertheless, this survey provides the first hand anthropometric database of Chinese children.
References
1. Wang, M.-J.J., Wang, E.M.-Y., Lin, Y.-C.: The Anthropometric Database for Children and Young Adults in Taiwan. Applied Ergonomics 33, 583–585 (2002)
2. Kayis, B., Ozok, A.F.: Anthropometry Survey Among Turkish Primary School Children. Applied Ergonomics 22, 55–56 (1991)
3. Steenbekkers, L.P., Molenbroek, J.F.: Anthropometric Data of Children for Non-specialist Users. Ergonomics 33(4), 421–429 (1990)
4. Prado-Leon, L.R., Avila-Chaurand, R., Gonzalez-Munoz, E.L.: Anthropometric Study of Mexican Primary School Children. Applied Ergonomics 32, 339–345 (2001)
5. Chinese National Standard, GB10000-1988: Human Dimensions of Chinese Adults. Standards Press of China, Beijing (1988)
6. National Bureau of Statistics: Chinese Demographic Yearbook. Statistics Press, China (2003)
7. Ministry of Education of the People's Republic of China, General Administration of Sports of China, Ministry of Health of the People's Republic of China, Ministry of Science and Technology, Sports and Health Study Group of Chinese Students: Report on the Physical Fitness and Health Surveillance of Chinese School Students. Higher Education Press, Beijing (2000)
8. General Administration of Sports of China: Report on the Second National Physical Fitness Surveillance. Sports University Press, Beijing (2000)
9. International Standard, ISO 7250: Basic Human Body Measurements for Technological Design. International Organization for Standardization (2004)
10. Chinese National Standard, GB16252-1996: Hand Sizing System-Adult. Standards Press of China, Beijing (1996)
11. Hu, H., Li, Z., Yan, J., Wang, X., Xiao, H., Duan, J., Zheng, L.: Anthropometric Measurement of the Chinese Elderly Living in the Beijing Area. International Journal of Industrial Ergonomics 37, 303–311 (2007)
12. Nowak, E.: Workspace for Disabled People. Ergonomics 32(9), 1077–1088 (1989)
Comparisons of 3D Shape Clustering with Different Face Area Definitions

Jianwei Niu1,*, Zhizhong Li2, and Song Xu2

1 School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, 100083, China
Tel.: 86-131-61942805
[email protected], [email protected]
2 Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China
[email protected], [email protected]
Abstract. The importance of fit for face-related wearing products has introduced the necessity for a better definition of the face area. In this paper, three definitions of face area are compared in the context of three-dimensional (3D) face shape similarity based clustering. The first method defines the face area by spanning from the whole head grid surface by the front π/2 wedge angle along a line going through the centroid and pointing to the top of the head. The second method defines the face area as the grid surface enclosed by several anthropometric landmark points (sellion, both zygions, and menton) on the facial surface. The zonal surface where the respirator interferes with the wearer's face is taken as the third alternative definition for the comparative study. By utilizing a block-distance measure, each face was converted into a compact block-distance vector, and k-means clustering was then performed on the vectors. 376 3D face data sets were tested in this study. One-way ANOVA on the block-distance based vectors was conducted to evaluate the influence of the different face area definitions on the clustering results. No difference was found at the significance level of 0.05. However, the cluster membership shows great differences between the definitions. This emphasizes the importance of the selection of the face area in 3D face shape-similarity-based clustering. Keywords: 3D anthropometry; face area; shape comparison; clustering.
1 Introduction

Of all biometric features, the face is among the most common [1]. Face anthropometry has been a focus of research over the past years, with applications in clinical diagnostics, cosmetic surgery, forensics, arts and other fields. For example, comparison with a patient's face anthropometric data can help indicate the existence of deformities, possibly leading to discovery of an illness [2]. If the size and shape of a deformity are quantifiable, the surgeon can make more exact statements about the necessary corrections [3]. Before 3D digitizing technology emerged, traditional anthropometry was based on one-dimensional (1D) dimensions.
* Corresponding author.
Fortunately, with the wide availability of 3D scanning technologies, it is convenient to acquire 3D data of the human body. Understanding 3D shape variation is essential to many scientific activities such as personal identification, population accommodation, human computer interaction, and image retrieval [4]. Extraction of biologically important information on shape variance from 3D digitized samples has developed into geometric morphometrics, which has found extensive applications to 3D human data in pathology, archaeology, primatology, paleoanthropology, and reconstructive craniofacial surgery [5]. For example, Hennessy et al. [6] used 3D face shape to establish a relationship between facial morphogenesis and adult brain function as a basis for subsequent studies in schizophrenia. How to use 3D anthropometry to obtain a proper fit of wearing products has been extensively addressed [7-11], while how to use 3D face anthropometry for fit design has not been well investigated yet. As an example, Mochimaru and Kouchi [12] used the Free Form Deformation (FFD) method in the analysis of 3D human face forms for spectacle frame design. As typical face-related wearing products, respirators are widely used across numerous fields. The current sizing for respirators is based on some linear measurements. In the USA, a respirator RFTP with the proper facial anthropometric dimensions should specify tightness of fit satisfactorily for >95% of the targeted race group [13, 14]. However, NIOSH's research indicated that the LANL panel for full-facepiece respirators accommodated only 84% of current civilian subjects [15]. Utilizing 3D facial shape information appears to be a promising avenue to overcome some of the limitations of current 1D-measurement-based sizing systems and widens the opportunities for improving the design of face-related products. However, unlike some other biometric features such as the iris, retina, and fingerprint, it is usually difficult to define the face area strictly, especially in 3D form. Various definitions of face area have been introduced in the past, and there is considerable interest in assessing 3D shape clustering with different face area definitions. In this paper, three definitions of face area are compared in the context of 3D face shape similarity based clustering. The remainder of this paper is organized as follows. In Section 2, we introduce the method. Section 3 reports the results and gives some discussion. Finally, Section 4 summarizes this study.
2 Methods 2.1 Different Face Area Definitions The raw 3D head data of 376 young male Chinese soldiers (aged from 19 to 23) are used [16]. All faces are aligned by translating the origin of the Cartesian coordinate system to a specified point. The y and z axis values of the new origin are the average values of the y and z axis values of sellion, both zygions, both cheilions, and menton, and the x axis value of the new origin equals the x axis value of sellion. The landmarks, defined in accordance with 1988 Anthropometric Survey of the U.S. Army Personnel Project [17], were located manually by the same experienced investigator.
Three definitions of face area are then introduced. The first method defines a face as the surface spanned from the whole head by the front π/2 wedge angle along a line going through the centroid and pointing to the top of the head; here the centroid was computed as the point with the averaged coordinates of all points. This selection criterion of the front π/2 wedge angle is based on the average forward-spanned angle observed over all samples in this study, with which almost the whole face coverage could be obtained. The second method defines a face as a grid characterized by four facial landmarks, i.e., the sellion, both zygions and the menton; the top of the face area lies 50 mm above the sellion, based on a subjective judgment of the position of a full-face respirator on the forehead. The zonal surface on the face, around which a certain level of compression force will be applied, is the third definition of face area in our study, since it is the actual interfacing area between equipment and face. If the surface of the contacting strip is not well matched to the zonal surface, the compression force will be unevenly distributed and cause discomfort. In our previous study [18], a block-division method was proposed to convert each 3D surface into a block-distance based vector. In the current case study, each face surface was divided into 30 (6×5) blocks, and the zonal surface consists of the 18 peripheral blocks.

2.2 Comparison between Different Face Area Definitions

For each face area definition, k-means clustering was applied to the block-distance based vectors referring to the inscribed surface of all samples. Wang and Yuan [19] presented a new oxygen mask sizing system in which they partitioned Chinese face samples into four sizes, namely small, medium-narrow, medium-wide and large. For comparison with their method in the future, the number of clusters K was also set as four in this case study. The representative face surface of each cluster is obtained by calculating the average coordinates of the points of the samples belonging to the cluster. Then the block distance between a sample surface and the representative surface can be constructed as S1' and S2'.
S1' can be calculated as

S_1' = \frac{1}{n} \sum_{j=1}^{n} dis(p_j)    (1)

where p_j is the j-th point, n is the number of points of a face, and dis(p_j) is the Euclidean distance between two corresponding points on the sample and the representative surface. S2' can be calculated as

S_2' = \frac{1}{n} \sum_{j=1}^{n} \left| dis(p_j) - S_1' \right|    (2)
S2' describes the local shape variation between the sample and the representative surface. Tests for normality were conducted on all S1' and S2' values using the one-sample Kolmogorov-Smirnov test. Tests for homogeneity of variance were conducted using the Levene test. Finally, multiple comparisons of means between the three face area definitions were conducted using one-way ANOVA.
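The computation in Eqs. (1)-(2) and the clustering step can be sketched as follows (a minimal illustration, not the authors' code; the array shapes are assumptions: `sample` and `rep` are (n, 3) arrays of corresponding surface points, and `vectors` is the (n_samples, n_features) matrix of block-distance vectors):

```python
import numpy as np
from sklearn.cluster import KMeans

def s1_s2(sample: np.ndarray, rep: np.ndarray):
    """Eqs. (1)-(2): mean point distance (size) and mean absolute
    deviation of the point distances (local shape variation)."""
    d = np.linalg.norm(sample - rep, axis=1)  # dis(p_j) for every point pair
    s1 = d.mean()                             # Eq. (1)
    s2 = np.abs(d - s1).mean()                # Eq. (2)
    return s1, s2

def cluster_faces(vectors: np.ndarray, k: int = 4) -> np.ndarray:
    # k = 4 follows the four-size scheme of Wang and Yuan [19]
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vectors)
```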
3 Results

3.1 Face Area Definitions

The manually labeled landmarks are illustrated in Fig. 1. Considering the difficulty of identifying landmarks on a virtual image without the force feedback available to palpate and locate bony landmarks in traditional anthropometry, the landmark-labeling result passed a visual check from several views of the 3D head in the CAD software Unigraphics. The face areas according to the three definitions are shown in Figs. 2-4, respectively. For the zonal surface, the average side length of each peripheral block is about 25 mm, consistent with the width of the contacting strip of a full-face respirator in real applications.
Fig. 1. Interactive manual identification of landmarks (pink dots; L1: sellion; L2: right zygion; L3: right cheilion; L4: menton; L5: left zygion; L6: left cheilion)
Fig. 2. The first definition of face area (front and side views)
Fig. 3. The second definition of face area (front and side views)
Fig. 4. The third definition of face area (front and side views)
3.2 Comparison between Different Face Area Definitions The average face area of each cluster was generated, as shown in Fig. 5.
Fig. 5. Different views (front, side, and bottom) of the merged average faces of the clusters under the first, second, and third face area definitions
Tests for normality of the S1' and S2' values gave p values less than 0.05, so the null hypothesis was rejected. Each S1' and S2' value was therefore transformed into its natural logarithm, denoted ln S1' and ln S2', respectively. One-sample Kolmogorov-Smirnov tests on the ln S1' and ln S2' values gave p values greater than 0.05 (p = 0.566 and 0.106, respectively). Levene tests on the ln S1' and ln S2' values gave p values of 0.138 and 0.000, respectively, indicating that homogeneity of variance was satisfied at the 0.05 significance level for ln S1' but not for ln S2'. Therefore, in the multiple comparisons of the one-way ANOVA, the least-significant difference (LSD) test was used for ln S1', while Tamhane's T2 was used for ln S2'.
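A sketch of this test sequence with SciPy (the paper used SPSS; the fitted-normal KS test below is one reasonable reading of the one-sample test, and `groups`, one array of measure values per face definition, is a hypothetical input):

```python
import numpy as np
from scipy import stats

def compare_definitions(groups):
    """groups: three 1-D arrays of S1' (or S2') values, one per definition."""
    logs = [np.log(g) for g in groups]              # natural-log transform
    pooled = np.concatenate(logs)
    # one-sample Kolmogorov-Smirnov test against a fitted normal
    ks = stats.kstest(pooled, "norm", args=(pooled.mean(), pooled.std(ddof=1)))
    lev = stats.levene(*logs)                       # homogeneity of variance
    anova = stats.f_oneway(*logs)                   # one-way ANOVA
    return ks, lev, anova
```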
The descriptives of the block-distance measures, i.e., ln S1' and ln S2', for the different face definitions are shown in Table 1. For the average values of ln S1', the difference between the first two face definitions is almost negligible, while the average value of ln S1' for the zonal face area is greater than for the first two alternatives. In contrast, the average value of ln S2' for the zonal face area is smaller than for the first two alternatives. This can be explained from the definitions of S1' and S2', which reflect the local size and shape differences, respectively. The zonal face area covers only a small portion of the whole face, i.e., the peripheral blocks of the face. For the whole face, the distance is averaged over a large surface, so the effect of S1' is weakened. Since the shape variation of the whole face is greater than that of the peripheral portion of the face, the S2' values of the whole face are greater. What is more, compared with the central face area consisting of the nose, mouth, and eyes, the zonal face area usually has more regular geometry; therefore, the shape variation of the zonal face area is smaller, and the effect of S2' is weakened.

Table 1. Descriptives of block-distance measures (N=376)
Measure   Face definition   M      SD
ln S1'    1                 1.21   0.317
          2                 1.21   0.324
          3                 1.25   0.346
ln S2'    1                 0.09   0.383
          2                 0.10   0.381
          3                 0.07   0.455
As shown in Table 2, the one-way ANOVA results demonstrated p values greater than 0.05. Such results lead to no rejection of the null hypothesis at the significance level of 0.05. However, the p values for ln S1' between the first and third definitions (0.129), and between the second and third definitions (0.070), both show a marginally significant difference. Cluster membership variation with the different face areas was investigated and is summarized in Table 3. Compared with the second definition, the numbers of samples whose cluster membership changed with the first and the third definitions are 83 and 30, respectively. In contrast, the number of samples with changed membership between the first and third definitions is 79. It can be seen that the membership variation between the second and third definitions is much smaller than that between the first and second definitions. These membership differences may indicate that the face area definition should be chosen according to the design requirements when developing a sizing system for face-interfaced products.
Table 2. Multiple comparisons in One-way ANOVA

Dependent variable   Group I   Group J   Mean difference (I-J)   Std. error   Sig.
ln S1'               1         2          0.01                   0.024        0.767
                     1         3         -0.04                   0.024        0.129
                     2         3         -0.04                   0.024        0.070
ln S2'               1         2         -0.02                   0.028        0.894
                     1         3          0.02                   0.031        0.935
                     2         3          0.03                   0.031        0.610

Note: Groups 1, 2, and 3 are the first, second and third face definitions, respectively.

Table 3. Cluster membership change (N=376)
                    Sample size
Cluster ID          Second definition   First definition   Third definition
1                   40                  37                 35
2                   114                 160                137
3                   48                  48                 49
4                   174                 131                155
Number of changes   -                   83                 30
4 Conclusions This study investigates the influence of face area definition on 3D face shape clustering. Though no significant difference is found for the block-distance measures between these three face definitions, the cluster membership shows remarkable difference between the first definition and the latter two alternatives. This underlines the potential value of the selection of face area for assessing the face shape variation among the population and designing better fitted face-related products.
Acknowledgements The study is supported by the National Natural Science Foundation of China (No.70571045).
References
1. Zhou, M.Q., Liu, X.N., Geng, G.H.: 3D face recognition based on geometrical measurement. In: Li, S.Z., Lai, J.-H., Tan, T., Feng, G.-C., Wang, Y. (eds.) SINOBIOMETRICS 2004. LNCS, vol. 3338, pp. 244–249. Springer, Heidelberg (2004)
2. McCloskey, E.V., Spector, T.D., Eyres, K.S., Fern, E.D., O'Rourke, N., Vasikaran, S., Kanis, J.A.: The assessment of vertebral deformity: A method for use in population studies and clinical trials. Osteoporosis International 3(3), 138–147 (1993)
3. Kaehler, K.: A Head Model with Anatomical Structure for Facial Modeling and Animation. Max-Planck-Institut für Informatik, Saarbrücken, Germany (2003)
4. Godil, A.: Advanced human body and head shape representation and analysis. In: Duffy, V.G. (ed.) HCII 2007 and DHM 2007. LNCS, vol. 4561, pp. 92–100. Springer, Heidelberg (2007)
5. Hennessy, R.J., McLearie, S., Kinsella, A., Waddington, J.L.: Facial surface analysis by 3D laser scanning and geometric morphometrics in relation to sexual dimorphism in cerebral-craniofacial morphogenesis and cognitive function. Journal of Anatomy 207(3), 283–295 (2005)
6. Hennessy, R.J., McLearie, S., Kinsella, A., Waddington, J.L.: Facial Shape and Asymmetry by Three-Dimensional Laser Surface Scanning Covary With Cognition in a Sexually Dimorphic Manner. The Journal of Neuropsychiatry and Clinical Neurosciences 18, 73–80 (2006)
7. Whitestone, J.J., Robinette, K.M.: Fitting to maximize performance of HMD systems. In: Melzer, J.E., Moffitt, K.W. (eds.) Head-Mounted Displays: Designing for the User, pp. 175–206. McGraw-Hill, New York (1997)
8. Meunier, P., Tack, D., Ricci, A., Bossi, L., Angel, H.: Helmet accommodation analysis using 3D laser scanning. Applied Ergonomics 31, 361–369 (2000)
9. Hsiao, H.W., Bradtmiller, B., Whitestone, J.: Sizing and fit of fall-protection harnesses. Ergonomics 46(12), 1233–1258 (2003)
10. Witana, C.P., Xiong, S.P., Zhao, J.H., Goonetilleke, R.S.: Foot measurements from three-dimensional scans: A comparison and evaluation of different methods. International Journal of Industrial Ergonomics 36, 789–807 (2006)
11. Au, E.Y.L., Goonetilleke, R.S.: A qualitative study on the comfort and fit of ladies' dress shoes. Applied Ergonomics 38(6), 687–696 (2007)
12. Mochimaru, M., Kouchi, M.: Proper sizing of spectacle frames based on 3-D digital faces. In: Proceedings of the 15th Triennial Congress of the International Ergonomics Association (CD-ROM), Seoul, Korea, August 24-29 (2003)
13. National Institute for Occupational Safety and Health (NIOSH), DHEW/NIOSH TR-004-73. In: McConville, J.T., Churchill, E., Laubach, L.L. (eds.), pp. 1–44. National Institute for Occupational Safety and Health, Cincinnati, OH (1972)
14. Zhuang, Z.: Anthropometric research to support RFTPs. In: The CDC Workshop on Respiratory Protection for Airborne Infectious Agents, Atlanta, GA (November 2004)
15. Federal Register/Notice: Proposed Data Collections Submitted for Public Comment and Recommendations 67(16) (2002)
16. Chen, X., Shi, M.W., Zhou, H., Wang, X.T., Zhou, G.T.: The "standard head" for sizing military helmets based on computerized tomography and the headform sizing algorithm (in Chinese). Acta Armamentarii 23(4), 476–480 (2002)
17. Cherverud, J., Gordon, C.C., Walker, R.A., Jacquish, C., Kohn, L.: 1988 Anthropometric Survey of U.S. Army Personnel: Correlation Coefficients and Regression Equations, Part 1: Statistical Techniques, Landmark and Measurement Definitions (NATICK/TR-90/032), pp. 48–51. U.S. Army Natick Research, Development and Engineering Center, Natick, MA (1990)
18. Niu, J.W., Li, Z.Z., Salvendy, G.: Multi-resolution shape description and clustering of three-dimensional head data. Ergonomics (in press)
19. Wang, X.W., Yuan, X.G.: Study on type and sizing tariff of aircrew oxygen masks. Journal of Beijing University of Aeronautics and Astronautics 27(3), 309–312 (2001)
Block Division for 3D Head Shape Clustering

Jianwei Niu1, Zhizhong Li2, and Song Xu2

1 School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, 100083, China
2 Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China
[email protected]
Abstract. In our previous three-dimensional (3D) anthropometric shape clustering study, a block-division technique was adopted. The objective of this study is to examine the sensitivity of the clustering results to the block division. The block-division technique divides each 3D surface into a predefined number of blocks; then, using a block-distance measure, each surface is converted into a block-distance based vector, and finally k-means clustering is performed on the vectors to segment a population into several groups. In total, 446 3D head samples were analyzed in the case study. The influence of the block division number on clustering was evaluated using one-way ANOVA. No significant difference was found between the three block division alternatives, which means the adopted method is robust to the block division. Keywords: Three dimensional anthropometry; block-division; clustering; sizing.
1 Introduction

In the past decades, several large-scale 3D anthropometric surveys have been conducted, such as the Civilian American and European Surface Anthropometry Resource (CAESAR) [1], SizeUK [2], and SizeUSA [3]. An international collaboration named World Engineering Anthropometry Resource (WEAR) brings together a wealth of anthropometric data collected across the world [4]. 3D anthropometric shape analysis has found many applications in clinical diagnostics, cosmetic surgery, forensics, arts, and entertainment, as well as in other fields. How to utilize 3D anthropometric data to improve the fit of wearing products has also gained great attention from the ergonomics and human factors community [5-14]. An effective way to design fitting products is to analyze the shape of human body forms and classify a specific population into homogeneous groups. Traditionally, some one-dimensional (1D) measurements were selected as key dimensions for the analysis of human body variation [15]. Unfortunately, there are some drawbacks to such traditional sizing methods. The most important is that the geometric characteristics and internal structure of the human surface are not adequately considered, which may lead to design deficiencies in fitting comfort [16]. For example, studies have disclosed that foot length and width measures are insufficient for proper fit, though most consumers usually select footwear based on these two measurements [17, 18].
Considering the inherently abundant information contained in 3D anthropometric data, sizing methods based on 3D anthropometric data may be able to overcome the drawbacks of traditional sizing methods. However, this is not an easy task. In our previous study [19], a block-division method was proposed to convert each 3D surface into a block-distance-based vector, which reflects both size and shape differences. Such vectors were then used as the input of the k-means clustering algorithm. The influence of the block division number on the 3D shape clustering is further studied in this paper. The remainder of this paper is organized as follows. Section 2 introduces the proposed method. A case study of 446 3D head samples is then presented in Section 3. Finally, Section 4 concludes this paper.
2 Methods

2.1 Block Division of Head Data

The raw 3D data of 446 heads of young male Chinese soldiers (aged from 19 to 23) were collected by a Chinese military institute in 2002 [20]. The data we received are raw points of the outer surface in each slice. All samples were properly positioned and oriented according to a predefined alignment reference [19]. Once the alignment of a 3D head is done, a 'vector descriptor' is established. A vector descriptor consists of a number of block distances. Here the term block means a regular patch on the 3D surface. First, the inscribed surface of all the samples is calculated. Then the inscribed surface and all sample surfaces can be divided into m blocks. Let P, Q denote the number of control knots of a surface in the u and v directions respectively, and p, q denote the desired number of control knots of a block in the u and v directions respectively. The control knots of a surface were partitioned into P/p uniform intervals in the u direction and Q/q uniform intervals in the v direction. Thus the surface was converted into m = (P/p)(Q/q) blocks. The distance between two corresponding blocks on a sample surface i and the inscribed surface, namely S(i), can then be constructed with two parts, namely S1(i) and S2(i), that reflect macro (size) and micro (shape) differences respectively. S1(i) can be calculated as

S_1(i) = \sum_{j=1}^{n_i} \mathrm{dis}(p_{i,j}), \quad i = 1, 2, 3, \ldots, m   (1)

where p_{i,j} is the jth point, n_i represents the number of points falling into the ith block, and the Euclidean distance is used to calculate the distance, dis(p_{i,j}), between two corresponding points on the sample and the inscribed surfaces. S2(i) can be calculated as

S_2(i) = \sum_{j=1}^{n_i} \left| \mathrm{dis}(p_{i,j}) - \frac{S_1(i)}{n_i} \right|, \quad i = 1, 2, 3, \ldots, m   (2)
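To make the descriptor concrete, here is a minimal Python/numpy sketch of Eqs. (1) and (2); the array names, the point correspondence, and the block assignment are hypothetical, and the final k-means step mirrors the case study's choice of seven clusters:

```python
import numpy as np
from sklearn.cluster import KMeans

def block_descriptor(sample_pts, inscribed_pts, block_ids, m):
    """Convert one 3D surface into the vector (S1(1), S2(1), ..., S1(m), S2(m)).

    sample_pts, inscribed_pts: (N, 3) arrays of corresponding points on the
    sample and inscribed surfaces; block_ids: (N,) block index 0..m-1.
    """
    d = np.linalg.norm(sample_pts - inscribed_pts, axis=1)   # dis(p_ij)
    vec = []
    for i in range(m):
        di = d[block_ids == i]
        s1 = di.sum()                          # Eq. (1): macro (size) part
        s2 = np.abs(di - s1 / len(di)).sum()   # Eq. (2): micro (shape) part
        vec += [s1, s2]
    return np.array(vec)

# One 2m-vector per head sample; k = 7 clusters as in the case study:
# X = np.vstack([block_descriptor(s, inscribed, ids, m) for s in samples])
# labels = KMeans(n_clusters=7, n_init=10).fit_predict(X)
```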
S2 describes the shape variation in the corresponding local areas of two 3D surfaces. Different local areas on a surface can have different shape and geometry characteristics; therefore, they contribute differently to the overall shape dissimilarity between a sample surface and the inscribed surface. For example, the geometry of the nose is irregular, so geometric dissimilarity may play an important role in the total shape dissimilarity between two noses. In contrast, the geometry of the upper head is very smooth and quite similar across samples, so size dissimilarity may play a dominant role in the total shape dissimilarity between two upper heads. By the above method, the shape of a surface can be characterized by a vector (S1(1), S2(1), S1(2), S2(2), ..., S1(m), S2(m)). These vectors are the input of the following k-means clustering algorithm.

2.2 Comparison between Different Block Division Numbers

In the case study, each head surface was divided into 20 (5×4), 30 (6×5), and 90 (15×6) blocks, respectively. Then k-means clustering was applied to the block-distance-based vectors for each block division number. The number of clusters k was set to seven. The influence of the block division number on the clustering results was evaluated by using One-way ANOVA on the block-distance-based vectors. The representative head sample of each cluster is obtained first by calculating the average coordinates of the points of the head samples belonging to the cluster. Then the distance between a sample surface and the representative surface can be constructed for S1' and S2', respectively. As a prerequisite step of ANOVA, the first examination is whether S1' and S2' display a normal distribution. Tests for normality are conducted on all S1' and S2' values using the One-Sample Kolmogorov-Smirnov test. Another prerequisite step of ANOVA, the Levene test, checks the homogeneity of variance of the variables. Finally, multiple comparisons of means between the different block divisions were conducted by using One-way ANOVA.
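The prerequisite tests and the ANOVA can be sketched with scipy as below; s1p_20, s1p_30 and s1p_90 are hypothetical arrays holding the S1' values for the 20-, 30- and 90-block divisions (the same pipeline applies to S2'):

```python
import numpy as np
from scipy import stats

groups = [np.log(s1p_20), np.log(s1p_30), np.log(s1p_90)]   # ln S1' per division

# One-sample Kolmogorov-Smirnov test of each group against a fitted normal
for g in groups:
    print(stats.kstest(g, 'norm', args=(g.mean(), g.std(ddof=1))))

print(stats.levene(*groups))     # homogeneity of variance across divisions
print(stats.f_oneway(*groups))   # One-way ANOVA across the three divisions
```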
3 Results and Discussions

3.1 Block Division with Different Block Numbers

The block division results of a head are shown in Figs. 1-3 with block numbers of 20 (5×4), 30 (6×5), and 90 (15×6), respectively. Each block is illustrated with a different color to distinguish it from the others. No anatomical correspondence is taken into consideration during the block division. Consequently, as depicted in Fig. 1, it is not surprising that the nose is divided into one block, while both eyes are divided into two different blocks. With the increase of the block division number, the area covered by each block decreases. That is to say, a larger block division number means fewer points falling into each block.
Fig. 1. Block division of a head (20 blocks): (a) front view, (b) bottom view, (c) side view
Fig. 2. Block division of a head (30 blocks): (a) front view, (b) bottom view, (c) side view
Fig. 3. Block division of a head (90 blocks): (a) front view, (b) bottom view, (c) side view
3.2 Clustering Results under Different Block Division Numbers

As shown in Fig. 4, when the representative surfaces of the clusters are merged together, it is easy to acquire a visual image of their size and shape differences.
Fig. 4. Different views (front, side, and bottom) of the merged average heads of the clusters for block numbers 20, 30, and 90
3.3 Comparison of Results between Different Block Division Numbers

Tests for normality of the S1' and S2' values showed p values less than 0.05, resulting in rejection of the null hypothesis. Afterwards, each S1' and S2' value was transformed into its natural logarithm, denoted as ln S1' and ln S2' respectively. The One-Sample Kolmogorov-Smirnov test was conducted on the ln S1' and ln S2' values, resulting in p values greater than 0.05 (p = 0.393 and 0.193, respectively). The Levene test on the ln S1' and ln S2' values showed p values of 0.570 and 0.492, respectively. As shown in Table 1, One-way ANOVA results demonstrated p values greater than 0.05. Such results lead to no rejection of the null hypothesis. Thus no significant
differences were found between the block division numbers. In other words, this reveals the robustness of the 3D clustering method with block division.

Table 1. Multiple comparisons in One-way ANOVA

Dependent Variable   Group I   Group J   Mean Difference (I-J)   Std. Error   Sig.
ln S1'               1         2         -.0007                  .01967       .971
                     1         3         .0156                   .01967       .429
                     2         3         .0163                   .01967       .408
ln S2'               1         2         .0001                   .02102       .997
                     1         3         .0004                   .02102       .984
                     2         3         .0004                   .02102       .986

Note: Block division numbers of Groups 1, 2, and 3 are 20, 30, and 90, respectively.
Cluster membership variation under the different block division numbers was investigated for each sample and is summarized in Table 2. When the block number is changed from 20 to 30 and from 20 to 90, the numbers of samples whose cluster membership changed are 10 and 86, respectively (sample size is 446), whereas the number of samples with changed membership between 30 and 90 blocks is 88. It can be seen that when the block division number changes within a medium range, the difference in clustering results is almost negligible. However, when the block division number becomes large, such as 90 in this case study, the membership variation becomes large. This can be explained from the definitions of S1 and S2, which reflect the local size and shape differences, respectively. When the surface is divided into many blocks, the effect of S2 is weakened, since for each small block the shape variation is small. Instead, when the block division number is small, the distance of each block is averaged over a large surface, and thus the effect of S1 is weakened; the shape variation of a large surface is greater, and the effect of S2 under this situation is emphasized. Thus a too small or too large block division number will cause biased consideration of local size and shape.

Table 2. Cluster membership change

Block number   Cluster sample sizes (Clusters 1-7)   Number of changes
20 blocks      40  71  66  77  63  23  106           -
30 blocks      39  72  62  78  60  25  110           10
90 blocks      41  81  76  75  68  27  78            86
4 Conclusions

This paper is a further study of our previous 3D shape clustering method based on the block-division technique [19]. Clustering results for three alternatives of the block division number were compared. One-way ANOVA and cluster membership variation results showed the robustness of the block division method for k-means 3D shape clustering when the block division number changes within a medium range. However, extreme block division numbers may lead to greater membership variation.
Acknowledgements

This study is supported by the National Natural Science Foundation of China (No. 70571045).
References

1. Robinette, K.M., Blackwell, S., Daanen, H., Fleming, S., Boehmer, M., Brill, T., Hoeferlin, D., Burnsides, D.: CAESAR, Final Report, Volume I: Summary. AFRL-HE-WP-TR-2002-0169. United States Air Force Research Lab., Human Effectiveness Directorate, Crew System Interface Division, Dayton, Ohio (2002)
2. Bougourd, J., Treleaven, P., Allen, R.M.: The UK national sizing survey using 3D body scanning. In: Proceedings of Eurasia-Tex Conference in association with International Culture Festival, Donghua University, Shanghai, China (March 2004)
3. Isaacs, M.: 3D fit for the future. American Association of Textile Chemists and Colorists Review 5(12), 21–24 (2005)
4. WEAR, http://ovrt.nist.gov/projects/wear/
5. Whitestone, J.J., Robinette, K.M.: Fitting to maximize performance of HMD systems. In: Melzer, J.E., Moffitt, K.W. (eds.) Head-Mounted Displays: Designing for the User, pp. 175–206. McGraw-Hill, New York (1997)
6. Elliott, M.G.: Methodology for the sizing and design of protective helmets using three-dimensional anthropometric data. Thesis (PhD). Colorado State University, Fort Collins, Colorado, USA (1998)
7. Meunier, P., Tack, D., Ricci, A., Bossi, L., Angel, H.: Helmet accommodation analysis using 3D laser scanning. Applied Ergonomics 31, 361–369 (2000)
8. Mochimaru, M., Kouchi, M.: Proper sizing of spectacle frames based on 3-D digital faces. In: Proceedings of the 15th Triennial Congress of the International Ergonomics Association (CD-ROM), Seoul, Korea, August 24-29 (2003)
9. Witana, C.P., Feng, J.J., Goonetilleke, R.S.: Dimensional differences for evaluating the quality of footwear fit. Ergonomics 47(12), 1301–1317 (2004)
10. Zhang, B., Molenbroek, J.F.M.: Representation of a human head with bi-cubic B-splines technique based on the laser scanning technique in 3D surface anthropometry. Applied Ergonomics 35, 459–465 (2004)
11. Witana, C.P., Xiong, S.P., Zhao, J.H., Goonetilleke, R.S.: Foot measurements from three-dimensional scans: A comparison and evaluation of different methods. International Journal of Industrial Ergonomics 36, 789–807 (2006)
12. Hsiao, H.W., Whitestone, J., Kau, T.Y.: Evaluation of Fall Arrest Harness Sizing Schemes. Human Factors 49(3), 447–464 (2007)
13. Lee, H.Y., Hong, K.H.: Optimal brassiere wire based on the 3D anthropometric measurements of under breast curve. Applied Ergonomics 38, 377–384 (2007)
14. Rogers, M.S., Barr, A.B., Kasemsontitum, B., Rempel, D.M.: A three-dimensional anthropometric solid model of the hand based on landmark measurements. Ergonomics 51(4), 511–526 (2008)
15. Gouvali, M.K., Boudolos, K.: Match between school furniture dimensions and children's anthropometry. Applied Ergonomics 37, 765–773 (2006)
16. Li, Z.Z.: Anthropometric Topography. In: Karwowski, W. (ed.) The 2nd edition of the International Encyclopedia of Ergonomics and Human Factors, pp. 265–269. Taylor and Francis, London (2006)
17. Goonetilleke, R.S., Luximon, A., Tsui, K.L.: The Quality of Footwear Fit: What we know, don't know and should know. In: Proceedings of the Human Factors and Ergonomics Society Conference, San Diego, CA, vol. 2, pp. 515–518 (2000)
18. Goonetilleke, R.S., Luximon, A.: Designing for Comfort: A Footwear Application. In: Das, B., Karwowski, W., Mondelo, P., Mattila, M. (eds.) Proceedings of the Computer-Aided Ergonomics and Safety Conference (Plenary Session, CD-ROM), Maui, Hawaii, July 28-August 2 (2001)
19. Niu, J.W., Li, Z.Z., Salvendy, G.: Multi-resolution shape description and clustering of three-dimensional head data. Ergonomics (in press)
20. Chen, X., Shi, M.W., Zhou, H., Wang, X.T., Zhou, G.T.: The "standard head" for sizing military helmets based on computerized tomography and the headform sizing algorithm (in Chinese). Acta Armamentarii 23(4), 476–480 (2002)
Joint Coupling for Human Shoulder Complex

Jingzhou (James) Yang 1, Xuemei Feng 2, Joo H. Kim 3, Yujiang Xiang 3, and Sudhakar Rajulu 4

1 Department of Mechanical Engineering, Texas Tech University, Lubbock, TX 79409, USA
2 Wuhan University of Technology, Wuhan, Hubei, China
3 Center for Computer-Aided Design, University of Iowa, Iowa City, USA
4 NASA Johnson Space Center, Houston, TX 77058, USA
[email protected]
Abstract. In this paper, we present an inverse kinematics method for determining the human shoulder joint motion coupling relationship based on experimental data in the literature. The joint coupling relationship is available in the literature, but it is an Euler-angle-based relationship. This work focuses on transforming the Euler-angle-based coupling equations into a relationship based on the Denavit-Hartenberg (DH) method. We use analytical inverse kinematics to achieve this transformation. Euler angles are obtained for static positions with intervals of 15 degrees, with the elevation angle of the arm varied between 0 and 120 degrees. For a specific posture, we can choose points on the clavicle, scapula, and humerus and represent the end-effector positions based on the Euler angles or the DH method. For both systems, the end-effectors have the same Cartesian positions. Solving the equations relating the end-effector positions yields the DH joint angles for that posture. The new joint motion coupling relationship is obtained by polynomial and cosine fitting of the DH joint angles over all different postures.

Keywords: Human shoulder; joint motion coupling; joint limit coupling; shoulder rhythm; Euler angles; DH method.
1 Introduction

The human shoulder complex consists of three bones—the clavicle, scapula, and humerus—and more than 20 muscles. The shoulder complex model is the key to correctly simulating human posture and motion. So far, various kinds of kinematic shoulder models are available, based on various methods. Among those methods, the Denavit-Hartenberg (DH) method is an effective way to control digital human movement in the virtual simulation field [18]. In the literature, two categories can be found: open-loop chain systems and closed-loop chain systems [3]. There are different types of models within each category. We previously proposed a closed-loop chain model [4] for the shoulder complex. This model is high-fidelity, and the digital human system operates in real time. To correctly model the movement of the human shoulder complex, a high-fidelity kinematic model is not enough; a phenomenon called shoulder rhythm should be considered. Shoulder rhythm includes joint motion coupling and joint limit coupling.
Joint limit coupling has been investigated by Lenarcic and Umek [13], Klopcar and Lenarcic [10, 11], Klopcar et al. [12], and Lenarcic and Klopcar [14]. Joint motion coupling was obtained using an experiment [8]; however, this relationship is Euler-based, and we cannot use it in a DH-based digital human environment. In previous work [4], we proposed one method for transferring the Euler-based coupling equations into DH-based relationships based on a shoulder model in Virtools®. That method is tedious and depends on the model in Virtools. This paper presents an analytical inverse kinematics based method for mapping between the two systems. We first summarize the newly proposed shoulder complex model. Next, we briefly discuss the joint coupling equations expressed in Euler angles. An end-effector position can be defined as a function of either Euler angles or DH joint angles. Solving this analytical inverse kinematics problem, we can obtain a set of DH joint angles. Repeating this procedure for all end-effector positions yields different sets of data. These separate data are then fitted into a set of functional equations. Finally, we plot the coupling equations based on polynomial and cosine fitting and compare the different fitting results.
2 Shoulder Kinematic Model

In the previous section, we noted that there are open-loop chain and closed-loop chain systems. Within the first category, there are five different models (the 5-DOF models I and II, the 6-DOF model, the 7-DOF model, and the 9-DOF model). For a closed-loop chain, several models are also available [3]. We propose a new shoulder model that has 8 DOFs: (1) two revolute joints (q1, q2) in the sternoclavicular (SC) joint, denoting clavicle vertical rotation and horizontal rotation; (2) three revolute joints (q3, q4, q5) in the acromioclavicular (AC) joint, denoting rotations in three orthogonal directions of the scapula movement with respect to the local frame; and (3) three revolute joints (q6, q7, q8) in the glenohumeral (GH) joint, denoting the movement of the humerus with respect to the local frame.
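The paper does not list the specific DH parameters of this 8-DOF chain, but each revolute joint qi contributes one standard DH transform that is composed along the chain; a minimal numpy sketch of that transform (the parameter lists in the usage comment are hypothetical):

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Standard DH homogeneous transform T_i^{i-1}; angles in radians."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([[ct, -st * ca,  st * sa, a * ct],
                     [st,  ct * ca, -ct * sa, a * st],
                     [0.,  sa,       ca,      d],
                     [0.,  0.,       0.,      1.]])

# Composing the chain for hypothetical parameter tuples dh_params[i] = (d, a, alpha):
# T = np.eye(4)
# for q, (d, a, alpha) in zip(q1_to_q8, dh_params):
#     T = T @ dh_transform(q, d, a, alpha)
```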
3 Euler-Angle-Based Joint Coupling Equations

In the past two decades, much research has been done on shoulder complex motion because shoulder pain constitutes a large portion of musculoskeletal disorders in the general population as well as among industrial workers. Shoulder rhythm is one important characteristic to study, and different approaches have been used to study it [1, 2, 6, 7, 8, 16, 17, 19]. Among these, Hogfors et al. [8] obtained ideal shoulder rhythm solutions. In the study by Hogfors et al. [8], three healthy, right-handed male volunteers (mean age, 24 yr; mean body mass, 70 kg; mean stature, 183 cm) participated. The authors used numerical evaluation of low-dose roentgen stereophotogrammetric motion pictures of subjects with radiation-dense implants in the bones. Interpolation between measured positions makes it possible to simulate shoulder motions within the normal working range. In the experiment, the orientation angles of the
Fig. 1. Shoulder kinematic model

Fig. 2. The elevation angle θ used in the experiment
scapula bone and the clavicle bone were measured while the arm was elevated at the angle θ in the scapular plane, which has a 45-degree angle with respect to the coronal plane (Figure 2), where θ varied between 0 and 120 degrees. The three bones' (clavicle, scapula, and humerus) body-fixed coordinate frames are used to define the orientations of the bones in Figure 3. One global coordinate system is attached to the sternum. The global frame (x, y, z) and the clavicle local frame (x1, x2, x3) have the same origin, located at point Ω. The origin of the scapula local frame (ξ1, ξ2, ξ3) is located at point Ω_s. The origin of the humerus frame (κ1, κ2, κ3) is at point Ω_h.
Fig. 3. Coordinate systems for shoulder bones [8]
The Euler angle system α, −β, and γ shown in Figure 4 was used to depict the movement of the shoulder bones. The transformation matrix is defined by R_Eul = R_X(γ) R_Y(−β) R_Z(α), where

R_Z(\alpha) = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
R_Y(-\beta) = \begin{pmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{pmatrix}, \quad
R_X(\gamma) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\gamma & -\sin\gamma \\ 0 & \sin\gamma & \cos\gamma \end{pmatrix},

while the XYZ frame can be any one of the body-fixed frames of the three bones.
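These matrices translate directly into code; a small numpy sketch of R_Eul = R_X(γ) R_Y(−β) R_Z(α):

```python
import numpy as np

def Rz(a):
    return np.array([[np.cos(a), -np.sin(a), 0.],
                     [np.sin(a),  np.cos(a), 0.],
                     [0., 0., 1.]])

def Ry(b):
    return np.array([[np.cos(b), 0., np.sin(b)],
                     [0., 1., 0.],
                     [-np.sin(b), 0., np.cos(b)]])

def Rx(g):
    return np.array([[1., 0., 0.],
                     [0., np.cos(g), -np.sin(g)],
                     [0., np.sin(g),  np.cos(g)]])

def R_eul(alpha, beta, gamma):
    """R_Eul = Rx(gamma) @ Ry(-beta) @ Rz(alpha); angles in radians."""
    return Rx(gamma) @ Ry(-beta) @ Rz(alpha)
```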
Joint Coupling for Human Shoulder Complex
75
The interpolation results from the experimental data are shown in Eqs. 1-3. The indices h, c, and s on α, −β, and γ stand for humerus, clavicula, and scapula, respectively. In these equations, α_h and β_h are independent variables. α_h varies from −10° to 90°, and β_h = −90° + θ varies within −90° to 30°. The angle γ_h in the equations refers to the neutral rotation angle of the upper arm [9].

Humerus:
γ_h = −45 + α_h [1 − (β_h + 90)/360] + (135 − α_h/1.1) sin(0.5(β_h + 90)(1 + α_h/90))   (1)

Clavicle:
α_c = −50 + 30 cos[0.75(β_h + 90)]
β_c = 24{1 − cos[0.75(β_h + 90)]}(0.5 + α_h/90) + 9   (2)
γ_c = 15{1 − cos[0.75(β_h + 90)]} + 3

Scapula:
α_s = 200 + 20 cos[0.75(β_h + 90)]
β_s = −140 + 94 cos[0.75(β_h + 90)(1 − γ_h/270)]   (3)
γ_s = 82 + 8 cos{(α_h + 10) sin[0.75(β_h + 90)]}
Fig. 4. Euler angles
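Eqs. 1-3 are straightforward to evaluate numerically; in the following Python sketch all angles are in degrees and the trigonometric arguments are interpreted in degrees, as in the original equations:

```python
import numpy as np

def shoulder_rhythm(alpha_h, beta_h):
    """Euler-angle coupling of Eqs. 1-3; all angles in degrees."""
    cosd = lambda x: np.cos(np.radians(x))
    sind = lambda x: np.sin(np.radians(x))
    gamma_h = (-45 + alpha_h * (1 - (beta_h + 90) / 360)
               + (135 - alpha_h / 1.1) * sind(0.5 * (beta_h + 90) * (1 + alpha_h / 90)))
    k = cosd(0.75 * (beta_h + 90))
    clavicle = (-50 + 30 * k,                              # alpha_c
                24 * (1 - k) * (0.5 + alpha_h / 90) + 9,   # beta_c
                15 * (1 - k) + 3)                          # gamma_c
    scapula = (200 + 20 * k,                               # alpha_s
               -140 + 94 * cosd(0.75 * (beta_h + 90) * (1 - gamma_h / 270)),  # beta_s
               82 + 8 * cosd((alpha_h + 10) * sind(0.75 * (beta_h + 90))))    # gamma_s
    return gamma_h, clavicle, scapula

# shoulder_rhythm(45, -45) reproduces gamma_h ~ 46.65 and the clavicle and
# scapula angles listed in Table 1 below.
```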
4 Transferring Joint Coupling Equations from the Euler System to the DH System

In the above section, we reviewed the coupling equations obtained experimentally by Hogfors et al. [8]. However, it is difficult to use these results directly in digital human models that use the DH representation instead of Euler angles. This section presents the methodology for transferring these equations from the Euler to the DH system, the data generation, the data fitting, and a discussion of the results from the different fitting techniques.
5 Methodology for Transferring the Joint Coupling Equations

The principle of transferring the coupling equations from the Euler to the DH system is that the same posture can be represented by different orientation representation systems, i.e., we can represent the same posture with Euler angles, DH joint angles, Euler parameters, etc. The procedure for transferring the coupling equations, shown in Figure 5, entails data generation and equation fitting. Within data generation, there are three steps: (1) selecting key postures based on the Euler system; (2) choosing points on the clavicle and humerus at the AC and GH joints as the end-effectors and forming equations
Fig. 5. Methodology for transferring the joint coupling equations. Data generation: select key postures based on the Euler angle system; choose points on the clavicle and humerus as end-effectors and form equations; solve the equations to obtain DH joint angles. Equation determination: fit the joint angles into functional equations.
(the left-hand side is the end-effector position in the Euler system, and the right-hand side in the DH system); and (3) solving these equations to obtain the DH joint angles.

5.1 Data Generation

The global coordinate system is the same for both the Euler and the DH system. However, the DH local frames are different from the Euler frames. Figs. 6 and 7 show the postures corresponding to zero Euler angles and zero DH angles, respectively.
Fig. 6. Zero Euler angles
Fig. 7. Zero DH joint angles
In this section, we use one example to illustrate the detailed procedure for determining the DH joint angles by the analytical inverse kinematics method. When we choose α_h = 45° and β_h = −45°, then, from Eqs. 1-3, all Euler angles can be calculated; they are shown in Table 1. The posture is shown in Figure 8.

Table 1. Euler angles (in degrees)

α_h   β_h   γ_h       α_c        β_c       γ_c       α_s       β_s        γ_s
45    −45   46.6491   −25.0559   13.0447   5.52796   216.629   −56.9405   88.889
Considering the AC joint center for both systems, one has the following equation:
Fig. 8. The shoulder posture when α_h = 45°, β_h = −45°
\begin{bmatrix} R(\alpha_c, \beta_c, \gamma_c) \begin{pmatrix} L_2 \\ 0 \\ 0 \end{pmatrix} \\ 1 \end{bmatrix} = T_1^0(q_1)\, T_2^1(q_2) \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}   (4)
Bringing in all necessary terms in Eq. (4) yields

\sin(q_2) = \cos(\beta_c)\sin(\alpha_c)   (5)

\sin(\beta_c) = -\cos(q_2)\sin(q_1)   (6)
Solving Eqs. (5) and (6), one obtains (q_1, q_2) = (0.2504, 2.7163) and (q_1, q_2) = (−0.2504, −0.4253) (radians). One can check these solutions against Figure 8 to select the correct solution, (q_1, q_2) = (−0.2504, −0.4253). Choosing the GH joint center for both systems, one has

\begin{bmatrix} R(\alpha_c, \beta_c, \gamma_c) \begin{pmatrix} L_2 \\ 0 \\ 0 \end{pmatrix} + R(\alpha_s, \beta_s, \gamma_s)\, V_{GH}^{Local} \\ 1 \end{bmatrix} = T_1^0(q_1)\, T_2^1(q_2)\, T_3^2(q_3)\, T_4^3(q_4)\, T_5^4(q_5) \begin{bmatrix} 0 \\ 0 \\ L_4 \\ 1 \end{bmatrix}   (7)
where V_{GH}^{Local} = [0.9463  −0.9463  0.4145]^T is the GH joint center position with respect to the scapula-fixed frame. Solving the first two equations in Eq. (7), one obtains the following possible solutions: q_3 = 0.2394 and q_4 = −2.9030; q_3 = −2.9234 and q_4 = −2.6982; q_3 = 0.2176 and q_4 = −0.4434; or q_3 = −2.9022 and q_4 = −0.2386. This is a redundant problem; bringing all possible solutions into Fig. 8, the correct solution is q_3 = 0.2176 and q_4 = −0.4434. Solving the third equation in Eq. (7) yields q_5 = 0.2895.
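The pairs of candidates come from the two arcsin branches of equations of this form; a short numpy sketch for Eqs. (5) and (6), using the Table 1 clavicle angles as illustrative inputs:

```python
import numpy as np

alpha_c, beta_c = np.radians(-25.0559), np.radians(13.0447)  # from Table 1

q2_base = np.arcsin(np.cos(beta_c) * np.sin(alpha_c))        # Eq. (5)
for q2 in (q2_base, np.pi - q2_base):                        # both arcsin branches
    q2 = np.arctan2(np.sin(q2), np.cos(q2))                  # wrap to (-pi, pi]
    q1 = np.arcsin(-np.sin(beta_c) / np.cos(q2))             # Eq. (6)
    print(f"candidate: q1 = {q1:.4f}, q2 = {q2:.4f} rad")
# The physically meaningful branch is then selected by checking the
# resulting posture against the known configuration (Fig. 8).
```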
Similarly, we choose a point on the humerus and have the following equations:

\begin{bmatrix} R(\alpha_c, \beta_c, \gamma_c) \begin{pmatrix} L_2 \\ 0 \\ 0 \end{pmatrix} + R(\alpha_s, \beta_s, \gamma_s)\, V_{GH}^{Local} + R(\alpha_h, \beta_h, \gamma_h)\, V_{HUM}^{Local} \\ 1 \end{bmatrix} = T_8^0(q_1, q_2, q_3, q_4, q_5, q_6, q_7, q_8) \begin{bmatrix} -1 \\ 0 \\ 1 \\ 1 \end{bmatrix}   (8)

Solving these equations and checking the postures in Fig. 8, the final correct solutions are q_6 = 0.8612, q_7 = 0.6735, and q_8 = 0.2528. Converting all radians to degrees, one gets the DH joint angles in Table 2.
Table 2. DH joint angles for α_h = 45°, β_h = −45° (in degrees)

q_1        q_2        q_3       q_4        q_5       q_6       q_7       q_8
−14.3469   −24.3679   12.4676   −25.4049   16.5871   49.3427   38.5889   14.4844
Similarly, we can find all DH joint angles for postures with α_h within −10° to 90° and β_h within −90° to 30°.
5.2 Data Fitting

In the above data generation, we find that the joint angle q_8 does not affect the positions and orientations of the clavicle and scapula bones. Therefore, the joint angle q_8 is independent. Joint angles q_1 to q_5 are functions of q_6 and q_7. Based on the data from the above section, we can use functional fitting to build up the coupling equations. Different functions can be fitted to the same set of data. In this study, we chose polynomial, cosine series, and sigmoid functions as the coupling equations and then compared these functions to summarize their pros and cons. We used Mathematica® to obtain the fitting functions. The final fitting functions are denoted as follows; the unit is radians.

i. Polynomial functions

q_1 = −0.056496 q_7^3 − 0.020185 q_6^3 − 0.197370 q_7^2 q_6 + 0.188814 q_7 q_6^2 + 0.094692 q_7^2 − 0.265677 q_6^2 − 0.025195 q_7 q_6 − 0.090610 q_7 + 0.587757 q_6 − 0.481600

q_2 = −0.203064 q_7^3 − 0.008258 q_6^3 + 0.160842 q_7^2 q_6 + 0.085135 q_7 q_6^2 + 0.312579 q_7^2 − 0.110429 q_6^2 − 0.722626 q_7 q_6 + 0.179235 q_7 + 0.753456 q_6 − 0.877683

q_3 = −0.010575 q_7^3 + 0.001193 q_6^3 + 0.269967 q_7^2 q_6 − 0.005200 q_7 q_6^2 − 0.142453 q_7^2 + 0.016321 q_6^2 − 0.707116 q_7 q_6 + 0.613453 q_7 + 0.440594 q_6 − 0.228151

q_4 = 0.015404 q_7^3 + 0.008550 q_6^3 − 0.308488 q_7^2 q_6 − 0.099259 q_7 q_6^2 + 0.375770 q_7^2 + 0.134428 q_6^2 + 1.122621 q_7 q_6 − 1.082992 q_7 − 1.032156 q_6 + 0.400115

q_5 = 0.019663 q_7^3 − 0.004599 q_6^3 − 0.047076 q_7^2 q_6 + 0.061801 q_7 q_6^2 + 0.021243 q_7^2 − 0.086331 q_6^2 + 0.095226 q_7 q_6 − 0.080383 q_7 − 0.006919 q_6 + 0.328476
(9)

ii. Fourier series functions

q_1 = −0.638284 + 0.095253 cos(q_7) − 0.076156 cos(q_6) + 0.002803 sin(q_7) − 0.746368 sin(q_6) − 0.101693 cos(2q_7) + 0.024957 cos(2q_6) − 0.168160 sin(2q_7) + 0.017257 sin(2q_6) + 0.287427 cos(q_7) cos(q_6) + 1.085183 cos(q_7) sin(q_6) + 0.045037 sin(q_7) cos(q_6) + 0.730106 sin(q_7) sin(q_6)

q_2 = −1.740112 + 0.010163 cos(q_7) + 0.998839 cos(q_6) + 1.212096 sin(q_7) + 0.955282 sin(q_6) + 0.050952 cos(2q_7) − 0.005597 cos(2q_6) + 0.126617 sin(2q_7) + 0.015509 sin(2q_6) − 0.332707 cos(q_7) cos(q_6) + 0.125642 cos(q_7) sin(q_6) − 0.973988 sin(q_7) cos(q_6) − 0.949188 sin(q_7) sin(q_6)

q_3 = −0.568506 − 0.005966 cos(q_7) + 0.314923 cos(q_6) + 1.087587 sin(q_7) + 0.896371 sin(q_6) + 0.161975 cos(2q_7) + 0.004556 cos(2q_6) + 0.010620 sin(2q_7) + 0.002634 sin(2q_6) − 0.211634 cos(q_7) cos(q_6) − 0.279816 cos(q_7) sin(q_6) − 0.319026 sin(q_7) cos(q_6) − 0.889261 sin(q_7) sin(q_6)

q_4 = −0.029221 + 0.659483 cos(q_7) − 0.087511 cos(q_6) − 0.453476 sin(q_7) − 0.870285 sin(q_6) − 0.109823 cos(2q_7) − 0.005751 cos(2q_6) − 0.270092 sin(2q_7) + 0.001511 sin(2q_6) − 0.026523 cos(q_7) cos(q_6) − 0.190386 cos(q_7) sin(q_6) + 0.103925 sin(q_7) cos(q_6) + 0.851738 sin(q_7) sin(q_6)

q_5 = 0.012476 + 0.328949 cos(q_7) − 0.198714 cos(q_6) + 0.335480 sin(q_7) − 0.310143 sin(q_6) − 0.012544 cos(2q_7) + 0.019457 cos(2q_6) − 0.269965 sin(2q_7) + 0.004378 sin(2q_6) + 0.200568 cos(q_7) cos(q_6) + 0.193417 cos(q_7) sin(q_6) + 0.158652 sin(q_7) cos(q_6) + 0.314105 sin(q_7) sin(q_6)
(10)
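Because the coefficients enter linearly, the 13-term Fourier structure of Eq. (10) can be fitted by ordinary linear least squares; a sketch with hypothetical arrays q6, q7 (inputs) and q1 (the DH angle being fitted), all in radians:

```python
import numpy as np

def fourier_basis(q6, q7):
    """The 13 basis terms of Eq. (10): constant, single/double angles, products."""
    return np.column_stack([
        np.ones_like(q6),
        np.cos(q7), np.cos(q6), np.sin(q7), np.sin(q6),
        np.cos(2 * q7), np.cos(2 * q6), np.sin(2 * q7), np.sin(2 * q6),
        np.cos(q7) * np.cos(q6), np.cos(q7) * np.sin(q6),
        np.sin(q7) * np.cos(q6), np.sin(q7) * np.sin(q6),
    ])

A = fourier_basis(q6, q7)
coef, *_ = np.linalg.lstsq(A, q1, rcond=None)   # least-squares coefficients
q1_hat = A @ coef                               # fitted values of q1(q6, q7)
```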
5.3 Discussion

By means of data fitting (regression), the joint coupling equations in the DH system were obtained in the above section. However, the different fitting functions have their own characteristics, which are summarized in this section. During the data regression process, two criteria were used: the coefficient of determination R² and the maximal absolute residual. The regression criteria values are presented in Tables 3 and 4. According to statistical theory, a larger R² means a better model; most R² values in Table 3 are larger than 0.90. The maximal absolute residuals are different for the different joints. Smoothness is another factor to consider when choosing regression functions, because the human shoulder joints should not exhibit any jerk during motion; the fitted functions in Eqs. 9 and 10 are all smooth. Joint limits, calculated from Eqs. 1-3, are the last factor to be considered for the regression functions. Bringing the joint limits for q_6 and q_7 into these equations yields a
possible range of motions for the joints q_1 to q_5. These values should be finite and within the given range of motion of each joint. Based on all the factors mentioned above, the Fourier series functions are the best choice for the final coupling equations in the DH system.

Table 3. Coefficient of determination R²
Joint angle   Polyno. function   Trigon. function   Fourier function
q_1           0.964169           0.985975           0.997812
q_2           0.964169           0.985975           0.997812
q_3           0.993352           0.996666           0.998945
q_4           0.991618           0.996725           0.998068
q_5           0.900623           0.911096           0.94599
Table 4. Maximum of absolute residuals in regression equations
Joint angle   Polyno. function   Trigo. function   Fourier function
q_1           0.239784           0.0791158         0.0724016
q_2           0.187342           0.109164          0.0640427
q_3           0.064795           0.0363415         0.0307189
q_4           0.141521           0.0529489         0.0413061
q_5           0.063625           0.0606651         0.0733895
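The two regression criteria of Tables 3 and 4 can be computed as follows (a generic sketch; y are the generated DH joint angles and y_hat the values predicted by a fitted function):

```python
import numpy as np

def fit_criteria(y, y_hat):
    """Coefficient of determination R^2 and maximal absolute residual."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot, np.max(np.abs(y - y_hat))
```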
6 Conclusions

This paper presents an analytical inverse kinematics method for transferring coupling equations from the Euler system to the DH system. The method is based on the principle that one posture can be depicted by different rotation representation systems. Key postures from the Euler system were used to obtain the DH joint angles. A shoulder kinematic model was set up in Virtools to eliminate wrong postures, a data fitting technique was implemented, and several types of regression functions were constructed and compared. Fourier series functions are the ideal solutions for these coupling equations based on the regression criteria, smoothness, and the calculated joint ranges of motion. The original coupling equations in the Euler system were from experiments, and the hypothesis is that they do not depend on anthropometry. That means these equations generally represent the shoulder rhythm for humans of all percentiles. However, when we transferred these equations from the Euler system to equations in the DH system, specific link lengths were used. If a different set of link lengths were used, then we would get a different set of coupling equations. They would not be significantly different, however, because the shoulder rhythm is similar for humans of all
percentiles [8]. Therefore, these transferred coupling equations in the DH system can be approximately used for a human of any percentile.
References

1. Bao, H., Willems, P.Y.: On the kinematic modeling and the parameter estimation of the human shoulder. Journal of Biomechanics 32, 943–950 (1999)
2. Berthonnaud, E., Herzberg, G., Zhao, K.D., An, K.N., Dimnet, J.: Three-dimensional in vivo displacements of the shoulder complex from biplanar radiography. Surg. Radiol. Anat. 27, 214–222 (2005)
3. Feng, X., Yang, J., Abdel-Malek, K.: Survey of Biomechanical Models for Human Shoulder Complex. In: Proceedings of the SAE Digital Human Modeling for Design and Engineering Conference, Pittsburgh, PA, June 14-16 (2008a)
4. Feng, X., Yang, J., Abdel-Malek, K.: On the Determination of Joint Coupling for Human Shoulder Complex. In: Proceedings of the SAE Digital Human Modeling for Design and Engineering Conference, Pittsburgh, PA, June 14-16 (2008b)
5. de Groot, J.H., Valstar, E.R., Arwert, H.J.: Velocity effects on the scapula-humeral rhythm. Clinical Biomechanics 13, 593–602 (1998)
6. de Groot, J.H., Brand, R.: A three-dimensional regression model of the shoulder rhythm. Clinical Biomechanics 16(9), 735–743 (2001)
7. Herda, L., Urtasun, R., Fua, P., Hanson, A.: Automatic determination of shoulder joint limits using quaternion field boundaries. International Journal of Robotics Research 22(6), 419–436 (2003)
8. Hogfors, C., Peterson, B., Sigholm, G., Herberts, P.: Biomechanical model of the human shoulder joint-II. The shoulder rhythm. J. Biomechanics 24(8), 699–709 (1991)
9. Karlsson, D., Peterson, B.: Towards a model for force predictions in the human shoulder. J. Biomechanics 25(2), 189–199 (1992)
10. Klopcar, N., Lenarcic, J.: Kinematic model for determination of human arm reachable workspace. Meccanica 40, 203–219 (2005)
11. Klopcar, N., Lenarcic, J.: Bilateral and unilateral shoulder girdle kinematics during humeral elevation. Clinical Biomechanics 21, S20–S26 (2006)
12. Klopcar, N., Tomsic, M., Lenarcic, J.: A kinematic model of the shoulder complex to evaluate the arm-reachable workspace. Journal of Biomechanics 40, 86–91 (2007)
13. Lenarcic, J., Umek, A.: Simple model of human arm reachable workspace. IEEE Transactions on Systems, Man, and Cybernetics 6, 1239–1246 (1994)
14. Lenarcic, J., Klopcar, N.: Positional kinematics of humanoid arms. Robotica 24, 105–112 (2006)
15. Maurel, W.: 3D modeling of the human upper limb including the biomechanics of joints, muscles and soft tissues. PhD Thesis, Lausanne, EPFL (1995)
16. Moeslund, T.B.: Modeling the human arm. Technical report, Laboratory of Computer Vision and Media Technology, Aalborg University, Denmark (2002)
17. Rundquist, P.J., Anderson, D.D., Guanche, C.A., Ludewig, P.M.: Shoulder kinematics in subjects with frozen shoulder. Arch. Phys. Med. Rehabil. 84, 1473–1479 (2003)
18. Sciavicco, L., Siciliano, B.: Modeling and Control of Robot Manipulators. The McGraw-Hill Companies, Inc., New York (1996)
19. Six Dijkstra, W.M.C., Veeger, H.E.J., van der Woude, L.H.V.: Scapular resting orientation and scapula-humeral rhythm in paraplegic and able-bodied male. In: Proceedings of the First Conference of the ISG, pp. 47–51 (1997)
Development of a Kinematic Hand Model for Study and Design of Hose Installation

Thomas J. Armstrong 1, Christopher Best 1, Sungchan Bae 1, Jaewon Choi 1, D. Christian Grieshaber 2, Daewoo Park 1, Charles Woolley 1, and Wei Zhou 1

1 Center for Ergonomics, University of Michigan, Ann Arbor, MI 48109
2 Department of Health Sciences, Illinois State University, Normal, IL
Abstract. Kinematic hand models can be used to predict where workers will place their fingers on work objects and the space required by the hand. Hand postures can be used to predict hand strength. Kinematic models also can be used to predict tissue stresses and to study work-related musculoskeletal disorders. Study and design of manual hose installation is an important application for kinematic hand models. Hoses are widely used in many mechanical systems, such as autos, aircraft, and home appliances, which are all mass-produced on assembly lines. Studies of automobile assembly jobs show that hose installation is one of the most physically demanding tasks that workers perform. Hoses are a good starting point for kinematic model development because they can be characterized as simple cylinders.

Keywords: Hands, kinematic model, manufacturing.
1 Introduction

Manual work continues to be a vital part of our industrial economy. People have many advantages over machines: they are able to compensate for subtle material and process variations; they can quickly learn to perform different jobs in an agile production process; and they don't require huge upfront capital investments. However, people, like machines, have operating limits and constraints. Job demands must not exceed their strength capacity, and sufficient space must be provided to reach for and grasp work objects. Production hose installation is an example of a job that is routinely performed by hand. The external size and shape of hoses often varies slightly from one hose to another. Hoses are often joined to a flange in confined and obstructed workspaces. Studies by Ebersole and Armstrong [1, 2] showed that manual hose installation is one of the most demanding auto assembly jobs that workers perform. Static anthropometric data, such as hand length, width and thickness, cannot be applied directly to determine if there is sufficient room for the hand. Static anthropometric data, however, can be used with kinematic models to predict possible grip postures and how much space will be occupied by the hand in those postures. Additionally, posture can be used to predict hand strength [3]. Kinematic models also can be used to estimate tendon excursions and loads associated with reaching and grasping and to study the risk of musculoskeletal disorders in the wrist [4].
This paper describes the development of a kinematic model for studying and designing manual hose installation tasks. Although the main focus of this work was on hoses, the resulting model has potential applications to tasks that involve gripping of other parts and tools.
2 Methodology

2.1 The Link System

The link system used for development of this model was based on that developed by Buchholz et al. [5]. They studied the relationship between segment lengths and hand lengths. Planar radiographs were obtained for a series of joint angles from straight to fully flexed. Reuleaux's method was used to determine the joint centers. Segment lengths were then computed as the distances between successive joint centers. Segment lengths were found to be highly correlated with hand lengths. Figure 1 shows the segments and the coefficients used for computing their lengths.
Digit   Segment 1   Segment 2   Segment 3   Segment 4
1       0.118       0.251       0.196       0.158
2       0.463       0.245       0.143       0.097
3       0.446       0.266       0.170       0.108
4       0.421       0.244       0.165       0.107
5       0.414       0.204       0.117       0.093
Fig. 1. Relative link lengths from Buchholz, Armstrong, Goldstein [5]
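A minimal sketch of how such coefficients scale a hand model (the coefficients are those of Fig. 1; the function name and the millimeter units are illustrative):

```python
# Relative link lengths from Fig. 1 (Buchholz et al. [5]);
# rows: digits 1-5 (thumb first), columns: segments 1-4.
LINK_COEFF = [
    [0.118, 0.251, 0.196, 0.158],   # digit 1 (thumb)
    [0.463, 0.245, 0.143, 0.097],   # digit 2
    [0.446, 0.266, 0.170, 0.108],   # digit 3
    [0.421, 0.244, 0.165, 0.107],   # digit 4
    [0.414, 0.204, 0.117, 0.093],   # digit 5
]

def segment_lengths(hand_length_mm):
    """Segment length = coefficient x hand length, per digit and segment."""
    return [[c * hand_length_mm for c in digit] for digit in LINK_COEFF]
```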
2.2 Hand and Object Surfaces

Buchholz and Armstrong [6] proposed a series of ellipsoids that were scaled on segment lengths, widths and thicknesses to give the model hand shape. Use of geometric objects made it possible to detect contact among hand segments and external objects in a virtual environment. Model manipulations, however, were quite slow on
Fig. 2. Choi and Armstrong [7] used arrays of points based on hand segment sizes and truncated cones to depict hand surfaces (left). The surfaces are filled in using OpenGL (right).
computers at that time. Recently, Choi and Armstrong [7] utilized arrays of equally spaced points based on segment sizes and truncated cones to depict hand surfaces (see Fig. 2).

2.3 The Graphical User Interface and Manipulation of the Model

A graphical user interface was designed to facilitate manipulation of the model (see Fig. 3). The program will compute segment sizes for a given percentile hand length using the Buchholz coefficients, or the user can specify desired hand sizes. The program also provides a selection of standard object shapes that can be scaled to desired sizes in each of the three dimensions. The user can also place the object at desired locations and orientations. For example, a hose would be represented as a cylinder and could be placed at a right angle to the hand, parallel to the hand, or something in between. The user also has the option of entering other objects as arrays of surface points. The joint angles can be manipulated manually by entering angles. Positioning the fingers on a work surface is a tedious process, and the results will vary from one user to another. Still, it is possible to get an idea of how well an object fits the hand, where the fingers might touch the work object, and what kinds of postures are possible. It is helpful if the user has some familiarity with what the worker is trying to do. Grieshaber and Armstrong [3] studied the postures that workers used to install hoses in 113 hose installation tasks in twenty-eight auto assembly jobs. They found that workers are more likely to use a power grip than a pinch grip posture as the ratio of hose diameter to hand length and the hose insertion forces increase (see Fig. 4). Some workers still use a power grip posture for small hoses and low forces, and some use a pinch grip for large hoses and high forces. Other investigators have studied how people grasp objects with the goal of developing a set of primitives for robots [8]. These primitives also provide guidance for manipulation of
Fig. 3. Graphical user interface used by Choi and Armstrong [7] to manipulate the kinematic hand model
Fig. 4. Hand posture versus hose OD/hand length ratio (left) and finger flexor activity (normalized 0-10) (right)
kinematic models. Grip postures are no doubt affected by other factors, such as access to the flange and the worker's behavior. Although kinematic models do not compute forces on objects, that does not mean that the users of those models cannot consider them. It must be possible for the hand to achieve a static equilibrium with the grip object, that is:
\sum \vec{F}_i = 0 \quad \text{and} \quad \sum \vec{M}_i = 0.   (1)
This means that if the fingers press on one side of an object and that object is not constrained externally, e.g., work surface railing, etc., then that object must be supported on the opposite side by the thumb or the palm. Skin deformation and friction will help keep objects from slipping out of the hand if the fingers are not exactly aligned on opposite sides of the grip object.
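A hedged sketch of this equilibrium check for a set of predicted fingertip contacts (the contact points and force vectors are hypothetical inputs; moments are taken about the origin):

```python
import numpy as np

def is_static_equilibrium(forces, points, tol=1e-6):
    """Check Eq. (1): net contact force and net moment on the object vanish.

    forces: (n, 3) contact force vectors; points: (n, 3) application points.
    """
    f_net = forces.sum(axis=0)
    m_net = np.cross(points, forces).sum(axis=0)
    return bool(np.linalg.norm(f_net) < tol and np.linalg.norm(m_net) < tol)
```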
2.4 Posture Prediction Algorithms

Some of the work required to manually position the fingers on the surface of a work object can be reduced through the use of posture prediction algorithms. These algorithms either flex the fingers until contact is detected between segments of the fingers and the grip object or, alternatively, use optimization methods to calculate the best fit between the hand and the work object. Buchholz [6] adapted an algorithm from Fleck and Butler [9] to detect contact between the ellipsoid representations of the finger segments and the geometric representation of the grip object. The user first specified the geometry of the grip object and its location and orientation with respect to the hand. The posture prediction routine then started with the first knuckle and rotated that joint until contact occurred between the hand and the work object. The process was then repeated for the second and then the third knuckle. Buchholz reported very good agreement between predicted and observed postures for gripping different sized cylinders perpendicular to the long axis of the hand using a power grip. The need to represent grip objects mathematically, the lack of a graphical user interface, and the slow processors at that time restricted the use of the resulting model. Choi and Armstrong [7] utilized a contact algorithm that computed distances between points representing the surfaces of the hand and the surfaces of the work object. Although this is computationally intensive, it is within the capacity of most modern desktop computers. A number of studies were performed to evaluate the sensitivity of posture to hand size, skin deformation and cylinder diameter. The contact algorithm made it possible to simulate skin deformation by allowing penetration of the object into the hand (negative object-hand distances). Model predictions explained 72% of the observed hand posture variance (R²) for gripping cylinders with diameters between 26 and 114.3 mm. Prediction errors ranged from -16.4º to 18.7º. The model tended to overestimate the third knuckle angles for all cylinder sizes (-16.4º ~ -0.4º) and to underestimate the first knuckle angles for all cylinder sizes (-2.4º ~ 18.7º). Cylinder size had the most profound effect on finger joint angles. Hand length and width (from small female to large male percentiles) and skin deformation (up to 20% penetration) had only a small effect on joint angle predictions. Subsequent studies examined how predicted joint angles are affected by where the object is placed in the hand and how it is oriented [10]. Hand placement is especially important when posture prediction algorithms are used. If the object is placed too close to the wrist, it is possible for the fingers to completely miss the object as the fist closes. If it is placed too close to the fingertips, the hand may not close much at all before contact occurs. Posture predictions were generally consistent as long as the object was placed between the middle of the palm and the first knuckle. Studies of grip behavior [3; 8; 11; 12] can be used to guide finger placement and determine if the resulting grip postures are feasible. Lee and Zhang [13] proposed an optimization model based on the assumption that the best prehensile configuration of the hand in power grip optimally conforms to the shape of the grip object. Their model simultaneously minimized the distances between the joint centers and the surface of the grip object. Their model was tested by comparing predicted and observed postures of twenty subjects gripping vertically oriented cylindrical handles 45 and 50 mm in diameter. Average root mean prediction errors
across all conditions were less than 14 degrees. This optimization routine can be extended to other grip objects, but reformulation of the model would be required if the hand is not in continuous contact with the grip object. The advantage of this model is that it does not require iterations to find the final posture. The disadvantage is that it may have to be reprogrammed for application to grip objects with other shapes or other hand postures. Also, it does not allow the user to easily explore subtle variations in object placement and orientation or finger placements.

2.5 Finger Movements

Posture predictions are affected by the rotation rates of the finger joints. Figure 5 shows the fingertip trajectories based on rotating one joint at a time versus rotating them together at the same rate. It can be seen that rotating them together reduces the reach area of the fingertip. As a practical matter, people don't close their fist one joint at a time; neither do they close all joints at the same rate. Kamper et al. [14] studied finger motions for 10 subjects grasping different objects. They observed that the average rate of rotation for the second knuckle was only 26 to 72% that of the first knuckle and that the rate for the third knuckle was only 16 to 36% that of the second knuckle. There were significant variations among fingers and subjects. The actual finger trajectory will probably be somewhere between the two extremes shown in Fig. 5.
Fig. 5. Fingertip trajectories based on rotating joint 1 to its maximum, then joint 2, then joint 3 (solid lines) and based on joints 1, 2 and 3 together at the same rate
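The two extremes can be generated with simple planar forward kinematics; the segment lengths and the 90° joint range below are hypothetical illustration values:

```python
import numpy as np

def fingertip(angles, lengths):
    """Planar fingertip position for three knuckle angles (radians)."""
    x = y = total = 0.0
    for q, l in zip(angles, lengths):
        total += q
        x += l * np.cos(total)
        y += l * np.sin(total)
    return x, y

lengths = (45.0, 25.0, 20.0)          # hypothetical segment lengths, mm
qmax = np.radians(90)                 # hypothetical joint range
s = np.linspace(0, 1, 50)

# one joint at a time: joint 1 to its maximum, then joint 2, then joint 3
sequential = [fingertip((qmax * min(3 * u, 1),
                         qmax * min(max(3 * u - 1, 0), 1),
                         qmax * min(max(3 * u - 2, 0), 1)), lengths) for u in s]
# all three joints rotated together at the same rate
together = [fingertip((qmax * u, qmax * u, qmax * u), lengths) for u in s]
```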
One of the challenges to developing accurate finger motion models is that the finger motions are usually combined with movement of the hand towards the work object and start with opening the hand so that the grip object can pass between the thumb and the fingers [15; 16]. Figure 6 shows the hand trajectory and the wrist and index finger angles averaged over six subjects reaching for a vertical cylinder. Starting from a relaxed posture (25° extension and 45°, 30° and 10° flexion), the wrist extends and the hand opens (Δθ = +8°, -22°, -12° and -5°) before closing (Δθ = +7°, +11°, +10° and +10°). It also can be seen that the trajectory of the wrist is curved. It has been shown that how much the hand opens and how much it closes are related to the
Fig. 6. Wrist trajectory (left) and the first (1), second (2) and third (3) knuckle angles of the index finger and wrist angle (right) for a 40 cm reach to grasp a 15 cm diameter cylinder
size of the object [15; 17; 18; 16; 7]. Models are needed that describe the finger motions as functions of time so that they can be used with kinematic models.

2.6 Required Hand Space

3D kinematic hand models can be used to predict the hand space requirement for hose placement tasks (see Fig. 7). Hand space requirements were simulated using the 3D kinematic hand model described by Choi et al. [19] and compared with experimental data reported by Grieshaber and Armstrong [20]. The simulation results showed good agreement with the measured data, with an average 17% underestimation of the hand space envelopes. Simulations showed that a pinch grip required an average of 72% larger space than a power grip, the rotation method required an average of 26% larger space than the straight method, and a 95th-percentile male hand length required an average of 44% larger space than a 5th-percentile female hand length. The hand space envelope can give useful information to the designers and engineers who design workspaces and parts to avoid problems of obstruction. Future work will include the addition of modules to the kinematic model interface for capturing hand space data and validating space predictions for a range of grip objects of different sizes and shapes.

2.7 Work-Related Musculoskeletal Disorders

Another important use of kinematic models is evaluating the risk of work-related musculoskeletal disorders. Choi and Armstrong [21] conducted a study to examine the relationship between tendon excursion and wrist movements and MSDs (musculoskeletal disorders) of the hand and wrist. Video tapes were obtained from a previous study by Latko et al. [22] that showed a strong relationship between Hand Activity Level and the risk of non-specific hand pain, tendonitis and carpal tunnel syndrome. One medium-risk job and two low-risk jobs were selected from an office furniture manufacturing facility. Two high-risk jobs, one medium-risk job, and one low-risk job were selected from a manufacturing site for industrial containers. Two high-risk jobs and one medium-risk job were chosen from a company manufacturing spark plugs. Time-based analyses were performed for the right hand and the wrist as described by Armstrong et al. [23].
Fig. 7. Predicted space occupied by the hand while inserting a hose
Tendon excursions of the FDP (flexor digitorum profundus) and FDS (flexor digitorum superficialis), projected over one hour, were assessed by using the models developed by Armstrong and Chaffin [24]. Cumulative tendon excursions were computed from angular velocities and peak wrist excursions. First, wrist posture as a function of time, θ(t), can be written as

\theta(t) = \sum_i \theta_{0i} \sin(\omega_i t + \phi),   (2)

where θ_{0i} is the peak wrist excursion, ω_i is the frequency, φ is the phase, and t is time. Second, the angular velocity, θ'(t), can be calculated as

\dot{\theta}(t) = \sum_i \theta_{0i}\, \omega_i \cos(\omega_i t + \phi).   (3)

Third, the cumulative tendon excursion is

\int_0^T \left| r\, \dot{\theta}(t) \right| dt = \int_0^T \left| \sum_i r\, \theta_{0i}\, \omega_i \cos(\omega_i t + \phi) \right| dt,   (4)

where r is the radius of tendon curvature in the wrist, θ' is the angular velocity of the wrist, and T is the work duration of the observations. It can be seen that the total tendon travel during the work period provides an exposure index that captures frequency, ω_i, peak wrist excursion, θ_{0i}, and work duration, T. Mean velocity and acceleration for wrist flexion-extension and cumulative tendon excursions were significantly different (p < 0.05) across risk groups, and these values corresponded to the risk of MSDs. In future studies we will add excursions due to finger motions in addition to wrist motions and add tendon force to the analysis. Kinematic models can be used to study wrist, finger and tendon motions.
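Numerically, the exposure index of Eq. (4) can be computed directly from a sampled wrist angle record; a small sketch (theta, t and r are hypothetical inputs):

```python
import numpy as np

def cumulative_tendon_excursion(theta, t, r):
    """Integrate Eq. (4): total tendon travel = integral of |r * dtheta/dt| dt.

    theta: wrist angles (radians); t: time stamps (s); r: tendon radius (mm).
    """
    omega = np.gradient(theta, t)          # angular velocity theta'(t)
    return np.trapz(np.abs(r * omega), t)  # tendon travel in mm over the record
```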
3 Conclusions

Kinematic models can be used to determine how workers may grasp different work objects, how much space is required for their hands, and the required tendon forces and hand strength. Contact and posture prediction algorithms facilitate the use of kinematic models, but they focus on one best grip and do not automatically capture the variations that may occur among different workers. Many of the studies heretofore focused on grasping cylinders; these studies are particularly relevant to hose installation. Future studies will focus on enhancing models for grasping irregularly shaped objects, developing models for describing hand and finger motions and the risk of musculoskeletal disorders, improving the graphical user interfaces, and integrating the models into CAD environments.
Acknowledgement

The project was funded in part by joint funds from the UAW-GM National Joint Committee on Health and Safety. The results presented herein represent the conclusions and opinions of the authors. Its publication does not necessarily imply endorsement by the International Union, UAW, or General Motors Corporation.
References

1. Ebersole, M., Armstrong, T.J.: An analysis of task-based self-assessments of force. In: Human Factors and Ergonomics Society 48th Annual Meeting, New Orleans, Louisiana, Human Factors and Ergonomics Society (2004)
2. Ebersole, M., Lau, M., Armstrong, T.J.: Task-based measurement of force in automobile assembly using worker self-assessment, observational analysis and electromyography. In: Human Factors and Ergonomics Conference, Orlando, HFES (2005)
3. Grieshaber, C., Armstrong, T.J.: Systematic characterization of gross hand postures employed during actual work tasks. In: 15th Triennial Congress of the International Ergonomics Association, Seoul, Korea (2003)
4. Armstrong, T.J., Chaffin, D.B.: An investigation of the relationship between displacements of the finger and wrist joints and the extrinsic finger flexor tendons. J. Biomech. 11(3), 119–128 (1978)
5. Buchholz, B., Armstrong, T.J., Goldstein, S.A.: Anthropometric data for describing the kinematics of the human hand. Ergonomics 35(3), 261–273 (1992)
6. Buchholz, B., Armstrong, T.J.: An ellipsoidal representation of human hand anthropometry. Hum. Factors 33(4), 429–441 (1991)
7. Choi, J., Armstrong, T.J.: Examination of a collision detection algorithm for predicting grip posture of small to large cylindrical handles. In: 2006 Digital Human Modeling for Design and Engineering Conference and Exhibition, Lyon, France, SAE (2006)
8. Cutkosky, M.R., Howe, R.D.: Human grasp choice and robotic grasp analysis. In: Venkataraman, S.T., Iberall, T. (eds.) Dextrous Robot Hands. Springer, Heidelberg (1990)
9. Fleck, J.T., Butler, F.E.: Validation of the Crash Victim Simulator, vol. 1: Analytical Formulation. National Technical Information Service, Springfield, VA (1981)
10. Choi, J.: Developing a 3-Dimensional Kinematic Model of the Hand for Ergonomic Analyses of Hand Posture, Hand Space Envelope, and Tendon Excursion. Ph.D. dissertation, University of Michigan, Ann Arbor, Michigan (2008)
11. Bae, S., Choi, J., Armstrong, T.J.: Influence of object properties on reaching and grasping tasks. In: 2008 Digital Human Modeling for Design and Engineering Conference and Exhibition, Pittsburgh, PA, SAE (2008)
12. van Nierop, O.A., van der Helm, A., Overbeeke, K.J., Djajadiningrat, T.J.P.: A natural human hand model. Visual Comput. 24, 31–44 (2008)
13. Lee, S.W., Zhang, X.: Development and evaluation of an optimization-based model for power-grip posture prediction. J. Biomech. 38(8), 1591–1597 (2005)
14. Kamper, D.G., Cruz, E.G., Siegel, M.P.: Stereotypical fingertip trajectories during grasp. J. Neurophysiol. 90(6), 3702–3710 (2003)
15. Jeannerod, M., Prablanc, C.: Visual control of reaching movements in man. Adv. Neurol. 39, 13–29 (1983)
16. Zatsiorsky, V.M., Latash, M.L.: Multifinger prehension: an overview. J. Mot. Behav. 40(5), 446–476 (2008)
17. Lee, J.W., Rim, K.: Measurement of finger joint angles and maximum finger forces during cylinder grip activity. J. Biomed. Eng. 13(2), 152–162 (1991)
18. Buchholz, B., Armstrong, T.J.: A kinematic model of the human hand to evaluate its prehensile capabilities. J. Biomech. 25(2), 149–162 (1992)
19. Choi, J., Grieshaber, C.D., Armstrong, T.J.: Estimation of grasp envelope using a 3-dimensional kinematic model of the hand. In: Human Factors and Ergonomics Society 51st Annual Meeting, Baltimore, MD, Human Factors and Ergonomics Society (2007)
20. Grieshaber, D.C., Armstrong, T.J.: The effect of insertion method and required force on hand clearance envelopes during simulated rubber hose insertion tasks. Hum. Factors (in press, 2009)
21. Choi, J., Armstrong, T.J.: Assessment of the risk of MSDs using time-based video analysis. In: Sixth International Scientific Conference on Prevention of Work-Related Musculoskeletal Disorders, Boston, USA (2007)
22. Latko, W.A., Armstrong, T.J., Franzblau, A., Ulin, S.S., Werner, R.A., Albers, J.W.: Cross-sectional study of the relationship between repetitive work and the prevalence of upper limb musculoskeletal disorders. Am. J. Ind. Med. 36(2), 248–259 (1999)
23. Armstrong, T.J., Keyserling, W.M., Grieshaber, D.C., Ebersole, M., Lo, E.: Time based job analysis for control of work related musculoskeletal disorders. In: 15th Triennial Congress of the International Ergonomics Association, Seoul, Korea (2003)
24. Armstrong, T.J., Chaffin, D.B.: Some biomechanical aspects of the carpal tunnel. J. Biomech. 12(7), 567–570 (1979)
Generation of Percentile Values for Human Joint Torque Characteristics Florian Engstler and Heiner Bubb Institute of Ergonomics, Technische Universität München, Boltzmannstr. 15, 85747 Garching, Germany
[email protected],
[email protected]
Abstract. This pilot study presents an approach to generate percentile values for the joint torque characteristics of digital human models. Detailed angle-specific joint torque measurements of a few subjects are set in relation to extensive measurements of external maximum forces, including percentile values based on many subjects, by means of multi-body simulation. The results indicate the applicability of the approach but are not yet of high validity due to several sources of error along the process. Further experiments addressing these issues and generating valid results are planned.
Keywords: DHM, joint force, percentile.
1 Introduction
Today's digital human models (DHM) offer an increasingly comprehensive simulation of human characteristics. One aspect currently being addressed by numerous research groups around the world is the integration of human forces, allowing for force-based posture and motion simulation. Bubb and Fritzsche [1] give an overview of current developments in digital human modeling, presenting approaches that range from simply integrating force data for specific tasks at defined postures to the detailed simulation of individual muscles in musculoskeletal models like the AnyBody modeling system [2]. We concentrate on an approach at joint level: with knowledge of the internal joint torques it is possible to simulate external forces, postures and motions without needing a detailed understanding of the underlying muscle activities (see Seitz [3] and Fritzsche [4]). Generally, the force data used for human modeling is obtained experimentally: subjects exert maximum forces for single joint degrees of freedom (DOF), which are then converted to joint torque. As joint torque depends on joint angle and force direction, numerous measurements under different postures are necessary, making the process extremely time-consuming. For this reason the available studies on maximum joint torque are unfortunately based on a very small number of subjects. Nonetheless such data has been integrated, e.g., into the DHM Ramsis, driving its integrated force-based posture calculation and the prediction of external forces [3]. As the limited amount of data does not suffice for sound statistical analysis, dependencies, e.g., on gender and age, are currently calculated on the basis of factors taken from the literature [5] or by so-called synthetic distributions [6].
However, there is no direct connection between the force capabilities of the DHM and the actual force distribution in the population, as we know it from body dimensions, for instance. In short, it remains unclear how strong the DHM actually is. This shows the necessity for percentile values of human joint torque. Because of the high effort involved, it is hardly feasible to measure these data on a statistically sufficient number of subjects. On the other hand, there are experimental studies on maximum forces with high numbers of subjects. These studies typically measured external forces for certain, mostly industry-related tasks such as lifting loads or pushing crates, allowing for the calculation of force percentile values. However, they lack information on posture and internal joint torque. This paper presents a possible approach to link these two types of experiments, allowing percentile values to be related to joint torque data.
2 Human Maximum Force Measurements
This section gives a short overview of the definitions of maximum force used in the scientific community and puts this work into context.
2.1 Definitions and Measurement Protocols of Maximum Force
There are different definitions and measurement protocols for the maximum force capabilities of human beings. Force exertion can be static or dynamic, and the assessment can be isometric, isokinetic, isoinertial or psychophysical [7, 8]. The vast majority of data in the literature are isometric static forces, and our approach relates to this method as well. There are different methods of measuring isometric force with respect to the type of force exertion. Kroemer [9] distinguishes between the "plateau-", "ramp-" and "impulse-method". With the plateau-method a constant maximum force has to be applied for about 3 to 5 seconds. One evident problem is that subjects normally will not apply their maximum possible force if they are not accustomed to the experiments. A kind of safety thinking leads them to distribute their force conservatively over the 5 seconds, as they do not know how long they will be able to maintain a certain force level. Subjects therefore need to be trained for this method. With the ramp-method the subject continuously increases the applied force until exhaustion. An authentic maximum force is measured this way; however, this force is only applied for a split second, so the informative value of the result is debatable. The impulse-method is not recommendable, as the maximum force is supposed to be applied in an instant. Since it is not possible to mobilize maximum force within fractions of a second, the result is not a true maximum value: the peak results mainly from the impetus of the motion rather than from muscle force. Preliminary tests by Rühmann and Schmidtke [5] argue for the ramp-method (reliability coefficient = 0.98). However, the plateau- and ramp-method, or slight variations of both, are widely used in the literature.
2.2 Measuring External Maximum Force
Most experimental studies on maximum forces, especially in the past, have been driven by the need for task-related data for the design of specific workplaces, working environments or products. Therefore typical types of force exertion in industrial work or user-product interaction have been studied: lifting loads at different positions, pushing and pulling crates, operating handles, etc. Daams [10] gives an overview of such measurements. The common ground of these studies is the relatively loose experimental restrictions placed on the subjects: although different anthropometries are measured, the point of force application is mostly fixed in space, leading to different postures for different anthropometries. Additionally, the subjects are free to take whatever posture they like in order to fulfill the task (except for some regulations, e.g., where to put their feet), but detailed information on the individual subjects' postures is generally missing. However, this kind of experiment is normally conducted with a high number of subjects. Rühmann and Schmidtke [5] measured maximum external isometric forces of 1113 females and 1967 males for 14 different tasks using the ramp-method (see Fig. 1). Other studies offer comparable sample sizes, giving the results high statistical significance (e.g., Glitsch et al. [11], Rohmert et al. [12]). The documented data is mostly demographic information related to force values, for instance force percentiles for gender or age groups. Unfortunately this kind of data cannot be directly adopted for use in DHMs, as it contains no information on the subject-internal processes of force exertion.
Fig. 1. Experiments on task-specific external forces by Rühmann and Schmidtke [5]. Some lifting tasks were done with one and with two hands.
2.3 Measuring Joint-Level Maximum Force
In order to simulate forces with DHMs, force information at joint level is required. Some measurements for single joints or extremities can be found in the literature (e.g., [13], [14], [15] and [16]), but in general only a few studies obtained such data, and only on a very small number of subjects. This is due to the high effort caused by joint-specific measurements: with joint torque depending on the joint angle, a very high number of trials becomes necessary. This is especially true for complex three-dimensional joints like the shoulder. In the course of the EU FP5 project "Realman" (IST-2000-29357) [17], maximum joint torque was measured for eight subjects over a period of three years. Subjects applied isometric forces for single joint DOFs, which were measured with the plateau-method. The measurements covered most major joints: hip, knee, shoulder, elbow and wrist. The ankle and especially the spine had unfortunately not been measured.
3 Approach to Percentile Values at Joint-Level
Our approach builds a link between the detailed joint-level information of individual subjects and the percentile values of large-scale measurements of external forces. The basic idea is to have subjects with known joint-torque properties perform experiments on external forces and to relate their joint-torque utilization to the percentile values obtained in such experiments. This paper describes the approach based on data from the Realman project and a recreation of the experiments by Rühmann and Schmidtke. The concept is illustrated in Fig. 2 and is described in detail below. The figure outlines the pipeline: joint-torque data for the major joints at different joint angles (from the Realman experiments) is interpolated in Matlab for each joint DOF, and the maximum joint torque vector for the measured posture is calculated with the ellipsoid approach; the posture (obtained with PCMAN) and the exerted external force (plateau-/ramp-method) from the recreated Rühmann and Schmidtke experiments feed a multi-body simulation in Alaska/DYNAMICUS, which yields a calculated joint torque vector for each joint; the relative joint torque is computed per joint, the maximum relative joint torque designates the limiting joint for the given task, and the limiting joint is defined to have the percentile value of the external force (data from Rühmann and Schmidtke).
Fig. 2. Concept for the generation of percentile values of human joint torque characteristics
3.1 Joint-Level Data
This pilot study is based on experiments with one male subject (28 years old, 1.74 m tall, 64.5 kg). He was the only subject still available from the Realman project, with a total of 936 single joint torque values measured that could be reused for this study. The data are tabulated torque values for different orthogonal directions of force exertion at a given joint angle combination. For each shoulder, for instance, six torque values (flexion, extension, abduction, adduction, internal and external rotation) were measured for 59 different postures. Firstly, the subject's torque data for specific joint angles has to be interpolated in order to calculate maximum joint torque values for arbitrary joint angle combinations. This is done by a multidimensional polynomial regression in Matlab, which shows the best
results among the popular mathematical methods [4]. An exemplary regression (formula 1) was obtained for elbow flexion (2 rotational DOF) as a polynomial in the elbow flexion angle α and the forearm pronation angle γ. When force is exerted in a direction different from the six directions measured, the maximum joint torque value has to be calculated. Schäfer [18] and Rothaug [19] described the joint torque of a joint with 3 rotational DOF at a given joint angle as an unsymmetrical ellipsoid defined by the six measured values, with the ellipsoid surface defining the maximum joint torque possible (see Fig. 3). Any vector inside the ellipsoid represents sub-maximum joint torque, a vector touching the surface represents maximum joint torque, and any vector penetrating the surface could not be exerted by the subject. For joints with 2 rotational DOF, like the elbow, the representation is an ellipse.
Fig. 3. Calculation of maximum joint torque based on an ellipsoid-approach (t1 being sub maximal torque, t2 being the maximal torque and t3 exceeding the maximal torque) [3]
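The ellipsoid evaluation can be made concrete with a short sketch. The following Python fragment is a minimal illustration of the octant-wise ellipsoid idea, assuming the six measured torque extremes act as direction-dependent semi-axes; the function name and the numeric values are ours for illustration, not from the paper.

```python
import numpy as np

def max_torque_in_direction(direction, t_pos, t_neg):
    """Maximum joint torque along `direction` on an unsymmetrical ellipsoid.

    t_pos / t_neg hold the measured maximum torques along the positive and
    negative half of each of the three joint axes (six values in total),
    e.g. flexion/extension, abduction/adduction, internal/external rotation.
    """
    u = np.asarray(direction, dtype=float)
    u /= np.linalg.norm(u)                    # unit torque direction
    semi = np.where(u >= 0.0, t_pos, t_neg)   # semi-axes of the octant u points into
    # distance from the origin to the ellipsoid surface along u
    return 1.0 / np.sqrt(np.sum((u / semi) ** 2))

# illustrative values (Nm), not measured data
t_pos = np.array([60.0, 45.0, 30.0])          # flexion, abduction, internal rotation
t_neg = np.array([70.0, 50.0, 35.0])          # extension, adduction, external rotation

simulated = np.array([20.0, 20.0, 10.0])      # torque vector from the simulation
t_max = max_torque_in_direction(simulated, t_pos, t_neg)
relative_load = np.linalg.norm(simulated) / t_max  # cf. Section 3.3
```

Along a pure axis the returned radius reduces to the measured extreme itself, which is the sanity check for this kind of construction.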
3.2 External Maximum Force
We recreated the measurement setup of the experiments by Rühmann and Schmidtke and extended it by integrating the PCMAN posture measurement system [20]. This video-based software allows marker-less measurement of the subject's anthropometry and posture and is compatible with the DHM Ramsis and the multi-body modeling system Alaska/DYNAMICUS [21]. The subject performed 14 of the trials measured by Rühmann and Schmidtke. An exemplary trial can be seen in Fig. 4. According to the guidelines on force measurement by Kumar [22], each trial was repeated until the variance of the results was smaller than 10%. Additionally, the subject was asked for his impression of which joint might have been the limiting one for the trial. As Rühmann and Schmidtke used a different measurement protocol than the Realman project, the subject performed every trial twice: once with the ramp-method and a second time with the plateau-method. Comparing the measurements showed that the results of the ramp-method equal on average 1.25 times those of the plateau-method (standard deviation 0.15). The data obtained with the ramp-method are only used to
Fig. 4. Exemplary experiment according to Rühmann and Schmidtke with overlaid PCMAN measurement. The subject had to pull the handle upwards. The height of the handle and the distance between the right foot and the handle were specified.
assign the subject's external force to a percentile value from Rühmann and Schmidtke. The data from the plateau-method is, together with the anthropometry and posture from PCMAN, transferred to the multi-body simulation.
3.3 Multi-body Simulation
Given the measured external force, anthropometry and posture, the joint torque necessary for exerting the external force can be calculated for each joint using the multi-body modeling software Alaska/DYNAMICUS. The software offers an interface to RAMSIS and PCMAN, giving the multi-body model the correct anthropometry and dynamic properties (segment weight, centre of gravity, etc.). Joint angles from PCMAN need to be transformed to the Alaska/DYNAMICUS joint coordinate system using Excel. The simulation generates one joint torque vector for each joint of the model, expressed in the Alaska/DYNAMICUS joint coordinate system. The vectors are then transformed back to PCMAN joint angles (see the red arrows in Fig. 5). For the given orientation of the simulated joint torque vector, the maximum joint torque vector can be calculated from the Realman measurements using the ellipsoid approach (yellow arrow). The absolute values of the two vectors are then compared, giving the relative joint torque for each joint.
3.4 Assignment of Percentile Values
The joint with the maximum relative load is designated the limiting joint for the given force exertion and is therefore responsible for the percentile value of the exerted external force. It is expected that the relative load of the limiting joint is near 100%. Some variance will however be inevitable due to inaccuracies in the force and posture measurements. Obviously, percentile values can only be assigned to limiting joints.
Fig. 5. Exemplary calculation of relative joint load based on an external force for elbow and shoulder
The percentile value of the external force can be directly related to the dimensions of the subject's maximum torque vector. To generate percentile distributions for joint torque, additional steps are necessary: we assume the shape of the torque ellipsoid of the limiting joint to be representative of the whole population and thus constant independently of its size. With this assumption the percentile value is related to the whole torque ellipsoid. The ellipsoid can now be scaled by calculating the torque distribution using the mean and variance of the force measurements according to formulae 1-4 (index F indicating force measurement, index T indicating torque calculation): with the force mean and variance given by formulae (1) and (2), the mean and variance of the joint torque can be calculated by formulae (3) and (4).
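The formulae themselves did not survive the conversion of this copy. Under the stated assumption that the ellipsoid shape is population-constant and only its size varies, a plausible reading is a linear scaling: with the subject's measured external force F_S and the corresponding maximum torque T_S of his limiting joint, a factor k = T_S / F_S would carry the force distribution over to the torque distribution, e.g.

    \mu_T = k \, \mu_F , \qquad \sigma_T^2 = k^2 \, \sigma_F^2 .

This reconstruction is our assumption and should not be read as the authors' published formulae.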
4 Results
An exemplary result of relative joint torque values is depicted in Fig. 6. The subject had to pull upwards one-handedly while keeping a defined distance between foot and handle. His exerted force was at the 10.5th percentile. It can be seen that the knee joint exceeds 100% relative joint load. This can be easily explained: the posture measurement reported a slightly bent knee, and thus Alaska/DYNAMICUS calculated a high joint torque for this posture. In reality, however, the subject had fully stretched his knee, not needing to exert noticeable joint torque. In other trials different joints, like the shoulder, also exceeded 100% relative joint load. The explanation in these cases is less clear: the deviation from 100% is assumed to be a combination of inaccuracies in the posture measurement, errors from the maximum joint torque
interpolation and the ellipsoid approach, and finally the possibility that the subject did not actually exert his maximum strength during the experiments. The sensitivity to posture measurement errors was tested on the trial in Fig. 6 by slightly varying the joint angles of clavicle, shoulder and elbow by a few degrees while still generating a plausible result in PCMAN. This produced a large difference of +15% in the relative load of the shoulder joint, showing the necessity of a more accurate posture measurement. Furthermore, joint angles occasionally exceeded the range in which maximum joint torque had been measured during the Realman project, so the calculated maximum joint torque was no longer an interpolation but an extrapolation of low validity. This occurred especially at the hip and knee. Also, for some of the trials not depicted here it is expected, and concordant with the subject's impression, that the spine would be the limiting factor. As the spine had not been measured during the Realman project, the results do not actually identify the limiting joint in these cases. However, the resulting limiting joints generally correlate very well with the subject's statements.
Fig. 6. Exemplary relative joint torque values of external load for a lifting task. The black vector indicates external load, red vectors indicate calculated joint torque for elbow and shoulder. The limiting joint is marked inverted.
5 Conclusion and Outlook
The presented approach was able to detect the limiting joints for given tasks of maximum force exertion. Due to inaccuracies along the process, the resulting relative joint torques and percentile values are, however, not of high validity. The need for further refinement, especially of the experimental protocols, became evident. Nonetheless, this pilot study showed the general applicability of our approach. To improve the validity of the results, several issues have to be dealt with in future work. First of all, the experiment needs to be repeated with a higher number of subjects. In the course of the project "DHErgo", funded under the EU Seventh Framework Programme, maximum joint torque will be measured similarly to the Realman project, including all major joints, with multiple young and elderly subjects. The measurement protocols and equipment are currently being improved, and measuring equipment for
additional joints like the spine and ankle is being developed. Furthermore, the subjects' posture will be recorded with a marker tracking system of high accuracy. For at least some of the subjects, joint torque measurements are planned at a higher level of detail, which should allow for more accurate joint torque interpolation functions. In addition, the ellipsoid approach will have to be called into question: it may well be a good model for some joints, but it might not be suitable for all of them. Other mathematical descriptions will have to be tested. Furthermore, the ellipsoid shapes of different subjects will have to be combined in order to generate a mathematical description valid for the average population. The subjects from DHErgo will also take part in experiments on external maximum force like those by Rühmann and Schmidtke described above. In addition, we plan to recreate more experiments of this kind in order to have every joint as a limiting joint at least once. These experiments will be recorded with the marker tracking system as well. All these actions should significantly improve the data quality, allowing for a more promising application of the presented approach and more valid results.
References
1. Bubb, H., Fritzsche, F.: A Scientific Perspective of Digital Human Models: Past, Present and Future. In: Duffy, V.G. (ed.) Handbook of digital human modeling. Research for applied ergonomics and human factors engineering, Boca Raton, Fla. (2009)
2. Rasmussen, J., Christensen, S.T., Dahlquist, J., Damsgaard, M., de Zee, M.: AnyBody - A quantitative ergonomic design method. In: Olsen, K.B. (ed.) Working life ethics. Proceedings from Nordic Ergonomics Society 36th annual conference 2004, Kolding, Denmark (2004)
3. Seitz, T., Recluta, D., Zimmermann, D., Wirsching, H.-J.: FOCOPP - An approach for a human posture prediction model using internal/external forces and discomfort. In: Proceedings of the SAE Digital Human Modeling Conference (2005)
4. Fritzsche, F.: Kraft- und haltungsabhängiger Diskomfort unter Bewegung - berechnet mit Hilfe eines digitalen Menschmodells. Dissertation, Lehrstuhl für Ergonomie, Technische Universität München, München (in press, 2009)
5. Rühmann, H.: Körperkräfte des Menschen. Perzentilierung isometrischer Maximalkräfte sowie Ausdauer und Beanspruchung bei konzentrischer und exzentrischer Muskelarbeit. Kolloquium des Lehrstuhls für Ergonomie der Technischen Universität München zum HdA-Projekt "Körperkräfte des Menschen Teil II", Köln (1992)
6. Schaefer, P., Schwarz, W.: Target group ergonomics - European standardization force limits reflecting demographic profiles. In: Proceedings of the IEA 2006 conference (2006)
7. Mital, A., Kumar, S.: Human muscle strength definitions, measurement, and usage: Part I - Guidelines for the practitioner. International Journal of Industrial Ergonomics 22, 101–121 (1998)
8. Mital, A., Kumar, S.: Human muscle strength definitions, measurement, and usage: Part II - The scientific basis (knowledge base) for the guide. International Journal of Industrial Ergonomics 22, 123–144 (1998)
9. Kroemer, K.H.E.: Die Messung der Muskelstärke des Menschen. Methoden und Techniken. Bremerhaven (1977)
10. Daams, B.J.: Human force exertion in user-product interaction. Backgrounds for design. Delft (1994)
11. Glitsch, U., Ellegast, R., Schaub, K., Wakula, J., Berg, K.: Biomechanische Analyse von Ganzkörperkräften in unterschiedlichen Körperhaltungen. In: Gesellschaft für Arbeitswissenschaft (ed.) Produkt- und Produktions-Ergonomie - Aufgabe für Entwickler und Planer. Bericht zum 54. Kongress der Gesellschaft für Arbeitswissenschaft, Dortmund (2008)
12. Rohmert, W., Rückert, A., Schaub, K.: Körperkräfte des Menschen. Darmstadt (1992)
13. Amis, A.A., Dowson, D., Wright, V.: Elbow joint force predictions for some strenuous isometric actions. Journal of Biomechanics 13, 765–775 (1980)
14. Winter, D.A.: Overall principle of lower limb support during stance phase of gait. Journal of Biomechanics 13, 923–927 (1980)
15. Nijhof, E.J., Gabriel, D.A.: Maximum isometric arm forces in the horizontal plane. Journal of Biomechanics 39, 708–716 (2006)
16. Anderson, D.E., Madigan, M.L., Nussbaum, M.A.: Maximum voluntary joint torque as a function of joint angle and angular velocity: Model development and application to the lower limb. Journal of Biomechanics 40, 3105–3113 (2007)
17. Bubb, H.: Research for a Strength Based Discomfort Model of Posture and Movement. In: Proceedings of the IEA 2003 conference (2003)
18. Schaefer, P., Rudolph, H., Schwarz, W.: Digital Man Models and Physical Strength - A New Approach in Strength Simulation. In: Proceedings of the SAE Digital Human Modeling Conference (2000)
19. Rothaug, H.: Combined Force-Posture Model for Predicting Human Postures and Motion by Using the Ramsis Human Model. In: Proceedings of the SAE Digital Human Modeling Conference (2000)
20. Seitz, T., Bubb, H.: Measuring of Human Anthropometry, Posture and Motion. In: Proceedings of the SAE Digital Human Modeling Conference (1999)
21. Härtel, T., Hermsdorf, H.: Biomechanical modelling and simulation of human body by means of DYNAMICUS. Abstracts of the 5th World Congress of Biomechanics. Journal of Biomechanics 39, S549 (2006)
22. Kumar, S.: Muscle strength. Boca Raton (2004)
Adaptive Motion Pattern Recognition: Implementing Playful Learning through Embodied Interaction Anja Hashagen, Christian Zabel, Heidi Schelhowe, and Saeed Zare University of Bremen, TZI, dimeb (Digital Media in Education), Bibliothekstr. 1, 28359 Bremen, Germany {hashagen,chr,schelhow,zare}@tzi.de
Abstract. The concept of embodiment plays an emergent role in Human-Computer Interaction. Accordingly, we conceptualized, implemented, and evaluated an adaptive motion pattern recognition system for an educational installation called Der Schwarm. We implemented three algorithms and compared their correctness and processing speed. Der Schwarm aims to encourage children to learn about technology and interprets free body movements. The motion pattern recognition system fosters embodied playful learning, as an evaluation with children shows.
Keywords: Motion Pattern Recognition, Playful Learning, Embodied Interaction, Children Education, HCI, Virtual Environments.
1 Introduction
Kevin is 11 years old and on a study trip with his class. The teacher only announced a visit to the local university. Now the students gather around a marked rectangle on the floor, in which a swarm of strange-looking little bugs moves. After a short welcome by two researchers, Kevin slowly enters the area and the bugs immediately approach him. He is surprised and runs to a corner of the rectangle, while the swarm follows him. "Cool, they can see me!" he shouts. Kevin quickly waves his arms and the swarm backs off and attempts to escape. "Step on them! Catch them!", "Look, they are not real, but made of light", and "Run away and see what they do!" add some classmates. The class is very excited and curious about the artificial swarm: where it comes from, why it can see Kevin, and how he may control it. After Kevin has left the rectangle and some of his classmates have interacted with the swarm, the class discusses emerging theories about the functionality and underlying concepts and creates test scenarios to prove their hypotheses. This scenario describes a typical workshop with children and the installation Der Schwarm [6] (see Section 3, System Implementation at Der Schwarm) conducted by our research group. From our experience in the field of digital media and education, free body interaction provides a good starting point to pique children's curiosity about technology. Therefore, Der Schwarm implements the concept of embodiment after
Dourish [4] and offers the possibility to interact with a computer on a bodily level. The abstract functionality regarding the technical setup and the swarm algorithms can be explored playfully through the medium of the body. The scenario emphasizes children's motivation, which is considered a key factor in gaining deeper understanding, following the principle of playful learning [9]. In order to support the process of exploring abstract technological concepts through concrete body interaction driven by motivation, we implemented motion pattern recognition and integrated the software into the installation Der Schwarm.
2 Motion Pattern Recognition System
Related work mainly exists in the field of gesture recognition. Rubine developed a gesture recognition system which employs Fisher's classification functions to recognize patterns entered with devices such as mouse and stylus pen [11]. A commercial solution is the software iisuTM by Softkinetic: a 3D depth-sensing camera tracks body movements of arms, legs, etc., and a gesture is recognized by comparing the tracked position information with sample gestures [8]. Another example of a gesture recognition system is the game console Wii by Nintendo with the input device Wiimote [13]. The motion pattern recognition system we developed differs from these solutions in its input and tracking method. Our objective is to recognize motion patterns, more precisely walked paths on the floor, instead of gesture patterns drawn in space. Nevertheless, the classification of the patterns requires similar conditions and can be achieved with similar methods. In our system, the user enters a pre-trained motion pattern by walking in a monitored area with an aspect ratio of 4:3. The user's position is tracked by a laser scanner. After a motion pattern is detected, a visual reaction is triggered and projected onto the floor within the monitored area. The patterns are grouped into sets of similar motion patterns such as Geometry and Numbers. The system is adaptive and therefore provides a training function, which lets the user enter new motion patterns by walking them repeatedly. Two of the three classification algorithms we implemented are based on discriminant analysis. This affects the general procedure of training and classification, so we introduce the concept before describing the details of training and classification. Generally, after a new motion pattern has been trained, the pattern is represented as a discriminant function [3]. For classification the same function is calculated to distinguish motion patterns, resulting in a discriminant value (Equation 1). The higher the resulting discriminant value, the higher the similarity of the patterns, and the motion pattern is classified accordingly.

D = k0 + k1*m1 + k2*m2 + … + kn*mn .    (1)
Equation 1 shows Fisher's discriminant function, a linear combination that calculates the distinction of two patterns. The variables m1 - mn represent characteristics of the entered motion pattern. During classification the characteristics are inserted into
the function. The coefficients k0 - kn, calculated with matrix operations, are based on the results of the training. With the characteristics (from the data input) and the coefficients (determined during training) the function is evaluated, and D represents the resulting discriminant value. Classification of more than two patterns requires several discriminant functions; their number depends on the number of motion patterns and on the discriminant analysis variant. In [7] and [12] two variants are mentioned, Fisher's classification functions and Fisher's linear discriminant function. We employed both as algorithms within the motion pattern recognition system.
2.1 Motion Pattern Training
Training is the process of creating new motion patterns. Several input repetitions are necessary to ensure an adequate recognition rate. Besides high recognition rates, our aim is to reduce computational complexity with a minimum of required repetitions. From our experience, about 15 repetitions lead to reasonable classification results. We discuss the training complexity and recognition rates in the technical evaluation chapter. During training the average value of each characteristic is calculated to adjust the coefficients of the discriminant functions. The adjustment ensures a stable motion pattern classification with discriminant analysis and with the third algorithm, developed by the authors.
2.2 Motion Pattern Recognition
After motion patterns have been trained and are known to the system, the software needs to recognize such patterns based on the walked and tracked path of the user during data input. The recognition of motion patterns requires, just like gesture pattern recognition, the computation of several components. We therefore divided the pattern recognition process into the following sequential steps.
Motion point measurement (data input) Characteristics extraction Classification Reaction triggering
During motion point measurement the path walked by the user is tracked and saved. Afterwards the pattern characteristics are extracted and compared with pretrained motion patterns during classification. If the pattern is recognized an appropriate reaction is triggered, otherwise a text message is shown. Motion Point Measurement. The user’s position is tracked with a laser scanner and a sequence of two-dimensional points is extracted to recognize a motion pattern. During the input of the pattern the user’s coordinates are constantly measured and saved. The recognition of motion patterns we developed is based on the calculation of motion points. The average of five successive coordinates results in one motion point. Sequences of these motion points define a motion pattern. Figure 1 shows the
108
A. Hashagen et al.
Fig. 1. Coordinate sequence (black) and Motion Point Sequence (white)
comparison of two sequences, the original coordinate sequence (black) from the laser scanner and the motion point sequence (white) after the average value calculation. The average value of e.g. motion point no 3 is calculated from coordinates no 1-5. Calculating the averages leads to a better performance during the classification and smooth edges. Hence, irregularities and small errors in the motion pattern are reduced to ensure a stable recognition. Characteristics Extraction. Based on the motion point sequence, a characteristics vector is calculated which specifies essential properties for the recognition. The vector describes the characteristics of the pattern. The selection and the number of the characteristics influence quality as well as performance of the recognition process. In order to ensure an adequate performance the complexity in calculation and thus the number of characteristics has to be as low as possible. However, a high recognition rate requires as much characteristics as possible. We considered quality, performance, and hardware configuration and an absolute minimum of four characteristics is required. In our experiences, about ten characteristics guarantee a stable recognition (see Evaluation). The chosen characteristics ensure an invariance of scaling, thus, the size of entered motion patterns is irrelevant. Furthermore, we only consider position relations instead of absolute positions, which lead to translation invariance. Classification. After the motion points are extracted and at least eight motion points were found the classification process starts. Based on the characteristics vector, the motion pattern can be classified. A successful classification result is a motion pattern class, whose characteristics are most similar to the characteristics of the entered motion pattern. We implemented three algorithms and compared recognition rate and classification complexity. Two algorithms employ the above mentioned Discriminant Analysis, Fisher’s Classification Functions (FCF) and Fisher’s Linear Discriminant Function (FLDF) [7] [12]. The third algorithm implements Average Classification Function (ACF), which is developed by the authors of this paper and calculates the average values of the feature vector’s elements. Fisher’s Classification Functions (FCF). This variant of Discriminant Analysis was firstly described by Fisher [12] and also calculates one discriminant function for each class (Equation 1). Finally, the class with the largest discriminant value is chosen [7].
Adaptive Motion Pattern Recognition: Implementing Playful Learning
109
This version of Discriminant Analysis was successfully applied in [11] and achieved a gesture recognition rate of 96%. The differences to our classification algorithm are marginal and just differ in the input method, since the further processing is similar. Fisher’s Linear Discriminant Function (FLDF). Fisher’s second algorithm, which is associated with motion pattern recognition, generates one discriminant function (Equation 1) to compare two motion patterns. Following the principles of FCF, FLDF calculates a discriminant value and in doing so determines the motion pattern with the highest similarity [12]. Furthermore, Elpelt described a method to distinguish several motion patterns with FLDF, which requires one function for each pair of motion patterns (entered and system-known) [5]. A motion pattern is recognized if each discriminant function returns a positive result. Average Classification Function (ACF). In order to gain a better performance we developed a function, which does not calculate matrices during training and classification unlike FCF and FLDF (Equation 2). The algorithm calculates average values and standard deviations of each characteristic. C = C + 1, if sx ≥ |ax – mx| , for all x in n .
(2)
In Equation 2 the result C is the classification value of the actual system-known motion pattern, sx represents the standard deviation of the actual characteristic, and ax the average value. Mx is the characteristic of the entered and hence to be classified motion pattern. If the difference of the actual characteristic to the system-known motion pattern's average value is equal or lower than the standard deviation, the classification value is incremented. This calculation is performed for each characteristic and results in one classification value per system-known motion pattern. The result of the classification and therefore the pattern recognition is the motion pattern with the largest value, which represents the highest similarity. Reaction Triggering. If the motion pattern was classified successfully, a predefined reaction is triggered. We implemented a set of instant visible reactions as examples. These reactions are strongly related to the field of application and the installation Der Schwarm. Depending on purpose, specific usage and integration to other software systems, all sorts of reactions can be implemented to our adaptive motion pattern recognition system.
3 System Implementation at Der Schwarm The adaptive motion pattern recognition has been integrated to the multi-agent-system Der Schwarm (translation: the swarm, the flock), which allows free body movement interaction with a virtual swarm. The first version of the installation was planned and implemented in 20041, continuous enhancements in technology and workshop 1
First idea, concept, and realization particularly by Merten Schüler and Andreas Wiegand.
110
A. Hashagen et al.
concept have led to the actual installation. Anyway, the employed algorithms computing the motion pattern recognition can easily be used in other scenarios. Der Schwarm is a technological learning environment consisting of hardware and software that detects, tracks, and interprets free body movements. The hardware of Der Schwarm consists of a laptop, a laser scanner, and a projector. The laser scanner is installed at table height and detects free body movements and returns a continuous stream of two-dimensional position information about the interacting person to the laptop. The software computes a reaction to this technical representation of body movements and finally the projector produces a visual feedback. The projector is installed above the interacting person and describes a projection area of at least 6.0m x 4.5m (ratio 4:3) depending on its installation height. The laser scanner is calibrated to track movements within the projection area. Figure 1 shows the hardware setup of Der Schwarm. The system’s reaction to free body movements is computed by special software and is visualized as a flock of light spots. Reynolds’ solution for steering autonomous characters is employed to simulate swarm behavior [10]. Besides the implemented swarm behavior, the light spot’s steering direction, velocity, as well as appearance is influenced directly by the interacting person’s movements. We developed six states and created parameter sets for the swarm representing the behavior patterns trust, curiosity, observance, escape, confusion, and aggression. The changing parameters are for instance level of herd instinct, basic velocity, and basic distance to interacting person. Both behavior and color of the flock changes with its state, so that a flock of light blue, slowly moving, and the interacting person closely following light spots represents curiosity, whereas a red, quickly moving, and the interacting person chasing flock represents aggression. Since we developed a flexible software structure, new image sets of light spots are easily interchangeable and shapes (and colors) such as bugs, fish, dragonflies, circles, and squares provide room for experimentation. The image set shown in Figure 2 has been used in several workshops with children conducted by our research group.
Fig. 2. Hardware setup of Der Schwarm [6]
Adaptive Motion Pattern Recognition: Implementing Playful Learning
111
Fig. 3. Image sets of light spots for every state
Altogether, the installation Der Schwarm provides a technical starting point to foster children’s motivation to learn about technology. The system’s reaction to free body movements provides room for interpretation and discussion, as our experience from several workshops with children at the age of 9-14 years shows. The interacting person and the participants interpret technical output information semantically and curiosity about underlying techniques arouses. A didactical concept is needed to provoke children’s curiosity and motivation, which are requirements for learning. We developed and tested a concept [6] based on the statement of Ackermann in which she argues that a combination of interaction phases with immersion, so-called diving in on the one hand and reflection, known as stepping out, is needed in order to gain deeper understanding [1] [2]. The integration of the adaptive motion pattern recognition system to Der Schwarm aims at introducing abstract models with a bodily approach. Even to train new motion patterns requires advanced abstract thinking for instance to mentally compare similarity with other trained patterns, several repetitions of the path as well as decision making about the exact appearance and path to be walked. The children immediately experience the consequences of their decisions and gain deeper understanding. In the end, the system might mistake an imprecise trained square for a circle. The finding can be technology’s incapability to interpret or the need for more different looking patterns and afterwards a resulting action might be an exact training of a square pattern or the development of a new pattern. Additionally, features like translation and scaling invariance offer more possibilities in pattern classification and therefore implement extra level of abstraction.
4 Evaluation Different levels of our work, namely system specification and Human Computer Interaction require two evaluations, a technical as well as a user evaluation. The technical evaluation aims at producing technical results about the motion pattern recognition software system, specifically about recognition rates and training complexity. For the user evaluation probands were asked to enter motion patterns to the system and comment there experiences, while we observed their actions. Both evaluations were performed with two datasets of patterns, which we trained in advance. The number dataset contains digits from 0 to 9, while the geometry set includes geometrical motion patterns such as square, triangle, and circle. The datasets differ in pattern complexity and similarity of patterns entered within a set.
112
A. Hashagen et al.
4.1 Technical Evaluation Both system-known datasets were tested with each of the three implemented algorithms. We entered every pattern of each dataset and calculated the average recognition rate per algorithm. Incorrect classification or no recognition is interpreted as an error. Additionally, we calculated the theoretical complexity in calculation during training and classification. Table 1 shows the recognition rates and complexity notations of the above mentioned algorithms. C represents the number of classes and F the constant number of features for each method. Table 1. Recognition Rates of Three Classification Functions Algorithm
Number Dataset
Form Dataset Average
FCF
94%
89%
91,5%
FLDF
50%
100%
75,0%
ACF
66%
77%
71,5%
Training
Complexity Classification
O( F 2 + C )
O(C )
O( F 2 + C 2 ) O( F + C )
O(C 2 ) O(C )
On average, the implementation of FCF achieves the best recognition rates, with very good consistency. The FLDF algorithm achieves better results than ACF, but its recognition rates are inconsistent: some motion patterns could never be classified correctly, whereas others were always recognized. The recognition rates of ACF are more consistent than those of FLDF and show about the same average. FLDF has the highest theoretical computational complexity in training as well as classification. FCF and ACF have a similar classification complexity. FCF shows a mid-range complexity in training but, as stated, high recognition rates. The best training complexity is shown by the ACF algorithm, which thus has potential for improvement to increase its recognition rates at good performance.
4.2 User Evaluation
The second part of the evaluation aims at gaining information about the software's usability and accessibility for the users interacting with the system. Nine probands were asked to enter pre-trained motion patterns. Since the main target group of the installation Der Schwarm is children, we focus on the four probands aged between 9 and 15 years. Every proband was asked to enter each motion pattern of the number and geometry datasets. We prepared printouts with images of the motion patterns to prevent misunderstandings regarding the notation of, for instance, digits such as 7, 4, and 1. During the interaction with the system, we observed the probands' actions and interviewed them afterwards. The focus of interest was the level of difficulty in entering the motion patterns and the motivation to succeed. Since we consider motivation a key factor in children's process of understanding, these observations are crucial for further developments of the motion pattern system and its integration into Der Schwarm.
The results of the evaluation with probands were predominantly positive. In general, the probands managed the task well and showed high interest in the system as well as in the underlying concepts. The majority of the motion patterns were entered without problems and successfully classified (based on the recognition rates of the FCF algorithm). Some probands struggled with their orientation while walking in the monitored area, so that a repetition was necessary to enter a motion pattern successfully. The lack of orientation mainly occurred when a complex pattern required a crossing of the walked path, for instance for the digits 9 and 4. Altogether, especially the important group of children and youths showed high motivation and tried to enter the different motion patterns several times. Software enhancements, for instance a visualization of the actor's walked path, will probably help to solve the problems observed.
5 Conclusions and Future Work
We developed an adaptive motion pattern recognition system that detects two-dimensional motion patterns walked by a person in a laser-scanner-monitored area. New patterns can be trained with few repetitions. The best classification rate is achieved by Fisher's Classification Functions, as the comparison of the three algorithms shows. The Average Classification Function we developed has high potential for improvement, since this algorithm shows the best computational complexity. We integrated the pattern recognition system into the learning environment Der Schwarm, a project of our research group. As the introductory scenario illustrates, the motion pattern recognition system aims at supporting the general objective of Der Schwarm: to pique children's curiosity about technology. The installation provides an environment to explore digital media playfully and to gain a deeper understanding of the underlying concepts through embodied interaction. Operating the motion pattern recognition system with Der Schwarm requires an intense examination of the technology and its principles and can support the process from immersive engagement to abstract understanding. During the user evaluation, children of the main target group aged 9-15 years interacted with Der Schwarm and the motion pattern recognition system. They showed high motivation to learn about the technology. Besides technical improvements in performance and classification rate, our next steps are enhancements in usability, such as a visualization of the walked path and rotation invariance. Furthermore, we plan the implementation of appropriate reactions to a successful or unsuccessful recognition, supporting the didactical concept of enabling children to playfully explore abstract concepts through concrete interaction.
References
1. Ackermann, E.K.: Perspective-Taking and Object Construction: Two Keys to Learning. In: Constructionism in Practice: Designing, Thinking, and Learning in a Digital World, pp. 25–35. Lawrence Erlbaum, Mahwah (1996)
2. Ackermann, E.K.: Constructing Knowledge and Transforming the World. In: Tokoro, M., Steels, L. (eds.) A Learning Zone of One's Own: Sharing Representations and Flowing Collaborative Learning Environments, pp. 15–37. IOS Press, Amsterdam (2004)
3. Backhaus, K., Erichson, D., Plinke, W., Weiber, R.: Multivariate Analysemethoden - Eine anwendungsorientierte Einführung. Springer, Berlin (2003)
4. Dourish, P.: Where the Action Is: The Foundations of Embodied Interaction. MIT Press, Cambridge (2001)
5. Elpelt, B., Hartung, H.: Multivariate Statistik. Oldenbourg Wissenschaftsverlag, Munich (2007)
6. Hashagen, A., Schelhowe, H.: "Der Schwarm" - Playful Learning with an Interactive Virtual Flock. In: Auer, M.E. (ed.) Interactive Computer Aided Learning (ICL) International Conference, Villach, Austria, 2008: The Future of Learning - Globalizing in Education. Kassel University Press, Kassel (2008)
7. Klecka, W.A.: Discriminant Analysis. Sage Publications, Thousand Oaks (1980)
8. Softkinetic S.A.: Product Datasheet iisuTM (2008), http://www.softkinetic.net/Files/media/Softkinetic-Product-Datasheet–iisu.pdf
9. Resnick, M.: Edutainment? No Thanks. I Prefer Playful Learning. In: Associazione Civita Report on Edutainment, pp. 1–4 (1987)
10. Reynolds, C.W.: Steering Behaviors for Autonomous Characters. In: Proceedings of the Computer Game Developers Conference 1999, pp. 763–782. Miller Freeman Game Group, San Francisco (1999)
11. Rubine, D.: The Automatic Recognition of Gestures. PhD thesis, Carnegie Mellon University (1991)
12. Srivastava, M.S.: Methods of Multivariate Statistics. John Wiley & Sons, New York (2002)
13. Wii, Nintendo, http://wii.com/
A Multi-functional Visualization System for Motion Captured Human Body Based on Virtual Reality Technology
Qichang He¹, Lifeng Zhang¹, Xiumin Fan¹,², and Yong Hu¹
¹ CIM Institute of Shanghai JiaoTong University, Shanghai 200030, China
² State Key Laboratory of Mechanical System and Vibration, Shanghai 200030, China
[email protected]
Abstract. This study develops a multi-functional visualization system (KINE) for motion-captured human bodies based on Virtual Reality (VR) technology, which reconstructs the motion of a skeleton rigid model in a 3D virtual environment. KINE is built on a general VR application development platform named VRFlier, which provides innovative human-machine interaction. This paper focuses on the methods of human rigid-body modeling and motion reconstruction. The human rigid modeling is based on the Rigid Body Assumption (RBA) theory and uses Virtual Markers (VM) to position the arthroses of linked body segments. The motion reconstruction is implemented through coordinate transformations of the Local Coordinate Systems (LCS) defined by the VMs. KINE is applied in the research project "Mechanical Virtual Human of China"; the results show that this software tool can help analyze the data collected by a motion capture system conveniently.
Keywords: Virtual Reality, Human Skeleton Rigid Model, Measuring Rigid Body (MRB), Virtual Marker (VM), Motion Visualization.
1 Introduction
Human motion capture systems are already widely used in human biomechanics research. In the key research project Mechanical Virtual Human of China, supported by the China National Natural Science Foundation, the motion capture system Optotrak Certus is used to measure human body motion; the results are transformed into the motion of a corresponding musculoskeletal model, on which human mechanics analyses can then be performed [1]. Dynamic visualization of the captured data in a 3D virtual environment lets researchers observe and analyze the acquired data much more conveniently. The results of kinematics, dynamics and finite element computations can be displayed in the same virtual simulation environment, so researchers can make full use of the captured data. Visual3D is a commercial visualization software system for motion capture data [2]. It provides a convenient tool for analyzing the data in a 3D visual environment, but it cannot provide a fully immersive and interactive 3D environment. The emerging VR technology can meet these challenges, being a new type of human-machine
interaction technology. Visualizing motion capture data with VR technology lets the user observe and analyze the data more conveniently and intuitively. The following chapters introduce the developed multi-functional visualization system for motion-captured human bodies (KINE), which not only visualizes motion data but also provides a set of interactive software tools for data analysis and manipulation.
2 System Overview
The KINE system is composed of the following main modules: data input interface, human skeleton modeling, motion reconstruction, and human-machine interaction, as shown in Fig. 1.
Fig. 1. KINE system structure
Human skeleton motion reconstruction is based on the human multi-rigid-body assumption (RBA) theory [3]. RBA is a widely used theory in human biomechanics research. It assumes that human limbs are rigid bodies linked by hinges, so the human body can be treated as a rigid body system without considering deformation under external forces. The system provides a human skeleton modeling tool, which can build up the human skeleton model automatically according to parameters defined by the user. The system provides a standard data input interface which supports the C3D data format. The motion capture system uses high-precision cameras to detect the motion trajectories of markers pasted on the skin of the human body and then transforms them into human motion data. In order to record the whole human motion, it is necessary to paste many markers on the measured human body, following the RBA theory. During the capture process, some markers might not be detected by the motion capture system because of occlusion or because they are out of the measuring range, making the captured data incomplete. The system therefore provides customizable interpolation methods to fill the data gaps; linear, spline and user-defined methods are available for selection.
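As a sketch, gap filling of a dropped marker trajectory might look as follows. This is a generic linear/spline illustration with assumed names, not the KINE implementation:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fill_gaps(t, values, method="linear"):
    """Fill NaN gaps in one coordinate channel of a marker trajectory.

    t: frame times; values: coordinate with NaN where the marker was lost.
    """
    values = np.asarray(values, dtype=float)
    good = ~np.isnan(values)
    if method == "linear":
        return np.interp(t, t[good], values[good])
    if method == "spline":
        return CubicSpline(t[good], values[good])(t)
    raise ValueError("unknown method")

# example: a marker lost for frames 3-4 and 8
t = np.arange(10.0)
x = np.array([0, 1, 2, np.nan, np.nan, 5, 6, 7, np.nan, 9], dtype=float)
x_filled = fill_gaps(t, x, method="spline")
```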
At the same time, the system is developed on top of a software platform named VRFlier, a general development platform for Virtual Reality (VR) applications. VRFlier is cross-platform, modular and extensible; it was developed by SJTU (Shanghai JiaoTong University) and has already successfully supported the development of many kinds and sizes of VR applications [4]. Based on VRFlier, the KINE system has a VR-based human-machine interaction module; the user can manipulate the viewpoint and the 3D model in an immersive stereo virtual environment.
3 Human Skeleton Modeling
According to the human multi-rigid-body assumption theory, a human skeleton model has been built up as shown in Fig. 2, in which every human body segment (e.g., thigh, upper arm and forearm) is treated as a rigid body. In Fig. 2 the human body is composed of 15 rigid bodies, whose names are listed in Table 1 of the Appendix. Each human limb carries one pasted Measuring Rigid Body (MRB) and two Virtual Markers (VM). Each MRB is built up from 3 or 4 real markers, which can be detected by the motion capture system as they move together with the human body. The posture of the body segment is determined by its MRB. A VM is not a real marker: it is only a marked point that is fixed relative to the MRB, and its position can be computed from the posture of the MRB. By using the MRB technique, the motion capture system can output the position of a VM even if it is out of the measuring range. The VM technique not only decreases the number of real markers to be pasted, but also improves the measuring accuracy. The rigid bodies are linked through arthroses; the midpoint between two virtual markers is taken as the center of an arthrosis. The shoulder and hip arthroses are exceptions: their centers are offset from the corresponding virtual markers, shown as blue and yellow points in Fig. 2. The lengths H and R are the corresponding offset distances, which can be adjusted for each measured human body. Also, the geometric parameters of a rigid body, such as length, center of gravity and inertia, can be modified according to the captured human data.
Fig. 2. Human skeleton rigid model
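Computing a VM position from the MRB posture is a single rigid transform per frame. A minimal sketch, assuming the capture system delivers the MRB pose as a rotation matrix plus translation (the types and names here are ours, not the KINE API):

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Pose of a Measuring Rigid Body (MRB) reconstructed from its 3-4 real
// markers: a rotation R and a translation t in the world frame.
struct MrbPose { Mat3 R; Vec3 t; };

// A virtual marker is fixed relative to its MRB, so its world position
// is a rigid transform of its constant MRB-local offset.
Vec3 virtualMarkerWorld(const MrbPose& pose, const Vec3& localOffset) {
    Vec3 p{};
    for (int i = 0; i < 3; ++i)
        p[i] = pose.R[i][0] * localOffset[0]
             + pose.R[i][1] * localOffset[1]
             + pose.R[i][2] * localOffset[2]
             + pose.t[i];
    return p;
}
```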
4 Multi-rigid Skeleton Model Motion Reconstruction
4.1 Coordinate System Definition
Model-related computations are facilitated by associating a distinct coordinate system with each model segment. The KINE system uses two kinds of coordinate systems to describe the movement of the skeleton rigid model: the World Coordinate System (WCS) and the Local Coordinate System (LCS). Each LCS moves with respect to the WCS as its model segment moves. The WCS is determined when the motion capture system is initialized, and the captured data are recorded in this coordinate system. The LCS is determined by the four virtual markers located at the proximal and distal ends of each model segment; proximal means the two virtual markers closer to the rigid body THO (Trunk; refer to Appendix Table 1), and distal means the pair farther from the THO. The definition of the LCS is shown in Fig. 3. The origin of the LCS is located at point C1, the Z axis points from C1 to C2, the Y axis is perpendicular to the plane spanned by the C1-C2 axis and the two distal virtual markers, and the X axis follows from the right-hand rule. C1 and C2 are the midpoints of the two virtual markers located at the proximal and distal ends, respectively.
Fig. 3. LCS definition of model segment
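The LCS construction above translates directly into code. In this sketch, p1/p2 are the proximal marker pair and p3/p4 the distal pair; taking the Y axis normal to the plane spanned by the C1-C2 axis and the distal pair is our reading of Fig. 3, so treat that choice as an assumption.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0]-b[0], a[1]-b[1], a[2]-b[2]}; }
Vec3 mid(const Vec3& a, const Vec3& b) { return {(a[0]+b[0])/2, (a[1]+b[1])/2, (a[2]+b[2])/2}; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0]};
}
Vec3 normalize(const Vec3& v) {          // assumes non-degenerate geometry
    double n = std::sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
    return {v[0]/n, v[1]/n, v[2]/n};
}

struct Lcs { Vec3 origin, x, y, z; };

// Build a segment LCS from the proximal pair (p1, p2) and the distal
// pair (p3, p4) of virtual markers, following the construction in the text.
Lcs buildLcs(const Vec3& p1, const Vec3& p2, const Vec3& p3, const Vec3& p4) {
    Lcs f;
    f.origin = mid(p1, p2);                   // C1: proximal midpoint
    Vec3 c2  = mid(p3, p4);                   // C2: distal midpoint
    f.z = normalize(sub(c2, f.origin));       // Z: from C1 toward C2
    f.y = normalize(cross(f.z, sub(p4, p3))); // Y: normal to the Z/distal plane
    f.x = cross(f.y, f.z);                    // X: right-hand rule
    return f;
}
```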
The upper arm and thigh segments are exceptions; the definitions of these two segments' LCS are shown in Fig. 4 and Fig. 5.
Fig. 4. LCS definition of Upper Arm
Fig. 5. LCS definition of thigh
4.2 Motion Reconstruction
The motion capture system uses video cameras or similar devices to track the instantaneous locations of the target markers as they move. It is usually not possible to place virtual markers at the proximal and distal segment endpoints, because these endpoints are defined to lie inside the body of the measured subject. Because the position of a VM is fixed relative to its MRB, we can compute the VM position as the MRB moves along with the body segment. Each rigid body's position is invariant in its LCS, so the motion of the rigid body is described by the motion of its LCS over time. The coordinate system of the 3D model in the virtual simulation environment may differ from the WCS, so to make coordinate transforms convenient in the 3D virtual environment it is necessary to calibrate the rigid body. The calibration process is as follows: move the proximal center of the rigid body to the origin of the WCS and rotate the distal end onto the Z axis. As shown in Fig. 6, the rigid body moves from P1 to P2. After the calibration, one records the rotation matrix M_cali that transforms the rigid body from P1 to P2.
Fig. 6. Rigid body calibration
Fig. 7. Rigid body matrix transformation
In Fig. 7, the skeleton segment's posture is determined by the virtual markers (p1, p2, p3, p4). The vector v_1 in formula (1) determines the rotation matrix M_rot1 about the z axis, and the vector v_2 in formula (2) determines the rotation matrix M_rot2 about the x and y axes. The point c_1 in formula (3), the midpoint of the proximal end, determines the translation matrix M_trans.

v_1 = p_2 − p_1    (1)

v_2 = (p_4 + p_3)/2 − (p_2 + p_1)/2    (2)

c_1 = (p_2 + p_1)/2    (3)
Fig. 8. Matrix transformation of rigid body in motion reconstruction
The transformation process of the 3D rigid model in the virtual environment is shown in Fig. 8. After calibration, the center of the proximal end is located at the origin of the LCS and the distal end points along the Z axis. The position of the original 3D model is shown by the red arrows; there, the horizontal arrow denotes vector v_1 and the perpendicular arrow denotes vector v_2. As shown in Fig. 8, the transformation process is as follows: first, rotate the rigid body segment about the z axis using rotation matrix M_rot1, reaching the green position; then rotate it about the x and y axes using rotation matrix M_rot2, reaching the yellow position; finally, translate it using transformation matrix M_trans to the blue position, which is the final posture of the rigid body segment in this measuring frame. The overall transform matrix of the rigid body is therefore

M = M_rot2 × M_rot1 × M_trans    (4)
In Fig. 8, the vectors v_1 and v_2 on the green arrows are the same as those on the red arrows; they are offset by a visual distance only to illustrate the matrix transformation process.
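For a concrete picture of the per-frame reconstruction step, here is a minimal C++ sketch of the matrix composition in formula (4). The Mat4 type and helper are ours, and the multiplication order is kept exactly as the paper states; whether that corresponds to a row- or column-vector convention depends on the rendering platform, so treat the ordering as an assumption to verify.

```cpp
#include <array>

using Mat4 = std::array<std::array<double, 4>, 4>;

// C = A * B for 4x4 homogeneous transforms (row-major storage).
Mat4 mul(const Mat4& A, const Mat4& B) {
    Mat4 C{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

// Per-frame segment transform of formula (4), order as stated in the text.
Mat4 segmentTransform(const Mat4& Mrot1, const Mat4& Mrot2, const Mat4& Mtrans) {
    return mul(mul(Mrot2, Mrot1), Mtrans);
}
```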
5 System Integration and Application Verification
The KINE system is developed on the VRFlier platform using Visual Studio 2005. The system's human-machine interaction interface is shown in Fig. 9. In this visualization example, the motion reconstruction data come from the research project "Mechanical Virtual Human of China". Through the KINE system, a user wearing 3D stereo glasses can interact with and observe the reconstruction process naturally and immersively. During motion visualization, the user can adjust the viewpoint and the replay speed in real time through the mouse, keyboard, and other interface devices. The system also provides tools such as parametric rigid model building, marker configuration, and kinematics
Fig. 9. Human-machine interaction interface of KINE system
Fig. 10. Kinematics computation visualization of upper arm
and dynamics computation and visualization. The kinematics and dynamics computation results can be visualized as graphs and can also be exported through the data output interface. Fig. 10 shows the kinematics computation result for the upper arm.
6 Conclusion
At present, the KINE system is being used in the research activities of the "Mechanical Virtual Human of China" project. Dozens of kinds of human motion data, including running, weight lifting, and bicycle riding, have already been visualized and tested with the system. The results show that the system satisfies researchers' requirements for analyzing captured body motion data, and that it is flexible enough to meet customized data formatting, processing, analysis, and reporting requirements. Built on VR technology, the system provides an interactive and immersive virtual environment for analyzing motion capture data, letting users observe and analyze skeleton movement more naturally and conveniently. In the future, more functions will be added to the system.
Acknowledgement
This work is supported by a Key Project of the NSFC of China and by the Program for New Century Excellent Talents in University (NCET). The authors thank Prof. Chengtao Wang very much for his support, and are also grateful to the editors and the anonymous reviewers for their helpful comments.
References
1. Cheng-tao, W.: Mechanical Virtual Human of China (in Chinese). Journal of Medical Biomechanics 31(3) (2006)
2. Visual3D manual, version 3, C-Motion, Inc.
3. Biryukova, E.V., et al.: Kinematics of human arm reconstructed from spatial tracking system recordings. Journal of Biomechanics 30(8), 985–995 (2000)
4. Xing-zhong, P., et al.: VRFlier: A Software Platform for Virtual Reality General Application Development (in Chinese). Journal of System Simulation 17(5) (2005)
5. Hong-sheng, W., et al.: Human gait measurement based on rigid body and virtual markers (in Chinese). Journal of Clinical Rehabilitative Tissue Engineering Research 12(30) (2008)
6. Cereatti, A., et al.: Reconstruction of skeletal movement using skin markers: comparative assessment of bone pose estimators. Journal of NeuroEngineering and Rehabilitation 3 (2006)
7. Guan, L.: Research on Motion Generation and Control Techniques for Virtual Human (in Chinese). Doctoral dissertation, Northwestern Polytechnical University (2003)
Appendix

Table 1. Names of human segments

Name   Human Segment
HED    Head
THO    Trunk
PLV    Pelvis
RTH    Right Thigh
RSH    Right Shank
RFT    Right Foot
LTH    Left Thigh
LSH    Left Shank
LFT    Left Foot
RUA    Right Upper Arm
RFA    Right Forearm
RHD    Right Hand
LUA    Left Upper Arm
LFA    Left Forearm
LHD    Left Hand
Augmented Practice Mirror: A Self-learning Support System of Physical Motion with Real-Time Comparison to Teacher's Model Itaru Kuramoto1, Yoshikazu Inagaki2, Yu Shibuya1, and Yoshihiro Tsujino1 1
Kyoto Institute of Technology, Matsugasaki, Sakyo-ku, Kyoto 606-8585 Japan 2 Kyoto University of Education, Fujinomori-cho, Fukakusa, Fushimi-ku, Kyoto 612-8522 Japan
[email protected]
Abstract. An effective way to learn physical motions such as dancing, playing sports, and making traditional crafts is to mimic a teacher's motion. In this style of learning, it is important for the learner to recognize the difference between the teacher's motion and his/her own. We propose the Augmented Practice Mirror (APM) learning support system. APM shows the mirror image of the learner's motion overlapped with the teacher's motion, together with the difference between them. These three images are shown simultaneously and in real time on a large screen acting as a virtual mirror. The experimental evaluations found that APM supports recognizing the difference between the participant's motion and the teacher's better than two common methods, and that a hybrid interface of voice recognition and gesture is better for operating APM than a voice-only or gesture-only interface. Keywords: mirror interface, gesture, physical motion tracking, learning support, human model, voice recognition, augmented reality.
1 Introduction
An effective and simple way to learn physical motions such as dancing, playing sports, and making traditional crafts is to mimic a teacher's motion. However, a learner often cannot be taught directly by a teacher of the motion he/she wants to learn: there may be only a few such teachers, or, in the case of some traditional craftworks, all of them may have already died. In such a case, a learner typically practices in front of a large mirror while watching a movie of the teacher's motion in order to compare his/her motion with the teacher's. On noticing that his/her motion differs from the teacher's, the learner tries to reduce the difference by changing his/her motion.
In this style of learning, it is important for the learner to recognize the difference between the teacher's motion and his/her own. Supporting this recognition requires the following three points:
1. Show the teacher's motion model overlapped with the learner's motion.
2. Show both the teacher's motion and the learner's simultaneously, in real time.
3. Make the difference between the teacher's motion and the learner's easy to understand.
Some self-learning support systems have already been proposed, but each fails to address some of these points. Interactive Video Mirror [1] is a system for learning the physical motion of playing sports; it shows the mirror image of the learner's motion in real time, but it cannot overlap the teacher's motion on the image. Paravie [2] shows a learner's mirror image and a teacher's motion simultaneously in real time, but they are displayed in juxtaposition, not overlapped. We propose the Augmented Practice Mirror (APM), a self-learning support system that covers all of these points. APM shows the mirror image of the learner's motion, the overlapped teacher's motion, and the difference between them, simultaneously on a large screen acting as a virtual mirror.
2 Implementation
2.1 Displaying Motions
Figure 1 shows the overview of APM. When using APM, a learner sees the following three images in real time:
• The learner's motion as a mirror image, with the learner's wire human model.
• The wire human model tracing the teacher's motion, overlapped with the learner's motion. This model has been recorded in advance.
• The difference of each joint position (e.g., knee, elbow) between the learner's motion and the teacher's. Each difference is shown by a dotted line whose color is determined by the degree of difference: the larger the difference, the warmer the color (see the color-mapping sketch after this list).
The learner can play the teacher's motion from any point in time and can also control its speed. Table 1 lists the functions APM provides; using them, the learner can inspect the motion and/or the difference in detail. In addition, APM has a function to adjust the size of the teacher's model. The body size of a teacher often differs from that of the learner, so the learner can enlarge or shrink the teacher's motion model to fit his/her own body. This adjustment is important when showing the difference between the teacher's motion and the learner's.
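A minimal sketch of one possible warm-color mapping for the dotted difference lines; the linear blue-to-red ramp and the 0.3 m saturation distance are illustrative choices of ours, not values from the APM implementation.

```cpp
#include <algorithm>

struct Rgb { float r, g, b; };

// Map a joint-position difference (in meters) to a line color: small
// errors render cool (blue), large errors warm (red). The 0.3 m
// saturation point is an illustrative assumption.
Rgb differenceColor(float diffMeters) {
    float t = std::clamp(diffMeters / 0.3f, 0.0f, 1.0f); // 0 = cool, 1 = warm
    return {t, 0.2f, 1.0f - t};                          // blue -> red ramp
}
```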
Fig. 1. Overview of APM

Table 1. The operations of APM

operation               detail
(re-)start              (re-)start teacher's motion at normal speed
stop                    pause teacher's motion
go to beginning point   go back to the beginning point of teacher's motion (scene #1)
rewind                  play teacher's motion in reverse at x2 speed
fast-forward            play teacher's motion at x2 speed
set the start point     set teacher's start point where the learner wants to start
change play speed       change the speed of teacher's motion
2.2 Interface
When using APM, a learner must stand in front of the virtual mirror screen, so he/she cannot manipulate conventional PC devices such as a mouse or keyboard to operate APM. Hand-held devices are also unsuitable, because the learner may want to practice motions with precise hand movement. We therefore consider two interfaces for operating APM: voice recognition and body gesture. In the voice recognition interface, a voice command is defined for each function in Table 1, and the learner wears a wireless hands-free microphone. With this interface, the learner can invoke any function from any pose; however, some functions such as "rewind" and "change play speed" may require many commands and are time-consuming.
Fig. 2. Gesture interface
By contrast, the learner can operate simply by gesture, as in HyperMirror [3]. Figure 2 shows the gesture interface of APM: the learner touches or manipulates control objects on the virtual mirror screen with his/her hands, whose positions are captured by the same motion tracking system APM uses for tracking the whole body. The interface has a slider for setting the start point of the teacher's motion, so the learner can jump to the desired scene directly and intuitively. However, the interface forces the learner out of the correct practice position whenever he/she invokes a command. To balance the trade-off between operation time and unnecessary body movement, we propose a hybrid interface of voice recognition and gesture. Among the functions in Table 1, "(re-)start" and "stop" by gesture are particularly obstructive to learning; in the hybrid interface, these two commands are therefore executed by voice recognition, and the other commands by gesture (a small routing sketch of this policy follows below). The interface may nevertheless be confusing, because it combines two different interfaces and the learner may not immediately choose the correct action.
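The hybrid policy itself is small enough to capture in a routing function. The following sketch encodes only the split described above; the enum and function names are illustrative, not APM's actual code.

```cpp
enum class Command { Play, Stop, Rewind, FastForward, SetStartPoint, ChangeSpeed };
enum class Modality { Voice, Gesture };

// Hybrid-interface routing: the two commands that would force the learner
// out of position when gestured go to voice recognition; everything else
// goes to the on-screen gesture controls.
Modality modalityFor(Command c) {
    switch (c) {
        case Command::Play:
        case Command::Stop: return Modality::Voice;
        default:            return Modality::Gesture;
    }
}
```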
2.3 Hardware
Figure 3 shows the system overview of APM. APM is implemented on two PCs running Windows XP Professional. One PC tracks the learner's motion with the Radish motion tracking system. The other PC captures the learner's image with a USB camera and composes the mirror image. DirectX is used to capture images from the camera and to render the wire human model in a virtual 3D space, and Microsoft Speech SDK 5.1 is used for voice recognition.
Fig. 3. System overview
3 Evaluation
3.1 Usability
To evaluate the usability and efficiency of APM, we conducted a laboratory-based experiment comparing three types of representation:
• Md: the proposed method. APM showed the difference, the wire human model of the teacher's motion, that of the learner's motion, and the learner's image, all overlapped.
• Mo: APM showed the wire human model of the teacher's motion, that of the learner's motion, and the learner's image, overlapped.
• Mp: the wire human model of the teacher's motion and the learner's image were shown in juxtaposition.
We asked 9 participants to learn three types of motion: throwing a dart, shot-putting, and robotic dance. In the experiment, one of the authors operated APM, to exclude the influence of the usability of its interface. The learning time was 15 minutes for each condition. After each session, participants answered a questionnaire with the following questions:
• Q1. I could recognize the motion of the teacher easily. (-2: strongly disagree, 2: strongly agree)
• Q2. I could recognize the motion of myself easily.
• Q3. I could recognize the difference between the motions of the teacher and me.
Fig. 4. The result of the usability experiment
Figure 4 shows the result of the questionnaire. The average score of Mp is significantly worse than those of Md and Mo on question 1, indicating that overlapping the teacher's motion with the participant's is better for recognizing the motions than juxtaposition. This suggests that participants must attend to both motions simultaneously to recognize the difference, which is harder with the parallel representation (Mp) than with the overlapped representations (Mo and Md). In addition, the score of Md is significantly better than those of Mo and Mp on question 3. The proposed method therefore appears acceptable and useful.
3.2 Interface
To compare the performance of APM's interfaces, we conducted an additional laboratory-based experiment comparing the three interfaces described in Section 2.2: V (voice recognition), G (gesture), and H (hybrid).

Table 2. The operations measured in the experiment
operation          detail
play               (re-)start teacher's motion
pause              stop teacher's motion while mimicking
rewind short       rewind a short period (or set the start point slightly before the current position in H and G)
rewind long        rewind a long period (or set the start point largely before the current position in H and G)
two levels slower  change the speed to x1/2 (in V, say "slow" twice)
one level faster   change the speed from x1/2 to x3/4 (in V, say "fast" once)
We asked 18 participants to learn a motion of robotic dance, the same one as in the experiment of Section 3.1. Each participant performed one task per interface. We asked them to execute operations displayed at the upper part of the virtual mirror screen (see Figure 5) while they were mimicking the dance; Table 2 shows the operations we measured. We measured the interval from the display of an instruction to the correct execution of the operation as the performance time of that operation. In addition, after each task the participants answered a questionnaire with the following questions:
• Q1. I could understand the interface easily. (-2: strongly disagree, 2: strongly agree)
• Q2. I felt a load when learning with the interface.
• Q3. I could control APM appropriately.
Fig. 5. Operation displaying field for experiment
Figure 6 shows the performance time per operation. G's performance time for "play" and "pause" is shorter than that of V and H. In addition, V's time for "rewind long" and "two levels slower" is significantly longer than that of G and H. This indicates that the gesture interface is more effective than the voice recognition interface in terms of performance time, similar to the argument Hämäläinen made in the evaluation of Interactive Video Mirrors [1]. In the interview after the experiment, some participants said they were confused by the "play" and "stop" operations with H, because they could not quickly judge which modality (voice or gesture) to use. This suggests that the difference between H's and G's operation times will shrink as users become familiar with the hybrid interface. Figure 7 shows the result of the questionnaire. On questions 1 and 3, the average scores of G and H are higher than V's, because the voice recognition interface requires participants to remember all of the commands. By contrast, the average score of V and H is higher than G's on question 2, meaning that operating by gesture is an actual obstruction while learning.
Fig. 6. The result of operation time (in milliseconds)
Fig. 7. The result of the questionnaire
In addition, H's average scores on all questions are high; we therefore consider the hybrid interface more effective than the other two interfaces in both performance time and subjective evaluation.
4 Conclusion
APM is a virtual mirror system for self-learning of physical motion. Its mirror-like screen shows a learner's motion, a teacher's motion as a wire human model, and the differences between their motions, simultaneously and in real time. The experimental evaluations found that APM supports recognizing the difference between the participant's motion and the teacher's, and that the hybrid interface of voice recognition and gesture is better for operating APM than either the voice recognition interface or the gesture interface alone.
Acknowledgement This work was partially supported by KAKENHI (Japan Society for the Promotion of Science, Grant-in-Aid for Scientific Research (B), 20300037, 2008).
References
1. Hämäläinen, P.: Interactive Video Mirrors for Sports Training. In: The 3rd Nordic Conference on Human-Computer Interaction, pp. 199–202 (2004)
2. Usui, J., Hatayama, H., Sato, T., Furuoka, Y., Okude, N.: Paravie: dance entertainment system for everyone to express oneself with movement. In: The 2006 ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, p. 30 (2006)
3. Morikawa, O., Maesako, T.: HyperMirror: toward pleasant-to-use video mediated communication system. In: 1998 ACM Conference on Computer Supported Cooperative Work, pp. 149–158 (1998)
Video-Based Human Motion Estimation System Mariofanna Milanova and Leonardo Bocchi Computer Science Department, University of Arkansas at Little Rock 2801 S. University Ave., Little Rock, Arkansas 72204, USA Dept. of Electronics and Telecommunications, University of Florence, V. S. Marta 3, Florence, Italy
[email protected],
[email protected]
Abstract. This paper presents a system designed to estimate a body silhouette representation from sequences of images. The accuracy of human motion estimation can be improved by increasing the complexity of any of three fundamental building blocks: the measured data, the prior model, or the optimization method. The vast majority of the existing literature on human motion estimation has focused on just one of these building blocks: improving the methods for optimization, also called inference. In contrast, our approach explores the hypothesis that the other two building blocks are critical components; with highly accurate measured data and shape-of-body-motion priors, the objective function becomes more precise and less noisy, making the solution easier. Our main goal is to develop a new module for extracting accurate measured data from video imagery.
1 Introduction
Much work has been done on automated visual surveillance systems based on computer vision technology and designed for security purposes, facilitating the detection and tracking of human motion and intrusion. A survey by Wang reviewed techniques for human motion analysis dealing with detection, tracking, and recognition [1]; it provides a comprehensive overview of computer-vision-based human motion analysis, emphasizing three major issues in a general system, namely human detection, tracking, and activity understanding. The main algorithms for locating and tracking people can currently be divided into four categories: region-based tracking, active-contour-based tracking, feature-based tracking, and model-based tracking [2]. Techniques for tracking human motion from video still present open problems, such as self-occlusion of the legs during motion, occlusion by other objects, and varying illumination and background. The task becomes even more complicated when tracking human activities. A requirement for automatic surveillance of human activity is reliable tracking of human body parts. A visual surveillance system could identify human behavior that is considered abnormal simply by representing or interpreting human body movements. For example, in the
case of detecting a distinct and unusual body shape, body parts with abnormal proportions, or people starting a fight, the visual surveillance system should interpret the action. Keeping a detailed description of the tracked human figures, such as a segmentation into meaningful body parts, allows a more comprehensive analysis of the tracked human activities. The paper is organized as follows: Section 2 describes the overall architecture of the system; Sections 2.1-2.5 present the methods implemented to estimate the body silhouette representation; experimental results are presented in Section 3; and conclusions are given in Section 4.
2 Materials and Methods
The overall architecture of the system is shown in Fig. 1. In the first step, each frame of the video sequence is processed to extract the Region of Interest (ROI) from the background. The extracted ROI is then filtered to evaluate a vector of feature maps representing the spatial distribution of features in the frame. The detected blobs are refined to produce a human silhouette. Next, at the body silhouette representation step, we implement body-part detection as a set of local descriptors; for the shape-based analysis, we define a global descriptor. The local and global descriptors are combined to implement pose/shape estimation. In the last stage, an elastic network composed of a set of specialized feature detectors finds the optimal match between the image and the feature maps. Our generative model predicts silhouettes in each video camera view given the pose/shape parameters of the model.
Fig. 1. Block diagram of the system (input image → ROI extractor → filter bank → feature maps; a platoon of units performs the matching)
Fig. 2. Sample feature maps obtained from the ROI shown in Fig. 1, with scale k=1 and four different orientations (horizontal, left diagonal, vertical, right diagonal, respectively)
2.1 Region of Interest (ROI) Extraction
For ROI extraction we used the optical flow algorithm presented by Little and Boyd [3]. We compute the optical flow of the motion sequence to obtain n images (frames) of (u, v) data, where u is the x-direction flow and v is the y-direction flow. The dense optical flow is generated by minimizing the sum of absolute differences between image patches, yielding a set of moving points. For each frame of the flow we compute a set of scalars characterizing the shape of the flow in that frame, using all the points in the flow and analyzing their spatial distribution. The shape of motion is the distribution of flow, characterized by several sets of measures; for example, we compute the x and y coordinates of the centroid of the moving region and the aspect ratio of the moving region (a small sketch of these scalars follows). The system is described in Milanova [4].
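A minimal sketch of two such shape-of-motion scalars, assuming the flow stage yields a list of moving points; reading "aspect ratio" as the bounding-box ratio is our assumption, and the original system computes several additional measures.

```cpp
#include <algorithm>
#include <vector>

struct FlowPoint { float x, y, u, v; }; // position and optical-flow vector

struct ShapeOfMotion { float cx, cy, aspect; };

// Centroid and bounding-box aspect ratio of the moving region in one
// frame, in the spirit of Little and Boyd [3]. Assumes pts is non-empty.
ShapeOfMotion describeFlow(const std::vector<FlowPoint>& pts) {
    float sx = 0, sy = 0;
    float minX = pts[0].x, maxX = pts[0].x, minY = pts[0].y, maxY = pts[0].y;
    for (const auto& p : pts) {
        sx += p.x; sy += p.y;
        minX = std::min(minX, p.x); maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y); maxY = std::max(maxY, p.y);
    }
    const float n = static_cast<float>(pts.size());
    return {sx / n, sy / n, (maxX - minX) / (maxY - minY)};
}
```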
2.2 Feature Extraction
Each ROI is processed to extract a feature map describing the spatial position and features of the subject. A proper selection of the set of feature maps is of primary importance for good system performance, and the features must represent the local properties of the image (presence and direction of edges, corners, and similar characteristics). Moreover, an optimal feature set must describe these local properties compactly, so that the feature vector has a reasonable size. This suggests selecting features with both limited spatial support, to capture local information, and limited frequency response, to reduce noise. As several researchers have shown, these properties are best exploited by Gabor functions. The proposed feature map is based on a multiscale Gabor representation built on the Morlet wavelet, which is defined by rotation and scaling of a mother wavelet function. For a given scale k and orientation θ, it is expressed as:

ψ_{k,θ}(x) = β_{k,θ} exp(−v²_{k,θ} x² / (2σ²)) exp(i v_k · x)    (1)
where x = (x, y) is the coordinate vector in the image plane, β_{k,θ} is a normalization constant, and the parameter v has direction θ and modulus |v| = 2^k. Starting from (1), the (discrete) Morlet transform is defined as:
M_{k,x₀} = Σ_x ψ_{k,θ}(x − x₀) · I(x)    (2)
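The following C++ sketch evaluates the modulus of one Morlet coefficient over a finite window, i.e., formulas (1) and (2) combined; folding the normalization β into a unit constant and truncating the kernel at a fixed radius are simplifications of ours.

```cpp
#include <cmath>
#include <complex>
#include <vector>

// One Morlet coefficient psi_{k,theta}(x, y) of formula (1), with
// |v| = 2^k; the normalization beta is taken as 1 for brevity.
std::complex<double> morlet(double x, double y, int k, double theta, double sigma) {
    const double v  = std::pow(2.0, k);
    const double vx = v * std::cos(theta), vy = v * std::sin(theta);
    const double r2 = x * x + y * y;
    const double envelope = std::exp(-v * v * r2 / (2.0 * sigma * sigma));
    return envelope * std::exp(std::complex<double>(0.0, vx * x + vy * y));
}

// Modulus of the discrete Morlet transform of formula (2) at one image
// point (x0, y0), truncated to a square window of the given radius.
double featureAt(const std::vector<std::vector<double>>& I,
                 int x0, int y0, int k, double theta,
                 double sigma, int radius) {
    std::complex<double> acc(0.0, 0.0);
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            const int xi = x0 + dx, yi = y0 + dy;
            if (yi < 0 || yi >= (int)I.size() || xi < 0 || xi >= (int)I[0].size())
                continue;
            acc += morlet(dx, dy, k, theta, sigma) * I[yi][xi];
        }
    return std::abs(acc); // phase discarded, as in the text
}
```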
For each value of scale k and orientation θ, the Morlet transform represents a map of the features with the corresponding scale and orientation present in the image. In this work we used a set of three different scales and eight equally spaced directions. The resulting transforms are complex-valued images; as the phase information is mainly related to the spatial position of the objects in the image, we converted each feature map to a real-valued image by taking its modulus. We therefore obtained a total of 24 feature maps from each ROI, and each pixel in the image is associated with a feature vector of 24 components describing the local properties of the image in a neighborhood of the pixel.
2.3 Representation of the Silhouette
The proposed representation is based on a self-organizing system designed to learn both the characteristic features of the image and their spatial relationships. To this end, we build an architecture composed of a set of specialized feature detectors coupled together by elastic forces to form a network. Each feature detector acts as an independent unit, free to move on the target image to detect a matching feature. A unit is identified by a target vector of 24 components, which is compared to the features present in the image to find the best match. The elastic coupling between units, however, forces them to act in a coordinated way and to find the optimal matching between detectors and image, taking into account both the match between feature detectors and actual features and the spatial relationships among features, in the image and in the network. When a new image is fed to the network, a relaxation process takes place, allowing units to move on the image and reach the optimal minimum. During the training phase, at the end of relaxation, each target vector is updated to match more closely the feature vector at the final location of its unit. In the following we outline the basic relations describing the network dynamics; for a more detailed description, see [5].
2.4 Relaxation
The approach used to find the optimal configuration of the network, i.e., the best position of the feature detectors on the input image, is based on an energy minimization strategy. The network is associated with an energy composed of two components: the first is associated with the elastic stretching of the connections between units, while the second relates to the discrepancy between the target vectors and the features present in the image.
The elastic energy is evaluated by assuming an ideal spring connected between each pair of neighboring units in the grid. Assuming each unit (i, j) is connected to its 4-neighborhood S_ij, the resulting elastic energy E_i can be expressed as:
E_i = Σ_{(i,j)} Σ_{(m,n)∈S_ij} c ‖P_ij − P_mn‖²    (3)
where P_ij is the position on the image where unit (i, j) is located and c represents the elastic constant of the springs. The second energy term is selected to reach its minimum when the best matching occurs between the target vectors and the feature maps. Among the several possible rules, we selected the most straightforward one, based on the scalar product between the two vectors:
E_e = − Σ_{(i,j)} w_ij · I(P_ij)    (4)
where (i,j) identifies the network units, wij is the target vector for the unit, and I(Pij) is the vector of the feature maps at the position Pij. Minimization of the total energy Et = Ei + Ee is achieved by an iterative procedure. In the first phase, an attention point is randomly selected on the image. Selection procedure is performed using a roulette-wheel procedure, which gives higher probability of selection to points having a larger feature vector. Once the attention point has been selected, all units in the neighborhood of the attention point are tested by moving, in turn, each of them in the attention point, and evaluation the network energy before and after the move. The unit which is associated to the largest energy loss is then selected as winner, and it is moved toward the attention point. Experimental results indicate that the convergence speed can be improved by moving altogether all units in the neighborhood of the winning units. The process is then repeated for a given number of iterations, slowly decreasing both the speed of the movement toward the attention point and the radius of the neighborhood. 2.5 Adaptation Once the relaxation phase has been completed the network can be trained to improve the matching between the target vector and the actual features present in the image. To this end, the target vector of each unit is changed in order to reduce the difference between the target vector and the features present in the image in the final position of the unit. The adaptation is performed following a rule similar to the one used in the self-organized neural maps proposed by Kohonen [6]:
Δw_ij = ε [ I(P_ij) − w_ij ]    (5)
Fig. 3. Results evaluation: a label marks a significant point of the image (walking man) in the training set. At first (1) the match point closest to the label is identified. This allows marking (2) a unit in the grid. A second image is presented to the network, and the unit identifies a new match point (3). The distance between the match point and a label (4) placed on the test image (walking woman) gives an estimate of the results.
where ε is the learning rate. As with Kohonen maps, the best convergence results are obtained using a slowly decreasing value of ε.
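The pieces of formulas (3)-(5) translate directly into code. This sketch, with illustrative types, evaluates the two energy contributions for single units and applies the adaptation step; it is a reading of the equations, not the authors' implementation [5].

```cpp
#include <vector>

// One grid unit: its position on the image and its 24-component target
// vector w_ij (sizes and field names are illustrative, not from [5]).
struct Unit { double px, py; std::vector<double> w; };

// Elastic contribution of formula (3) for one pair of 4-neighbour units,
// with c the spring constant.
double pairElasticEnergy(const Unit& a, const Unit& b, double c) {
    const double dx = a.px - b.px, dy = a.py - b.py;
    return c * (dx * dx + dy * dy);
}

// Matching contribution of formula (4) for one unit: minus the scalar
// product of its target vector with the feature vector at its position.
double matchEnergy(const Unit& u, const std::vector<double>& feat) {
    double dot = 0.0;
    for (size_t k = 0; k < u.w.size(); ++k) dot += u.w[k] * feat[k];
    return -dot;
}

// Adaptation step of formula (5): pull the target vector toward the
// feature vector found at the unit's final position, with rate eps.
void adapt(Unit& u, const std::vector<double>& feat, double eps) {
    for (size_t k = 0; k < u.w.size(); ++k)
        u.w[k] += eps * (feat[k] - u.w[k]);
}
```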
3 Experimental Results
The system has been developed and tested on a set of video sequences obtained from surveillance cameras. The sequences show, against a simple background, different individuals walking along a straight line; each sequence lasts approximately 5-10 s. The set of sequences was split into two distinct parts to produce a training set and a test set. A first set of frames was extracted from the training set and used to design the system and train the units. After the training phase was completed, the system was tested on video frames extracted from the test set. A semi-quantitative evaluation can be obtained by labeling some of the units in the grid according to their location on the images in the training set, as shown in Fig. 3. The procedure transfers labels from an image to network units and from one image to another, as follows: each test image is applied to the network input and the mesh completes the relaxation process; at the end of relaxation, the unit whose match point lies closest to the label placed on the image is identified and assumed to represent the tag point in the first image. When a test image is presented to the network and the relaxation process has completed, the unit is located at a new point, which is assumed to represent the label in the test image.
Fig. 4. Results of the matching step. The grid indicates the position of the units, at the equilibrium, on two sample images from the test sequence. Three units have been marked in white on both images.
An evaluation of the performance of the network can therefore be achieved by measuring the distance between the transferred label and the manually applied label. A perfect matching between the images would produce a null relative displacement, with the same unit located on the same marker in all images. Experimental results indicate that the average absolute displacement between the marks is about 0.9 units. A visual evaluation of the results is shown in Fig. 4 for two different frames of the sequence. It can be noted that units concentrate on the boundaries of the moving figure, where the most information is present. As an example, Fig. 4 shows three units marked with white labels, together with their position in the grid. As can be seen in the figures, those units are in consistent positions on both frames, although their spatial positions differ considerably between the two frames. More specifically, unit (7,4) is positioned on the neck of the figure, unit (8,8) on the belt, and unit (7,12) on the front foot. In the latter case, it can be noted how the grid identifies different points of the feet, due to their different orientation.
4 Conclusions
The proposed system is planned to be part of an automated self-training system for video surveillance. The positions of labeled units allow extracting information about the position and dynamics of the observed figure; the parameters describing the relative distances of the units are therefore associated with the position and physical dimensions of the different body parts.
Our aim is to interpret the deformation parameters of the grid to detect anomalies in the shape or position of the figure, as well as in the dynamics of the movement. Any discrepancy between the learned behavior of the grid and the actual behavior can be used to trigger an alarm stating that something anomalous is occurring.
Acknowledgment This paper is supported by NSF grant 0619069 Development of Interdisciplinary Arkansas Emulation Laboratory and by funding from the U.S. Defense Threat Reduction Agency (DTRA-BA08MSB008).
References
1. Wang, L., Hu, W.: Recent developments in human motion analysis. Pattern Recognition 36(3), 585–601 (2003)
2. Anderson, P., Corlin, R.: Tracking of Interacting People and Their Body Parts for Outdoor Surveillance. Master Thesis (2005)
3. Little, J., Boyd, J.: Recognizing People by their Gait: The Shape of Motion. Journal of Computer Vision Research 1(2), 2–32 (1998)
4. Milanova, M.: Object Recognition in Image Sequences with Cellular Neural Networks. Neurocomputing 31(1-4), 125–141 (2000)
5. Bocchi, L.: Evolution of an abstract image representation by a population of feature detectors. In: Cagnoni, S., Lutton, E., Olague, G. (eds.) Genetic and Evolutionary Computation for Image Processing, pp. 157–176. Hindawi Publishing Corporation (2008)
6. Kohonen, T.: Self-Organization and Associative Memory, 3rd edn. Springer, New York (1989)
Virtual Human Hand: Grasping and Simulation Esteban Peña-Pitarch1, Jingzhou (James) Yang2, and Karim Abdel-Malek3 1
Escola Politècnica Superior d’Enginyeria de Manresa (EPSEM) UPC, Av. Bases de Manresa 61-73, 08240 Manresa, Spain 2 Human-Centric Design Research Lab, Texas Tech University, USA 3 Center for Computer Aided Design, The University of Iowa, USA
[email protected]
Abstract. The human hand is the most complete tool, able to adapt to different surfaces and shapes and to touch and grasp; it is a direct connection between the exterior world and the brain. The German philosopher I. Kant described the hand as an extension of the brain. In this paper we present and develop a new algorithm for grasping any object in a virtual environment (VE). The objective is to present a novel theory for grasping any object in the VE with a virtual human (VH). The novel concepts of this application are autonomous grasping, the implementation of several types of grasp, and a new grasping algorithm. Keywords: Autonomous grasp, virtual environment, virtual human hand.
1 Introduction
Grasping has been an active research area for many years, and a great deal of effort has been spent on the automatic determination of grasping actions. The research activity has been oriented in different directions, ranging from robotics applications to the emulation of human grasp actions using a VH, but the basic concepts are quite similar, and the techniques and methods used in robotics are also applied to virtual human grasping and vice versa. Following the research line related to VHs, this paper presents a novel approach to generating grasps of different objects in a semi-intelligent VE.
1.1 Related Work
In research related to VHs [1], [2], [3], the Dhaiba hand group works on motion capture; after implementation in a virtual environment, they study different types of grasps for applications such as manipulating a cell phone. A model of a virtual hand and its implementation in MAND3D was presented in [4]. A complete analysis and classification of the human hand, oriented to the design of hands for manufacturing tasks, was presented in [5]. [6] presented a hand design without grasp simulation, and hand grasping based on a database was presented in [7]. Regarding the simulation of human fingers, a deformable model based on Hertzian and similar theories was presented in [8]. The coordination of finger motion (except the thumb) in manipulative and gesture acts was studied in [9]. For hand and muscle simulation, a sample-surfaces method was presented in [10]. The simulation of
several robot hands was done in the environment called GraspIt! [11], [12]. Based on visual recognition, [13] reconstructed hand posture using a hand model with 20 degrees of freedom (DOF). Controlling and performing activities with hands that have a large number of DOF is a very complex task; to reduce the complexity, [14] coupled the movements of some joints, reducing the number of DOF. Based on three functions of grasping surfaces, namely the object-supporting, pressing, and wrapping functions, [15] presented a study to assist robot designers in producing innovative robotic hand systems. The Sharmes simulator [16] is a highly realistic model of the human hand and forearm; it contains 38 muscles and 24 DOF representing the joints of the system, of which two DOF are for the wrist, two for the arm, and 20 for the hand.
1.2 Paper Outline
This paper is organized as follows. Section 2 presents the grasping flowchart used by our virtual human. The user chooses one object from several in the VE; each object has several inherent tasks (e.g., a mug can be moved or used for drinking). The user chooses one of these tasks, and a semi-intelligent algorithm, based on the user's inputs and object attributes such as surface shape and weight, decides whether to grasp with power or precision. Once that decision is made, the number of hands and fingers is calculated from the surface shape of the object and the type of grasp. We then describe a novel algorithm for grasping any object with power or precision: for a power grasp, the angles of each joint are calculated geometrically; for a precision grasp, the fingertip position of each grasping finger is calculated geometrically, and the joint angles are obtained by applying inverse kinematics, treating each finger as an independent ray. Section 3 presents two examples, without loss of generality, and applies the flowchart to them. Section 4 presents the conclusion.
2 Grasping Approach
Figure 1 presents a flowchart of the actions needed to perform a grasp in a given environment. First, the user chooses one object out of those entered or included as part of the scene. Each object carries information about attributes such as surface shape, weight, temperature, fragility, and the task in which it is going to be used, among others; these attributes help the system decide on the type of grasp: power grasp, precision handling, pinch, pull, or push. Once the user chooses a task to be done with the selected object, the system helps make a decision about how to grasp the object for that purpose, using the known attributes of the object as input. The next step in the sequence is to check whether the object is reachable; checking the position of the object with respect to the wrist is possible by implementing the row-rank deficiency method [20], which was applied to each finger, sweeping the workspace volume and checking for the object. Selecting the number of fingers, and determining whether more than one hand is necessary, is a function of the hand size and the surface shape of the object to be grasped. The following subsections explain how these decisions are made.
Fig. 1. Flowchart of the actions before performing a grasp
2.1 Make a Decision
The method proposed to decide how a VH must grasp an object in the VE is based on support vector machine (SVM) theory applied in a single perceptron. The inputs of the simple associator are the object attributes (task, weight, surface shape, temperature, object stability, and fragility), and the output pattern is the type of grasp (power grasp, precision handling, pinch, pull, or push). The VH's decision is based on a linearly separable system. When grasping any object, the input vector for a single perceptron can be

p = (p_1, p_2, …, p_n)

where task, temperature, weight, … are the input attributes; we can add surface shape, fragility, and object stability.
Fig. 2. Single perceptron
Figure 2 shows a single perceptron [17], [18], [19], where the input vector is p = (p_1, p_2, …, p_n) as described above, b is the bias, the weights for the inputs are given by the vector w = (w_1, w_2, …, w_n), and the weight for the bias is w_0. In this work, these networks are used to solve a classification problem in which the inputs are binary codings [1, -1] of attributes like task, weight, surface shape, temperature, object stability, and fragility. The output of the perceptron is given by

a = satlins( Σ_i w_i p_i + w_0 b )    (1)

where the activation function satlins is the symmetric saturating linear function, chosen because the system is linearly separable. The output can be 1 or -1; net inputs beyond these values saturate to 1 or -1. In the example of grasping a mug, if the output is 1 the VH grasps the handle, and if the output is -1 the VH grasps the side of the mug. Writing n for the net input, the input/output relation is

a = -1 if n ≤ -1,    a = n if -1 < n < 1,    a = 1 if n ≥ 1
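A compact sketch of the decision rule of Equation 1; the coding of attributes and weights follows the mug example of Section 3.1, while the function names are ours.

```cpp
#include <vector>

// Symmetric saturating linear activation (satlins), as in [17]-[19].
double satlins(double n) {
    if (n < -1.0) return -1.0;
    if (n >  1.0) return  1.0;
    return n;
}

// Single-perceptron grasp decision (Equation 1): attributes coded as
// +1 / -1, or 0 when an attribute carries no information. The bias
// argument stands for the combined w_0 * b term.
double graspDecision(const std::vector<double>& w,
                     const std::vector<double>& p, double bias) {
    double n = bias;
    for (size_t i = 0; i < w.size(); ++i) n += w[i] * p[i];
    return satlins(n);
}

// Mug example of Section 3.1: w = (1,0,0,0,0,0), p = (drink, hot, light,
// 0, 0, 0) = (1,1,1,0,0,0), bias 0 -> output 1 -> grasp the handle.
// double a = graspDecision({1,0,0,0,0,0}, {1,1,1,0,0,0}, 0.0);
```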
With a single perceptron, the grasping output quite frequently does not agree with all the input parameters. For instance, to grasp a mug the input parameters can be weight, temperature, task, etc.; if the VH needs to move a hot mug and the output is "grasp the side of the mug" (i.e., grasp with power), this is incompatible with the task constraint, since a hot mug cannot be grasped with a power grasp. To solve this type of problem, new perceptrons were added, building a network with more information about the grasp.
2.2 Choosing the Number of Fingers and Hands
Whether only one hand or both hands are needed to perform the grasp is a function of the shape and size of the object. The surface-shape attribute provides information about the size and weight of the object, which determines whether one hand or both hands are used, or whether the action is not performed. Let D be the side dimension to grasp and HL the hand length; if D > 0.8 HL, then the VH has to use two hands for the grasp. For precision handling, the virtual human may need two, three, four, or up to five fingers. The number of fingers to be used is also a function of the shape and weight of the object. Power grasps and simple touches do not need a predefined number of fingers; normally, a power grasp uses five fingers and a touch uses only the index finger. The numbers of fingers used in other grasp types are functions of the attributes mentioned earlier. For precision handling, a primitive sphere is fitted as a function of the shape; if the radius of the sphere's equator is ρ (in mm), we define the number of fingers as follows:
PH_1 if 1 ≤ ρ ≤ 20
PH_2 if 20 < ρ ≤ 40
PH_3 if 40 < ρ ≤ 60
PH_4 if 60 < ρ ≤ 90

where PH means precision handling and the subscript indicates the number of fingers that, together with the thumb, are involved in the grasp (a small selection sketch follows).
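A small selection sketch of the rule above; the assignment of subscripts PH_1-PH_4 to the four radius intervals follows our reading of the text, so the thresholds should be checked against the original figure.

```cpp
// Number of fingers recruited alongside the thumb for precision handling,
// from the radius rho (mm) of the sphere's equator. Interval-to-subscript
// assignment is our reading of the rule above; radii outside [1, 90] mm
// return 0 (no precision handling).
int fingersBesideThumb(double rhoMm) {
    if (rhoMm >= 1.0  && rhoMm <= 20.0) return 1; // PH1
    if (rhoMm >  20.0 && rhoMm <= 40.0) return 2; // PH2
    if (rhoMm >  40.0 && rhoMm <= 60.0) return 3; // PH3
    if (rhoMm >  60.0 && rhoMm <= 90.0) return 4; // PH4
    return 0;
}
```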
When the output is a power grasp (PG), the VH usually uses all the fingers, and these conditionals are not considered in our algorithm. The following subsection shows how we calculate the angles of each joint and each finger when the output is power grasp or precision handling.
2.3 Finger Angles
Our approach to grasping is based on the movement of the fingers, of which there are two types. In a power grasp, the movement described by each finger except the thumb is circular; in a precision grasp, the fingertip positions, including the thumb's, approximate a circle. Based on these approximations, we can simulate all the human grasps proposed by [5]. Pinching is a particular case of precision grasping; pulling, pushing, and touching are considered finger-positioning problems. For a first approximation of power grasping, we can apply forward kinematics and calculate all the angles for every finger. For a cylinder of radius ρ, Figure 3 depicts a cross-section of the cylinder and the schematic phalanx bones, and the joint angles are obtained from the geometric relationship. For each finger there is an angle θ1_j between the proximal phalanx and the metacarpal bones, where the subscript j ∈ {i, m, r, s} identifies the finger (i for the index, m for the middle, r for the ring, and s for the small finger). Similarly, θ2_j is the angle between the proximal phalanx and the medial phalanx, and θ3_j is the angle between the medial phalanx and the distal phalanx. All of these angles are calculated geometrically and changed from local to global coordinates with a transformation matrix.
Fig. 3. The geometry relationship of finger segments
For precision grasping, the fingertip positions of each finger on the object boundary are given, and the finger joint angles for grasping the object are computed using inverse kinematics. Figures 4 and 5 depict the fingertip positions on a ball. In Figure 5, the angles α and β depend on the diameter of the ball; from observation of real people grasping a ball of radius ρ = 27.5 mm, the results are α = 60° and β = 0°. In addition, it is imposed that the middle finger stays in its neutral position (i.e., no abduction displacement). The fingertip positions for the thumb, index, and ring fingers can then be computed with respect to the wrist (global) coordinate system, while the small finger stays in the neutral position.
Fig. 4. Grasping a sphere
Fig. 5. Equator section with position of fingertips used
The inverse kinematic solutions depend on the initial values of the design variables (q) for both iterative and optimization-based methods. Table 1 presents the solutions (in degrees) for the index finger with the Newton-Raphson method, where the global fingertip coordinate is (11.22, 152.341, 77.4) mm, the hand length is 200 mm, and the local coordinate is (7.3, 59.9887, 77.4) mm. Table 1 shows that the convergence of the Newton-Raphson method is very fast when the initial angles are close to the solution. For the first set of initial values, the solution for the distal interphalangeal (DIP) joint angle (q_4) is negative but within the range

Table 1. Index joint angles with Newton-Raphson method
Iteration   q_1     q_2    q_3   q_4
Initial     0       30     30    10
7           6.95    39.3   30    -7.95
Initial     0       0      0     0
10          6.95    42.3   10    26.6
Initial     0       10     10    0
7           6.95    42.3   10    26.6
of motion. The negative angle for this joint represents hyperextension; however, we can observe that humans never grasp this sphere with DIP hyperextension. In practice, some joints of the fingers are coupled, i.e., the movement of one joint depends on the motion of another. For example, each finger except the thumb has two coupled joints: the DIP angle depends on the proximal interphalangeal (PIP) angle, and the relationship between them is [14]

θ^i_DIP = (2/3) θ^i_PIP

where the superscript i identifies the finger, beginning with 1 for the index finger and ending with 4 for the small finger [21]. For the thumb, similar coupling relationships were observed.
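The coupling constraint is a one-line function; the following sketch applies the 2/3 relation for the four non-thumb fingers, in radians or degrees alike since the relation is linear.

```cpp
// Coupled-joint simplification after [14]: the DIP angle of fingers
// 1 (index) through 4 (small) follows two thirds of the PIP angle.
double dipFromPip(double pip) {
    return (2.0 / 3.0) * pip;
}
```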
3 Implementation
The implementation of the proposed approach has been done in C++ and is divided into modules following the schema shown in Figure 1. In some modules the user interacts with the virtual environment, i.e., the user chooses the object and the task to be done with it; other modules run autonomously. Two examples are given below to illustrate the developed system.
3.1 Grasping a Mug
Input parameters for grasping a mug can be:

p_1 (task): 1 for drink, -1 for move
p_2 (temperature): 1 for hot, -1 for cold
p_3 (weight): 1 for light, -1 for heavy

Attributes like surface shape, fragility, and object stability do not help the system make a decision in this case, and therefore they are set to 0. To simplify the example, if we choose w_0 = 0 and b = 0 and the weight vector w = (1, 0, 0, 0, 0, 0), these values are implemented in Equation 1. The output can be 1 or -1: if the output is 1, the VH grasps the mug by its handle, and if the output is -1, the VH grasps the mug by its side. If the input is "drink" (task chosen by the user), "hot" (temperature inherent to the object), and "light" (weight; the VE knows the density and the volume and can calculate the weight), the output decision is

a = satlins( (1, 0, 0, 0, 0, 0) · (1, 1, 1, 0, 0, 0)ᵀ + 0 ) = satlins(1) = 1
Fig. 6. Grasping a mug
In this case, the decision is to grasp the mug by the handle. Figure 6 shows the VH executing the action of grasping a mug by the handle.
3.2 Grasping a Joystick
A similar process and its results are shown in Figures 7 and 8 for a joystick. In this case there are only two tasks, and the attributes are inherent to the joystick. Whether to grasp the joystick by the side or by the top is a function of the task to perform; the most important attribute here is the task. For simplicity, the two tasks are pushing the rear button, which unloads the engine, and moving the joystick by the top, which raises and lowers the load. The task input is coded 1 for the first task and -1 for the second. The output will be 1, grasping the joystick by the side, as shown in Figure 7, or -1, grasping the joystick by the top, as shown in Figure 8.
Fig. 7. Grasping a joystick; power grasp
Fig. 8. Grasping a joystick; power grasp
4 Conclusion
We have presented a novel approach to grasping based on objects and their functionality. When an object is selected by the user, it is associated with the
attributes described above. After the user chooses the task, the virtual human, if the grasp is feasible, grasps with the type of grasp computed as a function of the output of a single perceptron. The new concept in this paper is that the virtual human can grasp autonomously, without the user, once the task is chosen; support vector machine (SVM) theory, for a perceptron, was applied for this autonomous grasp. After developing the approach, we implemented it, without loss of generality, and showed two examples.
Acknowledgements. This work was partially supported by the project DPI2007-63665 and the Caterpillar Inc. project Digital Human Modeling and Simulation for Safety and Serviceability.
References
1. Miyata, N., Kouchi, M., Kurihara, T., Mochimaru, M.: Modeling of human hand link structure from optical motion capture data. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 3, pp. 2129–2135 (2004)
2. Miyata, N., Kouchi, M., Mochimaru, M.: Posture estimation for design alternative screening by DhaibaHand - cell phone operation. In: SAE 2006 Digital Human Modeling for Design and Engineering Conference, 2006-01-2327 (2006)
3. Miyata, N., Kouchi, M., Mochimaru, M.: Generation and validation of 3D links for representative hand models. In: SAE 2007 Digital Human Modeling for Design and Engineering Conference, 2007-01-2512 (2007)
4. Savescu, A., Cheze, L., Wang, X., Beurier, G., Verriest, J.P.: A 25 degrees of freedom hand geometrical model for better hand attitude simulation. In: SAE International, 2004-01-2196 (2004)
5. Cutkosky, M.: On grasp choice, grasp models, and the design of hands for manufacturing tasks. IEEE Transactions on Robotics and Automation 5(3), 269–279 (1989)
6. Albrecht, I., Haber, J., Seidel, H.P.: Construction and animation of anatomically based human hand models. In: Eurographics/SIGGRAPH Symposium on Computer Animation, pp. 1–12 (2003)
7. Aydin, Y., Nakajima, M.: Database guided computer animation of human grasping using forward and inverse kinematics. Computers and Graphics 23(1), 145–154 (1999)
8. Barbagli, F., Frisoli, A., Salisbury, K., Bergamasco, M.: Simulating human fingers: a soft finger proxy model and algorithm. In: HAPTICS 2004, 12th International Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, pp. 9–17 (2004)
9. Braido, P., Zhang, X.: Quantitative analysis of finger motion coordination in hand manipulative and gestic acts. Human Movement Science 22, 661–678 (2004)
10. Huang, C., Xiao, S.: Geometric modeling of human hand for muscle volume conductor computation. In: 19th International Conference - IEEE/EMBS, vol. 3, pp. 207–208 (1997)
11. Kragic, D., Miller, A., Allen, P.: Real-time tracking meets online grasp planning. In: 2001 ICRA, IEEE International Conference on Robotics and Automation, vol. 3, pp. 2460–2465 (2001)
12. Miller, A., Allen, P., Santos, V., Valero-Cuevas, F.: From robotic hands to human hands: a visualization and simulation engine for grasping research. Industrial Robot: An International Journal 32(1), 55–63 (2005)
13. Nolker, C., Ritter, H.: Visual recognition of continuous hand postures. IEEE Transactions on Neural Networks 13(4), 983–994 (2002)
14. Rijpkema, H., Girard, M.: Computer animation of knowledge-based human grasping. Computer Graphics 25(4), 339–348 (1991)
15. Saito, F., Nagata, K.: Interpretation of grasp and manipulation based on grasping surfaces. In: IEEE International Conference on Robotics and Automation, vol. 2, pp. 1247–1254 (1999)
16. Chalfoun, J., Younes, R., Renault, M., Ouezdou, F.: Forces, activation and displacement prediction during free movement in the hand and forearm. Journal of Robotic Systems 22(11), 653–660 (2005)
17. Hagan, M.T., Demuth, H.B., Beale, M.H.: Neural Network Design. PWS Pub., Boston (1996)
18. Haykin, S.: Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice Hall International, Inc., Englewood Cliffs (1999)
19. Anderson, J.A.: An Introduction to Neural Networks, 3rd edn. Bradford Book (1997)
20. Peña-Pitarch, E., Yang, J., Abdel-Malek, K.: SANTOS™ Hand: Workspace Analysis. In: 16th IASTED International Conference on Modelling and Simulation, Cancun, Mexico, May 18-20 (2005)
21. Peña-Pitarch, E., Yang, J., Abdel-Malek, K.: SANTOS™ Hand: A 25 Degree-Of-Freedom Model. In: SAE 2005 Digital Human Modeling for Design and Engineering Conference, 2005-01-2727 (2005)
Harmonic Gait under Primitive DOF for Biped Robot (Harmonically Communicated Movement) Shigeki Sugiyama University of Gifu, 1-1 Yanagido, Gifu City, Gifu 500-1234, Japan
[email protected]
Abstract. This paper discusses effective, low-energy walking for a humanoid robot. There are many humanoid robots in the world that can walk, run, dance, get up, and so on. These are mostly used and enjoyed in the field of entertainment. For other purposes, humanoid robots are not yet practical enough, whether in daily life, in factories, or elsewhere, because their movements are not smooth or effective enough for doing such work; that is to say, stable biped walking and energy-optimized biped walking (as series of walking figures) have not met the necessary conditions for the expected usages. This paper therefore introduces a new idea of a humanoid harmonic gait, which makes a robot move more effectively and walk with less energy consumption. Keywords: harmonic gait, low energy consumption walking.
1 Humanoid Robot Basic Movement
As is well known, the basic idea of biped locomotion is traced back to the original concept of the Zero Moment Point (ZMP), defined over the area between the biped feet and the contact surface of the ground or floor. The stability area enclosing the ZMP usually becomes a rectangle or a polygon. As the contact forces are due to the gravitation and inertia of the walking body, the ZMP within that rectangle or polygon can also be defined as the point on the surface where the moment of the resultant gravitational and inertial forces becomes zero. The same walking mechanism also holds for very complex biped walking creatures such as humans.
2 Problems in the Biped Locomotion Method
2.1 Problems of Biped Locomotion
It is said that a jogging (or walking) robot consumes many times as much energy as a jogging human, which shows that there is much wasted movement and futile activity in it; that is to say, the locomotion mechanism is not smooth or effective enough in the sense of walking movements. Two reasons why this is so are given below.
1. Human two-legged movement has not been deeply studied as an effective, smooth, low-energy-consumption method for biped robotics. The focus of most studies seems to have been only on obtaining various kinds of walking mechanisms or figures that could be applied to biped robot walking.
2. Since we live under gravity, the living creatures of this world should be very good at using gravity in their walking movements effectively, smoothly, and with low energy consumption. It would therefore have been better to learn more about effective and smooth movements from humans or other biped walking creatures with respect to gravity.
2.2 Human Walking in General
When we walk on a flat road, we feel that we can walk as long as we want, and it is in fact true that we can walk for several hours or more continuously. The reason may be that we move the legs with a minimum of energy consumption, by resting or releasing some of the leg muscles, or by adopting a low-energy leg movement method. But when we want to walk or run fast, an ordinary person (not an athlete) can continue only for a couple of minutes. The reason is that we then use the leg muscles exclusively for going fast, which amounts to forcing the legs to keep moving back and forth without resting or releasing the muscles at all, even working against the smooth, low-energy mechanisms of locomotion. This is a very tiresome, energy-consuming and futile way of moving, even though it is faster, so an ordinary person cannot keep it up for long. From these arguments we can say that it might be possible to extract some important facts about smooth and effective movement from the human gait if we look closely at our two-legged movements from other aspects.
2.3 Leg Movements in Dancing and the Seamless Low-Energy-Consumption Gait Mechanism
We now have various kinds of dancing, by old and young, and from traditional to brand new. Some of these have long histories and others do not. Here the focus is on social dancing, for its walking movements and the walking mechanisms of the legs. Social dancing is now popular throughout the world, centered in England, and many techniques have been developed for its several styles. Social dancing in general has two categories, "Modern" and "Latin". If we consider the Modern Blues, there are some interesting walking techniques behind it. Basically it consists of back-and-forth walking movements, and those movements are smooth enough to go forward and backward seamlessly, with the couple in contact and moving (walking) simultaneously. The basic walking mechanism, seen from the aspect of social dancing in the sense of seamless, low-energy-consumption manners, is as follows.
1. Releasing the muscles of one leg so as to stand on that leg, which is immediately and closely linked with the forward or backward movement, starts a movement.
2. Then, by bending the ankle joint and the knee joint of that leg gradually, the whole body goes forward or backward while sinking down; theoretically this can be done with minimum energy consumption, by transforming the potential energy of the whole body into the kinetic energy of its forward movement.
3. Then the other leg is released so that it swings past the bending leg without using any muscles; again, theoretically no energy is consumed, just like a pendulum that returns to its gravitationally stable position and swings further forward.
4. Then the other leg (the released leg that went forward) is made ready to stand upright by pushing slightly onto the floor, which carries the whole body further and brings it upright onto that leg.
5. This movement connects to the following forward or backward movement of the whole body, and so on.
6. By performing these movements of the two legs one after another, the whole upper body (torso) is always stabilized upright on one leg or the other, one after another.
7. The repetition of this walking pattern yields a series of seamless, low-energy-consumption gait (manner of walking).
3 Ideas of Harmonic Gait
3.1 Walking Method in General
It is classically said that biped locomotion can be achieved by keeping the Zero Moment Point (ZMP) at the contact area with the floor, ground, road, or other surface. As the contact forces are the gravitation and inertia of the whole walking body, the ZMP can also be expressed as the point of the contact surface where the moment of the assembled inertias of each body part (the total sum of the moment-of-inertia vectors) and the gravity forces of the whole body comes to zero. As explained in Section 2.3, the whole upper body (torso) can always be stabilized upright onto one leg or the other, one after another, under the conditions explained there. As a result, we can define a biped robot as shown in Figure 1, with 8 joint DOF plus the hip position as the most primitive case of a possible walking mechanism. In general it can be defined and expressed as follows (a forward-kinematics sketch is given after these definitions):

q1, q2: displacement of the hip joint in the XY coordinate plane.
q3, q4, q5, q6: rotation angles of the hip, knee, ankle, and toe joints of the right lower limb.
q7, q8, q9, q10: rotation angles of the hip, knee, ankle, and toe joints of the left lower limb.

In this case the segment coordinates can be defined as follows: $R_{hip} = (q_1, q_2)^T$ (hip joint), $R_3 = (q_1 + r_3\cos q_3,\; q_2 + r_3\sin q_3)^T$ (right thigh), …, $R_{10}$.
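The following is a minimal C++ sketch of the planar forward kinematics just defined; it is ours, with an illustrative segment length and state values, not the paper's code:

#include <cmath>
#include <iostream>

// Minimal sketch of the planar forward kinematics defined above:
// R3 = (q1 + r3*cos(q3), q2 + r3*sin(q3))^T and its time derivative
// dot(R3) = (dot(q1) - r3*dot(q3)*sin(q3), dot(q2) + r3*dot(q3)*cos(q3))^T.
// All numeric values are illustrative only.
struct Vec2 { double x, y; };

Vec2 thighPoint(double q1, double q2, double q3, double r3) {
    return {q1 + r3 * std::cos(q3), q2 + r3 * std::sin(q3)};
}

Vec2 thighVelocity(double dq1, double dq2, double q3, double dq3, double r3) {
    return {dq1 - r3 * dq3 * std::sin(q3), dq2 + r3 * dq3 * std::cos(q3)};
}

int main() {
    const double r3 = 0.45;                   // assumed thigh segment length [m]
    Vec2 p = thighPoint(0.0, 0.9, -1.3, r3);  // hip at (0, 0.9), q3 = -1.3 rad
    Vec2 v = thighVelocity(0.3, 0.0, -1.3, 0.5, r3);
    std::cout << "R3 = (" << p.x << ", " << p.y << "), "
              << "dot(R3) = (" << v.x << ", " << v.y << ")\n";
    return 0;
}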
Fig. 1. Example of Biped Robot of Gait
Each joint angle has a range, defined by the limits from the minimum to the maximum possible bending angle of that joint; these ranges are set according to the characteristics we want to give the robot. The velocity vector of each segment can be defined as

$\dot{R}_3 = (\dot{q}_1 - r_3\dot{q}_3\sin q_3,\; \dot{q}_2 + r_3\dot{q}_3\cos q_3)^T, \;\ldots,\; \dot{R}_{10}.$

The kinetic energy can be defined as

$K = f(q_1,\ldots,q_{10},\dot{q}_1,\ldots,\dot{q}_{10}) = \tfrac{1}{2}\,\dot{q}^T M(q)\,\dot{q},$

and the potential energy as

$P = f(q_1, q_2, q_3, \ldots, q_{10}).$

Hence, for this conservative system, the Lagrangian is simply

$L = K - P.$

3.2 Theory of Harmonic Gait
This section argues for a theoretical walking method that minimizes a cost function of energy consumption by using the idea of the Harmonic Gait.
When we think about a walking method with a minimum-energy-consumption pattern, we find that energy use is minimized when we can wisely use the gravity and inertia of each body part without wasting extra energy on keeping balance, etc. To do this, we can state the following conditions, drawn from the knowledge of the dancing techniques argued in the previous section.
[Necessary and Sufficient Conditions for Harmonic Gait]
1. The whole body should be calm, smooth, and seamless, without irrelevant movements, for walking.
2. Using gravity means that the whole body goes forward or backward by transforming potential energy into kinetic energy while sinking the whole body, or kinetic energy into potential energy while raising the whole body, under conservative forces.
3. Inertia can be used for seamless forward or backward movement of the whole body by freely releasing some parts of the body. That is, by gradually bending the ankle and knee joints of one leg, the whole body goes forward or backward through that pair of bendings, sinking down without wasting any extra energy; and by freely releasing the leg situated backward, like a pendulum, its energy is transferred into the forward movement of the body, just as a pendulum swings back and then further forward.
4. The biped walking should be smoothed by interchanging the legs one after another; that is, the gravity center should shift smoothly from the left foot onto the right foot, or vice versa, within one seamless swing.
5. A cooperative movement of two robots needs harmonically stabilized, synchronized, and seamless behavior. Thus a continuously calm, stable, and smooth torso held strictly upright on one leg or the other is the necessary behavior for a cooperative two-robot movement.
Using this knowledge, we can state the following theory.
[Theory of Harmonic Gait]
The initialization of the walking energy ($L_{sink}$) is given by sinking the whole body; that is, potential energy is transformed into kinetic energy by bending the ankle and the knee while releasing the muscles of the whole legs. The potential energy $P$ under sinking ($P_{sink}$) is

$P = P_{initial} - P_{sink}.$

The kinetic energy $K$ under sinking ($K_{sink}$) is

$K = K_{sink},$

and the energy loss $E$ of the system during the movement ($E_{loss}$) is

$E = E_{loss}.$
Putting the above equations into the Lagrangian, we have

$L = (K - P) - E = \{K_{sink} - (P_{initial} - P_{sink})\} - E_{loss,down} = K_{sink} - P_{initial} + P_{sink} - E_{loss,down} = L_{sink}.$

This equation says that the potential energy released by the action of sinking ($L_{sink}$) is transformed into the kinetic energy of walking ($K_{sink}$) under the five conditions mentioned above. That is, ideally we have the energy transformation

$L_{sink} \rightarrow K_{sink}$, i.e., ideally $L_{sink} = K_{sink}.$

The effectiveness of walking with sinking ($EF_{sink}$) is then defined as

$EF_{sink} = \dfrac{L_{sink}}{K_{sink} - P_{initial} + P_{sink}}.$

Likewise, the walking action can be transformed smoothly back into potential energy ($P_{up}$: going upward and back to the initial state) if this transformation is done without any losses, under the same five conditions:

$K_{rise} \rightarrow P_{up}.$

Ideally, then,

$L = (K - P) - E_{loss} = \{K_{rise} - (P_{sink} - P_{up})\} - E_{loss,up} = K_{rise} - P_{sink} + P_{up} - E_{loss,up} = L_{up},$

and the effectiveness of walking with rising ($EF_{up}$) is defined as

$EF_{up} = \dfrac{L_{up}}{K_{rise} - P_{sink} + P_{up}}.$

In this case we have $K_{rise} = P_{up}$, so the above Lagrangian becomes

$L = -P_{sink} + P_{up} - K_{rise} - E_{loss,up} = -(P_{sink} - P_{up}) - (K_{rise} + E_{loss,up}) = K - P.$
The above equation shows that "the movement comes back to the original situation". These movements and transformations can be done if and only if the "Necessary and Sufficient Conditions" stated above are satisfied, and such series of movements with transformations can then be circulated and repeated indefinitely under the minimum-energy-consumption condition. These transformations, the changes of the gravity center, the upward and downward displacements, and the generalized robot movements and movement figures are illustrated in Figure 2 below.
Fig. 2. Robotic Generalized Movements
[Contents of Figure 2]
The first row shows the changes of the gravity center in the ZMP. The second row shows the body displacement along the horizontal axis. The third row shows the upward and downward displacement of the body. The fourth row shows the generalized leg movement figures and the movements of the gravity center position. [END]
This theory thus establishes "the method of walking patterns and figures with low-energy-consumption mechanisms by the Harmonic Gait method". An example using the above theory follows. Figures 3 and 4 show the general initial conditions for the dynamics and the concrete values for the upward ($W_{vup}$) and downward ($W_{vdown}$) forces in the case of a robot of height 176 cm ($h$), the gravity ($G$), the moment of inertia ($M$), and the lean angle of the body ($\theta$); $E_{loss}$ will ideally be less than 40%. Using these dynamics values for the biped robot, the movement histories are given by the equations below (a numeric sketch follows).

The time to go downward, $T_d$:
$T_d = \sqrt{2(176\sin\theta - h)/G}$

The distance moved forward by bending the knee, $D_f$, in the time $T_d$:
$D_f = \tfrac{1}{2} F_b T_d^{\,2}$

The distance moved downward by bending the knee, $D_d$, in the time $T_d$:
$D_d = \tfrac{1}{2} F_{vd} T_d^{\,2}$

The results of the movements obtained with the theory are summarized in Figure 2 above.
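As a worked illustration (not from the paper), the following fragment evaluates these three formulas for assumed values of the lean angle, the sink depth, and the forcing terms; the paper's concrete values are those of Figs. 3 and 4:

#include <cmath>
#include <iostream>

// Minimal numeric sketch of the movement-history formulas above. All input
// values are illustrative; the paper's concrete values are given in
// Figs. 3 and 4 (not reproduced here).
int main() {
    const double G      = 9.8;   // gravity
    const double theta  = 0.12;  // lean angle of the body [rad] (assumed)
    const double h      = 0.05;  // sink depth term in the formula (assumed)
    const double Fb     = 0.8;   // forward forcing term (assumed)
    const double Fvd    = 0.5;   // downward forcing term (assumed)
    const double height = 1.76;  // robot height, 176 cm expressed in metres

    // Td = sqrt(2 * (height*sin(theta) - h) / G)
    const double Td = std::sqrt(2.0 * (height * std::sin(theta) - h) / G);
    const double Df = 0.5 * Fb  * Td * Td;  // forward distance in time Td
    const double Dd = 0.5 * Fvd * Td * Td;  // downward distance in time Td

    std::cout << "Td = " << Td << " s, Df = " << Df
              << ", Dd = " << Dd << "\n";
    return 0;
}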
Fig. 3. Initial Dynamics for Gait
Fig. 4. Initial Dynamics for the Elements of Values of Gait
3.3 Theory of the Continuous Body-onto Method and Harmonic Walking
The important matter for a biped robot walking with low energy expenditure is to avoid extra, irrelevant movements; that is, the whole body should be stable and calm yet flexible in dynamic movement, as studied above. In order
to achieve this situation, we can state the following from the knowledge of social dancing.
1. The whole upper body (torso) is always upright on one leg or the other, one after another, which allows a biped robot to be stable, calm, and dynamic in movement. The torso then stays stable without redundant movements, and only the legs move as they are meant to. Under this condition we can generalize the robot figure as in the decomposition below: the upper body can be treated as a single simple figure, without the hands, the neck, or the head, just like an accumulated unit mass, as illustrated in Figure 5 below.
2. The above condition allows two biped robots to walk with simultaneous and identical movements, with stable and calm torsos "A" and "B", as shown in Figure 6. The walking figures follow the low-energy-consumption method using gravity and inertia, and the robots can move harmoniously close side by side, possibly carrying a thing between them.
3. The two robots can then do cooperative work, because both can keep upright bodies, which allows the two bodies to stand closely side by side as shown in Figure 6. Since the two torsos are stable and calm but very flexible in bilateral movements, the two robots can work together cooperatively and harmoniously, e.g., for carrying a good together.

Robot = "the whole biped robot" = "Head + Upper Body + Hands + Legs" = "Torso + Legs" = "Torso" + "Legs" = "Object A" + "Object B" = "A" + "B".
Fig. 5. Generalized robot figure: torso "A" and legs "B"
Fig. 6. Two generalized robots ("A1"+"B1" and "A2"+"B2") side by side
4 Conclusion
Through the examinations and discussions in this paper, the following results have been given: 1. the basic idea of Harmonic Gait walking has been shown; 2. the generalized walking mechanism of the biped robot and a theory of low energy consumption for a biped robot (more than 30% less) have been introduced; 3. using the theory of Harmonic Gait walking, it has been shown that a cooperative movement of a pair of robots is possible, for harmonious work such as carrying a thing together.
Acknowledgements. First of all, I am truly indebted to my professional dancing instructors, Ms. S. Yamamoto and Ms. M. Yamashita, who showed and taught me the detailed dancing movements and mechanisms and their logical explanations over a year or so. I would also like to thank Professor M. Sasaki of Gifu University for giving me basic and general information on robot dynamics.
Problems Encountered in Seated Arm Reach Posture Reconstruction: Need for a More Realistic Spine and Upper Limb Kinematic Model Xuguang Wang Université de Lyon, F-69622, Lyon, France, INRETS, UMR_T9406, LBMC, Bron, Université Lyon 1, Villeurbanne
[email protected]
Abstract. In this paper, we will present the main problems encountered in reconstructing in-vehicle reach postures. Among the 2176 successfully captured movements, about 7.4% were considered as "bad quality", with a high residual error between reconstructed and measured marker positions. They mainly correspond to the far targets in directions where one has to elevate the arm. The results of the present study strongly suggest that a more realistic kinematic model of the upper body, including the shoulder complex, pelvis and spine, is required. In addition, the natural coordination between joint axes should also be used to compensate for the lack of information in under-constrained situations and to correct the uncertainty of surface marker positions. Keywords: Reach, Digital human, Motion reconstruction, Motion capture, Discomfort.
1 Introduction
Evaluating reach capacity and difficulty is one of the main requirements for the ergonomic design of workplaces [4-6]. Recently, we proposed a unified data-based approach which aimed at predicting both reach envelopes and reach discomfort for a digital human model [18-21]. In this approach, four reach surfaces, from half-flexed arm distance to maximum reach with torso participation, first need to be predicted from an existing database of reach postures. Then, the discomfort of a target reach is evaluated in terms of the target position with respect to these four reach envelopes. We applied this approach to predicting an in-car driver's reach envelopes and discomfort. One of the critical issues in this approach was to build up a database of reach postures covering a large arm-reachable space. For data collection, 2184 reach movements (twenty-four differently sized male and female subjects x 91 targets located in a large reachable space) were captured. The aim of this paper is to present the difficulties encountered in reconstructing these reach postures using a digital human model and to suggest future research directions for improving the quality of reconstructed movements.
2 Reach Motion Data Collecting
Thirteen female and eleven male subjects participated in the experiment. They were selected according to three groups by stature:
• Short: <1625 mm
• Average-height: 1625–1755 mm
• Tall: >1755 mm
An experimental driving mock-up was built. It was composed of a seat, two pedals (accelerator and brake) and a steering wheel, and corresponded to a standard Renault car configuration. The subjects were asked to naturally grasp a sphere of 40 mm in diameter with three fingers (thumb, index and middle finger) of the right hand, starting from a standard driving posture. The reach posture was maintained for about 3 seconds before returning to the starting posture. The buttocks were not allowed to leave the seat. The left hand was kept on the steering wheel during the whole movement.
Fig. 1. Target location
The targets were located in four azimuth planes (P-15 to P105), -15° to 105° from the sagittal plane passing through the right shoulder (Fig. 1a). For each azimuth plane (Fig. 1b), the targets were positioned in six elevations (E-60 to E90, from -60° to 90°) and four distances from the shoulder (D1 to D4). The four distances were defined with respect to the maximum reach distances without (D2) and with (D4) torso movement for each elevation. D3 was the mid distance between D2 and D4, and D1 was defined as two-thirds of D2. Maximum reach distances D2 and D4 for each target elevation in each orientation plane were determined prior to the experiment for each subject. In all, 75 different target locations were tested for each subject (a coordinate sketch is given after this list):
• 60 locations for the planes P25, P65 and P105
− 4 distances (D1, D2, D3, D4)
− 5 elevations (-60°, -30°, 0°, 30° and 60°)
• 11 locations for plane P-15
− 4 distances (D1, D2, D3, D4)
− 3 elevations (0°, 30°, 60°). The elevations -30° and -60° were eliminated because of the interference between the body and the seat.
− The target E0-D1 in the plane P-15 was also eliminated because of the interference with the steering wheel.
• 4 locations for the pole (elevation 90°)
− 4 distances (D1, D2, D3, D4)
In order to test the repeatability of discomfort rating and reach posture, the target E0-D3 in each azimuth plane was repeated two times. In addition, two different grasping types were also tested for the same target location in order to look at the effects of grasping type on reach discomfort. Therefore, each subject carried out 91 (75+8+8) reach movements in total. Movements were captured using the optoelectronic system VICON with 10 cameras at a frequency of 50 Hz. The detailed description of the experimental procedure for data collection can be found in Wang et al. [21].
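For illustration, the following is a hedged sketch (not the paper's code) of how a target position could be expressed in a shoulder-centered frame from its azimuth plane, elevation and distance labels; the distance value itself is subject-specific, as described above:

#include <cmath>
#include <iostream>

// Hedged sketch: express a target position in a right-shoulder-centered
// frame from its azimuth plane angle, elevation angle, and reach distance.
// The axis conventions are assumed (x forward in the sagittal plane,
// y to the left, z up), not taken from the paper.
struct Point3 { double x, y, z; };

Point3 targetPosition(double azimuthDeg, double elevationDeg, double distance) {
    const double kDegToRad = 3.14159265358979323846 / 180.0;
    const double az = azimuthDeg * kDegToRad;
    const double el = elevationDeg * kDegToRad;
    return {distance * std::cos(el) * std::cos(az),
            -distance * std::cos(el) * std::sin(az),  // sign convention assumed
            distance * std::sin(el)};
}

int main() {
    // Target P65-E30 at a hypothetical subject-specific distance D3 = 0.62 m.
    Point3 p = targetPosition(65.0, 30.0, 0.62);
    std::cout << "target at (" << p.x << ", " << p.y << ", " << p.z << ") m\n";
    return 0;
}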
3 Posture Reconstruction Method
Motion reconstruction consists in calculating the joint angles of a human model from captured marker trajectories, also called inverse kinematics motion reconstruction. Several model-based motion reconstruction methods have been proposed in the past; an up-to-date review of motion reconstruction methods can be found in Ausejo and Wang [2]. For instance, Lu and O'Connor [13] proposed a Global Optimization Method (GOM), searching for an optimal posture minimizing, in the least-squares sense, the overall distances between the measured and the model-determined marker coordinates of all the body segments at each frame. In a recent European research project named REALMAN, Ausejo and his colleagues [1] proposed a similar approach using natural coordinates and applied it to whole-body motion reconstruction. In the present study, the inverse kinematics solver is based on the generalized pseudo-inverse as initially proposed by Liegeois [12]:
$\Delta\theta = J^{+}\Delta X + (I - J^{+}J)(\theta - \theta_{ref})$   (1)
where $\Delta X$ is the incremental marker displacement and $\Delta\theta$ the joint angle change. $J$ is called the Jacobian and relates $\Delta X$ and $\Delta\theta$ linearly; $J^{+}$ is the pseudo-inverse of $J$. $\theta$ is the joint angle array representing the body posture and $\theta_{ref}$ is a reference posture. The first term of this equation minimizes the distance between the measured and the model-determined marker coordinates and is of high priority. The second term tends to bring the actual posture to a reference one within the range of solutions admitted by the first term; $(I - J^{+}J)$ is the projection operator of $(\theta - \theta_{ref})$ into the null space of $J$. The second term is of lower priority. This formulation therefore enables
the driving of over-constrained as well as under-constrained chains. Most frequently, two or more markers are placed on each segment of the four limbs; the arm posture, for instance, is therefore over-constrained and not dependent on the reference posture. On the other hand, the spine, where it is frequently not possible to place markers, is driven by markers on the pelvis and on the upper torso. The spine here is under-constrained: an infinite number of solutions can exist which minimize the marker distances. The second term of the equation therefore attracts the posture to a reference one, leading to a unique solution; without the second term, a solution approaching the initial guess would be found. This inverse kinematics algorithm is implemented in the motion reconstruction and simulation software RPx (see Monnier et al. [15] for a general presentation of RPx), which is used in this study for reach posture reconstruction. Several preparatory steps are necessary before joint angle calculation: 1. Definition of the digital twin of a real subject from twenty-four measured anthropometric dimensions. For this, the tool named BodyBuilder by the software editor Human Solutions is used. 2. Verification and fine adjustment of the digital human model created in step 1. This consists of a visual inspection by superimposing the model with different views of a reference posture (Fig. 2); RPx allows one to adjust segment lengths and joint angles. 3. Marker attachment. Once the human model is superimposed correctly with the photos, each marker is attached to one body segment in order to obtain its local coordinates in the corresponding body segment coordinate system. 4. Joint angle calculation by minimizing the distance between model-based marker positions and those measured by the motion capture system. In order to reconstruct the posture of the spine and pelvis, the standard seated driving posture is used as the reference posture. In addition, empirical coordination laws of the spine joints are also used (Monnier et al. [14]), allowing a better control of spine joint angles by four torso global attitude parameters (axial rotation, forward and backward flexion-extension, lateral flexion, compression/elevation). A sketch of one update step of Equation (1) is given below. Fig. 3 shows an example of a reconstructed reach posture.
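As an illustration of Equation (1), the following is a minimal sketch of one inverse-kinematics update step, using the Eigen library for the pseudo-inverse; the Jacobian and vectors are random stand-ins, and this is not the RPx implementation:

#include <Eigen/Dense>
#include <iostream>

// Minimal sketch of one update step of Equation (1):
//   dTheta = J+ dX + (I - J+ J)(theta - thetaRef)
// using Eigen's pseudo-inverse. A real solver would recompute J from the
// human model at each frame; here everything is a random stand-in.
int main() {
    const int nMarkers = 6;  // 3D coords of 2 markers -> 6 rows (illustrative)
    const int nJoints  = 10; // joint angles (illustrative)

    Eigen::MatrixXd J = Eigen::MatrixXd::Random(nMarkers, nJoints);
    Eigen::VectorXd dX = Eigen::VectorXd::Random(nMarkers);             // marker displacement
    Eigen::VectorXd theta = Eigen::VectorXd::Zero(nJoints);             // current posture
    Eigen::VectorXd thetaRef = Eigen::VectorXd::Constant(nJoints, 0.1); // reference posture

    Eigen::MatrixXd Jp = J.completeOrthogonalDecomposition().pseudoInverse();
    Eigen::MatrixXd I = Eigen::MatrixXd::Identity(nJoints, nJoints);

    // High-priority term tracks the markers; the null-space term acts on the
    // under-constrained joints. The sign of the second factor follows
    // Eq. (1) as printed; other formulations write (thetaRef - theta) to
    // attract the posture toward the reference.
    Eigen::VectorXd dTheta = Jp * dX + (I - Jp * J) * (theta - thetaRef);

    theta += dTheta;
    std::cout << "updated joint angles:\n" << theta.transpose() << "\n";
    return 0;
}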
Fig. 2. Verification and fine adjustment of model’s dimensions and marker attachment
Fig. 3. An example of reconstructed reach posture for the target P065-E0-D3
4 Evaluation of Reconstructed Reach Postures In order to evaluate the quality of reconstructed reach postures, we calculated the mean value (m) of the residual errors of the following eight markers and their standard deviation (s) for each posture:
• Markers on the right (lfe_r) and left (lfe_l) knees
• Marker on the chest (sn)
• Markers on the right upper limb: shoulder (ac_r), elbow (lhe_r), wrist (ms_r)
• Markers on the hand: hand back (smc_r), index fingertip (index_r)
It is well known that the soft tissue artefact (STA) is the main error source in motion analysis [11]. For instance, Fuller et al. [8] reported that skin-mounted markers on the lower limb (thigh and shank segments) could exhibit displacements with respect to the underlying bone of up to 20 mm. Moreover, the procedure used for defining an individualized 3-D geometric model is quite approximate. In a past study [19], we compared seven anthropometric dimensions between direct measurements and those obtained by superimposing a digital model with at least two photos of different views in two different reference postures (standing and sitting). Their average differences could reach as much as 28.7 mm for the upper arm length in the standing posture and 44 mm for the sitting height in the sitting posture. Considering the possible large relative movement between a surface marker and the underlying bone, as well as the approximation made for the individualized digital twin of the subject, a reconstructed posture is qualified as "bad quality" if both m and s are greater than 20 mm. Using this criterion, the postures with a high but uniformly distributed residual error (m > 20 mm and s < 20 mm) and those with a large dispersion of residual errors but a small global residual error (m < 20 mm and s > 20 mm) are considered as "correctly reconstructed". In the first case, the residual errors are relatively uniformly distributed among the selected markers, and the corresponding reconstructed postures were in general visually natural looking. In the second case, the residual errors are unevenly distributed with a low average error; this may be due to the high error of one or two mislabelled markers.
Table 1. Number of badly reconstructed postures according to target distance, plane, and elevation
D1 D2 D3 D4
Distance 11 33 47 71
Plane P-15 P25 P65 P105
56 21 22 63
E-60 E-30 E0 E30 E60 E90
Elevation 2 5 16 29 72 38
Fig. 4. Means (m) and standard deviations (s) of the residual errors (in mm) of the eight selected markers (lfe_l, lfe_r, sn, ac_r, lhe_r, ms_r, smc_r, index_r) for the 162 movements qualified as "badly reconstructed"
As the method is a global optimisation procedure, it is quite robust to the missing and mislabelling of a small number of markers. Among the 2176 (24 subjects x 91 targets − 8 unsuccessfully captured trials) captured reach movements, 162 postures are considered "badly reconstructed", representing 7.4%. Their distribution according to target location is summarized in Table 1. One can see that these badly reconstructed postures mainly corresponded to the targets located at far distances (D3 and D4), in the planes -15° and 105°, and at high arm elevations (60° and 90°). Fig. 4 shows the distribution of residual errors for the eight selected markers. One can see that the main errors are located at the shoulder (ac_r), elbow (lhe_r) and wrist (ms_r). The shoulder marker (ac_r) is in an area of high skin movement when the arm is elevated. For far targets, the elbow is almost fully extended, leading to the alignment of the upper arm and forearm and thus to high uncertainty about their axial rotation. In addition to the skin movement artefact, this may partly explain the high residual errors for the elbow (lhe_r) and wrist (ms_r) markers. However, the residual errors decrease for the more distal markers (smc_r, index_r). This may lead to unrealistic hand postures.
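The classification rule described above (both the mean and the standard deviation of the marker residuals above 20 mm) can be sketched as follows; the function name and example values are ours, not the paper's:

#include <cmath>
#include <vector>
#include <iostream>

// Sketch of the quality rule described above: a reconstructed posture is
// qualified "bad" only if BOTH the mean (m) and the standard deviation (s)
// of the selected markers' residual errors exceed 20 mm.
bool isBadlyReconstructed(const std::vector<double>& residualsMm) {
    const double threshold = 20.0;
    double m = 0.0;
    for (double r : residualsMm) m += r;
    m /= residualsMm.size();
    double var = 0.0;
    for (double r : residualsMm) var += (r - m) * (r - m);
    const double s = std::sqrt(var / residualsMm.size());
    return m > threshold && s > threshold;
}

int main() {
    // Hypothetical residuals (mm) for the eight selected markers: m > 20 mm
    // but s < 20 mm, so the posture counts as "correctly reconstructed".
    std::vector<double> residuals = {12, 15, 18, 55, 48, 40, 22, 19};
    std::cout << (isBadlyReconstructed(residuals) ? "bad" : "correct") << "\n";
    return 0;
}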
Fig. 5. Example of a “badly” reconstructed posture for the target P(-15)-E60-D4, showing high residual error of the markers attached at the upper and lower arms. The residual error of a marker and its direction is represented by the red stick from the marker.
Fig. 6. An example of “badly” reconstructed reach posture for the target P(-15)-E30-D4. The postures of the three other reach distances in the same reach direction are also shown. The residual error of a marker and its direction is represented by the red stick from the marker.
Figures 5 and 6 show two examples of “badly” reconstructed postures, one illustrating high residual errors of the markers attached on the upper limb, the other for the thorax.
5 Concluding Remarks
In this paper, we have illustrated the main problems encountered in reconstructing in-vehicle reach postures. Among the successfully captured movements, about 7.4% were considered as "bad quality", with a high error between reconstructed and measured marker positions. They mainly correspond to the far targets in directions where one has to elevate the arm. The markers attached on the elbow had the highest residual errors, followed by the markers at the shoulder. The residual errors decrease for the distal markers at the wrist, the hand back and the fingertips. The fingertips being the end of the kinematic chain of the upper body, this may imply that the joint angles are probably not correctly estimated for these targets. The kinematic human model used in this paper is based on the RAMSIS model, which is widely used in the automotive industry for car interior design. As in most digital human models, the spine is represented by six spherical joints, and the shoulder complex is simplified as an open chain composed of one spherical joint at the glenohumeral joint and one revolute joint at the sterno-clavicular joint. The results of the present study clearly show that this simplified model is probably good enough for targets close to the body without high arm elevation, but for far and high targets such a simplified model may not be appropriate. In this study, the 3-D individualised human model is created based on classical 1-D anthropometric dimensions, and some manual adjustments are frequently required to fit the photos. The advantage of this procedure is that the internal structure of the human model (joint centres, joint axes) is identified implicitly; however, we do not know how good the identified internal structure is with respect to the real anatomic structure. Clearly we need information which allows the identification of the internal kinematic structure from externally measurable parameters, such as palpable anatomic bony landmarks. There is also a need to define natural coordination laws of joint axes to compensate for the lack of information in under-constrained situations and to correct the uncertainty of surface marker positions. In fact, for far targets one has to fully extend the arm and move the trunk. When the elbow is fully extended, the upper and lower arms are almost aligned, and it is difficult to correctly estimate their axial rotations due to the high uncertainty of the marker positions at the elbow and wrist. In a seated position it is difficult to put many markers on the trunk: in our study, only four markers were attached to the thoracic cage and two markers to the pelvis, so the movement of the trunk and pelvis was clearly under-constrained in our case. The same is true for the shoulder movement: only one marker was attached at the shoulder, and, due to the skin movement artefact, the movement of the sterno-clavicular joint is difficult to estimate. The difficulties encountered in arm reach posture reconstruction for far and elevated targets strongly suggest that a more realistic kinematic model of the upper body, including the shoulder complex, pelvis and spine, is required. In addition, the natural coordination between joint axes should also be used in under-constrained situations and for correcting the uncertainty of surface marker positions.
Acknowledgement This study is partly supported by the car manufacturer Renault. The author would like to acknowledge Gilles Monnier for his technical assistance as well as Nicolas Chevalot, Sebatien Parello, Julien Causse for data collecting.
References
1. Ausejo, S., Suescun, Á., Celigüeta, J., Wang, X.: Robust human motion reconstruction in the presence of missing markers and the absence of markers for some body segments. In: SAE International Conference and Exposition of Digital Human Modeling for Design and Engineering, Lyon, France, July 4-6 (2006) SAE paper 2006-01-2321
2. Ausejo, S., Wang, X.: Motion capture and reconstruction. In: Duffy, V.G. (ed.) Handbook of Digital Human Modeling. Taylor & Francis Group, Abingdon (2008)
3. Boydstun, L.E., Kessel, D.S.: Maximum reach models for the seated pilot. Center for Ergonomics, College of Engineering, University of Michigan (March 1980)
4. Chaffee, J.W.: Methods for determining driver reach capability. SAE report 6901205, New York (1969)
5. Chaffin, D.B., Faraway, J.J., Zhang, X., Woolley, C.: Stature, age and gender effects on reach motion postures. Human Factors 42(3), 408–420 (2000)
6. Chateauroux, E., Wang, X.: Effects of age, gender and target location on seated reach capacity and posture. Human Factors 50(2), 211–226 (2008)
7. Chevalot, N., Wang, X.: Experimental investigation of the discomfort of arm reaching movements in a seated position. SAE Transactions, Journal of Aerospace 1, 270–276 (2004)
8. Fuller, J., Liu, L.J., Murphy, M.C., Mann, R.W.: A comparison of lower-extremity skeletal kinematics measured using skin- and pin-mounted markers. Hum. Mov. Sci. 16, 219–242 (1997)
9. Jung, E.S., Choe, J.: Human reach posture prediction based on psychophysical discomfort. International Journal of Industrial Ergonomics 18, 173–179 (1996)
10. Kennedy, F.W.: Reach capability of men and women: a three-dimensional analysis. Aerospace Medical Research Laboratory, Wright-Patterson Air Force Base, Yellow Springs, Ohio (July 1978)
11. Leardini, A., Chiari, L., Della Croce, U., Cappozzo, A.: Human movement analysis using stereophotogrammetry. Part 3: soft tissue artefact assessment and compensation. Gait Posture 21, 212–225 (2005)
12. Liegeois, A.: Automatic supervisory control of the configuration and behavior of multibody mechanisms. IEEE Transactions on Systems, Man and Cybernetics 7(12), 868–871 (1977)
13. Lu, T.W., O'Connor, J.J.: Bone position estimation from skin marker co-ordinates using global optimisation with joint constraints. Journal of Biomechanics 32, 129–134 (1999)
14. Monnier, G., Wang, X., Beurier, G., Trasbot, J.: Coordination of spine degrees of freedom during a motion reconstruction process. SAE 2007 Transactions, Journal of Passenger Cars – Electronic and Electrical Systems (2007) SAE Paper 2007-01-2454
15. Monnier, G., Wang, X., Trasbot, J.: RPx: a motion simulation tool for car interior design. In: Duffy, V.G. (ed.) Handbook of Digital Human Modeling. Taylor & Francis Group, Abingdon (2008)
16. Reed, M.P., Parkinson, M.B., Chaffin, D.B.: A new approach to modeling driver reach. In: 2003 SAE World Congress, Detroit, Michigan, March 3-6 (2003) SAE Paper N° 2003-01-0587
17. Sengupta, A.K., Das, B.: Maximum reach envelope for the seated and standing male and female for industrial workstation design. Ergonomics 43(9), 1390–1404 (2000)
18. Wang, X., Chateauroux, E., Chevalot, N.: A data-based modeling approach of reach capacity and discomfort for digital human models. In: Duffy, V.G. (ed.) HCII 2007 and DHM 2007. LNCS, vol. 4561, pp. 215–223. Springer, Heidelberg (2007)
19. Wang, X., Chevalot, N., Monnier, G., Ausejo, S., Suescun, Á., Celigüeta, J.: Validation of a model-based motion reconstruction method developed in the REALMAN project. In: SAE International Conference and Exposition of Digital Human Modeling for Design and Engineering, June 14-16 (2005) SAE paper 2005-01-2743
20. Wang, X., Chevalot, N., Monnier, G., Trasbot, J.: From motion capture to motion simulation: an in-vehicle reach motion database for car design. In: SAE International Conference and Exposition of Digital Human Modeling for Design and Engineering, Lyon, France, July 4-6 (2006) SAE Paper 2006-01-2362
21. Wang, X., Chevalot, N., Trasbot, J.: Prediction of in-vehicle reach surfaces and discomfort by digital human models. In: SAE International Conference and Exposition of Digital Human Modeling for Design and Engineering, Sheraton Station Square, Pittsburgh, Pennsylvania, USA, June 17-19 (2008) SAE paper N° 2008-01-1869
22. Wang, X., Verriest, J.P.: A geometric algorithm to predict the arm reach posture for computer-aided ergonomic evaluation. The Journal of Visualization and Computer Animation 9, 33–47 (1998)
23. Zhang, X., Chaffin, D.B.: A three-dimensional dynamic posture prediction model for in-vehicle seated reaching movements: development and validation. Ergonomics 43(9), 1314–1330 (2000)
Intelligent Motion Tracking by Combining Specialized Algorithms Matthias Weber FGAN e.V., FKIE, Neuenahrer Str. 20, 53343 Wachtberg-Werthhoven, Germany
[email protected] http://www.fgan.com/fkie
Abstract. Motion Capture is a widely accepted approach to capture natural human motion, usually utilizing markers to track certain anthropological points on the participant's body. Unfortunately, these markers do not carry any identification information. Furthermore, marker data can be noisy. To address these problems this work suggests a hybrid approach, i.e. an approach using several experts to solve easier, less complex subproblems. Currently, the presented hybrid approach is built upon three methods, two for identification and one for tracking purposes. For identification of an initial posture, a PCA-based technique for aligning a skeleton model as well as a tree-based optimization comparing anthropometric and tracking data are introduced. To complement the hybrid computation pipeline, a neural network algorithm based on self-organizing maps tracks the markers on subsequent frames. Keywords: Motion Capture, Marker Identification, Neural Networks.
1 Introduction
Analyzing life-like human motion is often a tedious and exerting task. The underlying motion data is usually captured with a Motion Capture (MoCap) system. These systems produce a lot of data by recording many data frames at a certain rate. In addition, most systems track positional data of certain anthropological points on the participant's body. As this positional data is recorded several times per second for all anthropological points, a lot of data is gathered that has to be analyzed. Furthermore, for capturing human motion, MoCap systems should be used that do not hinder natural movement. Most systems utilize markers to track certain anthropological points on the participant's body. Markers are sensors, emitters or reflectors which provide positional information to the system. Currently, passive optical systems promise the least hindering of natural movement as they do not require cables or big markers. They use reflectors as markers which are very small but do not carry any identification information. Such optical MoCap systems additionally have to cope with hidden markers or suddenly appearing ghost markers that do not exist in reality. Therefore, the gathered MoCap data can be noisy.
As a consequence the MoCap data first has to be processed to circumvent these problems, leading to mostly polished data which in turn makes later analysis easier. Several algorithms have been proposed to cope with unlabeled and noisy data. Usually these approaches still require certain, often manual, steps to be taken. Additionally, they are often very specialized, e.g. dealing with identification or hidden markers only. Approaches that use several experts to solve subproblems seem to be non-existent in current research. Such a separation into subproblems can yield several advantages: it is often much easier and less complex to solve subproblems, and the resulting solutions provide a good basis for solving the whole problem. This work presents such a hybrid approach, built upon three methods, two for identification and one for tracking purposes. It is presented by first giving an overview of related work, followed by the methods for initializing the internal model and for tracking the markers on consecutive MoCap frames. Finally, the approach will be evaluated, leading to a conclusion.
2 Related Work
As this work concentrates on marker-based MoCap techniques, only related work relevant to marker-based tracking is described in this section. Research in this field can be separated into the two topics of occlusion-free marker tracking and skeleton fitting. Dorfmüller-Ulhaas [1] extended the Kalman filter in conjunction with a motion model based on exponential maps to deal with occlusions. Filtering can be seen as a preprocessing step for a later estimation of the human skeleton configuration from the tracked point cloud. While providing higher accuracy, such filtering techniques can increase the computational demands. For skeleton fitting, a least-squares method is often used to fit a reference skeleton model into a MoCap data cloud, see e.g. O'Brien et al. [2]. Silaghi et al. [3] present a combination of global and local techniques for skeleton fitting: global techniques consider the whole skeleton at once, while local techniques adapt only a limited number of bones. Using a closed-form alternative to the least-squares methods reduces the computational complexity of skeleton fitting, as shown by Knight and Semwal [4]. Zordan and Van Der Horst [5] proposed a physically based approach to skeleton fitting where the marker points were used to compute external forces acting on a skeleton model. An integrated approach to both robust marker tracking and skeleton fitting is presented by Hornung and Sar-Dessai [6]. In this work clique-based recognition was used to identify indistinguishable markers, as described by Ringer and Lasenby [7]. To solve the problem of missing markers, inverse kinematics, continuity assumptions and other heuristics were used. In general this approach successfully tackles some fundamental problems of MoCap. However, as it has to use several markers to build the cliques, which might restrict the mobility of the tracked person, it is not applicable in all environments. A hybrid approach as presented in this work seems to be a unique investigation in the field of motion tracking. Other work mostly focuses on enhancing tracking performance by combining different tracking technologies; combining different algorithms for performance enhancement is seldom used.
3 Algorithm Description
Hybrid methods encompassing several algorithms promise higher stability in terms of recognition rate. They try to combine the advantages of several methods while avoiding their different disadvantages. In addition, they can combine several algorithms in a pipeline, i.e. certain algorithms pre-process the data for other algorithms. This either enhances the processed data or transforms the data to meet certain requirements of later processing algorithms. The hybrid tracking approach explained in this paper is composed of two main steps, organized in a pipeline, as the structural sketch below illustrates. First (Sect. 4), an internal model is calculated that represents and identifies the markers in an initial MoCap posture. Second (Sect. 5), neural network learning is used to subsequently adapt the skeleton to the data points of consecutive MoCap frames through time.
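The two-stage pipeline can be pictured with a small interface sketch; all class and function names here are ours, purely illustrative of the structure, not the paper's code:

#include <cstddef>
#include <string>
#include <vector>

// Illustrative sketch of the two-stage pipeline described above. All names
// are ours and the bodies are stubs; the paper's implementation differs.
struct Marker3D { double x, y, z; };
using Frame = std::vector<Marker3D>;

// Stage 1 (Sect. 4): compute an internal model that identifies the markers
// of an initial posture, i.e. assigns a joint name to each marker index.
struct InternalModel {
    std::vector<std::string> jointOfMarker;
};

InternalModel buildInternalModel(const Frame& initialPosture) {
    return {std::vector<std::string>(initialPosture.size(), "unidentified")};
}

// Stage 2 (Sect. 5): adapt the identified model to the next frame (stub).
void trackFrame(InternalModel& /*model*/, const Frame& /*nextFrame*/) {}

void processSequence(const std::vector<Frame>& frames) {
    InternalModel model = buildInternalModel(frames.front());
    for (std::size_t i = 1; i < frames.size(); ++i)
        trackFrame(model, frames[i]);  // identified labels carry over in time
}

int main() {
    std::vector<Frame> frames(3, Frame(19, Marker3D{0.0, 0.0, 0.0}));
    processSequence(frames);
    return 0;
}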
4 Building an Internal Model
For capturing human motion, people are equipped with markers on special points of their bodies. These markers are tracked and captured to a database. The markers are not labeled by the tracking system; they are 3D points without additional information. As a consequence, each posture of a human motion is represented as a point cloud in the database. For marker identification, an internal model first has to be initialized that maps markers out of the point cloud to body landmark points.
4.1 Prerequisite: User-Created Skeleton Model
A skeleton-like structure is used to develop a description of the participant's body, in particular the respective anatomical landmarks. Skeletons are similar to trees: they consist of joints (like nodes) and segments (edges connecting the joints). They therefore form branches that consist of joints linked in a chain, e.g. arms and legs. For the skeleton model these branches will be called subchains.

Fig. 1. The user-created skeleton model used in this work (joints: head, l_shoulder, l_elbow, l_wrist, l_hand, l_tip, r_shoulder, r_elbow, r_wrist, r_hand, r_tip, l_pelvis, l_knee, l_ankle, l_foot, r_pelvis, r_knee, r_ankle, r_foot)
As a prerequisite, the user has to create a skeleton structure first. This can be done once, as the skeleton will be used for all MoCap data captured with similar marker configurations. The provided skeleton does not need to precisely reproduce the human skeleton in size or proportions. Rather, it needs to specify an approximated link structure of the marker configuration on the subject's body. Fig. 1 shows the skeleton model used in this work. It consists of 19 joints, organized in 4 subchains: left and right arm, and left and right leg including the left and right markers at the pelvis. This simple model is sufficient to conduct our experiments as well as many other human motion experiments. More complex models, e.g. including the spine, could be used as well, though requiring more complex calculations afterwards.
4.2 Aligning a Skeleton Model
Acquiring a properly aligned internal model is a helpful first step to identify markers and thus reconstruct the postures performed by the test subject in each frame. To configure the internal model, a body posture out of the MoCap data is used that has to accompany the user-created skeleton, such as standing with the arms reaching out (T-pose), as used in this work. First, the skeleton model is created with normalized extensions, i.e. coordinates in the range 0–1. This makes it easier to adapt the model to the initial MoCap posture later. Joints in this skeleton are labeled according to the corresponding anatomical landmarks or joints of the body. When querying the initial posture from the database, the returned point cloud is arbitrarily positioned and oriented. To be able to fit the normalized skeleton to this point cloud, the two main axes of the body are calculated using principal components analysis (PCA). The PCA transforms the data by projecting it onto a set of orthogonal axes which indicate the directions of greatest variance in the data. In our case, the point cloud is projected onto a 2-dimensional space spanned by the two main PCA axes. This leads to a properly aligned and flattened point cloud, oriented such that the arms reach out in the direction of the x axis; the feet and head are oriented along the y axis. As nothing is known about the orientation of the two main PCA axes, the left and right sides of the body could be confused. Therefore, the orientation of the head (it is equipped with a 6DOF body target) is multiplied with the vector of the PCA x axis. The result of this scalar product reveals parallelism (result > 0) or anti-parallelism (result < 0). If they are anti-parallel, the orientation of the x axis is swapped to its opposite direction, yielding the correct PCA orientation. After this, the normalized skeleton model is scaled to the extensions on the x and y axes; the skeleton model is thereby properly aligned to the projected point cloud in PCA space. This scaled skeleton model is retransformed into the normal 3-dimensional space and moved to the median point of the test subject's point cloud. As a result, the model is now aligned to the subject's initial posture. Fig. 2 shows the initialization steps for an initial model. After the initial model has been created, it is trained to the initial posture. For this, the skeleton structure
Fig. 2. Left: loaded MoCap data from the database (spheres) and the calculated PCA axes (coordinate system); right: the initial model (line)
After the initial model has been created, it is trained to the initial posture. For this, the skeleton structure is considered as a SOM (see Sect. 5), with the joints being the neurons. After training with the initial posture, the neurons correspond to the skeleton joints, and the joints in turn correspond to the markers. This makes it possible to map neurons, i.e. the named skeleton joints, to markers, thereby identifying them.
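A compact sketch of this alignment procedure is given below, assuming NumPy. The function name, the parameter names, and the convention that the normalized skeleton is centered at 0.5 are illustrative assumptions rather than the author's implementation.

```python
import numpy as np

def align_skeleton(points, skeleton_xy, head_forward):
    """Align a normalized skeleton (Sect. 4.2) to a marker point cloud.

    points:       (N, 3) marker positions of the initial T-Pose frame
    skeleton_xy:  (M, 2) user-created skeleton, coordinates in [0, 1]
    head_forward: x direction reported by the 6DOF head target
    """
    center = np.median(points, axis=0)
    centered = points - center
    # PCA: the eigenvectors of the covariance matrix give the directions
    # of greatest variance; keep the two main axes.
    eigval, eigvec = np.linalg.eigh(np.cov(centered.T))
    order = np.argsort(eigval)[::-1]
    x_axis, y_axis = eigvec[:, order[0]], eigvec[:, order[1]]
    # Resolve the left/right ambiguity: the scalar product with the head
    # orientation reveals parallelism (> 0) or anti-parallelism (< 0).
    if np.dot(head_forward, x_axis) < 0:
        x_axis = -x_axis
    # Project the cloud onto the two main axes and take its extents.
    proj = np.column_stack((centered @ x_axis, centered @ y_axis))
    extent = proj.max(axis=0) - proj.min(axis=0)
    # Scale the normalized skeleton and map it back into 3D space,
    # anchored at the median point of the cloud.
    scaled = (skeleton_xy - 0.5) * extent
    return center + np.outer(scaled[:, 0], x_axis) + np.outer(scaled[:, 1], y_axis)
```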
4.3 Tree-Based Optimization for Skeleton Model Initialization
Often T-Pose postures are not available. It might therefore be more feasible to use an optimization algorithm to fit the skeleton directly to an initial, more complex posture. The idea is to compare the anthropometric data of a person to possible marker configurations and choose the best match. Again, a user-created skeleton is used (see Sect. 4.1), with the difference that only the hierarchy of the skeleton is needed.
Anthropometric Data. Anthropometric data contains fundamental information about the participant's body characteristics. Anthropometry is often part of MoCap experiments, usually as an initial step when marking anatomical landmarks. Segment lengths of the skeleton model can be computed from anthropometric data. To accomplish this, the following anthropometric values are used: body length, acromial and iliocristal height, shoulder width, pelvic width, height of the merion laterale, height of the supratarsale fibulare, hand length, forward range of the hand, forward grasping range of the hand, foot length, and the lengths of the upper and lower arm. All of them are measured on the subject's right side. The skeleton segment lengths are computed mainly using the Pythagorean theorem; the lower and upper arm lengths do not need to be calculated as they are already given. The resulting segment lengths are used as reference values for comparison with generated segment lengths, as explained in the following.
Constructing the Tree. To find an optimal mapping between markers and anthropological points, a tree is constructed that contains all possible mappings. For the purpose of optimization it is additionally annotated with costs,
denoting the discrepancies between the reference segment lengths and the currently constructed segment lengths. Nodes are created in three dimensions: one dimension for the depth of the tree, one for the joint names and one for the markers. The maximum number of tree nodes is therefore

$$\mathrm{number(joints)} \cdot \prod_{i=0}^{\mathrm{depth}-1} \bigl(\mathrm{number(markers)} - i\bigr) \qquad (1)$$
For the tree construction, all reference segment lengths, all unlabeled markers, and the skeleton model are needed. After construction, the tree contains a node for every possible marker-to-joint mapping, annotated with costs. Branches in the tree represent possible marker configurations corresponding to subchains in the skeleton. Initial postures must contain all the markers that correspond to joints in the skeleton model. The tree construction algorithm starts from the root node, which in this work corresponds to the head. Nodes are created in all three dimensions of the tree: joint names are considered that correspond to the skeleton model, and nodes are created for all markers. Each node is used as the parent for the next hierarchical level in the skeleton model, i.e., for all child joints. In consequence, all possible marker configurations are constructed for each subchain in the skeleton structure. Leaf nodes are the nodes created for the end joints of a subchain. Costs are calculated for each node by comparing the previously computed reference segment lengths to the distances between the node's and the parent node's associated markers. These costs are accumulated, i.e., each node's costs are added to the accumulated costs of the node's parent; the root node initially contains a value of 0. This also enables a possible optimization: branches of the tree whose accumulated costs are high can be abandoned, as the difference between their marker configurations and the reference segment lengths is already large. An example of such a tree is shown in Fig. 3. It contains a head marker node as the root of the tree. For each child of the head in the predefined skeleton, a set of nodes is created; these sets are named l_shoulder, r_shoulder, l_pelvis and r_pelvis. Child nodes of the head are created in each set. These nodes represent all tracked markers and contain, annotated as costs, the first discrepancies between the reference segment lengths and the differences between any marker's position and the head marker's position. Then, children sets for every node in the l_shoulder set are constructed; here these are only l_elbow sets. Again, nodes for all markers are created and connected to the respective parent node. The costs for these nodes are calculated and added to the costs of the parent node. This process continues until the leaf nodes of the tree have been created. To sum up, all possible marker configurations are represented in a 3-dimensional tree resembling the predefined hierarchical skeleton structure. To facilitate determining the best-fitting configurations, the nodes are annotated with accumulated costs. After the tree has been built, the best leaf has to be found for each branch that corresponds to a subchain in the skeleton model (here, for each arm and leg) using the accumulated costs. For each end joint name of the
Fig. 3. Partial view of a constructed 3D tree, containing costs and node connections
skeleton model, the corresponding leaf with the least accumulated costs is found by comparing the costs of all leaves corresponding to this joint name.
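One way to realize this search is a depth-first enumeration of marker assignments per subchain with cost-based pruning, as sketched below. The decomposition into independent subchains and all names are assumptions made for illustration; the paper does not give its implementation.

```python
import numpy as np

def best_chain(joints, ref_len, markers, head_pos):
    """Depth-first search over one subchain of the assignment tree
    (Sect. 4.3) with cost-based pruning.

    joints:   ordered joint names below the head, e.g.
              ["l_shoulder", "l_elbow", "l_wrist", "l_hand", "l_tip"]
    ref_len:  joint name -> reference segment length (from anthropometry)
    markers:  marker name -> 3D position (np.array)
    head_pos: position of the head marker (the root node)
    """
    best_cost, best_assign = np.inf, None

    def expand(level, parent_pos, used, cost, assign):
        nonlocal best_cost, best_assign
        if cost >= best_cost:          # abandon already too expensive branches
            return
        if level == len(joints):       # a leaf node has been reached
            best_cost, best_assign = cost, dict(assign)
            return
        joint = joints[level]
        for name, pos in markers.items():   # one node per still free marker
            if name in used:
                continue
            # node cost: discrepancy to the reference segment length
            step = abs(np.linalg.norm(pos - parent_pos) - ref_len[joint])
            assign[joint] = name
            expand(level + 1, pos, used | {name}, cost + step, assign)
            del assign[joint]

    expand(0, head_pos, frozenset(), 0.0, {})
    return best_cost, best_assign
```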
5 Neural Network for Adapting the Model
When initialization is finished, the SOM is adapted to each frame of the MoCap data. During this process the SOM is used to map joint names to the markers, thereby labeling the markers. Adaptation steps are computed several times for each frame of the MoCap data, finishing when certain abort criteria are met, such as the distances between markers and neurons becoming small or a certain number of iterations being reached.
Initialization: Creation of the SOM and a Reference Model. First, the SOM has to be created. After the initial model has been generated, the neurons are linked to the joints of this model. This reference model is used to keep the neural net consistent with regard to the skeleton structure: whenever markers are occluded, it is used to keep the distances between joints approximately the same. Apart from this, the usual SOM learning rules apply.
Step 1: Stimulus and Response. For the current frame, the position of each tracked marker is presented to the SOM, with x being the training vector. Distances to all prototype vectors are then computed using the Euclidean distance measure. The neuron whose prototype is closest to x is the winning neuron b. The prototype vectors were set in the initialization step (see Sect. 5), when the SOM was created out of the initial skeleton.

$$\|x - m_b\| = \min_i \{\, \|x - m_i\| \,\} \qquad (2)$$
Step 2: Adaptation. For adaptation, first the winner neuron is moved towards the current MoCap point, with a certain learning rate and learning radius σ. Each other neuron is also adapted towards this point but, in contrast to the usually applied learning rules, with much lower learning parameters. Such lower learning parameters seem to achieve a good compromise between generalization and specialization. The prototype vectors are updated according to the following update rule:

$$m_i(t+1) = m_i(t) + H_{bi}(t)\,\bigl(x - m_i(t)\bigr) \qquad (3)$$

where t is the current iteration and H_bi(t) is a neighborhood kernel centered at the winning neuron's prototype vector:

$$H_{bi}(t) = \exp\!\left( -\frac{\|m_b(t) - m_i(t)\|^2}{2\sigma^2} \right) \qquad (4)$$
Step 3: Identifying Markers. Steps 1 and 2 are usually computed several times to adapt properly to a posture. After adaptation is finished, the markers have to be labeled. This can be done using any nearest-neighbor computation algorithm: the nearest marker to a neuron gets the name of this neuron. After this, the model is fully adapted to the markers, and the markers are labeled according to the names of the nearest neurons. Fig. 4 shows example tracking over several frames.
Fig. 4. Tracking of several frames
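The adaptation loop of Eqs. (2)–(4) and the subsequent labeling step can be sketched as follows. The explicit learning-rate factor and the abort criterion (a fixed iteration count) are assumed values, since the paper does not report its parameter settings; Eq. (3) folds the rate into the kernel.

```python
import numpy as np

def adapt_som(prototypes, frame, rate=0.1, sigma=0.5, iterations=10):
    """Adaptation loop of Sect. 5, Eqs. (2)-(4).

    prototypes: (M, 3) neuron/joint positions (float array, updated in place)
    frame:      (N, 3) tracked marker positions of the current frame
    """
    for _ in range(iterations):             # assumed abort criterion
        for x in frame:                     # Step 1: stimulus
            d = np.linalg.norm(prototypes - x, axis=1)
            b = int(np.argmin(d))           # winning neuron b, Eq. (2)
            # Step 2: neighborhood kernel centered at the winner, Eq. (4)
            h = np.exp(-np.sum((prototypes[b] - prototypes) ** 2, axis=1)
                       / (2.0 * sigma ** 2))
            # update rule, Eq. (3); the separate rate factor is an assumption
            prototypes += rate * h[:, None] * (x - prototypes)
    return prototypes

def label_markers(prototypes, frame, names):
    """Step 3: each marker takes the name of its nearest neuron."""
    return [names[int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))]
            for x in frame]
```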
6 Evaluation
For capturing motion data, we used a passive optical tracking system with 8 cameras operating at 50 Hz, made by ART GmbH, Germany. Such systems are nowadays widely used in the field of real-time interaction with virtual environments. An experiment with 30 participants was conducted. Each participant was equipped with one 6DOF body target above the head as a reference point. For tracking the complete body, the participant was additionally equipped with 18 3DOF markers on the anatomical body landmarks. The tracked joints are the same as the joints of our skeleton model (see Sect. 4.1). The tracking cameras span a measurement volume 2.8 m × 3.8 m wide and 2.2 m high.
Each participant performed several motions, first moving to the initial pose (see Sect. 4) and then performing some arm and body movements. Additionally, the initial pose was performed at several different locations with different orientations. Finally, movement while using an interaction device was captured.
6.1 PCA-Based Aligning
The participants were oriented in arbitrary directions to check the method's ability to cope with such situations. In particular the PCA algorithm, which is used here to calculate the position and orientation of the participant, was evaluated. The algorithm works 100% correctly for T-Pose postures and for frames in which all markers are present.
6.2 Tree-Based Optimization
Anthropometric dimensions were measured for each participant. After MoCap, some postures from the MoCap data were chosen as test input. After presenting the data to the system, the correctness of the recognized marker configuration was checked manually. 80% of the frames were recognized correctly; the rest were recognized partially, e.g., the arms were correct but the legs were mixed up. One reason for this could be the accuracy of the participants' anthropometric data. Using additional constraints to guide the algorithm yielded a 100% recognition rate.
6.3 Neural Network
To test the algorithm, a part of the data set that contained only a few marker occlusions was used for training and testing. This data was additionally obfuscated, with one marker being occluded for a longer time. As it is not easy to measure the error (there is no reference here), it can only be stated qualitatively that the SOM performed very well in this situation. Finally, the rest of the data, in which many markers were occluded (at times only four markers were visible), was used for adapting the SOM. This data even contained situations where the whole left arm data was missing for approximately 2 seconds, a hard condition for the algorithm: markers just disappear and pop up at completely different positions. Unfortunately, the SOM is not well suited to tracking many missing markers over a longer period of time, and, for human motion, even 2 seconds are a long period. During this time the SOM tries to adapt to the other markers and can completely lose track of its corresponding, but missing, marker. It can even happen that other neurons adapt strongly to a reappearing marker, because it has meanwhile moved to their position and they have also lost track of their own corresponding markers.
7 Conclusion and Future Work
Many tracking algorithms require a time-consuming manual initialization process, at least to gather initial information for the subsequent automatic tracking of the markers. This work presented a mostly automatic and robust approach to identify and track markers on a human subject. An initial pose from the motion capture data is used to generate an internal model of this pose. This internal model is then represented as a self-organizing map, which can easily adapt to the subsequent poses in the MoCap data. Even though such a hybrid approach achieves quite robust tracking, some manual steps remain, namely setting the parameters of the algorithms and searching for a good start posture. It might be interesting to integrate other methods to further increase robustness and the degree of automation. Integrating motion prediction methods might also support the tracking process and guide the SOM.
Ambient Compass: One Approach to Model Spatial Relations
Petr Aksenov, Geert Vanderhulst, Kris Luyten, and Karin Coninx
Hasselt University – transnationale Universiteit Limburg, Expertise Centre for Digital Media – IBBT, Wetenschapspark 2, 3590 Diepenbeek, Belgium
{petr.aksenov,geert.vanderhulst,kris.luyten,karin.coninx}@uhasselt.be
Abstract. Knowledge of the spatial arrangements of objects is an important component in the design of migratable user interfaces that target pervasive environments. Objects in these environments often move around individually, which leads to a highly dynamic and unpredictable environment. Due to its nature, spatial information cannot be described exhaustively, and uncertainty and imprecision need to be taken into account both during the design phase and at runtime. We present an approach to model dynamic spatial information that is able to interpret, to some extent, uncertain and imprecise knowledge. We then integrate this type of spatial awareness into ReWiRe, a framework for designing interactive pervasive environments, in order to improve its user-interface distribution techniques.
1 Introduction
Spatial information has been identified in a large body of research as one of the most important kinds of knowledge about an environment (e.g. [7,16]). Nowadays, both indoor and outdoor environments are becoming populated with mobile devices, implying that many interaction resources are carried around by users most of the time. The position and orientation of these devices can have an impact on the way users interact with their surroundings; for instance, it is more likely that users will execute a task by making use of resources in their vicinity. Modelling the spatial arrangements of computing devices is therefore an important step towards gaining insight into the full topology of the environment. However, it is still unclear how spatial information should be integrated with existing models. In this work we present an approach to model the spatial behaviour of an environment, i.e. the positional relationships that hold between objects, with the ability to reason about uncertain and imprecise spatial knowledge. This kind of knowledge allows spatial arrangements between interacting resources to be considered in a more natural way, aiming to improve the overall spatial awareness of the environment. We propose a model, denoted The Ambient Compass, to capture spatial knowledge (section 3) and discuss its implementation (section 4) and integration into a pervasive computing framework (section 5). But first, we briefly present and discuss some related work in the field.
2 Related Work
Elaborating the form of knowledge about location is often an inherent part of the engineering process in pervasive computing, and a great deal of research has been devoted to this problem. For example, Bandini et al. [4] present a "commonsense spatial model" based on the two concepts of "place" and "conceptual spatial relation". They then use these notions to discuss a possible reasoning technique over the model, such that a high-level understanding of the situation can be obtained from a combination of initial factors. Another very interesting location model is described by Satoh [17], with special emphasis on pervasive environments. The model is presented as a general-purpose one, intended to address the problems of managing location-based services. Its underlying principle is to deal with virtual counterparts, i.e. digital representations of the actual physical objects and spaces. Both works introduce the concept of an object's closeness area in some form, but neither expands on the topic. Kortuem et al. [14] deal with the problem of utilising spatial information to create new types of user interfaces and use a graph to model the spatial arrangements of the system. The graph represents the spatial infrastructure of a system at a certain moment in time, and the system is then described over time by a sequence of these graphs. However, the question of uncertainty is excluded from the discussion in all of the works mentioned, whereas it is an important part of our approach. The concept of fuzziness is much discussed in almost every area of research where spatial information is involved. Guesgen [9] showed the possibility of introducing fuzziness into spatial relations in general, though with the conclusion that the actual implementation of fuzziness depends on the model chosen for representation. A thorough overview of ontological modelling of spatial information, as well as an extensive discussion of possibilities for implementing fuzziness in such an ontology, is given by Hudelot et al. [13]. The work also contains an excellent collection of references to other publications in the field. Although the application area of the presented ontology is image interpretation, the ideas presented can serve as a good source of information for extending both the number of concepts in our ontology and the fuzzy elements therein. Apart from spatial information, several examples of incorporating fuzziness into ontologies, from a simple and direct implementation [8] to detailed and theoretically supported analyses of the problem as a whole [5,18,19], have been published quite recently, indicating that this is still a topical problem with promising trends.
3 Modelling the Compass
The model defines basic concepts natural to spatial structures. This includes positioning information, orientation angles, and a division of space into the "hasOnLeft", "hasOnRight", "hasBehind", and "hasInFront" relationships that can hold between two interacting resources. The proposed classification aims at giving an application the possibility to speak a language similar to that of humans when they talk about spatial arrangements. In general, there should be two more relations, "hasBelow" and "hasAbove", but due to the increased complexity their consideration has been postponed.
Fig. 1. (a) The Ambient Compass divides the space around a resource into eight zones; (b) resources belonging to the same zone of the compass are distinguished by means of assigning each of them a degree of membership to this zone
Various techniques exist to obtain information about the location and orientation of an object from sensors [11,12,15]. Using this information, we can update our model in real time and derive spatial relations by dividing the space around a resource into eight zones, as depicted in Fig. 1. Either one relation (e.g., "hasOnRight") or two relations (e.g., "hasOnRight" together with "hasInFront", or, similarly, "hasOnRight" together with "hasBehind") can apply between two resources at a given moment. This division is reminiscent of the way we generally refer to the parts of the world: north, east, south, west, north-west, north-east, south-east, and south-west. The boundaries of each of the four main zones are two rays drawn symmetrically to the left and right, or above and below, respectively, of the corresponding axis line. The slope angles of the rays depend on the resource in question; a set of experiments will therefore be required to decide upon the best strategy for setting them for different groups of devices. Additionally, each relation of type "has" has an inverse relation of type "is", so that if, for example, device D1 "hasOnLeft" device D2, then device D2 "isOnLeftOf" device D1. This makes our model smoother and also simplifies queries executed on the ontology. This simple model acts as a basis on which we build two extensions to obtain a more extensive model capable of handling relevant uncertain and imprecise knowledge about the spatial world.
3.1 Adding Fuzziness
The first extension deals with the ambiguity which appears when relations are determined. In Fig. 1b), devices A and B both belong to the "hasInFront–hasOnLeft" area of the central device, but their actual positions with respect to this device are obviously different. Therefore treating A and B as spatially equal would be erroneous. A possible solution to this issue lies in introducing the concept of fuzziness into establishing the four relationships. Several ways of extending ontologies with fuzzy information exist [5,8,18,19], and all of them deal with introducing, in some way, a degree of membership of each individual and/or relation to a certain domain. In the
case of the ambient compass, these domains are its eight zones, and in Fig. 1b) devices A and B have different degrees (0.1 and 0.85, respectively) of being in front of the central device. The membership of the second relation in the zone is such that the sum of the two values equals 1. Incorporating this extension allows the relationships to be kept as appropriate using the corresponding weights, thus providing the desired flexibility as well as a degree of precision in defining a more truthful type of the actual relationship between two resources. The same idea applies to the zones where there is only one relation, with the weights standing for the closeness of the object in question to the corresponding adjacent zone with two relations.
3.2 Defining Nearby Regions
The second extension results primarily from the way humans perceive spatial information. The concept of closeness of one object to another usually varies depending on the number of factors one considers to matter in a given situation, and it has already been pointed out as a subject of special attention in a number of research works [4,17]. The solution we suggest consists of two parts. The first defines the concept of the "nearby" spatial relation for the different types of interaction resources present in a pervasive environment. The second defines reasonable spatial regions for each type of interaction resource – with the resource itself being the central point – within which the corresponding "nearby" relation can be established between the resource in question and other resources. We plan to involve the concept of fuzziness in the definition of "nearby", too. This means that there is no strict division into "nearby" and "not nearby"; instead, a degree of how much an object is "nearby" is used, represented by a real number in the range [0;1]. Considering distances in such a way can provide solutions in situations where no perfect match can be found but a positive response can still be obtained. Some good examples of this approach are given, for instance, by Guesgen [10]. In addition to identifying the nearby areas, we also try to predict the behaviour of interaction resources by analysing their previous behaviour and reasoning over the current spatial relations between them. We use the concepts of device availability function and device importance introduced in [3] to address this.
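A minimal sketch of how such fuzzy compass relations could be derived from relative positions is given below (Python). The linear splitting of each quadrant is an assumption made for illustration (the paper leaves the zone borders, i.e. the ray slope angles, device-dependent), and the function name is illustrative.

```python
import math

def fuzzy_relations(dx, dy):
    """Fuzzy compass relations (Sects. 3-3.1) for a resource located at
    (dx, dy) in the central resource's local frame (x to the right,
    y pointing forward). Returns degrees of membership summing to 1."""
    axes = ["hasOnRight", "hasInFront", "hasOnLeft", "hasBehind"]
    theta = math.atan2(dy, dx) % (2 * math.pi)   # 0 = right, pi/2 = front
    quadrant = int(theta // (math.pi / 2)) % 4
    # fraction of the way from one main axis to the next one
    frac = (theta % (math.pi / 2)) / (math.pi / 2)
    first, second = axes[quadrant], axes[(quadrant + 1) % 4]
    return {first: 1.0 - frac, second: frac}

# e.g. a device diagonally to the front-right of the central resource:
# fuzzy_relations(1.0, 1.0) -> {"hasOnRight": 0.5, "hasInFront": 0.5}
```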
4 Engineering the Compass
Information about the model is represented in the form of an ontology, i.e. a set of concepts and properties that relate these concepts to each other. The ontology is created in the OWL language using the Protégé-OWL editing tool [1]. The choice of OWL was directed by the latest trends in the development of the semantic web [2], following its endorsement by the W3C organisation (http://www.w3.org). A part of the ontology is shown in Fig. 2. Due to the highly dynamic nature of the targeted environments, we do not instantiate any resources during the design phase – all instances are created at run-time. To deal with fuzziness, we decided to use the approach suggested by Gu et al. [8] due to its self-evidence and computational simplicity.
Fig. 2. Ontology as it appears in Protégé-OWL. The highlighted "FuzzySpatialRelation" class on the left extends the basic concepts of the spatial model with the possibility to consider uncertain knowledge by introducing two additional numerical properties.
class "FuzzySpatialRelation" that has two object properties, according to the number of interacting resources involved, and two data-type properties that keep the fuzziness values (see Fig.2). It is important to note that two values are used to define fuzziness since the relations between two resources are possible in both directions. This allows having only one instance of the "FuzzySpatialRelation" class for two compasses (one from each interacting resource), keeping the entire ontology simpler. Localisation systems have a certain accuracy of measurements, resulting in a difference between the predicted and the actual location [6]. Paying our attention to this important factor, we consider a possible actual location of a resource in such a way that if the measured location has produced the fuzziness value of 0.8 then the relation’s membership could vary from (0.8-δ) to (0.8+δ), where δ is the allowance parameter. The value of this parameter depends on the quality of the measuring equipment and must be determined empirically. This structure can also be used in the case of the zones with one relation. Here, the data-type property would stand for the closeness of the resource to one of the zone’s borderlines so that two objects standing on the opposite sides of the zone would have different fuzziness values in the corresponding "FuzzySpatialRelation" component of the spatial ontology. The same can apply to the areas surrounding the borderlines of the zones. In other words, when it is not clear whether the position of the resource better corresponds to being on just one side of the other resource (e.g., only "hasOnRight") – as per the calculations – or should the second relation be hypothetically considered as well (e.g., both "hasOnRight" and "hasInFront") because of the measurements.
5 Using the Compass
5.1 Integration with ReWiRe
In order to validate the proposed spatial model, we have integrated it into ReWiRe, a framework for designing interactive pervasive computing environments [20].
Fig. 3. Visualisation of the environment in ReWiRe. Resources have a set of spatial relations between them which always exist in "has+is" pairs. When a change to the location or orientation of any object happens, new spatial relations for this object are derived.
In ReWiRe, an environment is described using an upper ontology that includes concepts representing the generic resources found in a pervasive computing environment, such as users, computing devices, services, and tasks. Aggregated with the framework's upper environment ontology, our model provides the spatial context of resources in the environment, which is, for instance, exploited to improve an algorithm for distributing user interfaces amongst multiple screens, as well as to improve the location awareness of users in (unfamiliar) computer-augmented environments. Another important intention behind integrating the proposed approach into the framework is to allow the same language to be used during both the design and run-time phases. This ensures that all changes happening at run-time can be interpreted by the designer in the way they were expressed at the design step. Fig. 3 shows a plug-in we designed for ReWiRe that simulates the movement and rotation of physical objects. Objects are overlaid on a map that represents the environment, and any action executed on this view (e.g., dragging a resource) results in an update of the underlying model, in particular of its spatial relations. Besides, updates in the model triggered by sensor readings can be observed in the tool, along with their spatial impact on other resources.
5.2 Action Scenario
As a brief illustration of a possible use of the approach, let us consider a room with two vertically positioned displays. Assume that a user with a PDA is oriented towards the left display and is projecting something from the PDA onto it. The right display is inactive, which is indicated by a dashed line (Fig. 4). At a certain moment, the user begins to turn clockwise, so that the right display moves – from the PDA's perspective – from being equally "isInFrontOf" and "isOnRightOf" to much more "isInFrontOf", whereas the left display starts holding both "isOnLeftOf" and "isInFrontOf" relations. When the PDA's rotation reaches a certain angle, the image of the PDA is copied to the right display, activating it (the dashed line becomes solid), but still being shown on the left one as well.
Fig. 4. The displays change their status from inactive (dashed line) into active (solid line), and vice versa, in response to the PDA turning clockwise. The change of the active display is preceded by a state in which the image is shown on both of them.
Having observed that the PDA keeps turning, the compass discovers that the left display, though still in front, is already considerably to the left of the PDA and can therefore be released (the solid line becomes dashed). This example illustrates, in particular, how this kind of spatial awareness can also be used to smooth the procedure of redistributing a user interface.
6 Discussion
We presented an approach to describing spatial information intended for pervasive environments. Its main advantages are: 1) relative simplicity, a crucial factor in dealing with highly dynamic pervasive environments; 2) human friendliness, i.e. an easily recognisable interpretation of this kind of information by humans, who have become a considerable part of pervasive environments; and 3) the ability to handle uncertain, incomplete knowledge, which is also natural to pervasive structures. As part of the ReWiRe framework, our approach is meant to assist designers of user interfaces in the domain of pervasive environments in general, rather than in a specific type of application. The implementation of the underlying structure of the compass as part of ReWiRe has been completed, and its visualisation is currently in progress. In addition, the short-term future development and improvement of The Ambient Compass includes, first of all, elaborating the concept of "nearby" for different groups of interacting resources. In particular, it will take their geometrical sizes into account. The current division into eight zones, as well as the consideration of only four different relations, is a straightforward viewpoint. However, a modified (e.g. asymmetrical) version of the division might suit certain tasks better, or a more precise subdivision – into more zones – might be necessary. Since no actual evaluation of the current version has been completed, discussing these further possibilities would be somewhat unfounded. Therefore one of the early things we plan to do is validate the current version of the approach in an experimental setup on a set of appropriate user-interface distribution tasks. Based on the results, we will obtain a more sophisticated view of the model and will have clues for its amendment and means to improve the algorithms, in particular for correcting the values of the allowance parameter δ and the rays' slope angles for different devices and in different situations. For example, it is very likely for the slope angle to
be a function of the distance between the central object and the targeted resource. Another possibly useful extension, which follows directly from the above discussion of the layout of the zones, is to give the designer the ability to define the zones manually, as appropriate for a task. In the long run, we consider extending the compass to the third dimension by introducing the "hasBelow" and "hasAbove" spatial relations.
Acknowledgments
Part of the research at EDM is funded by EFRO (European Fund for Regional Development) and the Flemish Government. Funding for this research was also provided by the Research Foundation – Flanders (F.W.O. Vlaanderen, project CoLaSUE, number G.0439.08N).
References
1. Protégé-OWL, http://protege.stanford.edu/overview/protege-owl.html
2. Web Ontology Language, http://www.w3.org/2004/OWL/
3. Aksenov, P., Luyten, K., Coninx, K.: Reasoning Over Spatial Relations for Context-Aware Distributed User Interfaces. In: Kofod-Petersen, A., Cassens, J., Leake, D., Zacarias, M. (eds.) Proc. of 5th Int. Workshop on Modelling and Reasoning in Context (MRC 2008), pp. 37–50. TELECOM Bretagne (June 2008)
4. Bandini, S., Mosca, A., Palmonari, M.: Commonsense Spatial Reasoning for Context-Aware Pervasive Systems. In: Strang, T., Linnhoff-Popien, C. (eds.) LoCA 2005. LNCS, vol. 3479, pp. 180–188. Springer, Heidelberg (2005)
5. Calegari, S., Ciucci, D.: Integrating Fuzzy Logic in Ontologies. In: Manolopoulos, Y., Filipe, J., Constantopoulos, P., Cordeiro, J. (eds.) ICEIS 2006, pp. 66–73 (2006)
6. Dearman, D., Varshavsky, A., de Lara, E., Truong, K.N.: An Exploration of Location Error Estimation. In: Krumm, J., Abowd, G.D., Seneviratne, A., Strang, T. (eds.) UbiComp 2007. LNCS, vol. 4717, pp. 181–198. Springer, Heidelberg (2007)
7. Gottfried, B., Guesgen, H.W., Hübner, S.: Spatiotemporal Reasoning for Smart Homes. In: Augusto, J.C., Nugent, C.D. (eds.) Designing Smart Homes. LNCS, vol. 4008, pp. 16–34. Springer, Heidelberg (2006)
8. Gu, H.-M., Wang, X., Ling, Y., Shi, J.-Q.: Building a Fuzzy Ontology of Edutainment Using OWL. In: Shi, Y., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds.) ICCS 2007. LNCS, vol. 4489, pp. 591–594. Springer, Heidelberg (2007)
9. Guesgen, H.: Fuzzifying Spatial Relations. In: Matsakis, P., Sztandera, L.M. (eds.) Applying Soft Computing in Defining Spatial Relations, pp. 1–16. Physica-Verlag GmbH, Heidelberg (2002)
10. Guesgen, H.: Reasoning About Distance Based on Fuzzy Sets. Applied Intelligence 17(3), 265–270 (2002)
11. Hightower, J., Borriello, G.: Location Systems for Ubiquitous Computing. Computer 34(8), 57–66 (2001)
12. Hinckley, K., Pierce, J., Sinclair, M., Horvitz, E.: Sensing Techniques for Mobile Interaction. In: Proc. of the 13th Annual ACM Symposium on User Interface Software and Technology (UIST 2000), pp. 91–100. ACM Press, New York (2000)
13. Hudelot, C., Atif, J., Bloch, I.: Fuzzy Spatial Relation Ontology for Image Interpretation. Fuzzy Sets and Systems 159(15), 1929–1951 (2008)
14. Kortuem, G., Kray, C., Gellersen, H.-W.: Sensing and Visualizing Spatial Relations of Mobile Devices. In: Proc. of the 18th Annual ACM Symposium on User Interface Software and Technology (UIST 2005), pp. 93–102. ACM Press, New York (2005)
15. Leydon, K.: Sensing the Position and Orientation of Hand-Held Objects: An Overview of Techniques. Technical Report, University of Limerick (December 2001)
16. Nurmi, P., Bhattacharya, S.: Identifying Meaningful Places: The Non-Parametric Way. In: Indulska, J., Patterson, D.J., Rodden, T., Ott, M. (eds.) PERVASIVE 2008. LNCS, vol. 5013, pp. 111–127. Springer, Heidelberg (2008)
17. Satoh, I.: A Location Model for Pervasive Computing Environments. In: Proc. of the 3rd Int. Conf. on Pervasive Computing and Communications (PerCom 2005), pp. 215–224. IEEE Computer Society Press, Los Alamitos (2005)
18. Stoilos, G., Stamou, G.: Extending Fuzzy Description Logics for the Semantic Web. In: Golbreich, C., Kalyanpur, A., Parsia, B. (eds.) Proceedings of the Workshop on OWL: Experiences and Directions (OWLED 2007), Innsbruck (2007)
19. Straccia, U.: A Fuzzy Description Logic for the Semantic Web. In: Sanchez, E. (ed.) Fuzzy Logic and the Semantic Web, Capturing Intelligence, ch. 4, pp. 73–90. Elsevier, Amsterdam (2006)
20. Vanderhulst, G., Luyten, K., Coninx, K.: ReWiRe: Creating Interactive Pervasive Systems That Cope with Changing Environments by Rewiring. In: Proc. of the 4th IET International Conference on Intelligent Environments (IE 2008) (2008)
A Comprehension Based Cognitive Model of Situation Awareness
Martin R.K. Baumann¹ and Josef F. Krems²
¹ German Aerospace Center DLR, Institute for Transportation Systems, Lilienthalplatz 7, 38108 Braunschweig, Germany
[email protected]
² Chemnitz University of Technology, Institute of Psychology, Wilhelm-Raabe-Str. 43, 09120 Chemnitz, Germany
[email protected]
Abstract. For safe driving it is an essential precondition that the driver possesses a correct mental representation of the current traffic situation, the situation model. This mental representation involves not only a representation of the objects and situational features relevant to the driver's behaviour, but also the driver's expectations about the future development of the traffic situation. A concept that describes the underlying processes and the factors influencing them is situation awareness (SA) [1]. Until now, the cognitive mechanisms underlying situation awareness have been far from properly understood. In this paper we propose a process model of situation awareness that views the construction of the situation model as a comprehension process comparable to discourse comprehension. Two experiments addressing predictions derived from this model are presented briefly. The last section of the paper describes a current project that aims at implementing this model in the cognitive architecture ACT-R.
1 Introduction
Performance in dynamic situations is highly influenced by how well the operator knows what is currently going on around him and how well he can predict the development of the situation in the near future. The processes involved in constructing and maintaining such a mental representation of the current situation, which forms the basis for the operator's decisions and actions, are described by the concept of situation awareness [1]. According to [1], situation awareness entails "the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning and the projection of their status in the near future" (p. 36). This concept was successfully applied in the last decade to human factors issues in aviation, nuclear power generation, and military combat systems. Only recently has it been introduced to the analysis of driving behaviour [2], [3], [4]. This was at least in part driven by concerns that have been raised about the effects of novel driving assistance and information systems. Models of situation awareness in the driving context have been developed to assess the impact of such assistance and information systems on drivers' situation awareness and the consequences for driving behaviour and driver safety [2], [4]. For safe driving it is necessary that
drivers perceive, identify, and correctly interpret the relevant objects and elements of the current traffic situation, and that they construct expectations about the future development of the current situation in order to adapt their own driving behaviour to it. Such elements may be other traffic participants, the surface of the street, or traffic signs. For example, if a driver wants to perform a lane change to the left to overtake a slower lead car on a motorway, many different aspects of the current traffic situation have to be taken into account, such as the own vehicle's current speed, the speed of the lead car, the distance and speed of other cars on the left lane, and the road conditions. Additionally, the driver also has to consider how the situation will develop in the near future, for example whether the vehicle on the left is going to slow down to let the driver perform the lane change or not. And finally, global aspects have to be considered as well, such as whether the exit to take to reach the driver's destination is already so near that overtaking may lead to missing this exit, in which case it is better not to overtake.
1.1 Situation Awareness
According to [1], situation awareness serves three functions. The first function involves the perception of the status, the attributes, and the dynamics of the relevant situation elements. The second function, comprehension, aims to integrate the different situation elements into a holistic picture of the situation, resulting in the comprehension of the meaning of the different elements. The third function of situation awareness is to anticipate future developments of the current situation in order to be prepared for them. These anticipations are based on the mental representation of the situation constructed by the comprehension function. The resulting representation, the situation model, consists of an integration of the environmental input with the operator's goals, anticipations, and knowledge, and with the previous situation model held before the new environmental input was perceived. This already implies that the situation model is not a static structure but a dynamically changing representation that is continually updated. It is the basis for the selection of new actions and strategies. The updated situation awareness will trigger certain actions that might lead to the perception of new pieces of information, either because these actions change the task environment or because they comprise the sampling of new pieces of information. In either case the perception of these new pieces of information will again change the current situation model, which will again trigger new actions, and so on.
1.2 A Comprehension Based Model of Situation Awareness
Endsley's model [1] of situation awareness has stimulated a great number of studies on situation awareness, and it is certainly one of the most influential models to date. Nevertheless, Endsley's model is a descriptive model of situation awareness that identifies and describes different cognitive resources and mechanisms that might be involved in constructing and maintaining situation awareness, but it remains rather vague about the nature of these processes and about how the different functions of situation awareness are realized. For example, the model does not explain what kind of processes underlie the anticipation of the situation's development in the near future. Endsley's [1] model of situation awareness does not address how this is accomplished
beyond “through knowledge of the status and dynamics of the elements and the comprehension of the situation” ([1] p. 37). The goal of our research project is to describe the processes that underlie the construction and maintenance of situation awareness in more detail using empirically confirmed models and theories from cognitive psychology research. This model is described in more detail in [2] and we will only sketch this model here focusing on the construction and maintenance of the situation model and leaving out how this situation model can form the basis of action selection. Fundamental to our approach is Kintsch’s [5] Construction-Integration theory of text comprehension as we view the construction and maintenance of situation awareness as a comprehension process. Perceived information activates knowledge stored in long-term memory (LTM) that is linked to the perceived information. From this activated knowledge network a coherent representation is constructed by a constraint-satisfaction process. This process constrains the spreading of activation leading to the activation of compatible and to the suppression of incompatible knowledge elements of the activated knowledge network. The result is a coherent mental representation of the current situation, the situation model. For example, an event such as a traffic light turning yellow might activate at first two incompatible interpretations, “I have to decelerate to stop before the traffic light” and “I have to accelerate to pass the crossroads before the traffic light turns red”. These two interpretations might receive additional activation from other knowledge elements. For example, if the driver knows that the police monitor this crossroads, that knowledge will additionally activate the deceleration interpretation and at the same time inhibit the acceleration interpretation. With this additional activation the deceleration interpretation “wins” and the acceleration interpretation will be inhibited. The experienced driver’s knowledge presumably also includes expectations about the future development of situations that are linked to certain types of situation and are activated when these types of situations, such as approaching a traffic light, are encountered. In this sense the same process that serves the comprehension function of situation awareness also serves the prediction function, especially in routine driving situations for which the driver already acquired relevant knowledge. 1.3 Empirical Investigation of the Model A series of experiments was conducted to test different aspects of the comprehension based model of situation awareness. In this paper only some results can be summarized to demonstrate the usefulness of this approach. We will present the results of one experiment where the availability of the situation model as a function of different possible influence factors was examined. In the second experiment the role of WM in the process of anticipating the future development of a traffic situation was examined. Experiment 1: The effect of experience and information relevance on the availability of the situation model Situation awareness and the theory of long-term working memory As shown in the overtaking example above a substantial amount of information has to be encoded, comprehended and integrated into a mental representation of the current
1.3 Empirical Investigation of the Model
A series of experiments was conducted to test different aspects of the comprehension based model of situation awareness. In this paper only some results can be summarized to demonstrate the usefulness of this approach. We present the results of one experiment in which the availability of the situation model was examined as a function of different possible influencing factors. In a second experiment the role of working memory (WM) in anticipating the future development of a traffic situation was examined.
Experiment 1: The effect of experience and information relevance on the availability of the situation model
Situation awareness and the theory of long-term working memory
As shown in the overtaking example above, a substantial amount of information has to be encoded, comprehended and integrated into a mental representation of the current situation. Because of the amount and the complexity of the information that has to be represented in the situation model, such as the position and speed of other vehicles near the own vehicle, the traffic rules defined by signs, the status of the road surface and so on, it is assumed that the situation model contains much more information than can be kept active in working memory. But for the construction process to result in a coherent and consistent situation model, it is necessary that when new information is perceived, those parts of the situation model relevant for the interpretation of this new piece of information are available in working memory [6], [7]. Therefore a mechanism is needed that allows this information to be stored in long-term memory while keeping it reliably available for processing when it becomes relevant. The theory of long-term working memory (LT-WM) [8] describes a mechanism that could provide such a function. According to LT-WM theory, "cognitive processes are viewed as a sequence of stable states representing end products of processing [...] acquired memory skills allow these end products to be stored in long-term memory and kept directly accessible by means of retrieval cues in short-term memory" ([8], p. 211). These retrieval cues are arranged into stable retrieval structures that can take many different forms depending on the domain. In the case of driving, such retrieval structures might be based on the huge number of driving situations an experienced driver has encountered. The experienced driver has possibly developed many highly differentiated schemata of driving situations that allow him to easily identify many different types of driving situations. Such a mechanism might, for example, at least in part explain why experienced drivers are much better and much faster at identifying dangerous traffic situations than less experienced drivers [9], [10], [11]. If a certain driving schema is then activated, such as the "overtaking schema", given the appropriate cues in the traffic situation, such as a slower lead vehicle on the current lane, the experienced driver can much more easily associate the different features of the current overtaking situation with this global "overtaking schema" in order to instantiate it in the current situation. Associated with this memory structure stored in long-term memory (LTM), the newly perceived information becomes part of the situation model but is, at least in part, stored in LTM. Any retrieval cue that retrieves the instantiated overtaking schema into WM retrieves all information associated with this schema into WM and makes it available for further processing. This mechanism allows the capacity limitations of WM to be overcome, as task-relevant information represented in the driver's situation model can be stored not only in WM but also in LTM. In this case, the information about the current situation kept in WM and the information connected to it via this schema establish LT-WM. Applying the theory of LT-WM to situation awareness allows specific predictions about the effect of different factors on the availability of the driver's situation model. In an experiment, several of these factors were examined: the driver's experience, the relevance of the information, the duration for which information has to be kept available, and whether the driver has retrieval cues available to access information encoded in LT-WM.
First, experienced drivers, possessing a huge amount of knowledge about traffic situations, should be able to use the LT-WM mechanism to encode driving-related information much more efficiently than less experienced drivers. Therefore experienced drivers should show a better memory performance for driving-related information than less experienced drivers. Second, the retrieval structures of experienced drivers should be adapted to encoding task-relevant information. It is therefore predicted that the advantage of experienced drivers over less experienced drivers in keeping driving-relevant information available should be greater when this information is highly relevant than when it is less relevant. Third, the availability of information encoded in the retrieval structures depends on the presence of retrieval cues in WM. As experienced drivers should possess more, and more differentiated, retrieval structures, they should be able to make better use of retrieval cues than less experienced drivers.
Experimental procedure
These predictions were tested in a driving simulator experiment in which participants drove on a three-lane highway. The simulation was interrupted repeatedly, and participants were asked about a crucial aspect of the traffic situation, namely the arrangement of cars around the participant's car. This was done by asking the participant about the number of cars at different locations around the participant's car, for example on the left lane behind the participant. Two groups of drivers were tested: 20 experienced and 20 less experienced drivers. The length of the delay between interruption and recall was either long (20 sec) or short (2 sec). After the interruption, which was filled with a WM task to prevent rehearsal, either a highly informative or a less informative retrieval cue was presented. Additionally, the relevance of the information to be recalled was manipulated: the participant had to report the number of cars either at a position highly relevant for the current driving manoeuvre or at a less relevant position.
Summary of results and discussion
The results of this experiment provide some indication that LT-WM is indeed involved in situation awareness processes while driving. Whereas experienced drivers were, contrary to our prediction, as much affected by the length of the interruption interval as less experienced drivers, experienced drivers showed a clear effect of the relevance of information whereas less experienced drivers did not. Furthermore, experienced drivers indeed tended to profit more from retrieval cues, as these cues supported the retrieval of both highly task-relevant and less relevant information. Inexperienced drivers profited from retrieval cues only when relevant information was cued, not when less relevant information was cued. This means, first, that experienced drivers seem to be able to encode more traffic information while driving than inexperienced drivers, which is in accordance with the results of studies relating drivers' visual scanning behaviour to their experience (e.g., [10]). Second, this is also in accordance with the assumption that experienced drivers are able to use retrieval cues more efficiently because they possess more and better differentiated retrieval structures [2], [12]. The reason that experienced and less experienced drivers were affected by the length of the interruption to the same degree might stem from the dynamics of the driving task itself, to which the retrieval structures are adapted. The high dynamics of the driving task at the manoeuvre level [13] make it inefficient to store information at
this level for longer periods of time. The arrangement of cars around one's own car certainly belongs to the manoeuvre level. This arrangement changes rather fast, and the representation one has constructed is certainly irrelevant after 20 sec. Therefore, experienced drivers may possess retrieval structures that have the relevant positions of cars in the current traffic situation as slots, such as "car from left behind?" as part of the "overtaking" or the "lane change left" schema. The contents of these slots are continuously updated while driving, so that the previous content is lost once an update has occurred. This seems highly plausible, as the required action in a given traffic situation is basically determined by the current situation or its probable future development, not by its past. The better performance of experienced drivers in encoding relevant information may then be due to the fact that their retrieval structures better reflect the differences between different types of traffic situations in terms of relevant positions.
Experiment 2: The role of WM in the anticipation of traffic events
Situation awareness and working memory
The comprehension based view of situation awareness emphasizes the importance of WM resources, besides visual attention, for the construction and maintenance of situation awareness. The integration of new information into the situation model, the updating of the model, and the use of the situation model as a basis for action selection all require working memory resources. According to this model, newly perceived information from the environment is comprehended through the activation and integration of knowledge in LTM that is associated with the new information. An important subset of such associated knowledge consists of expectations about the future development of the situation that the driver has acquired previously. For example, perceiving a warning sign announcing a construction site ahead might activate the expectation that the lane ahead is somehow blocked. If so, the driver might adapt speed accordingly. But for this process to run properly, the relevant knowledge has to be retrieved from LTM so that it is available in WM. Only information available in WM can be associated and integrated with other pieces of information. Therefore, the availability of this knowledge, as well as the availability of WM resources, is necessary for the construction of a consistent and coherent situation model that also involves the relevant expectations about the future development of the situation. Imposing cognitive load on the driver withdraws these necessary resources. The looked-but-did-not-see phenomenon is an example of the interference of cognitive load with situation awareness: the cognitive load imposed by an additional task leads to an incomplete comprehension of one or more situation elements, which might then lead to an inappropriate action selection by the driver.
Experimental procedure
An experiment was performed to test the assumption that cognitive load impairs the generation of expectations about future events in a traffic situation. In this experiment, 48 participants drove through a scenario that contained both predictable and non-predictable events. These events were designed to be exactly equivalent except that in the predictable version a warning sign warned the driver of the upcoming event. The reaction to the event when the participant was warned was compared to the reaction when the driver was not warned.
The participants were divided into three groups. In the first group the participants performed no secondary task while driving; in the second and third groups they had to perform cognitively loading tasks. In the second group they performed an auditory monitoring task that should interfere less with the activation of relevant knowledge in LTM and with the comprehension processes. In the third group the participants performed a running memory task that should interfere with the activation of knowledge and especially with the comprehension processes. In the monitoring task participants had to react as fast as possible to an auditory signal that was presented after either a long or a short time interval following the previous signal. Using only two randomly presented interstimulus intervals, this task induces a strong tendency toward rhythmic responding that would lead to errors, mainly premature responses. To avoid these errors, one has to constantly suppress this rhythmic tendency, which draws on WM resources but not on the resources involved in the comprehension process. In the running memory task [14] participants heard a continuous stream of letters (one letter every 2 sec) and had to remember the last three letters presented. As the end of the sequence is not known, this task requires that the set of letters kept in WM be constantly updated. Performing this task involves WM functions that should also be highly involved in maintaining and updating a proper situation model. Therefore, this task should interfere with the comprehension processes.

We assumed that participants driving the scenario without a secondary task should clearly benefit from the warning signs in the predictable events. This benefit should be reduced when participants had to perform an additional task while driving, and the reduction should be greater for the running memory task than for the monitoring task, as the memory task should interfere more with the comprehension and prediction functions of situation awareness.

The experiment was run in the high-fidelity driving simulator of TNO in Soesterberg, The Netherlands. The driving scenario consisted of a rural road. Participants were instructed to drive at approximately 80 km/h. There were four critical events per drive in which the participant's lane was blocked by an obstacle, for example a construction site. In two of these events the driver was given information to predict the obstacle; in the other two the driver received no such warning information. Each participant drove the scenario only once.

Summary of results and discussion
A more detailed presentation of the results is given in [19]. One of the measures we examined was the Time to Collision (TTC) at the moment the driver released the throttle to brake after passing the location where the obstacle first became visible (see Table 1). As participants prepared for the obstacle should already have reduced their speed after seeing the warning sign and should brake earlier than participants who did not fully comprehend the warning sign, TTC should be larger for prepared than for unprepared participants. And, as stated above, this difference should be greatest in the no-secondary-task condition and smallest in the memory updating condition. The results confirm these predictions.
Whereas the difference between the predictable and the non-predictable obstacle is 1.57 sec in the no-secondary-task condition, it is 0.80 sec in the monitoring task condition and 0.58 sec in the memory updating condition.
Table 1. Mean values of Time To Collision (TTC), in sec, at first throttle release after the roadblock is visible

                          no sign    sign
  no secondary task         3.69     5.26
  monitoring task           3.75     4.55
  memory updating task      3.66     4.24
The same pattern of results was found for the speed of the participants when they released the throttle to brake in front of the obstacle. When the event is not predictable, speed does not differ between the no-secondary-task condition and the two secondary task conditions. In case of a predictable obstacle, however, there is a significant difference in speed between these conditions: participants in the secondary task conditions were driving faster when they started to brake than participants in the no-secondary-task condition, indicating that they were less prepared for the obstacle despite the warning sign. The greatest speed difference between predictable and non-predictable events was found in the no-secondary-task condition, 12.6 km/h, a medium difference in the monitoring condition, 5.9 km/h, and the smallest in the memory updating condition, 4.8 km/h. Again this indicates that participants in the memory updating condition had the greatest difficulty comprehending, integrating, and reacting to the warning sign.

Implementation of the Comprehension Based Model of Situation Awareness
In a current project at DLR, this comprehension based model of situation awareness will be implemented in ACT-R [15]. ACT-R is a cognitive architecture consisting of a set of modules, each dedicated to the processing of a certain kind of information. Examples of such modules are the visual module for processing objects in the visual field, the manual module for controlling the hands, the declarative module for retrieving information from memory, and the goal module for keeping track of goals and intentions [15]. Each module is associated with a buffer in which a limited amount of information is deposited and can be accessed by a central production system. This central production system coordinates the exchange of information between the buffers: it recognizes patterns in these buffers and changes their contents, for example when it requests a manual response such as turning the steering wheel to the left. This architecture has been used successfully to model a number of tasks in different domains. Of special importance for this project is Salvucci's ACT-R Integrated Driver Model [16]. This model consists of three primary components: control, monitoring, and decision making. The control component manages the perceptual and motor processes necessary for longitudinal and lateral control. The monitoring component manages the maintenance of situation awareness, and the decision component determines whether tactical decisions, such as initiating a lane change maneuver, have to be made based on the information gathered by the control and monitoring components. As our focus is on the implementation of the situation awareness model, our modeling activities will primarily address the monitoring and decision-making components. In its current version, the situation model of the [16] driver model primarily represents the location of other vehicles around the
ego-vehicle. The sampling of the different locations is based on a random sampling model of the four locations (left and right lane, either backward or forward) that samples each location with equal likelihood.

In a first step, the knowledge base of the ACT-R driver model will be extended to allow the modeling of novice and experienced drivers in the motorway driving scenario used in Experiment 1. In that experiment, assumptions about the different knowledge structures of novice and experienced drivers were used to predict differences in recall performance. Implementing these assumptions in the ACT-R cognitive architecture will allow them to be tested much more precisely. For this it is necessary, in a second step, to add to the monitoring component of the driver model a comprehension process based on Kintsch's [5] construction-integration theory (see [17]), so that the model's situation model includes not only information about the locations adjacent to the ego-vehicle, based primarily on perceptual processes, but also information about the relevance of these locations derived from the driver's intentions and knowledge. The comprehension process serves to integrate this information with the perceived information. This in turn will make it possible to replace the random sampling strategy of the [16] driver model with a sampling model based on the information represented in the driver model's situation representation. The different knowledge structures of simulated novice and experienced drivers should lead to differences in the situation model. As ACT-R already provides a detailed theory of memory processes, such as how the activation of chunks in declarative memory increases and decays, it can then be used to make predictions about the recall of different elements of the situation model in a simulated recall procedure analogous to the one used with human subjects in Experiment 1. These predictions can be compared to the empirical results of Experiment 1 to test the adequacy of the assumptions about the underlying differences between novice and experienced drivers. In a subsequent project phase, the effects of secondary tasks on the construction of the situation model, especially on the anticipation of events in a traffic situation as examined in Experiment 2, will be the focus of the modeling activities. For this, a multitasking model (e.g., [18]) will be combined with the comprehension process model to test the assumptions underlying the interpretation of the results of Experiment 2.
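The memory mechanism just mentioned can be made concrete. The following minimal sketch (our illustration in Python, not code from the project) implements ACT-R's standard base-level learning equation, B_i = ln(Σ_j t_j^(-d)), together with the logistic retrieval-probability equation; the parameter values and the encounter times in the example are illustrative assumptions.

    import math

    def base_level_activation(encounter_times, now, d=0.5):
        # ACT-R base-level learning: B_i = ln(sum_j t_j^(-d)), where t_j is
        # the time elapsed since the j-th encounter with chunk i and d is
        # the decay parameter (conventional default 0.5).
        return math.log(sum((now - t) ** (-d) for t in encounter_times))

    def recall_probability(activation, tau=0.0, s=0.4):
        # Probability of retrieving a chunk with activation A, given
        # retrieval threshold tau and noise s: P = 1 / (1 + exp((tau - A)/s)).
        return 1.0 / (1.0 + math.exp((tau - activation) / s))

    # A frequently refreshed slot (e.g., "car from left behind") versus a
    # representation last updated 20 sec ago.
    fresh = base_level_activation([1.0, 5.0, 9.0], now=10.0)
    stale = base_level_activation([0.0], now=20.0)
    print(recall_probability(fresh), recall_probability(stale))

Under these equations, a slot refreshed every few seconds stays retrievable, whereas a representation last updated 20 sec ago has largely decayed, which is consistent with the updating account given above.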
References
1. Endsley, M.R.: Toward a theory of situation awareness in dynamic systems. Human Factors 37(1), 32–64 (1995)
2. Baumann, M., Krems, J.F.: Situation Awareness and Driving: A Cognitive Model. In: Cacciabue, C. (ed.) Modelling Driver Behaviour in Automotive Environments. Springer, London (2007)
3. Gugerty, L.J.: Situation awareness during driving: Explicit and implicit spatial knowledge in dynamic spatial memory. Journal of Experimental Psychology: Applied 3, 42–66 (1997)
4. Matthews, M.L., Bryant, D.J., Webb, R.D.G., Harbluk, J.L.: Model for Situation Awareness and Driving: Application to Analysis and Research for Intelligent Transportation Systems. Transportation Research Record 1779, 26–32 (2001)
5. Kintsch, W.: Comprehension: A paradigm for cognition. Cambridge University Press, New York (1998)
6. Fischer, B., Glanzer, M.: Short-term storage and the processing of cohesion during reading. Quarterly Journal of Experimental Psychology: Human Experimental Psychology 38A, 431–460 (1986)
7. Glanzer, M., Nolan, S.D.: Memory mechanisms in text comprehension. In: Bower, G.H. (ed.) The Psychology of Learning and Motivation. Academic Press, New York (1986)
8. Ericsson, K.A., Kintsch, W.: Long-term working memory. Psychological Review 102(2), 211–245 (1995)
9. Crundall, D.E., Chapman, P., Phelps, N., Underwood, G.: Eye movements and hazard perception in police pursuit and emergency response driving. Journal of Experimental Psychology: Applied 9(3), 163–174 (2003)
10. Crundall, D.E., Underwood, G.: Effects of experience and processing demands on visual information acquisition in drivers. Ergonomics 41(4), 448–458 (1998)
11. Underwood, G., Chapman, P., Berger, Z., Crundall, D.: Driving experience, attentional focusing, and the recall of recently inspected events. Transportation Research Part F 6, 289–304 (2003)
12. Durso, F.T., Rawson, K.A., Girotto, S.: Situation comprehension and situation awareness. In: Durso, F.T., Nickerson, R.S., Dumais, S.T., Lewandowsky, S., Perfect, T. (eds.) Handbook of Applied Cognition. Wiley, Chichester (2007)
13. Michon, J.A.: A critical view of driver behavior models: What do we know, what should we do? In: Evans, L., Schwing, R.C. (eds.) Human Behavior and Traffic Safety. Plenum Press, New York (1985)
14. Pollack, I., Johnson, L.B., Knaff, P.R.: Running memory span. Journal of Experimental Psychology 57(3), 137–146 (1959)
15. Anderson, J.R., Bothell, D., Byrne, M., Douglass, S., Lebiere, C., Qin, Y.: An integrated theory of the mind. Psychological Review 111(4), 1036–1060 (2004)
16. Salvucci, D.D.: Modeling Driver Behavior in a Cognitive Architecture. Human Factors 48(2), 362–375 (2005)
17. Budiu, R., Anderson, J.R.: Interpretation-based processing: a unified theory of sentence comprehension. Cognitive Science 28(1), 1–44 (2004)
18. Salvucci, D.D., Taatgen, N.A.: Threaded Cognition: An Integrated Theory of Concurrent Multitasking. Psychological Review 115(1), 101–130 (2008)
19. Baumann, M., Petzoldt, T., Groenewoud, C., Hogema, J., Krems, J.F.: The effect of cognitive tasks on predicting events in traffic. In: Brusque, C. (ed.) Proceedings of the European Conference on Human Interface Design for Intelligent Transport Systems. Humanist Publications, Lyon (2008)
A Probabilistic Approach for Modeling Human Behavior in Smart Environments

Christoph Burghardt and Thomas Kirste

Department of Computer Science, University of Rostock, Albert-Einstein-Str. 21, 18055 Rostock, Germany
{christoph.burghardt,thomas.kirste}@uni-rostock.de
Abstract. In order to act intelligently, a smart environment needs to have a notion of its users. Hidden Markov models (HMMs) are especially well suited to recognizing, for example, the state of a meeting in a smart meeting room, as they can cope with noisy and intermittent sensor values. However, modeling user behavior as an HMM is challenging because of the many degrees of freedom users have when acting in such an environment. We therefore compare two methods that ease the automatic generation of HMMs expressing human behavior.
1 Introduction
A smart meeting room should support its users by controlling the state of the environment depending on the current team activity. When, for example, a presentation is taking place, the room should set up devices such as the projector, lamps, and blinds to ensure optimal visibility for every participant. But before it can do so, it has to be sure about the current state of the meeting in the environment.

Determining the progress of the meeting in a smart environment can be regarded as a classification problem. Classification can be done by model-free approaches such as artificial neural networks, support vector machines, or decision trees. The particular advantage of these methods is that the developer of the smart environment does not need to know anything about the structure of the problem (here: the behavior of the humans in the smart environment). Statistical toolkits like WEKA [1] offer implementations of a wide range of machine learning algorithms. To facilitate the use of machine learning even further, the HCI community has developed tools to directly train statistical models, such as d.tools [2] and Exemplar [3]. The disadvantage of these methods is that they require a large amount of training data. Using (supervised) learning is particularly difficult in smart environments: acquiring and annotating training data is very expensive, and it is often difficult to generalize common behavior from the training data (cf. Figures 2 and 4). In most cases it is not possible to generate training data for every possible human behavior, even for a closed subset of scenarios like meetings. Therefore, it is advisable to employ methods that need as little training data as possible.
Fig. 1. The prototype smart environment where we conduct our experiments
Fig. 2. Left: The noisy and intermittent sensor values from different meetings. Right: The relation of the sensor data to places and persons after automatic annotation by a clustering algorithm (Seats, Presenting Zones, Movement).
Another common and well-understood approach is hidden Markov models [4]. The advantage of Markov models is that they fuse common-sense and a priori knowledge with available training data in an intuitive manner. A hidden Markov model (HMM) estimates the value of an unobservable variable s over time. The domain of s, denoted by H, represents the set of possible states in which s can exist. An HMM updates its model of s at discrete intervals or time steps. At each time step t, s transitions from its previous state s_{t-1} to the current state s_t, and a set of observations O_t is emitted based on s_t. The HMM represents the likelihood of these events by a transition function T(s_t = h_i | s_{t-1} = h_j), ∀ h_i, h_j ∈ H, and an observation function O(O_t | s_t = h), ∀ h ∈ H. The initial distribution over s is given by a prior P(s_1 = h), ∀ h ∈ H.
Fig. 3. The temporal order of the meeting states, ignoring the execution order of the presentations
A major advantage of generative models like HMMs is that they support filtering and prediction operations. Given a specific hidden Markov model and a set of sensor readings O_{1:t}, Bayesian filtering finds the most probable current state, P(s_t | O_{1:t}); prediction finds the most probable future state, P(s_{t+d} | O_{1:t}), d > 0. These properties allow hidden Markov models to cope with intermittent sensor values and to make predictions about the most probable next activities of the user (a minimal sketch of the filtering cycle follows below). When using HMMs, the designer has to devote most of his effort to determining sensible values for the states of s, the domain H, and the observation and prior functions. The generation of these values is a complex process: even for small scenarios, the state space can grow exponentially (cf. Figures 3 and 6). Furthermore, this task is repetitive, as the resulting model is tied to its domain and not reusable; it has to be modified (slightly) for each new situation.
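To make the filtering operation concrete, the following Python sketch performs the standard predict-update cycle of a discrete HMM. It is our own minimal example: the four states, the transition matrix, and the zone-based sensor model are invented for illustration and are not the parameters of the meeting model discussed below.

    import numpy as np

    # States of a toy meeting HMM (illustrative only).
    H = ["talk_A", "talk_B", "talk_C", "discussion"]
    prior = np.array([0.7, 0.1, 0.1, 0.1])          # P(s_1 = h)
    # T[i, j] = P(s_t = h_j | s_{t-1} = h_i)
    T = np.array([[0.90, 0.05, 0.04, 0.01],
                  [0.00, 0.90, 0.05, 0.05],
                  [0.00, 0.00, 0.90, 0.10],
                  [0.00, 0.00, 0.00, 1.00]])

    def likelihood(obs, i):
        # O(O_t | s_t = h_i): dummy positional model in which each state
        # expects the tracked person in one zone (0-3), with sensor noise.
        return 0.7 if obs == i else 0.1

    def filter_step(belief, obs):
        # One Bayesian filtering cycle: predict, weight by the
        # observation, and renormalize to obtain P(s_t | O_{1:t}).
        predicted = belief @ T
        weighted = predicted * np.array([likelihood(obs, i) for i in range(len(H))])
        return weighted / weighted.sum()

    belief = prior
    for obs in [0, 0, 1, 1, 3]:      # a noisy sequence of zone readings
        belief = filter_step(belief, obs)
    print(dict(zip(H, belief.round(3))))

Prediction, P(s_{t+d} | O_{1:t}), simply applies d further multiplications by T without the observation update.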
2 Building Hidden Markov Models
To exemplify the technique of intention recognition with HMMs, we examine a simple meeting conducted in a smart meeting room. Three persons A, B, and C meet to discuss the progress of a project. They have an agenda: each person gives a short talk, and afterwards a discussion about the next steps is scheduled. The smart environment should correctly recognize the meeting states, and furthermore the transitions between these states, in order to configure the room for each speaker. To this end, the smart environment is fitted with a real-time local positioning system to track the movement of the users. Figures 2 and 4 show the sensor values from a number of recorded meetings before and after annotation by a human expert.

2.1 Determining the Structure
In order to employ hidden Markov models in our prototype smart environment, we choose a goal-based approach. We define the domain H as the set of goals (or intentions) the user can accomplish in this room. An intention or goal in this context is simply a state that stands for a room configuration supporting the user in an optimal manner. Giving a presentation is a sensible state, as it corresponds to a sensible room configuration. We therefore already have four states for the three presentations and the final discussion. Note that the intention recognition does not control the room itself; instead it serves as input to a component that tries to reach these states in an optimal manner.
Fig. 4. The states of the meeting, after recording and annotation by a human expert
For smart environments, a particularly advantageous approach that does not need a central component is described by Reisse et al. [5]. The observation function O(s_t) defines the sensor values expected in each state. It depends mainly on the types of sensors employed and can be derived from work in the domain of human action recognition. In our prototype smart environment, we use a combination of RFID and an indoor positioning system. The transition function T(s_t) captures the temporal dependencies between the goals and, in general, the user behavior. In a meeting, this information is available (most of the time) in some form of agenda. However, the resulting hidden Markov model should be flexible enough to recognize deviations from the agenda and act accordingly. Therefore, we need to specify the most likely activity sequence as well as all possible activity sequences of such a meeting. For longer action sequences, this task is quite complex. We therefore need a technique to model the activities of humans during a meeting efficiently, and we compare several approaches for easing this development step below.

2.2 Employing Task Models
Giersich et al. [6] proposed to use task models that can be modified and used as the transition function T(s_t) of an HMM. One of the most popular notations for hierarchical task models is the ConcurTaskTrees (CTT) notation [7]. Each compound activity is broken down into a number of sub-tasks.
Fig. 5. The task model for the example scenario: tasks A, B, and C, connected by the order-independence operator |=| and followed by D via the enabling operator >>. The numbers (90, 9, 1) specify the execution preference of one task over its alternatives; in this scenario, A would be executed 90% of the time.
Possible execution sequences are constrained by temporal relationships. Giersich et al. introduced an extended task notation to describe how probable the execution of one task is relative to another. All possible task traces form a complete directed acyclic graph, which is converted into the transition function T(s_t); the specified execution preferences weight the transition function accordingly (a minimal sketch of this normalization is given at the end of this subsection). The resulting task model for the given scenario is depicted in Figure 5. The advantage of this approach is that task models describe human behavior in an intuitive fashion, on a higher semantic level than a plain transition function: describing the dependencies of a task is simpler than specifying all possible execution histories.

In order to complete the proposal by Giersich et al. and generate the complete hidden Markov model from the task model, we propose two further extensions. First, the designer must specify the observation function O(s_t) for each activity. In our scenario, the designer must add the (approximate) movement pattern of all persons for each activity. As can be seen in Figure 2, most activities are stationary: during a presentation, the presenter moves through the presenter zone while the other persons listen at their seats. Sensor noise increases the radius of the places. All movement of the people (in grey) is captured by a single circle covering the full smart environment. Second, the designer can optionally specify a duration for each activity with a mean and a variance. Temporal aspects of an HMM can be modeled with negative binomial distributions; for more on what HMMs can do, the interested reader is referred to [8]. When both pieces of information are incorporated into the task model, it is straightforward to generate the complete hidden Markov model from it, and when duration information is given, the recognition rate of the hidden Markov model increases.

A problem that remains when using task models is that a task model describes only one specific scenario. The resulting HMM is not suitable when the domain changes, e.g., when a fourth person takes part in the meeting. This problem could be diminished by introducing roles: the notion is to describe the behavior of people in the smart environment as "role models" and to assign every person taking part in the meeting a role. This would further facilitate the generation of task models for similar situations, but is left for future work.
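As a minimal sketch of the preference normalization referred to above (our own illustration; the function name and task labels are invented), the designer's execution preferences for the tasks enabled at a given point can be turned into one row of the transition function as follows:

    def transition_row(current, preferences):
        # Normalize the designer's execution preferences for the tasks
        # enabled after `current` into transition probabilities.
        total = sum(preferences.values())
        return {(current, task): w / total for task, w in preferences.items()}

    # Preferences from the task model in Figure 5: A is preferred 90:9:1
    # over its alternatives B and C.
    print(transition_row("start", {"A": 90, "B": 9, "C": 1}))
    # {('start', 'A'): 0.9, ('start', 'B'): 0.09, ('start', 'C'): 0.01}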
2.3 Partial Order Planning
A different approach to generating the set of states and their dependencies is to use concepts from the planning community. We first proposed this approach in [9]. The main idea is to describe the behavior of the humans as planning operators, e.g., in STRIPS [10]. Every planning operator consists of a number of preconditions and effects. The operator "give a presentation" can, for example, be written as:

    (:action present
      :parameters (?who)
      ;; the presenter is ready, everyone else is seated and listening
      :precondition (and (prepare-presentation ?who)
                         (forall (?x - user)
                                 (or (= ?x ?who) (prepare-sit-and-listen ?x))))
      :effect (and (has-presented ?who)
                   (not (prepare-presentation ?who))))
The prepare-* conditions ensure that the user has, e.g., moved to the right destination, and divide each task into a prepare part and an act part. The progress of the meeting is stored in the has-* conditions. The devices and persons present during the meeting are given to the planner as initial states. The goal (in our scenario, the meeting) is the set of actions that need to be accomplished. Expressed in a planning language like STRIPS, the goal description for a meeting (all persons have given a presentation and taken part in the discussion) looks like this:

    (:goal (forall (?p - person)
                   (and (has-presented ?p)
                        (has-discussed ?p))))

The result of a partial order planning process is a directed acyclic graph representing one possible execution history of a meeting. The sum of all valid plans is therefore the complete graph, which can be translated into the transition function of an HMM. Assigning good transition probabilities is still a challenge; we settled for a rather simple approach: the transition probabilities can be specified by the designer, or else a default forward probability, depending on the number of possible branches in the graph (see Figure 6), is assigned. Implementing durative actions is straightforward, as they are available in extensions of STRIPS-style planning languages such as PDDL. To prove the feasibility of this approach, we implemented a simple forward planner that builds up the complete directed acyclic graph. While this planner is not suitable for all types of planning problems, describing the human behavior in a meeting room is a comparatively small planning problem with, in our scenario, about 300 operators. Therefore a complete a priori iteration through the search space is still possible. A meeting scenario with three people is depicted in Figure 6 and consists of 88 states.
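The forward expansion itself can be sketched compactly. The following toy Python version is our own illustration: it omits the prepare/act split and the device facts, which is why it yields far fewer than the 88 states of the full model, but it shows how all execution histories are enumerated and how the default uniform forward probabilities are assigned.

    PERSONS = ["A", "B", "C"]

    def applicable(state):
        # Successors reachable from `state`: anyone who has not presented
        # may present; the discussion requires all presentations done.
        succ = []
        for p in PERSONS:
            if ("presented", p) not in state:
                succ.append(("present-" + p, state | {("presented", p)}))
        if all(("presented", p) in state for p in PERSONS) and "discussed" not in state:
            succ.append(("discuss", state | {"discussed"}))
        return succ

    def expand(initial):
        # Exhaustively expand the plan DAG, assigning each branch the
        # default forward probability 1 / (number of enabled branches).
        graph, frontier, seen = {}, [initial], {initial}
        while frontier:
            state = frontier.pop()
            succ = applicable(state)
            for op, nxt in succ:
                graph[(state, op)] = (nxt, 1.0 / len(succ))
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        return graph, seen

    graph, states = expand(frozenset())
    print(len(states), "states,", len(graph), "transitions")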
Fig. 6. The sum of all possible plans for the meeting with three persons, generated by a planner. The numbers at the arcs are the transition probabilities.
This approach has the advantage that only context that can be sensed automatically (like the number of persons and the possible user goals from the agenda) is needed; the suitable human behavior model is generated automatically. A further advantage is the (in principle) higher-complexity models that can be built: a model for a four-person meeting grows to 250 states. Unfortunately, the resulting model is far too large to be depicted in this publication. For even larger and more complex models it is not possible to expand the full plan a priori with an exhaustive search. However, with approximate inference methods like particle filters it is not necessary to expand the full search space; instead one needs to consider only the next steps that are valid transitions from the current state. Plans that are not compatible with the sensor data are automatically pruned by the filter.
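The following sketch (ours; it reuses the graph layout of the planner sketch above and assumes a likelihood(obs, state) sensor model) shows this lazy expansion: each particle follows one transition that is valid from its current state, and resampling by observation weight prunes incompatible plans.

    import random

    def particle_filter_step(particles, obs, graph, likelihood, n=200):
        # Propagate each particle along one transition valid from its
        # current state; no global expansion of the plan DAG is needed.
        moved = []
        for state in particles:
            options = [(nxt, p) for (s, op), (nxt, p) in graph.items() if s == state]
            if options:
                nexts, probs = zip(*options)
                state = random.choices(nexts, weights=probs)[0]
            moved.append(state)
        # Weight by how well each hypothesis explains the sensor reading,
        # then resample: incompatible plans die out automatically.
        weights = [likelihood(obs, s) for s in moved]
        return random.choices(moved, weights=weights, k=n)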
3 Comparing the Resulting Models
Having presented each approach, what is our recommendation for building an intention recognition system? Task models have the advantage that
they are well known to HCI designers and are an intuitive way to specify temporal order. By generating the resulting set of possible task traces, the designer knows what will be recognizable by the resulting hidden Markov model. However, this is still a manual approach to generating HMMs: the designer has to specify every possible behavior a priori. The approach is therefore limited to closed, simpler scenarios such as small meetings in a smart meeting room. Furthermore, the intention recognition should also work in ad-hoc scenarios, where people bring some of the infrastructure into the room. For such more complex scenarios, other methods like the planning approach capture human behavior better. The planning approach can even be fully automated, which makes it better suited to ad-hoc environments. However, for an HCI designer the initial learning curve is quite high, as the designer must get used to a new way of expressing activities with conditions and effects. Finding errors is also not as intuitive as with CTT, because the graphs resulting from the planner can become more complex. When the designer builds a complete CTT model, the resulting structure is identical in both approaches; a similar performance is therefore expected in real-world evaluations. The task-model approach is again slightly advantageous when searching for errors. Therefore, for smaller projects the recommendation is to use task models. However, one has to be aware of the fast-growing complexity: even a four-person meeting can already expand to 240 states.
4 Conclusion and Outlook
Inferring the current state of the meeting is an important step in the data processing chain of smart environments. Using HMMs is especially favorable, as these models merge a priori knowledge with sensor data to deliver good accuracy without the need for expensive training data. However, these models need to be adapted for every individual meeting and smart environment; the generation of simple yet powerful HMMs therefore has to be simplified. Planning and task models are both promising techniques for reducing the complexity of modeling human behavior with HMMs.

This work is part of the research of the graduate school MuSAMA and is supported by the German Research Foundation (DFG).
References
1. Garner, S.R.: WEKA: The Waikato environment for knowledge analysis. In: Proc. of the New Zealand Computer Science Research Students Conference, pp. 57–64 (1995)
2. Hartmann, B., Klemmer, S.R.: Reflective physical prototyping through integrated design, test, and analysis. In: UIST 2006, pp. 299–308. ACM Press, New York (2006)
3. Hartmann, B., Abdulla, L., Mittal, M., Klemmer, S.R.: Authoring sensor-based interactions by demonstration with direct manipulation and pattern recognition. In: CHI 2007: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 145–154. ACM Press, New York (2007)
4. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–286 (1989)
5. Reisse, C., Kirste, T.: A distributed mechanism for device cooperation in smart environments. In: Advances in Pervasive Computing. Adjunct Proceedings of the 6th International Conference on Pervasive Computing, Sydney, Australia, pp. 53–56 (2008)
6. Giersich, M., Forbrig, P., Fuchs, G., Kirste, T., Reichart, D., Schumann, H.: Towards an integrated approach for task modeling and human behavior recognition. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4550, pp. 1109–1118. Springer, Heidelberg (2007)
7. Paternò, F., Mancini, C., Meniconi, S.: ConcurTaskTrees: A diagrammatic notation for specifying task models. In: INTERACT 1997: Proceedings of the IFIP TC13 International Conference on Human-Computer Interaction, pp. 362–369. Chapman & Hall, London (1997)
8. Bilmes, J.A.: What HMMs Can Do. UWEE Technical Report UWEETR-2002-0003, Department of Electrical Engineering, University of Washington (January 2002)
9. Burghardt, C., Giersich, M., Kirste, T.: Synthesizing probabilistic models for team activities using partial order planning. In: AmI Workshop, KI 2007 (2007)
10. Fikes, R., Nilsson, N.J.: STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence 2(3/4), 189–208 (1971)
PERMUTATION: A Corpus-Based Approach for Modeling Personality and Multimodal Expression of Affects in Virtual Characters

Céline Clavel and Jean-Claude Martin

LIMSI-CNRS, BP 133, 91403 Orsay cedex, France
{celine.clavel,martin}@limsi.fr
Abstract. In order to improve the consistency of their affective multimodal behaviors, interactive virtual agents might benefit from a model of personality inspired by psychology. In this paper, we revisit the different approaches considered in personality psychology. We show that previous efforts to endow virtual agents with personality have made only limited use of these approaches. Finally, we introduce our PERMUTATION corpus-based framework.

Keywords: virtual agents, multimodality, emotion, personality.
1 Introduction

An interactive virtual agent is a human-computer interface in which an animated character displayed on the screen combines several human-like modalities such as speech, gesture, and facial expressions. Using an interactive virtual agent is expected to lead to an intuitive and friendly interaction, since it relies on communication modalities that we all use every day. Researchers have recently focused on how such virtual characters should express affective states across modalities and how this is perceived by human subjects [1, 2, 3, 4]. In order to improve the consistency of their multimodal expressions of affects, interactive virtual agents might benefit from a model of personality inspired by psychology. This perspective raises several questions: how should affects and personality interact in the specification of the multimodal expression to be displayed by the virtual character? What are the relevant psychological models from which we can draw inspiration?

The first section of this paper surveys the different approaches to personality psychology, explaining their merits and their limitations with respect to virtual agents. The second section shows that, although several attempts have been made to endow virtual characters with personality, current virtual characters are still limited when compared to the rich literature on human personality. The last section introduces a corpus-based framework to inform the expression of affective states in a virtual agent endowed with personality features.
2 Psychological Approaches to Personality

One definition of personality [5] is "a set of organized, stable and individualized behaviors". The goal of personality research is to describe, explain, and predict this set. Today an integrative approach to personality is recommended. Six levels of analysis are proposed to provide a global overview of human personality: the trait-dispositional level, the psychodynamic-motivational level, the phenomenological level, the behavioral-conditioning level, the social-cognitive level, and the biological level. Each level provides a specific contribution to the general understanding of personality and behavior. These levels have been partly addressed by several major historical approaches that we describe below.

The Lexical Approach
The lexical approach to personality proposes to classify the terms of natural language that are used to describe and understand human qualities. It is based on the postulate that the frequency with which a word is used to describe people corresponds to the importance accorded to it in human interactions. Factorial analysis has been employed to reduce and organize the thousands of adjectives into a smaller number of dimensions; it is applied to interpersonal differences in the dispositional tendencies reported during self-assessments. The lexical approach makes it possible to define constructs that have relative temporal stability and good predictive value, that are applicable across cultures, and that are socially important. These constructs correspond to "personality traits". Rolland [6] defines traits as "coherent sets of cognitions, emotions and behaviors that demonstrate a temporal stability and cross situational consistency". Such traits result from inferences and not from a directly observable reality. Personality traits are defined as general, durable, relatively stable characteristics used to assess and explain behavior. Different models and psychometric tools based on the lexical approach have been developed: the Eysenck Personality Inventory, the 16PF, and the NEO PI-R.

Eysenck [7] described personality from the clinical descriptions of patients displaying psychopathological behaviors. He focused on Extraversion/Introversion and Neuroticism/Emotional stability; a third factor, Psychoticism, was added in his later work. Each trait is bipolar. Eysenck developed a psychometric tool to assess personality, the Eysenck Personality Inventory (EPI). However, Eysenck grants little importance to the impact of education on the development of personality and defends the idea that personality factors are determined primarily by heredity.

Cattell [8] wanted to identify, from language, all the personality traits an individual might possess. From 4500 words describing personality, he established a list of 171 words. He asked hundreds of subjects to assess whether they considered themselves well described by these words, and asked other people to evaluate the same subjects. He identified 16 personality traits (16PF). However, some of these factors are strongly correlated, and five second-order factors emerge from them: Extraversion, Anxiety, Hardness/Intransigence, Independence, and Self-control.

The Big Five model (FFM) proposed by Costa and McCrae describes personality on two levels: the facets provide a fine and accurate description of personality, and a domain corresponds to a group of facets.
The Big Five model identifies five basic dimensions through factorial analysis. Each dimension includes specific
characteristics of personality, represented by adjectives on a bipolar scale, and is composed of six facets which specify them.

Neuroticism is defined as a system regulating avoidance behaviors [9, 10]. Its role is to protect the organism from pain by anticipating it and by activating surveillance behaviors. A subject with a high neuroticism score has a very critical vision of herself and tends to feel a wide range of negative emotions frequently and intensely. Extraversion is characterized as a system regulating approach behaviors. A high score on this trait reveals a strong sensitivity to pleasant stimuli and a tendency to feel positive emotions frequently and intensely. Openness to Experience results in broad and varied interests and a capacity to seek out and live through new and unusual experiences; it is a system regulating reactions to novelty. A person with a high score pays particular attention to her own emotional universe. Agreeableness refers to interactions with others and especially to the tone of relationships with others. It corresponds to a system regulating the balance in relations and exchanges. A high score corresponds to altruistic individuals who worry first about the well-being of others and tend to trust others. Conscientiousness relates to motivation, organization, and perseverance in goal-directed conduct. A high score corresponds to a person who tends to set long-term goals, to organize her actions, and to accept the constraints tied to the deferred satisfaction of needs and desires.

Studies have revealed a consensus among judges when characterizing target individuals on certain traits (extraversion and conscientiousness). The characterizations of target individuals by judges tend to be convergent and also to correspond to the self-descriptions of the targets, even when the judges have minimal information about the target individuals [11]. Costa and McCrae elaborated the NEO PI-R, which is the most widely used personality inventory. It includes 240 items, 48 per domain and 8 per facet, and has been translated into several languages [12, 13]. The scores on personality traits found among adults (over 30 years) are relatively stable with advancing age. Costa and McCrae [14] consider the traits as endogenous tendencies with a biological basis. According to Costa and McCrae, the Big Five model represents the universal structure of individual differences.

The goal of the Big Five model is to provide descriptive models to classify individuals according to their abstract dispositional tendencies, not to explain their behaviors. The meaning of adjectives used in natural language is complex, and it is relevant to describe them by their polysemy [15]: they can describe behaviors and states and evaluate the social utility of others. Lexical models attach little importance to the role of the environment in the processes by which personality is realized, and they are interested solely in interpersonal differences. These criticisms led to two other currents of research.

The Psychosocial Approach
This approach sees in the expression of traits the description of behaviors and states and the evaluation of the social utility of others.
Rather than considering a trait as the expression of an individual tendency, this approach suggests considering the trait as an act whose meaning depends on the sense given to the situation: it says how we behave, but also expresses our value in social relationships and our internal states. Two approaches characterize this current of research: 1) an approach centered on the value
of adjectives, and 2) an approach centered on the significance of adjectives. Only the first will be presented here, because it is more relevant for virtual characters: it makes it possible to determine the representations that subjects develop concerning the agents. According to this approach, the perception of individual differences is an evaluation of others and not a description of their supposed properties [16, 17, 18]. People do not judge others in order to describe them as accurately as possible, but rather to prepare an adequate interaction with them according to the utility they represent. People thus develop an implicit theory of personality (TIP). Two value dimensions of adjectives structure personality traits. The first dimension concerns approach vs. avoidance: we consider as positive everything related to approach and as negative what relates to avoidance. The second dimension characterizes the skills, the power, and the social status of people. Thus, adjectives convey information about people's emotional (first dimension) or social (second dimension) value. Wiggins [19] proposes a circumplex model of the structure of interpersonal adjectives. Beauvois [20, 21] proposed a social analysis that considers the value of a person as depending on her interpersonal or social relationships. Social desirability corresponds to the knowledge people have about what is considered desirable in a society. Social utility refers to the knowledge people have about the likelihood of a person's success in social life according to her level of adherence to the surrounding social organization.

The Socio-Cognitive Approach
Bandura [22, 23] aims to explain human conduct through the understanding of the mechanisms underlying individual actions. This approach attaches importance to the social context and to intra-individual differences. It focuses primarily on how the individual selects, evaluates, and processes information about others and about the world; it is about understanding the cognitive, emotional, and social processes that characterize individuals. According to Cervone [24], three strands contribute to the development of this approach: 1) a meta-theoretical framework that can organize research on the individual, 2) theories that study the variables characterizing the architecture of personality and enable predictions and evaluations at the individual level, and 3) theories focused on demonstrating the dynamics and socio-cognitive processes underlying a given phenomenon.

CAPS (Cognitive-Affective Personality System) is a cognitive and emotional model of personality [25]. It considers the beliefs and goals of individuals. Personality does not result from the sum of isolated and independent individual characteristics; it is therefore important to study the organization of these variables. The model considers that the person and the environmental situation are continually in interaction and influence each other mutually. It questions the lexical-approach theories based on average behavioral tendencies: the CAPS model considers the variability of behaviors observed in different situations as being informative about the individual. However, CAPS remains a conceptual framework and does not specify which variables need to be considered for modeling the structure of personality. The KAPA model (Knowledge and Appraisal Personality Architecture) attempts to answer this question.
The KAPA model by Cervone [26] distinguishes three types of mental contents according to the direction of intentionality. Beliefs are directed from the mind towards the world. Goals are directed from the world towards the mind: such a mental content is not true or false but reflects an intention to reach a future state. The third type serves as a criterion for assessing the quality of an entity; these are referred to as evaluative standards. These types of mental content are involved in knowledge and appraisal: "Knowledge is an enduring structural feature of personality. Appraisals are dynamic personality processes. People possess vast repertoires of knowledge, but only a small subset is active at a time, and is thus potentially influential to appraisal processes".
3 Virtual Agents with a Personality

Psychosocial Approach to Human-Computer Interaction
Several studies have considered how users build representations of their computers. Users apply to their computer stereotypes from daily life. Gender stereotypes have been observed: users trust a machine endowed with a male voice more, and judge that a machine with a female voice has higher relational skills [27]. Nass et al. [28] observed that norms of social utility and social desirability also apply to HCI. The performance of a computer is judged to be superior when it is praised by other computers rather than when the computer praises itself. A computer that congratulates itself or criticizes other computers is perceived as less friendly than a computer that admires the others and displays self-criticism. On the other hand, computers that criticize are perceived as smarter than computers that praise. Other researchers have studied the impact of users' personality on the representation they build of their computer. Nass et al. [29] propose a circumplex model of interpersonal behavior based on two "factors": Extraversion (dominant vs. submissive) and Agreeableness (cordial vs. hostile). They observed that subjects preferred a computer that resembled them; the same result was obtained for skill.

Models Based on the Lexical Approach
Most models of virtual agents with personality draw on the lexical approach and the Big Five model [30, 31, 32, 33, 34]. According to André et al. [35], the model developed by Costa and McCrae has the advantage of being descriptive and is considered a support for the emotional dimension. Some architectures focus on a few traits (extraversion, agreeableness, and neuroticism for André, Klesen, Gebhard, Allen, and Rist; neuroticism for Hermann, Melcher, Rank, and Trappl). Ball and Breese [36] use the dominance and friendliness dimensions, which they consider more relevant within the framework of interpersonal relations.

Models Based on the Socio-Cognitive Approach
Models of virtual agents based on the socio-cognitive approach to personality are few. Moffat [37] developed a model to create personalities of virtual agents using the work of Mischel. In the same vein, Sandercock et al. [38] worked on the development of believable agents for interactive applications. These authors focused
on the intra-individual variability of agents. Their goal is to produce agents whose conduct depends on the situation but remains coherent. These authors note that most implementations of personality are static and based on trait theory. However, according to Sandercock et al. [38], the lexical approach offers no guidance on how the expression of personality traits should be modulated depending on the situation.

The three approaches and the computing models built on them can be summarized as follows.

Lexical approach
  Characteristics of psychological models: descriptive model classifying individuals according to their abstract dispositional trends; traits as endogenous trends with a biological basis; factorial analyses; psychometric tools (EPI, 16PF, Big Five); interpersonal differences.
  Computing models: Gebhard's ALMA; Egges et al.'s A Model for Personality and Emotion Simulation; Breese and Ball's models of emotions and personality encoded as Bayesian networks; André et al.'s Persona System.

Psychosocial approach
  Characteristics of psychological models: perception of individual differences is mostly an evaluation of others; influence of social interactions; role of social desirability; role of social utility.
  Computing models: Nass et al.'s CASA ("Computers Are Social Actors").

Socio-cognitive approach
  Characteristics of psychological models: explanatory model of human behavior via the understanding of the mechanisms underlying individual actions; influence of social context; integration of the different individual variables; intra-individual variability; CAPS, KAPA.
  Computing models: Moffat; Sandercock et al.

Computing models combining different approaches: Read et al.'s Personality-enabled Architecture for Cognition; Poznanski and Thagard's SPOT model.
Models Combining Different Approaches to Personality
Most computational models that try to introduce the concept of personality into virtual characters combine different perspectives [34, 39, 40, 41]. For example, Poznanski and Thagard [42] developed the SPOT model (Simulating Personality Over Time), based on the lexical and socio-cognitive approaches. It is composed of four components: personality, emotion, an input describing the situation, and an output describing the behaviors. The personality component is based on the Big Five model, and each trait has its behavioral pattern. Personality nodes in SPOT are connected to the behavioral output according to the behavioral tendencies of a given trait: the extravert node is strongly connected to behaviors that characterize extraverted humans and only loosely connected to behaviors that are not extraverted. These various connection strengths represent a person's genetic predispositions for the traits. The model also relies on social learning theories of personality. Each "situation" input is connected
with various strengths to a corresponding "behavior" output as well as to other behaviors. The connections between nodes and their strengths are determined from the emotional and behavioral tendencies associated with the Big Five.
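As a toy illustration of the weighted-activation scheme just described (our own sketch: the node names, the weights, and the simple additive combination rule are invented and are not Poznanski and Thagard's actual parameters):

    # Trait and situation nodes spread weighted activation to behaviors.
    TRAIT_WEIGHTS = {        # "genetic predisposition": trait -> behavior
        "extraversion": {"start_conversation": 0.8, "withdraw": -0.6},
        "neuroticism":  {"start_conversation": -0.3, "withdraw": 0.7},
    }
    SITUATION_WEIGHTS = {    # socially learned: situation -> behavior
        "party": {"start_conversation": 0.5, "withdraw": -0.2},
    }

    def behavior_activations(traits, situation):
        # Sum trait-weighted and situation-weighted inputs per behavior.
        acts = {}
        for trait, level in traits.items():
            for behavior, w in TRAIT_WEIGHTS[trait].items():
                acts[behavior] = acts.get(behavior, 0.0) + level * w
        for behavior, w in SITUATION_WEIGHTS[situation].items():
            acts[behavior] = acts.get(behavior, 0.0) + w
        return acts

    # A strongly extraverted, mildly neurotic agent at a party: trait and
    # situation inputs reinforce each other in favor of conversing.
    print(behavior_activations({"extraversion": 0.9, "neuroticism": 0.3}, "party"))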
4 The PERMUTATION Corpus-Based Approach

Existing personality models used in virtual agents are mainly based on the lexical approach to personality; they are inspired by the socio-cognitive approach only insofar as the situation is considered important in the development of personality. Few virtual agent studies have considered the cognitive dimension of personality or tried to develop profiles of cognitive functioning for virtual agents. Among socio-cognitive approaches to personality psychology, cognitive styles describe one's cognitive functioning but also certain aspects of one's social behaviors [43]. They relate to characteristic ways of perceiving, remembering, thinking, and solving problems [44, 45]; they thus describe the style of mental activity rather than its content, and are deduced from stable individual differences in how information is organized and processed. One of the most studied cognitive styles in psychology is the field-dependency dimension (FID). This cognitive style relates to the habitual and preferred way of perceiving information. People who are field independent (FI) have an analytical vision: they transform the information at their disposal to organize it according to their own criteria. Their conduct is rather directed toward objects, and they tend to take the lead in social interactions. In contrast, people who are field dependent (FD) are more sensitive to the perceptive and conceptual organization of information; they are very attentive to interpersonal relations and tend to ask others for information.

Research on expressive agents has invested little in this dimension of personality, and few studies have tried to endow animated characters with the properties that characterize particular cognitive functioning. Yet such properties of mental activity might participate in the multimodal expression of emotion, since one goal of multimodal expressions of emotion is to inform others of how we evaluate the current situation. Our hypothesis is that cognitive style can be perceived in the multimodal expression of emotions. For example, FI people do not much consider the point of view of others and tend to dictate their opinion; they might not try to control their anger and might adopt broader and quicker movements than FD people.

To determine the multimodal emotional expression associated with each pole of the cognitive style (FID), we developed the PERMUTATION corpus-based approach (PERsonality MUltimodal InTerAcTION). TV series provide recurrent behaviors displayed by a variety of characters over time when faced with different emotional situations. They might provide more spontaneous data than acted protocols recorded under in-lab conditions. They make it possible to consider the role of the situation in the emergence of the emotional process, and they are informative about the stability of emotional expression over time according to the personality of the characters. We designed a questionnaire for assessing the various parameters of the FID (orientation of behaviors, type of interaction, and type of perception). Fifty subjects rated the cognitive style of seven television series characters with the help of this questionnaire. This allowed us to select five female characters recognized by the subjects as being either strongly FI or strongly FD.
Video samples featuring emotional behaviors of these characters were then collected. Five emotion families were considered (happiness, anger, surprise, fear, and sadness). One hundred sequences were selected, for a total duration of 2568 seconds. These video samples were then annotated by other subjects with respect to the multimodal emotional expression they perceived. Preliminary results reveal that subjects evaluate the emotional events differently according to their emotional tone: they judge the positive episodes to be more pleasant and more favorable to the success of the character's goals than the negative situations. Moreover, when we compare the evaluations made by subjects for two characters perceived as having the same cognitive style, no difference appears: subjects consider that characters having the same cognitive style and facing similar situations make the same type of emotional assessment of the situation. Movement quality has been observed to be a discriminative feature of some acted emotions [46]. In our data, the subjects judged that the temporal amplitude, the intensity, and the general level of activation vary according to the nature of the expressed emotion: when a character expresses Anger, her gestures are perceived to be faster and more intense, and the general level of activation is higher, than when the character expresses Joy. Furthermore, as seen previously, two characters having the same cognitive style and expressing the same type of emotion do not differ in the quality of their movements.
5 Conclusion and Future Directions

We surveyed the different models of personality in virtual agents and the underlying approaches in psychology. We introduced the PERMUTATION corpus-based approach, which aims at informing the design of virtual agents able to reflect their cognitive style in their multimodal expression of emotion. Such data can be useful for defining a library of multimodal behaviors associated with different emotions according to cognitive styles. Future directions include validating the model by conducting similar perceptive studies with users interacting with PERMUTATION-based animated characters.

Acknowledgments. The research presented here was partly supported by the project "ANR TLOG Affective Avatars". The authors thank the students who participated in the study.
Bibliography 1. Martin, J.-C., Niewiadomski, R., Devillers, L., Buisine, S., Pelachaud, C.: Multimodal Complex Emotions: Gesture Expressivity And Blended Facial Expressions. Special issue of the Journal of Humanoid Robotics on Achieving Human-like Qualities in Interactive Virtual and Physical Humanoids 3, 3 (2006) (Pelachaud, C., Canamero, L. (eds.)) 2. Buisine, S., Abrilian, S., Niewiadomski, R., Martin, J.-C., Devillers, L., Pelachaud, C.: Perception of Blended Emotions: from Video Corpus to Expressive Agent. In: Gratch, J., Young, M., Aylett, R.S., Ballin, D., Olivier, P. (eds.) IVA 2006. LNCS, vol. 4133, pp. 93– 106. Springer, Heidelberg (2006)
3. Pelachaud, C.: Multimodal expressive embodied conversational agent. ACM Multimedia, Brave New Topics session, Singapore, 683–689 (2005) 4. Gratch, J., Marsella, S.: Lessons for Emotion Psychology for the Design of Lifelike Characters. In: Journal of Applied Artificial Intelligence (special issue on Educational Agents - Beyond Virtual Tutors), vol. 19, pp. 3–4 (2005) 5. Mischel, W., Shoda, Y., Smith, R.E.: Introduction to Personality: Toward an Integration, 7th edn. J. Wiley & Sons, Hoboken (2004) 6. Rolland, J.P.: L’évaluation de la personnalité. Mardaga, Sprimont (2004) 7. Eysenck, H.J.: The structure of human personality, London, Methuen (1970) 8. Cattell, R.B.: Personality and mood by questionnaire, San Francisco, Jossey-Bass (1973) 9. Davidson, R.: Affective style and affective disorders: Perspective from neurosciences. Cognition and Emotion 12, 3 (1998) 10. Gray, J.: The neurobiology of temperament. In: Exploration in temperament: International perspectives on theory and measurement. Plenum Press, New York (1991) 11. Borkenau, P., Liebler, A.: Trait inferences: sources of validity at zero acquaintance. Journal of Personality and Social Psychology 62 (1992) 12. Mlačić, B., Ostendorf, F.: Taxonomy and structure of Croatian personality-descriptive adjectives. European Journal of Personality 19 (2005) 13. Saucier, G., Goldberg, L.: Personnalité, caractère et tempérament: La structure translinguistique des traits. Psychologie Française 51 (2006) 14. McCrae, R.R., Costa Jr., P.T.: A Five-Factor Theory of personality. Handbook of personality psychology. Guilford, New York (1999) 15. Mignon, A., Mollaret, P.: Quel type d’approche scientifique pour la description de la personnalité? Psychologie Française 51 (2006) 16. Beauvois, J., Dubois, N.: Traits as evaluative categories. Cahiers de Psychologie Cognitive/Current Psychology of Cognition 12 (1992) 17. Beauvois, J., Dubois, N.: Affordances in social judgment: Experimental proof of why it is a mistake to ignore how others behave towards a target and look solely at how the target behaves. Swiss Journal of Psychology/Schweizerische Zeitschrift für Psychologie/Revue Suisse de Psychologie 59 (2000) 18. Mignon, A., Mollaret, P.: Applying the affordance conception of traits: a person perception study. Personality and Social Psychology Bulletin 28 (2002) 19. Wiggins, J.S.: A Psychological taxonomy of trait-descriptive terms: the interpersonal domain. Journal of Personality and Social Psychology 37 (1979) 20. Beauvois, J.L.: La connaissance des utilités sociales. Psychologie française 40 (1995) 21. Beauvois, J.L.: Judgment norms, social utility, and individualism. A sociocognitive approach to social norms. Routledge, London (2003) 22. Bandura, A.: Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review 84 (1977) 23. Bandura, A.: Social cognitive theory: An agentic perspective. Annual review of psychology 52 (2001) 24. Cervone, D.: Personality Architecture: Within-Person Structures and Processes. Annual Review of Psychology 56, 1 (2005) 25. Mischel, W., Shoda, Y.: A cognitive-affective system theory of personality: Reconceptualizing situations, dispositions, dynamics, and invariance in personality structure. Psychological Review 102 (1995) 26. Cervone, D.: Personality assessment: tapping the social-cognitive architecture of personality. Behavior Therapy 35 (2004)
27. Nass, C., Moon, Y., Green, N.: Are Machines Gender Neutral? Gender-Stereotypic Responses to Computers With Voices. Journal of Applied Social Psychology 27 (1997) 28. Nass, C., Steuer, J., Henriksen, L., Dryer, D.: Machines, social attributions, and ethopoeia: Performance assessments of computers subsequent to ’self-’ or ’other-’ evaluations. International Journal of Human-Computer Studies 40 (1994) 29. Nass, C., Moon, Y., Fogg, B.J., Reeves, B., Dryer, D.: Can computer personalities be human personalities. International Journal Human Computer Studies 43 (1995) 30. Egges, A., Kshirsagar, S., Magnenat-Thalmann, N.: A Model for Personality and Emotion Simulation. Knowledge Based Intelligent information and engineering systems (2003) 31. Kshirsagar, S.: A multilayer personality model. In: 2nd international symposium on Smart graphics SMARTGRAPH 2002 Hawthorne, New York, pp. 107–115 (2002) 32. Hermann, C., Melcher, H., Rank, S., Trappl, R.: Neuroticism - A Competitive Advantage. In: Pelachaud, C., Martin, J.-C., André, E., Chollet, G., Karpouzis, K., Pelé, D. (eds.) IVA 2007. LNCS, vol. 4722, pp. 64–71. Springer, Heidelberg (2007) 33. Delgado-Mata, C., Ibáñez, J.: Behavioural Reactive Agents for Video Game Opponents with Personalities. In: Pelachaud, C., Martin, J.-C., André, E., Chollet, G., Karpouzis, K., Pelé, D. (eds.) IVA 2007. LNCS, vol. 4722, pp. 371–372. Springer, Heidelberg (2007) 34. Gebhard, P.: ALMA - A Layered Model of Affect. In: Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2005) Utrecht, pp. 29–36 (2005) 35. André, E., Klesen, M., Gebhard, P., Allen, S., Rist, T.: Integrating Models of Personality and Emotions into Lifelike Characters workshop on Affect in Interactions - Towards a new Generation of Interfaces in conjunction with the 3rd i3 Annual Conference, Siena, Italy, pp. 136–149 (1999) 36. Breese, J., Ball, G.: Modeling emotional state and personality for conversational agents, pp. 7–13. AAAI, Menlo Park (1998) 37. Moffat, D.: Personality Parameters and Programs. Berlin 38. Sandercock, J., Padgham, L., Zambetta, F.: Creating Adaptive and Individual Personalities. In: Gratch, J., Young, M., Aylett, R.S., Ballin, D., Olivier, P. (eds.) IVA 2006. LNCS, vol. 4133, pp. 357–368. Springer, Heidelberg (2006) 39. Read, S.J., Miller, L.: Virtual Personalities: A Neural Network Model of Personality. Personality and Social Psychology Review 6 (2002) 40. Ghasem-Aghaee, N., Ören, T.I.: Cognitive Complexity and Dynamic Personality in Agent Simulation. Computers in Human Behavior 23 (2007) 41. Read, S., Miller, L., Kostygina, A., Chopra, G., Christensen, J.L., Corsbie-Massay, C., Zachary, W., Lementec, J., Iordanov, V., Rosoff, A.: The Personality-Enabled Architecture for Cognition (PAC). In: Paiva, A.C.R., Prada, R., Picard, R.W. (eds.) ACII 2007. LNCS, vol. 4738, pp. 735–736. Springer, Heidelberg (2007) 42. Poznanski, M., Thagard, P.: Changing personalities: towards realistic virtual characters. Journal of Experimental & Theoretical Artificial Intelligence 17 (2005) 43. Huteau, M.: Les conceptions cognitives de la personnalité. PUF Paris (1985) 44. Witkin, H.: A cognitive-style perspective on evaluation and guidance. In Proceedings of the Invitational Conference on Testing Problems (1973) 45. Messick, S.: The matter of style: manifestations of personality in cognition, learning, and teaching. Educational Psychologist 29, 3 (1994) 46. Wallbott, H.G.: Bodily expression of emotion. European Journal of Social Psychology 28 (1998)
Workload Assessment in Field Using the Ambulatory CUELA System Rolf Ellegast, Ingo Hermanns, and Christoph Schiefer BGIA (Institute for Occupational Health and Safety of the German Social Accident Insurance), Alte Heerstrasse 111, 53757 Sankt Augustin, Germany {rolf.ellegast,ingo.hermanns,christoph.schiefer}@dguv.de
Abstract. Ambulatory assessment of physical workloads in the field is necessary to investigate the risk of work-related musculoskeletal disorders (MSD). For more than ten years the BGIA has been developing and using the motion and force capture system CUELA (computer-assisted recording and long-term analysis of musculoskeletal load), which is designed for whole-shift recording and analysis of work-related postural and mechanical loads in ergonomic field analysis. This article gives an overview of the current state of development and some applications of the system.
Keywords: ambulatory workload assessment, inertial tracking device, motion capturing, CUELA, ergonomic field analysis.
1 Introduction
At many workplaces, musculoskeletal workloads due to manual material handling, awkward postures or repetitive movements can commonly be observed. Observational methods are widely used for workload assessment in the field. The problem with these methods is that the description of risk factors (e.g. postural workloads) is too broad to provide accurate information for an appropriate assessment [18]. Therefore, direct measurements should be preferred for more accurate and less time-consuming workload data acquisition and assessment. Several measurement systems exist for assessing the postural workload of a specific body part, e.g. trunk postures at work [20, 21]. The BGIA (Institute for Occupational Health and Safety of the German Social Accident Insurance) is developing and using a measuring system known as CUELA (computer-assisted recording and long-term analysis of musculoskeletal load) for the assessment of postural and kinetic workloads of several body parts (upper and lower extremities, trunk and head). CUELA allows for a quantification of musculoskeletal workloads even in complex work processes. In a second step, the effectiveness of measures already initiated to improve the ergonomics of the work process can be checked.
2 Method
The CUELA system consists of accelerometers, gyroscopes and potentiometers, which can be attached directly to the worker's clothes, and a small portable
Fig. 1. Setup of the CUELA system (basic version)
data-logger (sampling rate 50 Hz, 168 channels). The basic CUELA system enables motion capturing of the trunk (3D) and of the lower extremities in the sagittal plane [5, 6] (see Figure 1). An extension of the CUELA system also provides 3D motion recording of the upper limb (shoulder blade, shoulder joint, elbow, forearm and wrist) and of the inclination of the pelvis and the head [7, 14]. As the system runs on a miniature battery, CUELA is specifically designed for field analysis at mobile workplaces. The synchronous registration of ground reaction forces is realized using pressure-sensitive foot insoles. Each insole consists of 24 piezo-resistive hydro cells. From the ground reaction forces it is possible, by using a biomechanical model, to detect the handled load weights even during dynamic movement [5] (a simplified sketch is given after the list below). The measuring system, which can be fitted at the workplace in about 20 minutes, weighs approximately 3 kg. It can be adjusted to body size and height. Employees wearing the system can go about their work in the usual way. The measurements are additionally documented on video. By synchronizing the video recording with the measured data, it is possible to match the load readings with the actual work situation. Immediately after measurement the data can be imported into the CUELA software and displayed (see Figure 2). Using this software it is possible to display body postures at any given point in time with the aid of a three-dimensional computer-animated figure and a time-dependent graph of the measured data. At the same time, the associated work situation is automatically shown in the video sequence. After measurement it is possible to mark any actions or situations in order to highlight certain work activities and have them evaluated. The CUELA software automatically issues a series of statistical evaluations to give a quick impression of the quantified risk factors. Body angles and postures are analyzed with reference to the literature and the relevant standards:
• Extreme body angle positions and asymmetrical posture patterns (assessed in accordance with ISO and European standards and the literature, e.g. [4])
• Static postures (assessed in accordance with European standards)
• Repetitive movements (assessed in accordance with RULA [19, 14], OCRA [1, 14] and other literature [23, 17])
For each measurement, it is possible to have an OWAS (Ovako Working Posture Analysing System [16]) ergonomic analysis carried out. The software automatically identifies work postures classified in accordance with OWAS in connection with the handled weights and evaluates them statistically. As a result, the user receives a list of priorities that distinguishes between four risk classes (action categories/classes of measures). For the biomechanical assessment of manual load handling and to estimate the associated load on the spine, the measured data can be entered as input data into biomechanical human models [5]. Apart from the measured body/joint movements and forces, the models also require the subject's data, e.g. body height, length of limbs and body weight, as input variables. From this, force and torque vectors are calculated at the model's joints. For estimation of the loading on the lumbar spine, an interface to the biomechanical model "The Dortmunder" [15] exists.
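As referenced above, a quasi-static sketch of estimating a handled load from the summed insole forces follows; the published CUELA evaluation uses a full biomechanical model [5], and the sensor values here are invented.

```python
# Sketch: quasi-static estimate of a handled load from summed insole
# forces. Assumes near-static postures; the real CUELA analysis uses
# a biomechanical model that also works during dynamic movement [5].
G = 9.81  # m/s^2

def handled_load_kg(cell_forces_left, cell_forces_right, body_mass_kg):
    """cell_forces_*: 24 vertical force readings (N), one per hydro cell."""
    total_grf = sum(cell_forces_left) + sum(cell_forces_right)
    extra = total_grf - body_mass_kg * G
    return max(extra, 0.0) / G  # negative residuals treated as noise

# Example: an 80 kg subject carrying roughly 15 kg
left = [35.0] * 24   # N per cell (illustrative)
right = [4.0] * 24
print(f"{handled_load_kg(left, right, 80.0):.1f} kg")
```

The OWAS step can likewise be sketched as a mapping from sampled posture codes to action categories followed by simple statistics; the mapping function below is a stand-in, since a real analysis must use the published OWAS table [16].

```python
# Sketch: OWAS-style classification of sampled postures into action
# categories. The mapping is an illustrative heuristic, NOT the
# official OWAS table [16].
from collections import Counter

def action_category(back, arms, legs, load):
    """back 1-4, arms 1-3, legs 1-7, load 1-3 (OWAS posture code)."""
    severity = (back - 1) + (arms - 1) + (legs - 1) // 2 + (load - 1)
    return min(1 + severity // 2, 4)

# One posture code per 20 ms sample (50 Hz); a short fake recording
samples = [(1, 1, 2, 1), (2, 1, 3, 1), (4, 2, 4, 2), (1, 1, 2, 1)]
counts = Counter(action_category(*s) for s in samples)
total = sum(counts.values())
for cat in sorted(counts):
    print(f"action category {cat}: {100 * counts[cat] / total:.0f}% of samples")
```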
Fig. 2. Data visualization and assessment with the CUELA software, including video, 3D puppet and time graphs
CUELA also enables synchronous application and data acquisition with other physical and physiological measurement devices: 3D force handles, e.g. [10], force gloves, ECG, EMG [11] and whole-body vibration [13]. An interface from the CUELA software to a BGIA database allows workload data from occupational practice to be collected for evaluating workplaces and developing suitable preventive measures [3].
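One practical aspect of such synchronous acquisition is bringing auxiliary channels onto the CUELA time base. The following sketch assumes a shared clock and resamples by linear interpolation; the sampling rates are invented.

```python
# Sketch: resampling an auxiliary channel (e.g., ECG at an assumed
# 256 Hz) onto the 50 Hz CUELA sample grid by linear interpolation.
def resample(t_src, x_src, t_dst):
    """Linear interpolation of (t_src, x_src) at times t_dst."""
    out, j = [], 0
    for t in t_dst:
        while j + 1 < len(t_src) and t_src[j + 1] < t:
            j += 1
        k = min(j + 1, len(t_src) - 1)
        t0, t1, x0, x1 = t_src[j], t_src[k], x_src[j], x_src[k]
        out.append(x0 if t1 == t0 else x0 + (x1 - x0) * (t - t0) / (t1 - t0))
    return out

ecg_t = [i / 256 for i in range(256)]          # 1 s of ECG timestamps
ecg_x = [(i % 64) / 64 for i in range(256)]    # dummy waveform
cuela_t = [i / 50 for i in range(50)]          # 50 Hz grid
print(resample(ecg_t, ecg_x, cuela_t)[:5])
```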
3 Results
The CUELA method has been successfully employed in recent years to reduce health risks at numerous workplaces in a variety of industries (including the building industry, the retail trade, the energy industry, the electrical industry, the metalworking industry, the chemical industry, the textile and leather industry and nursing). Both in consultations with the companies themselves and in research projects, targeted measures have been initiated to prevent excessive loading of the musculoskeletal system at the workplace. The quantification of the loading situation before and after ergonomic modifications facilitates a precise check of the effectiveness of such preventive measures. The findings from the measurements have in several cases been converted into simple instructions with practical tips for the persons concerned. Some examples of projects involving the application of CUELA are listed in the following:
• ergonomic intervention study at sewing workplaces [7]
• assessment of pushing and pulling of trolleys aboard aircraft [10, 12]
• redesign of crane operator workplaces [3]
• whole-shift postural workload assessment in different nursing workplaces [9]
• assessment of physical activity at workplaces [24]
• combined assessment of posture and whole-body vibration [13]
• comparative assessment of dynamic office chairs [8]
Currently the hardware of the CUELA system is being updated by replacing the analog motion sensors with digital 3D inertial sensor packages [22].
References 1. Colombini, D., Occhipinti, E., Greco, A.: Risk Assessment and Management of Repetitive Movements and Exertions of the Upper Limb. In: Mital, A., Ayoub, M., Landau, K. (eds.) Elsevier Ergonomics Book Series, vol. 2. Elsevier, London (2002) 2. Ditchen, D., Ellegast, R.P.: Development of a database for the analysis of and research into occupational strains on the spinal column. In: McCabe, P.T. (ed.) Contemporary Ergonomics 2004, pp. 202–206. CRC Press LLC, Boca Raton (2004) 3. Ditchen, D., Ellegast, R.P., Herda, C., Hoehne-Hückstädt, U.: Ergonomic intervention on musculoskeletal discomfort among crane operators at waste-to-energy-plants. In: Bust, P.D., McCabe, P.T. (eds.) Contemporary Ergonomics 2005, pp. 22–26. Taylor & Francis, London (2005) 4. Drury, C.G.: A Biomechanical Evaluation of the Repetitive Motion Injury Potential of Industrial Jobs. Seminars in Occupational Medicine 2, 41–49 (1987)
5. Ellegast, R.P.: Personengebundenes Messsystem zur automatisierten Erfassung von Wirbelsäulenbelastungen bei beruflichen Tätigkeiten. BIA-Report 5/98. HVBG, Sankt Augustin (1998), http://www.dguv.de/bgia/de/pub/rep/rep02/biar0598/index.jsp 6. Ellegast, R.P., Kupfer, J.: Portable posture and motion measuring system for use in ergonomic field analysis. In: Ergon, L. (ed.) Ergonomic Software Tools in Product and Workplace Design, Ergon Stuttgart, pp. 47–54 (2000) 7. Ellegast, R., Herda, C., Hoehne-Hückstädt, U., Lesser, W., Kraus, G., Schwan, W.: Ergonomie an Näharbeitsplätzen. BIA-Report 7/2004. HVBG, Sankt Augustin (2004), http://www.dguv.de/bgia/de/pub/rep/rep04/biar0704/index.jsp 8. Ellegast, R.P., Keller, K., Hamburger, R., Berger, H., Krause, F., Groenesteijn, L., Blok, M., Vink, P.: Ergonomische Untersuchung besonderer Büroarbeitsstühle. BGIA-Report 5/2008, Deutsche Gesetzliche Unfallversicherung (DGUV), Sankt Augustin (2008), http://www.dguv.de/bgia/de/pub/rep/rep07/bgia0508/index.jsp 9. Freitag, S., Ellegast, R., Dulon, M., Nienhaus, A.: Quantitative measurement of stressful postures in nursing professions. Ann. Occup. Hyg. 51(4), 385–395 (2007) 10. Glitsch, U., Ottersbach, H.J., Ellegast, R., Hermanns, I., Feldges, W., Schaub, K., Berg, K., Winter, G., Sawatzki, K., Voß, J., Göllner, R., Jäger, M., Franz, G.: Untersuchung der Belastung von Flugbegleitern beim Schieben und Ziehen von Trolleys in Flugzeugen. BIA-Report 5/2004. HVBG, Sankt Augustin (2004), http://www.dguv.de/bgia/de/pub/rep/rep04/biar0504/index.jsp 11. Glitsch, U., Hermanns, I., Ellegast, R.P., Schüler, R., Herrmann, L.: EMG signal processor module for long-term movement analysis. In: Kalender, W., Hahn, E.G., Schulte, A.M. (eds.) Berichtsband Biomedizinische Technik, vol. 50(suppl. 1), Part 2, pp. 1440–1441. Fachverlag Schiele & Schön, Berlin (2005) 12. Glitsch, U., Ottersbach, H.J., Ellegast, R., Schaub, K., Franz, G., Jäger, M.: Physical workload of flight attendants when pushing and pulling trolleys aboard aircraft. Int. Journal of Ind. Ergonomics 37, 845–854 (2007) 13. Hermanns, I., Raffler, N., Ellegast, R., Fischer, S., Göres, B.: Simultaneous field measuring method of vibration and body posture for assessment of seated occupational driving tasks. Int. Journal of Ind. Ergonomics 38, 255–263 (2008) 14. Hoehne-Hückstädt, U., Herda, C., Ellegast, R., Hermanns, I., Hamburger, R., Ditchen, D.: Muskel-Skelett-Erkrankungen der oberen Extremität und berufliche Tätigkeit. BGIAReport 2/2007. HVBG, Sankt Augustin (2007), http://www.hvbg.de/d/bia/pub/rep/rep04/bia0207.html 15. Jäger, M., Luttmann, A., Göllner, R., Laurig, W.: Der Dortmunder - Biomechanische Modellbildung zur Bestimmung und Beurteilung der Belastung der Lendenwirbelsäule bei Lastenhandhabungen. In: Radandt, S., Grieshaber, R., Schneider, W. (eds.) Prävention von arbeitsbedingten Gesundheitsgefahren und Erkrankungen, pp. 105–124. Monade-Verlag, Leipzig (2000) 16. Karhu, O., Kansi, P., Kuorinka, I.: Correcting working postures in industry: A practical method for analysis. Appl. Ergon. 8, 199–201 (1977) 17. Kilbom, Å.: Repetitive work of the upper extremity: Part I – Guidelines for the practitioner. Int. Journal of Ind. Ergonomics 14, 51–57 (1994) 18. Li, G., Buckle, P.: Current techniques for assessing physical exposure to work-related musculoskeletal risks, with emphasis on posture-based methods. Ergonomics 42, 674–695 (1999) 19. McAtamney, L., Corlett, E.N.: RULA: a survey method for the investigations of workrelated upper limb disorders. Appl. 
Ergonomics 24, 91–99 (1993)
20. Marras, W.S., Fathallah, F.A., Miller, R.J., Davis, S.W., Mirka, G.A.: Accuracy of a three-dimensional lumbar motion monitor for recording dynamic trunk motion characteristics. Int. Journal of Ind. Ergonomics 9, 75–87 (1992) 21. Plamondon, A., Delisle, A., Larue, C., Brouillette, D., McFadden, D., Desjardins, P., Lariviere, C.: Evaluation of a hybrid system for three-dimensional measurement of trunk posture in motion. Appl. Ergonomics 38, 697–712 (2007) 22. Schiefer, C.: Development of a person centred measuring system for detecting torsion and rotation of human trunk or head. Master Thesis, University of Applied Sciences, Department of Computer Sciences, Sankt Augustin, Germany (2008) 23. Silverstein, B.A., Fine, L.J., Armstrong, T.J.: Hand wrist cumulative trauma disorders in industry. British Journal of Industrial Medicine 43, 779–784 (1986) 24. Weber, B., Wiemeyer, J., Hermanns, I., Ellegast, R.: Assessment of everyday physical activity: Development and evaluation of an accelerometry based measuring system. Int. Journal of Computer Sciences in Sport 6, 4–20 (2007)
Computational Nonlinear Dynamics Model of Percept Switching with Ambiguous Stimuli Norbert Fürstenau German Aerospace Center, Institute for Flight Guidance, Lilienthalplatz 7 D-38108 Braunschweig, Germany
[email protected]
Abstract. Simulation results of bistable perception due to ambiguous visual stimuli are presented, obtained with a nonlinear dynamics model using delayed perception–attention–memory coupling. Percept reversals are induced by attention fatigue, with an attention bias which balances the relative percept durations. Simulating periodic stimuli as a function of stimulus off-time yields the variation of the reversal rate in surprisingly good quantitative agreement with classical experimental results reported in the literature [1] when a fatigue time constant of 1–2 s is selected. Coupling the bias to the perception state introduces memory effects which are quantified through the Hurst parameter H, exhibiting significant long-range correlations (H > 0.5) in agreement with recent experimental results [2]. Percept transition times of 150–200 ms and mean percept dwell times of 3–5 s, as reported in the literature, are correctly predicted if a feedback delay of 40 ms is assumed (e.g. [21]).
Keywords: cognitive bistability, modelling, nonlinear dynamics, perception, attention, Hurst parameter.
1 Introduction
In the present work new simulation results of a nonlinear dynamics model of cognitive multistability [3] are presented. Multistable perception is the spontaneous involuntary switching of conscious awareness between the different percepts of an ambiguous stimulus. It is excited with different methods and stimuli such as binocular rivalry [5], perspective reversal, e.g. with the famous Necker cube [6][7][25], and ambiguous motion displays [8]. Bistability provides a unique approach to fundamental questions of perception and consciousness because it allows for the direct measurement of the switching of subjective perception under a constant external stimulus (e.g. [9][10][11][12][13]). Various aspects of the present model were described in previous papers [3][14][15], where results on stability, typical time scales, statistics of perceptual dominance times, and memory effects were compared with experimental results found in the literature. The present simulation results are compared with two different experiments: classical results of Orbach et al. [1][6] addressing percept stabilization due to periodic interruption of the stimulus, and recently discovered long-range correlations of the perceptual duration times [2] via determination of the self-similarity (Hurst) parameter H (> 0.5) of the dwell time series.
Concerning theoretical modeling there is an ongoing discussion on the predominance of a stochastic [16][17] versus deterministic [3][18][19] background of multistability, and on the importance of neural or attentional fatigue [6][19] versus memory effects [1][17]. The synergetic model of Ditzinger & Haken [19] is based on two separate sets of coupled nonlinear dynamics equations for the two perception state order parameters and the corresponding attention (control) parameters. According to the experimentally supported satiation (neuronal fatigue) hypothesis [6], quasiperiodic transitions between different attractor states of the perception order parameter are induced by a slow time variation of the attention (control) parameter due to perception–attention coupling. Following [19], and supported by recent experimental results in [4][25][29], the present model couples the dynamics of a macroscopic (behavioral) perception state order parameter with an adaptive attention control parameter, corresponding to feedback gain with delay and additive noise [3]. Memory effects are introduced by allowing for the adaptation of the originally constant attention bias parameter which balances the subjective preference of one of the two percepts. By including an additive attention noise term the model explains the experimental finding that deterministic as well as stochastic dynamics determines the measured reversal time statistics for different multistability phenomena. In section 2 the theoretical approach is described. Computer simulations of perception time series are presented in section 3, addressing percept stabilization with interrupted stimulus in 3.1 and predicting long-range correlations with adaptive bias under constant stimulus in 3.2. The discussion of results and the conclusion follow in section 4.
2 Theory
2.1 The Recursive Mean Field Interference Model
After reviewing important features of the present model I will add some aspects not mentioned in previous papers [3][14][15]. In agreement with the widely accepted view of reentrant synchronous interactions between distant neuronal groups within the thalamo-cortical system leading to conscious perception (e.g. [13][22][25][29]), the present model assumes superimposition of coherent fields a(Φ1(t)), b(Φ2(t)) representing the possible percepts P1, P2, and recursive processes to determine the multistable perception dynamics. Like [19] it utilizes perception–attention coupling, however within a delayed reentrant loop modulating the phase difference ΔΦ = Φ1 – Φ2, with attention identified with feedback gain [3][26], and an adaptive attention bias balancing the preference between percepts via learning and memory. This approach results in a phase dynamics ΔΦ(t) formalized by a recursive cosinusoidal mapping function. The architecture is motivated by thalamo-cortical (TC) reentrant loops as proposed within the dynamical core hypothesis of consciousness [13] and within the discussion of bottom-up and top-down aspects of visual attention [26]. The present approach is furthermore motivated by the mean field phase oscillator theory of coupled neuronal columns in the visual cortex [23]. It describes via the circle (sine) map the synchronization of neural self-oscillations as the physiological basis of dynamic temporal binding, which in turn is thought to be crucial for the selection of perceptually or
Fig. 1. Schematic of visual information flow within the thalamo-cortical system, with indication of bottom-up streams and attentional top-down modulation (black arrows) of ventral ("what") and dorsal ("where") pathways resulting in recurrent v-G-vb loops (based on [26][27]) with feedforward and reentrant delay T ≈ 40 ms [21]. Top sketch shows time scales of disambiguation process.
behaviorally relevant information [10][11][12]. Accordingly, Figure 1 depicts within a block diagram important modules of the attentionally modulated visual perception system. The diagram is based on classical brain circuit schematics (e.g. [27]) and extends a figure in [26] depicting the attentional top-down modulation of the dorsal ("where") and ventral ("what") streams of information. Within the present model it is assumed that for the emergence of the conscious percept, feedforward preprocessing of the stimulus up to the Primary Visual Cortex V1 as well as the loop via the superior colliculi can be neglected. The main processing takes place within recurrent TC-loops under covert attention (e.g. [9][22][26]). The model architecture is suggested to basically represent the ventral ("what") V2/V4–InferoTemporal (IF)–PraeFrontal (PF)–V2/V4 loop and the TC-hippocampal (memory) loop as target structure. Recent experimental evidence on perception–attention coupling with ambiguous stimuli was based on EEG recording of frontal theta and occipital alpha bands [25] and eye blink rate measurement [4]. According to Hillyard et al. [28] stimulus-evoked neuronal activity can be modified by an attentionally induced additive bias or by a true gain modulation (present model parameters vb(t) and g(t)). An increase of gain g(t) is correlated with increased blood flow through the respective cortical areas. Consequently in the present model, like in [19], the feedback gain serves as adaptive control parameter (g ∼ attention parameter G) which induces the rapid transitions between the alternative stationary perception states P1 and P2 through attention fatigue [6][19]. The reentrant coherent field superimposition yields an overdamped feedback system with a first order dynamical equation. The resulting phase oscillator equation (1) is similar to the phase attractive circle map of Kelso et al. [24]. The complete dynamics is described by three coupled equations for the perception state order parameter (phase difference v(t) = ΔΦ/π), the attention control parameter G(t), and the attention bias or preference vb(t). The full model is built upon a set of three perception–attention–memory (PAM) equations for each percept Pi,
i = 1, 2, …, n, with inhibiting (phase) coupling $-c_{ij} v_j$, $i \neq j$, in the nonlinear mapping functions, comparable to [19]:

$$\tau \dot{v}^{i}_{t+T} + v^{i}_{t+T} = G^{i}_{t}\left[1 + \mu^{i}\cos\!\left(\pi\Big(v^{i}_{t} - \sum_{j \neq i} c_{ij}\, v^{j}_{t} + v_{B}\Big)\right)\right] \qquad (1)$$

$$\dot{G}^{i}_{t} = \left(v^{i}_{b} - v^{i}_{t}\right)/\gamma + \left(G_{\mathrm{mean}} - G^{i}_{t}\right)/\tau_{G} + L_{t} \qquad (2)$$

$$\dot{v}^{i}_{b,t} = \left(v^{i}_{be} - v^{i}_{b,t}\right) M/\tau_{L} + \left(v^{i}_{t} - v^{i}_{b,t}\right)/\tau_{M} \qquad (3)$$
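To illustrate the dynamics, here is a minimal explicit-Euler sketch of the reduced (scalar) form of Eqs. (1)–(2) with a constant bias (Eq. (3) frozen), as used in section 3; the step size, noise handling and reversal-counting threshold are my simplifications, not the author's Matlab–Simulink implementation.

```python
# Sketch: explicit-Euler integration of the scalar PAM equations with
# feedback delay T. Time unit = sample time TS (= 20 ms); parameter
# values follow the text, numerical choices are mine.
import math, random

dt = 0.5                         # integration step (TS units)
T, tau = 2.0, 1.0                # feedback delay, low-pass constant
gamma, tau_G = 60.0, 500.0       # fatigue / recovery time constants
mu, v_b = 0.6, 1.5               # stimulus contrast, attention bias
J = 0.004                        # attention noise power
G_mean = 0.5 * (3 - mu) / (1 - mu**2)

delay = int(T / dt)
buf = [1.0] * (delay + 1)        # v history, oldest first
G, trace = G_mean, []
for _ in range(int(5000 / dt)):  # 5000 TS ~ 100 s simulated
    drive = G * (1 + mu * math.cos(math.pi * buf[0]))   # uses v(t - T)
    v_next = buf[-1] + dt * (drive - buf[-1]) / tau
    G += dt * ((v_b - buf[-1]) / gamma + (G_mean - G) / tau_G) \
         + math.sqrt(J * dt) * random.gauss(0.0, 1.0)   # Langevin term
    buf = buf[1:] + [v_next]
    trace.append(v_next)

# Dwell times between percepts P1 (v* ~ 1) and P2 (v* ~ 2.5)
flips = [i for i in range(1, len(trace))
         if (trace[i - 1] - 1.75) * (trace[i] - 1.75) < 0]
dwells = [(b - a) * dt * 0.02 for a, b in zip(flips, flips[1:])]  # s
if dwells:
    print("mean dwell time: %.1f s" % (sum(dwells) / len(dwells)))
```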
In the computer experiments of section 3, however, like in previous publications, for the bistable case a reduced model with a single set of PAM equations will be used. This is justified by the fact that without noise the system behavior is completely redundant with regard to the perception states i = 1, 2 (P1, P2), as will be shown in section 2.2 (see also [19]). The advantage of a reduced number of parameters has to be paid for by a slightly asymmetric behavior of the P1, P2 time series (slightly different mean dwell times with symmetric bias vb). The reduced model system behavior can be understood as follows. An ambiguous stimulus with strength I and difference of meaning μ (interference contrast 0 ≤ μ ≤ 1) of the two possible percepts P1, P2 excites two corresponding hypothetical mean fields [a1, a2] representing percept possibilities, with phase difference ΔΦ. A recurrent process is established by feedback of the output U ∼ |a1 + a2|² after amplification (feedback gain g) with delay T into ΔΦ via a hypothetical phase modulation mechanism ΔΦ = πU/Uπ = πv. As a quantitative estimate for T, the reentrant (feedback) processing delay of ≈ 40 ms within the association cortex is assumed, as mentioned by Lamme [21]. The nonlinear right-hand side of Eq. (1) describes the conventional interference between two coherent fields. In what follows I assume the phase bias vB = 0 mod 2. In agreement with Itti & Koch [26] the attention parameter G(t) ∼ κ I0 g(t) is the product of feedback gain g(t) and input (stimulus) strength I0 (= 1 in what follows). The attention dynamics is determined by the attention bias vb (determining the relative preference of P1 and P2), the fatigue time constant γ, the recovery time constant τG, and Gmean = 0.5(3 – μ)/(1 – μ²), the center between the turning points of the stationary hysteresis v*(G) (see below). Following [19], the random noise due to physically required dissipative processes is added to the attention equation G(t) as a stochastic Langevin force L(t) with band-limited white noise power Jω. The attention bias or preference dynamics dvb/dt is modelled as the sum of a learning term M(vt, vb, vbe)(vbe – vb)/τL and of a memory component (vt – vb)/τM which couples vb to the low-pass filtered perception state. Learning of an unfamiliar (weak) percept Pj is active only in the initial phase of the time series, if the Pj association is low and a fluctuation-induced jump from Pi into the weak Pj perception state occurs, switching M from 0 to 1.
2.2 Stationary Solutions and Self-oscillations
Quasiperiodic switching between two attractor states v*1(P1) and v*2(P2) emerges after a node bifurcation of the stationary solution v*(G). It evolves from a monotonic
Fig. 2. a) First order stationary solution of a single percept equation (1) with arrows indicating g–v phase space trajectories of perceptual self-oscillations (frequency fG, vertical) and externally imposed stimulus oscillation μ(t) = 0.2 ⇔ 0.6 (frequency fS, horizontal). b) Numerical solution of the full model Eqs. (1)–(3) over 3000 TS = 1 min, depicting redundancy due to the antiphase of v1, v2. The stimulus μ(t) changes at t = 1000 TS = 20 s from μ = 0.2 to μ = 0.6.
function into a hysteresis (S-shaped) ambiguous one with increasing μ, as can be seen in the first order stationary solution of Eq. (1) shown in Figure 2a). The stationary solution supports the proposed catastrophe topology of the cognitive multistability dynamics [18]. At the critical value μn = 0.18 the slope of the stationary system state v*(G) becomes infinite, with (Gn, vn) ≈ (1.5, 1.5). For μ < μn both percepts are fused into a single meaning. For μ > μn the stationary solution v*(G) becomes multivalued. For maximum contrast μ = 1 the horizontal slope dv/dG = 0 yields $v^{i}_{\infty} = 2i - 1$, i = 1, 2, 3, … as stationary perception levels for G → ∞. Figure 2b) depicts a numerical solution of the set of two coupled PAM equations with identical parameter values T = 2, τ = 1, γ = 60, τG = 500, cij = 0.1, constant attention bias vb = 1.5, and noise power Jω = 0 (time units = sample time TS = 20 ms), as obtained with a Matlab–Simulink code using the Runge-Kutta solver "ode23tb" [3][14][15]. Higher order stationary solutions yield period doubling pitchfork bifurcations [3][14][15] (not shown in Fig. 2a)) on both positive slope regions of the hysteresis curve, with the G-values of the bifurcation points converging at the chaotic boundary according to the Feigenbaum constant $\delta_{\infty} = 4.6692$. The corresponding P1-, P2-limit cycle oscillations and chaotic contributions can be seen in Figure 2b), which depicts time series of perceptual switching events of the percept vector [v1, v2] for small and large contrast parameter μ. The small-μ self-oscillations change into pronounced switching between percept-on (vi > 2) and percept-off (vi ≈ 1) with increasing contrast. In contrast to the quasiperiodic P1–P2 switching, the superimposed limit cycle oscillations (> 5 Hz) originate from the finite delay T, with the amplitudes corresponding to the pitchfork bifurcation pattern [3][15]. The linear stability analysis of Eq. (1) [15] yields eigenfrequencies β = 2πf via βτ = −tan(βT), with numerical values f/Hz = 9.1,
20.2, 32.2, 44.5 … for τ = 20 ms, T = 40 ms. This spectrum compares reasonably well with typical EEG frequencies as well as fixational eye movements as related external observables. The percept reversal time period is determined by the slow G(t) dynamics, with fatigue and recovery time constants γ, τG, leading to the quasiperiodic P1 → P2 transitions at the G-extrema. An analytic estimate for small μ of the expected perceptual self-oscillations between the stationary states v*(P1) ⇔ v*(P2) due to the v–G coupling may be obtained by combining equations (1) and (2), yielding the reversal frequency

$$f_G = f_0 \sqrt{1 - D^2} \qquad (4)$$

with eigenfrequency $\omega_0 = 1/\sqrt{\gamma(\tau + T)}$ = 3.73 rad/s, i.e. f0 = 0.59 Hz = 36 min⁻¹ or T0 = 1.7 s. The influence of the damping term can be derived after transformation of the timescale into eigentime ϑ = ω0 t, with normalized damping $D = (1 - \pi\mu G^{*})/(2\omega_0(\tau + T))$, yielding the reversal rate fD = 0.55 Hz = 33 min⁻¹, in exact agreement with the numerical solution in Fig. 2b). Although the very rough dwell time estimate for a single percept, Δ(Pi) = TG/2 = 1/(2fG), due to the low hysteresis (μ = 0.2) lies at the lower end of the typical experimental results, it nevertheless predicts the correct order of magnitude, e.g. [6][7][16][20]. The percept duration time statistics has been shown in numerous experimental investigations (e.g. [7][20][29]) and different theoretical modelling approaches ([3][19][24]) to correspond, to a reasonable approximation, to a Γ-distribution. Time series of the kind shown in Fig. 2b) obtained with the simplified (scalar) model were analyzed in previous publications [3][14][15] with respect to the relative frequencies of the perceptual duration times Δ(P1), Δ(P2). The analysis confirmed the Γ-distribution statistics of percept dwell times as a good approximation, with absolute mean values Δm of some seconds and relative standard deviation σ/Δm ≈ 0.5 [7][20].
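The fold structure of the stationary solution can be probed numerically; this sketch merely counts fixed points of the scalar stationary equation v = G[1 + μ cos(πv)] on a grid (grid resolution and G values are arbitrary choices), showing a single solution at small contrast and three solutions above the critical contrast for suitable G.

```python
# Sketch: counting stationary solutions v* of v = G[1 + mu*cos(pi*v)]
# by bracketing sign changes, to illustrate the node bifurcation
# near mu_n ~ 0.18.
import math

def fixed_points(G, mu, v_max=4.0, n=4000):
    f = lambda v: G * (1 + mu * math.cos(math.pi * v)) - v
    roots, prev = [], f(0.0)
    for i in range(1, n + 1):
        v = v_max * i / n
        cur = f(v)
        if prev * cur < 0:          # sign change -> root bracketed
            roots.append(v)
        prev = cur
    return roots

for mu in (0.1, 0.18, 0.6):
    counts = [len(fixed_points(G, mu)) for G in (1.0, 1.5, 2.0)]
    print(f"mu = {mu}: solutions at G = 1.0, 1.5, 2.0 -> {counts}")
```

And as a quick numerical check of the analytic estimates in SI units: G* is not stated explicitly in the text, so a value G* ≈ 1.33, back-solved to reproduce the quoted fD ≈ 0.55 Hz, is assumed here.

```python
# Sketch: reproducing the estimates below Eq. (4). gamma = 60 TS with
# TS = 20 ms; G* = 1.33 is an assumption (see lead-in).
import math

tau, T, gamma = 0.020, 0.040, 1.2       # seconds
omega0 = 1.0 / math.sqrt(gamma * (tau + T))
f0 = omega0 / (2 * math.pi)
mu, G_star = 0.2, 1.33
D = (1 - math.pi * mu * G_star) / (2 * omega0 * (tau + T))
fG = f0 * math.sqrt(1 - D**2)
print(f"omega0 = {omega0:.2f} rad/s, f0 = {f0:.2f} Hz, "
      f"D = {D:.2f}, fG = {fG:.2f} Hz")
```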
3 Computer Experiments
In what follows, numerical evaluations of the PAM equations in their reduced scalar form are presented for comparing theoretical predictions with a) experiments addressing fatigue suppression (or percept stabilization) with a periodically interrupted ambiguous stimulus [1][6], and b) long-range correlations within dwell time series observed under constant stimulus [2].
3.1 Perception–Attention Dynamics with Interrupted Stimulus
In this section numerical evaluations of a single set of PAM equations with periodically interrupted stimulus are presented. Figure 3 shows, for the same parameter values as Fig. 2b), the time series μ(t), G(t) and v(t) over a period of tSim = 2000 TS = 40 s, however with noise power Jω = 0.001 (noise sample time tc = 0.1), and τM = 10000, τL = 100000, i.e. an effectively constant bias. The periodically interrupted stimulus parameter (contrast) μ(t) alternates between 0.6 = stimulus-on and 0.1 = stimulus-off with ton = toff = 300 ms.
Fig. 3. Numerical evaluation of PAM-equations (reduced scalar model) for periodic stimulus with ton = toff = 300 ms. From bottom to top: Stimulus parameter μ(t) alternating between μ = 0.6 (on) and 0.1 (off), attention parameter G, perception state v(t). For details see text.
The v(t) dynamics in Fig. 3 exhibits the expected quasiperiodic transitions between the stationary perception states P1 (near v* ≈ 1) and P2 (near v* ≈ 2.5). During stimulus-on periods the expected superimposed fast limit cycle and chaotic oscillations are observed. The transition time between P1 and P2 is of the order of 8–10 TS ≈ 150–200 ms, in reasonable agreement with the time interval between stimulus onset and conscious perception [21]. Figure 4 shows model-based reversal rates 1/Δm as a function of toff.
Fig. 4. Reversal rate 1/Δm obtained from computer experiments for ton = 300 ms and 10 ms ≤ toff ≤ 800 ms (circles: 100 time series of tSim = 5000 TS/data point) and experimental values [1] (crosses)
Numerical values are determined by evaluating time series like the one in Fig. 3 with ton = 300 ms and a range of toff values corresponding to the experiments reported in [1][6]. A surprisingly good agreement is observed between model simulations and experiments, even with regard to the absolute maximum, indicating that the fatigue-induced phase-oscillator mechanism captures essential aspects of the cognitive bistability dynamics.
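A self-contained sketch of this off-time sweep, using a compact variant of the Euler stepper from section 2 with a periodically gated contrast μ(t), is given below; step size, threshold and the reversal-counting heuristic are again my choices, not the Matlab–Simulink implementation.

```python
# Sketch: reversal rate vs. stimulus off-time (t_on = 300 ms = 15 TS),
# mirroring the Fig. 4 experiment. Scalar PAM stepper, Euler, delay T.
import math, random

def reversals_per_min(t_off_ts, t_on_ts=15.0, sim_ts=5000.0):
    dt, delay = 0.5, 4                      # step; T = 2 TS -> 4 substeps
    tau, gamma, tau_G = 1.0, 60.0, 500.0
    v_b, J = 1.5, 0.001
    buf, G = [1.0] * (delay + 1), 1.5
    flips, last = 0, None
    for n in range(int(sim_ts / dt)):
        ts = n * dt
        mu = 0.6 if ts % (t_on_ts + t_off_ts) < t_on_ts else 0.1
        g_mean = 0.5 * (3 - mu) / (1 - mu**2)
        drive = G * (1 + mu * math.cos(math.pi * buf[0]))
        v = buf[-1] + dt * (drive - buf[-1]) / tau
        G += dt * ((v_b - buf[-1]) / gamma + (g_mean - G) / tau_G) \
             + math.sqrt(J * dt) * random.gauss(0.0, 1.0)
        buf = buf[1:] + [v]
        cur = v > 1.75
        if last is not None and cur != last:
            flips += 1
        last = cur
    return flips / (sim_ts * 0.02 / 60.0)   # TS = 20 ms

for t_off_ms in (10, 100, 300, 800):
    rate = reversals_per_min(t_off_ms / 20.0)
    print(f"t_off = {t_off_ms:3d} ms -> {rate:4.1f} reversals/min")
```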
3.2 Memory Effects through Adaptive Bias
In a recent analysis of perceptual dwell time statistics measured in Necker cube and binocular rivalry experiments, Gao et al. [2] detected significant long-range correlations quantified by the Hurst parameter (H > 0.5), with 0.6 < H < 0.8 for 20 subjects who indicated subjective percept switching by pressing a button. In the present model the coupling of the dynamic bias vb to the perception state leads to long-term correlations via memory effects. The left graph of Figure 5 depicts simulated subjective percept switching with dwell times Δ(P2) versus reversal number. Simulation parameters are μ = 0.6, vb0 = vbe = 1.5, T = 2 TS, τ = 0.5, γ = 60, τG = 500, Jω = 0.004, and dynamic bias (preference) time constants τM = 3000, τL = 100000. The right graph of Fig. 5 depicts the evaluation of H from 100 time series of simulation length 5000 TS, employing the log(variance(Δ(m))) vs. log(sample size m) method with var(Δ(m)) = s²·m^(2H−1) as used by Gao et al. [2]. H is determined from the slope of the regression line, with 95% confidence intervals of the parameter estimates.
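The variance–time estimation can be sketched as follows; note that the sketch uses the common aggregated-variance convention, in which the variance of block means scales as m^(2H−2) (so uncorrelated data give slope −1 and H = 0.5), which may differ in notation from the paper's formula.

```python
# Sketch: aggregated-variance (variance-time) estimate of the Hurst
# parameter from a dwell-time series; H = 1 + slope/2 on a log-log plot.
import math, random

def hurst_variance_time(x, block_sizes=(1, 2, 4, 8, 16, 32)):
    logs_m, logs_v = [], []
    for m in block_sizes:
        # non-overlapping block means of size m
        means = [sum(x[i:i + m]) / m for i in range(0, len(x) - m + 1, m)]
        if len(means) < 2:
            continue
        mu = sum(means) / len(means)
        var = sum((y - mu) ** 2 for y in means) / (len(means) - 1)
        logs_m.append(math.log(m))
        logs_v.append(math.log(var))
    n = len(logs_m)
    mx, my = sum(logs_m) / n, sum(logs_v) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(logs_m, logs_v))
             / sum((a - mx) ** 2 for a in logs_m))
    return 1 + slope / 2

iid = [random.gauss(3.5, 1.5) for _ in range(1024)]  # memoryless dwells
print("H (iid) ~= %.2f" % hurst_variance_time(iid))   # expect ~0.5
```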
Fig. 5. Left: Simulated subjective responses to percept switching depicting dwell times Δ(P2). Right: variance(m) vs. sample time (m) plot of the same simulation runs with linear fit (95% conf. intervals) for estimating H via the slope of the regression line.
It shows significant long-range correlations due to the memory effect if the time constant for the attention bias vb satisfies τM < 10000 TS = 200 s. The learning component in Eq. (3) influences the dynamics only in the initial phase, if |vbe – vb(t=0)| > 0 and only if τL < 2000. Large τL,M (vanishing memory change) represent a quasi-static preference: for τM,L > 10000 the long-range correlations vanish, with H ≈ 0.5 corresponding to a random walk process (Brownian motion).
4 Discussion and Conclusion
For the first time to our knowledge, the percept reversal rate of alternating perception states under periodic stimulus and the memory effect of an adaptive perception bias were derived by computer simulations using a single behavioral nonlinear dynamics phase oscillator model based on perception–attention–memory coupling and phase feedback. The PAM model can be mapped to a simplified thalamocortical reentrant circuit including attentional feedback modulation of the ventral stream [26]. For the
bistable case the full vector model with a set of PAM equations per perception state can be approximated by a scalar PAM model due to the redundancy of the noise-free case, at the cost of slight asymmetries between the v1, v2 time series statistics. The dynamics of the reentrant self-oscillator perception circuit is determined by a delayed adaptive gain modeling attention fatigue, with additive attention noise. The attention in turn is biased by an adaptive preference parameter coupled to the perception state for simulating memory effects. Simulated perceptual reversal rates under periodic stimulus provide surprisingly good quantitative agreement with the experimental results of Orbach et al. [1][6]. With memory time constants < 200 s, reversal time series exhibit long-range correlations characterized by a Hurst (self-similarity) parameter H > 0.5, in agreement with the experimental results of Gao et al. [2]. The present model supports the early proposal of Poston & Stewart [18] of a deterministic catastrophe topology as the basis of the perception reversal dynamics.
Acknowledgement. I am indebted to Monika Mittendorf for help with the computer experiments and to J.B. Gao and K.D. White of the Univ. of Florida for providing an early preprint of their work.
References
1. Orbach, J., Zucker, E., Olson, R.: Reversibility of the Necker Cube: VII. Reversal rate as a function of figure-on and figure-off durations. Percept. and Motor Skills 22, 615–618 (1966) 2. Gao, J.B., Merk, I., Tung, W.W., Billock, V., White, K.D., Harris, J.G., Roychowdhury, V.P.: Inertia and memory in visual perception. Cogn. Process 7, 105–112 (2006) 3. Fürstenau, N.: A computational model of bistable perception-attention dynamics with long range correlations. In: Hertzberg, J., Beetz, M., Englert, R. (eds.) KI 2007. LNCS, vol. 4667, pp. 251–263. Springer, Heidelberg (2007) 4. Ito, J., Nikolaev, A.R., Luman, M., Aukes, M.F., Nakatani, C., van Leeuwen, C.: Perceptual switching, eye movements, and the bus paradox. Perception 32, 681–698 (2003) 5. Blake, R., Logothetis, N.K.: Visual competition. Nature Reviews / Neuroscience 3, 1–11 (2002) 6. Orbach, J., Ehrlich, D., Heath, H.A.: Reversibility of the Necker Cube: An examination of the concept of satiation of orientation. Perceptual and Motor Skills 17, 439–458 (1963) 7. Borsellino, A., de Marco, A., Allazetta, A., Rinesi, S., Bartolini, B.: Reversal time distribution in the perception of visual ambiguous stimuli. Kybernetik 10, 139–144 (1972) 8. Hock, H.S., Schöner, G., Giese, M.: The dynamical foundations of motion pattern formation: Stability, selective adaptation, and perceptual continuity. Perception & Psychophysics 65, 429–457 (2003) 9. Koch, C.: The Quest for Consciousness – A Neurobiological Approach, German Translation. Elsevier, München (2004) 10. Engel, A.K., Fries, P., Singer, W.: Dynamic Predictions: Oscillations and Synchrony in Top-Down Processing. Nature Reviews Neuroscience 2, 704–718 (2001) 11. Engel, A.K., Fries, P., König, P., Brecht, M., Singer, W.: Temporal binding, binocular rivalry, and consciousness. Consciousness and Cognition 8, 128–151 (1999) 12. Srinivasan, R., Russell, D.S., Edelman, G.M., Tononi, G.: Increased synchronization of magnetic responses during conscious perception. J. Neuroscience 19, 5435–5448 (1999)
13. Edelman, G.: Wider than the Sky, pp. 87–96. Penguin Books (2004) 14. Fürstenau, N.: Modelling and Simulation of spontaneous perception switching with ambiguous visual stimuli in augmented vision systems. In: André, E., Dybkjær, L., Minker, W., Neumann, H., Weber, M. (eds.) PIT 2006. LNCS (LNAI), vol. 4021, pp. 20–31. Springer, Heidelberg (2006) 15. Fürstenau, N.: A nonlinear dynamics model of Binocular Rivalry and Cognitive Multistability. In: Proc. IEEE Int. Conf. Systems, Man, Cybernetics, pp. 1081–1088 (2003) 16. De Marco, A., Penengo, P., Trabucco, A., Borsellino, A., Carlini, F., Riani, M., Tuccio, M.T.: Stochastic Models and Fluctuations in Reversal Time of Ambiguous Figures. Perception 6, 645–656 (1977) 17. Merk, I.L.K., Schnakenberg, J.: A stochastic model of multistable perception. Biol.Cybern. 86, 111–116 (2002) 18. Poston, T., Stewart, I.: Nonlinear Modeling of Multistable Perception. Behavioral Science 23, 318–334 (1978) 19. Ditzinger, T., Haken, H.: A Synergetic Model of Multistability in Perception. In: Kruse, P., Stadler, M. (eds.) Ambiguity in Mind and Nature, pp. 255–273. Springer, Berlin (1995) 20. Levelt, W.J.M.: Note on the distribution of dominance times in binocular rivalry. Br. J. Psychol. 58, 143–145 (1967) 21. Lamme, V.A.F.: Why visual attention and awareness are different. Trends in cognitive Sciences 7, 12–18 (2003) 22. Tononi, G., Edelman, G.M.: Consciousness and Complexity. Science 282, 1846–1851 (1998) 23. Schuster, H.G., Wagner, P.A.: A Model for Neural Oscillations in the Visual Cortex: 1. Mean field theory and the derivation of the phase equations. Biol. Cybern. 64, 77–82 (1990) 24. Kelso, J.A.S., Case, P., Holroyd, T., Horvath, E., Raczaszek, J., Tuller, B., Ding, M.: Multistability and metastability in perceptual and brain dynamics. In: Kruse, P., Stadler, M. (eds.) Ambiguity in Mind and Nature, pp. 255–273. Springer, Berlin (1995) 25. Nakatani, H., van Leeuwen, C.: Transient synchrony of distant brain areas and perceptual switching in ambiguous figures. Biol. Cybern. 94, 445–457 (2006) 26. Itti, L., Koch, C.: Computational Modelling of Visual Attention. Nature Reviews Neuroscience 2, 194–203 (2001) 27. Robinson, D. (ed.): Neurobiology. Springer, Berlin (1998) 28. Hillyard, S.A., Vogel, E.K., Luck, S.J.: Sensory gain control (amplification) as a mechanism of selective attention: electrophysiological and neuroimaging evidence. In: Humphreys, G.W., Duncan, J., Treisman, A. (eds.) Attention, Space, and Action, pp. 31–53. Oxford University Press, Oxford (1999) 29. Nakatani, H., van Leeuwen, C.: Individual Differences in Perceptual Switching rates: the role of occipital alpha and frontal theta band activity. Biol. Cybern. 93, 343–354 (2005)
A Computational Implementation of a Human Attention Guiding Mechanism in MIDAS v5 Brian F. Gore1, Becky L. Hooey1, Christopher D. Wickens2, and Shelly Scott-Nash2 1
San Jose State University Research Foundation, NASA Ames Research Center, MS 262-4, Moffett Field, California, USA 2 Alion Science and Technology, 4949 Pearl East Circle, Suite 300 Boulder, Colorado, USA {Brian.F.Gore,Becky.L.Hooey}@nasa.gov, {cwickens,sscott-nash}@alionscience.com
Abstract. In complex human-machine systems, the human operator is often required to intervene to detect and solve problems. Given this increased reliance on the human in these critical human-machine systems, there is an increasing need to validly predict how operators allocate their visual attention. This paper describes the information-seeking (attention-guiding) model within the Man-machine Integration Design and Analysis System (MIDAS) v5 software - a predictive model that uses the Salience, Effort, Expectancy and Value (SEEV) of an area of interest to guide a person's attention. The paper highlights the differences between using a probabilistic fixation approach and the SEEV approach in MIDAS to drive attention.
Keywords: Human Performance Modeling, Modeling Attention, MIDAS v5, SEEV.
1 Introduction
There is a need for increased realism in human performance models (HPMs) of extreme and potentially hazardous environments. As the fidelity and realism of HPMs improve, so too does the need for integrating and using complex models of human cognition and attention. HPMs exist that incorporate basic human vision and attention models to drive how and when a human will respond to events in specific environmental contexts. Implementing these models computationally has typically taken the form of scripting a sequence of visual fixation points, and some apply a probabilistic distribution [1,2]. Few, if any, HPM attention models today operate in a closed-loop fashion, using information from the environment to drive where the operator is going to look next. As automation and advanced technologies are introduced into current operational environments, there is an increasing need to validly predict how and when a human will detect environmental events. This paper summarizes the augmentation of the information-seeking (attention-guiding) model within the Man-machine Integration Design and Analysis System (MIDAS) v5 software from a probabilistic approach to a predictive model that uses four parameters (Salience, Effort, Expectancy and Value; SEEV) to guide an operator's attention [3].
238
B.F. Gore et al.
1.1 Man-machine Integration Design and Analysis System (MIDAS)
The Man-machine Integration Design and Analysis System (MIDAS) is a dynamic, integrated human performance modeling and simulation environment that facilitates the design, visualization, and computational evaluation of complex man-machine system concepts in simulated operational environments [4,5]. MIDAS combines graphical equipment prototyping, dynamic simulation, and human performance modeling to reduce design cycle time, support quantitative predictions of human-system effectiveness, and improve the design of crew stations and their associated operating procedures. HPMs like MIDAS provide a flexible and economical way to manipulate aspects of the operator, automation, and task-environment for simulation analyses [4,5,6]. MIDAS can suggest the nature of likely pilot errors, as well as highlight precursor conditions to error such as high levels of memory demand, mounting time pressure and workload, attentional tunneling or distraction, and deteriorating situation awareness (SA). MIDAS links a virtual human, comprised of a physical anthropometric character, to a computational cognitive structure that represents human capabilities and limitations. The cognitive component is comprised of a perceptual mechanism (visual and auditory), memory (short term memory, long term working memory, and long term memory), a decision maker and a response selection architectural component. The complex interplay among bottom-up and top-down processes enables the emergence of unforeseen and non-programmed behaviors [7]. MIDAS is unique in that it can be used as a cognitive modeling tool that allows the user to obtain both predictions and quantitative output measures of various elements of human performance, such as workload and SA, and as a tool for analyzing the effectiveness of crew station designs from a human factors perspective [4]. This analysis can help point out fundamental design issues early in the design lifecycle, prior to the use of hardware simulators and human-in-the-loop experiments. In both cases, MIDAS provides an easy-to-use and cost-effective means to conduct experiments that explore "what-if" questions about domains of interest. MIDAS v5 has a graphical user interface that does not require advanced programming skills to use. Other features include dynamic visual representations of the simulation environment, support for multiple and interacting human operators, several HPM outputs (including timelines, task lists, workload, and SA), performance influencing factors (such as error predictive performance, fatigue and gravitational effects on performance), and libraries of basic human operator procedures (how-to knowledge) and geometries for building scenarios graphically (that leverage heavily from Siemens' JackTM software) [8]¹.
1.2 MIDAS Attention and Perception Model
MIDAS represents attention as a series of basic human primitive behaviors that carry with them an associated workload level determined from empirical research [9,10,11]. Actions are triggered by information that flows from the environment, through a perception model, to a selection architecture (that includes a representation of human
¹ Additional MIDAS information in [4,5] and at http://hsi.arc.nasa.gov/groups/midas/. JackTM is maintained by Siemens PLM Solutions.
attention loads), to a task network representation of the procedures that then feeds back into the environment. Actions carried out by the MIDAS operator impact the performance of the model in a closed-loop fashion. MIDAS represents perception as a series of stages that information must pass through in order to be processed. The perception model includes visual and auditory information. Visual perception in MIDAS depends on two factors – the amount of time the observer dwells on an object and the perceptibility of the observed object. The perception model computes the perceptibility of each object that falls into the operator's field of view based on properties of the observed object, the visual angle of the object and environmental factors. In the current implementation of MIDAS, perception is a three-stage, time-based perception model (undetected, detected, comprehended) for objects inside the workstation (e.g., an aircraft cockpit) and a four-stage, time-based perception model (undetected, detected, recognized, identified) for objects outside the workstation (e.g., taxiway signs on an airport surface). The model computes the upper level of detection (i.e., undetectable, detectable, recognizable, identifiable for external objects) that can be achieved by the average unaided eye if the observer dwells on it for a requisite amount of time. For example, in a low-visibility environment, the presence of an aircraft on the airport surface may be 'detectable' but the aircraft company logo on the tail might not be 'recognizable' or 'identifiable' even if he/she dwells on it for a long time.
1.3 MIDAS Probabilistic Scanning Model
MIDAS uses a probabilistic scan pattern to drive the perception model. In the current version, probabilistic scan behaviors drive the eyeball towards a particular area of interest (AOI) based on a combination of the model analysts' understanding of the operator's scan pattern and the analysts' selection of a statistical distribution of fixation times (e.g., gamma, lognormal, linear) characteristic of the specific environmental context.
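The staged, time-based detection logic described in section 1.2 can be sketched as follows for external objects; the dwell-time thresholds are invented for illustration, while in MIDAS the achievable ceiling is computed from object and scene properties.

```python
# Sketch: staged, time-based detection for objects outside the
# workstation. Thresholds are illustrative assumptions; MIDAS derives
# the achievable ceiling from visual angle, contrast, visibility, etc.
STAGES = ["undetected", "detected", "recognized", "identified"]

def perceived_stage(dwell_ms, ceiling, thresholds=(0, 100, 300, 600)):
    """ceiling: highest stage achievable for this object in this scene;
    dwell time then determines how far up the stages one gets."""
    reached = max(i for i, t in enumerate(thresholds) if dwell_ms >= t)
    return STAGES[min(reached, STAGES.index(ceiling))]

# Low-visibility example: the aircraft itself is at best 'detectable',
# so even a long dwell never yields recognition of its tail logo.
print(perceived_stage(800, ceiling="detected"))    # -> detected
print(perceived_stage(800, ceiling="identified"))  # -> identified
```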
Table 1. Visual fixation probability matrix in a model of pilot performance (see [12]). For each of eight scenario contexts (descent, approach, land, rollout, exit runway, after land check, taxi to gate, arrive at gate), the table lists the Captain's and the First Officer's probabilities of fixating each information source (Primary Flight Display, Nav Display/Elect Moving Map, left window, left-front window, right-front window, right window, Engine Indicating & Crew Alerting System, Mode Control Panel, Jepp chart, Control Display Unit); each context column sums to 1.00. For example, the Captain's descent-context probabilities are 0.20 for the Primary Flight Display, 0.20 for the Nav Display/Elect Moving Map, 0.05 for each of the four windows, and 0.10 for each of the EICAS, Mode Control Panel, Jepp chart, and Control Display Unit.
This approach requires a known scan pattern (in many cases, this requires access to eye-movement data from a human-in-the-loop simulation). Models that use probabilities to drive the scan behavior require extensive model development time in order to represent context. An aviation example from a recently completed MIDAS v5 model (for a scenario description see [12]) illustrates how this information is entered into the MIDAS architecture. The modeled pilots scan the displays and out the windows according to a probability matrix, as presented in Table 1. The probabilities were developed and verified by an experienced commercial pilot Subject Matter Expert (SME). The matrix assigns to the Captain (CA) and First Officer (FO) a probability of attending to each information source (shown in rows) for each of eight scenario contexts, or phases of flight (shown in columns).

1.4 MIDAS Implementation of the Probability Matrix

Within the model, the probability of visual fixation (location) is context-specific, as illustrated in Fig. 1. For example, during 'after land checks', the Captain primarily scans the electronic moving map (EMM) and out the window (OTW), while his/her secondary scanning is towards the Engine Indicating and Crew Alerting System (EICAS). The First Officer (FO) primarily scans the EICAS and OTW; the Primary Flight Display (PFD) and EMM are secondary. Probabilities are defined in the node to the right of the high-level task (e.g., "descent(1_68)").
Fig. 1. MIDAS implementation of the probabilistic scan pattern – the P decision node (circled) is where the analyst enters the context-specific probability values from the probability matrix
This probabilistic approach effectively drives attention when the scan behavior is known but is less suitable when an analyst is interested in predicting the scan pattern given the context of the information content in the modeled world. To address this limitation and to improve the cross-domain generalizability of the MIDAS perception and attention model, MIDAS was augmented to include the validated Salience, Effort, Expectancy, Value (SEEV) model of visual attention [13] as will be described next. 1.5 The Salience, Effort, Expectancy, Value (SEEV) Model The SEEV model began as a conceptual model to predict how visual attention is guided in dynamic large-scale environments [13]. SEEV estimates the probability of
attending, P(AOI), to an AOI in visual space as a linear weighted combination of four components (salience, effort, expectancy, and value):

P(AOI) = s·S − ef·EF + ex·EX + v·V.    (1)
Uppercase terms describe the properties of a display or environment, while the lowercase coefficients describe the weights assigned to those properties in the control of an operator's attention [14]. Specifically, the allocation of attention in dynamic environments is driven by the bottom-up capture of Salient (S) events (e.g., a flashing warning on the instrument panel) and inhibited by the Effort (EF) required to move attention (e.g., a pilot will be less likely to scan an instrument located on an overhead panel, head down, or to the side where head rotation is required, than an instrument located directly ahead on a head-up display (HUD)). The SEEV model also predicts that attention is driven by the Expectancy (EX) of seeing a Valuable (V) event at certain locations in the environment. A computational version of this model drives the eyeballs around an environment, such as the dynamic cockpit, according to the four SEEV parameters. For example, the simulated eyeball following the model will fixate more frequently on areas with a high bandwidth (and hence a high expectancy for change), as well as on areas that support high-value tasks, such as maintaining stable flight [15]. (The SEEV conceptual model has since been refined to include a "to-be-noticed event" [15,16,17].) SEEV has been under development since 2001 and has been extensively validated with empirical human-in-the-loop data from different domains [3,16].

The integration of the SEEV model into MIDAS enables dynamic scanning behavior by calculating the probability that the operator's eye will move to a particular AOI given the tasks the operator is engaged in within the multitask context. It also better addresses the allocation of attention in dynamic environments such as flight and driving tasks. A description of the implementation of the SEEV model in the MIDAS software follows.
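To make Equation (1) concrete, here is a minimal sketch (Python; the AOI names and parameter values are hypothetical) that scores each AOI with the linear SEEV combination and selects the next fixation:

```python
def p_aoi(S, EF, EX, V, s=1.0, ef=1.0, ex=1.0, v=1.0):
    """Equation (1): linear weighted SEEV combination for one AOI."""
    return s * S - ef * EF + ex * EX + v * V

def next_fixation(aois):
    """Pick the AOI with the highest SEEV score."""
    return max(aois, key=lambda name: p_aoi(*aois[name]))

# Hypothetical cockpit AOIs: (Salience, Effort, Expectancy, Value)
aois = {
    "primary flight display": (0.0, 0.1, 1.0, 1.0),    # close, high bandwidth, high value
    "nav display":            (0.0, 0.3, 0.666, 0.666),
    "overhead panel":         (0.0, 1.0, 0.333, 0.333), # head rotation required
}
print(next_fixation(aois))  # -> "primary flight display"
```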
2 Augmenting the MIDAS Visual Scan Mechanism with SEEV

In MIDAS, Effort, Expectancy, and Value are assigned values between 0 and 1, while Salience is left unconstrained. As such, Effort, Expectancy, and Value drive the human operator's eye around the displays; however, if a salient event occurs, P(AOI) may be offset by the display exhibiting the salient event until the location of the salient event has been fixated and detected. In order to integrate SEEV into MIDAS, provisions were made for the analyst to estimate values for each of the four parameters. Each will be discussed in turn.
Salience. In MIDAS, salience is associated with an event, not with a display or object. An example of salience could be a proximity indicator on the navigation display that flashes when another aircraft comes too close. That is, a cockpit display becomes salient when it is presenting an alert, but otherwise is not salient. Salience could also include the loudness of an utterance (but not its content), the flash rate of an alert, or the color of an indicator (e.g., red to indicate a failure). In
MIDAS, the time between the onset of the salient event and the time at which perception exceeds "Undetected" is reported [15,16,17]. The analyst must assess the salience of an event and provide a weight from 1 to 4. To aid this process, and in an attempt to establish a consistent set of rules to be applied across models, simple heuristics were developed: 1 = change with no luminance increase, 2 = change with luminance increase, 3 = change in position and luminance increase, 4 = repeated onsets (flashing). Fig. 2 shows how an analyst sets the salience of an event in the MIDAS software.

Fig. 2. Salience heuristics are provided to guide model development
Effort. Effort refers to the work that is required to sample the information (i.e., the distance to the AOI). Effort is the only inhibitory factor in the SEEV equation and impacts the likelihood of traveling from one AOI to another. Since MIDAS knows the location of all displays and objects in the environment, the model can compute Effort directly. In MIDAS, an Effort rating between 0 and 1 is calculated for each AOI relative to the currently fixated AOI, based on the angular difference between them: any AOI that is 90 degrees or more from the current AOI is set to the maximum (1.0), while the visual angle to any AOI that is less than 90 degrees away is divided by 90 degrees (see the sketch below).

Expectancy. Expectancy, also called bandwidth, is described as the event frequency along a channel (location). This parameter is based on the assumption that if a channel has a high event rate, people will sample it more frequently than if the event rate is lower [14]. An example is the frequent oscillation in attitude of a light plane encountering turbulence: the pilot expects the horizon line on the attitude indicator to change frequently and therefore monitors it closely. In contrast, the pilot expects the altimeter during a controlled descent to change at a constant rate and therefore has a low expectation of seeing changes in descent rate. Thus, when the rate of change is constant, the bandwidth is zero. In SEEV applications, bandwidth (event rate) is always used as a proxy for expectancy. In MIDAS, Expectancy is implemented as a SEEV primitive (Fig. 3). Different expectancy values on a given display can be set for each context, procedure, and operator. The context of events that precede the onset of a given signal will influence the likelihood that operators bring their attention to areas that are infrequently sampled.
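Before turning to how Expectancy levels are set, here is a minimal sketch of the Effort computation described above (Python; the function name and the example angles are illustrative):

```python
def effort(angle_deg):
    """MIDAS-style Effort: angular distance from the currently fixated AOI,
    divided by 90 degrees and capped at the maximum of 1.0."""
    return min(abs(angle_deg), 90.0) / 90.0

print(effort(30.0))   # 0.33 -- a nearby display
print(effort(120.0))  # 1.00 -- e.g., an overhead panel requiring head rotation
```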
Expectancy for each AOI is set by the user to 'none', 'low', 'moderate' or 'high'. When used in the SEEV equation, these levels are converted to 0, 0.333, 0.666, and 1.0, respectively. Drilling down on the SEEV Expectancy primitive in the task network reveals the setting, as shown in Fig. 3.

Fig. 3. An example of setting Expectancy for the First Officer

Value. The level of Value denotes the importance of attending to an event or task, or the cost of missing it. For example, information that is used to prevent stalling the aircraft (airspeed, attitude, angle-of-attack) is clearly more important than navigational information, such as waypoint location. The value (importance) of a display is computed as the sum, over tasks, of the product of the task value and the relevance of the display to that task [14], as illustrated in Table 2. Before the SEEV calculation is run, the task importance values are normalized between 0 and 1 (as shown by the values in Table 2) by computing the sum of all the importance values and then dividing each importance by that sum. Note that an increased weight is given to the front window when avoiding collision relative to maintaining speed and heading.

Table 2. Task value computation to determine display importance per context (the middle rows give the importance of each AOI to the task)

Task (Task Value)              Front Window         Left Window          Near PFD            Near ND
Avoid collision (.8)           .6                   .4                   0                   0
Maintain speed/heading (.2)    .1                   .1                   .4                  .4
Value of AOI                   (.8*.6)+(.2*.1)=.50  (.8*.4)+(.2*.1)=.34  (.8*0)+(.2*.4)=.08  (.8*0)+(.2*.4)=.08
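The computation in Table 2 can be expressed in a few lines. In this sketch (Python; the function and variable names are ours, not MIDAS identifiers), each AOI's value is the sum over active tasks of the normalized task importance multiplied by the AOI's relevance to that task:

```python
def display_values(task_importance, relevance):
    """Value of each AOI = sum over tasks of normalized task importance
    multiplied by the relevance of the AOI to that task (cf. Table 2)."""
    total = sum(task_importance.values())
    norm = {t: w / total for t, w in task_importance.items()}
    aois = {a for rel in relevance.values() for a in rel}
    return {a: sum(norm[t] * relevance[t].get(a, 0.0) for t in norm)
            for a in sorted(aois)}

tasks = {"avoid collision": 0.8, "maintain speed/heading": 0.2}
relevance = {
    "avoid collision":        {"front window": 0.6, "left window": 0.4},
    "maintain speed/heading": {"front window": 0.1, "left window": 0.1,
                               "near PFD": 0.4, "near ND": 0.4},
}
print(display_values(tasks, relevance))
# front window 0.50, left window 0.34, near ND 0.08, near PFD 0.08 (cf. Table 2)
```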
In MIDAS, Value is implemented using SEEV primitives that bracket sets of primitives belonging to the most relevant task. The SEEV calculation considers all tasks that are active until they are explicitly ended by a SEEV end-task primitive. For each task, an overall importance is set by the user. The user can indicate a relevance of none, low, moderate or high for each AOI; just as with Expectancy, these are converted to 0, 0.333, 0.666, and 1.0. In addition, the user can specify a none, low, moderate or high importance rating for the entire task. In Fig. 4, monitoring out the window (Front Right Window) is of high importance to the task bracketed by the "Monitoring OTW during land – FO" task set.

Fig. 4. Example of assigning the value of AOIs to a task
3 Discussion

Few computational models operate in a closed-loop manner when it comes to seeking information within the environmental context, yet for an HPM to produce valid output, it must accurately model visual attention. Two attention-guiding mechanisms within MIDAS were presented: probabilistic fixations and the SEEV approach. Probabilistic scan behaviors drive the eyeball towards a particular AOI based on a known scan pattern and a statistical distribution of fixation times. Models that use probabilities to drive the scan behavior are suitable if the analyst wants to replicate a known scan pattern, but less suitable when the analyst is interested in predicting the scan pattern given the context of information in the environment. Further, the probabilistic approach is often limited in that it does not consider dynamic changes to the environment and to the task. The SEEV method overcomes those limitations by breaking relevant flight deck display features down into four parameters (Salience, Effort, Expectancy, and Value). This approach to modeling attention is more consistent with actual human behavior and has previously been validated with
empirical human-in-the-loop data (see [14,16]). The SEEV model is also less prone to error introduced by the modeler/analyst, as it does not require adjustment of fixation probabilities each time the task or environment is changed, as the probabilistic method does.
4 Conclusion

If model results are to be relied upon for system design and evaluation, incorrectly defining visual scanning behavior, and the manner in which humans seek information when interacting in a system context, can result in devastating outcomes and system inefficiencies. The improved predictive capability of information-seeking behavior that resulted from the implementation of the validated SEEV model leaves MIDAS better suited to predict performance in complex human-machine systems.

Acknowledgments. The SEEV model integration into MIDAS v5 is the result of NASA's Aviation Safety Program's Integrated Intelligent Flight Deck Technologies (IIFDT), System Design & Analysis project. The reported modeling effort was coordinated by Dr. David C. Foyle (NASA Ames Research Center). The opinions expressed in this paper are those of the authors and do not reflect the opinions of NASA, the Federal government, Alion Science and Technology, or SJSU.
References

1. Landy, M.S.: Vision and attention for Air MIDAS (NASA Final Report NCC2-5472). New York University, Moffett Field, CA (2002)
2. Corker, K.M., Gore, B.F., Guneratne, E., Jadhav, A., Verma, S.: Coordination of Air MIDAS Safety Development Human Error Modeling: NASA Aviation Safety Program Integration of Air MIDAS Human Visual Model Requirement and Validation of Human Performance Model for Assessment of Safety Risk Reduction through the Implementation of SVS Technologies (Interim Report NCC2-1563). San Jose State University (2003)
3. Wickens, C.D., McCarley, J.M.: Applied Attention Theory. Taylor and Francis/CRC Press, Boca Raton (2008)
4. Gore, B.F.: Chapter 32: Human Performance: Evaluating the Cognitive Aspects. In: Duffy, V. (ed.) Handbook of Digital Human Modeling. Taylor and Francis/CRC Press, NJ (2008)
5. Gore, B.F., Hooey, B.L., Foyle, D.C., Scott-Nash, S.: Meeting the Challenge of Cognitive Human Performance Model Interpretability Through Transparency: MIDAS v5.x. In: The 2nd International Conference on Applied Human Factors and Ergonomics, Las Vegas, Nevada, July 14-17 (2008)
6. Hooey, B.L., Foyle, D.C.: Advancing the State of the Art of Human Performance Models to Improve Aviation Safety. In: Foyle, D.C., Hooey, B.L. (eds.) Human Performance Modeling in Aviation. CRC Press, Boca Raton (2008)
7. Gore, B.F., Smith, J.D.: Risk Assessment and Human Performance Modeling: The Need for an Integrated Approach. In: Malek, K.A. (ed.) International Journal of Human Factors of Modeling and Simulation 1(1), 119–139 (2006)
8. Badler, N.I., Phillips, C.B., Webber, B.L.: Simulating Humans: Computer Graphics, Animation, and Control. Oxford University Press, Oxford (1993)
9. McCracken, J.H., Aldrich, T.B.: Analysis of Selected LHX Mission Functions: Implications for Operator Workload and System Automation Goals (Technical Note ASI 479-024-84(b)). Anacapa Sciences, Inc. (1984)
10. Hamilton, D.B., Bierbaum, C.R., Fulford, L.A.: Task Analysis/Workload (TAWL) User's Guide Version 4.0. U.S. Army Research Institute, Aviation Research and Development Activity, Fort Rucker, AL: Anacapa Sciences, Inc. (1990)
11. Mitchell, D.K.: Mental Workload and ARL Workload Modeling Tools (ARL-TN-161). Army Research Laboratory, Aberdeen Proving Ground, MD (2000)
12. Hooey, B.L., Gore, B.F., Scott-Nash, S., Wickens, C.D., Small, R., Foyle, D.C.: Developing the Coordinated Situation Awareness Toolkit (CSATK): Situation Awareness Model Augmentation and Application. HCSL Technical Report (HCSL-08-01), NASA Ames, Moffett Field, CA (2008)
13. Wickens, C.D., Goh, J., Helleberg, J., Horrey, W., Talleur, D.A.: Attentional Models of Multi-Task Pilot Performance Using Advanced Display Technology. Human Factors 45(3), 360–380 (2003)
14. Wickens, C.D., McCarley, J.S., Alexander, A., Thomas, L., Ambinder, M., Zheng, S.: Attention-Situation Awareness (A-SA) Model of Pilot Error. In: Foyle, D., Hooey, B.L. (eds.) Human Performance Modeling in Aviation. Taylor and Francis/CRC Press, Boca Raton (2008)
15. Wickens, C.D., Hooey, B.L., Gore, B.F., Sebok, A., Koenecke, C., Salud, E.: Identifying Black Swans in NextGen: Predicting Human Performance in Off-Nominal Conditions. In: Proceedings of the 53rd Annual Human Factors and Ergonomics Society General Meeting, San Antonio, TX, October 19-23 (2009)
16. Gore, B.F., Hooey, B.L., Wickens, C.D., Sebok, A., Hutchins, S., Salud, E., Small, R., Koenecke, C., Bzostek, J.: Identification of NextGen Air Traffic Control and Pilot Performance Parameters for Human Performance Model Development in the Transitional Airspace. NASA Final Report, ROA 2007, NRA # NNX08AE87A, SJSU, San Jose (2009)
17. McCarley, J., Wickens, C.D., Steelman, K., Sebok, A.: Control of Attention: Modeling the Effects of Stimulus Characteristics, Task Demands, and Individual Differences. NASA Final Report, ROA 2007, NRA NNX07AV97A (2007) (in prep.)
Towards a Computational Model of Perception and Action in Human Computer Interaction Pascal Haazebroek and Bernhard Hommel Cognitive Psychology Unit & Leiden Institute for Brain and Cognition Wassenaarseweg 52, Leiden, 2333 AK The Netherlands {PHaazebroek,Hommel}@fsw.leidenuniv.nl
Abstract. The evaluation and design of user interfaces may be facilitated by using performance models based on cognitive architectures. A recent trend in Human Computer Interaction is an increased focus on the perceptual and motor-related aspects of interaction. With respect to this focus, we present the foundations of HiTEC, a new cognitive architecture based on recent findings on the interaction between perception and action in the domain of cognitive psychology. This approach is contrasted with existing architectures.

Keywords: Cognitive Architecture, Perception, Action, HCI, action effect learning, PDP, connectionism.
1 Introduction

The evaluation and design of user interfaces often involves testing with human subjects. However, sometimes this is too expensive, impractical or plainly impossible. In these cases, usability experts often resort to analytical evaluation driven by their intuition rather than by empirically obtained findings or quantitative theory. In these situations, computational models of human performance can provide an additional source of information. When applied appropriately, such models can interact with user interfaces and mimic the user in this interaction, yielding statistics that enable the usability engineer to quantitatively compare alternative interface designs or to locate possible bottlenecks in human computer interaction. As more and more aspects of our lives become 'computerized', even small improvements that slightly facilitate user interaction can scale up to large financial benefits for organizations. In addition, using computational models of human performance may contribute to deeper insights into the mechanisms underlying human computer interaction.

Usually, models of human performance are task-specific instances of a more generic framework: a cognitive architecture [1]. Such an architecture (e.g., ACT-R, [2]; SOAR, [3]; EPIC, [4]) describes the overall structure and basic principles of human cognition, covering a wide range of human cognitive capabilities (e.g., attention, memory, problem solving and learning). Recently, the focus in Human Computer Interaction is no longer only on purely cognitive aspects, but also on the perceptual and motor aspects of interaction. Computers, mobile phones, interactive toys and other devices are increasingly
equipped with advanced displays and controls, such as direct-manipulation GUIs, touch screens, multi-function keys, et cetera, that allow for user interfaces that draw on a rich body of real-world perceptual-motor experience in the human user [5]. To account for perceptual and action-related interactions, some cognitive architectures have extended their coverage from primarily cognitive processes to perceptual processing and response execution (e.g., EPIC; ACT-R/PM, [6]). Although these approaches have been shown to be quite successful in modeling human performance in a number of specific tasks, they are still too limited to explain more general phenomena that are relevant in the perception-action domain of cognitive psychology.

In this paper, we first examine some existing cognitive architectures and discuss their characteristics with respect to a number of challenging findings from cognitive psychology. Next, we present and describe the characteristics of our HiTEC model for perception and action [7]. Finally, we discuss its promise as a cognitive architecture for digital human modeling in HCI.
2 Cognitive Architectures

A cognitive architecture can be characterized as a broad theory of human cognition based on a wide selection of human experimental data [1]. Whereas traditional research in cognitive psychology tends to focus on specific theories of a very limited range of phenomena, cognitive architectures are attempts to integrate these theories into computer simulation models. Apart from their potential to compare and contrast various theoretical accounts, cognitive architectures can be useful in creating models for an applied domain like HCI, which requires users to employ a wide range of cognitive capabilities, even in very simple tasks. Cognitive architectures define an overall structure and general principles. To model a specific task, certain aspects (e.g., prior knowledge, the task goal) need to be filled in by a cognitive engineer. Only then may 'running' the architecture result in interactions comparable to human behavior.

The best-known cognitive architectures (e.g., SOAR, EPIC, ACT-R) are theoretically based on the Model Human Processor, the seminal work of Card, Moran, and Newell [8]. According to this theoretical model, the human 'processor' is composed of three main modules: perception, cognition and action. It describes cognitive processing as a cyclic, sequential process from stimulus perception to cognitive problem solving to response execution. Note that this closely resembles the seven-stage model [9] often used to explain human behavior in HCI: (1) users perceive the state of the world, (2) users interpret their perception, (3) users form evaluations based on these interpretations, (4) users match these evaluations against their goals, (5) users form an intention to act, (6) users translate this intention into a sequence of actions and (7) users execute this action sequence. Executing an action sequence subsequently results in a change in the world state, which can again be perceived in stage 1.

Traditionally, cognitive architectures were developed to model the middle, cognitive steps of this sequence. It is assumed that the first steps, perceiving and interpreting the world state, are performed relatively easily. The main focus is on comparing the world state with a goal state and deciding which action to take next in order to achieve the goal state. It is further assumed that once an action is chosen, its execution is easy, leading to a predictable new world state.
The core mechanism used by most architectures is a production rule system. A production rule defines the translation of a pre-condition into an action that is known to produce a desired post-condition; such rules can be read as "IF (x) THEN (y)" statements (see the sketch below). By specifying a set of production rules, a cognitive architecture can be given prior knowledge, resulting in tendencies to choose those actions that eventually realize certain goals. When a cognitive architecture endowed with a set of production rules is put in interaction with an environment, conflicts between rules or unexpected conditions may present themselves. Some cognitive architectures have means to cope with these situations; for example, SOAR has a learning mechanism that can learn new production rules [1]. By assuming a set of production rules, a cognitive architecture also assumes a set of action alternatives. However, when a user is interacting with a physical or virtual environment, it is often unclear which actions can be performed. In certain contexts, users may not readily detect all action opportunities, and action alternatives may differ in their availability, leading to variance in behavior [10]. This is hard to capture in a cognitive architecture that assumes a predefined set of (re)actions.

With the increased interest in the perceptual and action-related aspects of human computer interaction, some of the architectures have been extended with perceptual and motor modules that allow for the modeling of aspects related to 'early' and 'late' stages of the interaction cycle as well. For example, EPIC contains not only production rules, which define the behavior of the cognitive processor, but also some perceptual-motor parameters, which define the time courses of (simulated) perceptual information processing and (simulated) motor action [4]. Importantly, perceptual processing is modeled as a computation of 'additional waiting time' before the production rules can be applied. By defining certain parameters in the model, this waiting time can, for instance, vary for different modalities. Similarly, EPIC does not simulate actual motor movement, but computes the time it would take for a particular motor output to be produced after the cognitive processor has sent the action instruction. This time course depends on specified motor features and the current state of the motor processor (i.e., the last movement it has prepared). ACT-R/PM is a recent extension of the ACT-R cognitive architecture; it allows a modeler to include time estimates of perceptual and motor processes in a similar fashion as EPIC [1].

In sum, cognitive architectures typically maintain a perception-cognition-action flow of information, where the focus of the modeling effort is primarily on cognitive (i.e., problem-solving) aspects. New extensions of some of the leading architectures allow modelers to include perceptual and motor aspects, but this is typically limited to approximations of the time needed to perceive certain features and to produce certain movements.
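A minimal sketch of such a production rule system (Python; the working-memory keys and actions are hypothetical, not taken from SOAR, EPIC or ACT-R):

```python
# Each rule pairs a pre-condition over working memory with an action.
rules = [
    (lambda wm: wm.get("stimulus") == "red",   "press_left_key"),
    (lambda wm: wm.get("stimulus") == "green", "press_right_key"),
]

def fire_first_matching(working_memory):
    """Return the action of the first rule whose IF-part matches."""
    for condition, action in rules:
        if condition(working_memory):
            return action
    return None  # impasse: no rule matches

print(fire_first_matching({"stimulus": "red"}))  # -> press_left_key
```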
3 Cognitive Psychology

Existing cognitive architectures are generally based on findings in cognitive psychology, mainly inspired by studies on problem solving and decision making. However, recent findings in the perception-action domain of cognitive psychology may shed new light on the assumptions of existing cognitive architectures. In the following
we discuss a number of these effects. Subsequently, we describe the Theory of Event Coding, which aims at integrating these effects into a single meta-theoretical framework and is the main theoretical basis of the HiTEC architecture described in the next section.

Stimulus Response Compatibility. When studying the design of computer interfaces, Simon [11] accidentally discovered that spatial responses (e.g., pressing a left or right key) to non-spatial stimulus features (e.g., color or shape) are faster if the stimulus location corresponds to the response location. This effect has come to be known as the Simon effect. It suggests that while the only specified task 'rules' are "IF red THEN left" and "IF green THEN right", the non-specified 'rules' "IF left THEN left" and "IF right THEN right" are apparently active as well. Clearly, a cognitive architecture that incorporates perceptual and action-related processes needs to explain this type of automatic stimulus-response interaction in a natural way.

Action Influences Perception. Recent experiments have shown [12] that if people prepare a manual grasping or reaching action, they detect and discriminate target stimuli in an unrelated interleaved task faster if these targets are defined on feature dimensions that are relevant for the planned action (e.g., shape and size for grasping, color and contrast for reaching). This finding suggests that action planning can influence object perception. It challenges the traditional view of a strictly sequential flow of information from perceptual stages to stages of action planning and execution.

Learning Action Alternatives. Various studies, including research on infants, show that people are capable of learning the perceptual effects of actions and of subsequently using this knowledge to select an action in order to achieve these effects [13]. In this way, initially arbitrary actions may become the very building blocks of goal-directed action. This principle could introduce a more grounded notion of goal-directedness in a cognitive architecture than merely responding with a set of reactions.

3.1 Theory of Event Coding

To account for various types of interaction between perception and action, Hommel, Müsseler, Aschersleben, and Prinz [14] formulated the Theory of Event Coding (TEC). In this meta-theoretical framework they proposed a level of common representations, where stimulus features and action features are coded by means of the same representational structures: 'feature codes'. Feature codes refer to distal features of objects and events in the environment, such as distance, size and location, but on a remote, descriptive level, as opposed to the proximal features that are registered by the senses. Second, at this common-codes level, stimulus perception and action planning are considered to be similar processes; both involve activating and integrating feature codes into complex structures called 'event files'. Third, action features refer to the perceptual consequences of a motor action; when an action is executed, its perceptual effects are integrated into an event file, an action concept. Following the Ideomotor theory [15], one can plan an action by anticipating the features belonging to this action concept. As a result, actions can be planned voluntarily by intending their perceptual effects. Finally, TEC stresses the role of task
context in stimulus and response coding. In particular, feature codes are "intentionally weighted" according to the action goal at hand.

In order to computationally specify the mechanisms proposed in TEC and to validate its principles and assumptions by means of simulations, we are developing the HiTEC architecture [7]. HiTEC is a generic architecture that can be used to define more specific computational models of human perception and action control, and that can serve as a starting point for a cognitive architecture in digital human modeling for HCI. In the following, we describe the HiTEC architecture in terms of its structures and processes and discuss how it incorporates the above-mentioned psychological effects.
4 HiTEC

The Theory of Event Coding provides a number of constraints on the structure and processes of the HiTEC cognitive architecture. First, we describe the general structure of HiTEC and its representations. Next, we describe the processes operating on these representations, following the two-stage model of the acquisition of voluntary action control [16].

4.1 HiTEC's Structure and Representations

HiTEC is architected as a connectionist network model that uses the basic building blocks of parallel distributed processing (PDP, [17]). In a PDP network model, processing occurs through the interactions of a large number of interconnected elements called units or nodes. During each update cycle, activation propagates gradually through the nodes. In addition, connections between nodes may be strengthened or weakened, reflecting long-term associations between nodes. In HiTEC, the elementary nodes are codes, which can become associated. As illustrated in Fig. 1, codes are organized into three main systems: the sensory system, the motor system and the common coding system. Each system will now be discussed in more detail.

Sensory System. The human brain encodes perceived objects in a distributed fashion: different features are processed and represented by different brain areas. In HiTEC, different perceptual modalities (e.g., visual, auditory, tactile, proprioceptive) and different dimensions within each modality (e.g., visual color and shape, auditory location and pitch) are processed and represented in different sensory maps. Each sensory map is a module containing a number of sensory codes that are responsive to specific sensory features (e.g., a specific color or a specific pitch). Note that Fig. 1 shows only a subset of sensory maps; models based on the HiTEC architecture may include other sensory maps as well.

Motor System. The motor system contains motor codes, referring to proximal aspects of specific movements (e.g., right index finger press, left hand power grasp, et cetera). Although motor codes could also be organized in multiple maps, in the present version of HiTEC we consider only one basic motor map with a rudimentary set of motor codes.
Fig. 1. HiTEC Architecture
Common Coding System. According to TEC, both perceived events and events that are generated by action are coded in one common representational domain [14]. In HiTEC, this is implemented as a common coding system that contains common feature codes. Feature codes refer to distal features of objects (e.g., global location in the scene, overall object color, size, et cetera) as opposed to the proximal features coded by the sensory codes and motor codes. Feature codes may be associated with both sensory codes and motor codes. They can combine information from different modalities and are in principle unlimited in number. TEC assumes that feature codes are not fixed, but that they emerge by extracting regularities from sensorimotor experiences. For example, as a result of frequently using the left hand to grasp an object perceived in the left visual field, the distal feature code 'left' may emerge, which codes both for left-hand actions and for objects perceived in the left visual field. As a result, feature codes gradually evolve and change over time.

Associations. In HiTEC, codes can become associated, both for the short term and for the long term. Short-term associations between feature codes reflect that these codes 'belong together in the current task or context' and that their binding is actively maintained in working memory. In Fig. 1, these temporary bindings are depicted as dashed lines. Long-term associations can be interpreted as learned connections reflecting prior experience; these associations are depicted as solid lines in Fig. 1.

Event Files. Another central concept of TEC is the event file [18]. In HiTEC, the event file is modeled as a structure that temporarily associates feature codes that 'belong together in the current context' in working memory. An event file serves both
the perception of a stimulus as well as the planning of an action. When multiple events are present in working memory, choosing between these events (e.g., deciding between different action alternatives) is reflected by competition between the associated event files. This competition is computationally modeled by means of negative associations between event files, depicted as solid lines with filled disk ends in Fig. 1.

4.2 HiTEC's Processes

Following Elsner and Hommel's [16] two-stage model of the acquisition of voluntary action, we now describe the HiTEC processes that enable the learning of action alternatives. Next, we discuss how HiTEC allows for action- and task-mediated perception as well as stimulus-response compatibility. (A toy implementation of the two stages is sketched at the end of this subsection.)

Stage 1: Acquiring Action-Effect Associations. Feature codes are perceptually grounded representations, since they are derived by abstracting regularities in activations of sensory codes. Associations between feature codes and motor codes reflect acquired knowledge of action-effect contingencies: motor codes m_i are activated, either because of some already existing action-effect associations or simply randomly (e.g., an infant trying out some buttons on an interactive toy). This leads to a change in the environment (e.g., pressing a button produces a sound), which is registered by sensory codes s_i. Activation propagates from sensory codes towards feature codes f_i. Eventually, these feature codes are integrated into an event file e_i, which acts as an action concept. Subsequently, the cognitive system learns associations between the feature codes f_i belonging to this action concept and the motor code m_i that just led to the executed motor action. The weights of these associations depend on the activation of the motor code and the feature code. Crucially, this allows the task context to influence the learning of action effects by moderating the activation of certain feature codes. Due to this top-down moderation, task-relevant features (e.g., button look and feel) are weighted more strongly than task-irrelevant features (e.g., lighting conditions in the room). Nonetheless, this does not exclude task-irrelevant but very salient action effects from becoming involved in strong associations as well.

Stage 2: Using Action-Effect Associations. Once associations between motor codes and feature codes exist, they can be used to select and plan voluntary actions. By anticipating desired action effects, feature codes become active. Now, by integrating the feature codes into an action concept, the system can treat the features as constituting a desired state and propagate their activation towards associated motor codes. Initially, multiple motor codes m_i may become active, as feature codes typically connect to multiple motor codes. However, some motor codes will have more associated features that are also part of the active action concept, and some of the m_i–f_i associations may be stronger than others. Therefore, in time, the network will converge towards a state where only one code m_i is strongly activated, which will lead to the selection of that motor action. In addition to the mere selection of a motor action, feature codes also form the actual action plan that specifies (in distal terms) how the action should be executed (e.g., global button location). This action plan is kept active in working memory, allowing
the system to monitor, evaluate and adjust the actual motor action. Crucially, action alternatives can be learned and selected in terms of their perceptual effects.

Task Preparation. In human-computer interaction, users may have tendencies to respond differently to different stimulus elements. To model this, different event files are created and maintained for the various options (e.g., choosing among buttons that produce different sounds). Due to the negative links between these event files, they will compete with each other during the task.

Perception and Action. When the environment is perceived, sensory features activate a set of feature codes. Activation propagates towards one or more event files (that were formed during task preparation), and competition takes place between these event files. Simultaneously, activation propagates from event files to action-effect features and motor codes, resulting in the execution and control of motor action. Note that task preparation already sensitizes feature codes both for the to-be-perceived stimuli and for the to-be-planned responses. Therefore, the cognitive system is biased towards perceiving elements in the environment and anticipating responses in terms of these feature codes. As the common feature codes are used for both perception and action, perceptual coding can influence action coding and vice versa.

Stimulus Response Compatibility. When feature codes for perceived elements and anticipated responses overlap, stimulus-response compatibility effects can arise: if a stimulus element activates a feature code (e.g., the picture of an animal) that is also part of the event file representing the correct response (e.g., the sound of that animal), this response is activated more quickly, yielding faster reactions. If, on the other hand, the feature code activated by the stimulus element is part of the incorrect response, this increases the competition between the event files representing the correct and incorrect response, resulting in slower reactions.
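The sketch below (Python) gives a deliberately simplified, toy rendering of the two stages under stated assumptions: associations are plain scalar weights updated by a simple learning rule, and the toy environment, code names and parameters are ours, not part of HiTEC:

```python
import random

class TwoStageSketch:
    """Toy action-effect learning: motor codes become linked to the
    feature codes of their perceived effects (stage 1), and actions are
    later selected by activating desired effect features (stage 2)."""

    def __init__(self, motor_codes, feature_codes, lr=0.2):
        self.motor_codes = list(motor_codes)
        self.lr = lr
        self.w = {m: {f: 0.0 for f in feature_codes} for m in motor_codes}

    def stage1_learn(self, environment, trials=100, task_weight=None):
        # Random exploration; effect activations are weighted top-down
        # by task relevance before strengthening motor-feature links.
        task_weight = task_weight or {}
        for _ in range(trials):
            m = random.choice(self.motor_codes)
            for f, act in environment(m).items():
                target = act * task_weight.get(f, 1.0)
                self.w[m][f] += self.lr * (target - self.w[m][f])

    def stage2_select(self, desired_features):
        # Propagate activation from desired effect features to motor codes.
        return max(self.motor_codes,
                   key=lambda m: sum(self.w[m][f] for f in desired_features))

# Hypothetical toy environment: two buttons, each producing a tone and a light.
def toy_env(motor_code):
    if motor_code == "press_left":
        return {"high_tone": 1.0, "left_light": 0.8}
    return {"low_tone": 1.0, "right_light": 0.8}

model = TwoStageSketch(["press_left", "press_right"],
                       ["high_tone", "low_tone", "left_light", "right_light"])
model.stage1_learn(toy_env)
print(model.stage2_select({"high_tone"}))  # -> press_left
```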
5 Discussion

We discussed existing cognitive architectures and highlighted their limitations with respect to a number of psychological findings that are highly relevant to HCI. We subsequently described our HiTEC architecture and discussed how these findings can be explained from HiTEC's basic structures and processes.

Like other cognitive models, HiTEC consists of perception, motor and cognitive modules. However, in contrast to the sequential architecture of existing models, the modules in HiTEC are highly interactive and operate in parallel. Perception of a stimulus does not need to be completed before an action plan is formed (as suggested by linear stage models). Furthermore, the cognitive module contains common codes that are used for encoding perceived stimuli as well as anticipated actions. Actions are represented as motor programs in the motor module, but they are connected to their (learned) perceptual action effects (e.g., a resulting visual effect or the haptic sensation of a key press), as proposed by Ideomotor theory [15]. The way in which tasks are encoded in HiTEC (by using competing event files) shows similarities to a system where multiple production rules compete for 'firing'.
However, it is important to note that action alternatives are selected on the basis of their distal feature effects, rather than on the basis of their proximal, motor characteristics. We acknowledge that HiTEC, in its current incarnation, is not yet capable of the rich body of simulations that other cognitive architectures have demonstrated. Indeed, we do not exclude a production rule component in future versions, but we emphasize that the core of HiTEC consists of perception-action bindings. The strengths of HiTEC lie primarily in its ability to learn perceptual action effects in a principled way and to use these effects for action selection and control. This naturally results in a mechanism that enables stimulus-response translation on an abstract level, allowing the system to generalize over different but similar perceptual features, both in object perception and in action planning. This flexibility may avoid creating a 'production rule' for each and every minute variant that the system may encounter, and may thus increase the system's robustness against variability in perception and action. Of course, extending our simulations to environments and tasks that can currently be simulated by other architectures requires more intricate techniques to actually learn the sensorimotor contingencies (i.e., feature codes) that we now assume. Also, we have discussed only a situation where a simple set of task rules was predefined; further research is necessary to assess the role of long-term memory and motivational influences in this respect.

With the rise of new HCI environments, such as mobile devices, virtual reality and augmented reality, HCI increasingly resembles interaction in the physical world. This trend stresses the importance of studying the implications of findings in the perception-action domain for the field of HCI.

Acknowledgments. Support for this research by the European Commission (PACOPLUS, IST-FP6-IP-027657) is gratefully acknowledged.
References

1. Byrne, M.D.: Cognitive architecture. In: Jacko, J.A., Sears, A. (eds.) Human-Computer Interaction Handbook, pp. 97–117. Erlbaum, Mahwah (2008)
2. Anderson, J.R.: Rules of the mind. Erlbaum, Hillsdale (1993)
3. Newell, A.: Unified theories of cognition. Harvard University Press, Cambridge (1990)
4. Kieras, D.E., Meyer, D.E.: An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Human-Computer Interaction 12, 391–438 (1997)
5. Welsh, T.N., Chua, R., Weeks, D.J., Goodman, D.: Perceptual-Motor Interaction: Some Implications for HCI. In: Jacko, J.A., Sears, A. (eds.) The Human-Computer Interaction Handbook: Fundamentals, Evolving Techniques, and Emerging Applications, pp. 27–41. Erlbaum, Mahwah (2008)
6. Byrne, M.D., Anderson, J.R.: Perception and action. In: Anderson, J.R., Lebiere, C. (eds.) The Atomic Components of Thought, pp. 167–200. Erlbaum, Hillsdale (1998)
7. Haazebroek, P., Hommel, B.: HiTEC: A computational model of the interaction between perception and action (submitted)
8. Card, S.K., Moran, T.P., Newell, A.: The psychology of human-computer interaction. Erlbaum, Hillsdale (1983)
9. Norman, D.A.: The Design of Everyday Things. Basic Books, New York (1988)
10. Kirlik, A.: Conceptual and Technical Issues in Extending Computational Cognitive Modeling to Aviation. In: Proceedings of Human-Computer Interaction International, pp. 872–881 (2007)
11. Simon, J., Rudell, A.: Auditory S-R compatibility: The effect of an irrelevant cue on information processing. Journal of Applied Psychology 51, 300–304 (1967)
12. Fagioli, S., Hommel, B., Schubotz, R.I.: Intentional control of attention: Action planning primes action-related stimulus dimensions. Psychological Research 71, 22–29 (2007)
13. Hommel, B., Elsner, B.: Acquisition, representation, and control of action. In: Morsella, E., Bargh, J.A., Gollwitzer, P.M. (eds.) Oxford Handbook of Human Action, pp. 371–398. Oxford University Press, New York (2009)
14. Hommel, B., Muesseler, J., Aschersleben, G., Prinz, W.: The theory of event coding (TEC): A framework for perception and action planning. Behavioral and Brain Sciences 24, 849–937 (2001)
15. James, W.: The principles of psychology. Dover Publications, New York (1890)
16. Elsner, B., Hommel, B.: Effect anticipation and action control. Journal of Experimental Psychology: Human Perception and Performance 27, 229–240 (2001)
17. Rumelhart, D.E., Hinton, G.E., McClelland, J.L.: A general framework for parallel distributed processing. In: Rumelhart, D.E., McClelland, J.L., The PDP Research Group (eds.) Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press, Cambridge (1986)
18. Hommel, B.: Event files: Feature binding in and across perception and action. Trends in Cognitive Sciences 8, 494–500 (2004)
The Five Commandments of Activity-Aware Ubiquitous Computing Applications Nasim Mahmud, Jo Vermeulen, Kris Luyten, and Karin Coninx Hasselt University – tUL – IBBT, Expertise Centre for Digital Media, Wetenschapspark 2, B-3590 Diepenbeek, Belgium {nasim.mahmud,jo.vermeulen,kris.luyten, karin.coninx}@uhasselt.be
Abstract. Recent work demonstrates the potential for extracting patterns from users' behavior as detected by sensors. Since there is currently no generalized framework for reasoning about activity-aware applications, designers can only rely on existing systems for guidance. However, these systems often use a custom, domain-specific definition of activity pattern. Consequently, the guidelines designers can extract from individual systems are limited to the specific application domains of those systems. In this paper, we introduce five high-level guidelines or commandments for designing activity-aware applications. By considering the issues outlined in this paper, designers will be able to avoid common mistakes inherent in designing activity-aware applications.
1 Introduction

In recent years, researchers have demonstrated the potential for extracting patterns from users' behavior by employing sensors [1-3]. There are various applications for detecting the user's activities. Systems such as FolderPredictor [4] and Magitti [5] offer suggestions to users to assist them in their current activity. Other work has used activity recognition to provide awareness of people's activities to improve collaboration [6, 7] or the feeling of connectedness within groups [8]. Furthermore, a recent trend is to employ sensors to predict the user's interruptibility, allowing computers to be more polite and to interact with users in an unobtrusive way [9-11].

As there is currently no generalized framework for reasoning about activity-aware applications, designers can only rely on existing systems for guidance. However, these systems often use a custom, domain-specific definition of activity pattern. For example, the Whereabouts Clock focuses on the user's location to determine their general activity [8], while FolderPredictor only takes the user's desktop activity into account [4]. Consequently, the guidelines designers can extract from individual systems are limited to that system's specific application domain. Although there are existing, focused design frameworks that deal with background and foreground interaction [12], employing sensors [13-15], or allowing users to intervene when a system acts on their behalf [16, 17], it is hard for designers to come to a generalized body of design knowledge for activity-aware systems by integrating each of these frameworks.
In this paper, we introduce five high-level guidelines or commandments that need to be addressed when designing activity-aware applications. The main contribution of this framework is that it allows designers to avoid common mistakes inherent in designing activity-aware applications, regardless of the targeted application domain and activity recognition technologies. We combine and generalize existing models in a way that is convenient for designers. Just as for designing desktop applications, designers need general guidelines that they can rely on. The availability of such a set of guidelines is a critical factor in moving activity-aware applications beyond research prototypes and into practical applications. Our hope is that this work is another step towards a generalized body of design knowledge for activity-aware systems. Our own work on applications for detecting users' activities with sensors, and a broad study of existing activity-aware systems and design frameworks, led us to develop the following five commandments for activity-aware systems:

1. View activity patterns in context;
2. Don't view a user's activities in isolation, but in their social context;
3. Deal with hierarchical reuse of patterns;
4. Take uncertainty into account at different levels of abstraction;
5. Allow users to intervene and correct the system.
In the following, each commandment is explained in detail together with a motivating example.
2 View Activity Patterns in Context

Human activity does not consist of isolated actions; it is rooted in context. As discussed by Suchman [18], people's behavior is contextualized, i.e., the situation is a very important factor in determining what people will do. It is not possible to generalize and predict people's behavior without considering the situation they are in at that time. It is therefore important that designers consider activities in context. Context consists of many aspects, including but not limited to the time of day, the location, or the presence of other people (see commandment 2).

In activity-aware systems, an important aspect is how the elements inside a pattern are temporally related to each other. The time of occurrence of an element in a pattern, its duration, and the time interval between two such elements are important issues to consider. The elements might form a sequence, occur concurrently, or have a more complex temporal relationship with each other, such as the ones described by Allen [19] (illustrated in the sketch at the end of this section). It is important to take these temporal relationships into account in order to correctly identify different activity patterns.

Another use for time as context information is in supporting continuous interaction, as described by Abowd et al. [20]. Although desktop computers allow for multitasking, computer systems still largely expect users to do their tasks on a single machine, and to do one task after another (although they can switch quickly from one window to another). Abowd et al. state that this assumption will not be valid for ubiquitous computing. In real life, people regularly start new activities, stop doing others, and do activities concurrently (e.g., watching television while ironing clothes).
For a practical method of implementing these features, we refer to Patterson et al. [21], who evaluated several techniques for detecting activities and explain how to support the detection of concurrent and interruptible activities.
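As a small illustration of the temporal relationships mentioned above, the sketch below (Python) classifies a few of Allen's interval relations [19] between two timed pattern elements; the relation names follow Allen's terminology, while the function itself is our illustrative simplification:

```python
def allen_relation(a, b):
    """Classify the temporal relation between two intervals (start, end)."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"            # a ends before b starts
    if a2 == b1:
        return "meets"             # a ends exactly when b starts
    if a1 < b1 < a2 < b2:
        return "overlaps"          # concurrent for part of the time
    if b1 <= a1 and a2 <= b2:
        return "during or equals"  # a happens within b
    return "other"

ironing = (0, 40)
watching_tv = (10, 90)
print(allen_relation(ironing, watching_tv))  # -> "overlaps"
```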
3 Don't View a User's Activities in Isolation, But in Their Social Context

As we discussed in commandment 1, human activity is highly dependent on context. An important aspect of context is social context: the people who are in the user's surroundings. A user's activity patterns might be different when he is alone than when he is in a group. What's more, they might differ according to who he is with at that time (e.g., his colleagues, his friends or his family). Suppose Jim has a pattern which consists of playing his favorite music when he gets home from work. This pattern might not be applicable to the situation where he comes home and there are guests in the house.

Besides the fact that social context is important for classifying patterns, analyzing the patterns of groups of people might help to identify common patterns across people. Someone might only once enter a seminar room and turn off the lights when a speaker is going to give a talk, making it impossible to detect this as a pattern for that individual; however, several people might perform this activity in a common pattern. On the other hand, some patterns are specific to certain people (e.g., taking extra strong black coffee from the vending machine), and might not necessarily hold for this person's colleagues.
4 Allow Hierarchical Reuse of Patterns

Activity Theory (AT) defines an operation as an atomic procedure. A set of operations forms an action, while a set of actions forms an activity [22]. The level of granularity can be determined according to what the sensors allow the system to detect. For example, in a GUI environment, operations could be keyboard and mouse events, an action could be "selection", and the activity could be "select a file". A set of actions may be reoccurring and can form an activity pattern. Existing patterns might themselves be used again in another, higher-level pattern (e.g., making coffee could be a sub-pattern in the morning routine pattern). While the terminology is not clearly defined (making coffee could become an action while it was previously an activity), the point we want to make here is that hierarchical patterns are a natural way of describing real-life activities, as the sketch below illustrates. According to AT [22], interaction between human beings and the world is organized into subordinated hierarchical levels. Designers should make sure that activity patterns can be organized in a hierarchy.
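A minimal way to represent such hierarchical reuse is a recursive pattern structure; in this sketch (Python; the pattern names are the paper's examples, the data structure is our illustrative choice), the same 'make coffee' pattern can be reused as a sub-pattern of a higher-level routine:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Pattern:
    """An activity pattern; leaves are atomic operations, inner nodes reuse sub-patterns."""
    name: str
    children: List["Pattern"] = field(default_factory=list)

make_coffee = Pattern("make coffee", [
    Pattern("fill water tank"), Pattern("press brew button"),
])
morning_routine = Pattern("morning routine", [
    make_coffee,                 # hierarchical reuse of an existing pattern
    Pattern("read e-mail"),
])

def leaves(p):
    """Flatten a pattern into its atomic operations."""
    return [p.name] if not p.children else [x for c in p.children for x in leaves(c)]

print(leaves(morning_routine))  # ['fill water tank', 'press brew button', 'read e-mail']
```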
5 Take Uncertainty into Account at Different Levels of Abstraction

An activity-aware system should take uncertainty into account at different levels of abstraction. Activities are in general detected by aggregating the values of one or
more sensors and reasoning about these values to detect patterns. Sensors can be pieces of hardware (e.g. a temperature sensor), software (e.g. the user's schedule obtained from their online calendar) or a combination of the two (e.g. a GPS beacon).

At the lowest level, uncertainty can occur due to imprecision in the data generated by sensors. A sensor could be faulty, or its readings might need to be smoothed over time to get correct results. For example, the user's distance to a display could be detected with a Bluetooth beacon attached to the display. The sensor would retrieve the Received Signal Strength Indication (RSSI) value from the user's Bluetooth-enabled phone to the beacon. When the RSSI value is low (around zero), the user is standing close to the display; otherwise he is further away. However, this value fluctuates considerably, as shown in Fig. 1, where the phone is simply lying on a desk. The system should deal with this uncertainty by smoothing the sensor readings.

At an intermediate level, there can be uncertainty in pattern recognition. This type of uncertainty can have several causes. It could be caused by inadequate sensors that prohibit certain parts of the user's state from being detected (e.g. the user's emotional state). Another cause could be insufficient training data for the recognition algorithm. For example, systems that are designed to be used in public spaces might not be able to learn a lot about their users, since most people will only use the system once. Uncertainty at an intermediate level can furthermore be caused by user actions that the designers of the system did not take into account. As an example of this kind of uncertainty, consider an anti-carjacking device that is automatically triggered when the driver exits the car with the engine still running. The system will then, after a certain time, automatically disable the car's engine, close the doors and sound an alarm. The motivation for these actions is to make the car unusable when the driver has been forced to exit the vehicle against his will. Now suppose Jim is using his car to deliver a local community magazine in the neighborhood. At each house, he parks his car next to the road with the engine running, steps out to drop the magazine in the mailbox, and gets back in the car. When he gets to his friend Tom's house, he parks
Fig. 1. Uncertainty at the lowest level: imprecision in data generated by a Bluetooth RSSI distance sensor
his car on Tom's driveway. After dropping the magazine in Tom's mailbox, he sees Tom working in the garden and goes over to chat with him. The engine of Jim's car is still running, but he feels his car is safe on Tom's driveway; besides, they are both nearby. About a minute later, however, the engine of Jim's car suddenly shuts down, the doors close and a loud alarm starts blaring. Jim is unable to enter his car and has to explain the situation to the authorities who arrive soon after. This embarrassing situation occurred because the designers of the anti-carjacking device did not take into account that a driver might deliberately step out of the car and leave the engine running for more than two minutes.

Finally, at the highest level, there might be uncertainty in the user's mental model of the system. This contributes to the discrepancy between what the user expects (their mental model of the system), what the system can sense through its sensors (e.g. a positioning system might not be as accurate as the user expects, leaving him wondering why his movements go undetected), and what is desired (what is needed for the application), as discussed by Benford et al. [14]. They argue that by explicitly analyzing mismatches between the expected, the sensed, and the desired, designers can detect problems resulting from these mismatches early on, as well as find opportunities to exploit them.

An interesting practical activity-aware application that deals with uncertainty is described by Patterson et al. [21]. Their system allows fine-grained activity recognition by analyzing which objects users are manipulating, by means of Radio Frequency Identification (RFID) tags attached to kitchen objects and a glove equipped with an RFID reader that is worn by the user. Their system is resilient against intermediate-level uncertainty: it can recognize activities even when different objects are used to perform them (e.g. using a table spoon instead of a cooking spoon for making oatmeal).

An important way to deal with uncertainty is to involve the user. The system could allow users to override its actions at any time, thereby offering a safety net for when the system's actions would be inappropriate. When a system is uncertain, it might also ask the user for confirmation. In an ideal case, computation would be split between humans and computers, letting each perform the tasks they do best [23]. The next commandment discusses user intervention in activity-aware systems.
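Returning for a moment to the lowest level, the smoothing mentioned for the RSSI example can be as simple as a moving average; in the sketch below, the window size and proximity threshold are invented values for illustration only.

# Sketch of the smoothing suggested for the Bluetooth RSSI example above;
# window size and threshold are assumptions, not values from the paper.
from collections import deque

class SmoothedProximity:
    def __init__(self, window: int = 10, near_threshold: float = 5.0):
        self.readings = deque(maxlen=window)   # keep the last N samples
        self.near_threshold = near_threshold

    def update(self, rssi: float) -> bool:
        """Feed one raw RSSI sample; return True if the user seems near."""
        self.readings.append(rssi)
        avg = sum(self.readings) / len(self.readings)
        # An RSSI around zero means the phone is close to the beacon.
        return abs(avg) < self.near_threshold

sensor = SmoothedProximity()
for raw in [1.2, -0.8, 9.5, 0.3, -1.1]:        # noisy samples, one outlier
    near = sensor.update(raw)
print("user near display:", near)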
6 Allow Users to Intervene and Correct the System

Users should be able to correct the activity-aware system when it makes a mistake. The system is bound to make mistakes, as it is impossible to detect the user's activity correctly in every possible situation. General activity recognition might be classified as an AI-complete problem [24], which means that a solution to this problem would presuppose the availability of a solution to the general problem of intelligence. Since it is inevitable that the system will make mistakes, there should be a way for users to correct the system, so that they can stay in control. As discussed by Bellotti et al. [13], the lack of communication between a system that employs sensors and its users is an important problem. Because these systems often both provide fewer means of feedback and take more actions on behalf of the user than desktop systems do, the user will quickly feel out of control.
Dey et al. [17] introduce the term mediation to refer to the dialogue that takes place between user and system to handle ambiguities in context. They describe four guidelines for mediation in context-aware systems:

1. Applications should provide redundant mediation techniques to support more natural and smooth interactions;
2. Applications should facilitate providing input and output that are distributed both in space and time to support input and feedback for mobile users;
3. Interpretations of ambiguous context should have carefully chosen defaults to minimize user mediation, particularly when users are not directly interacting with a system;
4. Ambiguity should be retained until mediation is necessary for an application to proceed.

Guideline 1 deals with providing several redundant levels of interaction for user input and system feedback. These could, for example, range from most implicit to most explicit, depending on the user's attention and level of engagement in the task. Guideline 2 points out that communication between system and user should take into account the space through which the user is moving, and have a timeout period after which a user might no longer have the chance to interact with a mediator. Guidelines 3 and 4 state that mediation should be used with care, so as not to distract the user unnecessarily.

To allow corrections to be made in a timely fashion, systems should make clear what they perceive of the user and what (automated) actions they are performing or going to perform, and of course provide a way to undo or correct actions. Ju et al. [16] discuss three interaction techniques in their implicit interaction framework that cover these requirements: user reflection (making clear what the system perceives of the user); system demonstration (showing what actions the system is performing or going to perform); and override (providing a handle for the user to correct the system).

To illustrate the necessity of mediation, we discuss an experience of incorrect system behavior that one of the authors had when visiting a house of the future. This exhibit demonstrated how recent technology trends such as context-awareness could influence our life in the future and might make our homes smart. The author missed his train and arrived a bit too late. He had to enter the seminar room when the first talk had already started, which was very annoying since the only entrance was at the front of the room, next to the lecturer. As if this was not enough, the smart room automatically turned on the lights when he entered, leaving no chance of an unremarkable entrance. This experience clearly illustrates that activity-aware systems need mediation to ensure that users remain in control and do not get frustrated. If we applied the three interaction techniques of Ju et al. [16] to this scenario, the system might indicate that it senses the user entering the seminar room (user reflection), announce that it is going to turn on the lights (system demonstration), and - most importantly - provide the user with an opportunity to cancel this action (override).

The authors believe that designers of activity-aware systems should always keep user intervention in mind. Both the guidelines for mediation [17] and the implicit interaction techniques [16] are useful to take into account for this purpose.
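A minimal sketch of the override technique is given below; the class and the grace period are our own hypothetical construction, meant only to show how an announced action can be cancelled before it fires.

# Sketch of the override idea of Ju et al. [16]: the system announces a
# pending action and gives the user a short window to cancel it.
import threading

class OverridableAction:
    def __init__(self, description, perform, grace_seconds=5.0):
        self.description = description
        self.timer = threading.Timer(grace_seconds, perform)

    def announce(self):
        # System demonstration: tell the user what is about to happen.
        print(f"About to: {self.description} (press cancel to override)")
        self.timer.start()

    def cancel(self):
        # Override: the user keeps control by stopping the pending action.
        self.timer.cancel()
        print(f"Cancelled: {self.description}")

action = OverridableAction("turn on the seminar room lights",
                           lambda: print("Lights on."))
action.announce()
action.cancel()   # e.g. triggered by a user who wants to enter discreetly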
7 Conclusion

In this work we have presented five commandments for activity-aware ubiquitous computing applications, intended as high-level design guidelines for application designers. They are meant to help designers avoid common mistakes when designing activity-aware systems. Existing activity-aware applications often use a domain-specific definition of activity pattern, and thus do not provide a generalized body of knowledge that application designers can rely on. This work is a step towards generalized guidelines for the application designer. The commandments suggest that designers consider a pattern of activity in context; consider the user's activities in their social context; allow for hierarchical reuse of patterns; take uncertainty into account at different levels of abstraction; and allow users to intervene and correct the system when it makes mistakes.

For designing desktop applications, there are guidelines that allow designers to use off-the-shelf knowledge. This is not yet the case for activity-aware ubiquitous computing applications. Generalized guidelines for designing these applications are therefore necessary to move them from lab prototypes into commercial development. In our ongoing work, we are developing an activity-aware ubiquitous computing system that takes the five commandments into account. In future work, we want to validate our guidelines in practical settings.
Acknowledgements

Part of the research at EDM is funded by ERDF (European Regional Development Fund) and the Flemish Government. Funding for this research was also provided by the Research Foundation -- Flanders (F.W.O. Vlaanderen, project CoLaSUE, number G.0439.08N).
References

1. Begole, J.B., Tang, J.C., Hill, R.: Rhythm modeling, visualizations and applications. In: Proceedings of the 16th annual ACM symposium on User interface software and technology. ACM, Vancouver (2003)
2. Eagle, N., Pentland, A.: Reality mining: sensing complex social systems. Personal Ubiquitous Comput. 10, 255–268 (2006)
3. Philipose, M., Fishkin, K.P., Perkowitz, M., Patterson, D.J., Fox, D., Kautz, H., Hahnel, D.: Inferring Activities from Interactions with Objects. IEEE Pervasive Computing 3, 50–57 (2004)
4. Bao, X., Herlocker, J.L., Dietterich, T.G.: Fewer clicks and less frustration: reducing the cost of reaching the right folder. In: Proceedings of the 11th international conference on Intelligent user interfaces. ACM, Sydney (2006)
5. Bellotti, V., Begole, B., Chi, E.H., Ducheneaut, N., Fang, J., Isaacs, E., King, T., Newman, M.W., Partridge, K., Price, B., Rasmussen, P., Roberts, M., Schiano, D.J., Walendowski, A.: Activity-based serendipitous recommendations with the Magitti mobile leisure guide. In: Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems. ACM, Florence (2008)
6. Tullio, J., Goecks, J., Mynatt, E.D., Nguyen, D.H.: Augmenting shared personal calendars. In: Proceedings of the 15th annual ACM symposium on User interface software and technology. ACM, Paris (2002)
7. Isaacs, E.A., Tang, J.C., Morris, T.: Piazza: a desktop environment supporting impromptu and planned interactions. In: Proceedings of the 1996 ACM conference on Computer supported cooperative work. ACM, Boston (1996)
8. Brown, B., Taylor, A.S., Izadi, S., Sellen, A., Kaye, J.J., Eardley, R.: Locating Family Values: A Field Trial of the Whereabouts Clock. In: Krumm, J., Abowd, G.D., Seneviratne, A., Strang, T. (eds.) UbiComp 2007. LNCS, vol. 4717, p. 354. Springer, Heidelberg (2007)
9. Gibbs, W.W.: Considerate Computing. Scientific American 292, 54–61 (2005)
10. Fogarty, J., Hudson, S.E., Atkeson, C.G., Avrahami, D., Forlizzi, J., Kiesler, S., Lee, J.C., Yang, J.: Predicting human interruptibility with sensors. ACM Trans. Comput.-Hum. Interact. 12, 119–146 (2005)
11. Horvitz, E., Koch, P., Apacible, J.: BusyBody: creating and fielding personalized models of the cost of interruption. In: Proceedings of the 2004 ACM conference on Computer supported cooperative work. ACM, Chicago (2004)
12. Buxton, B.: Integrating the Periphery and Context: A New Taxonomy of Telematics. In: Proceedings of Graphics Interface 1995 (1995)
13. Bellotti, V., Back, M., Edwards, W.K., Grinter, R.E., Henderson, A., Lopes, C.: Making sense of sensing systems: five questions for designers and researchers. In: Proceedings of the SIGCHI conference on Human factors in computing systems: Changing our world, changing ourselves. ACM, New York (2002)
14. Benford, S., Schnädelbach, H., Koleva, B., Anastasi, R., Greenhalgh, C., Rodden, T., Green, J., Ghali, A., Pridmore, T., Gaver, B., Boucher, A., Walker, B., Pennington, S., Schmidt, A., Gellersen, H., Steed, A.: Expected, sensed, and desired: A framework for designing sensing-based interaction. ACM Trans. Comput.-Hum. Interact. 12, 3–30 (2005)
15. Hinckley, K., Pierce, J., Horvitz, E., Sinclair, M.: Foreground and background interaction with sensor-enhanced mobile devices. ACM Trans. Comput.-Hum. Interact. 12, 31–52 (2005)
16. Ju, W., Lee, B.A., Klemmer, S.R.: Range: exploring implicit interaction through electronic whiteboard design. In: Proceedings of the ACM 2008 conference on Computer supported cooperative work. ACM Press, San Diego (2008)
17. Dey, A.K., Mankoff, J.: Designing mediation for context-aware applications. ACM Trans. Comput.-Hum. Interact. 12, 53–80 (2005)
18. Suchman, L.: Plans and Situated Actions: The Problem of Human-Machine Communication. Cambridge University Press, Cambridge (1987)
19. Allen, J.F.: Maintaining knowledge about temporal intervals. Commun. ACM 26, 832–843 (1983)
20. Abowd, G.D., Mynatt, E.D.: Charting past, present, and future research in ubiquitous computing. ACM Trans. Comput.-Hum. Interact. 7, 29–58 (2000)
21. Patterson, D.J., Fox, D., Kautz, H., Philipose, M.: Fine-grained activity recognition by aggregating abstract object usage. In: Proceedings of the Ninth IEEE International Symposium on Wearable Computers, pp. 44–51 (2005)
22. Leontiev, A.N.: Activity, Consciousness and Personality. Prentice-Hall, Englewood Cliffs (1978)
23. Horvitz, E.: Reflections on Challenges and Promises of Mixed-Initiative Interaction. AI Magazine 28, 19 (2007)
24. Mallery, J.C.: Thinking about foreign policy: Finding an appropriate role for artificial intelligence computers. In: The 1988 Annual Meeting of the International Studies Association, St. Louis, MO (1988)
What the Eyes Reveal: Measuring the Cognitive Workload of Teams

Sandra P. Marshall

Department of Psychology, San Diego State University, San Diego, CA 92182
and EyeTracking, Inc., 6475 Alvarado Rd., Suite 132, San Diego, CA 92120
[email protected]
Abstract. This paper describes the measurement of cognitive workload using the Networked Evaluation System (NES). NES is a unique network of coordinated eye-tracking systems that allows monitoring of groups of decision makers working together in a single environment. Two implementations are described. The first is a military application with teams of officers working together on a simulated joint relief mission, and the second is a fatigue study with teams of individuals working together in a simulated lunar search and recovery mission. Keywords: eye tracking, pupil dilation, cognitive workload, team assessment.
1 Introduction

Many activities require teams of individuals to work together productively over a sustained period of time. Sports teams exemplify this, with players relying on each other to maintain vigilance and alertness to the changing circumstances of the game. Other types of teams also require vigilance and alertness to detail, and often do so under life-threatening circumstances: medical teams, SWAT teams, or first responder teams. Each team depends upon the good performance of all its members, and weaknesses in any one of them will change the way the team performs. For instance, sometimes one team member is overloaded and cannot perform his or her duties quickly enough, so the entire team slows down; sometimes a team member loses sight of the situation and makes an error, so the entire team needs to compensate; and sometimes a team member is fatigued and cannot function effectively, so the other members need to assume more responsibility.

It is not always immediately evident when a team member is experiencing difficulty. All too often, the first indication is a major error that occurs when the team member reaches the critical point of being seriously impaired (either overloaded or fatigued). Early indication of such problems is clearly desirable but difficult to achieve.

This paper describes a networked system for evaluating cognitive workload and/or fatigue in team members as they perform their tasks. The system uses eye-tracking data to create a non-intrusive method of workload evaluation. The paper has three parts: the first describes the system itself and how data are collected, the second describes assessing cognitive workload in teams of military officers as they determine
how to share resources, and the third describes evaluation of performance and fatigue in a NASA study.
2 The Networked Evaluation System (NES)

The Networked Evaluation System, hereafter called NES, is a unique network of coordinated eye-tracking systems that allows monitoring of groups of decision makers working together in a single environment. Two versions have been developed and tested. One uses lightweight head-mounted optics and the other uses unobtrusive, remote eye-tracking cameras to monitor each individual's eyes. Each system then synthesizes data from all subjects in real time to enable the comparison of attention level and cognitive workload of all team members. The end product is a functional state-of-the-art eye-tracking network that can produce information in real time about all the team members collectively as well as individually.

The head-mounted NES utilizes the SR Research EyeLink II, a binocular eye-tracking device that samples at 250 Hz. The remote NES utilizes the Tobii X120, also a binocular eye-tracking device, with a sampling rate of 120 Hz. Both eye trackers provide excellent data for eye position (horizontal and vertical pixels) and pupil size. In both configurations, each eye tracker is controlled by GazeTrace™ software from EyeTracking, Inc., which in turn produces the workload measure before feeding it to the central CWAD server software (also produced by EyeTracking, Inc.) for data synchronization and integration [4].

Both NES systems capture the same data: the location of each eye in terms of horizontal and vertical position on the display, and the size of each pupil. A primary difference between the two systems is that the head-mounted system records data every 4 msec while the remote system records data every 8.33 msec. The data are transformed by the central processing unit of the NES into more conventional eye-tracking metrics such as blinks, fixations, and measures of vergence. The pupil data are also transformed into the Index of Cognitive Activity (ICA), a patented metric which assesses the level of cognitive workload experienced by an individual [2, 3]. Altogether, these metrics may then be combined to provide estimates of cognitive state [1, 4]. In particular, they are useful for examining whether an individual is overloaded, fatigued, or functioning normally.

All eye-tracking systems in either NES are interconnected by a private computer network. GazeTrace software controls the eye trackers, instructing them first to calibrate and then to start collecting data. In real time, the GazeTrace software computes the ICA workload measure and sends it to the CWAD server, where it is synchronized into a database with eye and workload data from the other eye trackers in the session.
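The synchronization step can be pictured with the generic sketch below; this is our own illustration of merging time-stamped streams, not the actual GazeTrace or CWAD implementation, and the sample values are placeholders.

# Generic sketch (not the actual CWAD software): merge per-tracker
# (timestamp, workload) streams into one time-aligned view.
import heapq

def tagged(tracker, samples):
    """Label each (timestamp, workload) sample with its tracker id."""
    return ((t, tracker, w) for (t, w) in samples)

def synchronize(streams):
    """Merge per-tracker streams, each sorted by timestamp, globally."""
    return heapq.merge(*(tagged(tr, s) for tr, s in streams.items()))

streams = {
    "SCC": [(0.0, 0.31), (0.5, 0.35)],   # (seconds, workload measure)
    "ISR": [(0.1, 0.62), (0.6, 0.58)],
}
for t, tracker, workload in synchronize(streams):
    print(f"t={t:.1f}s  {tracker}: workload={workload}")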
3 Assessing Cognitive Workload Level

The research reported here was conducted under the Adaptive Architectures for Command and Control (A2C2) Research Program sponsored by the Office of Naval
Research. It was conducted at the Naval Postgraduate School in Monterey, CA. Researchers from San Diego State University and the Naval Postgraduate School collaborated to carry out the study. The primary purpose of the study was to examine how team members work together to overcome limitations, changes, or problems that arise during a mission. Three-person teams worked together in scenarios created within the Distributed Dynamic Decision-Making Simulation (DDD), a simulation system that allows multiple computers to interface and display coordinated screens for a defined environment. The general focus of the study was the Expeditionary Strike Group (ESG), a relatively new military concept of organization that unites several different commands into a single unit that can move rapidly in response to problem situations. In the ESG simulation, the decision makers are given a mission, a set of predefined mission requirements, and information about assets that they control individually. Working together, they formulate plans of action and execute those plans to accomplish the overall mission objective. Examples of simulations in the DDD environment involve humanitarian assistance, disaster relief and maritime interdiction. The simulation was designed to foster interactions among three specific positions in the ESG: Sea Combat Commander (SCC), Marine Expeditionary Unit (MEU), and Intelligence, Surveillance and Reconnaissance Commander or Coordinator (ISR). Seven three-person teams of officers participated in the study. Each team member was assigned a position (SCC, MEU, or ISR) which he or she maintained throughout the entire study. Each team participated in four two-hour sessions. The first two sessions were training sessions designed to familiarize the officers with the simulation software, the general outline of the mission, and the specifics of their own roles as decision makers. The third session, while primarily designed as a training session, was also a valuable source of data. During this session, the teams worked through two scenarios in the DDD simulation. The first scenario was a training scenario. The second was a new scenario designed to test the team’s understanding of the situation. This same second scenario was then repeated during the fourth and final session under different test conditions. Thus, we had direct comparisons between the third and fourth sessions. The fourth session was designed to be the major source of experimental data. When team members arrived for this session, they were told that many of their assets (e.g., helos, UAVs, etc.) used in the previous sessions were no longer available to them. Consequently, they faced the necessity to decide among themselves how to cover the tasks required in the mission under these reduced conditions. The most obvious way to do this was to combine their individual resources and to share tasks. Teams reached consensus about how to work together by discussing the previous scenarios they had seen and describing how they had utilized their individual assets. They then created a written plan to detail how they expected to work together and to share responsibilities. Finally, they repeated the same scenario that was used at the end of the third session and implemented their new plan of cooperation. Thus, the research design allowed direct comparison of team behavior under two conditions: autonomous task performance and coordinated task performance. 
Under the first condition, each team member was free to select a task objective and to pursue it without undue deliberation or constraint by the actions of other team members. Under the second condition, team members were forced to communicate their plans to
the other team members so that they could allocate the necessary resources in a timely fashion. Many mission objectives required actions to be taken by two or sometimes three team members simultaneously. If one team member did not deploy a specific asset in a timely fashion, the mission objective would not be achieved.

This design proved to be extremely valuable in examining how cognitive workload changed from one condition to the other. The underlying simulations were identical; thus the same events occurred at the same time and we could monitor how the teams responded to them. For each run during the third and fourth experimental sessions, all team members were monitored using the unobtrusive networked eye-tracking system. Data consisted of all eye movements of each team member, pupil size for both left and right eyes measured at 120 Hz, and a video overlay of eye movements on the simulation screen. Each simulation run lasted 20-30 minutes.

Several unexpected problems were encountered during data collection. First, some team members assumed extreme positions to the left or right of the computer display, leaning heavily on one elbow as they looked at the screen. They were not viewable by the eye-tracking cameras while doing so, and data were lost temporarily while they maintained this position. Second, a few of the officers were unable to read the very small print on the display and had to lean forward to within a few inches of the screen to read messages. The eye-tracking cameras could not keep them in focus during these times and these data were also lost.

Examples of workload results are shown in the following figures. Workload was measured by the Index of Cognitive Activity (ICA), a metric based on changes in pupil dilation [2, 5]. The ICA was computed every 30 seconds to show how workload changed over time during the scenarios. Figure 1 illustrates the difference in cumulative workload for two positions, SCC and ISR, on two different simulations, session 3 and session 4. Each graph shows the two scenarios for the SCC member of a team as well as the same two scenarios for the ISR member of the same team. Teams are identified by letter. For Teams B and D, ISR experienced higher workload than SCC throughout most of the scenario. The cumulative plots shown here begin to rise more steeply for ISR than SCC by the end of 5 minutes (10 observation points). It is interesting to note that the ISR Coordinators in Teams B and D experienced higher workload than the ISR Commander in Team G.

A key objective of the study was to understand how the workload of the various team members changed when they had reduced assets and were forced to coordinate their activities. It was expected that workload would rise as assets were reduced. Figure 2 shows the results for three teams under the two conditions. Surprisingly, some SCCs had lower workload under the reduced-asset condition. This unexpected result was explained during the teams' follow-up discussions, in which these officers volunteered that they had had difficulty keeping all the assets moving around efficiently under the full-asset condition. Thus, by reducing the number of assets they had to manage, they experienced lower workload even though they had to interact more with their team members.
Fig. 1. Cumulative workload for three teams
Fig. 2. Original Scenario versus Reduced Assets Scenario: SCC (original is solid line and reduced is dotted line)
4 Assessing Cognitive State

This study examined several different psychophysiological measures of task difficulty and subject fatigue. Only the eye-tracking data are described here. The study was led by Dr. Judith Orasanu, NASA Ames Research Center, and involved collaboration between several research groups at NASA Ames and EyeTracking, Inc.

The task required 5 team members to work together to solve a series of lunar search and recovery problems. It also allowed each individual to score points, so that the individual was working not only for the good of the team but was also trying to maximize his or her own points. Multiple versions of the task were employed and were presented in the following order: Run1 (Moderate), Run2 (Difficult), Run3 (Difficult), Run4 (Moderate), Run5 (Difficult), Run6 (Moderate).

Eye data were recorded for three participants during six experimental runs, with each run lasting 75 minutes. The three participants were part of a larger 5-person team who were jointly tasked with manning 4 lunar vehicles plus the base station. We eyetracked two operators of lunar vehicles (code named RED and PURPLE) as well as the base operator (BLACK). They were tested six times over the course of a 24-hour period, during which time they were sleep deprived. Each participant worked in a separate small room and communicated with other team members through a common view of the lunar landscape on the computer display and through headsets. The head-mounted Networked Evaluation System (NES) was used in this study, with each participant undergoing a brief calibration prior to each run. The eye data and workload were then sent in real time to the central processing CWAD server, where all data were time stamped and synchronized for subsequent analysis.

A large quantity of data was collected. For each participant on the experimental task of interest, we have a total of 450 minutes of data (6 runs x 75 minutes), which is 27,000 seconds or 13,500,000 individual time points (taken every 4 msec). The data were subsequently reduced to 1-minute intervals by averaging the variables across successive 60-second windows. Seven eye-data metrics were created: the Index of Cognitive Activity (ICA) for both eyes, blink rates for both eyes, fixation rates for both eyes, and vergence. All variables were transformed by the hyperbolic tangent function to produce values ranging from -1 to +1. These seven metrics have been employed successfully in the past to examine the cognitive states of individuals in diverse situations including solving math problems, driving a car (simulator), and performing laparoscopic surgery.

The six runs were performed by the subjects as three sets of two runs, with each set containing a moderate run and a difficult run. The first set occurred in the first few hours of the study when the subjects were not fatigued; the second set occurred under moderate levels of fatigue; and the third set occurred during the last few hours of the study when the subjects experienced severe levels of fatigue. A patented process based on linear discriminant function analysis was carried out for each subject in each of the three sets to determine whether the eye data were sufficient for predicting task difficulty.

The first analysis compared Run 1 with Run 2 to determine if the eye metrics are sufficient for distinguishing between the two levels of task difficulty. The linear
discriminant function analysis (LDFA) determined the linear functions that best separated the 1-minute time intervals (75 per run) into two distinct categories for each participant. Classification rates were very high, with 85%, 96%, and 100% success rates for BLACK, RED, and PURPLE respectively. The eye metrics clearly distinguish between the initial moderate and difficult scenarios. It is possible to estimate from a single minute of performance whether the individual was carrying out the easier task or the more difficult one. The analysis of the middle set of runs (runs 3 and 4, made under moderate fatigue) also shows successful discrimination between the two levels of task difficulty, with success rates of 99%, 92%, and 90%. And the analysis of the third set of runs (runs 5 and 6, made under extreme fatigue) shows similar but slightly lower success rates of 85%, 95%, and 86%.

Looking across all three sets, it is evident that the eye metrics distinguish between the two levels of the scenario whether participants are alert (first set), moderately fatigued (second set), or very fatigued (third set). The lowest classification rate was 85%, meaning that the eye metrics correctly identified at least 85% of all minutes according to the scenario in which they occurred. It should be noted that all minutes of each scenario were included in these analyses, including initial minutes during which the scenarios presumably looked very similar to participants.

The first set of analyses described above looked at task difficulty while holding fatigue constant. Similar analyses examine whether we can distinguish between little fatigue and extreme fatigue while holding task difficulty constant. Two analyses parallel those described above. The first fatigue analysis looked at levels of fatigue during two moderate runs. It compares the initial moderate run (Run1) with the final moderate run (Run6). The former was the run with least fatigue because it occurred first in the experimental study. The latter was presumably the run with the most fatigue because it occurred after participants had been sleep deprived for approximately 24 hours. LDFA classification rates for this analysis were 85%, 95%, and 86% for BLACK, RED, and PURPLE respectively. The second fatigue analysis looked at levels of fatigue during two difficult runs. Once again, the first difficult run (Run2) was contrasted with the final difficult run (Run5). Classification rates here were 100%, 95%, and 100%. The eye metrics were extremely effective in detecting the difference between low and high fatigue states, with near-perfect classification across all 1-minute intervals for all three participants on the challenging difficult runs.

A final view of the data illustrates the importance of the Networked Evaluation System. The objective was to determine whether the participants experienced similar levels of workload during the tasks. For this analysis, it is critical that the data be synchronized so that we are comparing precisely the same time interval for every participant. Figure 3 shows the left and right ICA for the three participants across all six runs.

Fig. 3. Left and right eye ICA across the entire six runs

These figures show that the ICA varies considerably within each run, peaking at various times and dropping at other times. They also show a dramatic impact of fatigue on the ICA (see, for example, the fourth panel for BLACK, the last panel for RED and the last two panels for PURPLE). And there are sizable differences between left and right eyes for all three participants.

Each of the panels of Figure 3 could be expanded and mapped against the task details to determine what the participant was doing during periods of high and low workload. Figure 4 contains an annotated graph of the Right ICA for RED during Run1 (first panel of the middle graph in Figure 3). This graph has a number of peaks and valleys. Eight peaks were selected for annotation using the screen video from the eye-tracking session (audio was not available). For the most part, it is possible to determine from the video what the participant was doing, i.e., working with other team members to process a seismic monitor sensor, working alone to process other sensors, or navigating across the terrain. We assumed that the many steps required to process a seismic monitor required considerable cognitive processing and that moving in a straight line across the grid required very little cognitive processing. And that is what we observed here. As Figure 4 shows, most of the spikes correspond to times when RED was processing sensors, either alone or in tandem with other team members. Most of the time when she was simply moving from one location to another, the Right ICA was descending. (Some spikes are not labeled because the video alone did not provide sufficient evidence to be sure of the task she was attempting.) Thus, we are confident that the ICA can locate time periods that are more cognitively effortful for any participant. It should be kept in mind, however, that participants could have been processing information that is neither on the screen nor spoken by the team. In such instances, we might see active processing but not be able to trace its source.
Fig. 4. Annotated History of Run1 for RED
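The discrimination step can be pictured with standard tools. The sketch below uses scikit-learn's LinearDiscriminantAnalysis as a stand-in for the patented process, which is not reproduced here, and the data are random placeholders rather than the study's recordings.

# Sketch of LDFA-style scenario classification on 1-minute eye metrics.
# Assumptions: synthetic data; sklearn's LDA as a generic substitute.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# 75 one-minute intervals per run, 7 eye metrics each (ICA, blinks and
# fixations for both eyes, plus vergence), already tanh-transformed.
moderate_run = np.tanh(rng.normal(0.0, 1.0, size=(75, 7)))
difficult_run = np.tanh(rng.normal(0.5, 1.0, size=(75, 7)))

X = np.vstack([moderate_run, difficult_run])
y = np.array([0] * 75 + [1] * 75)          # 0 = moderate, 1 = difficult

lda = LinearDiscriminantAnalysis().fit(X, y)
# Classification rate: fraction of 1-minute intervals assigned to the
# correct scenario, analogous to the 85-100% rates reported above.
print("classification rate:", lda.score(X, y))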
5 Summary

The Networked Evaluation System worked very well in both environments described here. During both studies, it was possible to monitor the workload of the team members in real time as they performed their tasks. An obvious extension to NES
would be to create some sort of alert that can inform either the team member directly or a supervisor when levels of workload are unacceptably high or low. Another option would be to have a direct link between NES and the operating system for the task. If a team member's workload exceeded a defined threshold, the system could reduce the demands on that team member directly, without supervisor intervention. Additional studies are now planned or underway in both environments and will provide more data about how NES can be implemented in real settings. Future studies will focus on how to automatically time-stamp critical events for post hoc analyses and how to better capture and display task elements that correspond to high and low workload.
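Such an alert could be as simple as the following sketch; the thresholds and notification callback are hypothetical and would need to be tuned per individual.

# Hypothetical sketch of the alerting extension suggested above: a
# callback fires when a team member's workload leaves an acceptable band.
def check_workload(member, ica, low=0.2, high=0.8,
                   notify=lambda msg: print(msg)):
    if ica > high:
        notify(f"{member}: workload too high - consider shedding tasks")
    elif ica < low:
        notify(f"{member}: workload unusually low - possible fatigue")

for member, ica in [("RED", 0.91), ("PURPLE", 0.55), ("BLACK", 0.12)]:
    check_workload(member, ica)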
References 1. Marshall, S.: Identifying cognitive state from eye metrics. Aviation, Space, & Environmental Medicine 78(5), 165–175 (2007) 2. Marshall, S.: Measures of Attention and Cognitive Effort in Tactical Decision Making. In: Cook, M., Noyes, J., Masakowski, V. (eds.) Decision Making in Complex Environments, pp. 321–332. Ashgate Publishing, Aldershot (2007) 3. Marshall, S.P.: U.S. Patent No. 6,090,051. U.S. Patent & Trademark Office, Washington, DC (2000) 4. Marshall, S.P.: U.S. Patent No. 7,344,251. U.S. Patent & Trademark Office, Washington, DC (2008) 5. Weatherhead, J., Marshall, S.: From Disparate Sensors to a Unified Gauge: Bringing Them All Together. In: Proceedings of the 1st International Conference on Augmented Cognition, Las Vegas, NV, CD-ROM (2005)
User Behavior Mining for On-Line GUI Adaptation

Wei Pan (1,2), Yiqiang Chen (1), and Junfa Liu (1,2)

(1) Institute of Computing Technology, Chinese Academy of Sciences
{panwei,yqchen,liujunfa}@ict.ac.cn
(2) Graduate University of Chinese Academy of Sciences
Abstract. On-line Graphics User Interface (GUI) adaptation technology, which can predict and highlight the user's next operation in a menu-based graphics interface, is a key problem in next-generation pervasive human-computer interaction, especially for remote control devices like the Wiimote assisting TV interaction. In this paper, a hierarchical Markov model is proposed for mining and predicting the user's behavior from Wiimote control sequences. The model can be updated on-line to highlight the next possible operation and thereby improve the system's usability. We set up our experiments by asking several volunteers to manipulate a real educational web site and its embedded media player. The results show that our model makes their interaction with the GUI more convenient when using the Wiimote for remote control.
1 Introduction
Pervasive computing is a popular concept that is attracting more and more computer and communication experts. [Xu Guang-You:2007] defines it as an attempt to break the paradigm of the traditional relationship between users and computational services by extending the computational interface into the user's environment, namely the 3D physical space. The Graphics User Interface (GUI) is a basic interaction interface in this environment, and many devices for simplifying and improving the interaction style have been invented recently. The Wii Remote ([nitendo], or Wiimote), for instance, which was originally designed as a TV game controller, has great potential to become a popular HCI device, especially for GUIs ([Lee, Kim:2008]). The advantages of the Wiimote for assisting GUI interaction can be attributed to its appealing characteristics: wireless connection, simple operation, and ergonomic shape. Moreover, the acceleration sensors equipped inside provide additional information about the motions of the user, so it is possible to develop more intelligent functionalities than with traditional mice, such as recognizing the user's gestures [Kela:2006].

Although the Wiimote greatly improves the user experience, there are several limitations when it is used as a remote mouse with a simple keyboard (just six keys). Consider the process of using the controller: we first choose the input focus and press "OK" to trigger the operation. For example, we can use the
device to run a media player, pause it, or change the video. This kind of interaction is very useful and attractive when demonstrating in public or in class. However, a survey among volunteers indicates that some operations are not as convenient as expected. For instance, "tab" must be pressed tediously many times until the desired menu item is reached.

In this paper, we propose a solution to improve the usability of this interaction model. We address the problem of reducing redundant operations by predicting the user's next possible operation in advance. Since everyone has some general habits of action, we aim to mine them from historical data to assist Wiimote operation. We employ a hierarchical Markov model to model the Wiimote control sequence. The prediction made by the model serves as a recommendation of the next operation to the user, augmenting usability and reducing unnecessary operations. We also set up our experiments on an educational web site.

The rest of this paper is organized as follows. Section 2 reviews related work. The system architecture, as well as our behavior mining model, is introduced in Section 3. In Section 4, we describe the experimental results and give an evaluation. We conclude the work and propose future work in the last section.
2 Related Work
Our Wiimote control system is a kind of intelligent user interface, a topic which has been discussed for years; some demo systems are already in use. [Hook:2000] systematically discusses four problems to overcome on the road to intelligent user interfaces:

- usability principles for intelligent interfaces
- reliable and cost-efficient intelligent user interface development methods
- a better understanding of how and when intelligence can substantially improve the interaction
- authoring tools to maintain the intelligent part of the system

Several works have been proposed to handle these challenges. [Lieberman:1995] explored a suggestion tool for web browsing, where links potentially interesting to the user were offered to enable quicker searching. [Insung:2007] introduced an intelligent agent in a health-care system used for self-governing and self-customization by making wise decisions. These are early attempts at intelligent interfaces, but they are not suitable for new interaction models. [Kela:2006] introduces a new and convenient input style, which may be very useful in the pervasive computing environment. In this paper, we study the characteristics of Wiimote-assisted GUIs and propose a solution to improve their intelligence.

[Etzioni, Weld:1994] is a good example of modeling based on pre-programmed rules. Similar works can be found in [Lester:2000] and [Virvor, Kabassi:2002]. Rules are very useful, but their defect is also obvious: rules are hard to update on-line. In particular, powerful rules are hard to build for Wiimote operation sequences, and since habits may change, on-line updating is indispensable.
In this paper, we set up a hierarchical Markov model to capture the user's habits, and introduce several updating algorithms for model adaptation.
3 Proposed System Architecture

3.1 System Architecture
Fig. 1 illustrates the architecture of our recommendation system, which consists of four main modules. The user interacts with the system through a Wiimote, and the system responds by making a recommendation about the next operation, which is predicted by the behavior mining model. If the user finds the recommendation useful, he/she only needs to press "OK" to go to the next step, or otherwise chooses the operation needed. The system automatically gathers wrong recommendations for further study. There is also a module called Updating Strategy, which can update our model based on the latest collected data. Fig. 2 shows the operation interface of an educational web site for primary school study, on which our experiments are carried out. The white device on the right is a picture of the Wiimote.
3.2 Model User Behavior
Fig. 1. Architecture of Recommendation System

Fig. 2. The Web Site and the Device Used in Our Experiment

Fig. 3. Hierarchical Markov Chain

The Model module is the most important part of the system in Fig. 1. Since the user's behavior on the web site is a sequence of click actions, and taking the web site hierarchy into account, we model it using a hierarchical Markov chain. [Meyn and Tweedie:2005] discusses Markov model theory in detail, and [Fine:98] proposed a hierarchical hidden Markov model similar to the model we discuss in this paper.

Fig. 3 explains the structure of a hierarchical Markov model. Gray lines show vertical transitions, and the horizontal transitions are shown as black lines. The light gray circles are internal states and the dark gray circles are terminal states that return control to the activating state.

Let T represent the transition matrix. Suppose we have N statuses (operations) in the system; the size of T will then be N × N. Each element in T gives the probability of a transition from one status to another. For example, the ith row gives the transition
probability from status i to all the other statuses. T[i][j] = k (0 ≤ k ≤ 1) means that status i can be followed by status j with probability k.

Applied to the Wiimote operation sequence, the top node indicates the entrance to the web site, the second level gathers three nodes representing the three main modules of the web site, and the third level contains the more detailed functional nodes provided by the web site. Lines among nodes represent possible operation paths: nodes in the same level are connected to each other, and each node is connected to its child nodes in the next level.
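As an illustration of how such a matrix could be estimated and used for recommendation, consider the sketch below. The counts-based training with add-one smoothing is our own assumption, since the paper does not spell out the estimation procedure.

# Sketch (our assumptions): estimate one level's transition matrix from
# observed operation sequences and recommend the most probable successor.
import numpy as np

N = 4                                   # statuses at this level
T = np.full((N, N), 1.0 / N)            # start from a uniform matrix

def train(T, sequences):
    """Estimate transition probabilities from operation sequences."""
    counts = np.ones_like(T)            # add-one smoothing
    for seq in sequences:
        for i, j in zip(seq, seq[1:]):
            counts[i][j] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def recommend(T, current):
    """Highlight the most probable next operation from the current node."""
    return int(np.argmax(T[current]))

T = train(T, [[0, 1, 2], [0, 1, 3], [0, 1, 2]])
print(recommend(T, 0))   # -> 1: after node 0, users usually go to node 1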
3.3 On-Line Adaptation
It is apparent that new user behavior data should be utilized to improve the accuracy of our algorithm. A model with on-line updating changes with the real-world environment and thereby stays adaptive and intelligent. In our experiment we aim to update one or more transition matrices. In this paper, we propose two alternative updating strategies based on the transition matrix: one using no history data, and one using a constant amount of history data.

In the first strategy, we just use the latest data to update the transition matrices. Suppose we observe a new instance of user behavior, say a transition from node i to node j. If we successfully predicted node j, T remains unchanged. But if the node we predicted was some other node k, we should update T by the formula below, shifting probability mass from the wrong prediction towards the transition that actually occurred:

T[i][j] = T[i][j] + δ,  T[i][k] = T[i][k] − δ    (1)
The value of δ is critical and can be determined by experiments. In the second strategy, we preserve a constant number of history instances in the system, say N, as a database. When we receive K new instances, we replace the K oldest instances with them. We can then rebuild the model from the new data with acceptable computing time.
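A minimal sketch of the history-free update follows; the clipping that keeps probabilities in [0, 1] is our own safeguard, not specified in the paper.

# Sketch of the history-free update (Eq. 1): reinforce the observed
# transition and penalize the wrong prediction. delta = 0.2 follows the
# experiments below; the clipping is an added assumption.
def online_update(T, i, observed_j, predicted_k, delta=0.2):
    """Update row i of transition matrix T after one observation."""
    if observed_j == predicted_k:
        return                              # prediction was correct
    shift = min(delta, T[i][predicted_k])   # keep probabilities in [0, 1]
    T[i][predicted_k] -= shift              # penalize the wrong prediction
    T[i][observed_j] += shift               # reinforce what really happened

T = [[0.25, 0.25, 0.25, 0.25] for _ in range(4)]
online_update(T, i=0, observed_j=2, predicted_k=1)
print(T[0])   # -> roughly [0.25, 0.05, 0.45, 0.25]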
4 Experiments

4.1 Implemented System
We set up our experiment on an educational web site, which is a typical GUI-based system (see Fig. 2). We assign each node in the system a unique indicator. Nodes in one level can be classified and share the same transition matrix. We make the prediction in two steps. First, we retrieve the user's position in the system, say l. Then the system predicts the next most probable operation based on the transition matrix Tl. Updating of the model is done based on the newly gathered Wiimote control sequences.
4.2 Data Collection
First, we analyze the web site and pick out all the possible operations, resulting in 72 nodes. We then classify them into 5 classes, which means we will create 5
transition matrices in total. In our experiment, we employed 100 volunteers to use the educational web site; each one was asked to visit the web site 20 times, giving about 100 × 20 = 2000 groups of behavior series. We randomly chose 80% of these data as the training data set and used the rest as the testing data set.
4.3 Experimental Results
First of all, we train the transition matrices on the training data set, and then test them on the training data set and the testing data set respectively. The result for the hierarchical Markov model is shown in Fig. 4. The blue line represents the prediction accuracy on the training data set, while the red line represents the prediction accuracy on the testing data set. It is easy to see that the prediction accuracy on the original training data set is better than that on the testing data set. We give the mean and variance statistics in Table 1. According to the table, the mean prediction accuracy on the testing data is nearly 70%, while we obtain about 75% on the original data. That both variances are very small implies that the prediction result is stable.

Next, we examine the updating algorithm that uses a constant history data set. In our experiment, we store 50 users' behavior series in the system. Once we receive a new instance of user behavior, we replace the oldest data with the newly gathered data, and the model is updated based on the new history data set. Fig. 5 gives the testing result. We find that the prediction accuracy is a little better than for the model without any updating. We
Fig. 4. Test Accuracy of the Hierarchical Markov Model Without Updating

Table 1. Statistics of the prediction results over all the models

Model                         Training mean  Training variance  Testing mean  Testing variance
ModelWithoutUpdating          0.7466         0.0137             0.6903        0.0225
ModelUpdatedWithConstData     0.7227         0.0197             0.7134        0.0181
ModelUpdatedWithDelta(0.2)    0.7373         0.0164             0.7129        0.0203
Fig. 5. Test Accuracy of the Hierarchical Markov Model With Constant History Data
Fig. 6. Test Accuracy over δ
Fig. 7. Prediction Accuracy when δ is 0.2
Fig. 8. Comparison Between ELM ([G.-B. Huang:2006]) and the Hierarchical Markov Model: (a) result using ELM; (b) result using the hierarchical Markov model
also notice that the prediction accuracy on some instances is rather low, even below 50%; this will be discussed in Section 5.

The other updating algorithm is quite simple, since it need not recompute the whole history data set and just updates the model with the newest received data. In this method, one of the core problems is how to decide δ. Fig. 6 shows the result of testing δ from 0.1 to 0.3 with a step of 0.055. According to the figure, there is a higher prediction accuracy when δ ∈ 0.1 ∼ 0.2; larger or smaller values over- or under-correct the model and decrease the prediction accuracy. In Fig. 7, we choose δ = 0.2 to build a model according to this strategy.

Table 1 gives the results of all these models. All of them have a mean accuracy of about 70%, and the variances are also quite small (smaller than 0.03). From this viewpoint, we may conclude that the original hierarchical Markov
model is good enough, and the other attempted improvements contribute little to the prediction result. Compared to other models, such as neural networks, the hierarchical Markov model is more suitable for modeling user behavior. Fig. 8 gives a typical comparison. It is obvious that our model (72%) is much better than the neural network (35%). This result should be ascribed to the similarity between the structure of the Wiimote operation sequence and the hierarchical Markov model, a similarity the neural network cannot exploit.
5 Conclusion and Future Work
After our experiment, we asked the volunteers some questions, such as whether the system improves usability and whether it introduces new inconveniences in operation. Most of them think the recommendation provided by the system is helpful. However, some argue that if they do not operate in their usual way, the prediction is often wrong. This is because the system is trained on normal user behaviors; once it encounters an abnormal operation series, its reaction will not meet the user's requirements. One possible solution is to provide two or more recommendation options. Another question is which recommendation style is most appropriate. Here we move the input focus to the recommended menu item. Most of the volunteers think this is helpful, and some of them suggest using pop-up dialogs to provide two or more recommendations. These suggestions are useful for improving our system in the future.

Devices like the Wiimote empower the GUI. The model with on-line adaptation that we set up provides a solution for intelligent interfaces, which are a main characteristic of next-generation pervasive computing, and it greatly enhances the user's operation experience. The future of pervasive computing HCI should absorb advances from such exciting attempts.
Acknowledgements

We would like to thank Juan Liu for data collection and all of the volunteers. This work is supported by the National High Technology Research and Development Program ("863" Program) of China (Grant No. 2007AA01Z305) and the National Natural Science Funds (Grant No. 60775027).
References
[Kallio:2006] Kallio, S., Kela, J., et al.: User independent gesture interaction for small handheld devices. International Journal of Pattern Recognition and Artificial Intelligence 20(4), 505–524 (2006)
[Crampton:2007] Crampton, N., Fox, K., et al.: Dance, Dance Evolution: Accelerometer Sensor Networks as Input to Video Games. In: Haptic, Audio and Visual Environments and Games (2007)
[Kela:2006] Kela, J., Korpipaa, P., et al.: Accelerometer-based gesture control for a design environment. Personal and Ubiquitous Computing 10(5), 285–299 (2006)
[Hook:2000] Hook, K.: Steps to take before intelligent user interfaces become real. Interacting with Computers 12(4), 409–426 (2000)
[Insung:2007] Insung Jung, D.T., Wang, G.-N.: Intelligent Agent Based Graphic User Interface (GUI) for e-Physician (2007)
[Lester:2000] Lester, J.C., Johnson, W.L., Rickel, J.W.: Animated Pedagogical Agents: Face-to-Face Interaction in Interactive Learning Environments (2000)
[Lieberman:1995] Lieberman, H.: Letizia: An Agent That Assists Web Browsing (1995)
[Virvou, Kabassi:2002] Virvou, M., Kabassi, K.: Rendering the interaction more human-like in an intelligent GUI. In: Information Technology Interfaces (2002)
[Etzioni, Weld:1994] Etzioni, O., Weld, D.: A Softbot-Based Interface to the Internet. Communications of the ACM (1994)
[nintendo] http://www.nintendo.com
[Lee, Kim:2008] Lee, H.-J., Kim, H., et al.: WiiArts: Creating collaborative art experience with WiiRemote interaction. In: Proceedings of the Second International Conference on Tangible and Embedded Interaction (2008)
[Xu Guang-You:2007] Guang-You, X., Li-Mi, T., et al.: Human Computer Interaction for Ubiquitous/Pervasive Computing Mode (2007)
[Meyn and Tweedie:2005] Meyn, S.P., Tweedie, R.L.: Markov Chains and Stochastic Stability. Cambridge University Press, Cambridge (2005)
[G.-B. Huang:2006] Huang, G.-B., Zhu, Q.-Y., Siew, C.-K.: Extreme Learning Machine: Theory and Applications. Neurocomputing 70, 489–501 (2006)
[Fine:98] Fine, S., Singer, Y., Tishby, N.: The hierarchical hidden Markov model: Analysis and applications. Machine Learning 32, 41–62 (1998)
Modeling Human Actors in an Intelligent Automated Warehouse
Davy Preuveneers and Yolande Berbers
Department of Computer Science, K.U. Leuven, Celestijnenlaan 200A, B-3001 Leuven, Belgium
{Davy.Preuveneers,Yolande.Berbers}@cs.kuleuven.be
Abstract. Warehouse automation has progressed at a rapid pace over the last decade. While the tendency has been to implement fully automated solutions, most warehouses today exist as a mixture of manually operated and fully automated material handling sections. In such a hybrid warehouse, men and machines move goods between sections in order to retrieve, transport and stack them according to their nature and quantity. The biggest challenge in hybrid warehouses is to optimize the alignment of manual and automatic processes in order to improve the flow of materials between storage areas and distribution centers. Integrating individuals as human actors in an automation solution is not straightforward due to unpredictable human behavior. In this paper, we investigate how we can model the characteristics of human actors within an automation solution and how software systems can unify human actors with automated business processes to coordinate both as first class entities for logistics activities within a hybrid warehouse.
1 Introduction Warehouses come in different sizes and shapes, but they are all used for the receipt, storage, retrieval and timely dispatch of a variety of goods. To ensure that productivity targets are met and to maximize the manufacturing floor space, warehouse managers often rely on automatic guided vehicles (AGVs) [1], automated storage and retrieval systems, and conveyor and sorting systems for repetitive material handling processes. For other tasks that require a certain creativity, human actors are indispensable. In hybrid warehouses, manual and automated material handling processes are intertwined. Within a hybrid warehouse, it is possible that for a single purchase order, goods from different storage areas need to be collected and consolidated. Fig. 1 shows a manual task that is assigned to a human actor, in this case an individual driving a fork lift to transport goods to their designated area. Fig. 2 shows an example of an automated storage and retrieval system. To ensure a smooth product flow within the warehouse, manual and automated material handling processes need to be properly aligned. However, for warehouse managers it is not evident how to integrate human actors into an automation process, because human behavior is far from predictable and people make mistakes more easily. To circumvent problems that may arise during the harmonization or integration of human actors within automated warehouse systems,
Fig. 1. Human-driven material handling
Fig. 2. Automated storage and retrieval
warehouse managers often decouple the warehouse into a manually operated and a fully automated storage area. Because automated and manual material handling processes are fundamentally different, their supporting software systems - the Warehouse Management System (WMS) - are often developed with a different background. In many cases both types of warehouse management systems independently do location allocation and transport planning [2] for the goods within their area. Integration is often limited to a high-level coupling between both systems at the Enterprise Resource Planning (ERP) level. Because synchronizing manual and automated systems is hard, hybrid warehouses suffer from significant inefficiencies and suboptimal throughput due to the extra buffers that are often introduced as a workaround to deal with this performance loss. A global approach to an intelligent hybrid warehouse, where manual and automated processes are considered and optimized as a whole, could improve the Total Cost of Ownership (TCO) of a hybrid warehouse. The fundamental problem that needs to be addressed to achieve this goal is the lack of modeling and software support to incorporate human actors as first class entities within a warehouse automation process. In this paper we will first investigate how we can model human behavior by identifying the characteristics of a human actor within logistics systems, which tasks human actors fulfill and which properties are of importance. Secondly, we will investigate how we can explicitly model expectation patterns for more complex jobs in order to address the possibility of and the response to unexpected human behavior. A last aspect we address in this paper is a mapping of this human behavior model onto a software architecture to fully support human actors within a hybrid warehouse management system. In section 2, we discuss the role and characteristics of a human actor in a hybrid warehouse. We present our modeling support for human actors in a material handling
process in section 3. We discuss our initial mapping of this model onto a software system in section 4. An overview of our contributions, the conclusion and opportunities for future work are presented in section 5.
2 Human Actors in a Hybrid Warehouse In a hybrid warehouse, some of the material handling processes can be automated while others have to be carried out manually. For both types of activities a Warehouse Management System will assign a specific logic [3] to the various combinations of order, items, quantities and locations. Such a logic optimizes, for example, space utilization (pick-to-clear logic), the number of movements (fewest-locations logic), or travel times (nearest-location logic). For any of these transportation logics, the consolidation of a customer order results in a sequence of transport operations, each moving an amount of products at a given location. For larger orders, multiple human actors and automated systems can work on the same task list, which may require some synchronization between the different transport activities. But while automated storage and retrieval systems are capable of storing and consolidating goods at a given and fixed rate, the rate at which human actors can transfer goods is less predictable. In brief, integrating human actors in a hybrid warehouse raises a few concerns:
Choice versus Time Constraints: Human actors have some autonomy to handle more complex jobs, because some decisions are better made by people due to their flexibility, intuition and wisdom. To reduce the duration variability of such jobs, we need to balance the number of options given to human actors against the processing time to complete the job. Indeterminism: Human actors can behave in unforeseen ways, such as performing tasks out of the order in which they were assigned. The challenge here is to monitor the overall effect of human tasks and to take decisions whenever human actors do not behave as expected. Roles and Responsibilities: The roles and responsibilities of human actors participating in a process help to define interchangeable human resources. We must also describe how the different human actors can collaborate and synchronize with each other while the material handling process progresses.
2.1 Choice versus Time Constraints Human actors can introduce unexpected delays in the material handling process. The main reason is that human actors usually perform their tasks more slowly than automated facilities, but also that they make mistakes more easily. For example, a fork lift driver may waste time going to find goods mistakenly placed elsewhere, or may discover that a given pallet has fewer items than expected. However, one cannot automate every single task: either the tasks are too complex to automate, or they require human expertise or intuition to handle a particular product. The main concern here is to find a way to minimize the human impact on the overall processing time. A first, alienating approach transforms human actors into robots: (1) reduce the number of possible actions to a minimum; (2) make sure that human actors do not have to take
any decision and always know what to do next. Unfortunately, this would result in an inflexible sequence of tasks to be executed within a fixed time frame, without the ability for an operator to solve problems (e.g. broken vehicles, missing items or other unexpected delays) without jeopardizing the order of the task list. Instead, to maintain a certain level of job satisfaction, we want to give human actors some autonomy to make choices and take decisions, and we investigate how dynamic decision models [4] can help estimate delays in order to take appropriate actions, such as exception handling or sending reminders, when deadlines expire. Helping human actors to prioritize their tasks is another approach to reduce these delays. In brief, we have to find a balance between the autonomy of human actors making choices and the delays this autonomy brings. 2.2 Indeterminism Autonomy brings not only delay but also indeterminism. Indeterminism means that, given the same initial conditions, a human actor does not always behave in the same way. He may perform unexpected actions or carry out actions in an unexpected order. For example, some operators may cancel a picking job if a pallet contains fewer items than advertised, while others may suspend the job, replenish the goods from a reserve storage location, and then resume the original job, picking the number of items needed. Since we want to give autonomy to the human actors, we cannot prevent this indeterminism at all times. However, we can try to prevent it as much as possible, and try to compensate for its effects when we cannot avoid it. Preventing indeterminism means that we must describe the allowed degrees of freedom a human operator has to handle a batch of tasks. As a result, we must monitor the tasks assigned to human actors and detect approaching and expired deadlines. Reminders can be sent when a deadline is approaching. Escalation [5] can be triggered when a deadline expires or when an error or exception occurs. Escalation means that a person with a higher level of responsibility is notified that a deadline expired. Escalation may also transfer the responsibility for a task to another operator. 2.3 Roles and Responsibilities If we want to integrate human actors in automated business processes we must be able to define the roles and the responsibilities of each actor that participates in a process. A high-level overview of material handling activities in a warehouse business process is shown in Fig. 3. Some of these activities can be carried out both by automated systems and by individuals. Roles and responsibilities are assigned to groups for each of these material handling activities. A business process usually involves several participants, some of which may be human actors, while others can be automated systems. For example, in a warehouse, there may be groups for truck drivers, automatic guided vehicles, fork lift drivers, packers, conveyors, order managers, automated storage and retrieval systems, etc. In addition to the roles of their respective groups, human actors can be granted other roles to help define the interchangeability of human and system resources within a warehouse business process.
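As a minimal illustration of the deadline monitoring described in Sect. 2.2, the following sketch checks a task against its deadline and decides between doing nothing, sending a reminder, and escalating. The margin value, the field names and the string-based action encoding are assumptions for illustration only:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AssignedTask:
    name: str
    assignee: str
    deadline: Optional[float]  # seconds since epoch; None means no deadline

def check_deadline(task: AssignedTask, now: float, remind_margin: float = 120.0) -> str:
    """Return the monitoring action for one task."""
    if task.deadline is None:
        return "none"
    if now > task.deadline:
        # Escalation: notify a person with higher responsibility,
        # possibly reassigning the task [5].
        return "escalate:" + task.name
    if now > task.deadline - remind_margin:
        # Deadline approaching: remind the current assignee.
        return "remind:" + task.assignee
    return "none"
```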
Fig. 3. Simplified schematic overview of material handling activities
3 Modeling Human Behavior and Expectation Patterns In recent years many researchers have proposed various task models [6-8] to describe human activities and related aspects for various domains. Our focus is more oriented towards business process modeling. Although there are many business modeling methods, no well-established modeling standard is available in the area of hybrid warehouses. Our aim is to design a model that can be easily mapped onto business process standardization efforts in the area of integrating people in service-oriented architectures. The basic concepts of our model are depicted in a schematic way in Fig. 4. 3.1 Human Transportation Tasks and Activities Human transportation tasks (picking, replenishing, putting away, etc.) within a hybrid warehouse have a life cycle with states that are independent of the logic that the WMS uses to decide exactly which location to pick from, replenish from/to, and put away to, and in what sequence these tasks should occur. The life cycle can be described with the following states; a short code sketch follows the list:
• Unclaimed: The transportation task is available for assignment
• Claimed: The transportation task is assigned to an individual
• Started: The transportation task is in progress
• Finished: The transportation task has finished
• Error: The individual provided faulty data and the task failed
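A minimal sketch of this life cycle as a guarded state machine in Python; the paper lists the states but not the legal transitions, so the transition relation below is our assumption:

```python
from enum import Enum, auto

class TaskState(Enum):
    UNCLAIMED = auto()
    CLAIMED = auto()
    STARTED = auto()
    FINISHED = auto()
    ERROR = auto()

# Assumed transition relation; e.g. we allow a claim to be released again.
TRANSITIONS = {
    TaskState.UNCLAIMED: {TaskState.CLAIMED},
    TaskState.CLAIMED: {TaskState.STARTED, TaskState.UNCLAIMED},
    TaskState.STARTED: {TaskState.FINISHED, TaskState.ERROR},
    TaskState.FINISHED: set(),
    TaskState.ERROR: set(),
}

def advance(current: TaskState, target: TaskState) -> TaskState:
    """Move a transportation task to a new state, rejecting illegal jumps."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```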
In fact, these states are typical of transportation tasks for both humans and automated systems. However, humans are often also involved in other activities that cannot be classified as transportation tasks and that have not been fully planned in advance: • Wait or Delay tasks: This task represents a lack of activity. One has to wait until a certain condition with respect to the product flow is met. This task can be unplanned or planned. For example, a truck driver may have to wait for a confirmation from an order manager. Finding a realistic distribution of the delay time of this task is fundamental. • Off-tasks: Off-tasks are typically human and are not related to the product flow or the material handling processes. Such tasks may include having a coffee, responding to a telephone call, going to the bathroom, etc. It is hard to estimate if and
when these tasks take place, because their occurrence is often unknown in advance, but they can be modeled as delay tasks with possible zero delay (see the sketch after this list). • Escalation tasks: This task does relate to the product flow. If a start or a completion deadline of an ordinary transportation task is missed or an error occurs, this may trigger one or more escalation actions that, for example, reassign the transportation task to another participant or handle the exception. • Compensation tasks: This task undoes the effects of partially completed activities to ensure that an order is either fully completed or not carried out at all.
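The sketch announced above treats wait and off-tasks uniformly as delay tasks whose duration is sampled from a distribution; the lognormal shape and the parameter values are illustrative assumptions, since the paper only notes that finding a realistic delay distribution is fundamental:

```python
import math
import random

def sample_delay(p_occurs: float = 0.2, median_s: float = 90.0, sigma: float = 0.75) -> float:
    """Duration of a wait/off-task in seconds; zero delay is possible,
    matching the modeling of off-tasks as delay tasks with possible
    zero delay."""
    if random.random() >= p_occurs:
        return 0.0
    return random.lognormvariate(math.log(median_s), sigma)
```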
Fig. 4. UML class diagram of the basic concepts of the model
Being able to accurately represent these tasks in a human behavior model is fundamental when manual and automated processes are considered and optimized as a whole. For transportation tasks, we will focus on order picking: a warehouse generally has more outbound transactions than inbound transactions, so being able to quickly and accurately process customer orders is essential to increase customer satisfaction. Conceptually, however, there is not much difference with the other transport tasks (putting away, replenishing, cross-docking, etc.).
3.2 Defining and Modeling Expectation Patterns People are capable of juggling many tasks at once. This flexible behavior of humans is an advantage for hybrid warehouses, but the disadvantage of multitasking is that people are not interchangeable resources the way automatic guided vehicles and automated storage and retrieval systems are. When humans are in control of certain material handling processes, a single individual can be assigned a set of tasks, or multiple individuals can work in parallel on a single task. The order in which these tasks are executed can matter if we want to reduce delays in the material handling process. In order to model how a collection of transportation tasks is expected to be executed and synchronized, we need to formalize how one task can relate to another. We define these structured tasks with expectation patterns that describe how a Warehouse Management System would expect a collection of tasks to be executed (see Fig. 5): • Sequence: A Sequence pattern expresses a collection of tasks that is to be performed sequentially and in a specific order. • Spawn: All transportation tasks are executed concurrently. The Spawn pattern completes as soon as all the tasks have been scheduled for execution. • Spawn-Merge: All tasks are executed concurrently with barrier synchronization to ensure that tasks are not executed out of order by different participants; that is, the Spawn-Merge pattern completes once all tasks have completed. • Any-Order: The Any-Order pattern is used when the order of the tasks is of no importance as long as they do not overlap. The Any-Order pattern completes when all tasks have completed.
Fig. 5. Modeling expectation patterns for picking tasks
Obviously these patterns can be combined. For example, the Spawn and Spawn-Merge patterns can be combined to define tasks with partial synchronization. In the following section, we provide an example of how this can be used to align manually operated and fully automated transportation tasks. The same patterns can also be used to model constraints between different customer orders. For example, if different orders require the same type of product, and the relevant pallets can only be accessed by one human fork lift operator at a time, then it is best to describe these picking tasks with the Any-Order pattern, so that they are not executed in parallel. Explicitly modeling these constraints helps to identify delays in the workflow more easily. Other control flow constructs, such as If-Then-Else and Iteration (not shown in Fig. 5), are used to support conditional execution of tasks and repeated execution of a task in a structured loop. A sketch of these patterns as data structures follows.
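The expectation patterns can be represented as a small algebra of composable task trees. The plain Python data classes below mirror the four patterns plus plain tasks; the class and field names are our own, and the execution semantics (scheduling, barriers) are deliberately left out:

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Task:
    name: str

Node = Union["Task", "Sequence", "Spawn", "SpawnMerge", "AnyOrder"]

@dataclass
class Sequence:      # children performed one after another, in order
    children: List[Node] = field(default_factory=list)

@dataclass
class Spawn:         # children scheduled concurrently; completes once all are scheduled
    children: List[Node] = field(default_factory=list)

@dataclass
class SpawnMerge:    # concurrent children with a barrier; completes once all complete
    children: List[Node] = field(default_factory=list)

@dataclass
class AnyOrder:      # any order, but the children must never overlap
    children: List[Node] = field(default_factory=list)
```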
4 Example Scenario and Implementation To incorporate human actors as first class entities in an automation solution, it is important that the role of the human actor can be correctly described already during the modeling of the automation process. See Fig. 6 for an example scenario of aligning human and automated transportation tasks. Each customer order is translated into a set of transportation tasks (A, B, C, D, E and F) that are carried out by either human or automated operators. Each human task is carried out by someone with a certain role or responsibility. Human tasks B and C, which could be the parallel collection of smaller items, are combined into a Spawn-Merge pattern that is synchronized at task F, which could be the delivery of these items to a designated drop zone for shipping to the client. This pattern is combined with tasks A and F into a Sequence pattern. Because this sequence of tasks is aligned with a sequence of automated transportation tasks D and E (whose completion time can be accurately estimated), the last human task F in the first sequence has a start/end deadline attached to it, with an Escalation task that is activated when the deadline is not met. One of the results could be the activation of a compensating task. For example, if task C were the picking of hazardous or perishable goods, the original transportation task may have to be undone to store the goods again at a place where they can be preserved safely. The proposed model is kept simple in order to keep it intuitive for both technical users and business users, but also to simplify the mapping onto the software systems that monitor these tasks. For the implementation of the different types of tasks and the expectation patterns in the model, we map our constructs to similar representations within the Business Process Modeling Notation (BPMN) [9]. BPMN already proposes a generic graphical solution to model tasks, events and workflows with diagrams, but it is more complex and less intuitive than the model we propose and lacks a few concepts to easily model warehouse-related aspects of a task. The reason for this approach is that we can leverage software tools that map BPMN to software systems that assist with the monitoring and the coordination of these tasks. We use techniques similar to those described in [10] in order to transform BPMN process models to Business Process Execution Language (BPEL) web services. The main advantage of mapping a workflow of business processes to an equivalent workflow of software services is that, whenever a warehouse manager changes a business process, he just needs
Fig. 6. A simple scenario of aligned human and automated transportation tasks
to adapt parameters within our model, and the necessary translations to BPMN and BPEL happen accordingly. BPEL has earned its merits in service-oriented architectures, which try to decouple software services from one another. For warehouses this would mean that it becomes easier to change the process and product flows, but this still needs to be investigated.
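Using the pattern classes sketched in Sect. 3.2, the scenario of Fig. 6 could be encoded roughly as follows; the deadline and escalation hook are indicated only in comments, since their concrete representation is not fixed by the model description:

```python
# Hypothetical encoding of the Fig. 6 scenario: human tasks B and C run
# concurrently inside a Spawn-Merge, preceded by A and followed by F.
# Task F carries a start/end deadline aligned with the automated
# sequence D -> E; missing it would trigger an Escalation task.
order = Sequence([
    Task("A"),
    SpawnMerge([Task("B"), Task("C")]),
    Task("F"),
])
automated = Sequence([Task("D"), Task("E")])
```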
5 Conclusions and Future Work The model presented in this paper arose from the real need to integrate individuals as human actors into the product workflow of a warehouse while being able to deal with unpredictable human behavior. Therefore, the focus of our model was oriented more towards mapping human actors onto established business process practices than towards theoretical aspects of task models in general. As a result, we have proposed a simple but intuitive dedicated model that captures several characteristics of transportation tasks, and that addresses the key concerns of choice vs. time constraints, indeterminism in the task flow, and the roles and responsibilities of each participant in the material handling process. Modeling concepts are included to describe the state and other properties of a transport task and how such a task can relate to non-transportation tasks. In order to better align human tasks with tasks carried out by automated systems, we included concepts to express how a batch of tasks is expected to be executed. This is important whenever multiple individuals and automated systems need to synchronize their activities while completing the consolidation of a single customer order. Nonetheless, some aspects need to be investigated further. For example, it is currently unclear how to best model and coordinate escalation tasks when both human actors and automated systems are involved (to make sure that escalation activities do not themselves fail just as easily). We intend to continue our efforts on leveraging results achieved in the web services community, especially for integrating human tasks in web service orchestrations, where two complementary standards, BPEL4People [11] and WS-HumanTask [12], have been
proposed. The specifications are evaluated in [13-14]. Some of the observations were that both proposals provide a broad range of ways in which human resources can be represented and grouped, and that there are a number of distinct ways in which manual tasks undertaken by human resources can be implemented, but that shortcomings do exist, for example, in enforcing separation-of-duty constraints in BPEL4People processes. We will investigate how these specifications can be used or augmented specifically for the coordination of activities within hybrid warehouses.
References
1. Burkard, R.E., Fruhwirth, B., Rote, G.: Vehicle Routing in an Automated Warehouse: Analysis and Optimization. Technical Report 174, Graz (1991)
2. Lashine, S.H., Fattouh, M., Issa, A.: Location/allocation and routing decisions in supply chain network design. Journal of Modelling in Management 1, 173–183 (2006)
3. Piasecki, D.: Warehouse Management Systems (WMS) (2006)
4. Diederich, A.: Dynamic Stochastic Models for Decision Making under Time Constraints. Journal of Mathematical Psychology 41, 260–274 (1997)
5. Panagos, E., Rabinovich, M.: Escalations in workflow management systems (1996)
6. Verpoorten, K., Luyten, K., Coninx, K.: Task-Based Prediction of Interaction Patterns for Ambient Intelligence Environments. In: [15], pp. 1216–1225
7. Giersich, M., Forbrig, P., Fuchs, G., Kirste, T., Reichart, D., Schumann, H.: Towards an Integrated Approach for Task Modeling and Human Behavior Recognition. In: [15], pp. 1109–1118
8. Winckler, M., Johnson, H., Palanque, P.A. (eds.): TAMODIA 2007. LNCS, vol. 4849. Springer, Heidelberg (2007)
9. Wohed, P., van der Aalst, W., Dumas, M., ter Hofstede, A., Russell, N.: On the Suitability of BPMN for Business Process Modelling. In: Dustdar, S., Fiadeiro, J.L., Sheth, A.P. (eds.) BPM 2006. LNCS, vol. 4102, pp. 161–176. Springer, Heidelberg (2006)
10. Ouyang, C., Dumas, M., ter Hofstede, A.H.M., van der Aalst, W.M.P.: From BPMN Process Models to BPEL Web Services, pp. 285–292 (2006)
11. Agrawal, A., et al.: WS-BPEL Extension for People (BPEL4People), Version 1.0 (2007)
12. Agrawal, A., et al.: Web Services Human Task (WS-HumanTask), Version 1.0 (2007)
13. Russell, N., van der Aalst, W.M.P.: Evaluation of the BPEL4People and WS-HumanTask Extensions to WS-BPEL 2.0 using the Workflow Resource Patterns. Technical Report BPM07-10 (2007)
14. Mendling, J., Ploesser, K., Strembeck, M.: Specifying Separation of Duty Constraints in BPEL4People Processes. In: Business Information Systems, 11th International Conference, BIS 2008, Innsbruck, Austria, pp. 273–284. Springer, Heidelberg (2008)
15. Jacko, J.A. (ed.): HCI 2007. LNCS, vol. 4550. Springer, Heidelberg (2007)
Bridging the Gap between HCI and DHM: The Modeling of Spatial Awareness within a Cognitive Architecture
Bryan Robbins 1,2, Daniel Carruth 1, and Alexander Morais 1,2
1 Human Factors and Ergonomics Group, Center for Advanced Vehicular Systems, Box 5405, Mississippi State, MS 39762-5405
2 Department of Computer Science and Engineering, Mississippi State University, Box 9637, Mississippi State, MS 39762-9637
{bryanr,dwc2,amorais}@cavs.msstate.edu
Abstract. In multiple investigations of human performance on natural tasks in three-dimensional (3D) environments, we have found that a sense of space is necessary for accurate modeling of human perception and motor planning. In previous work, we developed ACT-R/DHM, a modification of the ACT-R cognitive architecture with specific extensions for integration with 3D environments. ACT-R/DHM could leverage existing extensions from the ACT-R community that implement the spatial sense, but current research seems to indicate that an "egocentric-first" approach is most appropriate. We describe the implementation of a custom spatial module in ACT-R/DHM, which allows for the consideration of spatial locations by adding a single ACT-R module that performs a very small set of operations on existing location information. We demonstrate the use of the 3D, egocentric-first spatial module to simulate a machine interaction task. Keywords: Digital Human Modeling, Human Performance Modeling, Spatial Cognition, Cognitive Modeling, Cognitive Architecture, ACT-R/DHM, ACT-R.
1 Introduction The interdisciplinary field of Digital Human Modeling (DHM) has much to gain from integration efforts. As DHM research continues to realize the need for the simulation of human cognition, cognitive architectures, as first defined by Newell [1] and now implemented by many [2, 3, 4], seem to be a logical next step in integration efforts. However, many cognitive architectures, because of their heritage in Human-Computer Interaction (HCI) research, provide only marginal support for the consideration of the three-dimensional (3D) virtual environments common in DHM applications. The consideration of the human sense of space (or "spatial sense") is critical in DHM applications, but does not play a vital role in HCI, and thus is not a strong component of existing cognitive modeling architectures. In previous work [5, 6], the ACT-R cognitive architecture [2] has been extended for use with DHM research as ACT-R/DHM. The goal of ACT-R/DHM is to leverage
the ACT-R modeling architecture's theory of cognition and its decades of development, improvement, and validation for the purposes of DHM research by adding theory-based and architecturally consistent extensions. To date, ACT-R/DHM has extended the existing visual and motor modules of ACT-R and added a kinesthetic and proprioceptive (KP) module in addition to the spatial module described herein. 1.1 The ACT-R Theory and Implementation Before elaborating on the implementation of the spatial sense and other extensions of ACT-R/DHM, we describe the original ACT-R theory [2] and its implementation in the ACT-R 6 software. ACT-R's model of knowledge is based on a separation between declarative memory (facts) and procedural memory (actions). Models of human sensory and perception systems convert features of the external environment to an internal representation suitable for processing. Strictly typed "chunks" of information serve as the basic building block for ACT-R's internal representation. Chunks of declarative memory are manipulated by action elements of rules in procedural memory. The central core of the ACT-R software implements a small number of critical cognitive functions. Additional modules supplement the core with memory, perceptual, and motor capabilities. ACT-R is implemented as a production system, with procedural memory elements constructed by the modeler as If-Then rules called productions. Execution of productions is accomplished by matching the "If", or Left-Hand Side (LHS), of the production against the current state of the modules and, if a match is found, executing the "Then", or Right-Hand Side (RHS), of the production. The modular construction of the ACT-R architecture allows for the extension of existing capabilities and the creation of new modules. The theoretical concepts underpinning the ACT-R architecture are enforced by the interface between the core and the modules. Important architectural constructs include the modules, the module buffers, the module requests and the chunk. The module itself, implemented as a LISP object, encapsulates the model of the system being represented. For example, the vision module simulates human vision, sensory memory, feature integration, attention and any other aspects associated with human vision, within a single module. A module's buffer(s) makes available the current state of its module, providing a window to the module environment. Module requests provide mechanisms for updating the module's state via productions. Finally, chunks, as the basic building block of the architecture, hold declarative information in a strictly defined type, known as the chunk type. As mentioned, the constructs of ACT-R are more than implementation considerations - they enforce the underlying ACT-R theory. Any new capability added to ACT-R, including the extensions in ACT-R/DHM, must follow the required structure. If an extension deviates from the architectural standards, it gains little from ACT-R's well-established psychological validity. For this reason, we describe the extensions of ACT-R/DHM in terms of the modules, buffers, requests, and chunks affected.
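To make the recognize-act cycle concrete, here is a toy Python sketch of production matching and firing. It illustrates only the If-Then mechanics described above; ACT-R's actual LISP implementation adds utilities, conflict resolution, subsymbolic timing and much more:

```python
def matches(lhs, buffers):
    """LHS given as {buffer: {slot: required_value}}; true when every tested
    slot of every tested buffer holds the required value."""
    return all(
        buffers.get(buf, {}).get(slot) == value
        for buf, slots in lhs.items()
        for slot, value in slots.items()
    )

def step(productions, buffers):
    """One simplified recognize-act cycle: fire the first production whose
    LHS matches by applying its RHS buffer updates."""
    for name, (lhs, rhs) in productions.items():
        if matches(lhs, buffers):
            for buf, slots in rhs.items():
                buffers.setdefault(buf, {}).update(slots)
            return name
    return None

# Hypothetical example: attend a visual location once one has been found.
productions = {
    "attend-location": (
        {"goal": {"state": "find"}, "visual-location": {"found": True}},
        {"goal": {"state": "attend"}},
    ),
}
buffers = {"goal": {"state": "find"}, "visual-location": {"found": True}}
assert step(productions, buffers) == "attend-location"
```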
1.2 ACT-R/DHM - Current Implementation ACT-R/DHM, prior to the development of a spatial module, extended ACT-R in a number of ways. First and most importantly, the vision module of ACT-R, which has primarily been used for stimuli presented on a two-dimensional (2D) computer screen, was expanded to consider 3D space. No additional module requests or buffers were necessary, but instead of storing flat (X, Y) coordinates, new visual chunks were derived from the original structures that encode spherical coordinates: pitch, azimuth and distance. The pitch, azimuth, and distance, or "PAD", encoding is also reflected in the spatial module and elsewhere in ACT-R/DHM as a consistent representation of space. In addition, ACT-R/DHM includes a kinesthetic and proprioceptive (KP) module that allows for the consideration of avatar movements and body part locations in a cognitive model. The KP module adds a single buffer, "kp", that holds the current state (position and movement) for a single, attended body part. The kp position representation is consistent with the PAD encoding from the vision module, spatial module, and other modules, as mentioned above. The movement state simply indicates whether or not the body part is currently in motion. Elaboration on the details of the KP module's implementation is outside the scope of this paper. 1.3 The Need for Spatial Functions Relatively simple DHM task models clearly illustrate the need for spatial functionality in the majority of DHM scenarios. The following example is based on previous efforts [6, 7] using ACT-R/DHM with a wrapper interfacing to the Virtools™ 3D Virtual Environment and the Santos™ avatar. Santos™ is a digital human model developed at the Center for Computer Aided Design at the University of Iowa for the Virtual Soldier Research Project [8]. Santos includes a full-body avatar, with a skeleton and a posture prediction algorithm that serves as the basis for KP body part and movement information. The virtual environment provides the remaining environmental feature information (i.e. visual feature descriptions) to ACT-R/DHM. Figure 1 shows the virtual environment setup for a vending machine interaction task [6]. Participants were given 10 coins to purchase a beverage of their choice. This task involved a series of human-machine interactions in a large visual field. Participants must learn the layout of the interface, deposit their coins, choose from options visually presented via labels, select their drink, and retrieve their drink from the machine. As the model performs the physical motions necessary to accomplish the task, the head moves and key features of the interface drop in and out of the visual field. In Figure 1a, many of the key features are not visible when the avatar is standing before the machine (initial position). In this position, the model must have some mechanism to access the current egocentric spatial position of the target feature in order to shift the view towards the target. Figure 1b shows one view encountered during interaction with the machine; it is filled with a number of critical machine-interaction features, such as buttons, the coin deposit, and other features. To accomplish tasks without repeatedly scanning for visual features any time they drop out of the current visual field, spatial memory must be available during task performance; it is critical in the most basic of real-world maneuvers.
Fig. 1. The availability of visual features changes as body movements are made in a simple machine interaction task. The initial view (a) contains no objects, but reliable motor planning is possible via spatial cognition. Another view (b) may have a completely different set of features available.
To appropriately model human-machine interaction, DHM cognitive architectures must include models of spatial cognition. A significant obstacle to successful modeling of spatial cognition is accounting for the storage, processing and recall of spatial knowledge derived from egocentric sensory data. Below, we consider the psychological evidence for spatial awareness and existing ACT-R modeling approaches before arriving at the egocentric-first spatial module now implemented in ACT-R/DHM.
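The PAD encoding introduced in Sect. 1.2 can be illustrated with a small conversion from an egocentric Cartesian offset to pitch, azimuth and distance. The axis convention (x right, y up, z forward) and the use of degrees are assumptions for illustration, not part of the published implementation:

```python
import math

def to_pad(x: float, y: float, z: float):
    """Egocentric Cartesian offset -> (pitch, azimuth, distance) in degrees."""
    distance = math.sqrt(x * x + y * y + z * z)
    if distance == 0.0:
        return 0.0, 0.0, 0.0
    pitch = math.degrees(math.asin(y / distance))  # elevation above eye level
    azimuth = math.degrees(math.atan2(x, z))       # left/right from gaze direction
    return pitch, azimuth, distance
```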
2 Modeling Spatial Cognition We propose two requirements for the implementation of the spatial module of ACT-R/DHM. First, the new implementation should be based on current theory of human spatial cognition. Second, the implementation should conform to the current ACT-R theory and implementation framework. 2.1 Spatial Theory There is significant debate in the spatial cognition literature regarding the nature of spatial representations and processes. Our current implementation draws primarily from the theory of McNamara [8], as supported by Rock [9] and Sholl and Nolin [10]. The following theorems integrate concepts from all three authors' theories. Theorem 1, expressed in the work of McNamara [8] and Sholl and Nolin [10], is that human spatial competence depends on both egocentric (first-person) and allocentric (third-person) representations. Sholl and Nolin define the egocentric representations as "self-to-object" relationships and the allocentric representations as "object-to-object" relationships. Theorem 2, elaborated by Sholl and Nolin, states that human spatial competence requires the interaction of the egocentric and allocentric systems. Fundamentally, the egocentric system is based on instantaneous experiences. Theorem 3 states that all spatial relationships must begin as egocentric relationships because they are derived from egocentric percepts. We also find, however, that
egocentric representations are generally inadequate for reuse. Once the egocentric origin moves in the environment (e.g. head or other body movement in the world), previously encoded egocentric relationships are no longer valid for recall. Rather than postulate a continuous update of spatial memory, we opt to implement Theorem 4: spatial relationships are stored as allocentric object-to-object relationships. McNamara considers the storage of spatial relationships in memory [8]. He outlines a theory that emphasizes the role of orientation and frames of reference in spatial encoding. He appeals to the work of Rock [9] regarding orientation detection of new objects, which states that a variety of cues lead to the ultimate determination of object orientation, and that an object's encoded orientation depends on many factors, including its environment, intended use, shape, and others. McNamara argues that some persistent cues, which he calls environmental cues, are present across multiple egocentric views and provide a critical link between egocentric views. We extend this idea as Theorem 5, which states that all spatial relationships require a frame of reference and an orientation, and the orientation must be expressed relative to the frame of reference. Note that Theorem 5 applies to egocentric relationships, which use the origin of the spatial system as a frame of reference and a native orientation for each object in the egocentric view. Any object may provide a frame of reference for any other object in an object-to-object relationship. However, certain environmental objects are more likely to be used as referents due to location, size, saliency, permanence, etc. No claim is made as to how the actual detection and/or recognition of an object's native or intrinsic orientation should be modeled. Theorems 1-5 summarize fundamental building blocks of human spatial cognition. Thus, any implementation of the spatial sense in ACT-R/DHM should hold to the theoretical claims of these theorems. 2.2 Other ACT-R Implementations of Spatial Cognition Previous efforts have modeled the spatial sense within ACT-R implementations. Before developing a custom module for the spatial sense in ACT-R/DHM, we considered existing extensions from the ACT-R community. Gunzelmann and Lyon offer a spatial implementation that covers many of the elements of spatial theory identified in Section 2.1 [11]. Specifically, they propose adding three buffers to the visual system: an egocentric buffer that holds 3D egocentric location information, an environmental frame of reference buffer that tracks a frame of reference, as suggested by Theorem 5, and an episodic buffer that integrates the egocentric and frame-of-reference information with existing visual information as needed. Gunzelmann and Lyon also propose a spatial module that makes spatial information accessible across multiple ACT-R modules and provides spatial processing for mental transformations, magnitude estimations, and magnitude calculations [11]. Additional ACT-R implementations have been proposed by Best and Lebiere [12] and by Harrison and Schunn [13]. Best and Lebiere's implementation, designed for the development of intelligent agents, preprocesses the environment to directly provide an allocentric representation to the agent. Harrison and Schunn implement a "configural" buffer that associates a visual object with its orientation, and then a system to update up to three "behaviorally significant" egocentric relationships based on the direction of body motion (e.g. walking).
No persistent allocentric representation is used.
To summarize, our implementation of spatial cognition in ACT-R differs from other current efforts either in underlying theoretical claims (i.e. Harrison and Schunn and Best and Lebiere) or architectural implementation (i.e. Gunzelmann and Lyon).
3 The ACT-R/DHM Spatial Module The following section describes ACT-R/DHM's spatial module and how this module is used to support modeling of spatial cognition. 3.1 Module Implementation Previous ACT-R/DHM work extended the vision module of ACT-R to support PAD encoding. The spatial implementation requires a single new module, simply named the spatial module. The module provides only one buffer to the environment, also named the spatial buffer. The spatial buffer should only hold chunks of the type spatial-relationship, and only three module requests are provided as operations on spatial-relationship chunks: ego-relate, chain-relate, and mid-relate. The spatial module has no member data of its own and derives from ACT-R's generic module class. The spatial-relationship chunk type and the module requests of the ACT-R/DHM spatial module capture many of the theoretical claims outlined above. The spatial-relationship chunk type is detailed in Table 1. ACT-R chunks hold smaller pieces of information in slots. The slots of the spatial-relationship chunk hold a frame of reference in the referent slot, an object in the object slot, the position of the object in the pitch, azimuth, and distance slots, and finally the orientation of the object as three axis vectors, <xdir, ydir, zdir>, <xup, yup, zup>, and <xright, yright, zright>. The position and orientation are relative to the frame of reference. The three spatial module requests operate on spatial-relationship chunks to allow for the modeling of spatial competence. Specifications for each module request are included in Table 2. For the ACT-R/DHM spatial module, all operations occur in the spatial buffer. The ego-relate request takes an object as encoded by a sensory/perception module, produces an egocentric spatial relationship to the object, Table 1. The spatial-relationship chunk includes information about both the object and its frame of reference
SPATIAL-RELATIONSHIP CHUNK
referent: the referent object
object: the encoded object from a sensory/perception module
pitch, azimuth, distance: object position in egocentric spherical coordinates
xdir, ydir, zdir; xup, yup, zup; xright, yright, zright: object orientation axes
Table 2. The spatial module provides three requests that support spatial reasoning. Requests are used by the cognitive modeler to simulate human performance. The outcome of spatial requests is always placed in the spatial buffer.
SPATIAL MODULE REQUESTS
ego-relate. Input: visual percept obj. Output to spatial buffer: spatial-relationship self-obj with referent = SELF and object = obj.
mid-relate. Input: spatial-relationship ref-obj and spatial-relationship ref-tar. Output to spatial buffer: spatial-relationship obj-tar with referent = object of ref-obj and object = object of ref-tar.
chain-relate. Input: spatial-relationship ref-obj and spatial-relationship obj-tar. Output to spatial buffer: spatial-relationship ref-tar with referent = referent of ref-obj and object = object of obj-tar.
and places the new spatial-relationship chunk in the spatial buffer. The object chunks themselves must provide some native orientation information. In the case of the vision system, we have extended visual-object chunks to include orientation information. When a visual-object chunk is passed to ego-relate, the orientation information is passed to the new spatial-relationship. The mid-relate and chain-relate functions build on egocentric spatial-relationship chunks to produce object-to-object relationships. Mathematically, mid-relate and chain-relate are simply vector addition functions, while psychologically they correspond to a "single-step" mental rotation (see Gunzelmann and Lyon [11] and also Kosslyn [14]). The mid-relate request takes two spatial-relationships as arguments, e.g. Self->A and Self->B, and creates an object-to-object spatial-relationship chunk A->B, where A serves as the frame of reference. Similarly, chain-relate takes as input any two spatial-relationship chunks of the form A->B and B->C and creates a spatial-relationship chunk A->C, where A serves as the frame of reference.
This egocentric information can be used to generate egocentric spatial-relationships using ego-relate. Theorem 4 states that all relationships are stored in an allocentric form. The mid-relate mechanism converts egocentric spatial-relationships with two objects into a single object-to-object relationship. In our implementation, object-to-object relationships are only useful when tied to an egocentric relation. This requirement is based on Theorems 2 and 3. The spatial-relationship chunk also allows for the encoding of a frame of reference and of object orientation, as required by Theorem 5. While the current implementation relies on Rock [9] and McNamara [8] to support the requirement for orientation encoding, it makes no claim to model the determination of orientation by visual or declarative memory methods. This is an interesting question for our implementation that deserves future work, as the assumption that orientation can be encoded underpins the spatial module's encoding of allocentric, object-to-object relationships. 3.3 Spatial Modeling with ACT-R/DHM With the ACT-R/DHM implementation of the spatial sense now specified, the application of the new spatial modeling capability is perhaps most interesting to the DHM community. We now describe the use of the new spatial module to improve the performance of the previously introduced model, the vending machine interaction task. As mentioned previously, ACT-R/DHM has to date been used to drive the Santos™ digital human model, and Santos™ exists in a high-fidelity 3D virtual environment. The vending machine interaction task now uses the module requests of the spatial module. The model assumes that the human subject is familiar with the parts of the vending machine (e.g. buttons, labels, coin slot, etc.) but has not seen or used this specific machine before. Thus, as the avatar approaches the machine, he encodes the layout of the machine relative to the machine's background (a large environmental cue) using a series of ego-relate and mid-relate requests. The machine's background, known in the model as the "CokeBox", is visible at all times during the machine interaction task, and is therefore an ideal object for the construction of object-to-object relationships. For example, the model encodes the egocentric relationships Self->Button1 and Self->CokeBox, then uses mid-relate with these two egocentric relationships as arguments to create and store the object-to-object relationship CokeBox->Button1. To utilize the stored spatial relationship, the model must relocate the CokeBox in the visual field and use chain-relate to program egocentric movements for machine interaction. After encoding the machine layout, the model programs numerous motor movements from the avatar's hand to the coin slot of the machine, simulating the deposit of coins. Note that as the avatar looks at his hand, the coin slot drops from the visual field. The position of the coin slot relative to the CokeBox must then be recalled from declarative memory and an egocentric spatial-relationship chunk constructed via chain-relate in order to relocate the coin slot and continue depositing.
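A sketch of the three requests as vector operations in Python follows; PAD positions are converted to Cartesian vectors internally, so that mid-relate becomes vector subtraction and chain-relate vector addition, as described above. For simplicity, orientation is omitted and all offsets are assumed to live in one shared frame (a real implementation would rotate the second offset into the referent's frame):

```python
import math
from dataclasses import dataclass
from typing import Tuple

@dataclass
class SpatialRel:
    referent: str
    obj: str
    offset: Tuple[float, float, float]  # referent -> obj, shared frame

def pad_to_vec(pitch: float, azimuth: float, distance: float):
    y = distance * math.sin(math.radians(pitch))
    h = distance * math.cos(math.radians(pitch))
    return (h * math.sin(math.radians(azimuth)), y, h * math.cos(math.radians(azimuth)))

def ego_relate(obj: str, pad) -> SpatialRel:
    """Visual percept with a PAD position -> egocentric relationship."""
    return SpatialRel("SELF", obj, pad_to_vec(*pad))

def mid_relate(ref_obj: SpatialRel, ref_tar: SpatialRel) -> SpatialRel:
    """Self->A and Self->B become A->B (vector subtraction)."""
    assert ref_obj.referent == ref_tar.referent
    v = tuple(b - a for a, b in zip(ref_obj.offset, ref_tar.offset))
    return SpatialRel(ref_obj.obj, ref_tar.obj, v)

def chain_relate(ab: SpatialRel, bc: SpatialRel) -> SpatialRel:
    """A->B and B->C become A->C (vector addition)."""
    assert ab.obj == bc.referent
    v = tuple(p + q for p, q in zip(ab.offset, bc.offset))
    return SpatialRel(ab.referent, bc.obj, v)

# Recreating the coin-slot recall: Self->CokeBox chained with the stored
# CokeBox->CoinSlot yields an egocentric Self->CoinSlot relationship.
self_box = ego_relate("CokeBox", (5.0, -10.0, 1.2))
box_slot = SpatialRel("CokeBox", "CoinSlot", (0.2, -0.1, 0.0))
self_slot = chain_relate(self_box, box_slot)
```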
4 Conclusions and Future Enhancements The implementation of a spatial module in ACT-R/DHM resolves significant issues related to knowledge of object locations in 3D environments and provides the capability to model human performance for many dynamic tasks. To make the spatial modeling capability more accessible and accurate, a number of additional enhancements are necessary. For example, the link between the visual and spatial systems should be explored with regard to attention and autonomous behavior. It seems feasible, as Harrison and Schunn have suggested [13], that at least one or more currently attended spatial relationships may be updated automatically as the body moves. In fact, ACT-R/DHM's KP system provides some functionality that could be used to update spatial knowledge based on kp movement information. Enforcing limitations on spatial cognition is also an area that needs additional research and implementation. If, as Kosslyn suggests [14], spatial reasoning must occur egocentrically, then only egocentric spatial relationships should be available to the spatial buffer. This could be accomplished by implementing the mid-relate and chain-relate functionality within the spatial module and exposing only egocentric spatial relationships via the spatial buffer. While much future work remains to extend this implementation, compare our implementation with alternative implementations, and validate against human spatial cognition data, the ACT-R/DHM spatial module provides significant functionality based on spatial cognition theory and within the existing ACT-R framework.
References
1. Newell, A.: Unified Theories of Cognition. Harvard University Press, Cambridge (1990)
2. Anderson, J.R., Bothell, D., Byrne, M.D., Douglass, S., Lebiere, C., Qin, Y.: An Integrated Theory of the Mind. Psychological Review 111(4), 1036–1060 (2004)
3. Laird, J.E., Newell, A., Rosenbloom, P.S.: SOAR: An Architecture for General Intelligence. Artificial Intelligence 33(1), 1–64 (1987)
4. Kieras, D.E., Meyer, D.E.: An Overview of the EPIC Architecture for Cognition and Performance with Application to Human-Computer Interaction. Human-Computer Interaction 12(4), 391–438 (1997)
5. Carruth, D., Robbins, B., Thomas, M., Letherwood, M., Nebel, K.: Symbolic Model of Perception in Dynamic 3D Environments. In: Proceedings of the 25th Army Science Conference, Orlando, Florida (September 2006)
6. Carruth, D., Thomas, M., Robbins, B.: Integrating Perception, Cognition and Action for Digital Human Modeling. In: Duffy, V.G. (ed.) HCII 2007 and DHM 2007. LNCS, vol. 4561, pp. 333–342. Springer, Heidelberg (2007)
7. Abdel-Malek, K., Yang, J., Marler, R.T., Beck, S., Mathai, A., Zhou, X., Patrick, A., Arora, J.: Towards a New Generation of Virtual Humans: Santos. International Journal of Human Factors Modeling and Simulation 1(1), 2–39 (2006)
8. McNamara, T.: How Are the Locations of Objects in the Environment Represented in Memory? In: Freksa, C., Brauer, W., Habel, C., Wender, K.F. (eds.) Spatial Cognition III. LNCS, vol. 2685, pp. 174–191. Springer, Heidelberg (2003)
9. Rock, I.: Orientation and Form. Academic Press, New York (1973)
10. Sholl, M.J., Nolin, T.L.: Orientation Specificity in Representations of Place. Journal of Experimental Psychology: Learning, Memory, and Cognition 23(6), 1494–1507 (1997)
11. Gunzelmann, G., Lyon, D.R.: Mechanisms for Human Spatial Competence. In: Barkowsky, T., Knauff, M., Ligozat, G., Montello, D.R. (eds.) Spatial Cognition 2007. LNCS (LNAI), vol. 4387, pp. 288–307. Springer, Heidelberg (2007)
12. Best, B.J., Lebiere, C.: Spatial Plans, Communication, and Teamwork in Synthetic MOUT Agents. In: Proceedings of the 12th Conference on Behavior Representation in Modeling and Simulation (2003)
13. Harrison, A.M., Schunn, C.D.: ACT-R/S: Look Ma, no "cognitive map"! In: Detje, F., Doerner, D., Schaub, H. (eds.) Proceedings of the Fifth International Conference on Cognitive Modeling, pp. 129–134. Universitats-Verlag, Bamberg (2003)
14. Kosslyn, S.M.: Image and Brain: The Resolution of the Imagery Debate. MIT Press, Cambridge (1994)
Behavior-Sensitive User Interfaces for Smart Environments
Veit Schwartze, Sebastian Feuerstack, and Sahin Albayrak
DAI-Labor, TU-Berlin, Ernst-Reuter-Platz 7, D-10587 Berlin
{Veit.Schwartze,Sebastian.Feuerstack,Sahin.Albayrak}@DAI-Labor.de
Abstract. In smart environments interactive assistants can support the user's daily life by being ubiquitously available through any interaction device that is connected to the network. Focusing on graphical interaction, user interfaces are required to be flexible enough to be adapted to the actual context of the user. In this paper we describe an approach that enables flexible user interface layout adaptations based on the current context of use (e.g. by changing the size of elements to visually highlight the elements that are important in a specific situation). In a case study of the "4-star Cooking assistant" application we demonstrate the capability of our system to dynamically adapt a graphical user interface to the current context of use. Keywords: Layouting, model-based user interface development, adaptation, constraint generation, context-of-use, smart environments, human-computer interaction.
1 Introduction Interactive applications deployed in smart environments are often targeted at supporting users in their everyday life by being ubiquitously available and continuously offering support and information based on the users' requirements. Such applications must be able to adapt to different context-of-use scenarios to remain usable in each user's situation. Scenarios include, e.g., adapting the user interface seamlessly to various interaction devices or distributing the user interface to a set of devices that the user feels comfortable with in a specific situation. The broad range of possible user interface distributions and the diversity of available interaction devices make a complete specification of each potential context-of-use scenario difficult during application design. The necessary adaptations require flexible and robust (re-)layouting mechanisms for the user interface and need to consider the underlying tasks and concepts of the application to generate a consistent layout presentation for all states and distributions of the user interface. Based on previous work [12], we propose a constraint-based GUI layout generation that considers the user's behavior and her location in a smart environment. We therefore concentrate on the user's context and identify several types of possible layout adaptations:
1. Spot-based adaptation: In a smart environment, such as our SerCHo Living Lab, different places identify various situations. Applications can consider these spots to adapt their user interface layout to focus on those parts of the UI that are identified as most important for a certain spot.
2. Distance-based adaptation: The distance of the user to a certain interaction device, such as a wall-mounted display or a mobile phone, can be used to adapt the layout.
3. Orientation-based adaptation: The orientation of the user towards an interaction device can influence the presentation of the user interface. For instance, the angle at which the user views a display can be used to increase the visual weight of elements on one side of the user interface presentation.

These adaptations can be performed either by discretely or continuously modifying the user interface layout, and they can be combined for a more comfortable interaction experience. Unlike the spot-based adaptation, which requires the application developer to explicitly specify which user tasks are most relevant for a certain user location, the distance- and orientation-based adaptations can be performed without any effort on the designer’s part. In the following, we illustrate the definition and usage of layouting statements to create constraint systems that evaluate runtime context information to adapt the user interface layout accordingly.
2 User Interface Layouting

Unlike other layout generation approaches [11], we create the constraint system at runtime. In our layout model a user interface is described using four basic characteristics: the containment, the orientation, the order and the size of user interface elements (UI elements). The containment characteristic describes the relation of elements as a nested hierarchy of abstract containers that can contain other abstract containers or UI elements. All UI elements are in an order that can be defined by relations like “before” or “after”. The orientation distinguishes between elements that are oriented horizontally or vertically to each other. Finally, the size specifies the width and height of containers and UI elements relative to other UI elements or abstract containers. To create a constraint system from these characteristics, we use a set of statements to express the building process. A statement has conditions, combined with conjunctions and disjunctions, that define the scope of the statement. Conditions can also use additional information about the UI elements to define application-independent statements. The formal description of a statement is shown in figure 1, top. If the conditions are fulfilled, the statement is applied and its effect modifies the constraint system. At runtime this set of statements is evaluated and creates a constraint system that is solved by the Cassowary constraint solver [1]. This constraint solver supports linear problems and resolves cycles. To generate a flexible constraint system, it also supports a constraint hierarchy using weak, medium, strong and required priorities. Effects are split into static and dynamic: static statements use a fixed value for adaptations, whereas dynamic statements use a function that depends on dynamic information. Dynamic functions are divided into logical and mathematical functions. Mathematical functions describe the behavior of their value
Fig. 1. Statement format and example
in dependence on external information (numerical data from different information sources, such as the context model), like the distance to the screen. Logical functions use external information (comparable data, such as numerical and textual information) to derive a logical value for a decision. This kind of function is, for instance, used to generate the initial orientation of the elements of the user interface. The example shown in figure 1 describes a “Prioritize Statement” changing the space allocation for a specific node, in this case for the element “GiveHelp”. The effect contains a mathematical function with the variable “distance”. If the distance between the user and the screen changes, the function recalculates the prioritize value that describes how much space the element “GiveHelp” additionally receives from other UI elements.

2.1 Statement Evaluation

The result of a successful layout calculation is a set of elements, each consisting of a location (an absolute x, y coordinate) and a width and height value. The layout generation is performed in three phases:

1. First, an initial layout that is consistent for all platforms is automatically generated by a set of predefined algorithms that interpret the design models, such as the task model and the abstract user interface model. The result of the containment statement is a tree structure representing the organization of the graphical user interface. The orientation statement at first allocates the vertical space and, once a designer-modifiable threshold value is reached, uses an alternating orientation. After the definition of the orientation, the size statement defines the initial space usage for the user interface elements. Basic constraints ensure that the additional constraints that are added do not corrupt the constraint system.
2. A designer can manipulate the pre-generated layout to match his aesthetic requirements by adding statements that relate information from the design models to a layout characteristic of a UI element.
3. Finally, the user behavior in a smart environment can be considered by adding generic statements that weight individual UI elements based on the actual context of the user at system runtime.

2.2 Context-Related Layout Adaptations

To adapt the interface to specific situations, the designer can define context-sensitive statements that prioritize specific nodes, as described in the next section. These statements are only active in specific situations described by context information. Even though our layout model describes the size, order, orientation and containment structure separately, for layout adaptations regarding the user behavior we focus on size adaptations, as modifying the other layout characteristics can destroy the consistency of the user interface, which affects usability [7]. As described in the introduction, there are three different statement types: spot-based, orientation-based and distance-based adaptation. The basic idea of all adaptations is to highlight the parts of the user interface that are relevant in the current context. This is described by a prioritize value characterizing how much additional space an element can use compared to the rest of the interface. Figure 2 shows an example. The algorithm allocates the space according to the weight (contained elements), so the increase depends on the number of other elements. In this example we prioritize the red node; the prioritize value of ½ ensures that the node takes additional space from the other nodes. As a result, this statement adds a new constraint with the weight “strong” to the constraint system: size(red node) ≥ 2/3 · size(parent node). The context-based adaptations use static and dynamic statements to recalculate the space allocation for the graphical user interface. The prioritize values (static statements) and prioritize functions (dynamic statements) used in the next section are examples and can be adjusted by the designer. The spot-based adaptation uses a static prioritize statement for a specific set of nodes and an assigned position of the user. If the user reaches the specified position, the statement is applied and gives the affected nodes the additional space defined by the prioritize value.
Fig. 2. Result of the prioritizing process
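The authors solve the resulting system with a Cassowary solver [1]. As an illustration only, the following minimal sketch expresses the prioritize constraint from figure 2 using the kiwisolver Python package, an independent Cassowary implementation; the statement class, the spot condition and all variable names are our own assumptions, not the authors’ API.

```python
# Sketch of a runtime "prioritize statement" on top of a Cassowary-style
# solver (kiwisolver). Names and the spot condition are illustrative.
from kiwisolver import Solver, Variable

class PrioritizeStatement:
    """If its condition holds, add: size_node >= fraction * size_parent."""
    def __init__(self, node_name, fraction, strength="strong"):
        self.node_name, self.fraction, self.strength = node_name, fraction, strength

    def applies(self, context):
        # Example spot-based condition: the user stands at spot "B2".
        return context.get("spot") == "B2"

    def apply(self, solver, size_node, size_parent):
        solver.addConstraint(
            (size_node >= self.fraction * size_parent) | self.strength)

solver = Solver()
parent, red = Variable("parent"), Variable("red_node")
solver.addConstraint((parent == 600) | "required")  # fixed parent size
solver.addConstraint((red == 200) | "weak")         # default space share
statement = PrioritizeStatement("red_node", 2.0 / 3.0)
if statement.applies({"spot": "B2"}):
    statement.apply(solver, red, parent)
solver.updateVariables()
print(red.value())  # 400.0: the strong constraint overrides the weak default
```

The weak constraint encodes the default allocation; the strong prioritize constraint overrides it only while the spot condition holds, mirroring how statements become active and inactive with the context.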
The orientation-based adaptation uses the spots “D” and “A2” shown in figure 3, bottom. If the user enters the specified position, this statement is activated and prioritizes a specific node: if the user stands to the left or right of the screen, the statement prioritizes all nodes whose upper left corner lies on the opposite side. The distance-based adaptation uses the distance from the user to the screen to calculate the prioritize value as a function of that distance. If the user moves away from the display, the relevant parts of the interface are enlarged. These adaptations are described and discussed in the following case study.
3 Cooking Assistant Case Study

To test the adaptations we deployed the cooking assistant in a real kitchen environment of our SerCHo Living Lab, as depicted in the photo in figure 3, top-left. This multimodal application assists the user during the cooking process. The main screen, shown in figure 3, top-right, guides the user through the cooking steps and provides help if needed. Figure 3, bottom, illustrates several spots corresponding to the different working positions and user tasks in the kitchen. Since the touch screen supports a viewing angle of 160 degrees, the user cannot observe the screen from all spots. For the spot-based layouting we therefore focus on the spots listed in table 1. Figure 4 depicts the box-based preview of our layout editor, from which the main screen of the cooking assistant has been derived. By a preceding task analysis we identified the most relevant interaction tasks. Deriving an initial layout model from a task hierarchy structure has the advantage that related tasks end up in the same boxes and are laid out close to each other, since the more closely tasks are related, the more parent containers they share.
Fig. 3. The kitchen with the cooking assistant running on a touch screen (top-left), the main screen of the cooking assistant (top-right), and the location spots defined by the context model (bottom)
Table 1. An excerpt of the user contexts that are supported by the application. The last column lists the most relevant application tasks for each user context.
Spot | User context | Relevant tasks ordered by priority
A2, C1.2 | Looking for ingredients. | 1. listRequiredIngredients, 2. listNextStepIngredients
B2 | Preparing ingredients while following the cooking advice and controlling the kitchen appliances. | 1. stepDetailedDescription, 2. listRequiredIngredients, 3. selectAppliance, 4. giveHelp
D | Learning about next steps while cleaning dishes after a step has been done. | 1. presentNextStepSummary, 2. listNextStepIngredients, 3. stepSelection
E | Concentrating on the video or getting an overview of the recipe steps. | All tasks same priority
Fig. 4. Changes from the automatically generated layout to the designer-adapted layout
The starting point for all adaptations is the constraint system generated by the automatic statements, shown in figure 4, Phase I, which the designer adapts to adjust the space allocation to his wishes. The result of this process is shown in figure 4, Phase II. Below, we describe three examples of adapting the constraint system to a specific situation.

3.1 Statements for Spot-Based Adaptation: (B2)

While using the cooking assistant (CA), the user is preparing ingredients, following the cooking advice and controlling the kitchen appliances. Because it is difficult to look at the screen from this position, shown in figure 3, bottom, the statement highlights the important information (tasks: stepDetailedDescription, listRequiredIngredients, selectAppliance, giveHelp). The condition of the spot statement is characterized by an environment condition, the position of the user, and the relevant interaction tasks, as the interface structure is derived from the task model.
Fig. 5. B2 prioritizes “ShowCurrentStepDetail” with the elements stepDetailedDescription, listRequiredIngredients, selectAppliance and giveHelp
Because the container “showCurrentStepDetails” contains the most relevant elements, it is prioritized. Additionally, the statement uses a static prioritize value defined by the designer. For this study we use a fraction of 4/5 (80%), because this prioritization is high enough to support the user but low enough that the user can follow the changes in the user interface without being confused. The effect of this statement for case B2 is shown in figure 5.

3.2 Statements for Distance-Based Adaptation

While cleaning dishes after a step has been done, the user wants to learn more about the next step. A video helps to understand what has to be done. Because the focused task is specified in the AUI model, the layout algorithm can prioritize the task containing the specific element. The distance statement is characterized by a function calculating the prioritize value depending on the distance to the screen. This function is expressed as prioritize value = ax² + bx + c. The constants a, b, c can be adapted by the designer to match the function to the maximum distance. For our case study we use the linear function prioritize value = 4/3000 · distance (this fraction follows from assuming an interaction space maximum of 600 cm, at which the prioritization is 4/5, i.e. 80%). The user interface prioritizing “giveHelp” depending on the distance is shown in figure 6.
Fig. 6. Distance-based adaptation, shown for 100, 200 and 400 cm
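A minimal sketch of this dynamic prioritize function follows, using the case-study constants (maximum interaction distance 600 cm, maximum prioritization 4/5). The clamping of the distance to the interaction space is our assumption, and Python is used purely for illustration.

```python
def prioritize_value(distance_cm: float,
                     max_distance_cm: float = 600.0,
                     max_priority: float = 0.8) -> float:
    """Linear special case (a = c = 0) of p(d) = a*d^2 + b*d + c.

    With b = max_priority / max_distance_cm = 4/3000, the case-study
    function p(d) = 4/3000 * d yields 4/5 (80%) at the 600 cm maximum.
    """
    d = max(0.0, min(distance_cm, max_distance_cm))  # assumed clamping
    return (max_priority / max_distance_cm) * d

# The three snapshots of Fig. 6:
for d in (100, 200, 400):
    print(d, round(prioritize_value(d), 3))  # 0.133, 0.267, 0.533
```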
3.3 Statements for Orientation-Based Adaptation: (A2), (D)

If the user has something to do at the spots A2 and D, shown in figure 3, bottom, the viewing angle to the screen is inappropriate.
Fig. 7. Orientation-based adaptation for the left and right side
Depending on the angle of view to the screen, shown in figure 7, elements whose upper left corner lies on the affected side are rendered broader than half the width of the screen. If the user enters spot D (left) and leaves the normal angle of view (shown in figure 3, bottom), the width of the elements “giveHelp” and “controlAppliance” grows to half of the screen width. The same happens with the elements “listRequiredIngredients” and “listNextStepIngredients” if the user enters spot A2 (right).
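A hypothetical helper for this selection rule is sketched below; representing nodes by the x-coordinate of their upper left corner, as well as all names used, are our assumptions for illustration only.

```python
def nodes_to_prioritize(corners, user_side, screen_width):
    """Select the nodes on the side opposite the user (Sec. 3.3).

    corners: mapping of node name -> x-coordinate of its upper left corner
    user_side: "left" (spot D) or "right" (spot A2)
    """
    half = screen_width / 2.0
    if user_side == "left":
        return [n for n, x in corners.items() if x >= half]  # right-hand nodes
    return [n for n, x in corners.items() if x < half]       # left-hand nodes

# e.g. a user at spot D (left) widens the nodes laid out on the right half:
print(nodes_to_prioritize({"giveHelp": 700, "stepSelection": 10},
                          "left", 1280))  # ['giveHelp']
```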
4 Related Work

Nichols et al. list a set of requirements that need to be addressed in order to generate high-quality user interfaces in PUC [5]. As for layout information, they propose not to include specific layout information in the models, as this first tempts the designers to include too many details in the specification for each considered platform, second limits the consistency of the user interface, and third might lower the chance of compatibility with future platforms. Unlike PUC, we do not focus on control user interfaces, but arrive at a domain-independent layout model that specifies the containment, size, orientation and order relationships of all individual user interface elements. We therefore do not want to specify the layout manually for each targeted platform, and we do not rely on a set of standard elements (such as a set of widgets predefined for each platform). The SUPPLE system [3] treats interface adaptation as an optimization problem: SUPPLE focuses on minimizing the user’s effort when controlling the interface, relying on user traces to estimate the effort and to position widgets on the interface. Although SUPPLE presents an efficient algorithm to adapt the user interface, it remains questionable whether reliable user traces can be generated or estimated. While SUPPLE also uses constraints to describe device and interactor capabilities, no details are presented about the expressiveness of the constraints and the designer’s effort in specifying them. The layout of user interfaces can be described as a linear problem, which can be solved using a constraint solver. The basic idea is shown in [12]; this approach uses a grid layout to organize the interface and create a constraint system. Our approach instead uses a tree structure and supports more constraint strengths. Recent research has also been done by Vermeulen [8], who implemented the Cassowary algorithm [1], a weak constraint satisfaction algorithm, to support user interface adaptation to different devices at runtime. While he demonstrates that constraint satisfaction can be done at runtime, to our knowledge he did not focus on automatic constraint generation.
Other approaches describe the user interface layout as a space usage optimization problem [4] and use geometric constraint solvers, which try to minimize the unused space. Compared to linear constraint solving, geometric constraint solvers require many iterations to solve such a space optimization problem. Besides performance issues, an efficient area usage optimization requires a flexible orientation of the user interface elements, which critically affects the consistency of the user interface. Richter [6] has proposed several criteria that need to be maintained when re-layouting a user interface. Machine learning mechanisms can be used to further optimize the layout by eliciting the user’s preferences [5]. The Interface Designer and Evaluator (AIDE) [7] and Gadget [2] incorporate metrics in the user interface design process to evaluate a user interface design. Both projects focus on critiquing already existing user interface layouts by advising and interactively supporting the designer during the layout optimization process. They follow a descriptive approach by re-evaluating already existing systems with the help of metrics. This differs from our approach, which can be directly embedded into a model-based design process (forward engineering). To adapt user interfaces to a specific situation, in [9] an XSL transformation is used to adapt the abstract description of the interface to different devices. Our approach follows a model-based user interface design [8]: a developer specifies several models using a model editor, and each abstract model is reified to a more concrete model until the final user interface has been derived. The result is a finely structured user interface, which can easily be adapted to different situations. A similar approach to creating a user interface is presented in [10]: the interface structure is derived from the task model and fleshed out by the AUI and CUI models. To adapt the interface to mobile devices, different container patterns are used to organize the information on the screen. Our approach does not break the interface structure into small pieces, because all information has to be displayed.
5 Conclusion and Further Work

In this paper we presented an approach to adapt the user interface of applications to specific situations. Furthermore, our case study, the “4-Star Cooking Assistant”, has shown the relevance of supporting the user in this way. In the future we will extend the case study to other applications and examine further context information for its relevance to GUI adaptations. User-interaction-related adaptation: based on the user’s experience and his interaction history (task completions and referred objects), the most important areas of control can be visually weighted higher to prevent unprofitable interaction cycles or to help the user in cases where he ponders (too) long over how to interact or how to proceed. User-abilities-related adaptation: the layout adapts to the user’s stress factor by visually highlighting the most relevant tasks, and takes into account whether the user is left- or right-handed when arranging the most relevant parts of the user interface. Finally, his eyesight capabilities can be used to highlight the most important areas of control.
References

1. Badros, G.J., Borning, A.: The Cassowary linear arithmetic constraint solving algorithm. ACM Transactions on Computer-Human Interaction (2001)
2. Fogarty, J., Hudson, S.: GADGET: A toolkit for optimization-based approaches to interface and display generation (2003)
3. Gajos, K., Weld, D.: SUPPLE: Automatically Generating User Interfaces. In: Proceedings of the Conference on Intelligent User Interfaces 2004, Madeira, Funchal, Portugal (2004)
4. Gajos, K., Weld, D.S.: Preference elicitation for interface optimization. In: UIST 2005: Proceedings of the 18th Annual ACM Symposium on User Interface Software and Technology, New York, NY, USA (2005)
5. Nichols, J., Myers, B.A., Harris, T.K., Rosenfeld, R., Shriver, S., Higgins, M., Hughes, J.: Requirements for Automatically Generating Multi-Modal Interfaces for Complex Appliances. In: IEEE Fourth International Conference on Multimodal Interfaces, Pittsburgh
6. Richter, K.: Transformational Consistency. In: CADUI 2006: Computer-Aided Design of User Interfaces V (2006)
7. Sears, A.: AIDE: a step toward metric-based interface development tools, pp. 101–110 (1995)
8. Vermeulen, J.: Widget set independent layout management for UIML. Master’s thesis, School voor Informatie Technologie, Transnationale Universiteit Limburg (2000)
9. Chiu, D.K.W., Hong, D., Cheung, S.C., Kafeza, E.: Adapting Ubiquitous Enterprise Services with Context and Views. In: EDOC 2006: Proceedings of the 10th IEEE International Enterprise Distributed Object Computing Conference, Washington, DC, USA, pp. 391–394 (2006)
10. Martinez-Ruiz, F.J., Vanderdonckt, J., Martinez-Ruiz, J.: Context-Aware Generation of User Interface Containers for Mobile Devices. In: ENC 2008: Proceedings of the 2008 Mexican International Conference on Computer Science, Washington, DC, USA, pp. 63–72 (2008)
11. Lutteroth, C., Strandh, R., Weber, G.: Domain Specific High-Level Constraints for User Interface Layout, Hingham, USA, pp. 307–342 (2008)
12. Feuerstack, S., Blumendorf, M., Schwartze, V., Albayrak, S.: Model-based layout generation. In: Proceedings of the Working Conference on Advanced Visual Interfaces, Napoli, Italy (2008)
Non-intrusive Personalized Mental Workload Evaluation for Exercise Intensity Measure

N. Luke Thomas¹, Yingzi Du¹, Tron Artavatkun¹, and Jin-hua She²

¹ Purdue School of Engineering and Technology at Indianapolis, Electrical and Computer Engineering, 723 W. Michigan St. SL160, Indianapolis, IN, USA
² Tokyo University of Technology, 1404-1 Katakuracho, Hachioji-shi, Tokyo 192-0982, Japan
{nlulthom,yidu,tartavat}@iupui.edu, [email protected]
Abstract. Non-intrusive measures of mental workload signals are desirable because they minimize artificially introduced noise and can be more accurate. A new approach for non-intrusive personalized mental workload evaluation is presented. Our research results show that human mental workload is unique to each person, non-stationary, and not zero-state.

Keywords: Personalized mental workload evaluation, exercise intensity measurement, biometrics.
1 Introduction

Prediction of a user’s level of mental workload can help detect the physical and psychological status of human users [1-6]. It is important to perform these measurements without producing further stress, workload, or interference with the user’s normal function on the job [7-11]. In this paper, we propose a biometric-based eye-movement mental workload evaluation system that can automatically identify a user, set system parameters based on the specific user’s needs and previous usage, and detect when a user’s expected workload exceeds some threshold for optimal performance. Biometrics is the process by which humans can be automatically and uniquely identified using their intrinsic qualities, traits, or identifying features. Examples of biometric identification systems include iris, face, fingerprint, voice, vein, keystroke, and gait recognition systems or algorithms [12-14]. In particular, iris recognition is an ideal biometric recognition technology for accurate, non-intrusive recognition of large numbers of users. Iris recognition is the most accurate biometric recognition technology, with reported false match rates of 1 in 200 billion [15]. Additionally, images that are adequate for iris recognition can be acquired at a distance of up to 10 feet from the user, using near-infrared illumination that is unobtrusive. These features make iris recognition an ideal biometric system to identify users of the workload evaluation system and then tailor the system to their needs and past requests. Eye-tracking and eye-movement-based mental workload evaluation is a good solution from a system design standpoint, because the information, images of the eye, can be acquired rapidly (in excess of 30 frames per second), can be processed in real time,
and is highly correlated with a user’s mental workload [1, 7, 8, 16-18]. The information can also be acquired without requiring any special training of the user and without interfering with their normal activity. It has been a focus of workload researchers to produce an average, general model of human mental workload and fatigue [1-4, 11, 19-23]. However, we believe our research shows that mental workload is, instead, a more individual quality of each person. That is, a human being is an inherently non-stationary and non-zero-state system: even in a constrained experimental set-up, it is impossible to replicate the exact same mental and physical state of a person at multiple times. Therefore, since one cannot guarantee an exact, or perhaps even similar, initial state for a workload experiment, it is not appropriate to attempt to apply a single workload or exhaustion model to all people. Instead, we believe that mental workload and fatigue should be modeled on an individual basis, and that while detection is possible, prediction of a user’s workload or fatigue is an inherently flawed approach.
2 Method

2.1 System Setup

For our experiment, we acquired videos of users’ eyes using an internally developed helmet-mounted CMOS camera. The camera captures 30 frames per second at 640 by 480 pixels.
Fig. 1. The head–mounted camera system (right, front, left)
Fig. 2. Example frames from the acquired videos
For each user, three videos were taken: 5 minutes while driving a motorized cart prior to any physical activity, 10-20 minutes while on a stationary bike, and 5 minutes while driving a motorized cart after the stationary bike exercise. Each video is between 7,000 and 30,000 frames. Figure 1 shows the camera system, and Figure 2 shows frames acquired by the camera system.

2.2 Video-Based Pupil Detection and Classification

The videos were processed using internally developed pupil segmentation and measurement software running in Matlab (Figs. 3-4) [24]. The system takes advantage of the motion information in the video to quickly detect the region of interest of the pupil; detects and measures the pupil location, radius, and other eye parameters; and uses a pattern-recognition-based system to classify each frame as blink, non-blink, or questionable.

Fig. 3. Proposed processing method, comprising video-based rough pupil detection on consecutive frames (the (N−1)th and Nth video frames and an edge-detected image), pattern-recognition blink/non-blink classification, and greedy angular pupil detection, yielding the detected pupil
2.3 Data Processing

Using the measured data from the videos, the overall workload/blink pattern was extracted from the sequence. Normally, it is easier to detect and classify non-blink frames; therefore, the analysis was primarily based on the non-blink results. Figure 5 shows several images that are difficult to classify conclusively as either blink or non-blink. The sequence of frame states and measurements was averaged over the length of the video in 30-second increments. The Cart 1 video was used as a baseline for the other videos.
Fig. 4. Final Frame Classification Process
Fig. 5. Frames that cannot be conclusively classified as blink or non-blink
Because it was taken of the user prior to physical exercise, it should be indicative of the user’s level at normal workload and exhaustion. Using the parameters from this video, the ratio of non-blink to blink frames was calculated (Fig. 7). In this plot, lower values indicate a period in which the user was blinking more often. Additionally, plots of the detected pupil radius were generated. To increase the accuracy of the results, the pupil was first measured in the down-sampled image used in most processing. However, if a long series of pupils was detected with the same measured radius, the measurement was repeated on the originally sized image for more discerning results. It is important to note that the original-sized pupil radius was only determined when it was expected that the down-sampled measurement might not be accurate, and therefore it is not necessarily determined over all time intervals. Additionally, a periodic measure of the outer iris boundary was determined for normalization purposes. Since all the pupil and eye parameters are measured in pixels, and the distance from the eye to the camera is not necessarily the same from person to person or video to video, we use the iris boundary radius to normalize the pupil radius measurements so that they are invariant to image acquisition differences from video to video.
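The windowed statistics described above might be computed as in the following minimal sketch. The per-frame label encoding, the handling of unmeasured frames, and the guard against division by zero are our assumptions; the authors’ software runs in Matlab, and Python/NumPy is used here purely for illustration.

```python
import numpy as np

def window_summaries(states, pupil_px, iris_px, fps=30, window_s=30):
    """Per-window non-blink/blink ratio and normalized pupil radius.

    states:   per-frame labels, 1 = non-blink, 0 = blink (questionable dropped)
    pupil_px: per-frame pupil radius in pixels (NaN where not measured)
    iris_px:  outer iris boundary radius in pixels for this video
    """
    n = fps * window_s                     # frames per 30 s increment
    ratios, radii = [], []
    for start in range(0, len(states) - n + 1, n):
        s = np.asarray(states[start:start + n], dtype=float)
        p = np.asarray(pupil_px[start:start + n], dtype=float)
        blinks = np.sum(s == 0)
        ratios.append(np.sum(s == 1) / max(blinks, 1))  # lower = more blinking
        radii.append(np.nanmean(p) / iris_px)           # size-invariant radius
    return ratios, radii
```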
2.4 Biometrics-Based Data Processing

For each user, prior to beginning the workload evaluation, the system would identify the user using iris recognition. If the user had never used the system before, they would be enrolled in the system for future use and identification. After identifying the user, the system could use their previously recorded data to set appropriate thresholds for their workload level, set up the system to their ergonomic needs, or provide specialized instructions for the current situation based on past evaluations. Some of the good-quality images acquired have a noticeable iris pattern (Fig. 6); their resolution and image quality are adequate for iris recognition. However, the segmentation of the pupil and iris areas can be quite difficult: many images have significant occlusion from the eyelids and eyelashes, eye gaze can be non-frontal, and the illumination can change throughout the video.
Fig. 6. Example images with adequate quality for iris recognition
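A minimal sketch of the identify-or-enroll step is given below; the gallery representation, the binary iris-code strings, and the decision threshold are illustrative assumptions (iris-code matching is commonly reported with normalized Hamming distance thresholds around 0.32 [15], but the value here is not the authors’).

```python
def hamming_distance(a: str, b: str) -> float:
    """Fraction of disagreeing bits between two equal-length iris codes."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def identify_or_enroll(iris_code: str, gallery: dict, threshold: float = 0.32):
    """Return the id of the closest enrolled user if the match is close
    enough; otherwise enroll the code as a new user (Sec. 2.4)."""
    best = min(gallery.items(),
               key=lambda item: hamming_distance(iris_code, item[1]),
               default=None)
    if best is not None and hamming_distance(iris_code, best[1]) < threshold:
        return best[0]                      # known user: load their profile
    new_id = "user%d" % (len(gallery) + 1)  # unknown user: enroll
    gallery[new_id] = iris_code
    return new_id
```

The returned id would then key the user’s stored baselines and thresholds, so the evaluation adapts to that individual rather than to an average model.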
3 Experimental Results

Our results showed that there is significant variability in physiological changes from person to person. Some individuals blinked less over the course of their exercise, others blinked more, and for others the ratio changed periodically. Some individuals’ pupil radii increased during the exercise, others’ stayed constant, and still others’ decreased. Additionally, for some individuals the radii and ratios were similar from the initial video (cart 1) to the video acquired during exercise (bike). However, for other users the ‘initial’ state from the cart 1 video was significantly different compared to the results from the bike video, lying both above and below depending on the user. Figure 7(a-d) shows the results for four representative users: for each user, the first plot is the normalized pupil radius and the second is the blink to non-blink ratio. Higher values in the pupil radius plot indicate larger pupils, after normalization by the measured iris radius. Higher values in the blink to non-blink ratio plot indicate that more frames were classified as non-blink compared to blink during that time period. On the basis of these results, we believe that attempting to model all people’s physiological reactions to a workload change in the same way is inadequate. Instead, the results show that each user’s physiological reactions should be used to develop an individualized workload model, which can then be used and adapted in future evaluations.
Fig. 7-a. Subject A
Fig. 7-b. Subject B
Fig. 7-c. Subject C
Fig. 7-d. Subject D
4 Conclusion

We have developed a system for personalized mental workload evaluation using non-intrusively acquired eye information. The research results show that human mental workload is unique to each person, non-stationary, and not zero-state. Because of this, each user’s mental workload should be modeled individually and adaptively.
Acknowledgement

The authors would like to thank the volunteers who took part in the data collection for this research in Japan. Part of the research is sponsored by the International Development Fund (IDF) at Indiana University-Purdue University Indianapolis.
References

[1] Cain, B.: A review of the mental workload literature (2007)
[2] Brookhuis, K.A., de Waard, D.: On the assessment of (mental) workload and other subjective qualifications. Ergonomics 45, 1026–1030 (2002); discussion 1042–1046
[3] Hancock, P.A., Caird, J.K.: Experimental evaluation of a model of mental workload. Hum. Factors 35, 413–429 (1993)
[4] Hancock, P.A., Meshkati, N.: Human Mental Workload. North-Holland, Amsterdam; sole distributors for the U.S.A. and Canada: Elsevier Science Pub. Co. (1988)
[5] Hancock, P.A., Meshkati, N., Robertson, M.M.: Physiological reflections of mental workload. Aviat. Space Environ. Med. 56, 1110–1114 (1985)
[6] Hankins, T.C., Wilson, G.F.: A comparison of heart rate, eye activity, EEG and subjective measures of pilot mental workload during flight. Aviat. Space Environ. Med. 69, 360–367 (1998)
[7] Itoh, Y., Hayashi, Y., Tsukui, I., Saito, S.: The ergonomic evaluation of eye movement and mental workload in aircraft pilots. Ergonomics 33, 719–733 (1990)
[8] Murata, A., Iwase, H.: Evaluation of mental workload by fluctuation analysis of pupil area. In: Proceedings of the 20th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 6, pp. 3094–3097 (1998)
[9] Sekiguchi, C., Handa, Y., Gotoh, M., Kurihara, Y., Nagasawa, A., Kuroda, I.: Evaluation method of mental workload under flight conditions. Aviat. Space Environ. Med. 49, 920–925 (1978)
[10] Wierwille, W.W.: Physiological measures of aircrew mental workload. Hum. Factors 21, 575–593 (1979)
[11] Wilson, G.F., Russell, C.A.: Real-time assessment of mental workload using psychophysiological measures and artificial neural networks. Hum. Factors 45, 635–643 (2003)
[12] Daugman, J.: New Methods in Iris Recognition. IEEE Transactions on Systems, Man, and Cybernetics, Part B 37, 1167–1175 (2007)
[13] Du, Y.: Review of Iris Recognition: Cameras, Systems, and Their Applications. Sensor Review 26, 66–69 (2006)
[14] Proenca, H., Alexandre, L.A.: Toward Noncooperative Iris Recognition: A Classification Approach Using Multiple Signatures. IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 607–612 (2007)
[15] Daugman, J.: Probing the Uniqueness and Randomness of IrisCodes: Results From 200 Billion Iris Pair Comparisons. Proceedings of the IEEE 94, 1927–1935 (2006)
[16] Neumann, D.L.: Effect of varying levels of mental workload on startle eyeblink modulation. Ergonomics 45, 583–602 (2002)
[17] Recarte, M.A., Perez, E., Conchillo, A., Nunes, L.M.: Mental workload and visual impairment: differences between pupil, blink, and subjective rating. Span. J. Psychol. 11, 374–385 (2008)
[18] Backs, R.W., Walrath, L.C.: Eye movement and pupillary response indices of mental workload during visual search of symbolic displays. Appl. Ergon. 23, 243–254 (1992)
[19] Cao, A., Chintamani, K.K., Pandya, A.K., Ellis, R.D.: NASA TLX: Software for assessing subjective mental workload. Behav. Res. Methods 41, 113–117 (2009)
[20] Moray, N. (ed.): Mental Workload: Its Theory and Measurement. Published in coordination with the NATO Special Program Panel on Human Factors. Plenum Press, New York (1979)
[21] Rouse, W.B., Edwards, S.L., Hammer, J.M.: Modeling the dynamics of mental workload and human performance in complex systems. IEEE Transactions on Systems, Man and Cybernetics 23, 1662–1671 (1993)
[22] Satava, R.M.: Mental workload: a new parameter for objective assessment? Surg. Innov. 12, 79 (2005)
[23] Young, M.S., Stanton, N.A.: It’s all relative: defining mental workload in the light of Annett’s paper. Ergonomics 45, 1018–1020 (2002); discussion 1042–1046
[24] Thomas, N.L., Du, Y., Artavatkun, T., She, J.: A New Approach for Low-Cost Eye Tracking and Pupil Measurement for Workload Evaluation. In: 13th International Conference on Human-Computer Interaction (HCI) (2009)
Incorporating Cognitive Aspects in Digital Human Modeling

Peter Thorvald¹,², Dan Högberg¹, and Keith Case¹,²

¹ University of Skövde, Skövde, Sweden
² Loughborough University, Loughborough, UK
{peter.thorvald,dan.hogberg}@his.se, [email protected]
Abstract. Software that, at the press of a button, can tell you what cognition-related hazards there are within an environment or a task is probably well into the future, if it is possible at all. However, incorporating existing tools such as task analysis tools, interface design guidelines and information about general cognitive limitations in humans could allow for greater evaluative options for cognitive ergonomics. The paper discusses previous approaches to the subject and suggests adding design and evaluation guidance to DHM that will help a user with little to no knowledge of cognitive science to design and evaluate a human-product interaction scenario.

Keywords: Digital human modelling, cognition, context, situatedness, ecological interface design, system ergonomics.
1 Introduction

In Digital Human Modeling (DHM), the term ergonomics usually refers to modeling physical aspects of humans, with the main focus being on anthropometry and physical strain on the body. This is also reflected in the DHM tools that exist, e.g. RAMSIS, JACK, SAMMIE, V5 Human, etc. [1, 2], tools that mainly, if not exclusively, model physical ergonomics. This paper suggests and discusses possible ways of bringing cognition into the equation and providing users of DHM tools with an aid for evaluating cognitive as well as physical ergonomics. Computer modeling of human cognition was originally mainly done off-line, in the sense that the cognitive system was viewed as a hardware-independent program, effectively disregarding the surrounding environment and even the importance of a human body. In later years, however, there has been increasing interest in viewing the human as part of a complex system, incorporating the environment and the human body in cognitive modeling. This has led to new theories regarding how humans cognize within the world and has allowed us to regard the body and the context as part of the cognitive system. Human cognition is not an isolated island where we can view our surrounding context as merely a problem space; we are very much dependent on our body and our surroundings to successfully survive in the world. Previous suggestions for integrating cognition in DHM tools have largely taken their basis in symbol-processing architectures such as ACT-R and Soar [3-5],
architectures that disregard the embodiment and situatedness of cognition. This paper aims to place the computer manikins used in DHM tools within a context, a context where cognitive offloading and scaffolding onto the environment is supported. The main advantage of using DHM and incorporating the suggested functionality is that it can be applied very early in the system development process. It also allows the designer to consider the spatial information that the physical array incorporates. In traditional usability methods this is seldom the case, as design iterations are often done offline in the sense that they only incorporate some (if any) physical properties of the domain where the system is to be implemented.
2 Cognitive Modeling in DHM

During the last decade, there have been several attempts at incorporating cognitive modeling in DHM. A research group at Sandia National Laboratories in New Mexico has created a framework based on a modular, symbol-processing view of human cognition, and others have focused on rule-based systems built on architectures such as ACT-R and Soar [3, 4]. Though not built on exactly the same architectures, several others have gone about the problem in similar ways, ultimately trying to reach a state where the system can, at the press of a button, perform a cognitive evaluation for us [5]. However, the methodology upon which these architectures are built is challenged by researchers who recommend a more situated view of cognition as a whole. This view, originating in the 1920s with the Russian psychologist Lev Vygotsky, argues that human cognition cannot be viewed apart from its context and body [6]. There is no clear-cut line between what happens in the world and what happens in the head; the mind “leaks” into the world. A view already seen in the DHM community is to stop dividing human factors into “neck up” and “neck down” and instead view the human as a whole [7]. This view finds much support in the work on social embodiment by Lawrence Barsalou and colleagues, who discuss how the embodiment of the self or others can elicit embodied mimicry in the self or others [8], ultimately arguing for a holistic view of the human in which body and mind are both necessary for cognition. While the discussion on embodiment and situatedness is beyond the scope of this paper, it shows how earlier approaches to modelling cognition in DHM are at best insufficient, and that a new approach is needed. The method towards which this paper aims will not require any kind of “strong AI” and will have a much lower technological level than many others. However, it will try to consider the human as a system with a physical body, acting within an environment.
3 Cognition in System Ergonomics

For a system design to become successful, the incorporation of human factors is essential. To a large extent, physical ergonomics is very well accounted for in today’s system design practices, but the cognizing human is often neglected. However, as technology increasingly demands more human processing abilities, the modeling of human cognition becomes more important. The range of human behaviors needs to be known to design human-related control systems [9].
System ergonomics can be used to describe the mental demands that a more or less complex task places on a human. It does so by three points [9].

1. Function. The main consideration of function is what the operator has in view and to what extent the task is supported by the system. It is largely defined by the temporal and spatial properties of the activities to be performed: when and where should the task be performed?

2. Feedback. Feedback allows the user to identify what state the system is in: whether a performed task has had any result, which task was performed, etc. It is very important to allow the operator to recognize whether an action had any effect on the system and what its result was [10]. For example, even if a computing task on a PC takes some time to calculate, the operator is informed that the computer is working by a flashing LED or an hourglass on the screen.
Fig. 1. A seat adjustment control which exhibits excellent natural mapping or matching between the system and the user’s mental model
3. Compatibility. Compatibility is largely about the match between systems, or between the system and the user’s mental model of the system; it relates information sources to each other. The operator should not be required to put too much effort into translating system signals. A very simple and obvious example from the automotive industry is described by Norman [10] with a seat adjustment control from a car. A similar seat adjustment control can be viewed in figure 1. It is obvious in the figure that the system (the adjustment control) corresponds well to the result of the task of manoeuvring the controls: the control maps very well to the response of the seat and to the user’s probable mental model. However, compatibility is not exclusively a psychological issue; a designer also needs to consider the physical compatibility of the user and the system. Controls might, for example, be located beyond the physical reach of the human. Though these three points are hardly sufficient for a comprehensive design tool, they are of great help in an initial stage of system design and will prove helpful to us in developing a more detailed design aid.
4 Methods for Interface Design and Evaluation

In Human-Computer Interaction (HCI) there are several evaluation methods, each of great use in certain situations. As the aim of this paper is to present a draft of a design tool, we shall take a closer look at a few of these methods along with a task analysis tool.

4.1 Task Analysis

All good design processes include some sort of task analysis. To be able to design a system that fits both task and human, we need to know as much as possible about the task. A fairly quick and dirty task analysis which provides a good basis for further development is the hierarchical task analysis (HTA) [11]. An HTA is a tree diagram of the task structure and serves several purposes. It gives us a good overview of the system, or of the task and subtasks that need to be performed, and provides aid in achieving common ground within a design group. It can even serve as a task evaluation tool, allowing a designer to find global problems that can be missed when using usability inspection methods such as cognitive walkthrough [12], heuristic evaluation [13, 14], etc. Global issues are mainly related to the structure of the task and the relations between the subtasks, whereas local issues lie within a subtask with a very limited scope.

Fig. 2. A very simple HTA of the process of making a pot of coffee: “Make Coffee” decomposes into “1. Add Water” (1.1 Fill pot with water, 1.2 Pour water into coffee maker), “2. Add Coffee” (2.1 Place filter, 2.2 Add coffee) and “3. Press Button”, with the plan “do 1-2 in any order, then do 3”
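As an illustration, the HTA of Fig. 2 could be encoded as a simple nested structure and traversed programmatically; the dictionary layout and the plan strings below are our own assumptions, shown in Python purely as a sketch.

```python
# Hypothetical encoding of the Fig. 2 HTA; field names are illustrative.
COFFEE_HTA = {
    "task": "Make coffee",
    "plan": "do 1-2 in any order, then 3",
    "subtasks": [
        {"task": "1. Add water", "plan": "do 1.1-1.2 in order",
         "subtasks": [{"task": "1.1 Fill pot with water"},
                      {"task": "1.2 Pour water into coffee maker"}]},
        {"task": "2. Add coffee", "plan": "do 2.1-2.2 in order",
         "subtasks": [{"task": "2.1 Place filter"},
                      {"task": "2.2 Add coffee"}]},
        {"task": "3. Press button"},
    ],
}

def walk(node, depth=0):
    """Print the tree; an inspection method would visit each leaf action."""
    plan = "  [plan: %s]" % node["plan"] if "plan" in node else ""
    print("  " * depth + node["task"] + plan)
    for child in node.get("subtasks", []):
        walk(child, depth + 1)

walk(COFFEE_HTA)
```

Such a structure keeps the plans explicit alongside the decomposition, so the same artifact can later drive a sequence-based walkthrough of the leaf actions.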
The creation of an HTA is fairly simple. First, identify the overlying task to be performed, which in our very simple example, illustrated in figure 2, is making a pot of coffee. The HTA in figure 2 shows this process; also shown are the plans according to which each subtask should be performed. In this example the plans are limited to doing the tasks in order, or doing two subtasks first in any order and then continuing with the third. However, these plans can be pretty much anything you want them to be, such as selections (do one but not the other), linear or non-linear, or even based on a specific condition (if X then do Y, else Z). The finished task analysis is then used as a basis for further inspections and design iterations.

4.2 Ecological Interface Design

Ecological interface design (EID) is spawned from cognitive work analysis (CWA), which was developed as an analytical approach to cognitive engineering by the Risø
group in Denmark [15]. CWA was developed to aid in the design of highly critical human-machine systems, such as nuclear power plant control rooms, to make them safer and more reliable. It is an approach that allows the operator to handle situations that the system designers had not anticipated. CWA is made up of five phases of analysis within a system: work domain analysis, control task analysis, strategies analysis, social-organizational analysis and worker competencies analysis [16]. Having these analyses gives the designer and the operator a better understanding of the system, and this already enables the operator to respond better to unforeseen events. The idea behind EID is to create interfaces based on certain principles of CWA. It is very closely related to the principles of ecological psychology and direct perception, concepts developed by J.J. Gibson in the 1970s [17]. Gibson argued that there is enough information in the visual array to directly perceive information, and that mental processing of visual information is not necessary. Though this claim is highly challenged, EID is largely built around these principles in that its goal is to create interfaces containing objects that visually reveal information about their function. A related goal of EID is to make affordances visible in interface design. Affordances, another concept created by Gibson, are the action possibilities of a specific object [17, 18]. The ideas surrounding affordances and EID can also be found in other areas of the scientific literature; in product design, one tends to discuss similar issues in terms of semantics [19].

4.3 Usability Inspections

Usability inspection methods are predictive evaluation methods, usually performed without end-user participation (although this is not a prerequisite). Usability experts simulate the users and inspect the interface, resulting in problem lists with varying degrees of severity [20].

4.3.1 Cognitive Walkthrough

A cognitive walkthrough is usually performed by usability experts considering, in sequence, all actions incorporated in a predefined task. Its focus is almost exclusively on ease of learning, taking into account first-time problems, i.e., problems that might not be a problem for the experienced user. The method contains two phases. First is the preparation phase, where the analyst defines the users, their experience and knowledge; defines the task to be analyzed; and identifies the correct sequence of actions to achieve the goal of the task. In the second phase, the analysis phase, the analyst answers and motivates a set of questions for each action within the task [12].

1. Will the user try to achieve the right effect? For example, if the task is to fill up the car with petrol and a button first has to be pressed from inside the car to open the gas cap, does the user know that this has to be done?
2. Will the user notice that the correct action is available? Simply pressing the button for the gas cap would not be a problem, but if the button has to be slid or twisted in some way, the user may not think of this.
3. Will the user associate the correct action with the desired effect? Is it clear that this is what the specific control is for? Unambiguous icons and names of controls are important to this aspect.
4. If the correct action is performed, will the user see that progress is being made? The importance of feedback, discussed earlier, comes into play here.

These questions, though applicable to many tasks, are merely guidelines for conducting a successful cognitive walkthrough. The method’s advantage is its focus on detail: it identifies local problems within the task and considers the users’ previous knowledge and experience. However, it rarely catches global problems related to the overlying structure of the task, and it can be viewed as fairly subjective. It also requires a detailed prototype for evaluation, although this would probably not be a problem if it complements a DHM tool, where a virtual prototype is likely to exist already.

4.3.2 Heuristic Evaluation

Just as in the case of the cognitive walkthrough, heuristic evaluations are usually performed by usability experts sequentially going through each action within a main task on the basis of a set of heuristics [14]. The method was developed by usability expert Jakob Nielsen, and sets of his heuristics can be found in his publications [13, 14, 21]. Examples of Nielsen’s heuristics are:

• Match between system and the real world: similar to the matching and mapping concept discussed in system ergonomics, the system should speak the users’ language, matching the real world in terms of terminology and semiotics.
• Consistency and standards: also related to the matching concept is the use of accepted conventions, to avoid making users wonder whether different words, icons or actions mean the same thing in different contexts.
• Recognition rather than recall: options should be made visible, to avoid making the user have to remember how or where specific actions should be performed.
• Aesthetic and minimalist design: dialogues and controls should not be littered with irrelevant or rarely used information.

Heuristics can be added and removed to fit certain tasks before the evaluation commences. The method results in problem lists with motivations and rankings of the severity of the problems found.
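Both inspection methods produce ranked problem lists per action, which suggests a simple record structure; the sketch below is our illustration, with the severity scale and the judge callback (standing in for the expert’s answer and motivation) as assumptions.

```python
from dataclasses import dataclass

WALKTHROUGH_QUESTIONS = (
    "Will the user try to achieve the right effect?",
    "Will the user notice that the correct action is available?",
    "Will the user associate the correct action with the desired effect?",
    "If the correct action is performed, will the user see progress?",
)

@dataclass
class Finding:
    action: str     # a leaf action from the task analysis
    question: str   # the walkthrough question that failed
    severity: int   # assumed scale, e.g. 1 (cosmetic) to 4 (catastrophic)
    note: str       # the analyst's motivation

def walkthrough(actions, judge):
    """Ask each question for each action in sequence; judge(action, q)
    returns (passed, severity, note) and models the expert's judgment."""
    findings = []
    for action in actions:
        for question in WALKTHROUGH_QUESTIONS:
            passed, severity, note = judge(action, question)
            if not passed:
                findings.append(Finding(action, question, severity, note))
    # Rank the problem list with the most severe findings first
    return sorted(findings, key=lambda f: f.severity, reverse=True)
```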
5 A Design Guide for DHM

The evaluation and design tools discussed in the previous sections were developed for interface design in settings other than DHM. However, the design guide suggested in this section is a hybrid of these, adapted for use under the specific conditions that DHM provides. The method strives to take into account global as well as local issues through the use of action-based interface inspections and a task analysis focusing on the structure of the task. As stated earlier in this paper and by others [22], every good design process starts with a task analysis. For our purposes, a hierarchical task analysis is very suitable, as it
complements the inspection methods incorporated in this design guide. The HTA serves several purposes: it gives the designer a better understanding of the task, and it provides a common understanding of the task within a development group. The task analysis can also be used as an evaluation tool for the task itself. It allows the designer to identify problems in the task structure that could result in problems with automatism [23], it can identify recurring tasks and give them a higher priority in the interface, etc. Complementary to the task analysis, the designer should consider who the users are and what a priori knowledge they have. This resembles the guiding system for utilizing traditional DHM tools in development processes suggested by Hanson et al. [24], where the users’ anthropometry and tasks are defined before the actual analyses or simulations are performed. The sequence-based walkthrough takes its basis in the task analysis performed. For each subtask (box) of the HTA, a set of questions, based on Bubb’s points regarding system ergonomics [9], acts as guidelines for the design.

• Function – When and where should the action be performed?
  o Will the user identify the action space where the correct action should be performed? What do the physical and geographical properties of each control convey to the user?
  o Frequency of actions – a frequently recurring action should take precedence in terms of space and intrusiveness in the physical and cognitive envelope.
  o Importance of action – safety-critical systems should also take precedence in the available information space.
  o Minimalism of design – avoid taking up space with irrelevant or rarely needed information. Hick’s law: reaction time grows with the number of choices in a decision, approximately as RT = a + b·log₂(n + 1) for n equally probable choices [25].
• Feedback
  o Will the user understand that a correct or faulty move has been made?
  o Is the system status visible?
• Compatibility
  o Does the system match other, similar systems in terms of semantics, semiotics, etc.?
  o Does the system match the real world and the plausible mental model of the user?
  o Are demands on consistency and standards of the domain met?
  o Action-effect discrepancies – is it obvious beforehand that a certain action will have a certain effect?

5.1 Function

Figure 3 shows an example of what a virtual interface modelled in a DHM tool can look like. In this case the picture shows a fighter jet cockpit used for an evaluation in which the pilot needed to locate a “panic button” to bring the aircraft back under control under extreme physical and mental conditions.
Fig. 3. Two views of a cockpit modeled in SAMMIE
The action spaces that the user has to identify when performing an action are the controls in front of, and to the right and left of, the steering control stick. Preferably, a control for a frequently performed action should be placed on the control stick or directly in front of it, as these are the spaces that best correspond to the physical and cognitive reach of the pilot. Safety systems too, as in the case of the evaluation in figure 3, should be placed so that they are easily accessible to the user. Knowing that certain controls are rarely used, they can be placed to the right and left to avoid having too many pushable buttons in the same place. The intrusiveness and affordances of such “high-priority controls” should also be accentuated in their design.

5.2 Feedback

Understanding what has been done and what is in the process of happening with the system can prove vital in many cases. Surely we can all relate to a situation where we have pressed the print button more than once, only to find out that we have printed several more copies than needed. While this may be a minor problem, one can easily imagine the problems that can arise in more critical domains. What if there were no indication of what gear the car’s gearbox was in? The driver would have to test each time to see whether the car is in reverse or drive. In an incident at a hospital, a patient died as a result of being exposed to a massive overdose of radiation during a radiotherapy session. The problem could easily have been avoided had the system provided the treating radiology technician with information about the machine’s settings [26].

5.3 Compatibility

Accurate mapping between systems and mental models is a key concept in the compatibility section. This includes trying to adhere to the consistencies and standards of the organization and the specific field. There should also be a clear connection between action and effect. Neglecting these consistencies can lead to serious problems, as in the case of an aircraft’s rudder settings. The sensitivity of the rudder could be set through a lever placed to the side of the pilot’s seat. However, between the simulator for the aircraft and the actual aircraft, the lever was reversed, moving in opposite directions for maximum and minimum sensitivity, almost resulting in a crash [27].
6 Conclusions and Future Work
In ergonomics, it seems to be common practice to separate human factors into "neck up" and "neck down". Though this approach may make it easier to study ergonomics, it does not portray an entirely accurate picture of the human. The evidence for a tight coupling between mind and body is so overwhelming that, instead of talking about mind and body, perhaps we should be talking about the human system. The aim of this paper has been to consider past and current approaches towards integrating cognition into DHM tools and to outline a new design guide to help designers achieve this integration in a better way. The guide is not complete and would need extensive further development and testing. However, it is a good start towards including more in DHM than has traditionally been found there.
References
1. Bubb, H.: Future Applications of DHM in Ergonomic Design. In: Duffy, V.G. (ed.) HCII 2007 and DHM 2007. LNCS, vol. 4561, pp. 779–793. Springer, Heidelberg (2007)
2. Case, K., Porter, J.M.: SAMMIE - A Computer Aided Ergonomics Design System. Engineering 220, 21–25 (1980)
3. Bernard, M.L., Xavier, P., Wolfenbarger, P., Hart, D., Waymire, R., Glickman, M., Gardner, M.: Psychologically Plausible Cognitive Models for Simulating Interactive Human Behaviors. In: Proceedings of the Human Factors and Ergonomics Society 49th Annual Meeting, pp. 1205–1210 (2005)
4. Carruth, D.W., Thomas, M.D., Robbins, B., Morais, A.: Integrating Perception, Cognition and Action for Digital Human Modelling. In: Duffy, V.G. (ed.) HCII 2007 and DHM 2007. LNCS, vol. 4561, pp. 333–342. Springer, Heidelberg (2007)
5. Gore, B.F.: Human Performance: Evaluating the Cognitive Aspects. In: Duffy, V.G. (ed.) Handbook of Digital Human Modelling. Mahwah, New Jersey (2006)
6. Clark, A.: Being There: Putting Brain, Body, and World Together Again. MIT Press, Cambridge (1997)
7. Feyen, R.: Bridging the Gap: Exploring Interactions Between Digital Human Models and Cognitive Models. In: Duffy, V.G. (ed.) HCII 2007 and DHM 2007. LNCS, vol. 4561, pp. 382–391. Springer, Heidelberg (2007)
8. Barsalou, L.W., Niedenthal, P.M., Barbey, A.K., Ruppert, J.A.: Social Embodiment. In: Ross, B.H. (ed.) The Psychology of Learning and Motivation, pp. 43–92. Academic Press, San Diego (2003)
9. Bubb, H.: Computer Aided Tools of Ergonomics and System Design. Human Factors and Ergonomics in Manufacturing 12, 249–265 (2002)
10. Norman, D.A.: The Design of Everyday Things. Basic Books, New York (2002)
11. Annett, J.: Hierarchical Task Analysis. In: Diaper, D., Stanton, N. (eds.) The Handbook of Task Analysis for Human-Computer Interaction, pp. 67–82. Lawrence Erlbaum Associates, Mahwah (2003)
12. Polson, P.G., Lewis, C., Rieman, J., Wharton, C.: Cognitive Walkthroughs: A Method for Theory-Based Evaluation of User Interfaces. International Journal of Man-Machine Studies 36, 741–773 (1992)
13. Nielsen, J.: Usability Engineering. Morgan Kaufmann, San Francisco (1993)
14. Nielsen, J.: Heuristic Evaluation. In: Nielsen, J., Mack, R.L. (eds.) Usability Inspection Methods, pp. 25–62. John Wiley & Sons, Inc., New York (1994)
15. Vicente, K.J.: Cognitive Work Analysis: Toward Safe, Productive, and Healthy Computer-Based Work. Lawrence Erlbaum Assoc. Inc., Mahwah (1999)
16. Sanderson, P.M.: Cognitive Work Analysis. In: Carroll, J.M. (ed.) HCI Models, Theories, and Frameworks: Toward an Interdisciplinary Science, pp. 225–264. Morgan Kaufmann Publishers, San Francisco (2003)
17. Gibson, J.J.: The Ecological Approach to Visual Perception. Lawrence Erlbaum Associates, Hillsdale (1986)
18. McGrenere, J., Ho, W.: Affordances: Clarifying and Evolving a Concept. In: Proceedings of Graphics Interface 2000, pp. 179–186 (2000)
19. Monö, R.: Design for Product Understanding. Skogs Boktryckeri AB (1997)
20. Nielsen, J., Mack, R.L.: Usability Inspection Methods. Wiley, Chichester (1994)
21. Nielsen, J.: Finding Usability Problems Through Heuristic Evaluation. In: Proceedings of ACM, Monterey, CA, pp. 373–380 (1992)
22. Pheasant, S., Haslegrave, C.M.: Bodyspace: Anthropometry, Ergonomics and the Design of Work. CRC Press, Boca Raton (2006)
23. Thorvald, P., Bäckstrand, G., Högberg, D., de Vin, L.J., Case, K.: Demands on Technology from a Human Automatism Perspective in Manual Assembly. In: Proceedings of FAIM 2008, Skövde, Sweden, vol. 1, pp. 632–638 (2008)
24. Hanson, L., Blomé, M., Dukic, T., Högberg, D.: Guide and Documentation System to Support Digital Human Modelling Applications. International Journal of Industrial Ergonomics 36, 17–24 (2006)
25. Hick, W.E.: On the Rate of Gain of Information. The Quarterly Journal of Experimental Psychology 4, 11–26 (1952)
26. Casey, S.M.: Set Phasers on Stun and Other True Tales of Design, Technology, and Human Error. Aegean, Santa Barbara (1998)
27. Casey, S.M.: The Atomic Chef: And Other True Tales of Design, Technology, and Human Error. Aegean Pub. Co., Santa Barbara (2006)
Workload-Based Assessment of a User Interface Design

Patrice D. Tremoulet1, Patrick L. Craven1, Susan Harkness Regli1, Saki Wilcox1, Joyce Barton1, Kathleen Stibler1, Adam Gifford1, and Marianne Clark2

1 Lockheed Martin Advanced Technology Laboratories, 3 Executive Campus, Suite 600, Cherry Hill, NJ, USA
{polly.d.tremoulet,patrick.craven,susan.regli,sakirenecia.h.wilcox,joyce.h.barton,kathleen.m.stibler,adam.gifford}@lmco.com
2 2001 South Mopac Expressway, Suite 824, Austin, TX, USA
[email protected]
Abstract. Lockheed Martin Advanced Technology Laboratories (LM ATL) has designed and developed a tool called Sensor-based Mental Assessment in Real Time (SMART), which uses physiological data to help evaluate human-computer interfaces (HCI). SMART non-intrusively collects and displays objective measures of cognitive workload, visual engagement, distraction and drowsiness while participants interact with HCIs or HCI prototypes. This paper describes a concept validation experiment (CVE) conducted to 1) demonstrate the feasibility of using SMART during user interface evaluations and 2) validate the EEG-based cognitive workload values derived from the SMART system by comparing them to three other measures of cognitive workload (NASA TLX, expert ratings, and expected workload values generated with Design Interactive's Multimodal Information Decision Support tool). Results from the CVE indicate that SMART represents a valuable tool that provides human factors engineers with a non-invasive, non-interrupting, objective method of evaluating cognitive workload.
Keywords: Cognitive workload, human computer interaction, human factors, usability, evaluation, user interface design.
1 Introduction
In 2005 and 2006, the Office of Naval Research (ONR) Disruptive Technologies Opportunity Fund supported Lockheed Martin Advanced Technology Laboratories' (LM ATL) research effort exploring the use of neuro-physiological data to measure cognitive workload during a human-computer interface (HCI) evaluation. As a part of this effort, LM ATL designed and developed a tool called Sensor-based Mental Assessment in Real Time (SMART), which non-intrusively collects physiological data (electroencephalographs (EEG), heart rate variability (HRV), galvanic skin response and pupil size) from subjects while they interact with HCIs or HCI prototypes. SMART uses the data to derive objective measures of cognitive workload, visual engagement, distraction and drowsiness, which may be used to evaluate the efficacy of design alternatives, e.g., by helping to identify events of interest during
usability tests, thus reducing data load and providing timely evaluation results to suggest design changes or to validate a design. SMART's sensor-based measure of cognitive workload offers several advantages over existing workload measures, including: 1) increased precision in measuring the subject's cognitive state via moment-by-moment data collection, 2) obtaining an objective measurement of cognitive workload while the test is being performed, and 3) not distracting subjects from their primary task (e.g., by interrupting to collect subjective ratings or requiring them to attend to a secondary task). However, SMART's cognitive workload measure needed to be validated, so LM ATL conducted a concept validation experiment (CVE) to demonstrate the feasibility of using SMART during user interface evaluations as well as to collect the data necessary to validate the sensor-based cognitive workload values derived from the SMART system by comparing them to NASA TLX, expert ratings, and Multimodal Information Decision Support (MIDS) expected values.
In most respects, the CVE was similar to a traditional usability study. Twelve sailors interacted with a high-fidelity prototype of a future release of the Tactical Tomahawk Weapons Control System (TTWCS), performing tasks required to execute missile strike scenarios. However, there were two major differences. First, the CVE scenarios were designed not only to be operationally valid but also to include discrete phases that require specific levels of workload. Moreover, while interacting with the prototype, participants in the CVE wore a set of neurological and physiological sensors including a wireless continuous EEG and electrocardiogram (EKG) sensor, wired EKG (the wired EKG was used as a backup for the newer wireless configuration) and galvanic skin response (GSR) sensors, and an off-head, binocular eye tracker that logs point of gaze and pupil diameters.
1.1 Sensor-Based Mental Assessment in Real-Time (SMART)
SMART provides critical support for interpreting physiological data collected during usability evaluations. SMART logs and displays system events, physiological responses, and user actions while study participants continue to interact with the system of interest, in this case a prototype of a future TTWCS system. Advances in neurotechnology and physiological measurement have enabled information to be captured that helps identify and indicate psychological states of interest (e.g., boredom, overload, and engagement), which aid human factors engineers in the evaluation of an HCI. SMART provides four critical types of information during usability evaluations:
• Real-Time Logging: Experimenters have the ability to enter events into the log to represent significant events that are not automatically logged. Prior to testing, the experimenter can set up events of interest with a quick key identifier so that expected events can be manually logged during testing.
• Real-Time Monitoring: During testing, experimenters can log events and monitor physiological sensor data (Fig. 1).
• Time Synchronization: Time synchronization between the physiological sensor logs and the test system logs is crucial in accurately matching sensor-derived events
to test-platform and participant-driven events. After testing, the experimenter can view data via the Timeline summary (Fig. 2).
• Data Extraction: Data is also extracted and presented in a CSV format, suitable for uploading into standard statistical analysis applications (a sketch of such post-processing follows below).
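As referenced above, the time-synchronized logs lend themselves to straightforward post-processing. The sketch below is a minimal illustration of aligning a sensor log with a test-system event log; the file names and column labels are hypothetical, since the paper does not specify the export schema.

```python
import pandas as pd

# Hypothetical file names and columns -- the actual SMART export format
# is not described in the paper.
sensors = pd.read_csv("smart_sensor_log.csv", parse_dates=["timestamp"])
events = pd.read_csv("ttwcs_event_log.csv", parse_dates=["timestamp"])

# With clocks synchronized (e.g., via NTP), each sensor sample can be
# matched to the nearest preceding system/user event for analysis.
merged = pd.merge_asof(
    sensors.sort_values("timestamp"),
    events.sort_values("timestamp"),
    on="timestamp",
    direction="backward",
    tolerance=pd.Timedelta("1s"),
)
merged.to_csv("synchronized_session.csv", index=False)
```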
Fig. 1. Real-time monitoring
Fig. 2. Timeline Summary
1.2 SMART's Cognitive Workload Measure
Lockheed Martin Advanced Technology Laboratories (LM ATL) worked with Advanced Brain Monitoring (ABM) to develop an electroencephalogram (EEG)-based gauge of cognitive workload [2] [4]. LM ATL employees wore ABM's EEG acquisition system while performing a variety of classic experimental psychology tasks in which working memory load was varied (e.g., by varying the number of items that needed to be remembered in an N-back task). The EEG acquisition system measured the electrical activity of the participants' brains with sensors placed on the scalp, allowing data to be captured unobtrusively. The EEG signals reflect the summated potentials of neurons in the brain that fire on a millisecond timescale. Discriminant function analyses determined appropriate coefficients for linearly combining measures derived from continuous EEG into a cognitive workload index, which ranges from 0 to 1.0 (representing the probability of being classified as "high workload"). The index values were validated against an objective appraisal of task difficulty and subjective estimates of workload and task focus. The cognitive workload index derived from this research effort increases with increasing working memory load and during problem solving, mental arithmetic,
integration of information, and analytical reasoning, and may reflect a sub-set of executive functions. The cognitive workload levels from this index are significantly correlated with objective performance and subjective workload ratings in tasks with varying levels of difficulty, including forward and backward digit span, mental arithmetic and N-back working memory tests [3], [4].
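Conceptually, such an index can be pictured as a linear discriminant score squashed into a probability of "high workload". The following sketch is only an illustration of that shape; the features, coefficients and intercept are invented placeholders, not ABM's actual model.

```python
import numpy as np

def workload_index(eeg_features: np.ndarray, coefficients: np.ndarray,
                   intercept: float) -> float:
    """Map per-epoch EEG features to a 0-1 cognitive workload index.

    In practice the coefficients would come from a discriminant analysis
    trained on tasks of known difficulty (e.g., N-back levels); the
    logistic squashing turns the discriminant score into the probability
    of the epoch being classified as 'high workload'. All numbers here
    are placeholders.
    """
    score = float(np.dot(coefficients, eeg_features)) + intercept
    return 1.0 / (1.0 + np.exp(-score))

# One second of (hypothetical) features from the five EEG channels:
features = np.array([0.42, 1.10, 0.33, 0.78, 0.51])
coeffs = np.array([0.9, -0.4, 1.2, 0.3, -0.7])   # illustrative only
print(f"cognitive workload index: {workload_index(features, coeffs, 0.1):.2f}")
```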
2 Method
A Concept Validation Experiment (CVE), patterned after an HCI usability study in which active-duty personnel run operationally valid scenarios designed such that different phases elicit different levels of cognitive workload, was conducted in Norfolk, VA, from August 22 to 31, 2006.
2.1 Participants
The average age of the twelve active-duty participants was 29, and their years of service ranged from 1 to 18. Six participants had experience with one of the previous Tomahawk systems. All but one had some experience at Tomahawk workstations (averaging over six years) and had participated in an average of 52 Sea-Launched Attack Missile Exercises (SLAMEX) and 28 theatre exercises. Two participants had experience participating in an operational launch. All participants were male; one was left-handed, and two reported corrected 20/20 vision.
2.2 Equipment and Materials
Testing was conducted at offices at the Naval Station in Norfolk, VA. The TTWCS prototype was simulated on a Dell Inspiron 8500 laptop connected to two 19-inch flat-panel LCD monitors displaying the prototype software. The monitors were positioned vertically to re-create the configuration of the actual Tomahawk workstation. The prototype system recorded timestamps for system and user events to a log file. One video camera was located behind the participant, and video was captured on digital video tape. The output from the console monitors was sent through splitters to two 19-inch displays to allow observers to view the screen contents from a less intrusive location.
Continuous EEG data was acquired from a wireless sensor headset developed by ABM using five channels with the following bi-polar montage: C3-C4, Cz-POz, F3-Cz, Fz-C3, Fz-POz. Bi-polar differential recordings were selected to reduce the potential for movement artifacts that can be problematic for applications that require ambulatory conditions in operational environments. Additionally, EKG information was collected on a sixth channel. Limiting the sensors (six) and channels (five) ensured the sensor headset could be applied within 10 minutes. The EEG data were transmitted via a radio frequency (RF) transmitter on the headset to an RF receiver on a laptop computer containing EEG collection and processing software developed by ABM. This computer was then linked to an Ethernet router that sent the EEG data to SMART.
Wired galvanic skin response (GSR) and EKG sensors were connected to a ProComp™ Infiniti conversion box, which transmitted the signal via optical cable to
two destinations. The first was a laptop containing CardioPro™ software; the second was the SMART laptop, where software logged the GSR data and used the EKG signal to calculate heart rate variability (HRV). Eye gaze and pupil dilation data were collected using the SmartEye™ system, in which two cameras were positioned on either side of the monitor. An infrared emitter near each camera allowed the SmartEye™ software to calculate the associated eye movement and dilation data.
Paper questionnaires included a background questionnaire and a user satisfaction questionnaire. An electronic version of the NASA TLX was administered on the TTWCS task machine between task phases. The test administrator recorded notes concerning the testing session and completed a rating of the effort observed in the participant during the test session. Subsequent to the test session, aspects of the scenarios and individual user behavior were coded and entered into the MIDS tool. This tool produces a second-by-second total workload value for the entire scenario.
2.3 Experimental Design
The SMART workload validation was a correlational design in which the validity of a novel measure (sensor-based cognitive workload) would be compared with other measures that typify accepted practices for measuring workload within the HCI and Human Factors engineering research communities.
2.4 Procedure
Each participant's test session lasted approximately eight hours. The participants were briefed on the purpose and procedures of the study and then completed a background questionnaire with information regarding demographics, rank, billet, and Tomahawk experience. After the questionnaire was completed, a cap was placed on the participant's head and six EEG sensors were positioned in it (at F3, Fz, C3, Cz, C4, and Pz). Two EKG sensors were placed on the right shoulder and just below the left rib. The participant was instructed to tell the testing staff if they were uncomfortable before or during the experimental session. An impedance check was done to ensure that interference to the signals was at 50 ohms or below; once this was verified, the RF transmitter and receiver were turned on.
The participant was then asked to perform three tasks over 30 minutes to establish an individual EEG baseline and calibrate the sensors. The tasks required reading instructions and performing basic visual and auditory monitoring tasks. Next, individual profiles were created for the SmartEye™ eye tracking system by taking pictures while the participant looked at five predetermined points on the displays. Additionally, during hands-on training, specified facial features were marked on the facial images to further refine the participant's profile. Finally, three EKG sensors were attached, one on the soft spot below each shoulder and one on the center of the abdomen, and two GSR sensors were placed on the second and fourth toes such that the sensor was on the "pad" of the toe. A towel was wrapped around the participant's foot to keep it from getting cold.
Table 1. Sensor devices used in CVE

Sensor Vendor        Description                                Hardware
ABM                  Collects EEG data from five channels       Six EEG electrodes; two EKG electrodes;
                     and EKG on a sixth channel                 ABM sensor cap; RF transmitter and receiver
SmartEye™            Collects point-of-gaze and                 Two cameras; two infrared flashes
                     pupilometry data
Thought Technology   Collects EKG and GSR data; HRV             ProComp Infiniti converter; two GSR
                     calculated through third-party software    electrodes; three EKG electrodes
Once all sensors were applied and collecting data, the SMART software was started. Software from ABM, SmartEye™, and CardioPro™ was used throughout the experiment to monitor the signals from the sensors (Table 1). Network time protocol (NTP) was used to ensure that all machines were synchronized to within a fraction of a millisecond.
Once the sensor equipment was calibrated and the system software was running, participants were trained through lecture and given hands-on practice using the prototype. The lecture portion of training consisted of PowerPoint slides, which gave an overview of the system as well as detailed pictures and descriptions of the windows and interactions that would be part of the participant's experimental task. During the hands-on practice, a member of the testing staff guided the participant through sample tasks. The participant performed one practice trial using the system, during which various scenario tasks were accomplished. The testing staff answered any questions that the participant had during training. Training took one and a half hours (one hour of lecture and thirty minutes of hands-on practice). After training, participants were given a thirty-minute break.
During the experimental test session, the participants were presented with a scenario that included various tasks associated with preparing and launching Tomahawk missiles, similar to those presented during training. The participant was asked to respond to system events and prompts. An experimenter observed and recorded errors and additional observations using SMART and supplemental written notes. SMART logged the sensor data and the objective cognitive measures derived from it, and the TTWCS prototype logged user and system events throughout the test session. The sessions were also video and audio recorded. The video showed only the back of the participant's head and the two computer screens.
In the test scenario, participants were asked to execute two 10-missile salvos, from receipt of the strike package until the missiles launched for the first strike package. The main criterion for successful human-machine performance in this cognitive task environment was the degree to which missiles were successfully launched at their designated launch time (T=0). The scenario took approximately one hour and 15 minutes to complete. The scenario was divided into four task phases (each intended to have a different level of workload) that occurred in a specific sequence for a strike:
• Phase 1: Initial validation with no error, and engagement planning with errors
• Phase 2: Validation due to receipt of execute strike package, preparation for execution, and emergent targets
• Phase 3: Monitor missiles
• Phase 4: Launch missiles with emergent targets, and receive and prepare second strike package
At the end of each phase, the prototype was paused and the participant was asked to fill out a questionnaire on perceived workload (NASA TLX). Then the scenario was resumed. After the fourth phase, the sensors were removed, and the participant was asked to fill out a questionnaire on perceived satisfaction with the system and was then debriefed to discuss any questions that they had.
2.5 Data Preparation
Four measures of cognitive workload were collected during the test sessions:
• Sensor cognitive workload (EEG-based cognitive workload value): Scores from the neurophysiologically based gauges (CW-EEG) were generated by taking logs of second-by-second gauge output and averaging the workload values during the four different task phases. Data points associated with noisy sensor readings were removed before analysis ([1] provides a description of this procedure).
• NASA TLX (generated by participant survey): Total workload scores from the NASA TLX were used in the analyses. The total workload score is comprised of weighted scores of the six subscales (mental demand, physical demand, temporal demand, performance, effort, and frustration). One participant's data was removed because he rated all four task phases in both scenarios the same.
• MIDS (generated by expert observation and cognitive conflict algorithm): Scores for the MIDS measure were generated for approximately half the participants. Close examination of the task domain and videotape of the study allowed estimates of workload to be generated for individual sensory channels (i.e., visual, auditory, and haptic), cognitive channels (i.e., verbal, spatial) and response channels (i.e., motor, speech). Workload was calculated on a task timeline by summing (1) the purely additive workload level across attentional channels, (2) a penalty due to demand conflicts within channels, and (3) a penalty due to demand conflicts between channels; a sketch of this additive-plus-penalty computation is given after this list. The amount of attention the operator must pay to each channel in the performance of each task was rated on a 5-point subjective scale (1: very low attentional demand; 5: very high attentional demand). These estimates were combined to create an overall cognitive workload estimate for each second that the participant was completing the scenarios ([5] provides details of this technique). Values from the six participants were averaged, and correlations were computed with average values of the other measures.
• Expert Rating (generated by expert observation): The expert ratings were generated by three HCI experts using a seven-point Likert scale. Experts rated the participants on six dimensions of interaction with the task. The mental effort rating was then extracted and scaled by rater to help eliminate individual rater tendencies.
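As noted in the MIDS item above, the additive-plus-penalty computation can be sketched in a few lines. The channel set follows the description above, but the penalty weights and conflict rules are invented placeholders; [5] details the actual technique.

```python
# A minimal sketch of a MIDS-style total-workload computation: per-second
# attentional demands (1-5) on each channel are summed, then penalties are
# added for conflicting demands. Weights and rules here are illustrative,
# not the actual MIDS algorithm.
CHANNELS = ("visual", "auditory", "haptic", "verbal", "spatial", "motor", "speech")

def mids_workload(demands: dict[str, list[int]], within_penalty: float = 0.5,
                  between_penalty: float = 0.25) -> float:
    # (1) purely additive workload across attentional channels
    total = sum(sum(d) for d in demands.values())
    # (2) penalty for multiple simultaneous demands within one channel
    total += within_penalty * sum(max(0, len(d) - 1) for d in demands.values())
    # (3) penalty for demands spread across several channels at once
    active = sum(1 for d in demands.values() if d)
    total += between_penalty * max(0, active - 1)
    return total

# One second in which the operator reads a display while acknowledging
# a voice prompt:
second = {"visual": [4], "auditory": [2], "verbal": [3], "motor": [2],
          "haptic": [], "spatial": [], "speech": []}
print(mids_workload(second))
```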
3 Results
As described above, four measurements were collected during each of the four phases of the test scenario. The measures estimated workload through an external observer who monitored the actions of the participant, through the participant himself, or through physiological-sensor-based gauges. The results are presented below, first in terms of descriptive statistics and then in terms of correlations of the measures.
3.1 Descriptive Statistics
Table 2 shows the mean values for the four measures. For all four measures, Phase 4 had the highest mean value. During this phase, participants were required to launch missiles while simultaneously preparing a second strike package. They were also given emerging targets, requiring quick responses. For CW-EEG, Expert Rating and MIDS, Phase 3 had the lowest mean value. During this phase, the participants were primarily performing a vigilance task as the missiles were being prepared for launch, which required no specific interaction with the TTWCS HCI prototype.

Table 2. Overall descriptive statistics

Metric                   Phase 1   Phase 2   Phase 3   Phase 4
CW-EEG (0 to 1.0)        0.703     0.702     0.659     0.708
NASA TLX (0 to 100)      22.56     28.62     24.64     41.56
Expert Rating (1 to 7)   2.75      3.46      2.04      4.78
MIDS (1 to 5)            11.31     11.11     3.54      13.31
3.2 Validation of EEG-Based Cognitive Workload
Validation of the EEG-based cognitive workload index (CW-EEG) was performed by computing correlations, across subjects, among the four measures collected during the CVE, using scores (NASA TLX, expert rating) or average scores (CW-EEG, MIDS). Table 3 lists the Pearson product-moment correlation coefficients (r) of the four measures of cognitive workload. (Note that MIDS data were only coded for half of the participants.)

Table 3. Workload measures correlations table

Metric           CW-EEG   NASA TLX   Expert Ratings   MIDS
CW-EEG           --       0.19^      0.38**           0.51**
NASA TLX         0.19^    --         0.34**           0.40*
Expert Ratings   0.38**   0.34**     --               0.56**
MIDS             0.51**   0.40*      0.56**           --
**p<.001, *p<.01, ^p<.10
As the table indicates, CW-EEG is significantly related to both the expert rating and the MIDS cognitive workload estimate. The relation between CW-EEG and the NASA TLX estimate of workload was not significant in the examined population.
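For readers reproducing this kind of validation, the correlation step itself is simple to script; the sketch below uses invented scores, since the underlying per-participant data are not reported here.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical paired scores (e.g., per participant and phase) for two of
# the workload measures -- these numbers are illustrative only.
cw_eeg = np.array([0.70, 0.71, 0.66, 0.73, 0.69, 0.71, 0.64, 0.72])
expert = np.array([2.5, 3.5, 2.0, 4.5, 3.0, 3.5, 2.0, 5.0])

# Pearson product-moment correlation coefficient and its p-value:
r, p = pearsonr(cw_eeg, expert)
print(f"r = {r:.2f}, p = {p:.3f}")
```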
4 Discussion
The results of this validation study are encouraging for continued advancement in our understanding of users' cognitive workload. The desire to simultaneously increase the military's capabilities while reducing staffing requirements has resulted in greater demands on today's military system operators. The capability to identify the cognitive demands of tasks has the potential to reveal the limitations of existing technology interfaces and highlight those periods of interaction that result in excessive cognitive demand for the operator.
Traditional methods for assessing cognitive workload have serious limitations, such as intrusiveness and bias. For example, the NASA TLX requires administering a survey that demands time and attention from study participants, and the results of this survey fail to discriminate between long and short periods of testing. Moreover, the NASA TLX includes physical demand as one of its dimensions, and physical demands for C2 tasks have low variation. On the other hand, if an HCI expert evaluates the operator during a task, there is the potential for bias. Experimenters can only observe the expressions of operators and must estimate the workload based on aspects of the task coupled with the particular reactions of the participant. Administering secondary tasks can produce an objective assessment, but the measures produced by this method are extremely sensitive to variations in either primary or secondary tasks. In contrast, SMART's EEG-based measure provides an unbiased estimate with minimal intrusiveness during the testing period. Additionally, high-resolution scores can be obtained for the duration of the test period, and these data can later be aggregated using a variety of methods.
The study presented here was designed to assess the validity of SMART's workload measure by comparing it to three alternative methods for measuring cognitive workload. The results of this study indicate that the novel physiologically based measure is significantly correlated with the expert rating and the MIDS workload estimate. The relation between SMART's EEG-based workload index and the NASA TLX was not significant in the examined population. One possible explanation for the lack of a significant relation between SMART's workload index and the NASA TLX is that the third-party perspective of the expert rating and MIDS may have been better related to the sensor-based gauge, since all of these represent more objective measurements that do not include the participants' individual preferences and desires. A second possible explanation is that the validated NASA TLX workload measure includes six weighted subscales; one would only expect SMART's workload measure to be related to one of these subscales (mental demand). Depending on the participants' particular weighting of the subscales, the relation between SMART's workload index and the NASA TLX may have been obscured. Additional studies are needed to more closely examine the relation between user ratings, such as the NASA TLX, and physiologically based measures of cognitive workload.
5 Conclusion
The concept validation experiment discussed in this paper provides empirical validation of the utility of physiological monitoring as a method for non-invasively evaluating the workload associated with performing tasks using particular user interface designs. This experiment successfully demonstrated that LM ATL's SMART tool can produce cognitive workload, visual engagement, distraction and drowsiness measures at high frequency without any action required by the usability study participant. Moreover, analyses of the experiment data revealed significant correlations between the average of the cognitive workload index values produced by SMART, expert ratings, and MIDS expected workload values. These results, taken together, indicate that SMART represents a valuable tool for user interface designers by providing a non-invasive, non-interrupting, objective, real-time, domain-independent, task-independent cognitive workload measure.
Acknowledgments. This research was supported by the Office of Naval Research program "Disruptive Technology Opportunity Fund" via the Space and Naval Warfare Systems Command (SPAWAR). We thank the Program Management Activity (PMA) 280 Project Office and the TTWCS System Development Activity (SDA) for their support of this project. We also thank Gene Kocmich of Northrop Grumman for his support in arranging the testing facilities in Norfolk, and Bill Fitzpatrick and Kevin Cropper of Johns Hopkins University's Applied Physics Laboratory for their assistance running the concept validation experiment described here.
References
1. Berka, C., Levendowski, D., Cventinovic, M., Petrovic, M., Davis, G., Lumicao, M., Zivkovic, V., Popovic, M., Olmstead, R.: Real-time Analysis of EEG Indices of Alertness, Cognition, and Memory with a Wireless EEG Headset. International Journal of Human-Computer Interaction 17(2), 11–170 (2004)
2. Berka, C., Levendowski, D.J., Ramsey, C.K., Davis, G., Lumicao, M.N., Stanney, K., Reeves, L., Regli, S., Tremoulet, P.D., Stibler, K.: Evaluation of an EEG-Workload Model in an Aegis Simulation. In: Caldwell, J.A., Wesensten, N.J. (eds.) Biomonitoring for Physiological and Cognitive Performance during Military Operations. Proceedings of the International Society for Optical Engineering, vol. 5797, pp. 90–99 (2005)
3. Berka, C., Levendowski, D.J., Lumicao, M.N., Yau, A., Davis, G., Zivkovic, V.T., Olmstead, R.E., Tremoulet, P.D., Craven, P.L.: EEG Correlates of Task Engagement and Mental Workload in Vigilance, Learning and Memory Tasks. Aviation, Space, and Environmental Medicine 78(5), Section II (2007)
4. Craven, P., Belov, N., Tremoulet, P., Thomas, M., Berka, C., Levendowski, D., Davis, G.: Cognitive Workload Gauge Development: Comparison of Real-time Classification Methods. In: Schmorrow, D., Stanney, K., Reeves, L. (eds.) Foundation in Augmented Cognition, 2nd edn., pp. 66–74. Strategic Analysis, Inc., Arlington (2006)
5. Hale, K., Reeves, L., Samman, S., Axelsson, P., Stanney, K.: Validation of the predictive workload component of the Multimodal Information Design Support (MIDS) system. In: Proceedings of the 49th Annual Human Factors and Ergonomics Society Meeting, Orlando, FL, September 26-30 (2005)
A Simple Simulation Predicting Driver Behavior, Attitudes and Errors

Aladino Amantini and Pietro Carlo Cacciabue

KITE Solutions SNC, Via Labiena 93, 21014 Laveno Mombello (VA), Italy
[email protected]
Abstract. This paper presents the simulation tool called SSDRIVE (Simple Simulation of Driver performance). Following a brief description of the theoretical background and basic algorithms that describe the performance of drivers, the paper presents two case studies of DVE interactions, predicting dynamic situations according to different driver attitudes, in similar traffic conditions. In this way the potential ability of the simulation tool to consider behaviors and errors at different levels of complexity is demonstrated.
1 Introduction
The difficulty of developing sound theoretical architectures able to capture the complex Human-Machine-System (HMS) and Human-Machine Interactions is well known and has been widely discussed in the literature. On the other hand, an issue that is not always adequately considered is the assessment of the strength and versatility of the adopted numerical solutions and computerized algorithms that enable the implementation of the theories into predictive simulations [8]. Moreover, the overall theoretical model has to cope with the challenges derived from the inclusion, at the human level, of cognitive variables such as intentions, motivation, or attitudes, and, on the machine side, of automatic and partially adaptive assistance systems intended to enable expanded and improved performance. From the software development perspective, the basic architecture of the simulation has to be designed from the early stages so as to accommodate the inclusion of more and more variables and complexity, in order to enable the representation of decision-making processes and actions.
This paper presents the simulation implementation, called SSDRIVE (Simple Simulation of Driver performance), of a model that describes at the theoretical level the interaction between Driver, Vehicle and Environment (DVE) [4]. In the following sections, SSDRIVE is first briefly described with the objective of recalling its essential algorithms and correlations. Then, the predictive ability of the tool is documented, showing the type of analysis and evaluation that can be performed when attitudes and personal characteristics of different drivers are modified through input data. Two case studies of different driver behaviors in similar traffic conditions are analyzed. Finally, the conclusions focus on possible further development of the simulation and its potential exploitation for safety assessments.
2 The Model of Driver Behavior
The overall model of the Driver, Vehicle and Environment (DVE Model) is based on the concept of the "joint" cognitive system, where the dynamic interactions between driver, vehicle and environment are represented in a harmonized and integrated manner. The model focuses on the driver's cognitive and behavioral performance, whereas the other two components of the joint DVE system, i.e., Environment and Vehicle, are dealt with through relatively simple correlations [4]. At present, the modeling architecture is based on:
1. Parameters, which make it possible to consider the dynamic behavior of, and interaction between, the three components of the DVE system; and
2. Task analysis, which makes it possible to formalize and structure the performance of a driver as a sequence of goals and actions carried out during the DVE interaction.
2.1 Parameters and Modeling Architecture
It has been assumed that five parameters govern driver behavior, namely: Experience/competence (EXP), i.e., the accumulation of knowledge or skills that result from direct participation in the driving activity; Attitudes/personality (ATT), i.e., a complex mental state involving beliefs, feelings, values and dispositions to act in certain ways; Task Demand (TD), i.e., the demands of the process of achieving a specific and measurable goal using a prescribed method; Driver State (DS), i.e., the driver's physical and mental ability to drive (fatigue, sleepiness, ...); and Situation Awareness/Alertness (SA), i.e., the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning and the projection of their status in the near future.
These parameters interact within a "classical" human Information Processing System (IPS) architecture, which assumes that behavior can be described in a way similar to a "system" that "processes" the information and signals to which it is exposed. The best-known examples of IPS-related models are the SRK (Skill-Rule-Knowledge) approach of Rasmussen [7] and Michon's [5] scheme, where the primary driving task is described at different levels of abstraction, namely strategic, tactical and operational. A very simple way to implement the principles of the IPS paradigm is to describe the four basic functions of cognition, namely Perception, Interpretation, Planning and Execution (PIPE). In this approach: Perception considers the sensorial inputs (signals) generated by the vehicle and the environment; Interpretation is the elaboration of the perceived information; Planning implies the formulation of goals and intentions and/or the selection of tasks to be carried out; and finally, Execution is the actual performance of actions.
The five parameters affect mainly the first three functions of the PIPE architecture. In other words, the environmental and vehicle variables are perceived and interpreted by the driver and generate typical decision-making quantities, such as the intended speed, overtaking a leading vehicle or stopping the own vehicle, attaining a higher or lower speed, maintaining speed, etc. The cognitive function "Execution" describes the actual actions performed by the driver. The model makes the assumption that certain actions are performed in an automatic way, with no cognitive effort. These are typical "skill-based" activities, such as the control of the lateral and longitudinal safety margins, as well as the acceleration (or deceleration) to attain a higher (or lower) speed.
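A schematic way to picture how the five parameters could feed the four PIPE functions at each time step is sketched below. It is not SSDRIVE's actual code: all formulas are invented placeholders that merely show where each parameter enters the loop (parameters normalized to 0-1, speeds in m/s, gaps in m).

```python
from dataclasses import dataclass

@dataclass
class DriverParameters:
    exp: float  # experience/competence (quasi-static)
    att: float  # attitudes/personality (quasi-static; here read as risk proneness)
    td: float   # task demand (dynamic)
    ds: float   # driver state (dynamic)
    sa: float   # situation awareness/alertness (dynamic)

def pipe_step(p: DriverParameters, own_speed: float, lead_gap: float) -> dict:
    """One pass through Perception-Interpretation-Planning-Execution."""
    # Perception: SA scales how reliably the gap to the lead car is read.
    perceived_gap = lead_gap * (0.8 + 0.2 * p.sa)
    # Interpretation: experience and driver state shape the assessment.
    time_headway = perceived_gap / max(own_speed, 0.1) * (0.9 + 0.1 * p.exp * p.ds)
    # Planning: attitude and task demand set the intended speed.
    if time_headway > 2.0:
        desired_speed = own_speed * (1.0 + 0.2 * p.att - 0.1 * p.td)
    else:
        desired_speed = own_speed * 0.9  # fall back to following the lead car
    # Execution: skill-based pedal action toward the desired speed.
    pedal = max(-1.0, min(1.0, (desired_speed - own_speed) / 10.0))
    return {"desired_speed": desired_speed, "pedal": pedal}

print(pipe_step(DriverParameters(exp=0.9, att=0.2, td=0.5, ds=0.8, sa=0.9),
                own_speed=25.0, lead_gap=60.0))
```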
2.2 Task Analysis
The most suitable approach for representing driver behavior during the performance of activities is a simple "Task Analysis". In order to represent the set of tasks that are carried out during driving, a differentiation has been defined according to the complexity associated with a task (a possible data structure is sketched below):
• Elementary Functions represent basic activities;
• Elementary Tasks are tasks made of elementary functions; and
• Complex Tasks are tasks made of a combination of elementary tasks.
In addition to "regular" tasks, the model accounts for permanent or automatic tasks, which are identified by the fact that they are permanently carried out during a DVE interaction and do not require specific pre-conditions to be launched. These are the stereotypes of the "skill-based" activities described above.
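One possible data structure for this three-level task hierarchy, plus the permanent "skill-based" tasks, is sketched below; it is an illustrative encoding, not SSDRIVE's internal representation.

```python
from dataclasses import dataclass, field

@dataclass
class ElementaryFunction:
    name: str  # a basic activity, e.g. "check mirror"

@dataclass
class ElementaryTask:
    name: str
    functions: list[ElementaryFunction] = field(default_factory=list)

@dataclass
class ComplexTask:
    name: str
    subtasks: list[ElementaryTask] = field(default_factory=list)
    permanent: bool = False  # True for automatic tasks such as lane keeping

# An overtaking maneuver expressed as a complex task:
overtake = ComplexTask("overtake", [
    ElementaryTask("check traffic", [ElementaryFunction("check mirror")]),
    ElementaryTask("change lane", [ElementaryFunction("press indicator"),
                                   ElementaryFunction("steer left")]),
])
```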
3 The Simulation of Driver Behavior
The overall SSDRIVE simulation architecture and process are described in Fig. 1. In particular, at each time interval, the driver model receives the vehicle and environment variables that were calculated at previous time steps. These variables affect the evaluation of the five parameters and consequently the selection of the new task and the associated actions that are carried out, essentially: the setting of indicators and basic vehicle control actions, such as steering and accelerating/braking, in order to obtain the desired vehicle position and speed. The Driver Model calculates the implementation of tasks and generates profiles of steering angle and actions on the brake or the accelerator, in order to accommodate the desired lane position and speed. The Simulation Manager is the governing module of the simulation, which receives input from all three main DVE components, evaluates the overall error generation mechanism by means of the Driver Impairment Level, and keeps track of the synchronization of the simulation. The effect of ADAS and IVIS, if included in the vehicle simulation, is also fed to the Simulation Manager. In particular, the behavioral part of the simulation implements a number of algorithms for the control of steering and the attainment of the intended speed:
• The control of speed and acceleration has been developed on the basis of empirical correlations [6], which also account for road geometry, principally curves. It also considers the driver's attitude and cognitive performance, according to the parameters, primarily EXP, ATT and TD.
• The steering performance has been based on the dynamic interaction of the driver with the vehicle and road, by means of an algorithm that predicts vehicle performance based on the perceived road characteristics and the vehicle speed and position. This algorithm calculates the lateral distance between the middle of the lane (lane centre-line) and the vehicle position according to forecasts generated by the human visual, proprioceptive and vestibular systems [1]. These systems enable human beings to anticipate their own motion on the basis of visual as well as proprioceptive and vestibular processes. A graphical representation of the way in which this algorithm works is shown in Fig. 2.
Fig. 1. Overall dynamic DVE interaction process
• The error generation model makes it possible to account for errors made at different levels of cognition, which are then manifested in inadequate actions.
3.1 Error Generation
The last open issue concerns the correlation of the five basic parameters with driver behavior and performance with respect to error generation. The mechanism devised to describe driver error and behavior in relation to the basic parameters has been called the Model of Basic Indicators of Driver Operational Navigation (BIDON). A short description of the error model follows; a more extended description may be found elsewhere [2]. It is assumed that the variables affecting driver behavior, i.e., the dynamic parameters DS, SA and TD, as well as the quasi-static parameters EXP and ATT, are represented by "containers" with thresholds/levels that change from one driver to another and make it possible to define the overall state/performance ability of the driver as the DVE interaction evolves. At simulation level, the parameters are associated with initial values (SA0, DS0, etc.) and change every time certain events occur, according to the correlations that associate the dynamic vehicle and environment variables with the parameters, as described above. The values of the parameters affect driver performance and error making. In other words, this mechanism contributes to the dynamic process of progressive "filling" or "draining" of the "containers" of the BIDON model (Fig. 3). The error generation mechanism implemented in SSDRIVE is essentially associated with a single parameter called Driver Error Propensity (DEP), which represents the tendency towards error making (including violations) that may be generated by impairment, as well as by other possible conditions, e.g., distractions, lack of knowledge, intentional actions etc., which are not related to impairment. The following correlation applies:

DEP_t = f(SA_t, DS_t, TD_t, EXP, ATT),  0 ≤ DEP_t ≤ 1
The logic and fuzzy formulations implemented in the simulation depend on the availability to the user of specific algorithms and correlations. A dynamic evolution of sequences of possible error generation is depicted in Fig. 3.
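By way of illustration only, one plausible instantiation of the DEP correlation is a weighted combination clipped to [0, 1]; the weights below are placeholders, since the paper deliberately leaves the concrete formulation to user-supplied algorithms (all inputs normalized to 0-1, with ATT read as risk proneness).

```python
def driver_error_propensity(sa: float, ds: float, td: float,
                            exp: float, att: float) -> float:
    """Illustrative DEP_t = f(SA_t, DS_t, TD_t, EXP, ATT); weights are
    placeholders, not SSDRIVE's actual correlation."""
    # Low awareness, a poor driver state and high task demand raise the
    # propensity; experience lowers it; a risk-prone attitude raises it.
    raw = (0.3 * (1 - sa) + 0.3 * (1 - ds) + 0.2 * td
           + 0.1 * (1 - exp) + 0.1 * att)
    return min(1.0, max(0.0, raw))

# A fatigued, inexperienced, risk-prone driver in dense traffic:
print(driver_error_propensity(sa=0.4, ds=0.3, td=0.9, exp=0.2, att=0.8))
```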
A Simple Simulation Predicting Driver Behavior, Attitudes and Errors
2
0
TD
4
6
8
10
349
time
Driver Parameters at time t2 TD
Driver Parameters at time t6
SA SA DS TD
Driver Parameters at time t4
DS
SA
DS
Fig. 2. Schematic representation of proprioceptive and visual anticipation
Fig. 3. Error mechanism and dynamic sequences generation
4 Sample Cases of Predictive Simulations of Driver Behavior
In order to demonstrate the predictive power of the SSDRIVE tool, a number of case studies have been performed. The track of a Virtual Reality (VR) full-scale simulator was utilized in terms of carriageway characteristics, mainly curves and straight lanes. The track was subdivided into six sections (Fig. 4). The traffic has been simulated in terms of the SSDRIVE input data. This allows the user to consider any number of leading and incoming vehicles, as well as different types of complications that may be found along the road, such as stationary vehicles, pedestrians and other objects. Various driver behaviors can be simulated simply by changing the input data associated with the so-called main-car and main-driver. The main-driver travels along the road, having to deal with other cars, both in the same and in the opposite lane. These other vehicles are simulated as "obstacles", that is, they are handled as entities with a certain mass and volume, traveling at a constant speed. As opposed to the main-car, which is simulated by means of a simple but sufficiently realistic model, so as to respond to the control actions of the main-driver, obstacles do not have a behavior-choice process and cannot change speed or direction.
The objective of these case studies was to show the potential of the simulation tool. In the following, the DVE interactions will be described and discussed for a specific part of the track, namely the first two sections, limited to the first two overtaking maneuvers. The correlations that have been utilized to describe the main-driver's behavior are very simple and coincide with those implemented as defaults in SSDRIVE. Moreover, the main-vehicle is simulated without any particular advanced on-board system. For a safety assessment and/or the evaluation of various driver behaviors in the presence of diverse safety and control systems, a different type of analysis would be required, covering longer simulation times as well as numerous DVE interactions and a variety of traffic situations. However, this type of study goes beyond the scope of the present paper and would require the use of more detailed and validated correlations for the parameters characterizing driver behavior and in-vehicle support and information systems.
Fig. 4. Simulator subdivision into sections for SSDRIVE simulation purposes
In the cases discussed in the following, in the first two sections of the track the simulation scenario includes a couple of cars traveling at low speed in the same lane as the main-car. The main-driver's aim is to overtake them and proceed at a higher speed. However, in order to achieve this objective, the presence of four other cars coming from the opposite direction must be taken into consideration. Depending on the driver's characteristics, different actions will be performed. The presence of vehicles in both lanes is kept the same for both simulations.
4.1 Case Study Input Data
For the two case studies, the following parameters are associated with the main-driver and kept constant throughout the DVE interactions (a possible input encoding is sketched after this list).
Case study 1:
− Attitude: ATT = Prudent
− Experience: EXP = High
− Task Demand: TD = Medium
Case study 2:
− Attitude: ATT = Sensation Seeker
− Experience: EXP = High
− Task Demand: TD = Low
In Case study 2 the attitude of the driver is modified, passing from "prudent" to "sensation seeker". The TD is also changed, decreasing from medium to low. No other changes are made. This implies that, in the second case, a more aggressive type of driving is expected, with higher speeds and more abrupt lane changes, acceleration and braking maneuvers. As discussed above, the desired speeds are defined by the simulation at the higher cognitive level (perception, interpretation and planning), while the more behavioral activities (execution of actions, e.g., braking, accelerating, lane changes, etc.) are handled by the part of the simulation based on the SRK model. In particular, even though TD is a typically dynamic parameter, for these two sample cases TD, as well as ATT and EXP, is kept constant throughout the simulation.
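As referenced above, a possible encoding of these input records could look as follows; the actual SSDRIVE input format is not given in the paper.

```python
# Hypothetical input records for the two case studies (illustrative only):
case_studies = {
    "case_1": {"ATT": "prudent",          "EXP": "high", "TD": "medium"},
    "case_2": {"ATT": "sensation_seeker", "EXP": "high", "TD": "low"},
}
```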
4.2 Running of Test Cases and Review of Results
The output of SSDRIVE is based on a pictorial rendering of the dynamic vehicle control operations, with the objective of showing the DVE interactions in "real simulation time". In addition to the road and vehicle dynamics, these pictorial outputs show the dynamic evolution of several relevant variables, such as the intentions of the driver (follow, overtake, change lane, cruise, etc.), the intended and current speed, the acceleration or braking actions, etc. In addition, all variables calculated during the simulations are collected and stored in data logs for more detailed and specific analyses. The amount of data collected during the simulation offers the possibility of a much more accurate and complete analysis of the DVE interactions. As an example, the log files obtained from the two case studies allow the comparison of the values for desired speed, actual speed and distance vs. time. These data cover the first two overtaking maneuvers of leading vehicles. The outcomes of the data logs with the output of the simulation runs are shown in Fig. 5 to Fig. 8. The analysis of these data logs allows a much more accurate assessment of the simulated behaviors of these two different types of drivers.
Prudent driver behavior
More in detail, focusing on the prudent driver, Fig. 5 shows the actual and desired speed and the distance covered by the main-car vs. time during the first 60 seconds of the simulation. Fig. 6 presents the driving "modes" and the lane position of the main-car vs. time. The prudent driver starts by choosing a desired speed of about 95 km/h (Fig. 5). When the first leading vehicle is reached, at about 13 s of the simulation (Fig. 6), the driver decides to overtake and, as no obstacles are found in the opposite lane, the maneuver starts. The driver changes lane ("change lane" driving mode) and continues the overtaking until the leading vehicle is passed, before returning to the rightmost lane. This operation is completed in about 20 seconds. The driver then continues until reaching, at approximately 45 seconds of the simulation, a second leading vehicle proceeding at a slower speed. The driver decides to overtake the leading vehicle and starts by increasing the desired and actual speeds (Fig. 5). However, in the meanwhile, an incoming vehicle in the opposite lane is approaching, and as this vehicle is too close, the overtaking maneuver has to be interrupted by braking, reducing speed and assuming the "follow" driving mode (Fig. 5 and Fig. 6). Once the incoming vehicle has passed, the overtaking can take place, and the operation is performed and concluded at about 60 seconds.
A final comment may be made on the performance of the prudent driver shown in Fig. 5. During the second overtaking process, when the desired speed of ~27 m/s (97 km/h) is reached (at ~55 sec), the speed shows a sudden reduction before returning to its desired value. This is due to the fact that the driver releases the accelerator, and actually touches the brakes, as a consequence of the prudent attitude, before gradually regaining and maintaining the desired speed.
Fig. 5. Speed (actual and desired) and distance vs. time - Driver ATT = Prudent
Fig. 6. Driving "modes" and lane position of the main-car vs. time - Driver ATT = Prudent
Sensation seeker behavior
The analysis of the behavior of a sensation-seeker type of driver (Fig. 7 and Fig. 8) shows a substantial difference in comparison with the previous simulation. In particular, the desired speeds are in general considerably higher; the first desired speed is already 120 km/h, as opposed to 95 km/h. As a consequence, the actual speed is also higher, and the first leading vehicle is reached in a shorter time with respect to the previous case study. However, in this case, the overtaking cannot take place immediately, as an incoming vehicle in the opposite lane is too close. The driver has to brake, reduce speed and enter a "follow" mode to avoid collision with the leading vehicle, until the overtaking lane is free and the maneuver can take place (Fig. 8). It is noticeable that, because of the traffic conditions, the sensation seeker, in order to complete this first overtaking maneuver, has actually taken the same time (if not a little longer) as the prudent driver, although the intended speed was almost 30% higher.
After the first overtaking has been performed, the vehicle reaches the second car in its lane at about 37 seconds. The intended and actual speeds of the main-vehicle are much higher than before. This time, considering its own speed, the speed of an incoming vehicle and the distance from it, the driver decides that it is possible to overtake immediately. Therefore, the driver does not wait until the incoming car has passed and accelerates to an even higher speed in order to carry out the overtaking of the leading vehicle, completing the maneuver at about 40 seconds. In comparison with the previous case, the consequence of the more aggressive attitude and higher speeds of the sensation seeker is that the second overtaking is performed at a higher speed and much earlier than in the case of the prudent driver. Moreover, no particular speed-control and reduction effects are present when the final cruising speed is reached, as occurred in the previous case. Finally, it can be observed that, although the first overtaking maneuver requires almost the same time from the two drivers, the overall sequence, i.e., both overtakings and the setting of the last cruising speed, is completed much earlier by the sensation seeker than by the prudent driver (40 sec vs. 55 sec). Moreover, the final cruising speed of the sensation seeker is much higher than that of the prudent driver (~137 km/h vs. ~100 km/h), and also well above the speed limits!
Fig. 7. Speed (actual and desired) and distance vs. time - Driver ATT = Sensation Seeker
Fig. 8. Driving "modes" and lane position of the main-car vs. time - Driver ATT = Sensation Seeker
5 Conclusions
This paper has presented and discussed the simulation tool SSDRIVE, which aims at offering designers and safety analysts an instrument for performing rapid, fast-running predictions of driver behavior and DVE interactions in different dynamic conditions. The tool is a classical computer simulation program that embeds a set of algorithms and correlations derived from a well-defined modeling architecture. It exploits the power of modern information technology, which enables the fast solution of complex algorithms and the generation of large data structures and detailed graphical outputs.
In order to show the potential of the tool, a number of case studies have been performed, simulating specific personality attitudes of drivers by means of specific input to the DVE module "Driver". It has been shown that SSDRIVE makes it possible to predict traffic evolution and the generation of potentially risky situations.
At present, only a limited set of validation experiments of the simulation algorithms has been performed. Therefore, the use of the results of the simulation has to be very carefully considered, especially when its predictive power is exploited for safety-critical types of applications. At the same time, it is known that many correlations exist and have been tested that are able to describe different attitudes and behaviors. However, these are usually confidential and not open to general application. For these reasons, the software structure of SSDRIVE has been developed in such a way as to enable users to implement different correlations and formulations according to their own proprietary data sets. These types of problems exist in all types of numerical solutions adopted in modern technology and represent an important question that is not always sufficiently recognized vis-à-vis other, more flashy issues, such as "software bugs" or "configuration issues", that affect modern computer technology.
Finally, even if the major steps of the implementation of the dynamic DVE interaction have been resolved, a substantial amount of work remains to be done to complete the tool and enable a more accurate and extensive simulation. Examples of such improvements are:
• The implementation of the driver error generation mechanisms, based on the generation of multiple logical scenarios derived from different error modes.
• The development of further sets of tasks and goals, so as to complete the implementation of the overall driver activity, including the crucial effects derived from and associated with adaptation aspects [3].
• The development of other driver modules, so as to consider the multiple interactions of populations of drivers in a shared traffic context.
• The implementation of more complex models of vehicles, in consideration of multiple Advanced Driver Assistance Systems and In-Vehicle Information Systems, etc.
All these improvements are included in a development plan for the SSDRIVE tool and will progressively be implemented as further research is performed, with the aim of developing a tool that will eventually be useful and supportive for the design and implementation of new instruments dedicated to accident prevention and safety improvement in the domain of automotive systems.
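As an illustration of the pluggable-correlation design mentioned in these conclusions, a driver module can expose its behavioral formulations as user-replaceable functions. The sketch below is hypothetical: the class, the parameter names and the linear formulas are our own stand-ins, not the SSDRIVE API.

```python
# Illustrative sketch of how user-supplied correlations could be plugged
# into a driver module; the names are hypothetical, not the SSDRIVE API.
from typing import Callable, Dict

# A correlation maps a DVE state (e.g., attitude, traffic density)
# to a model parameter such as the desired speed in km/h.
Correlation = Callable[[Dict[str, float]], float]

class DriverModule:
    def __init__(self, correlations: Dict[str, Correlation]):
        self.correlations = correlations  # user-replaceable formulations

    def desired_speed(self, state: Dict[str, float]) -> float:
        return self.correlations["desired_speed"](state)

# A default formulation shipped with the tool ...
default = {"desired_speed": lambda s: 95.0 + 25.0 * s["sensation_seeking"]}
# ... can be overridden with a fit to a proprietary data set:
proprietary = {"desired_speed": lambda s: 90.0 + 40.0 * s["sensation_seeking"]}

print(DriverModule(default).desired_speed({"sensation_seeking": 1.0}))      # 120.0
print(DriverModule(proprietary).desired_speed({"sensation_seeking": 0.0}))  # 90.0
```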
References
1. Ayres, A.J.: Sensory Integration and the Child. Western Psychological Services (1979)
2. Cacciabue, P.C., Re, C., Macchi, L.: Simple Simulation of Driver Performance for Prediction and Design Analysis. In: Cacciabue, P.C. (ed.) Modelling Driver Behaviour in Automotive Environments, pp. 344–375. Springer, London (2007)
3. Cacciabue, P.C., Saad, F.: Behavioural adaptations to driver support systems: a modelling and road safety perspective. Int. Journal of Cognition Technology and Work (CTW) 10(1), 31–40 (2008)
4. Carsten, O.: From driver models to modelling the driver: what do we really need to know about the driver? In: Cacciabue, P.C. (ed.) Modelling Driver Behaviour in Automotive Environments, pp. 105–120. Springer, London (2007)
5. Michon, J.A.: A critical review of driver behaviour models: What do we know? What should we do? In: Evans, L.A., Schwing, R.C. (eds.) Human Behaviour and Traffic Safety, pp. 487–525. Plenum Press, New York (1985)
6. Oregon State University, Portland State University, University of Idaho: Transportation Engineering Online Lab Manual (2007), http://www.webs1.uidaho.edu/niatt%5Flabmanual/
7. Rasmussen, J.: Information Processing and Human-Machine Interaction: An Approach to Cognitive Engineering. North-Holland, Oxford (1986)
8. Salvucci, D.D., Liu, A.: The time course of a lane change: driver control and eye-movement behaviour. Transportation Research Part F 5, 123–132 (2002)
Nautical PSI - Virtual Nautical Officers as Test Drivers in Ship Bridge Design
Ulrike Brüggemann and Stefan Strohschneider
Department for Intercultural Business Communication, University of Jena
[email protected], [email protected]
Abstract. Ship bridges are control centers that operate and manage the ship as a complex socio-technical system. At the University of Jena we have established a project that aims to understand and explain the nautical officers' behavior. This work is embedded within a broader project network that seeks to develop a ship bridge that is more standardized, more integrated and better adapted to human performance. Our way of achieving these goals involves anthropological fieldwork and the construction of a computer simulation called Nautical PSI that models the nautical officers' psychological processes on the theoretical foundation of the PSI theory. This virtual nautical officer can be used as a test driver for virtual bridges during the design process. Keywords: Human / machine interaction, human performance modeling, PSI theory, ship board bridge design.
1 Introduction
A ship bridge is a complex socio-technical system. It provides a control center that brings together different devices that combine and coordinate information and processes. The user of this ship bridge - the nautical officer - is confronted with many interfaces that depict the ship and the maritime environment and that allow interaction with this environment. This represents a great challenge for the nautical officer's cognitive system: he has to supervise a number of devices, monitor many values and keep them within a certain range, integrate the data into an image of the ship's condition, determine and keep the ship's course, and avoid collisions with other ships, all while taking often adverse environmental conditions into account. At the same time he has to manage cargo, personnel and communication with traffic control, the ship's owner, charterer and other ships. An additional complication is that every ship bridge is unique, because ship bridges are 'grown' structures that evolve through the installation of new equipment. This evolution is driven by technical innovations rather than user requirements and is accompanied by the permanent expansion of rules and regulations by international bodies during design, construction and maintenance. In addition, there is high cost pressure that fosters a 'wherever there is free space' mentality and hinders integrated solutions.
Obviously, this presents great potential for trouble in the human / machine interaction, and it is nearly impossible for a designer to explore this potential completely without the assistance of a simulation tool.
2 The Human Factors Perspective and the Modeling Approach
These observations are the origin of the DGON-bridge project network (10/2005 to 9/2009), within which two approaches are followed. On the one hand, industry partners look from the technology to the user and address the standardization and modularization of the bridge and its equipment. On the other hand we, as psychologists, employ the human factors perspective by looking from the user to the technology, i.e., we aim at a bridge that is better adapted to human performance, especially regarding the interaction design of the human / machine interfaces. This consideration of the user perspective is long overdue because, in contrast to other industries, the human factors perspective has not fully taken root in the maritime business [1]: the analysis of maritime incidents shows that the main safety risks are deficiencies in the human / machine interfaces [3], i.e., technology should be appropriate and robust not only in a technical sense but also in a psychological sense [2]. But there is only very limited knowledge about the nautical officers' cognitive processes: which data does he collect, and how? How does he integrate them into a 'situational' image? Which data and actions belong together? Which displays are important under which circumstances? And, last but not least, how does he organize his work process, what is happening in his mind and how does he regulate his actions? In employing the human factors perspective we used two main approaches. First, we did anthropological field research, i.e., we undertook user interviews in 2005 and 2006 [4] and carried out several weeks of free and systematic observations on board container feeders in the German Bight and the Baltic Sea from 2006 to 2008 [5,6]. In addition, we are going to undertake systematic observations with the assistance of a ship (bridge) simulator in spring 2009. Second, we are mapping all our findings into a cognitive model of the nautical officer's psychological processes (Nautical PSI). First and foremost, this is an unambiguous, complete, consistent and dynamic formulation of our findings. Furthermore, this model construction process aims to develop a model that can be used as a representation of the nautical officer in order to determine and evaluate the cognitive requirements of diverse bridge and bridge equipment designs in different situations [7,8]. Last but not least, a long-term objective is the further development of the Nautical PSI into an assistance system that relieves the mariner of part of the navigational burden by steering the ship in standard situations. The application of the Nautical PSI model as a representative of the human being has several advantages compared to the common way of studying human / machine interaction in the maritime business. Usually physical models, i.e., bridge mockups (constructions of plywood that hold the designated equipment), are built that can be operated in high-fidelity ship (bridge) simulators. These mockups are either used as demonstrations to present one's product offerings, or commercially available solutions are built into 'educational' simulators for
professional training. One would expect that these constructions would also be used in 'evaluation' simulators to comparatively evaluate different bridge and/or interaction designs, i.e., to run through different scenarios with different designs and with comparable test drivers. For various reasons this rarely takes place. First of all, a representative study is nearly impossible because there is a wide range of nations, professional educations and generations, and furthermore there are no findings about the distribution of personality traits or types of nautical officers. This would make the recruitment of participants a challenge in its own right. Secondly, for a realistic picture, the whole work context (i.e., a number of secondary tasks and distractions, oftentimes fatigue, etc.) would have to be considered, which is quite complex and can pose ethical problems in the case of fatigue. Thirdly, such a study would be prohibitively expensive because it would be very time consuming and time in high-fidelity ship (bridge) simulators is very costly. Consequently, this is a classical field for computer simulation tools like the one presented in this article. They allow the running of multiple scenarios to find event paths and to identify crucial factors. (E.g., the wide 'terrain' of mariner personality could be explored to assess the significance of diverse personality parameters. This could guide further empirical work in parameterizing and validating the model and in identifying user types.) Generally speaking, with a simulation tool new developments can be studied and considered more extensively, in more detail and in general more effectively in virtual simulations than in physical mockups, and such tools should therefore be used in the development phase. Beyond this application-oriented goal of the development of the Nautical PSI model there is also an additional fundamental research interest. The underlying PSI theory is a holistic psychological theory of the human psyche (see details below), and one is always looking for application fields that allow testing and further development of one's psychological theory. The navigation of a ship is a very convenient and rewarding task because a ship (bridge) is quite isolated, with very reduced environmental interfaces, i.e., there are relatively few factors that describe the environmental situation (distance to land masses / sea bottom, distance to the prescribed track, closest point of approach to other ships, abidance by the right of way and other traffic rules) and very limited possibilities to influence this situation (i.e., setting course and speed). Additionally, this isolation has obvious advantages for empirical observations.
3 The Model System
Both the given constellation and the project objectives suggested interpreting the given situation within the agent paradigm ("An autonomous agent is a system situated within and part of an environment that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to affect what it senses in the future", Franklin & Graesser [9], p. 4). Consequently, the maritime environment has to be modeled; the ship itself is interpreted as the agent's body, and the ship bridge and its equipment are interpreted as the sensorimotor interface for the Nautical PSI agent. This results in the following model system:
The virtual nautical officer - the Nautical PSI - is embedded within a set of model 'shells' (see Fig. 1). The outermost shell - the environmental model - is a model of the maritime environment that depicts landmasses, water depth, wind, current, navigation marks and other ships. The second outermost shell - the ship model - is a model of one's own ship that depicts the movement in the water and over the ground as a result of propulsion, rudder, current, wind and other factors. It also models the ship's sensors, such as GPS (position, course and speed over ground), anemometer (wind), bathometer (water depth), log (speed through water), gyro (heading) and radar (land masses, navigation marks, other ships). For the time being, the two outer shells are complete. They are rendered quite simply, but can easily be replaced with a high-fidelity ship (environment) simulation, a feature that will be tested in late spring 2009.
Fig. 1. The model shells
The third outermost shell - the equipment model - is a model of the bridge equipment that reproduces the appropriate data processing (e.g., determining the distance to and the time until the closest point of approach to other ships). This shell is also finished for the time being and could be partially or totally replaced by real external devices if they were provided with appropriate interfaces (such interfaces are not state-of-the-art and are therefore not going to be tested in the foreseeable future). The next two inner shells represent the interface between the psychological and the physical world. The presentation and control surface model depicts the data presentation and the activation operations of the bridge equipment, while the sensorimotor model depicts the capabilities and conditions of the nautical officer's perception and manipulation system and his position on the bridge. The interaction of these two shells makes it possible to determine, on the one hand, which information can be perceived in principle and how easily it can be perceived. The Nautical PSI model then considers
attention, among other factors, and determines whether a piece of information is actually noticed and how long this process takes. On the other hand, the interaction of these two shells determines which activation operations are executable and how long it takes to enter an operation command successfully. These two shells will be completed in spring 2009. Finally, the cognitive Nautical PSI model represents the inner core of this shell system. It recreates the mariner's psychological processes and is, for the time being, nearly finished.
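The layering of the shells can be pictured as a chain of objects in which each shell exchanges data only with its direct neighbors, which is what makes individual shells replaceable. The following Python sketch is a hypothetical rendering of this architecture; all class names, data fields and values are our own illustrative stand-ins, not the project's implementation.

```python
# Hypothetical rendering of the nested model shells: each shell only
# exchanges data with its direct neighbours, so any shell (e.g. the
# environment) can be swapped for a high-fidelity simulator.

class EnvironmentModel:          # outermost shell
    def state(self):
        return {"water_depth_m": 20.0, "wind_kn": 12.0, "other_ships": 2}

class ShipModel:                 # own ship: motion plus sensors
    def __init__(self, environment):
        self.environment = environment
    def sensor_readings(self):
        env = self.environment.state()
        return {"gps_speed_kn": 14.2, "echo_depth_m": env["water_depth_m"]}

class EquipmentModel:            # bridge equipment: data processing
    def __init__(self, ship):
        self.ship = ship
    def displays(self):
        readings = self.ship.sensor_readings()
        return {f"display_{k}": v for k, v in readings.items()}

class SensorimotorModel:         # what the officer can actually perceive
    def __init__(self, equipment):
        self.equipment = equipment
    def perceivable(self, attended_display: str):
        return self.equipment.displays().get(attended_display)

# Wiring the shells from the outside in:
shells = SensorimotorModel(EquipmentModel(ShipModel(EnvironmentModel())))
print(shells.perceivable("display_echo_depth_m"))  # 20.0
```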
4 The Simulated Nautical Officer
In search of a psychological foundation for the Nautical PSI we were looking for a comprehensive 'grand unifying' theory of the human psyche that describes human performance in all its strengths and weaknesses. Unfortunately, a 'holistic' systems perspective is rarely found in psychology, and therefore there were only two promising approaches. On the one hand, there are the two cognitive architectures ACT [10] and SOAR [11], both of which are production systems. They describe psychological processes as the result of a deduction mechanism that draws conclusions from data and rules. The disadvantage of these architectures is that they primarily model cognitive behavior and lack assumptions about motivational and emotional processes [12,13,14,15]. On the other hand, there is the PSI theory of Dörner, which originates from the examination of the behavior of human beings in complex situations and the errors that can occur [16]. It is a grand psychological theory of the human psyche with an emphasis on action regulation that describes the interaction of cognition, motivation, emotion, perception, learning and memory [12,17,18,19]. The development of the PSI theory and our application to nautical officers' behavior follows a reconstructive approach, i.e., starting from the question of how a (psychological) system that shows the observed behavior could be constructed, one tries to understand and explain the system's behavior through reverse engineering. Therefore, an agent (an autonomous program) constructed on the foundation of the PSI theory does not follow a standard architecture, nor is this an artificial intelligence approach. Instead, one tries to rebuild the psychological instances and processes postulated by the PSI theory. We preferred the PSI theory basically because it is a comprehensive theory that includes motivational and stress adaptation processes, which appear especially important in the context of our research. In the following, the theory itself and its application to the mariner's psychological processes will be described. The PSI theory does not assume a central executive process, but rather a set of parallel operating mechanisms for information processing. These mechanisms trigger each other and coordinate with each other via memory structures (see Fig. 2). First of all, there is an automatic perception process that identifies objects in the environment. At regular intervals this process is used to create a 'situational' image. In the case of a mariner this is something like: I am at position xy; the water is 20 meters deep; there is a distance of 100 meters to the prescribed track and a deviation of 0.1 degree from the course to the next track point; there is a ship 1,000 meters in front of me and another one starboard astern.
Fig. 2. The PSI architecture; see text for explanation
In addition to the identification of environmental objects, ongoing events are identified by a similar process and are categorized (the ship in front of me has the same course but is slower: it is a 'fellow runner'; the ship astern is considerably faster than me and has changed its course to starboard: it is an 'overtaker'). On the one hand, these identified events complement the situational image. On the other hand, they inevitably result in the knowledge-driven construction of an 'expectational' horizon, an extrapolation of the current situational image into the future. In the case of the nautical officer this could be: the ship to starboard is going to overtake me and is going to finish this maneuver in five minutes; if I don't do anything, I will crash into the stern of the ship ahead. Situations are evaluated according to nautical needs: two basic needs are to 'stay on the track' and to 'head for the next track point'. More complex needs are to avoid the more or less dangerous objects above and beneath the surface, such as buoyage, other ships, the coastline and shallows (the need to avoid collisions with other ships has already been implemented; other obstacles are subject to future work). Besides these, there are other non-nautical needs (see below). If a departure from the track is noticed, or if such a departure is to be expected, the need 'stay on track' is activated to a certain degree that reflects the degree of deviance and that depends on the significance of the need (a departure from the track may lead to trouble with the master or the ship's owner, but a collision with another ship compromises lives: one's own and those of others). The activation of a need always creates an intention. An intention is a temporary memory structure whose function lies in linking all knowledge that is needed to eliminate the current need or otherwise to
avoid the emergence of an anticipated need. In particular, this is knowledge about intervening operations and their consequences that originates from experience, training and observation. But first and foremost - when an intention has become dominant - it will try to find a plan to satisfy the underlying need, and once a promising plan has been found, it will be implemented and supervised. Usually there are several intentions in the intention memory. Not all of them refer to ship guidance. In addition to intentions like 'correct course deviance' and 'avoid collision with the ship ahead' there may be other nautical intentions like 'fix cargo documents' or 'have the railing painted', or distractive intentions like 'drink coffee', 'smoke a cigarette', 'have a chat', etc. (exemplary distractions will be implemented in spring 2009). Moreover, to satisfy one need often means to evoke another one; e.g., during a track return maneuver it is inevitable that the course to the next track point is temporarily abandoned (because during this maneuver the ship is necessarily heading towards the track, and therefore necessarily not heading towards the track point at the end of the corresponding section of the track). Therefore, a selection mechanism must exist that operates on the basis of a value-expectation principle. This mechanism selects a 'dominant' intention, which rules the behavior. By use of the above-mentioned knowledge, a promising plan is determined and its implementation is supervised until another intention becomes dominant. For instance, while returning to the track, the intention 'stay on track' becomes weaker while the strength of the intention 'head for next track point' (which asks - if you are on the track - for steering the same course as 'stay on track') does not change. At some point near the track it becomes stronger, takes command and puts the ship onto the prescribed track. For the determination of a promising plan, Nautical PSI uses knowledge about the effects of intervention operations, mainly the change of course and perhaps, further on, the change of speed. The degree of a course change is varied, and thereby a set of alternative future developments is built up in the expectational horizon. These alternatives are evaluated according to nautical needs (for the time being: distance to the prescribed track, deviance from the course to the next track point, and distance at the closest point of approach to other ships that have the right of way), and the best one is taken, i.e., the appropriate course change towards the track, and the resulting course change on the track later on, will be implemented. Like all other psychological processes, this planning process is modulated by a set of so-called control factors and modulators that - among other things - determine the invested cognitive effort (here, the number of tested alternatives and the temporal depth of the projection). Particularly important for the behavior modulation are two basic 'cognitive' needs, which have a kind of self-optimizing function: on the one hand, the need for certainty urges us to get to know our environment and to make it predictable; on the other hand, the need for competence compels us to test and improve our ability to influence our environment. These two needs, together with other control factors (like the unambiguousness and desirability of the expectational horizon; the number, strength and success probability of the intentions in the intention memory), modulate the cognitive system's mode of operation.
They determine whether we work faster or more slowly, more or less accurately, update our situational image more or less often, etc. Thus we assume that the cognitive system works according to self-adjusting principles by adapting its mode of operation to the workload condition, to its ability to solve problems and to
the quality of its environmental knowledge. In the maritime environment, (the lack of) certainty mainly stems from confidence in the situational image and the expectational horizon, i.e., it depends on the perceived reliability of the displayed information, the ease of transforming the displayed information into a situational image and, especially, on one's own ability to correctly predict the future, i.e., whether one's expectations are fulfilled. (The lack of) competence stems from the subjective capability to handle one's own ship, and especially its equipment, and from the capability to meet one's own needs. In principle, all described instances and processes have been implemented, and concluding work (especially on the modulation) will be done in spring 2009. Open questions refer to the modeling of perception errors and learning. This mainly results from our deliberate choice of a higher abstraction level than previous elaborations and implementations of the PSI theory with respect to sensory schemata and knowledge about situation-action consequences. Also, we currently only deal with small punctual obstacles like other ships and buoys, and have not yet considered widespread obstacles such as land masses and shallows.
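The need-activation, intention-selection and course-planning cycle just described can be summarized in a few lines of code. The sketch below is a deliberately minimal, hypothetical reading of the value-expectation principle; the weights, the toy projection and all names are our own assumptions, not Dörner's formulation or the Nautical PSI implementation.

```python
# Minimal sketch of the value-expectation selection and course planning
# described above; needs, weights and the projection are illustrative
# assumptions, not the actual Nautical PSI formulation.

def need_activation(deviation: float, significance: float) -> float:
    """Need strength grows with the deviation, scaled by its significance
    (a collision risk weighs more than a track departure)."""
    return deviation * significance

def select_dominant(intentions: dict) -> str:
    """Pick the intention with the highest activation."""
    return max(intentions, key=intentions.get)

def plan_course_change(candidates, evaluate):
    """Vary the course change, project each alternative into the
    expectational horizon, evaluate it against the nautical needs
    and take the best (lowest-cost) one."""
    return min(candidates, key=evaluate)

intentions = {
    "stay_on_track":   need_activation(deviation=100.0, significance=0.01),
    "avoid_collision": need_activation(deviation=0.4,   significance=10.0),
}
dominant = select_dominant(intentions)                 # 'avoid_collision'

# Cost of a projected alternative: a closer CPA and a larger track error
# are both worse; the projection itself is a toy linear stand-in.
def evaluate(course_change_deg):
    projected_cpa_m = 200.0 + 40.0 * course_change_deg
    track_error_m = abs(course_change_deg) * 15.0
    return 1e5 / max(projected_cpa_m, 1.0) + track_error_m

print(dominant, plan_course_change(range(-20, 21, 5), evaluate))  # avoid_collision 10
```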
5 Validating the Model
Validating a complex cognitive model is a discussion and a challenge in its own right. The main problem is that a complex model comprises a lot of assumptions and parameters that oftentimes cannot be measured directly and/or cannot be examined independently of other assumptions and/or parameters in an appropriate and sufficient way. Also, 'many assumptions and parameters' implies that there are numerous possibilities to influence the model's behavior, and that the same results in the overt behavior may be reached by different changes in the inner mechanisms. Therefore, there is a great danger of an unreflected 'tweak it till it thinks'. Our conclusions are twofold. First, we think that validation should take place on the level of the overt behavior of the whole system, instead of trying to prove single assumptions and parameters. Second, we think that more important than proving the validity of a theory (and a resulting model) is proving its value and its usefulness to the scientific community. Competing factors could be completeness, consistency, broadness, preciseness, testability, repeatability, the ability to describe, explain, replicate and predict behavior, etc. In general, there are several possibilities to validate a theory (and a resulting model). Firstly, one can conduct consistency checks. Secondly, one can perform a sensitivity analysis to identify crucial assumptions and factors that dominate the system's overt behavior and that should therefore be considered and examined more closely. Thirdly, the overt behavior of the system can be compared to human behavior - this culminates in a Turing test. All three approaches were and will be followed. Especially in the last phase of the project, in spring and early summer 2009, we will use the behavior data gained in the systematic observations on board container feeders (2006 to 2008) and in a ship (bridge) simulator (spring 2009) to parameterize the model and to comparatively evaluate the model's overt behavior.
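The sensitivity analysis mentioned as the second validation route can be as simple as a one-at-a-time parameter sweep on the level of overt behavior. The sketch below illustrates this; the stand-in model function, the metric and all names are our own assumptions, not the project's actual validation code.

```python
# Sketch of a one-at-a-time sensitivity analysis on the level of overt
# behaviour, as argued above; the model and metric are stand-ins.

def model_overt_behaviour(need_for_certainty: float,
                          need_for_competence: float) -> float:
    """Stand-in for one Nautical PSI run; returns e.g. the mean
    cross-track error in metres for one simulated passage."""
    return 50.0 / (1.0 + need_for_certainty) + 20.0 * need_for_competence

baseline = {"need_for_certainty": 1.0, "need_for_competence": 0.5}

for parameter in baseline:
    for factor in (0.5, 2.0):            # halve and double each parameter
        varied = dict(baseline, **{parameter: baseline[parameter] * factor})
        outcome = model_overt_behaviour(**varied)
        print(f"{parameter} x{factor}: mean cross-track error {outcome:.1f} m")
```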
6 Conclusion
The Nautical PSI impressively demonstrates the value of a combined anthropological fieldwork and modeling approach. Employing a psychological perspective in anthropological fieldwork on container feeders and trying to reconstruct the nautical officers' psychological processes by reverse engineering was a process valuable in itself, because it gave us many insights into the challenges the nautical officer's cognitive system is confronted with. This allows us to draw general psychological conclusions from on-board observations concerning ship bridge design and the design of human machine interfaces [6]. It also allows us to construct a model that, for the time being, is autonomously able to follow a prescribed track, to abide by the right of way and thereby to compensate for disturbances by current and wind. It describes when and which information is obtained in a certain situation, and which resulting actions are taken. Although there is still a long way to go to complete this simulation of a nautical officer's psychological processes, we feel that we have laid a promising foundation, especially for an effective instrument for testing bridge design, equipment and procedures. The degree of similarity between real and virtual behavior will be evaluated in empirical studies in early summer 2009 using a ship (bridge) simulator that can be steered by human and virtual mariners (to be reported at the conference).
References
1. Grech, M.R., Horberry, T.J., Koester, T.: Human Factors in the Maritime Domain. CRC, Boca Raton (2008)
2. Lützhöft, M.: "The technology is great when it works": Maritime Technology and Human Integration on the Ship's Bridge. Unitryck, Linköping (2004)
3. Schröder, J.-U.: Datenerfassung bei Unfallursachen und begünstigende Faktoren für Unfälle in der Seeschifffahrt. In: Schriftenreihe der Bundesanstalt für Arbeitsschutz und Arbeitsmedizin - Sonderschrift S81 (2004)
4. Meck, U., Strohschneider, S., Brüggemann, U.: Interaction Design in Ship Building: Integrating the User Perspective in Ship Bridge Design. Journal of Maritime Research (subm.)
5. Strohschneider, S., Meck, U., Brüggemann, U.: Human Factors in Ship Bridge Design: Some Insights from the DGON-BRIDGE Project. In: Deutsche Gesellschaft für Ortung und Navigation (eds.) International Symposium Information on Ships (ISIS) 2006. Deutsche Gesellschaft für Ortung und Navigation (DGON), Hamburg (2006)
6. Brüggemann, U., Klemp, K.: Psychological conclusions from on-board observations concerning ship bridge design and design of human machine interfaces. In: Deutsche Gesellschaft für Ortung und Navigation (eds.) International Symposium Information on Ships (ISIS) 2008. Deutsche Gesellschaft für Ortung und Navigation (DGON), Hamburg (2008)
7. Brüggemann, U., Klemp, K., Strohschneider, S.: Nautik-PSI: Ein Simulationsansatz für Designprobleme auf Schiffsbrücken. In: Herczeg, M., Kindsmüller, M.C. (eds.) Mensch & Computer 2008: Viel mehr Interaktion, pp. 425–428. Oldenbourg, München (2008)
8. Brüggemann, U., Strohschneider, S., Meck, U.: Virtuelle Nautiker als 'Probefahrer' bei der Neukonzeption von Schiffsbrücken. Künstliche Intelligenz 22(3), 62–65 (2008)
9. Franklin, S., Graesser, A.: Is it an agent, or just a program? A taxonomy for autonomous agents. In: Müller, J.P., Wooldridge, M.J., Jennings, N.R. (eds.) Intelligent Agents III. Agent Theories, Architectures, and Languages. ECAI 1996 Workshop (ATAL), pp. 21–35. Springer, Berlin (1997)
10. Anderson, J.R.: The Architecture of Cognition. Harvard University Press, Cambridge (1983)
11. Laird, J.E., Newell, A., Rosenbloom, P.S.: Soar: An Architecture for General Intelligence. Artificial Intelligence 33, 1–64 (1987)
12. Dörner, D.: Die Mechanik des Seelenwagens. Huber, Bern (2002)
13. Bösser, T.: A Discussion of "The Chunking of Skill and Knowledge" by Paul R. Rosenbloom, John E. Laird & Allen Newell. In: Elsendoorn, B.A.G., Bouma, H. (eds.) Working Models of Human Perception, pp. 411–418. Academic Press, London (1989)
14. Cooper, R., Shallice, T.: Soar and the case for unified theories of cognition. Cognition 15(2), 115–149 (1995)
15. Detje, F.: Handeln erklären. Deutscher Universitätsverlag, Wiesbaden (1999)
16. Dörner, D.: Die Logik des Misslingens: Strategisches Denken in komplexen Situationen. Rowohlt, Hamburg (1989)
17. Dörner, D.: Bauplan für eine Seele. Rowohlt, Reinbek bei Hamburg (1999)
18. Dörner, D.: The Mathematics of Emotions. In: Detje, F., Dörner, D., Schaub, H. (eds.) The Logic of Cognitive Systems - Proceedings of the Fifth International Conference on Cognitive Modeling (ICCM 2003), Bamberg, 10.-12.04.2003, pp. 75–80. Universitätsverlag, Bamberg (2003)
19. Brüggemann, U., Strohschneider, S., Klemp, K.: Das Nautik-ψ: Modelldokumentation & Bedienungsanleitung. IfTP bzw. IWK DGON Memorandum Nr. 4. Fachgebiet Internationale Wirtschaftskommunikation der Friedrich-Schiller-Universität Jena, Jena (2007)
Determining Cockpit Dimensions and Associative Dimensions between Components in Cockpit of Ultralight Plane for Taiwanese
Dengchuan Cai, Lan-Ling Huang, Tesheng Liu, and Manlai You
National Yunlin University of Science and Technology, Graduate School of Design, 123 University Road, Section 3, Douliu, Yunlin 64002, Taiwan, R.O.C.
[email protected]
Abstract. The cockpit dimensions of an ultralight plane were determined using body dimensions and subjective comfort dimensions of Taiwanese. The side view of the cockpit was a trapezoid. The lengths of the top and bottom sides were 707 and 1773mm, and its height was 1280mm. The front view of the cockpit was a rectangle with a width of 856mm. The lengths from the seat reference point (SRP) to the back and bottom sides of the cockpit were 588 and 104-260mm, respectively. The length, width, and height from the SRP forward, sideward, and downward to the elevator center were 380-568, 246-319, and 168-254mm. The length, width, and height from the SRP forward, sideward, and downward to the throttle center were 356-555, 255-328, and 179-264mm, respectively. The length from the SRP to the rudder pedals was 712-885mm and their angle was 48°. The depth and width of the seat were 238-360 and 396mm, respectively. The height and angle of the seatback were 554mm and 91-121°. Keywords: cockpit dimensions, ultralight plane, anthropometry, controls.
1 Introduction
Safety is the most important factor considered in flight activities. A good arrangement of the controls in the cockpit should fit the body dimensions of the operators, thus facilitating the effectiveness of learning and operation ([11], p. 711). A cockpit design that does not fit body dimensions may result in errors. Kinnersley & Roelen [8] concluded that over 50% of flight accidents were induced by design problems. Dambier & Hinkelbein [3] reported that over 72% of flight accidents in private flights occurred due to human error. Most ultralight planes used in Taiwan are imports and do not fit Taiwanese anthropometric data. For example, the rudder pedals were positioned too far away and too low for the average Taiwanese, and the seat was not adjustable ([2], p. 2). Many studies have reported cockpit dimensions and the associative dimensions between controls [9, 10 p. 278, 11, 12 pp. 339-349, 16] (Dupuis et al., 1955; Dupuis, 1957). Goossens et al. [5] suggested seat design principles for aircraft, and Wiley & Sons ([15], p. 398) recommended the corresponding principles for cars. However, due to ethnic variation, the cockpit should be designed to fit the respective population. For example, Kennedy [7] studied cockpit dimensions in relation to American body sizes compared with various other populations. He specifically studied the eye height (sitting), and found the average American was higher by approximately 30mm
compared to the Japanese, and 70mm compared to the Vietnamese; regarding the shoulder height (sitting), Americans and Japanese were almost similar, and higher by approximately 40mm compared to the Vietnamese; the legs of the average American were longer by approximately 50mm compared to the Japanese and 60mm compared to the Vietnamese; therefore, the rudder pedals would have to be moved to the rear by 100mm for the Japanese and 120mm for the Vietnamese, respectively. These results show the anthropometric differences among various populations. In order to conform to pilot anthropometry and the operation space, the design of an ultralight plane cockpit needs to consider the positions of the controls in addition to the overall cockpit dimensions. There are three controls in an ultralight plane: the elevator controls the tilt angle of the airframe in the vertical plane; the rudder pedals control the direction of the plane in the horizontal plane and slow down the wheels when the plane lands; the throttle controls the flight speed. The pilot needs to coordinate hands and legs on these controls during the entire flight program. In order to let the pilot operate the controls quickly and correctly, it is important that the control interface conforms to pilot requirements. To reduce discomfort and the risk of injury for pilots, the cockpit and the control positions should conform to the anthropometry of male and female pilots. The aim of this study was to construct cockpit dimensions and associative dimensions between the three controls and the seat for Taiwanese pilots, and to design a more comfortable cockpit seat that enhances cockpit performance.
2 Methods
2.1 Measurements of Body Dimensions and Subjective Comfort Dimensions
Two types of dimensions were measured for the Taiwanese subjects: body dimensions and subjective comfort dimensions. Subjects were measured on 14 body dimensions in the standard posture. Furthermore, 25 subjective comfort dimensions were measured while sitting on a pilot seat and operating the ultralight plane controls, i.e., the elevator, throttle and rudder pedals. These dimensions are shown in Table 1.

Table 1. The definition of anthropometric and subjective comfort dimensions for the cockpit

Anthropometric dimensions: 1. Stature (a); 2. Sitting height (b); 3. Shoulder height, sitting (c); 4. Upper arm length (d); 5. Elbow rest height, sitting (e); 6. Elbow to elbow breadth (f); 7. Elbow to grip length (g); 8. Arm length (h); 9. Shoulder breadth (i); 10. Buttock-popliteal length (j); 11. Popliteal height (k); 12. Hip breadth, sitting (l); 13. Head height (m); 14. Head breadth (n)

Subjective comfort dimensions: 1. Length to rudder pedal (Lr); 2. Height to rudder pedal (Hr); 3. Angle of seatback (As); 4. Popliteal angle (Ap); 5. Rudder pedal angle (Ar); 6. Length to elevator (Lfe), both hands; 7. Width to elevator (Wse), both hands; 8. Height to elevator (Hue), both hands; 9. Elbow angle for elevator (Aee), both hands; 10. Shoulder angle for elevator (Ase), both hands; 11. Length to throttle (Lft), both hands; 12. Width to throttle (Wst), both hands; 13. Height to throttle (Hut), both hands; 14. Elbow angle for throttle (Aet), both hands; 15. Shoulder angle for throttle (Ast), both hands
2.2 Subjects
One hundred and fifty university students (75 males and 75 females) aged 18-32 years, with a mean age of 24.2 years (SD = 2.7), participated. Subjects were students at the National Yunlin University of Science and Technology, Taiwan.
2.3 Apparatus and Procedures
There were two main experimental instruments in the study. One was a set of Martin anthropometric measuring instruments, including a stature gauge, beam callipers, slide callipers, outside callipers, a tape measure, and a ruler. The other was a simulated cockpit for the measurement of the subjective comfort dimensions, i.e., a platform on which subjects operate the controls (the elevator, the throttle and the rudder pedals) and establish their subjective comfort dimensions. The simulated cockpit, built for this study, includes six adjustable parts: a) the tilt angle of the seatback, with an adjustable range between 90° and 140° (the tilt angle of the seat surface was fixed at 6°, following Stinton [12]); b) the length from the SRP to the rudder pedal, with an adjustable range between 600 and 900mm; c) the height from the SRP to the rudder pedal, with an adjustable range between 0 and 300mm; d) the tilt angle of the rudder pedal, with an adjustable range between 0° and 90°; e) the position of the elevator in three dimensions (forward/backward, leftward/rightward, and upward/downward), the elevator being located 450mm forward, 300mm sideward and 200mm upward from the SRP, with an adjustment range of 600mm along each dimension; and f) the throttle position in three dimensions, the throttle being located on the opposite side of the body with the same specifications as the elevator. The experiment was conducted in an ergonomics laboratory at the university. First, the researcher explained the experimental goals and guidelines to the subjects. Then, the subjects were measured on the fourteen anthropometric dimensions (see Table 1). Second, the subjects were invited to sit down on the seat in the cockpit, asked to operate the three controls (throttle, elevator, and rudder pedals), and to adjust the three controls to the optimum position until they felt comfortable. Finally, the twenty-five subjective comfort dimensions were recorded.
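For clarity, the adjustable ranges of the simulated cockpit can be collected into a single configuration structure. The following sketch only restates the values given above; the field names and the dataclass itself are our own.

```python
# The adjustment ranges of the simulated cockpit, collected into one
# structure (values taken from the text; field names are our own).
from dataclasses import dataclass

@dataclass
class AdjustableRange:
    minimum: float
    maximum: float
    unit: str

SIMULATED_COCKPIT = {
    "seatback_tilt":       AdjustableRange(90, 140, "deg"),
    "srp_to_pedal_length": AdjustableRange(600, 900, "mm"),
    "srp_to_pedal_height": AdjustableRange(0, 300, "mm"),
    "pedal_tilt":          AdjustableRange(0, 90, "deg"),
    # Elevator and throttle: 600 mm of total travel along each of the
    # three axes around their start positions.
    "control_axis_travel": AdjustableRange(0, 600, "mm"),
}

for name, rng in SIMULATED_COCKPIT.items():
    print(f"{name}: {rng.minimum}-{rng.maximum} {rng.unit}")
```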
3 Results
3.1 Body Dimension and Subjective Dimension Measurements
Table 2 shows the mean, the 5th percentile, and the 95th percentile of the body dimension measurements for males and females. Table 3 shows the results of the comparisons between males and females. All anthropometric measurements for males were greater than those for females, except for the elbow rest height, which was not significantly different. The study selected reference groups aged 18-24 years and 45-64 years for both genders from the Taiwanese anthropometric database [13], as these groups fit the age range of the population flying ultralight planes.
Table 2. Statistics of the body dimension measurements for males and females (unit: mm)

Dimension                        Males: Mean (5%, 95%)       Females: Mean (5%, 95%)
Stature (a)                      1726.6 (1618.8, 1815.0)     1604.0 (1535.6, 1659.0)
Sitting height (b)               919.9 (856.0, 970.4)        860.8 (815.4, 920.0)
Shoulder height, sitting (c)     601.3 (552.0, 644.0)        568.8 (524.0, 603.6)
Upper arm length (d)             341.5 (300.8, 383.4)        313.2 (289.8, 332.2)
Elbow rest height, sitting (e)   263.1 (190.8, 324.8)        254.7 (179.8, 313.0)
Elbow to elbow breadth (f)       412.9 (389.8, 458.2)        363.0 (325.0, 398.6)
Elbow to grip length (g)         316.1 (277.8, 345.2)        280.8 (261.8, 303.0)
Arm length (h)                   745.2 (660.0, 825.2)        681.4 (626.2, 735.0)
Shoulder breadth (i)             450.4 (358.2, 541.0)        405.2 (324.0, 459.0)
Buttock-popliteal length (j)     456.4 (419.8, 510.0)        427.6 (389.0, 464.0)
Popliteal height (k)             416.3 (384.0, 455.4)        385.0 (356.6, 418.4)
Hip breadth (l)                  336.3 (297.2, 383.8)        324.5 (285.6, 367.6)
Head height (m)                  237.1 (220.8, 257.4)        223.1 (205.8, 240.2)
Head breadth (n)                 168.8 (155.8, 182.2)        163.0 (150.0, 178.4)
Table 3. Comparisons between males and females and between this study and others

This study: (A) male mean, (B) female mean. Wang et al. [13]: (C) males 18-24, (D) males 45-64, (E) females 18-24, (F) females 45-64.

Dims   (A)      (B)      (C)    (D)    (E)    (F)    t(A)-(B)   t(A)-(D)   t(B)-(F)
(a)    1726.6   1604.0   1717   1661   1597   1546   15.14**    10.08**    12.03**
(b)    919.9    860.8    913    893    858    839    11.23**    6.92**     6.17**
(c)    601.3    568.8    599    593    564    554    7.26**     2.047*     5.03**
(d)    341.5    313.2    339    328    311    298    8.80**     4.80**     9.81**
(e)    263.1    254.7    261    262    253    256    1.28       0.23       -0.29
(f)    412.9    363.0    408    429    360    393    14.59**    -6.71**    -12.38**
(g)    316.1    280.8    312    301    278    265    12.44**    6.17**     10.94**
(h)    745.2    681.4    744    713    677    654    9.59**     5.92**     7.15**
(i)    450.4    405.2    449    451    401    426    6.56**     -0.11      -5.01**
(j)    456.4    427.6    --     --     --     --     7.45**     --         --
(k)    416.3    385.0    414    395    384    372    10.46**    9.40**     6.62**
(l)    336.3    324.5    335    347    323    344    2.92**     -3.62**    -7.01**
(m)    237.1    223.1    235    227    222    219    7.89**     7.72**     3.46**
(n)    168.8    163.0    168    164    161    164    4.45**     5.52**     -1.06

* p < 0.05, ** p < 0.01. All t-values of (A)-(C) and (B)-(E) were not significant.
The anthropometric data of this study were compared with these two age groups; Table 3 shows the analytical results. For the males, all measurements of this study were equal to those of the 18-24-year group, and were greater than those of the 45-64-year group, except for the elbow rest height and shoulder breadth (equal) and the elbow to elbow breadth and hip breadth (smaller). For the females, all measurements of this study were equal to those of the 18-24-year group, and were greater than those of the 45-64-year group, except for the elbow rest height and head breadth (equal) and the shoulder breadth, elbow to elbow breadth, and hip breadth (smaller).
Table 4. The measurements of the subjective comfort dimensions for males and females (lengths in mm, angles in degrees)

Dimension            Males: Mean (5%, 95%)       Females: Mean (5%, 95%)     t-value
Rudder pedal
  Lr                 830.0 (755.8, 884.8)        782.3 (726.2, 827.6)        8.98**
  Hr                 208.3 (128.0, 260.0)        178.2 (124.8, 228.6)        5.65**
  Ar                 46.2 (26.6, 64.2)           50.6 (37.8, 64.0)           -3.21**
  As                 110.3 (97.8, 121.2)         102.8 (91.0, 112.0)         7.58**
  Ap                 120.3 (109.8, 133.0)        125.3 (115.0, 135.0)        -4.41**
Elevator, left hand
  Lfe                514.6 (437.6, 570.0)        470.3 (425.4, 521.4)        6.82**
  Wse                290.2 (260.0, 320.0)        280.5 (263.6, 313.0)        3.46**
  Hue                209.6 (154.0, 256.0)        204.6 (140.0, 271.0)        0.97
  Aee                120.0 (106.8, 133.2)        130.3 (115.0, 148.0)        -6.13**
  Ase                11.4 (7.0, 20.4)            10.3 (5.0, 16.0)            2.07*
Elevator, right hand
  Lfe                512.2 (419.6, 565.2)        469.6 (438.0, 505.2)        4.95**
  Wse                290.4 (262.8, 316.2)        280.2 (256.8, 304.4)        3.71**
  Hue                210.0 (165.0, 254.0)        204.7 (147.0, 258.0)        0.87
  Aee                119.6 (104.0, 135.2)        130.2 (114.0, 145.0)        -6.98**
  Ase                11.7 (7.0, 16.0)            10.3 (5.0, 16.2)            2.22*
Elevator, both hands
  Lfe                513.5 (427.7, 567.8)        470.0 (429.7, 516.0)        8.07**
  Wse                290.3 (262.6, 319.0)        280.3 (257.0, 306.2)        5.09**
  Hue                209.8 (157.8, 254.0)        204.7 (145.8, 262.8)        1.30
  Aee                119.8 (105.6, 135.0)        130.2 (114.6, 146.0)        -9.26**
  Ase                11.5 (7.0, 18.9)            10.3 (5.0, 16.0)            3.04**
Throttle, left hand
  Lft                480.1 (413.4, 550.0)        440.3 (392.6, 488.6)        6.80**
  Wst                299.9 (269.6, 335.0)        290.2 (271.0, 307.4)        3.70**
  Hut                224.1 (156.6, 263.2)        219.2 (166.8, 272.0)        0.95
  Aet                115.1 (98.0, 131.2)         125.3 (104.6, 143.2)        -5.50**
  Ast                16.5 (9.8, 25.2)            14.8 (8.0, 20.2)            2.30*
Throttle, right hand
  Lft                480.3 (395.0, 555.2)        440.0 (360.0, 500.0)        5.47**
  Wst                300.1 (279.8, 325.0)        290.3 (265.6, 316.8)        4.08**
  Hut                224.1 (155.6, 264.2)        219.6 (159.4, 280.0)        0.79
  Aet                115.0 (95.8, 130.2)         125.2 (109.8, 139.0)        -5.90**
  Ast                16.3 (9.0, 25.0)            14.8 (8.8, 21.2)            2.06*
Throttle, both hands
  Lft                480.2 (396.7, 555.0)        440.2 (386.6, 498.9)        8.54**
  Wst                300.0 (273.1, 327.8)        290.2 (268.1, 314.0)        5.50**
  Hut                224.1 (161.6, 264.0)        219.4 (162.8, 276.7)        1.23
  Aet                115.0 (98.0, 131.0)         125.2 (109.0, 142.0)        -8.07**
  Ast                16.4 (9.6, 25.0)            14.8 (8.0, 21.0)            3.10**

* p < 0.05, ** p < 0.01.
Table 4 shows the statistics of the measurements of the subjective comfort dimensions for males and females. The analytical results indicated that the length (t=8.98, p<0.01) and height (t=5.65, p<0.01) from the SRP to the rudder pedal were greater for males than for females. The angle of the seatback for males was greater than for females (t=7.58, p<0.01). However, the popliteal angle (t=-4.41, p<0.01) and the rudder pedal angle (t=-3.21, p<0.01) for males were smaller than for females (Table 4). The
results illustrated that the flying posture of males, compared to females, involved a more reclined seatback, more bent knees, and flatter feet. The lengths and widths from the SRP to the controls (elevator and throttle) were longer (forward) and wider (sideward) for males than for females, regardless of left or right hand, whereas the heights of the controls did not differ between the genders (Table 4). The elbow angles for operating the elevator and throttle were smaller for males than for females, whereas the shoulder angles (abduction angle of the upper arm) were larger for males. That is, the females preferred to place the controls farther away from the body than the males did. This result may be explained by the smaller muscle force of females; if a larger force needs to be exerted, the elbow angle must be larger. Comparison analysis was used to test the difference between the left and right hand on the five subjective comfort measurements for operating the elevator and throttle, for males and females respectively. The five measurements were the length, width and height from the SRP forward, sideward, and upward to the controls, and the elbow and shoulder angles for operating the controls. The results showed that none of the five measurements differed significantly between the left and right hand. Therefore, we combined the measurements of the two hands into one group for further analysis. The comparisons between operating the elevator and the throttle on the five subjective comfort measurements, for the left and right hands respectively, illustrated that all three position dimensions as well as the elbow and shoulder angles were significantly different. The length and elbow angle for operating the elevator were longer and larger than for the throttle, whereas the width, height and shoulder angle for operating the throttle were wider, higher, and larger than for the elevator. That is, when operating the elevator people tend to erect the forearm and adduct the upper arm, so that the three-dimensional position of the elevator is farther forward, more centered and lower than that of the throttle, and vice versa.
3.2 Correlation Analysis for Deriving Critical Body Dimensions
After analyzing the correlations between the body dimension measurements and the subjective comfort dimensions, the results showed that the length from the SRP to the rudder pedal was most strongly correlated with stature (0.75), and the height from the SRP to the rudder pedal with the popliteal height (0.51). The lengths from the SRP forward to the elevator and throttle were most strongly correlated with the elbow to grip length, with correlation coefficients of 0.67 and 0.66, respectively. The widths from the SRP sideward to the elevator and throttle were most strongly correlated with the elbow to elbow breadth, with correlation coefficients of 0.60 and 0.61, respectively. The heights from the SRP upward to the elevator and throttle were most strongly correlated with the elbow rest height, with correlation coefficients of 0.62 and 0.64, respectively.
3.3 Calculation of the Subjective Dimensions for Specific Groups
To conform to pilot requirements for cockpit design, the study defined the pilot age range as eighteen to sixty-four years. Some countries define the minimum driving age as eighteen [6]; the maximum age was set at sixty-four years because sixty-five
years and older is considered elderly [17]. The study estimated the subjective dimensions for the 45-64-year group. The ratio R of a subjective dimension divided by its most strongly correlated body dimension in this study is given by Equation 1; the subjective dimension of the predicted group can then be calculated by multiplying R by the corresponding body dimension:

R = (subjective dimension of this study) / (body dimension of this study)
  = (subjective dimension of specific group) / (body dimension of specific group)    (1)

3.4 Determining the Positions of the Rudder Pedal, Elevator, and Throttle
The study constructed the cockpit dimensions and the control positions for 18-64-year-old Taiwanese. The controls of the cockpit were designed to be adjustable. For the rudder pedal position, we suggested that the length from the SRP to the rudder pedal range from the 5th percentile of 45-64-year-old females per Wang et al. [13] to the 95th percentile of males in this study, which gave 712-885mm. For the height from the SRP to the rudder pedal, we used the same principle for the two groups, which gave 104-260mm. For the distance between the two rudder pedals, the study applied the average hip breadth of the population, which was 330mm. For the rudder pedal width, Roebuck et al. [10] suggested 152mm. For the tilt angle of the rudder pedal, the study used the average of this study, which was 48°. For the stroke length of the rudder pedal, Stinton [12] suggested an adjustment range of ±75mm. We used the range from the 5th percentile of 45-64-year-old females per Wang et al. [13] to the 95th percentile of males in this study to determine the three-dimensional positions of the controls; the results are shown in Table 10. The length, width, and height from the SRP forward, sideward, and upward to the center of the hand grip were 380-610, 246-319, and 168-254mm for the elevator, and 356-555, 255-328, and 179-264mm for the throttle, respectively. The elbow angles for operating the elevator and throttle were 125° and 120°, respectively, and the upper arm (shoulder) abduction angles were 11° and 16°, respectively.
3.5 Designing the Seat and Cockpit Dimensions
First, we took the 5th percentile of the buttock-popliteal length (432mm) of the female American civilian population, the mean hip breadth (359mm) of the male and female civilian population, and the mean shoulder height (min. 592mm) of air force flying personnel [4]. Also, standard dimensions for seat depth (406-445mm), seat width (min. 430mm), and seatback height (min. 650mm) are provided in the literature [1]. The seat depth for Taiwanese could be calculated by dividing the standard seat depth [1] by the American buttock-popliteal length and multiplying by the corresponding Taiwanese body dimension (349mm), which gave 238-360mm. Using the same formula, the seat width predicted from the hip breadth (330.4mm) was 396mm, and the seatback height predicted from the shoulder height (585mm) was 554mm. Because the adjustment ranges of the rudder pedal, elevator, and throttle on the longitudinal axis were 173, 188, and 199mm, respectively, we suggested that the seat adjustment range on this axis should be 199mm. The ranges of the rudder pedal, elevator, and throttle on the vertical axis were 98, 86, and 85mm, respectively, so we recommended that the seat adjustment range on the vertical axis should be 156mm (within 104-260mm). Besides, the seat surface angle was set to 6° based on Stinton's study [12]. The study used the range from the 5th percentile of males (91°) to the 95th percentile of females (121°) for the tilt angle of the seatback. The popliteal angle was taken as the average of the population in this study, which was 125°. For the cockpit dimensions, we recommended the height from the SRP to the top of the cockpit following Stinton [12], i.e., the sitting height plus 50mm; using the 95th percentile male of this study (970mm) plus 50mm gave 1020mm. For the length from the SRP to the front of the cockpit, Stinton [12] suggested the length from the SRP to the heel of the rudder pedal plus 300mm; using the 95th percentile (885mm) plus 300mm gave 1185mm. For the length from the SRP to the back of the cockpit, we suggested the horizontal length from the SRP to the intersection of the tangent line along the seatback (120°) with the top of the cockpit, which gave 588mm (1020 × tan 30°). For the arm reach to the front of the cockpit, Sanders et al. suggested the 5th percentile of the arm length; the study used the 5th percentile of 45-64-year-old females, which came to 600mm. Therefore, the height of the cockpit was 1280mm, i.e., the maximum seat height (260mm) plus the height from the SRP to the top of the cockpit (1020mm). The width of the cockpit was recommended as twice the maximum width of the elevator or throttle on the transverse axis (328mm) plus 100mm, which gave 856mm. The length of the bottom side of the cockpit was 1773mm (1185 + 588mm).

Fig. 1. Definition of the anthropometric dimensions
The length of the top side of the cockpit was reduced by a slope from the length of the bottom side. The slope was calculated from the length from the SRP to the front of the cockpit at the bottom (1185mm) and the length from the SRP to the front of the cockpit (600mm) at the height of the SRP plus the average shoulder height (260 + 585mm), which gave a top-side length of 707mm. All the dimensions from the SRP to the rudder pedal, elevator, and throttle, as well as the cockpit dimensions, are illustrated in Fig. 1.
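To make the derivations in Sections 3.3-3.5 easy to check, the short sketch below recomputes three of the reported values; the helper function and its name are ours, while all numbers come from the text.

```python
# Worked examples of the arithmetic above. The helper name is ours;
# the numbers are taken from the text (this study and Wang et al. [13]).
import math

# Eq. (1): scale a subjective dimension by the ratio R of its most
# correlated body dimension. Example: male mean length from the SRP to
# the rudder pedal (830.0 mm, correlated with stature 1726.6 mm),
# scaled to the 45-64-year-old male stature (1661 mm).
def scale_subjective_dimension(subjective, body, body_target):
    return (subjective / body) * body_target      # R times target dimension

print(round(scale_subjective_dimension(830.0, 1726.6, 1661.0), 1))  # 798.5 mm

# Length from the SRP to the back of the cockpit: the 120-degree seatback
# tangent meets the cockpit top (1020 mm above the SRP) at 1020*tan(30 deg).
print(round(1020 * math.tan(math.radians(30))))   # 589, ~588 mm as in the text

# Bottom length of the cockpit: SRP-to-front (1185 mm) plus SRP-to-back.
print(1185 + 588)                                 # 1773 mm
```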
4 Conclusion and Suggestion
Through the measurement of body dimensions and subjective comfort dimensions, and with reference to other studies, the cockpit dimensions were constructed. The lateral view of the cockpit was a trapezoid; the lengths of the top and bottom sides were 707 and 1773mm, respectively, and its height was 1280mm. The front view of the cockpit was a rectangle with a width of 856mm. Based on the SRP, the lengths from the SRP to the back and bottom sides of the cockpit were 588 and 104-260mm, and to the rudder pedals 712-885mm. The angle of the seatback ranged from 91° to 121°. The popliteal angle was 123° and the rudder pedal angle was 48°. The depth and width of the seat were 238-360 and 396mm. The height of the seatback was 554mm. The length, width, and height from the SRP forward, sideward, and downward to the elevator center were 380-568, 246-319, and 168-254mm, respectively. The length, width, and height from the SRP forward, sideward, and downward to the throttle center were 356-555, 255-328, and 179-264mm, respectively. The elbow angles for operating the elevator and throttle were 125° and 120°, and the shoulder (upper arm abduction) angles were 11° and 16°, respectively. The study results can provide a reference for flight cockpit design for Taiwanese and similar populations.
Acknowledgment. The authors would like to thank the National Science Council of the Republic of China, Taiwan, for financially supporting this research under Contract No. NSC 97-2511-S-224-003-MY2.
References
1. Aerospace Standard AS290B: Seats for Flight Deck Crewmen - Transport Aircraft. Society of Automotive Engineers, Warrendale, PA, USA (1978)
2. Cai, D., You, M., Chen, W.: A study on applying ergonomic approaches to the ultralight plane design (II). Research Project Report, National Science Council of the Republic of China, No. NSC 92-2213-E-224-020 (2004)
3. Dambier, M., Hinkelbein, J.: Analysis of 2004 German General Aviation Aircraft Accidents According to the HFACS Model. Air Medical Journal 25, 265–269 (2006)
4. Damon, A., Stoudt, H.W., McFarland, R.A.: The Human Body in Equipment Design. Harvard University Press, Cambridge, MA (1966)
5. Goossens, R.H.M., Snijders, C.J., Fransen, T.: Biomechanical analysis of the dimensions of pilot seats in civil aircraft. Applied Ergonomics 31, 9–14 (2000)
6. Insurance Institute for Highway Safety: U.S. Licensing Systems for Young Drivers (2008), http://www.iihs.org/laws/GraduatedLicenseIntro.aspx
7. Kennedy, K.W.: International Anthropometric Variability and Its Effects on Aircraft Cockpit Design. In: Chapanis, A. (ed.) Ethnic Variables in Human Factors Engineering, p. 327. Johns Hopkins University Press, Baltimore, MD (1975)
8. Kinnersley, S., Roelen, A.: The contribution of design to accidents. Safety Science 45, 31–60 (2007)
9. Mehta, C.R., Tiwari, P.S., Rokade, S., Pandey, M.M., Pharade, S.C.: Leg strength of Indian operators in the operation of tractor pedals. International Journal of Industrial Ergonomics 37, 283–289 (2007)
10. Roebuck, J.A., Kroemer, K.H.E., Thomson, W.G.: Engineering Anthropometry Methods, pp. 211, 278. Wiley-Interscience, New York (1975)
11. Sanders, M.S., McCormick, E.J.: Human Factors in Engineering and Design, pp. 470–471, 711. McGraw-Hill, New York (1993)
12. Stinton, D.: The Design of the Aeroplane: which describes common-sense mechanics of design as they affect the flying qualities of aeroplanes needing only one pilot, pp. 339–349. Granada, London (1983)
13. Wang, M.J., Wang, M.Y., Lin, Y.C.: Anthropometric Data Book of the Chinese People in Taiwan, pp. 64, 66, 130, 222, 242, 374, 410, 412, 430, 436, 460, 488, 534, 576. Ergonomics Society of Taiwan, Hsinchu, Taiwan (2002)
14. Wang, X., Le Breton-Gadegbeku, B., Bouzon, L.: Biomechanical evaluation of the comfort of automobile clutch pedal operation. International Journal of Industrial Ergonomics 34, 209–221 (2004)
15. Wiley & Sons: Human Engineering Guide to Equipment Design, p. 398. United States Printing Office, Washington, D.C. (1972)
16. Woodson, W.E.: Military systems - aircraft. In: Human Factors Design Handbook, pp. 218–219, 601. McGraw-Hill Book Company, New York (1981)
17. World Health Organization: Definition of an older or elderly person, health statistics and health information systems (2008), http://www.who.int/healthinfo/survey/ageingdefnolder/en/index.html
Multilevel Analysis of Human Performance Models in Safety-Critical Systems

Jeronimo Dzaack1 and Leon Urbas2

1 Technische Universität Berlin, Department of Psychology and Ergonomics, Chair of Human-Machine Systems
[email protected]
2 Technische Universität Dresden, Institute of Automation
[email protected]
Abstract. Safety-critical systems are technical systems whose failure may cause injury or death to human beings. Tools used in the design and evaluation of safety-critical systems include redundancy and formal methods to ensure proper operating behavior. To integrate human factors into the engineering process of safety-critical systems, it is necessary to take into account cognitive aspects of human beings while they interact with these systems. Formal human performance models can be applied to support the design and evaluation. These cognitive models interact with the technical system and provide a wide range of objective data (e.g., execution times). But using human performance models requires validating their behavior and internal structure in advance; especially in the context of safety-critical systems, this is an important issue. In this contribution, the possibilities of multilevel analysis of human performance models are shown and discussed. Selected tools are introduced and related to a derived taxonomy of multilevel analysis.

Keywords: cognitive architectures, multilevel analysis, human performance models, tools, human factors, evaluation and design, safety-critical systems.
1 Introduction

Safety-critical systems are technical systems that integrate a complex arrangement of input and output devices and whose failure may cause injury or death to human beings (e.g., nuclear power stations or aircraft control systems). These systems have to be operable by the user in all kinds of situations and circumstances. For the design and evaluation of safety-critical systems, a wide range of specific methods is available to ensure proper operating behavior from a technical and systemic view (e.g., redundancy and formal methods). But next to the technical system, the human operator is an important part of a safety-critical system. Especially in the case of a failure or an accident, the human operator has to act adequately to secure the function of the safety-critical system. To support the workflow and to ensure access to all important information in safety-critical systems, more devices, displays, and automated routines are integrated into human-machine interfaces. This leads to a growing information density and a higher cognitive load on the operator in handling all tasks. Thus, designing and developing human-machine interfaces for safety-critical systems demands not only an
understanding of technical aspects but also an appreciation of human factors, i.e. an appreciation of the capabilities and cognitive demands of the human operator. Hence, it is essential for engineers and designers to be aware of the requirements and constraints of future users in order to adjust the functionality and design of human-machine interfaces, particularly for safety-critical systems. For example, important issues are what kind of interface best supports cognitive processes and how an interface can be designed to best match future user requirements. To integrate human factors into the engineering process of safety-critical systems, it is necessary to take into account the cognitive aspects of humans while they interact with a technical system. Human performance models can be applied to support the design and evaluation of the human-machine interfaces of safety-critical systems. These models incorporate formal theories of human information reception and processing and allow simulating the interaction of humans with technical systems. Based on the simulated data (e.g., eye-movements, execution times), it is possible to derive statements about the simulated interaction with regard to state-of-the-art psychological findings (e.g., visual search). Subsequently, it is possible to transfer these findings to the interaction of humans among themselves and with technical systems. But using human performance models requires validating and verifying their internal and external aspects in advance, to ensure the correct implementation of a human performance model and its underlying cognitive architecture. Thus, in the design and development of human performance models, the analysis of the model data and its fit to psychological and empirical findings is an important sub-step to ensure the proper use of human performance models. This is a very important issue especially for human performance models implemented for the design and evaluation of safety-critical systems.
2 Human Performance Models

Human performance models, or cognitive user models, attempt to provide formal symbol structures for selected cognitive processes and to show that these symbol structures can generate the corresponding cognitive behavior [1,2]. These models are developed within cognitive architectures, i.e. software frameworks integrating cognitive and psychological theories, such as theories of visual information processing, decision making, and motor commands (for an overview, see [3]). Most cognitive architectures consist of discrete modules that encapsulate specific cognitive aspects (e.g. a visual module, a motor module, a speech module) and that are regulated and ordered by a central processor (i.e. a production system) to simulate human information processing and higher cognitive processes. These architectures are independent of the simulated task and its domain and require constant task development over time [4]. Formal human performance models can be applied to predict users' behavior and future needs, and they allow explicit insight into mental processes and structures that are not accessible by direct observation [5,6,7]. Applied in the engineering process of human-machine systems, this method provides statements comparable with empirical data and supports the design and evaluation of human-machine interfaces [8]. This helps to detect errors in the interaction design and gives indications about the cognitive demands of future users. Summing up, human performance models extend classical usability methods and expand their repertoire with cognitive aspects; they are
reusable and can be deployed as a formal method to evaluate the usability of human-machine systems.
3 Adequacy of Human Performance Models

For the construction of human performance models, no general procedural method exists. But it is possible to describe model construction as a process of four non-linear phases [7,9]: (1) the task analysis, to explore the contextual aspects and the intended use of the human performance model; (2) the conduction of experiments with human subjects, to benchmark the task analysis and to provide data for the modeling; (3) the implementation of the human performance model; and (4) the validation of the human performance model, to detect errors in the technical implementation or in the theories the model is based on (see Fig. 1). The non-linear movement between these four phases is part of the model construction and allows fitting the behavior of the model to empirical data and theoretical findings. For example, it is possible to repeat the task analysis if the model validation shows that the task analysis was not adequate for modeling the given task.

Fig. 1. The four non-linear phases of the construction of human performance models as stated by [7]: task analysis, empirical experiments, model implementation, and model validation
Regarding the application of human performance models in the engineering process of safety-critical systems, it is an important issue to validate the cognitive model to show its adequacy. The proof of adequacy includes the validation of the human performance model against an empirical database, and structural comparisons between model, theory, and human, with the aim of demonstrating the empirical and theoretical validity of a specific human performance model. The superior aim is to give credibility and confidence to applied human performance models [10]. For mathematical and numerical simulations, it is possible to apply statistical methods to show the adequacy of these kinds of models [11]. But these methods are not applicable to human performance models that reproduce hypothetical constructs of human cognition [7]. This problem is stated as the identification problem by [12]: "Given any machine S and any multiple experiment performed on S, there exist other machines experimentally distinguishable from S for which the original experiment would have had the same outcome" (p. 140). This shows that it is not possible to
give final evidence of the correctness of one approach out of several concurrent approaches. But a feasible way is to justify the human performance model in comparison with its underlying theory, with further human performance models, and with empirical data from humans [13]. For this purpose, a relation between simulation experiments and experiments with humans is established in experimental studies (cognitive empiricism; [9]). The input and output data of humans and of human performance models are compared with each other, which allows conclusions to be drawn regarding the validity of human performance models and their implemented cognitive processes and structures. For example, empirical data and the simulated data of different human performance models can be set in relation to determine the model with the highest validity. In the context of cognitive empiricism, it is necessary to demonstrate both the empirical and the theoretical adequacy of human performance models [7]. Empirical adequacy means proving a sufficient correspondence between the human performance model and observable human behavior. Theoretical adequacy refers to the correctness of the human performance model in terms of the applied cognitive architecture and the actual implementation of the model.
4 Multilevel Analysis of Human Performance Models

Human performance models integrate a complex internal structure to simulate human behavior. Three control levels can be identified that regulate their behavior and the internal interaction of the modules and cognitive processes [14]. These levels describe (1) the communication between the functional modules of a cognitive system, (2) the internal structures of a functional module, and (3) the interaction of a human performance model with its environment. For the analysis, it is possible to derive three similar levels of analysis to cover the complexity of human performance models. We therefore developed a taxonomy of multilevel analysis of human performance models that characterizes the same control levels but reorders them according to the internal complexity of the kinds of interaction involved (see Fig. 2).

Fig. 2. Multilevel analysis of human performance models. Displayed are the three different levels of analysis that cover the complexity of human performance models (1st level: interaction within a module; 2nd level: interaction between different modules; 3rd level: interaction between model and environment).

Hence, the first level analysis is the analysis
of each module structure separately. This allows validating the internal structure of the individual modules in detail and indicates whether an implemented module is sufficient to model the encapsulated cognitive processes and theories (e.g. validating the vision module). The second level analysis is the analysis of the internal processes between different modules of a human performance model. With this level of analysis it is possible to validate the internal structure of the human performance model, with its flow of information and action control between all modules concerned (e.g. the exchange of chunks between different modules). The third level analysis is the analysis of the interaction of the human performance model with its environment. This enables the modeler to evaluate the model in the same way as in empirical studies (e.g. eye-movements, mouse and keyboard input).
5 Methods to Validate Human Performance Models

For the validation of human performance models and their behavior, different levels of adequacy can be evaluated. For this purpose, five grades of adequacy can be differentiated [7]. (1) The correspondence of product identifies the reconstruction of the main task (i.e. achievement of objectives). (2) The correspondence of intermediate steps allows a detailed insight into the modeling of cognitive processes (e.g. sequences of actions). (3) The correspondence of time provides a measurement to judge the temporal behavior of a human performance model (e.g. latency periods). (4) The correspondence of learning shows the correlation of mechanisms for knowledge acquisition (e.g. learning curves). (5) The correspondence of errors enables the modeler to examine errors and their allocation over time. The granularity from (1) to (5) clearly increases, and in this way allows a deeper and more complex analysis of cognitive processes and structures. Depending on the grade of analysis, these five parameters of correspondence can be varied and combined. But in most validation studies of human performance models, only a few correspondences are used to validate the model. This is because of the high effort required to analyze the huge amount of human performance model data for each run and the internal structures of human performance models; the latter are determined by a complex underlying cognitive architecture. Showing its validity is very important for both the definition and the modification of a cognitive architecture. This ensures the correct application of human performance models implemented by a modeler who uses a cognitive architecture as a black box. Due to these facts, new methods and algorithms for automatic and computerized data analysis are being developed to overcome these shortfalls. In the following, a short example for each level of the introduced taxonomy of multilevel analysis is presented.

5.1 First Level Analysis

To analyze a single module of a human performance model, several approaches are presented in the literature. Most of them use theoretical findings on a specific phenomenon and align data derived from these theories with data simulated by the corresponding module of a human performance model or its underlying cognitive architecture.
One interesting approach is competitive argumentation, which allows evaluating isolated aspects of the model and comparing them with each other [15]. This approach involves the explication of the principles of a theory and its consequences, and the comparison of theories with other theories and with alternative versions of the theory itself. For example, it is possible to compare the central assumptions, control structures, and interaction mechanisms of a single module integrated into a specific human performance model with concurrent theories and assumptions, to validate the structural and theoretical backing of the implementation (e.g. compare the processing of subsequent actions in the motor module of a human performance model with a predefined list of actions derived from a psychological theory).

5.2 Second Level Analysis

Regarding the second level analysis, i.e. the analysis of the interaction between different modules of a human performance model, promising approaches use numerical aspects to compare and validate the internal cognitive processes and structures of human performance models. The statistical analysis of the model behavior and the comparison of the results with theoretical findings enable the modeler to appraise the correctness of the implemented data structures and cognitive processes of a human performance model. The approach of [16] uses extracted simulation data and compares the data with formal data derived from the theoretical framework. The aim of this approach is to show the quality and accuracy of the implementation of the internal processes and structures of the human performance model without environmental aspects (e.g. compare a concrete interaction between a memory module and a vision module of a human performance model interpreting visual stimuli with a formal model of the general theory).

5.3 Third Level Analysis

The largest cluster of methods exists for the third level analysis of human performance models, i.e. the interaction of a human performance model with its environment. For this purpose, several tools exist to support the analysis and validation of the quantitative simulation data in a way similar to how empirical data is handled in real experimental studies. The aim of these methods is to minimize the effort of analyzing single and group datasets and of comparing them with each other (for both empirical and simulated data). Most of the existing methods provide user interfaces to organize the data and the analyses (see Fig. 3). The tool ProtoMatch was developed to support the exploratory analysis of single datasets in consideration of the equality of data sequences (e.g. eye-movements, action sequences, mouse and keyboard input; [17]). It provides a collection of protocol analysis tools to align datasets with each other and to compute the similarity between sequences of temporally ordered data. Additionally, the tool generates a unified stream of high-density, sequential data to allow processing with standard analysis tools. ProtoMatch is modularized software and can easily be extended by new filters and analyses. In this way, ProtoMatch supports both confirmatory and exploratory sequential data analyses to validate human performance models.
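As an illustration of the kind of sequence comparison such protocol tools perform, the following is a minimal sketch assuming a plain edit-distance metric over symbolic action protocols; ProtoMatch's actual alignment and similarity computations are more elaborate, and the example protocols are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two action sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]

def sequence_similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

# Hypothetical protocols: a simulated vs. an observed action sequence.
simulated = ["fixate-menu", "move-mouse", "click", "fixate-field", "type"]
observed = ["fixate-menu", "move-mouse", "fixate-field", "click", "type"]
print(sequence_similarity(simulated, observed))  # 0.6
```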
Fig. 3. User interfaces of the Simulation Trace Analyzer for the integration of simulation and empirical data (A), the adjustment of the integrated data (B) and the comparison of single and group datasets from different origins (C)
The Simulation Trace Analyzer (SimTrA; [8]) is a tool and method to simplify the analysis of human performance models. The tool automatically processes and analyzes data from human performance models and allows the comparison of simulated data with empirical data with the support of a graphical user interface (see Fig. 3). The tool enables the modeler to carry out evaluations of the interaction of human performance models (or, respectively, of humans) with an environment, considering the inherent complexity of the data (e.g. eye-movement patterns and statistical dependencies). For this, the tool provides a preset and modularized repertoire of methods that is adaptable and expandable by the analyst. To support descriptive and exploratory data analyses, SimTrA provides tables and plots of the processed data. Additionally, the data is stored in a general-purpose format and can be processed by external tools (e.g. MatLab, R, SPSS).
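A minimal sketch of the kind of quantitative model-data comparison that such tools automate is shown below; the fit measures (RMSE and R²) are standard choices rather than SimTrA's specific repertoire, and the data values are hypothetical.

```python
import numpy as np

def fit_statistics(empirical, simulated):
    """Goodness-of-fit measures for comparing model output with human data."""
    e = np.asarray(empirical, dtype=float)
    s = np.asarray(simulated, dtype=float)
    rmse = float(np.sqrt(np.mean((e - s) ** 2)))
    r = float(np.corrcoef(e, s)[0, 1])   # Pearson correlation
    return {"RMSE": rmse, "R^2": r ** 2}

# Hypothetical mean execution times (s) per task condition.
human_rt = [1.42, 1.88, 2.31, 2.95]
model_rt = [1.50, 1.79, 2.40, 3.10]
print(fit_statistics(human_rt, model_rt))
```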
6 Conclusion

In this article we showed that the analysis and validation of human performance models is an important aspect of applying human performance models to the design and evaluation of safety-critical systems. We introduced a taxonomy of multilevel analysis of human performance models and assigned selected methods and tools to each level. These methods and tools assist the modeler and expand the repertoire of today's methods of cognitive modeling. They allow an automatic and computerized analysis and validation of human performance models. Regarding the aspect of team-interaction, i.e. the interaction of multiple human
performance models, new methods have to be developed and tested. This is an important issue, especially with respect to safety-critical systems, and needs to be explored in the future. We showed that improving human performance models and cognitive architectures by using multilevel analysis is a feasible way forward. For this purpose, each level of analysis has to be performed during the model validation phase of the model construction process (see Fig. 1). The derived statements can be used to change the implementation and the underlying theory (i.e. the cognitive architecture) with respect to empirical data or psychological theories. Because of the high complexity of human performance data analysis, a grade of adequacy has to be chosen for each validation that is sufficient to support the analyst and that is applicable in the field of application of the human performance model. Regarding the analysis and validation of human performance models, several aspects have to be kept in view to use the derived statements from a scientific point of view. Most human performance models are based on verbally communicated cognitive and psychological theories. This can lead to a difference between the cognitive processes and structures implemented in the human performance model and the original theory (theory implementation gap; [18]). One reason for this is that in some cases the modeler has to reduce the complexity of a theory to model the cognitive processes and structures involved in a task. Second, it is possible that the modeler has to add assumptions to a human performance model that are independent of the underlying theory. In some cases this is necessary to support the internal control flow of the underlying programming language or to allow the integration of a single human performance model into a higher-order context (e.g., team-interaction). This can lead to an augmentation of a human performance model by aspects not stated in the original theory and not distinguishable from the essential theory (irrelevant specification gap; [19]). Both aspects need to be known by the modeler and have to be considered in the process of validating human performance models. The methods and tools presented in this article cover only a small number of aspects of the behavior of human performance models. On the one hand, this is due to the high effort necessary to analyze the huge amount of quantitative simulation data. On the other hand, no established knowledge of the cognitive processes and structures is available for analyzing the simulation data. Both lead to a complex interpretation of the data and hinder the development of methods that support the standardized analysis of human performance models as a whole. In this article, a short overview of methods for multilevel analyses of human performance models was given. They provide an insight into the different control levels of human performance models and establish a basis for their accurate analysis and validation. But there is still a need to develop additional methods and tools to cover a wider range of the behavior of human performance models. This statement is supported by our belief that in future application scenarios of human performance models the focus will shift more and more towards analysis and validation, because an adequate and validated human performance model is the basic condition for the application of human performance models to the design and evaluation of human-machine interfaces.
In particular, this is an important issue in the context of safety-critical systems, because their application allows the formal integration of human factors into the design and evaluation process and leads to technical systems that are attuned to humans.
References

1. Tack, W.H.: Wege zu einer differentiellen kognitiven Psychologie. In: Bericht über den 39. Kongreß der Deutschen Gesellschaft für Psychologie in Hamburg, pp. 172–185. Hogrefe, Göttingen (1995)
2. Kieras, D.E.: The why, when, and how of cognitive simulation: A tutorial. Behavior Research Methods, Instruments, and Computers 17(2), 27–285 (1985)
3. Pew, R.W., Mavor, A.S.: Modeling Human and Organizational Behavior: Application to Military Simulations. National Academy Press, Washington, D.C. (1998)
4. Howes, A., Young, R.M.: The role of cognitive architecture in modeling the user: Soar's learning mechanism. Human-Computer Interaction 12(4), 311–343 (1997)
5. Cooper, R.P.: Modeling High-Level Cognitive Processes. Lawrence Erlbaum Associates, Mahwah (2002)
6. Opwis, K.: Kognitive Modellierung - Zur Verwendung wissensbasierter Systeme in der psychologischen Theoriebildung. Verlag Hans Huber, Bern (1992)
7. Wallach, D.: Komplexe Regelungsprozesse - Eine kognitionswissenschaftliche Analyse. Deutscher Universitäts Verlag, Wiesbaden (1998)
8. Dzaack, J.: Analyse kognitiver Benutzermodelle für die Evaluation von Mensch-Maschine-Systemen. Technische Universität Berlin, Berlin (2008)
9. Strohner, H.: Kognitive Systeme - Eine Einführung in die Kognitionswissenschaft. Westdeutscher Verlag, Opladen (1995)
10. Bub, W., Lugner, P.: Systematik in der Modellbildung - Teil 2: Verifikation und Validierung. In: VDI/VDE-Gesellschaft Mess- und Automatisierungstechnik, Modellbildung für Regelung und Simulation: Methoden, Werkzeuge, Fallstudien, pp. 19–43. VDI Verlag, Düsseldorf (1992)
11. Lüer, G., Spada, H.: Denken und Problemlösen. In: Spada, H. (ed.) Lehrbuch allgemeine Psychologie, pp. 189–280. Verlag Hans Huber, Bern (1990)
12. Moore, E.F.: Gedanken-experiments on sequential machines. Automata Studies, Annals of Mathematics Studies 34, 129–153 (1956)
13. Schaub, H.: Modellierung der Handlungsorganisation. Verlag Hans Huber, Bern (1993)
14. Gray, W.D.: Composition and control of integrated cognitive systems. In: Gray, W.D. (ed.) Integrated Models of Cognitive Systems, pp. 3–12. Oxford University Press, New York (2007)
15. VanLehn, K., Brown, J.S., Greeno, J.: Competitive Argumentation in Computational Theories of Cognition. In: Kintsch, W., Miller, J.R., Polson, P.G. (eds.) Method and Tactics in Cognitive Science, pp. 235–262. Lawrence Erlbaum Associates, Hillsdale (1984)
16. Baker, R.S., Corbett, A.T., Koedinger, K.R.: Statistical techniques for comparing ACT-R models of cognitive performance. In: Proceedings of the 10th Annual ACT-R Workshop, pp. 129–134. Carnegie Mellon University, Pittsburgh (2003)
17. Schoelles, M.J., Myers, C.W.: ProtoMatch: A tool for analyzing high-density, sequential eye gaze and cursor protocols. Behavior Research Methods 37(2), 256–270 (2005)
18. Cooper, R.P., Shallice, T.: Soar and the case for unified theories of cognition. Cognition 55, 115–149 (1995)
19. Newell, A.: Unified Theories of Cognition. Harvard University Press, Cambridge (1990)
Development of a Driver Model in Powered Wheelchair Operation

Takuma Ito1, Takenobu Inoue2, Motoki Shino1, and Minoru Kamata1

1 The University of Tokyo, Japan
[email protected]
2 National Rehabilitation Center for Persons with Disabilities, Japan
Abstract. This paper describes the development of a driver model for powered wheelchair operation. Existing adjustment methods have known problems, such as straining the user, because improving wheelchair adjustment requires too many trials and errors. We therefore propose solutions using computer simulation. Computer simulation for the improvement of wheelchair adjustment needs three models: surroundings, driver, and vehicle. Surroundings and vehicle models can be built based on existing research, but no driver model exists for such computer simulation. To construct the model, we extracted the operation characteristics using a powered wheelchair simulator. From these results, we constructed the driver model as a first-order preview driver model. In addition, a computer simulation for adjusting a powered wheelchair was proposed.

Keywords: Driver model, Powered wheelchair, Simulator, Operation characteristics.
1 Introduction

The number of persons with severe disabilities has been increasing in recent years. It is necessary to offer them mobility devices for social participation and the improvement of their quality of life, and a powered wheelchair is one effective device. As a first step, some discomfort is acceptable if the user can move at will; after that first step, however, comfort in moving is desired. There are three elements to be evaluated when improving a powered wheelchair: the surroundings, the driver, and the vehicle. Since these elements influence each other, the user is burdened by the many trials and errors needed to improve his or her powered wheelchair. Thus, we propose a computer simulation solution for this problem. A powered wheelchair driving simulation that contains the three elements makes it possible to adjust the powered wheelchair without numerous trials and errors.
2 Research Question

Powered wheelchair driving simulations need three models: a surroundings model, a driver model, and a vehicle model. Existing research allows us to make surroundings
models and vehicle models, but not a driver model. The purpose of this research is therefore the development of a driver model for powered wheelchair operation. Driver models were originally proposed in the research field of automobiles [1]. They are mathematical expressions of vehicle operations, expressed by transfer functions, logical forms such as if-then rules, and so on. Since powered wheelchairs and automobiles have many common characteristics, the concept of driver models for automobiles could be adapted to powered wheelchair operation. However, a driver model for powered wheelchairs would differ from driver models for automobiles, since the two vehicle types also have different characteristics.
3 Experiment for Verifying the Possibility of Modeling

3.1 Hypothesis of This Experiment

To model powered wheelchair operation, we thought that the following characteristics had to be verified:

(a) the reproducibility of operation behaviors under the same driving condition
(b) the variance of operation behaviors under different driving conditions
For these verifications, operation behaviors had to be measured quantitatively. However, since this measurement was difficult in practice, the vehicle trajectory was measured instead of the operations.

3.2 Experiment Condition

The subject was a 42-year-old male with C5-level paresis (hereafter, Subject A). He had been using a powered wheelchair for 18 years, since a spinal cord injury. He could use his left hand to operate the joystick of his powered wheelchair, and he had no disability in the perception system. An experimental course imitating a corridor was made for this experiment. In this course, the subject was asked to conduct right and left turns in his usual way. As the experimental condition, the width of the course was varied in 5 steps: 0.9 m, 0.95 m, 1.0 m, 1.1 m, and 1.2 m. In each condition, 3 trials were conducted. The measured data were the following:

• Vehicle position, measured by motion capture (using a VICON).
• Movies of the operation, recorded by 4 small cameras.

Fig. 1 shows the appearance of the experiment.

3.3 Experiment Result

Fig. 2 shows the right-turn trajectories of the trials at a course width of 1.2 m. From this figure, we confirmed the similarity of the trajectories under the same condition; this tendency was also confirmed in the other conditions. Thus, these results verify characteristic (a) mentioned in Section 3.1. Fig. 3 shows the average velocities in the right turns of each condition, and Fig. 4 shows the average curvature radii in the right turns of each condition. From these figures,
Fig. 1. Appearance of the experiment

Fig. 2. Trajectory at course width 1.2 m (the lines are the trajectories of the center position between the rear wheels)

Fig. 3. Average velocity for each course width (error bars: SD)

Fig. 4. Average curvature radius for each course width (error bars: SD)
the response of the driver to the width of the experiment course was confirmed. Thus, these results verify characteristic (b) mentioned in Section 3.1. By analyzing the operation movies, changes of gaze points with facial motion were confirmed according to the phase of the right turn. This result also seems to indicate the driver's response to the driving environment. The above-mentioned features were also confirmed in left turns; therefore, we considered left and right turns to be equivalent. Henceforth, we used right turns as the target operation.
3.4 Summary of This Chapter

From the experiment results, the two operation characteristics mentioned in Section 3.1 were confirmed. These characteristics verified the possibility of modeling powered wheelchair operation and indicated a model structure that needs external information inputs.
4 Experiment for the Extraction of Driver Operation Characteristics

4.1 Before This Experiment

To extract the driver operation characteristics in detail, we conducted a simulator experiment. Since a large number of experiments would have been a heavy load for the disabled subject, a subject with no disabilities (hereafter, Subject B) participated in this experiment. His validity as a subject was confirmed by a preliminary examination.

4.2 Equipment of This Experiment

Fig. 5 shows the appearance of the simulator. The simulator has 4 screens: the two front screens are projected from the rear side, and the other screens are projected from the front side by super-short-focus projectors. The horizontal viewing angle is 110 degrees and the vertical viewing angle is 55 degrees. At the bottom of the simulator there is a 6-axis motion base, which makes operators feel the acceleration of the vehicle. The input device is a joystick. For the calculation of vehicle motion, a 3-dimensional dynamics model of a powered wheelchair [2] was installed in the simulator system.
Fig. 5. Powered wheelchair simulator
Fig. 6. Image of experimental course
4.3 Experiment Condition

Fig. 6 shows the appearance of the experiment course. The subject was given the following three instructions:
• Drive through the corner.
• Do not hit the walls.
• Do not stop in the corner.

Acceleration and deceleration were allowed freely. As the experimental conditions, the width of the experiment course and the maximum vehicle velocity were varied. The width was varied in 6 steps: 0.9 m, 1.0 m, 1.2 m, 1.8 m, 2.0 m, and 2.4 m. The maximum velocity was varied in 9 steps, from 1.2 km/h to 6.0 km/h in increments of 0.6 km/h. About 40 conditions were tested by combining the two factors. To examine reproducibility, more than ten trials were conducted for each condition.

4.4 Experiment Result

As a characteristic of the operation, we analyzed the direction of the joystick. Fig. 7 shows one typical example of the operation; 0 sec means that the vehicle is 3 m before the corner, θjs means the direction of the joystick, and 0 deg means the forward direction. From the analysis of the operation, the turning operation is divided into 3 phases. Phase 1 is the period before turning, in which the operation stays around 0 deg. Phase 2 is the period during the turn, in which the operation is almost constant and depends on the vehicle velocity. Phase 3 is the period after the turn, in which the operation changes according to the distance to the walls; the purpose of the operation in this phase is to stabilize the powered wheelchair. The operation changed according to changes in the experimental conditions. The correction operation in phase 3 decreased when the maximum velocity was low or when the course was wide. Conversely, when the maximum velocity was high and the course was narrow, the correction operation in phase 3 increased, and correction sometimes even occurred in phase 2.
Fig. 7. Operation of θjs when turning to the right (subject B)
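The three-phase segmentation described above can be made concrete with a small sketch; reading the phase boundaries as the corner entry and exit times is our interpretation of the description.

```python
def segment_phases(times, t_corner_in, t_corner_out):
    """Label each sample: phase 1 (before the corner), phase 2 (in the
    corner), or phase 3 (stabilizing corrections after the corner)."""
    return [1 if t < t_corner_in else (2 if t <= t_corner_out else 3)
            for t in times]

# Example: a 10 s trial sampled at 1 Hz, corner entered at 3 s, exited at 6 s.
print(segment_phases(range(10), 3, 6))  # [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
```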
The reproducibility under the same condition was confirmed: although the operation and trajectory both changed during the first two or three trials after the experimental conditions were changed, they were generally the same in each trial thereafter. However, in the extreme conditions, when the maximum velocity was high and the course was narrow, reproducibility was not confirmed.
5 Confirmation Experiment by a User with Disabilities

5.1 About This Experiment

To confirm the generality of the operation characteristics found in the previous experiment, an experiment with a disabled subject was conducted. The subject was Subject A, mentioned in Chapter 3. Because Subject A had disabilities and needed to be reclined and steadily fixed to the chair, the experiment was conducted with the powered wheelchair simulator of the National Rehabilitation Center for Persons with Disabilities. The screens and seat were slightly different from those of the previous simulator, but the input device and the calculation system were the same. In this experiment, the number of experimental conditions was reduced in consideration of the load on the subject. The width of the course was fixed at 1.2 m. Two kinds of velocity settings were prepared. One was the maximum velocity setting, in which the subject could adjust the velocity freely; the maximum velocities were 2.4 km/h and 3.6 km/h. The other was the constant velocity setting, in which the subject could not adjust the velocity; the constant velocities ranged from 1.2 km/h to 3.0 km/h in increments of 0.6 km/h. In each condition, 10 trials were conducted after a few practice trials.
Fig. 8. Operation of θjs when turning to the right (subject A)
• The 3 phases in operation segmentation • The operation characteristics in each phase The reproducibility of operation behaviors in the same driving condition and the response of operation behaviors to the change in the driving conditions were also confirmed. These results were common characteristics to the subject B. Thus, it seems that a driver model in a powered wheelchair operation could be developed from these common characteristics.
390
T. Ito et al.
6 Construction of a Driver Model 6.1 Mathematical Expression We constructed a driver model in a powered wheelchair operation. The expression of the driver model is shown in the following.
1 h⋅ Δ 1 + τd s θm = k ⋅ u + c
θ js =
τd s h ≡ (h i
(θ
js
≤ θm
)
(1) (2)
: Reaction time
− ho )
: Laplace operator : Operation gain vector (2 elements)
⎛ (1 + τ p s )di ⎞ ⎟ Δ ≡ ⎜⎜ ⎟ ⎝ (1 + τ p s )d o ⎠ τp
: Preview distance vector to the nearer wall (2 elements)
θm
: Operation limit
k, c u
: Operation limit parameters : Velocity
: Preview time
This model is based on the first order preview driver model. The operation is determined depending on the preview distance vector to the nearer walls. Fig.9 shows the definition of the distance vector. τp means the preview time of the driver operation. When a driver operates the vehicle, the driver uses the information not in the present position but in the future position. How far in the future the information that the driver uses is almost constant, so it is expressed as a constant time τp. h means the operation gain. Since drivers tend to margin inside in the corner, ho is usually larger than hi. τd means the reaction time of the driver operation and it is the neuromuscular delay of the driver. θm means the operation limit. Too large yawing motion is uncomfortable for the driver. Even if the calculated θjs is the appropriate value, drivers don’t operate lager than θm. The model parameters are calculated from the result of the previous experiments. do: distance to the nearer outer wall Right turn in the corner
do1
do2 di2 di1
if (1+τps)do1 < (1+τps)do2 di=di1 do=do1 else di=di2 do=do2
di: distance to the nearer inner wall
Fig. 9. Definition of di and do
Development of a Driver Model in Powered Wheelchair Operation
391
6.2 Verification of the Driver Model To verify the driver model, a closed-loop computer simulation with the driver model was tested. Fig.10 shows the computer simulated operation in comparison with experiment result. Fig.11 shows the computer simulated trajectories in comparison with the experiment result. From fig. 10, some similarities could be confirmed: operation amount and operation timing. From fig. 11, even though a little difference is confirmed in the corner, the two trajectories seem to be similar in general. From these results, we confirmed the validity of this driver model.
7 Driver Model Application 7.1 Proposal of an Evaluation Simulation with a Driver Model As we pointed out in the chapter 1, the adjustments of powered wheelchairs need a lot of trials and errors. So we propose the evaluation computer simulation with a driver model. Optimization for the subject who is the prototype of the driver model could be possible by building the driver model into a closed-loop computer simulation. 50
Fig. 10. Simulation result (operation amount; experiment result vs. simulation result)

Fig. 11. Simulation result (trajectory; experiment result vs. simulation result)
To confirm the effectiveness of this idea, we ran the computer simulation to adjust the maximum speed parameter of a powered wheelchair. The evaluation index was the integral value of the operation amount, and the maximum speed parameters were ranked by this evaluation. In parallel, a subjective evaluation of the maximum speed parameter was carried out with the powered wheelchair simulator. Both the prototype of the driver model and the subject were Subject B (mentioned in Chapter 4).

7.2 Evaluation Result

Table 1 shows the results of the subjective evaluation and the computer simulation evaluation. A similar tendency can be confirmed between the results, even though a small difference is seen for the 3.6 km/h parameter. This indicates that the computer simulation approach could substitute for experimental adjustment. Thus, we consider the effectiveness of this simulation to be confirmed by this result.
In this case, the evaluation index is the integral value of the computer-simulated operation. This index means that less operation is better. However, other indexes could also be effective: for example, the integral value of the distance to the wall could evaluate the risk of driving, and the maximum yawing motion and acceleration could evaluate the comfort of the driver. There is room for discussion about the selection of the evaluation index.
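A minimal sketch of this evaluation index, assuming a simulated joystick trace theta_js sampled at interval dt (the units, deg·sec, match Table 1):

```python
import numpy as np

def operation_index(theta_js, dt):
    """Integral of the absolute operation amount [deg*sec]; smaller is better."""
    return float(np.trapz(np.abs(theta_js), dx=dt))
```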
8 Discussion

In this research, we constructed a driver model from a limited set of experiment results. The number of subjects was only two, so verification of the model was confirmed only for these subjects. To verify the driver model more thoroughly, more subjects and their experimental data are needed; this is a future task.

Table 1. Results of the subjective evaluation and the simulation evaluation

    Max velocity        Subjective          Integral value of simulated
    parameter (km/h)    evaluation order    operation amount (deg·sec)
    2.4                 1st                 71.18
    3.0                 2nd                 72.81
    1.8                 3rd                 91.93
    1.2                 4th                 181.71
    3.6                 5th                 73.54

(Note: the smaller the integral value, the better the evaluation.)

There are other future tasks. As the driver operation, the turning operation in corners was the focus of the driver model development. However, not only turning but also acceleration is important in powered wheelchair operation, so the proposed model lacks, and needs, acceleration operations in the modeling. As for environmental conditions, the proposed driver model was constructed and tested on a course with a width of 1.2 m. How drivers react to other environmental conditions is important, so the proposed model also lacks generalization to environmental elements. Environmental elements might be handled as parameters of the driver model; however, since we cannot state this clearly, it is also a future task.
9 Conclusions

This paper developed a driver model of powered wheelchair operation. The following findings were obtained in the process of this research:

1. A powered wheelchair simulator was developed for the analysis of operation.
2. The operation characteristics of a powered wheelchair were examined.
3. A driver model of powered wheelchair operation was constructed as a first-order preview driver model.
4. A computer simulation for adjusting the maximum speed parameter of a powered wheelchair was proposed.
References

[1] Plöchl, M., et al.: Driver models in automobile dynamics application. Vehicle System Dynamics 45(7-8), 699–741 (2007)
[2] Yamakawa, Y., et al.: Development of Electric Wheelchair with Operational Force Detecting Device for Persons with Severe Disability (in Japanese). In: Proceedings of the Welfare Engineering Symposium, vol. 2006, pp. 39–42 (2006)
A Model of Integrated Operator-System Separation Assurance and Collision Avoidance

Steven J. Landry and Amit V. Lagu

School of Industrial Engineering, Purdue University, 315 N. Grant St., West Lafayette, IN 47906, USA
[email protected], [email protected]
Abstract. A model of separation assurance and collision avoidance in air traffic has been developed. The objective of the model is to provide qualitative and quantitative predictions of system behavior with respect to separation assurance and collision avoidance. No such model previously existed, complicating efforts to understand the impact of adding automation to the current system. The model integrates two concepts. First, it models at the scope of the human-integrated system instead of at the level of the operator. This follows from the work of Duane McRuer, who found that the human as a control system was modelable only at the system level. Second, it treats the separation assurance and collision avoidance problem as a control problem, in which agent (automated and human) actions work to keep the system from entering undesirable states. This broadly follows the methodology of system safety, under which safety is determined by the ability of the agents in the system to impart control to prevent the system from reaching an unsafe state. The model defines the system states, the events and conditions that cause transitions between states, and the control that agents in the system can impart on those transitions.

Keywords: human performance modeling, aviation, safety, air traffic control.
1 Introduction

As identified by Sheridan, one of the great insights of McRuer [9] was that the pilot was so adaptable to different systems that modeling the pilot, as abstract from the system being controlled, was very difficult, but modeling the system, with the pilot embedded within it, was relatively easy [4]. While the McRuer crossover model has had limited application to complex systems, the principle of modeling a system with an embedded human, rather than trying to model the human as abstract from the system, still seems highly relevant. The safety of future concepts for air traffic control has been difficult to establish, although the current system is known from decades of experience to be remarkably safe. This difficulty in understanding safety is, in part, due to the inapplicability of reliability-based models to a system that is not comprised of subsystems whose failures are independent. In addition, attempts to model the human as abstract from the
system face significant difficulties when applied to the very broad tasks assigned to controllers. A model of the separation assurance and collision avoidance function within the air traffic control system has been developed. The model is of the entire system, with embedded human and automated agents, and is capable of producing qualitative and quantitative assessments of safety. It is particularly useful for quickly examining the impact of changes to the system on safety.
2 Description of the Models

A simple state-based model of the separation assurance and collision avoidance problem was constructed using statechart notation [5]. This model is of the entire human-machine system, rather than of either the system (abstract from the human) or just the human (abstract from the system). The model for a simple system, without separation criteria, air traffic controllers, or automation, is shown in Figure 1. A more complex model, of the current system, is shown in Figure 2. The models were built to be complete models of the associated systems, where all states were mutually exclusive and exhaustive, except where orthogonal states were identified. Similarly, the conditions were mutually exclusive and exhaustive. The events causing transitions are alleged to be the only events that can cause transitions. The models were initially analyzed for what they relayed qualitatively about the separation assurance and collision avoidance problem in the National Airspace System (NAS) of the United States. Subsequently, a rough probabilistic assessment was applied to the system to further validate the model's accuracy. Additional information about how the models can be adapted for computational purposes is also provided.

2.1 Model 1 – Simple Collision Avoidance

Model 1 is shown in Figure 1. This model is of a simplified two-aircraft collision avoidance task, similar to visual flight rules (VFR) flight, where controllers are not monitoring the flights and where no separation standards are enforced. In such a system, collision avoidance is the only concern, and it is established by the actions of the two pilots. In model 1, the system is in one of two states. State 1 is that the aircraft are separated (have not collided). State 2 is that the aircraft have collided. These two states can be seen to be mutually exclusive and exhaustive. The system is in state 1 if condition A is true, and is in state 2 if condition A is not true. Within state 1 are two substates. The system is in state 1a if a collision will not occur with the current 4D trajectories, and is in state 1b if a collision will occur with the current 4D trajectories (3 spatial dimensions plus time). Again, these states are mutually exclusive and exhaustive under state 1. The system is in state 1a if, in addition to condition A, condition B is true. The system is in state 1b if condition B is false and condition A is true.
Fig. 1. Model 1 - collision avoidance. Condition A: separation(t_current) > 0; condition B: min(separation(t)) > 0. State 1 (aircraft are separated; A) contains substates 1a (collision will not occur with current 4D trajectories; B) and 1b (collision will occur with current 4D trajectories; ¬B); state 2 is that the aircraft have collided (¬A). Events: q, a 4D trajectory change occurs; r, t = t_collision.
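To make the statechart concrete, here is a minimal executable sketch of Model 1. The states, conditions, and events follow Fig. 1; the pilot-agent logic is our illustrative reading of the text.

```python
from enum import Enum

class State(Enum):
    NO_COLLISION_DUE = "1a"   # collision will not occur on current 4D trajectories
    COLLISION_DUE = "1b"      # collision will occur on current 4D trajectories
    COLLIDED = "2"            # aircraft have collided

class Model1:
    """State-machine sketch of the simple collision avoidance model."""

    def __init__(self):
        self.state = State.NO_COLLISION_DUE

    def on_trajectory_change(self, collision_on_new_trajectories):
        # Event q: a 4D trajectory change toggles between states 1a and 1b.
        if self.state is not State.COLLIDED:
            self.state = (State.COLLISION_DUE if collision_on_new_trajectories
                          else State.NO_COLLISION_DUE)

    def on_time_step(self, t, t_collision):
        # Event r: in state 1b, reaching t_collision transitions to state 2.
        if self.state is State.COLLISION_DUE and t >= t_collision:
            self.state = State.COLLIDED

def pilot_scan_and_avoid(model, detected):
    """Agent sketch: a pilot who detects state 1b applies a 4D trajectory
    change intended to restore state 1a (detection itself is uncertain)."""
    if detected and model.state is State.COLLISION_DUE:
        model.on_trajectory_change(collision_on_new_trajectories=False)
```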
Transitions between states 1a and 1b can occur only because of a 4D trajectory change (by definition). Transitions between states 1 and 2 can occur only if the system is in state 1b and the current time (t) becomes equal to the time at which a collision will occur (t_collision). This time can also be defined as the time at which the separation of the two vehicles becomes approximately zero. If one considers agents within the system (in this case the pilots of the two vehicles), their goal is to prevent the system from reaching state 2. This can be accomplished by detecting that the system is in state 1b and, if so, executing a 4D trajectory change that will result in the system transitioning to state 1a. These aspects of the model map well to the actions of pilots with respect to collision avoidance under VFR. Pilots scan for other aircraft and, if they detect a potential collision, change the 4D trajectory of the aircraft such that a collision will not occur. The difficulty in this task comes from the uncertainty in detecting the system state and in executing a 4D trajectory change that will result in the desired transition. For example, the two aircraft may not be geometrically arranged such that visual contact is possible (e.g. one aircraft above and slightly behind the other); in that case, no detection is possible. Likewise, the execution of a 4D trajectory change is subject to pilot and vehicle delays, and to uncertainty in the resulting system state even if the 4D trajectory change is executed properly.

2.2 Model 2 – Separation Assurance and Collision Avoidance

Model 2 adds the problem of separation assurance. In this model, agents must consider the ability of the aircraft to stay safely separated in addition to avoiding a collision. However, separation is a procedural problem and, while a loss of separation is undesirable, it is not strictly catastrophic as a collision would be. In model 2, states 1 and 2 remain unchanged. Within state 1, however, are two orthogonal states. State 1a refers to the current state of the aircraft, and state 1a' refers to the future state of the system. Within state 1a, the system is in state 1a1 if no loss of separation (LOS) has occurred, and is in state 1a2 if a loss of separation has occurred. These states are mutually exclusive and exhaustive with respect to current separation, and are identified by the value of condition C as indicated.
Fig. 2. Model 2 - separation assurance and collision avoidance. Conditions: A: separation(t_current) > 0; B: min(separation(t)) > 0; C: separation(t_current) > sep_min; D: min(separation(t)) > sep_min. State 1 (aircraft have not collided) contains the orthogonal substates 1a (current states: 1a1, no LOS; 1a2, LOS) and 1a' (future states: 1a'1, LOS will not occur; 1a'2, LOS will occur, with substates 1a'2a, collision will not occur, and 1a'2b, collision will occur); state 2 is that the aircraft have collided (¬A). Events: q, a 4D trajectory change; r, start of a LOS (t = t_LOS); s, end of a LOS; t, t = t_min.
Within state 1a', the system is in state 1a'1 if a LOS will not occur in the future, and in state 1a'2 if a LOS will occur. The system is in these states if condition D is true or false, respectively. Within state 1a'2, the system is in state 1a'2a if a collision will not occur, and is in state 1a'2b if a collision will occur; these states map to condition B, as in model 1. Transitions between states 1a1 and 1a2 can occur due to the start of a LOS event (event r), and the reverse transition occurs due to the end of a LOS event (event s). Transitions between states 1a'1 and 1a'2 occur due to 4D trajectory changes, as do transitions between states 1a'2a and 1a'2b. Transition to state 2 occurs only when the system is simultaneously in states 1a2 and 1a'2b and event t occurs. In this system, several different agents, specifically pilots, air traffic controllers, and automation systems, work together to keep the system out of state 2. Pilots are not procedurally tasked with separation, so their primary function is collision avoidance. Controllers are primarily tasked with separation assurance. The Traffic Collision Avoidance System (TCAS) on board most commercial aircraft is tasked with collision avoidance. There are also automated systems at the air traffic controllers' stations for detecting impending LOS (conflict alert, or CA) and actual LOS (the operational error detection program, or OEDP). These agents each act on different parts of the model. Pilots and TCAS monitor for state 1a'2b and, if it is detected, apply a 4D trajectory change to move the system to state 1a'2a. Controllers and CA monitor for state 1a'2 and, if it is detected, controllers apply a 4D trajectory change to move the system to state 1a'1 by passing the 4D trajectory change to the pilot. The OEDP simply detects that the system is in state 1a'2. There is one additional consideration regarding the model. One might also consider monitoring for the potential of a particular 4D trajectory change to result in a
sequence of rapid transitions from safe states to state 2. For example, the system may be in states 1a1 and 1a'1 when a particular 4D trajectory change occurs. (Such changes can occur for reasons other than to control system state; one example is when trajectories are changed to avoid weather or to increase efficiency.) This particular change could result in an immediate transition to states 1a2 and 1a'2, with t_min − t being very small. In such a case, a transition to state 2 could happen in a period of time that is shorter than the reaction time of the system. For example, suppose two aircraft are flying level directly at one another in opposite directions, separated by the minimum vertical separation of 1,000 feet. A sudden climb by the lower aircraft at an inopportune moment could result in a collision in a matter of seconds. There may be insufficient time for any agent to intervene to prevent the collision. Such a situation is also undesirable.

Monitoring for this potential reflects observed behavior of controllers, in that controllers often mitigate conflicts even when the aircraft do not appear to be in danger of losing separation. If the minimum separation is such that a plausible mistake, or even uncertainty, could result in a loss of separation in less time than the controller could respond, the controller would likely intervene by mitigating the potential for LOS. This might be accomplished by applying a 4D trajectory change to increase the minimum t_min − t, or by confirming the intentions of the pilot(s) to follow their assigned clearance. The results of a qualitative analysis of the model are supplied next, followed by application of the best-known probabilities of human performance in this task in an attempt to validate the model.
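To make the conditions concrete, the following Python sketch (an illustration under assumed inputs; the separation function, look-ahead horizon, and sampling step are ours, not part of the model) classifies an aircraft pair against conditions A–D from a predicted separation profile:

from typing import Callable

def classify_pair(separation: Callable[[float], float], t_current: float,
                  horizon: float, sep_min: float, dt: float = 1.0) -> dict:
    # Predicted separations over the look-ahead horizon (current 4D trajectories)
    future = [separation(t_current + k * dt) for k in range(int(horizon / dt) + 1)]
    min_sep = min(future)
    return {
        "A": separation(t_current) > 0,        # aircraft have not collided
        "B": min_sep > 0,                      # no collision will occur
        "C": separation(t_current) > sep_min,  # no current LOS
        "D": min_sep > sep_min,                # no LOS will occur
    }

# Example: separation shrinking by 15 ft/s from 5,000 ft; sep_min = 1,000 ft
print(classify_pair(lambda t: 5000.0 - 15.0 * t, t_current=0.0,
                    horizon=600.0, sep_min=1000.0))
# {'A': True, 'B': False, 'C': True, 'D': False} -> states 1a1 and 1a'2b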
3 Results

3.1 Qualitative Results

From the model, the critical capabilities for agents, including humans, are the ability to detect system state, to determine the minimum t_min − t for any plausible 4D trajectory changes, and to identify a 4D trajectory that can control the value of conditions B (for collision avoidance) and D (for separation assurance). Changes to the system, or conditions, that diminish these capabilities can be said to negatively impact the safety of the system.

Consider the introduction of TCAS. The ability of pilots to detect and prevent a collision at high speed is rather low, and is very low in instrument conditions. TCAS enhances this ability by detecting closure rates without relying on visual cues. TCAS therefore significantly increases the ability to detect state 1a'2b, and it controls the trajectories of one (or both) aircraft to move the system to state 1a'2a.

Next, consider proposals to replace the air traffic controller with an automated separation assurance system [3]. The ability of such systems to detect system state is not yet clearly defined. In practice, it is difficult to estimate the open-loop behavior of the system accurately, which is a prerequisite for accurately estimating the effectiveness of a conflict detection capability. As yet, there is no indication that the system is less capable at detecting conflicts than controllers.
Moreover, such systems hold promise because this ability is not affected by the number of aircraft, as a controller's would be. That is, controllers are currently limited to about 12 aircraft in a sector at a given time, depending on the complexity of the sector. Automation is limited only by the number of comparisons that can be made in the time available (a few seconds). This number is very large, and will grow as computing power increases.

However, such systems do not mitigate potential LOS as controllers do. The system merely detects a predicted LOS and attempts to resolve it. It is possible that aircraft are allowed to achieve a position from which a LOS, and possibly a collision, can occur in less time than is required to intervene. Therefore, there is no protection against a sudden LOS occurring due to detection uncertainty or a sudden 4D trajectory change. This finding corresponds well to the cases that elude detection and resolution by the current version of the automated system: unexpected changes result in instant or near-instant LOS, making detection irrelevant in such cases. An alternative is to develop a system for detecting pairs of aircraft that will achieve a position from which a LOS or collision can occur in a very short period of time, and to resolve those pairs as if a LOS were predicted.

In general, given some new set of procedures or capabilities, the model states that, if there is no impact on the key capabilities (detection and control of state), those new procedures or capabilities will have no impact on safety. Conversely, if the procedures or capabilities do have an impact on those key capabilities, then safety will be affected. In such cases, the specific impact, and possible mitigation strategies, should be investigated.

3.2 Quantitative Results

The model is held to be a complete model of the separation assurance and collision avoidance problem. This is supported by the nature of the underlying states, which are mutually exclusive and exhaustive. Furthermore, the events are held to be a complete list of the events necessary and capable of causing a transition. As such, it should be possible to formalize the model mathematically, although this would not necessarily provide information about the behavior of the operators. Moreover, such a formalization may not provide any useful insight when the behavior of the agents themselves is not formalizable. However, additional evidence was sought to see if the model would comport well with actual system data. In model 1, the probability that the system would end up in state 2 is given by the following equation:
P(2) = P(1b)[P(¬detect | 1b) + P(detect | 1b) P(¬1a | detect, 1b)]    (1)
Equation (1) states that the probability that the system ends up in state 2 (a collision) over some set of repeated trials is the probability that the system gets into state 1b (a collision will occur with the current 4D trajectories) multiplied by the sum of the probability that that state was not detected and the probability that it was detected but not resolved. Since these figures have been estimated, we can approximate the prediction of the model.
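As a quick numeric check, the following sketch evaluates equation (1) in Python, using the probability estimates quoted in the next paragraph (the variable names are ours):

# P(2) = P(1b) * [P(¬detect|1b) + P(detect|1b) * P(¬resolve|detect,1b)]
p_1b = 0.000066          # P(1b), from the LMI report [6] quoted below
p_not_detect = 0.074     # P(state 1b not detected), VFR flight [6]
p_not_resolve = 0.00001  # P(not resolved | detected) [6]

p_2 = p_1b * (p_not_detect + (1.0 - p_not_detect) * p_not_resolve)
print(f"P(2) = {p_2:.1e}")  # ~4.9e-06, i.e. the ~5 x 10^-6 reported below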
All probability estimates were taken from a report to NASA by LMI Consulting [6], except where indicated. P(1b) is the probability that the system is in state 1b (a collision will occur if no control is applied) and has been estimated to be 0.000066. The two terms in the brackets are the probability that the state was not detected and the probability that it was detected but not resolved, respectively. Those probabilities, for VFR flight, are estimated at 0.074 and 0.00001. If we make several conservative assumptions, this results in an estimate of P(2) ≈ 5 × 10⁻⁶. The assumptions made to get this result are that there are no procedural methods in use to reduce the probability of collision, and that only one pilot is in a position to detect and resolve the conflict.

While rates of collision per flight are not reported, a relatively recent figure regarding VFR collisions places the rate at 0.035 per 100,000 flying hours [10]. Considering that the large majority of VFR flights last between 1 and 10 hours, this places the rate per flight at approximately 1 × 10⁻⁶. The discrepancy between the model and the (rough) real-world estimate is thus a factor of 5.

The case for model 2 is more complex. In that system, there are hierarchical relationships that must be considered. Since state 1a2 is a prerequisite to being in state 1a'2b, it can be ignored. The equation then looks similar to equation (1) above:

P(2) = P(1a'2b)[P(¬detect | 1a'2b) + P(detect | 1a'2b) P(¬1a'2a | detect, 1a'2b)]    (2)
Unlike in the above analysis, however, we must consider the actions of multiple agents. For the purposes of this approximation, we make several conservative assumptions:

• the detection and resolution performance of each agent is independent;
• the influence of controller detection tools is insignificant;
• although agents are detecting and resolving different states, those actions can be approximated as acting on state 1a'2b; and
• the failures of one agent within a type are not independent of the failures of other agents of the same type (i.e. a second TCAS would likely fail in the same cases as a first TCAS), so only one agent of each type is modeled.

Given these assumptions, the following are used as approximations for equation (2):

P(¬detect | 1a'2b) = ∏_{i=1}^{n} P_i(¬detect | 1a'2b)    (3)

P(detect | 1a'2b) P(¬1a'2a | detect, 1a'2b) = ∏_{i=1}^{n} P_i(detect | 1a'2b) P_i(¬1a'2a | detect, 1a'2b)    (4)
The probability that the system is in state 1a'2b is estimated at 0.000066. Other probabilities are given in Table 1.

Table 1. Probabilities estimated for model 2 validation

Agent        P(¬detect)    P(¬resolve|detect)
Controller   0.0000027     0.0001
Pilot        0.0001        0.0001
TCAS         0.00001       0.00001
Based on these estimates, the probability of arriving in state 2 is approximately 1 × 10⁻⁸. For comparison, estimates from the literature include 9.8 × 10⁻⁸ for U.S. airspace in the 1980s [1] and a target level of safety from the International Civil Aviation Organization of 1.5 × 10⁻⁸ [2]. The model's figure compares well with these estimates.
4 Discussion

The qualitative results identify a key deficiency of proposed automation intended to take over air traffic controllers' responsibility for separation assurance. Specifically, air traffic controllers mitigate potential conflicts in addition to detecting and resolving predicted conflicts. Proposed automation does not do this and, because of this, cannot ensure that an unexpected event will not result in a LOS or collision. While algorithms for identifying aircraft pairs that should be mitigated are being investigated, it is possible that such a set is large, and that mitigating those possibilities could decrease capacity. In such a case, it may be important to further subdivide the mitigation set into those pairs with higher and lower probabilities of having the unexpected event occur. If a rule-based method for accomplishing this subdivision can be found, it can be incorporated into the automation. However, it is possible that such a subdivision is not reliably rule-based. In that case, the automation may identify the mitigation set to the controller, who would select those pairs that should be mitigated and those that can simply be monitored. The controller may choose additional measures, such as confirming clearances with the pilots of the mitigation aircraft, or identifying specific maneuvers that the pilots must avoid in order to be sure that a LOS or collision will not occur.

The quantitative results, while preliminary, show that the model at least grossly reflects actual system behavior. The quantitative analysis shown, however, does not reflect the dynamic nature of the system. For example, pilots, controllers, and automation take continuously or dynamically sampled information and predict future separation. This detection must take place in sufficient time to identify and execute a resolution maneuver. A simple static analysis is most likely inaccurate, since it does not reflect this fundamental aspect of the system.

There are a few ways in which this can be addressed. First, a dynamic model can be developed. This can be done algebraically, by integrating the probabilities of detection and resolution, or it can be done with Monte Carlo simulation. The challenge for these methods is to accurately capture the dynamic behavior as probabilities, which may not be possible. Second, the model can be transformed into formal models such as state/event fault trees [7] or stochastic Petri nets [8]. The primary challenge for these latter methods is whether such models can be created when the underlying behavior of agents embedded within the system is not deterministic and, moreover, is difficult to model accurately.
Acknowledgments

This work was supported by NASA Ames Research Center under cooperative agreement number NNA06CN25A. Russ Paielli is the technical monitor.
References

1. Barnett, A., Higgins, M.K.: Airline safety: The last decade. Management Science 35(1), 1–21 (1989)
2. Brooker, P.: The risk of mid-air collision to commercial air transport aircraft receiving a radar advisory service in class F/G airspace. The Journal of Navigation 56(2), 277–289 (2003)
3. Erzberger, H.: Transforming the NAS: The next generation air traffic control system. In: The 24th International Congress of the Aeronautical Sciences, Yokohama, Japan (2004)
4. Gerovitch, S.: Interview with Tom Sheridan [Electronic Version] (2003), http://web.mit.edu/slava/space/interview/interview-sheridan.htm (retrieved March 1, 2009)
5. Harel, D.: Statecharts: A visual formalism for complex systems. Science of Computer Programming 8(3), 231–274 (1987)
6. Hemm, B., Busick, A.: NAS separation assurance benchmark analysis. In: The NASA Research Announcement Review (2009)
7. Kaiser, B., Gramlich, C., Förster, M.: State/event fault trees – a safety analysis model for software-controlled systems. Reliability Engineering and System Safety 92(11), 1521–1537 (2007)
8. López-Grao, J.P., Merseguer, J., Campos, J.: From UML activity diagrams to stochastic Petri nets: application to software performance engineering. ACM SIGSOFT Software Engineering Notes 29(1), 25–36 (2004)
9. McRuer, D., Graham, D.: Human pilot dynamics in compensatory systems (AFFDL-TR-65-15). USAF (1965)
10. Taneja, N., Wiegmann, D.A.: Analysis of midair collisions in civil aviation. In: The 45th Annual Meeting of the Human Factors and Ergonomics Society, Santa Monica, CA (2001)
Modeling Pilot and Driver Behavior for Human Error Simulation

Andreas Lüdtke, Lars Weber, Jan-Patrick Osterloh, and Bertram Wortelen

OFFIS, Escherweg 2, 26121 Oldenburg, Germany
{luedtke,weber,osterloh,wortelen}@offis.de
Abstract. In order to reduce human errors in the interaction with safety-critical assistance systems, it is crucial to systematically include the characteristics of the human operator from the early phases of the design process onward. In this paper we present a cognitive architecture for simulating man-machine interaction in the aeronautics and automotive domains. Though both domains have their own characteristics, we think that it is possible to apply the same core architecture to support pilot- as well as driver-centered design of assistance systems. This text shows how phenomena relevant in the automotive or aviation environment can be integrated into the same cognitive architecture.

Keywords: Human Error Simulation, Cognitive Architecture, Pilots, Drivers.
1 Introduction

Today assistance systems are a common and widely accepted means of supporting human operators in performing safety-critical tasks like driving a car or flying an aircraft. The aim is to reduce the number of human errors in order to reach the ambitious goal of zero accidents. Considering the ever-increasing complexity of the traffic environment, be it air or surface traffic, human error will remain the most important challenge on the way to this goal. During design and certification of assistance systems it has to be proven that human errors are effectively prevented and that no new errors or unwanted long-term effects are induced. The current practice is based on engineering judgment, operational feedback from similar aircraft, and experiments with test users once a prototype is available. Methodological innovations are needed to sustain existing quality levels and to guarantee an affordable analysis despite the increasing complexity of the overall aeronautical system. It is necessary to develop a methodology that allows systems to be accurately analyzed from the operators' point of view already in early design stages, when design changes are still feasible and affordable. Our approach is based on modeling and simulation of driver and pilot behavior using a cognitive architecture. It has to be said that the term "human error" is very controversial and often used to blame accidents on humans "ex post facto". We share the view that human errors in the context of highly automated complex systems are often more a "symptom, not a cause", highlighting weaknesses of the systems that need to be improved.
Section 2 of this paper presents our approach and describes our core cognitive architecture which can be instantiated in order to derive pilot as well as driver models. Section 3 describes model instantiations for the analysis of pilot behavior and section 4 for driver behavior. Section 5 gives a summary and sketches next steps of our research.
2 Modeling and Simulation Approach to Human Error Analysis

Our approach to the analysis of human errors is based on the development and simulation of integrated closed-loop man-machine-environment models. In this approach, human models are used as virtual system testers in order to analyze a vast number of scenarios already in early development phases, to identify potentially hazardous scenarios, and to iteratively improve the design. Executable cognitive models are intended to describe mental processes of human beings, like assessing situations and choosing actions, resulting in time-stamped action traces. These cognitive models usually consist of two parts: a cognitive architecture, which integrates task-independent cognitive processes, and a formal model of task-specific know-how (e.g. flight procedures or traffic regulations). In order to simulate behavior, the task model has to be "uploaded" to the architecture. Thus, a cognitive architecture can be understood as a generic interpreter that executes task-specific knowledge in a psychologically plausible way. An overview of cognitive models is provided in [2]; the most prominent representatives are ACT-R and SOAR. We decided to build our own architecture because existing ones have complementary strengths and weaknesses, but none covers a comprehensive executable model of those human capabilities that are relevant for human behavior in complex dynamic environments. To build such a comprehensive architecture, we adapt, extend, and integrate heterogeneous modeling techniques (e.g. production systems, control-theoretic models, semantic networks) from different existing architectures.

A key concept underlying our architecture is the theory of behavior levels [1], which distinguishes tasks with regard to their demands on attentional control, depending on prior experience: autonomous behavior (acting without thinking in daily operations), associative behavior (selecting stored plans in familiar situations), and cognitive behavior (coming up with new plans in unfamiliar situations). Fig. 1 shows the structure of our cognitive architecture. It encompasses one layer for the autonomous behavior level and one for the associative level. A third layer is formed by the percept and motor components that implement the interface to a simulated environment. On the layer for autonomous behavior we model manual control behavior for tasks like steering and braking, using modeling techniques like control-theoretic formulas. These models have been described, e.g., in [6]. This paper focuses on the associative layer, which is the basis for the main phenomena that have been modeled. Knowledge is stored inside the memory component in the form of Goal-State-Means (GSM) rules (Fig. 2). All rules consist of a left-hand side (IF) and a right-hand side (THEN). The left-hand side consists of a goal in the Goal-Part and a State-Part specifying Boolean conditions on the current state of the environment, together with associated memory-read items that specify variables to be retrieved from memory.
Fig. 1. Layered Cognitive Architecture
The right-hand side consists of a Means-Part containing motor as well as percept actions (e.g. hand movements or attention shifts), memory-store items, and a set of partially ordered subgoals. The rule in Fig. 2 defines a goal-subgoal relation between GEAR_UP and the subgoals CHECK_GEAR_UP and CALLOUT_GEAR_UP. The term "After" imposes a temporal order on the subgoals.
Fig. 2. Format of GSM rules (variables are underlined)
Apart from the rules, the memory component stores a "mental model" of the current situation (e.g. positions of other cars, states of instruments) and, furthermore, an ordered set of goals and subgoals that have to be pursued, which we call the Goal Agenda. The rules are processed by the processor component of the associative layer in a four-step cognitive cycle typical for production systems: a goal is selected from the Goal Agenda, all rules containing the selected goal in their Goal-Part are collected, and a memory retrieval of all state variables in the Boolean conditions of the selected rules is performed. After the retrieval, one of the collected rules is selected by evaluating the conditions. Finally, the selected rule is fired, which means that the motor and percept actions are sent to the motor and percept components, respectively, and the subgoals are added to the Goal Agenda. This process is iterated until no more rules are applicable. As in ACT-R, the cycle time is 50 ms plus memory retrieval time, and only one rule can be fired at a time. Contrary to ACT-R, however, our architecture allows parallelism between the autonomous and associative layers, in order to model that humans can concurrently steer a car and operate a CD player. We use the same approach, and in particular the same cognitive core architecture, for modeling and simulating both driver and pilot behavior. While the generic architecture is shared, the instantiation of the architecture with task-specific knowledge as
well as some specific extensions of the architecture are fundamentally different. It is beyond question that the two domains, automotive and aeronautics, have fundamental differences as well as some similarities. One major difference is that driver behavior exhibits a wider range of variance than pilot behavior. The main reasons for this are the strict selection process for pilots, the high training standards in aviation, and the standardization of procedures in cockpits. In contrast, a large variety of people drive cars, and they are often trained only once in their lives. Drivers develop many individual driving routines; therefore strict goal-subgoal relations as in pilot models are less common. The rule-based task model allows both kinds of behavior to be modeled: (1) rules can formalize rigid, script-based tasks by using a set of ordered and thus successive subgoals, and (2) highly dynamic tasks can be modeled by using parallel (unordered) subgoals. Individual differences in driving are modeled by adding rules for all relevant driving strategies, which can be selected randomly. Further differences between our driver and pilot models exist because certain features are relevant mainly for one of the two domains. For example, simulating manual steering and braking behavior on the autonomous layer of the architecture is more relevant for driver modeling. In order to allow execution of the cognitive model within realistic flight or traffic scenarios, we interfaced it to simulation platforms that are normally used for experiments with human subjects. In this way we are able to use the same environment for experiments with both human subjects and the cognitive model. This is a crucial prerequisite for comprehensive model validation. The following sections describe four extensions of the cognitive architecture for modeling aspects of pilot and driver behavior.
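To make the rule format and processing loop concrete, the following Python sketch renders GSM-style rules and the four-step cognitive cycle described above. All names and the simple agenda policy are illustrative assumptions, not the actual OFFIS implementation:

from dataclasses import dataclass, field

@dataclass
class GSMRule:
    goal: str              # Goal-Part of the left-hand side
    condition: object      # State-Part: Boolean test over the mental model
    actions: list          # Means-Part: motor and percept actions
    subgoals: list = field(default_factory=list)  # (partially) ordered subgoals

def cognitive_cycle(agenda, rules, mental_model, dispatch):
    while agenda:
        goal = agenda.pop(0)                               # 1. select a goal
        candidates = [r for r in rules if r.goal == goal]  # 2. collect matching rules
        # 3. retrieve state variables and evaluate the Boolean conditions
        applicable = [r for r in candidates if r.condition(mental_model)]
        if not applicable:
            continue
        rule = applicable[0]                               # 4. fire one rule
        for action in rule.actions:
            dispatch(action)           # motor/percept actions sent to components
        agenda[:0] = rule.subgoals     # subgoals pushed onto the Goal Agenda

rules = [
    GSMRule("GEAR_UP", lambda m: m["phase"] == "climb",
            ["move_hand_to_gear_lever"], ["CHECK_GEAR_UP", "CALLOUT_GEAR_UP"]),
    GSMRule("CHECK_GEAR_UP", lambda m: True, ["look_at_gear_indicator"]),
    GSMRule("CALLOUT_GEAR_UP", lambda m: True, ["say_gear_up"]),
]
cognitive_cycle(["GEAR_UP"], rules, {"phase": "climb"}, dispatch=print)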
3 Pilot Modeling

Automation systems in aircraft are equipped with a huge number of system modes. A mode may be understood as a system state in which the system delivers a distinguishable function. Modes allow the same system to be used for different maneuvers, but at the same time this may induce mode errors, where an action is performed that is correct in some modes but not in the present one. Often, the pilots' mental model of the automation systems is inappropriate or incomplete. In order to mitigate mode errors, display designers try to control the attention of pilots by using flashing graphical elements to highlight mode changes.

3.1 Learned Carelessness

The extension of our cognitive architecture to include the phenomenon of Learned Carelessness (LC) can be used to analyze how pilots might mentally transform the task model of flight procedures while they gain experience with a system.

Interaction phenomenon. The focus is on discrete pilot actions for operating a system (like pressing buttons of an autopilot). We assume that the operation can be defined normatively in the form of procedures that prescribe admissible action sequences and preconditions. Of special interest for the system designer are action preconditions that involve checking the current mode. Using our model we analyze
the probability that pilots neglect mode conditions and that this leads to hazardous flight situations. The goal is to iteratively improve the system design to make it robust with regard to likely mode errors.

Involved cognitive processes. The theory of LC states that humans have a tendency to neglect safety precautions if this has immediate advantages, e.g. it saves time, and allegedly allows the same safety level to be kept. In the context of avionics systems, safety precautions may be understood as checking the current state or mode of the systems before performing critical actions. LC is characteristic of human nature, because we have to implicitly simplify in order to be able to perform efficiently in a complex environment. The resulting behavior is highly adapted to routine scenarios but, unfortunately, may lead to errors and hazards in non-routine situations. Thus, it is crucial to identify those interaction sequences where LC may lead to hazardous situations. More details can be found in [5].

Modeling idea. To model LC inside our cognitive architecture we added a learning component which produces new, simplified rules by merging existing normative rules. Fig. 3 shows some rules from a climb procedure. Rule 25 specifies that the vertical speed (VS) button must be pressed as long as the mode annunciation (MA) does not show the flashing letters "ALT" (a flashing "ALT" indicates that a mode called Altitude Capture is active). Using rule 21, the current value of MA is perceived. Rule 23 stores the perceived value in memory. Most of the time when the pilot tries to press the VS button, the Altitude Capture mode is not active and the percept action in rule 21 delivers "ALTS", which indicates that the current mode is Altitude Select and not Altitude Capture. We hold the hypothesis that, due to this regularity, a pilot would simplify his mental model of the procedure into a version where the MA value is no longer perceived by looking at the cockpit instrument but is just retrieved from memory. This is modeled by merging two rules into one rule by means of rule composition. The crucial point is that in this process, elements that are contained on the right-hand side of the first rule and also on the left-hand side of the second rule are eliminated. This process cuts off intermediate knowledge-processing steps.
Fig. 3. Composition of Rule 21 & 23 leading to Rule 112
Fig. 3 shows the composite Rule 112 that was formed by composition of Rules 21 and 23. The percept action has been eliminated, and the new rule always stores the value "ALTS" in memory. Rule 112 is appropriate in scenarios that are similar to those in which the rule was learned (MA does not indicate Altitude Capture mode). In deviating scenarios (MA does indicate Altitude Capture mode), applying Rule 112 results in careless behavior: pressing the VS button independently of the current mode annunciation (Rule 112 followed by Rule 25).
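The composition step can be sketched as follows (illustrative Python; the rule representation and field names are our assumptions, simplified from the GSM format):

def compose(rule_a, rule_b, frozen_value):
    """Merge two rules; items produced by rule_a and consumed by rule_b vanish."""
    shared = set(rule_a["produces"]) & set(rule_b["reads"])
    return {
        "name": rule_a["name"] + "+" + rule_b["name"],
        # percept actions whose results are only consumed by rule_b are eliminated
        "percepts": [p for p in rule_a["percepts"] if p[1] not in shared],
        "reads": rule_a["reads"] + [v for v in rule_b["reads"] if v not in shared],
        # the value observed during learning is frozen into the composite rule
        "stores": {v: frozen_value for v in shared},
    }

rule21 = {"name": "21", "reads": [], "percepts": [("look_at", "MA")],
          "produces": ["MA"]}
rule23 = {"name": "23", "reads": ["MA"], "percepts": [], "produces": []}

rule112 = compose(rule21, rule23, frozen_value="ALTS")
# rule112 contains no percept action: the mode annunciation is never checked
# again, and "ALTS" is simply written to memory, i.e. careless behavior.
print(rule112)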
Validation activities. We performed three case studies involving autopilot systems to evaluate the cognitive pilot model. In one study we compared the model behavior with human pilot behavior and were able to successfully reconstruct nine mode errors [5]. In the remaining two studies, subject-matter experts performed a review of the model behavior and acknowledged that the behavior is in general plausible with respect to the investigated tasks and compatible with the limitations of the investigated scenarios [4]. Extensive validations in which the model predictions will be compared with the behavior of 24 pilots are planned for this year in the European project HUMAN.

Transferability. We hold the hypothesis that the LC mechanism can be applied in the automotive domain for analyzing the discrete interaction with, e.g., navigation systems, where the driver has to input and select information to set up the system for a new destination.

3.2 Selective Attention

The extension of our cognitive architecture towards Selective Attention can be used to investigate whether attention-capturing graphical elements are adequate to mitigate errors like those induced by LC.

Interaction phenomenon. One important part of human error analysis is the analysis of the ergonomics of the graphical user interface. In aircraft this includes the analysis of flashing boxes around flight mode annunciations (MA), which are supposed to automatically draw the attention of pilots to mode changes. Here, display designers make use of a phenomenon called "Selective Attention" (SA).

Involved cognitive processes. SA is understood as the phenomenon describing automatic shifts of attention triggered by the onset of a salient stimulus, e.g. a flashing light or a moving item [11]. Recent studies have shown that certain characteristics of displays may undermine the effect of SA. The study of Mumaw, Sarter and Wickens [7] showed that only 30–60% of pilots recognize a MA change within the first 10 seconds (while the box is flashing). One important reason is that visual context may undermine the SA effect [8].

Modeling idea. In addition to a basic temporal model of human vision (with visual field, focus, and eye movements) as a low-level percept component, we modeled context-dependent SA. In this model, the probability that a stimulus is recognized depends on the saliency of the display neighborhood; e.g. the probability is lower if the neighborhood contains colorful and dynamic displays. Each stimulus received by the cognitive model is processed by the SA mechanism in three steps. The first step (SA1) determines whether the area of interest (AOI) to which the stimulus belongs lies within the current focus or visual field. If the AOI is focused, the associated event is marked as recognized and SA3 is started. If the stimulus is outside the visual field, the associated event is marked as unrecognized, and SA1 is restarted with a new stimulus. If it is within the visual field but not in focus, SA2 is initiated in order to determine recognition. In the second step (SA2), it is determined whether other stimuli have occurred in a neighborhood of 15 degrees (derived from the experimental setup in [8]) around the AOI. A probabilistic choice dependent on the dynamics of the neighborhood, based on data from [8], is computed to determine whether the event is recognized or not. If the
event is recognized, SA3 is started; else SA1 starts again with the next stimulus. In the last step (SA3), a shift of attention is initiated. Then SA1 processes the next stimulus.

Validation activities. In order to illustrate the plausibility of our model, we investigated flashing mode annunciations of an autopilot [9]. We simulated a number of scenarios highlighting situations in which pilots might miss the mode indication. In a next step we will compare our data with human data. This validation will be achieved in combination with the validation activities for LC described above.

Transferability. Although the SA model has been developed for pilot behavior, it is also usable in driver modeling, e.g. for detecting flashing indicator lights on other cars or warning lights of driver assistance systems.
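A minimal sketch of the three-step mechanism described in the modeling idea above is given below (Python). The geometry and the recognition probabilities are placeholder assumptions; in the model they are derived from the data in [8]:

import math, random

def selective_attention(stimulus_pos, focus, visual_field_deg, other_stimuli):
    """Return True if the stimulus is recognized (attention then shifts to it)."""
    dist = math.dist(stimulus_pos, focus["pos"])
    if dist <= focus["radius_deg"]:              # SA1: AOI within current focus
        return True                              # recognized; SA3 would follow
    if dist > visual_field_deg:                  # SA1: outside the visual field
        return False                             # event marked as unrecognized
    # SA2: probabilistic recognition, degraded by stimuli within 15 degrees
    clutter = sum(1 for p in other_stimuli if math.dist(p, stimulus_pos) <= 15.0)
    if random.random() < max(0.1, 0.6 - 0.15 * clutter):   # assumed degradation
        focus["pos"] = stimulus_pos              # SA3: shift of attention
        return True
    return False

focus = {"pos": (0.0, 0.0), "radius_deg": 2.0}
flash = (10.0, 4.0)                              # flashing mode annunciation box
print(selective_attention(flash, focus, visual_field_deg=60.0,
                          other_stimuli=[(12.0, 6.0)]))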
4 Driver Modeling

Recent analysis of accident data has identified inattention (including distraction) as the primary cause of car accidents, accounting for at least 25% of crashes. Consequently, guidelines for the design of driver assistance systems require investigating the impact of automation on drivers' attention allocation. In the following we present two extensions of our cognitive architecture with regard to factors influencing attention allocation.

4.1 Divided Attention Based on Prediction of Other Traffic Participants

The extension of the cognitive architecture towards the influence of predictions of other traffic participants on attention allocation can support designers in creating effective assistance systems which prevent inadequate assumptions about the dynamics of traffic situations.
Fig. 4. Freeway merging scenario
Interaction phenomenon. Drivers often find themselves in complex traffic situations which require attention allocation to different traffic participants and the integration of diverse information into a mental model of the situation. Estimations of the behavior of other drivers have an impact on attention allocation. Inadequate estimations can lead to incorrect situation assessment and accidents. We perform several simulator studies with human participants to investigate these aspects. In a first study we investigated the influence of 9 different combinations of speed difference (vdiffAB) and gap size (dAB) (see Fig. 4) on merging behavior, to build up an initial model for gap acceptance and lane change. In a second study we consider a lead car (C in Fig. 4) which also enters the freeway. Here we investigate drivers'
divided attention between the tasks "hold distance to C" (T1) and "look for a gap" (T2). In a reference scenario, B does not change its lane and we analyze the attention allocation of A. In the following test trials, B gives different lane change cues: B turns on its left indicator (Cue1), or B suggests a lane change by moving to the left-most lane (Cue2). Finally, B either changes the lane or does not (Cue3).

Involved cognitive processes. Compared to the reference scenario, we expect drivers to increase their attention allocation either to task T1 or to T2 in the test trials. Referring to Wickens' SEEV model [10], we interpret this as a consequence of the attention parameter "value of information". Both tasks have a certain value for the driver; therefore both need to be considered. The smaller the distance to C, the higher the value of having exact information about C, because C might brake suddenly. In consequence, the priority of T1 will be high. Concerning T2, more safety-oriented drivers who perceive Cue1 might search for additional cues which support their prediction of a lane change of B. They invest more attention in observing B to get more reliable information. Risky drivers may consider Cue1 predictive enough to assume a lane change of B, therefore spending less attention.

Modeling idea. Task priorities are modeled by extending the Goal element in the Means-Part of our rules (Fig. 2) with a priority parameter. Priorities of goals are used to initiate successive execution of the same goal: the higher the priority, the larger the probability of executing the goal once more before switching to the next one. The more often a goal has been executed successively, the lower its probability of being executed again. In our model, cues in the traffic environment have a direct influence on goal priorities. These quantitative dependencies are derived from experimental data.

Validation results. The current state of the model is rather basic; results have not yet been validated systematically. Detailed validation studies are planned for this year. The main measure for model validation will be the gaze behavior of drivers.

Transferability. A dual-task structure of continuous, interleaved goals can be found in pilot tasks as well. Prioritization of tasks is a very important aspect in time-critical multitasking situations. We assume that the priority mechanism we aspire to is flexible enough to model dual-task scenarios, e.g. during aircraft takeoff, as well.

4.2 Divided Attention Based on Event Frequencies

The extension of the cognitive architecture with regard to the influence of event frequencies on attention allocation can be used to identify a potential negative influence of assistance systems on the attention and situation awareness of the driver in cases where he or she gets out of the loop of the driving task.

Interaction phenomenon. Out-of-the-loop effects may be caused when a certain task is fully controlled by the system. For example, an ACC (Automatic Cruise Control) system reduces or completely removes the necessity for the driver to correct the distance to the lead car. The driver might rely too much on the system and might allocate too little attention to the longitudinal control task. As a consequence, the driver might fail to take over control in situations where the ACC reaches its limits.

Involved cognitive processes. Wickens & McCarley [10] and Horrey et al. [3] postulated four main influences on the process of attention allocation: salience, effort, event expectancy, and value.
With these values they created the SEEV trade-off model
to describe how humans distribute visual attention. The probability that an information source will be paid attention to is a function of the four influence factors. Horrey et al. [3] conducted driver studies focusing on collision avoidance, where they obtained good results while considering only the factors expectancy and value. We focused on the expectancy factor, which describes how likely it is that the driver expects new information at an information source. For the collision avoidance task it is based on the bandwidth of events the driver has to react to: the more events have occurred at a source, the more the driver will expect further events to occur.

Modeling idea. Following the SEEV attention allocation model, we currently implement the correlation of event bandwidth and attention allocation. The implemented process relies on a percept mechanism: an area of interest will be scanned by the model if the goal which requires this information is selected in the cognitive cycle. For task T1, "hold distance to C" (see Fig. 4), the required information would be dAC. Perceived information is used in the State-Part of rules. Fig. 5 shows two rules for the goal hold_distance. Their conditions will trigger a rule that perceives current_distance. The new value may trigger rule 1 or 2. In that case an event for the model has occurred, because the model has to react to outside information. The time of occurrence of this event is stored together with the goal. It is then used to influence the goal selection process in the cognitive cycle. Tasks with higher event bandwidth will be triggered more often. As a result, in the simulation the model adapts its scan rate to the event bandwidth of the information source.
Fig. 5. Rules for keeping safe distance to front car
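A sketch of this bandwidth-adaptive goal selection is shown below (Python; the weighting of expectancy against priority and the time window are illustrative assumptions, not the calibrated model):

import random

class MonitoredGoal:
    def __init__(self, name, priority):
        self.name, self.priority = name, priority
        self.event_times = []                    # occurrence times of past events

    def bandwidth(self, now, window=10.0):
        """Events per second recently observed on this information source."""
        return sum(1 for t in self.event_times if now - t <= window) / window

def select_goal(goals, now):
    # expectancy (event bandwidth) scaled by the goal's priority/value
    weights = [g.priority * (0.1 + g.bandwidth(now)) for g in goals]
    return random.choices(goals, weights=weights, k=1)[0]

hold_distance = MonitoredGoal("hold_distance", priority=2.0)
look_for_gap = MonitoredGoal("look_for_gap", priority=1.0)
hold_distance.event_times = [95.0, 97.0, 99.0]   # lead car C changes speed often
print(select_goal([hold_distance, look_for_gap], now=100.0).name)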
Validation results. As in the preceding section, validation studies focusing on gaze behavior will be done this year.

Transferability. The described phenomenon is not only applicable in the driving domain. In fact, the SEEV trade-off model is used in general for the design and analysis of human-machine interfaces. As has already been shown by Wickens [10], pilots scan control instruments with a high event bandwidth more often. In that approach the bandwidth is derived as a constant value from the features of the instruments. In our approach we derive the bandwidth from the dynamics of the environment, and consequently the bandwidth can change dynamically.
5 Summary and Next Steps

In this text we presented an approach to support the design of safety-critical assistance systems in aircraft and cars. This approach is based on a cognitive core architecture which is used in both domains. We described four extensions of the core architecture.
Our future work will concentrate on a detailed validation and improvement of the four extensions, which includes a validation of the transferability of the cognitive mechanisms. The validation requires a complex experimental design in which our models as well as real pilots/drivers perform the same scenarios. This will allow a comparison between model and human behavior along parameters like error rate, eye movements, action sequences, or timing. The work described in this paper is funded by the European Commission in the 7th Framework Programme, Theme 7 Transport, FP7-211988 and FP7-218552.
References

1. Anderson, J.: Learning and Memory. John Wiley & Sons, Chichester (2000)
2. Gluck, K.A., Pew, R.W.: Modeling Human Behavior with Integrated Cognitive Architectures: Comparison, Evaluation, and Validation. Lawrence Erlbaum Associates, Mahwah (2005)
3. Horrey, W.J., Wickens, C.D., Consalus, K.P.: Modeling drivers' visual attention allocation while interacting with in-vehicle technologies. Journal of Experimental Psychology: Applied 12, 67–78 (2006)
4. Lüdtke, A., Cavallo, A., Christophe, L., Cifaldi, M., Fabbri, M., Javaux, D.: Human error analysis based on a cognitive architecture. In: Reuzeau, F., Corker, K., Boy, G. (eds.) Proceedings of the International Conference HCI-Aero 2006, pp. 40–47. Cépaduès-Editions, Toulouse (2006)
5. Lüdtke, A., Möbus, C.: A cognitive pilot model to predict learned carelessness for system design. In: Pritchett, A., Jackson, A. (eds.) Proceedings of the International Conference HCI-Aero 2004. Cépaduès-Editions, Toulouse (2004)
6. Möbus, C., Hübner, S., Garbe, H.: Driver modelling: two-point- or inverted gaze-beam-steering. In: Rötting, M., Wozny, G., Klostermann, A., Huss, J. (eds.) Prospektive Gestaltung von Mensch-Technik-Interaktion, Fortschritt-Berichte VDI-Reihe 22 (Nr. 25), pp. 483–488. VDI Verlag, Düsseldorf (2007)
7. Mumaw, R.J., Sarter, N.B., Wickens, C.D.: Analysis of pilots' monitoring and performance on an automated flight deck. In: 11th International Symposium on Aviation Psychology. The Ohio State University (2001)
8. Nikolic, M.I., Orr, J.M., Sarter, N.B.: Why pilots miss the green box. The International Journal of Aviation Psychology 14(1), 39–52 (2004)
9. Osterloh, J.-P., Lüdtke, A.: Analyzing the ergonomics of aircraft cockpits using cognitive models. In: Karwowski, W., Salvendy, G. (eds.) Proceedings of the 2nd International Conference on Applied Human Factors. USA Publishing, Las Vegas (2008)
10. Wickens, C.D., Helleberg, J., Goh, J., Xu, X., Horrey, W.J.: Pilot task management: testing an attentional expected value model of visual scanning (ARL-01-14/NASA-01-7). University of Illinois, Aviation Research Lab, Savoy (2001)
11. Yantis, S., Jonides, J.: Abrupt visual onsets and selective attention. Journal of Experimental Psychology: Human Perception and Performance 16, 121–134 (1990)
Further Steps towards Driver Modeling According to the Bayesian Programming Approach

Claus Möbus and Mark Eilers∗

University of Oldenburg / OFFIS, Germany
{claus.moebus,mark.eilers}@uni-oldenburg.de
Abstract. The Human Centered Design (HCD) of Partial Autonomous Driver Assistance Systems (PADAS) requires Digital Human Models (DHMs) of human control strategies for simulating traffic scenarios. We describe first results in modeling the lateral and longitudinal control behavior of drivers with simple dynamic Bayesian sensory-motor models according to the Bayesian Programming (BP) approach: Bayesian Autonomous Driver (BAD) models. BAD models are learnt from multivariate time series of driving episodes generated by single users or groups of users. The variables of the time series describe phenomena and processes of perception, cognition, and action control of drivers. BAD models reconstruct the joint probability distribution (JPD) of those variables by a composition of conditional probability distributions (CPDs). The real-time control of virtual vehicles is achieved by inferring the appropriate actions under the evidence of sensory percepts with the help of the reconstructed JPD.

Keywords: digital human response models, driver models, Bayesian autonomous driver models, learning of human control strategies, probabilistic Bayesian lateral and longitudinal control, graphical modeling, human behavior learning and transfer, Bayesian Programming.
1 Introduction

A driver is a human agent whose skills can be described by stages which were labeled by Anderson [1] as cognitive, associative, and autonomous. According to these stages, various modeling approaches seem to be adequate: production systems (e.g. models in the ACT-R architecture [2, 3]) for the cognitive and associative stages, and control-theoretic [4–6] and probabilistic models [7, 8] for the autonomous stage. The first two kinds of models are now quite standard approaches [9]. The main advantage of the new probabilistic models is that they have a clean semantics and at the same time are more robust than the other approaches [10, 11]. This is a great advantage in view of the irreducible inter- and intra-individual variability of human behavior and the irreducible incompleteness of knowledge about the stochastic environment, the driver, and his psychological mechanisms [12, 13]. Furthermore, probabilistic computational models are not programmed like traditional simulation
∗ Project Integrated Modeling for Safe Transportation (IMOST), sponsored by the Government of Lower Saxony, Germany, under contracts ZN2245, ZN2253, ZN2366.
software, but are condensed and abstracted from data in an objective manner. In this way the behavior of single drivers or groups of drivers can be learnt by objective machine-learning techniques [14]. The intra-individual variation of human behavior can, for example, be seen from the gaze distribution of a single driver passing the same road position repeatedly (Fig. 1¹).
Fig. 1. Distribution of the driver's gaze on a single-lane road; gazes from 21 drives of one driver within a radius of 50 cm around the current position are shown²
Gazes are directed to various positions in the vision field, which conflicts with assumptions made in control-theoretic driver models like Salvucci & Gray's two-point model of lateral control [2, 15]. In summary, computational driver models should
• predict and generate driver behavior as emitted by drivers, sometimes in interaction with assistance systems
• identify situations or maneuvers and classify the behavior of drivers as, e.g., anomalous or normal
• provide a robust and valid mapping from human sensory data to human control actions even when inter- and intra-individual variance is observable
¹ Background image taken from experimental records at DLR, Braunschweig, Germany. Calculations of the gaze points have been done at OFFIS by Bertram Wortelen.
² During the experiments at the DLR, data from the simulator and the eye-tracking system were recorded at a frequency of 30 Hz. The eye-tracking data, together with the position of the car on the road and a geometrical representation of the road, were used to calculate the driver's focal gaze points on the road and to draw them as an overlay on the video. This figure contains the focal gaze points of all 21 drives of one driver at the same road position. The course of the cars deviates in each run, so we have actually taken all focal points within a radius of 50 cm. These points are drawn as red dots in the figure.
• be learnt from time series of raw data or empirical probability distributions with statistically sound (machine-learning) procedures relying only on a few non-testable ad hoc or axiomatic assumptions
• be able to learn new patterns of behavior without forgetting already learnt skills (stability-plasticity dilemma) [16].
2 Bayesian Autonomous Driver (BAD) Models

2.1 Related Work

Due to the irreducible variability of human behavior and the irreducible lack of knowledge about the cognitive mechanisms of a driver in a stochastic environment, it seems rational to model human drivers with, for instance, probabilistic models: Bayesian Autonomous Driver (BAD) models. According to the Bayesian Programming (BP) approach [13, 17, 23], BAD models [7, 8] are a special type of Bayesian network (BN) [18–22] using concepts from probabilistic robotics [11]. BP is a simple and generic framework suitable for human modeling in the presence of incompleteness and uncertainty. It provides integrated model-driven data analysis and model construction. In contrast to conventional Bayesian networks, BP models may have a recursive structure and infer concrete motor actions for real-time control on the basis of sensory evidence. Actions are sampled from CPDs for action variables after propagating sensor or task-goal evidence. Sampling can be done with the draw or the best operator. Sampling with draw means sampling concrete actions from the CPD P(Action | parents(Action))³, and sampling with best means taking the conditional expected value of the CPD P(Action | parents(Action)). This is known as the nonlinear regression of Action on parents(Action): E(Action | parents(Action)). BAD models can be learnt objectively with statistically sound methods from multivariate time series of the variables of interest. They describe phenomena on the basis of these variables and the decomposition of their JPD into (simplified) CPDs according to the special chain rule for Bayesian networks [22, p. 36]. The underlying conditional independence hypotheses (CIHs) between sets of variables can be tested by standard statistical methods (e.g. the conditional mutual information index [22, p. 237]). Model validity is thus included in the modeling process by model-driven data analysis, without ex-post validation. In [7] we described first steps to model the lateral and longitudinal control behavior of single drivers and groups of drivers with reactive Bayesian sensory-motor models. Here we include the time domain and describe work in progress with dynamic Bayesian sensory-motor models. The vision is a dynamic BAD model which is able to compose behavior from basic kinds of motor schemas: a dynamic mixture-of-experts (MoE; Fig. 9) [23] BAD model [8]. These new MoE models facilitate the collection of sensory-motor schemas (= experts) in a library. Context-dependent driver behavior could be generated by mixing pure behaviors from different schemas (= experts), avoiding the stability-plasticity dilemma [16] at the same time. Whereas [24] rely on Hidden Markov Models (HMMs) for learning fine manipulation tasks like grasping and
³ parents(Action) are the parents of the node Action in the BN [22, p. 36].
assembly by Markov mixtures of experts, we strive for a new dynamic Bayesian network (DBN) model for learning multi-maneuver driving behavior [8].

2.2 Basic Concepts

In presenting basic concepts of BP like postulates, definitions, notations, and rules, we borrow from [13, 17]. A BP is defined as a means of specifying a family of probability distributions. By using such a specification it is possible to construct a BAD model which can effectively control a (virtual) vehicle. The components of a BP are presented in Fig. 2 and Fig. 3, where the analogy to a logic program is helpful.
Fig. 2. Structure of a Bayesian Program (terminology slightly modified from [13, 17])
An application consists of a (task model) description and a question. A description is constructed from preliminary knowledge and a data set. Preliminary knowledge is constructed from a set of pertinent variables, a decomposition of the JPD, and a set of forms. Forms are either parametric forms or BPs. The purpose of a description is to specify an effective method to compute a JPD on a set of variables given a set of (experimental) data and preliminary knowledge. To specify preliminary knowledge, the modeler must define the set of relevant variables on which the JPD is defined, decompose the JPD into factors of CPDs according to CIHs, and define the forms. Each CPD in the decomposition is a form: either a parametric form whose parameters are estimated from data, or another BP. Given a description, a question is obtained by partitioning the variables into searched, known, and unknown variables. We define a question as the CPD P(Searched | Known, preliminary knowledge, data). The selection of an appropriate action can be treated as the inference problem Policy(P(Action | Percepts, Goals, preliminary knowledge, data)). Various policies (Draw, Best, and Expectation) are possible, depending on whether the concrete action is drawn at random, chosen as the action with the highest probability, or taken as the expected action.
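The following sketch illustrates how such a question could be answered over a discretized CPD with the three policies (Python; the CPD values and the percept discretization are made up for illustration):

import random

def answer(cpd, percept, policy):
    """cpd maps a percept value to {action_value: probability}."""
    dist = cpd[percept]
    if policy == "Draw":          # sample a concrete action from the CPD
        actions, probs = zip(*dist.items())
        return random.choices(actions, weights=probs, k=1)[0]
    if policy == "Best":          # the action with the highest probability
        return max(dist, key=dist.get)
    if policy == "Expectation":   # conditional expected value of the action
        return sum(a * p for a, p in dist.items())
    raise ValueError(policy)

# steering-angle CPD (degrees) conditioned on a coarse heading-error percept
cpd = {"drift_left": {-2.0: 0.1, 0.0: 0.2, 2.0: 0.7}}
for policy in ("Draw", "Best", "Expectation"):
    print(policy, answer(cpd, "drift_left", policy))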
Fig. 3. Structure of Dynamic Bayesian Networks (DBNs) as Bayesian Programs (terminology slightly modified from [23])
Fig. 4. Partially inverse DBN of Lateral Control
2.3 Results

Static reactive or static inverse models have not been satisfactory because they generate behavior which is more erratic and nervous than human behavior [7]. Better results can be obtained by introducing a memory component and using DBNs. In a first step we estimated two DBNs separately for lateral and longitudinal control. Our experience is that partially inverse models are quite useful (Fig. 4, 5). In an inverse model, arcs in the DAG of the graphical model are directed from the consequence to the prerequisites. The semantics of these arcs are denoted by the
Fig. 5. Partially inverse DBN of Longitudinal Control
conditional probabilities P(Prerequisites | Consequence). Our models are partially inverse because most arcs are inverted, while the arcs between time slices t−1 and t are in causal order from prerequisites to consequences. The variables of interest are partitioned into sensory variables (heading angle of the vehicle, perceived speed) and actions (steering angle, acceleration). According to the visual attention allocation theory of Horrey et al. [25], the perception of the heading angle is influenced by areas in the visual field (ambient channel), the head angle, and the gaze angle relative to the head. At the present moment the light grey nodes in Fig. 4 are not included in the
Fig. 6. Bird’s eye view of race track with curve radii and rotation angles
Fig. 7. Snapshot of BAD model drive on TORCS race track
model. Instead we assumed that drivers are able to compute the aggregate sensory variables heading angle and vehicle speed. Compared to the lateral control in Salvucci & Gray's model [2], our BAD model is simpler and makes fewer assumptions about the vision field.
Fig. 8. Comparison of Human and BAD-Model Drives
Fig. 9. Mixture-of-Experts Architecture of Bayesian Autonomous Driver (BAD) Model
Data were obtained in experimental drives of a single driver on the TORCS racing course (Fig. 6, [26]). Only a few laps were necessary to obtain the data for estimating the parameters (means, standard deviations) of the Gaussian parametric forms. A snapshot of a BAD model drive is shown in Fig. 7. The map of the racing course and curve-specific measurements are presented in Fig. 6 and Fig. 8. A comparison of the driver speed data with model-generated speed data demonstrates the quality of the simple BAD model. However, because there are some collisions with the roadsides, the capabilities of the BAD model need to be improved. Further improvements are expected by combining the two controllers, by including cognitive constructs like goals and latent states of the driver, and above all by segmenting behaviors into context-dependent schemas (= experts) (Fig. 9). Using goals (e.g. driving a hairpin or an S-curve) makes it possible to adapt the model to different road segments and situations. We expect to use the same model for situation recognition or for situation-adapted control. The modeling idea of an HMM was abandoned because the state variable has to be very fine-grained to obtain high-quality vehicle control [8]. In HMMs, perceptions and actions should be conditionally independent when the state is known.
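One time slice of such a controller can be sketched as follows (Python/NumPy; the regression coefficients and noise level are invented stand-ins for the Gaussian forms estimated from the TORCS data):

import numpy as np

w = np.array([0.0, -0.8, 0.5])  # assumed coefficients: bias, heading, previous steer
sigma = 0.05                    # residual standard deviation of the Gaussian form

def steer(heading_angle, prev_steer, draw=False):
    """E(steer_t | heading_t, steer_{t-1}); draw=True samples instead of taking the mean."""
    mean = w @ np.array([1.0, heading_angle, prev_steer])
    return np.random.normal(mean, sigma) if draw else mean

prev = 0.0
for heading in (0.10, 0.08, 0.05):   # vehicle drifting, then correcting
    prev = steer(heading, prev)      # the previous slice acts as the memory component
    print(f"steer = {prev:+.3f}")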
3 Conclusions and Outlook
In our current research [7, 8] we strive for the realization of a BAD model architecture on the basis of a DBN (Fig. 9). It is a psychologically motivated mixture-of-experts model which is distributed across two time slices. It implements the autonomous layer of a cognitive agent and avoids the fine-grained state assumptions of HMMs. Learning data are time series of pertinent variables: percepts, goals, and actions. We can model individual or groups of human and artificial agents. The model propagates information in various directions. When working top-down, goals emitted by the associative layer of a cognitive model select a corresponding expert, which propagates actions, areas of interest (AoIs), and perceptions. When working bottom-up, percepts trigger AoIs, actions, experts, and goals. When the task or goal is defined and
the model has certain percepts, evidence can be propagated simultaneously top-down and bottom-up, and the appropriate expert and its behavior can be activated. Thus, the model can be easily extended to implement the SEEV visual scanning model [25]. All probabilistic models presented here can be constructed by data mining single or aggregated drivers' behavior traces in experimental settings with or without experimentally induced goals.
References
1. Anderson, J.R.: Learning and Memory. John Wiley, Chichester (2002)
2. Salvucci, D.D., Gray, R.: A Two-Point Visual Control Model of Steering. Perception 33, 1233–1248 (2004)
3. Salvucci, D.D.: Integrated Models of Driver Behavior. In: Gray, W.D. (ed.) Integrated Models of Cognitive Systems, pp. 356–367. Oxford University Press, New York (2007)
4. Jürgensohn, T.: Control Theory Models of the Driver. In: Cacciabue (ed.), pp. 277–292 (2007)
5. Weir, D.H., Chao, K.C.: Review of Control Theory Models for Directional and Speed Control. In: Cacciabue, P.C. (ed.), pp. 293–311 (2007)
6. Möbus, C., Hübner, S., Garbe, H.: Driver Modelling: Two-Point- or Inverted Gaze-Beam-Steering. In: Rötting, M., Wozny, G., Klostermann, A., Huss, J. (eds.) Prospektive Gestaltung von Mensch-Technik-Interaktion, 7. Berliner Werkstatt Mensch-Maschine-Systeme, Berlin, Fortschritt-Berichte VDI-Reihe 22 (Nr. 25), pp. 483–488. VDI Verlag, Düsseldorf (2007)
7. Möbus, C., Eilers, M.: First Steps Towards Driver Modeling According to the Bayesian Programming Approach, Symposium Cognitive Modeling. In: Urbas, L., Goschke, T., Velichkovsky, B. (eds.) KogWis, vol. 6, p. 59. Christoph Hille, Dresden (2008)
8. Möbus, C., Eilers, M., Garbe, H., Zilinski, M.: Probabilistic and Empirical Grounded Modeling of Agents in (Partial) Cooperative Traffic Scenarios. In: Conference Proceedings, HCI 2009, Digital Human Modeling. LNCS (LNAI). Springer, San Diego (2009)
9. Cacciabue, P.C. (ed.): Modelling Driver Behaviour in Automotive Environments. Springer, London (2007)
10. Chater, N., Oaksford, M. (eds.): The Probabilistic Mind: Prospects for Bayesian Cognitive Science. Oxford University Press, Oxford (2008)
11. Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics. MIT Press, Cambridge (2005)
12. Anderson, J.R., Fincham, J.M., Qin, Y., Stocco, A.: A Central Circuit of the Mind. Trends in Cognitive Science 12(4), 136–143 (2008)
13. Bessiere, P., Laugier, C., Siegwart, R. (eds.): Probabilistic Reasoning and Decision Making in Sensory-Motor Systems. Springer, Berlin (2008)
14. Xu, Y., Lee, K.K.C.: Human Behavior Learning and Transfer. CRC Press, Boca Raton (2006)
15. Möbus, C., Hübner, S., Garbe, H.: Driver Modelling: Two-Point- or Inverted Gaze-Beam-Steering. In: Rötting, M., Wozny, G., Klostermann, A., Huss, J. (eds.) Prospektive Gestaltung von Mensch-Technik-Interaktion, Fortschritt-Berichte VDI-Reihe 22 (Nr. 25), pp. 483–488. VDI Verlag, Düsseldorf (2007)
16. Hamker, F.H.: RBF learning in a non-stationary environment: the stability-plasticity dilemma. In: Howlett, R.J., Jain, L.C. (eds.) Radial Basis Function Networks 1: Recent Developments in Theory and Applications. Studies in Fuzziness and Soft Computing, ch. 9, vol. 66, pp. 219–251. Physica Verlag, Heidelberg (2001)
17. Lebeltel, O., Bessiere, P., Diard, J., Mazer, E.: Bayesian Robot Programming. Autonomous Robots 16, 49–79 (2004)
18. Pearl, J.: Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo (1988)
19. Pearl, J.: Causality: Models, Reasoning and Inference, 2nd edn. Cambridge University Press, Cambridge (2009)
20. Spirtes, P., Glymour, C., Scheines, R.: Causation, Prediction, and Search, 2nd edn. MIT Press, Cambridge (2001)
21. Neapolitan, R.E.: Learning Bayesian Networks. Prentice Hall, Upper Saddle River (2004)
22. Jensen, F.V., Nielsen, T.D.: Bayesian Networks and Decision Graphs, 2nd edn. Springer, Heidelberg (2007)
23. Bessiere, P.: Survey: Probabilistic Methodology and Techniques for Artifact Conception and Development. Rapport de Recherche No. 4730, INRIA (2003)
24. Meila, M., Jordan, M.I.: Learning Fine Motion by Markov Mixtures of Experts. MIT AI Memo No. 1567 (1995)
25. Horrey, W.J., Wickens, C.D., Consalus, K.P.: Modeling Drivers' Visual Attention Allocation While Interacting With In-Vehicle Technologies. J. Exp. Psych. 12, 67–78 (2006)
26. TORCS, http://torcs.sourceforge.net/ (visited October 18, 2008)
Probabilistic and Empirical Grounded Modeling of Agents in (Partial) Cooperative Traffic Scenarios
Claus Möbus, Mark Eilers1, Hilke Garbe, and Malte Zilinski2
University of Oldenburg / OFFIS, Germany
{FirstName.LastName}@uni-oldenburg.de
Abstract. The Human Centered Design (HCD) of Partial Autonomous Driver Assistance Systems (PADAS) requires Digital Human Models (DHMs) of human control strategies for simulations of traffic scenarios. The scenarios can be regarded as problem situations with one or more (partial) cooperative problem solvers. According to their roles models can be descriptive or normative. We present new model architectures and applications and discuss the suitability of dynamic Bayesian networks as control models of traffic agents: Bayesian Autonomous Driver (BAD) models. Descriptive BAD models can be used for simulating human agents in conventional traffic scenarios with Between-Vehicle-Cooperation (BVC) and in new scenarios with In-Vehicle-Cooperation (IVC). Normative BAD models representing error-free behavior of ideal human drivers (e.g. driving instructors) may be used in these new IVC scenarios as a first Bayesian approximation or prototype of a PADAS.
Keywords: digital human response models, probabilistic driver models, Bayesian autonomous driver models, learning of human control strategies, graphical modeling, human behavior learning and transfer, distributed cognition, mixture-of-experts model, visual attention allocation, partial cooperative problem solvers, partial autonomous assistance system, Bayesian assistance system, shared space, probabilistic detection of anomalies, driver assistance systems, traffic agents, dynamic Bayesian networks, hidden Markov models, between-vehicle-cooperation, within-vehicle-cooperation.
1 Introduction
We discuss the suitability of a new type of real-time probabilistic control model for the psychologically valid representation of traffic agent (e.g., driver) behavior: Bayesian Autonomous Driver (BAD) models. These models [1, 2] are developed in the tradition of Bayesian expert systems and Bayesian robot programming [3, 4]. Descriptive BAD models can be used for simulating agents in conventional traffic scenarios with Between-Vehicle-Cooperation
1 Project Integrated Modeling for Safe Transportation (IMOST) sponsored by the Government of Lower Saxony, Germany under contracts ZN2245, ZN2253, ZN2366.
2 Project ISi-PADAS funded by the European Commission in the 7th Framework Program, Theme 7 Transport FP7-218552.
(BVC). Furthermore, when modeling normatively correct behavior of ideal human drivers (e.g. driving instructors) they may be used for a conceptually new kind of system: Bayesian Assistance Systems (BAS). These may be used for In-Vehicle-Cooperation (IVC) between the human driver and the BAS. Thus a BAS may be regarded as a first Bayesian approximation or prototype of a PADAS. Due to their probabilistic nature, BAS can be used not only for real-time control but also for real-time detection of anomalies in driver behavior and real-time generation of supportive interventions. Traffic scenarios can be regarded as problem situations with one or more (partial) cooperative problem solvers. A scenario is called cooperative when all problem-solving agents try to solve a goal specified by one single principal. A scenario is partially cooperative when goals are defined by several different principals [5]. Successful problem solutions require (nonverbal) communication and distributed cognition. This is especially true when traffic scenarios are deregulated, as in the IVC (driver-PADAS interaction) or the BVC (e.g. shared space) type. In most cases, traffic maneuvers are run without risk. However, risky situations can occur at any time. We call risky maneuvers anomalies; only experienced drivers are able to prevent or anticipate them automatically. Other drivers probably cannot and therefore might need support generated by a PADAS. It is expected that assistance systems will enhance situation awareness, cooperation, and driving competence of unskilled or non-cooperative drivers in the near future. Thus the design challenge of intelligent assistance systems should aim at modeling traffic agents with their beliefs, expectations, behavior, situation awareness, and their skills to recognize situations and to diagnose and prevent anomalies. We think that dynamic probabilistic models are appropriate for these challenges. We review some types of models and propose a new mixture-of-experts architecture with attention allocation stemming from our current research [1, 2].
2 Distributed Cognition and Traffic Scenarios
The concept of distributed cognition was introduced in the mid 1980s by Edwin Hutchins [6]. His theory proposes that human knowledge and cognition are not confined to the individual. Instead, they are distributed by placing cognitive skills on the objects, individuals, and tools in the environment. Cognitive processes may be distributed across the members of a social group, or may be distributed in the sense that the operation of the cognitive system involves coordination between internal and external (material or environmental) structure.
2.1 Cooperative Scenarios: Crews and In-Vehicle-Dyads
Hutchins [6] studied crews with an emphasis on anthropological and non-experimental methods. These methods then became popular. The question raised was how crews of ships can function as a distributed machine, offloading the cognitive burden of ship navigation onto each member of the crew. Hutchins' approach questioned disembodied views of cognition and suggested instead studying cognitive systems that are composed of multiple agents and the material world. Later studies
generalized the domains and put an emphasis on airline cockpit crews and human-computer interaction scenarios. Members of a public traffic scenario with BVC do not form a stable social group but rather an ad hoc group with a limited lifetime and a limited communication vocabulary. In contrast, members in a nonpublic traffic scenario (novice driver and driving instructor; Fig. 1) with IVC form a stable social group which resembles a crew.
Fig. 1. Cooperative driving scenario with in-vehicle-cooperation (background graphics from [7] with kind permission of the publisher)
2.2 Partial Cooperative Scenarios: Ad-Hoc Groups and Shared Space
Crews on navigation bridges or in aircraft cockpits work in agreement with a single principal. They form a cohesive group whose members normally cooperate for hours in solving the problems arising during ship or aircraft operation. This cooperation includes the exchange of complex verbal messages, which requires a high-dimensional state space for the agent models. Public traffic scenarios are of a fundamentally different kind. Communication, cooperation, and the action repertoire are limited in amount and complexity. Agents are their own principals and do not belong to a formal cohesive group. They come together by chance and (might) try to maximize their personal utilities, sometimes ignoring the needs of others. Internal group norms are substituted by traffic rules, which are expected to accelerate negotiations between the traffic agents in a scenario. The solution to a traffic coordination problem is a distributed but synchronized sequence of sets of actions (e.g. collision-free crossing of an intersection) emitted by autonomous agents. Shared space describes an approach to the design, management, and maintenance of public spaces which reduces the adverse effects of conventional traffic engineering by stimulating the situation awareness of all traffic agents. The shared space approach is based on the observation that individuals' behavior in traffic is more positively affected by the built environment of the public space than by conventional traffic control devices (signals, signs, road markings, etc.) or regulations. An explanation for the apparent paradox that a reduction in regulation leads to safer roads may be found by studying the risk compensation effect: "Shared Space is successful because the
perception of risk may be a means or even a prerequisite for increasing public traffic safety. Because when a situation feels unsafe, people are more alert and there are fewer accidents.” (http://www.shared-space.org/; visited 23.02.2009).
3 Modeling Agents in (Partial) Cooperative Scenarios
Skilled agents differ from novices in their competence of risk perception, thus increasing their personal safety. Computational agent models have to represent these and other kinds of perceptions, beliefs, goals, and actions of the ego agent and alter agents. Driver models should
• predict and generate driver behavior emitted by individual drivers, sometimes in interaction with assistance systems
• identify situations or maneuvers and classify behavior (e.g. anomalous vs. normal) of the ego driver
• provide a robust and valid mapping from human sensory data to human control actions
• be learnt from time series of raw data or empirical probability distributions with statistically sound (machine-learning) procedures with only a few non-testable ad hoc or axiomatic assumptions
• be able to learn new patterns of behavior without forgetting already learnt skills (stability-plasticity dilemma).
A driver is a human agent whose skills can be described by the cognitive, associative, and autonomous stages. Accordingly, various modeling approaches are adequate: production-system (e.g. models in a rule-based architecture [8-10]), control-theoretic [11, 12], and probabilistic models [13, 14]. The advantage of probabilistic models is that they fulfill the above criteria and, especially, that they are more robust than other approaches. This is a great advantage due to the irreducible incompleteness of knowledge about the environment and the underlying psychological mechanisms [4].
3.1 Bayesian Assistance Models in In-Vehicle-Dyads
As an example we present a result of Rizzo et al. [15]. The authors studied the behavior of drivers suffering from Alzheimer disease. At an intersection a car incurred from the right (Fig. 2). Many maneuvers of the Alzheimer patients ended in a collision, as they suffered from the looking-without-seeing syndrome. The modeling task should lead to a probabilistic BAS model which diagnoses and corrects the anomalous behavior of the inexperienced driver. Fig. 2 demonstrates the behavioral risk assessment in probabilistic terms, and Fig. 3 the replacement of the real driving instructor by the corresponding BAS model, an intended Bayesian prototype of a PADAS. How can this model be derived by methods of Bayesian driver modeling, and what is the use of it? The best way to explain this is an obstacle scenario which is known to generate intention conflicts within the driver (Fig. 4). When an obstacle (animal, car) appears unexpectedly, people autonomously react with a maneuver M-- which is not recommended by experts. With M--, drivers try to avoid collisions even at high
Fig. 2. Driving behavior of an Alzheimer driver in a simulated intersection incursion [15] and risk assessment by a probabilistic normative Bayesian driver model residing in the subject vehicle
Fig. 3. Cooperative driving scenario with in-vehicle-cooperation between non-expert driver and Bayesian normative driver model (BAS prototype) (background graphics from [7] with kind permission of the publisher)
velocities by steering to the left or right, risking a fatal turnover. The recommended maneuver M+ includes the hold and brake sub-maneuvers. When drivers are instructed to drive M+, these data provide the learning data for the BAS version of the PADAS according to the methods of Sec. 3.2. With an existing BAS model, a worst-case scenario is planned to test the services of the BAS. Drivers are instructed not to drive according to M+ (not M+ = M--). Because of the probabilistic nature of the BAS it is possible to compute the conditional probability P(Action_t | M+). This is a measure of the anomaly of the driver behavior
Fig. 4. Scenario with conflicting maneuvers
under the hypothesis that the observed actions are generated by following the correct maneuver M+.
3.2 Bayesian Autonomous Driver (BAD) Models
Due to the variability of human cognition and behavior and the presently irreducible lack of knowledge about cognitive mechanisms, it seems rational to conceptualize, estimate, and implement probabilistic models when modeling traffic agents. In contrast to other models, probabilistic models can be derived objectively from the empirical distributions of the random variables of interest with only a few axiomatic assumptions. Model validity is thus included in the modeling process by model-driven data analysis without any ex-post validation. BAD models describe phenomena on the basis of variables and joint probability distributions (JPDs). This is in contrast to models in cognitive architectures (e.g. ACT-R), which try to simulate cognitive algorithms and processes on a granular basis that is difficult to identify with, e.g., functional magnetic resonance imaging (fMRI) methods [16, 17]. Instead a more abstract mapping is possible: the mapping of the activation of entire ACT-R modules into the states of a Hidden Markov Model (Fig. 5). At present these activations are the only dynamic aspects of ACT-R models which can be empirically identified by brain imaging techniques [16, 17].
3.2.1 Hidden Markov Models (HMMs)
Currently we are evaluating the suitability of static and dynamic graphical models known as Hidden Markov Models (HMMs) or Bayesian Belief Nets (BBNs). With the static type it is possible to generate reactive models [3] and inverse (naïve) [18] models. Our research [1, 2] has shown that static models generate behavior which is too erratic for human behavior. As a consequence we focus on the dynamic type of real-time control for simulated cars [2]. The dynamic type enables the creation of Markov Models (MMs), Hidden Markov Models (HMMs) [19-21], Input-Output-HMMs (IOHMMs) [22], Reactive IOHMMs (RIOHMMs; Fig. 5), and Coupled RIOHMMs (CRIOHMMs; Fig. 6) [23]. HMMs allow the recognition of situations, goals, and intentions and the generation of behavior of Belief-Desire-Intention (BDI) agents. RIOHMMs implement driver models, e.g., with
ACT-R module activations. The two arrows into the random variable nodes Zj denote the combined dependence of actions on sensory inputs and activations of hidden ACT-R modules or brain regions. Even if module activations were known, sensory inputs would still be necessary to propose actions. CRIOHMMs model dyads of agents with mutual belief influences. The belief state of each agent depends only on its own history and on the belief state of its partner. Whether this is plausible has to be tested by conditional independence hypotheses. Within each agent the model is of the RIOHMM type.
Fig. 5. Reactive Input-Output HMM (RIOHMM), a slightly simplified version of Bengio and Frasconi [22]
Fig. 6. Coupled Reactive Input-Output Hidden Markov Models (CRIOHMM)
There is a trade-off between HMMs and DBNs. Inference in HMMs is more efficient than in DBNs, whereas the state space in HMMs grows more rapidly than in corresponding DBNs. This is especially true when the HMM is used not only for situation recognition but also for real-time control of behavior.
3.2.2 Dynamic Bayes Net Models (DBNs)
In our current research [2] we strive for the realization of the dynamic Bayesian model (Fig. 7). It implements the sensory-motor system of human drivers in the functional autonomous layer or stage of Anderson [24]. It is a psychologically motivated mixture-of-experts (= mixture-of-schemas) model with autonomous and goal-based attention allocation processes. It implements the autonomous layer of a cognitive agent, is distributed across two time slices, and avoids the latent state assumptions of HMMs. Learning data are time series of relevant variables: percepts, goals, and actions. We can model individual or groups of human and artificial agents. The model propagates information in various directions. When working top-down, goals emitted by the associative layer select a corresponding expert (schema), which propagates actions, relevance of areas of interest (AoIs), and perceptions. When working bottom-up, percepts trigger AoIs, actions, experts, and goals. When the task or goal is defined and the model has certain percepts, evidence can be propagated simultaneously top-down and bottom-up, and the appropriate expert (schema) and its
behavior can be activated. Thus, the model can be easily extended to implement a modified version of the SEEV visual scanning or attention allocation model of Horrey, Wickens, and Consalus [25]. Please note that due to our modification the indices have changed. In contrast to Horrey et al. the model can predict the probability of attending a certain AoI on the basis of single, mixed, and even incomplete evidence (goal priorities, percepts, effort to switch between AoIs).
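A minimal sketch of such an inference over AoIs, here reduced to a two-node discrete network with the goal as parent; all tables and labels are hypothetical.

```python
# Hypothetical prior over goals and CPD P(AoI | Goal); with an unobserved
# goal the AoI prediction marginalizes over it, with an observed goal the
# corresponding row is used directly -- single or incomplete evidence.
p_goal = {"lane_keeping": 0.7, "overtaking": 0.3}
p_aoi_given_goal = {
    "lane_keeping": {"road_ahead": 0.8, "mirror": 0.2},
    "overtaking":   {"road_ahead": 0.4, "mirror": 0.6},
}

def p_aoi(goal=None):
    if goal is not None:                      # complete evidence on the goal
        return dict(p_aoi_given_goal[goal])
    result = {aoi: 0.0 for aoi in ("road_ahead", "mirror")}
    for g, pg in p_goal.items():              # incomplete evidence: marginalize
        for aoi, p in p_aoi_given_goal[g].items():
            result[aoi] += pg * p
    return result

print(p_aoi())              # marginal attention allocation
print(p_aoi("overtaking"))  # allocation given a known goal
```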
Fig. 7. Mixture-of-Experts (= Mixture-of-Schema) Architecture of Bayesian Autonomous Driver (BAD) Model with visual attention allocation extension (mapping ideas of Horrey et al. [25] into the Bayesian network domain)
There are various scientific challenges in designing and implementing BAD models. The first main challenge is to generate driver behavior by a mixture-of-experts architecture. While mixture-of-experts approaches are known from pattern classification [26], this is the first time that this approach is used in human modeling. The second main challenge is that we want to integrate various perceptual invariants known as tau-measures [27] from psychological action control theory into a computational human model. In conventional models, variables with different dimensions (distances, angles, times, changes, etc.) are input to the models. Tau-measures transform all non-time measures into the time domain. Some measures are already used in engineering (TTC, TTLC). The role of these invariants for the psychology of motion control has been discussed since Lee [28]. This is the first time that these measures are used to generate behavior in a probabilistic mixture-of-experts (MoE) model. In a MoE model it is assumed that behavior can be generated context-dependently as a mixture of ideal schematic behaviors (= experts). Thus the stability/plasticity dilemma [29] of neural network models is avoided. New behavior can be learnt by adding a new expert to the library of experts. Experts do not influence each other directly. Pure expert behavior without
any additional mixture component is shown only in typical pure situations (e.g. the perception of a hairpin triggers the hairpin expert). All probabilistic models presented here can be constructed by data mining single or aggregated drivers' behavior traces in experimental settings with or without experimentally induced goals.
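The following sketch illustrates the mixture-of-experts idea in its simplest form: a gating distribution over schemas weights Gaussian action proposals, and new behavior is added as a new expert without touching existing ones. Expert names, parameters, and gate values are hypothetical.

```python
# Experts as Gaussian action models (mean, std of a steering action) and a
# gating distribution P(Expert | percepts, goals) over them.
experts = {"hairpin": (-0.30, 0.05), "straight": (0.00, 0.02)}
gate = {"hairpin": 0.85, "straight": 0.15}

# Mixture mean as the emitted action (an Expectation-style policy).
action = sum(gate[name] * mean for name, (mean, _) in experts.items())

# Learning a new behavior = adding a new expert; the existing experts are
# untouched, which is how the stability/plasticity dilemma is sidestepped.
experts["s_curve"] = (0.10, 0.04)
```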
4 Summary
We discussed two kinds of (partial) cooperative traffic scenarios, with within-vehicle- (driving school) and between-vehicle-cooperation (Shared Space). Either individual or groups of human agents can be modeled by Bayesian Autonomous Driver (BAD) models according to the Bayesian Programming approach. Learning data are time series of pertinent variables: percepts, goals, and actions. Modeling ideally correct behavior may provide the basis for Bayesian prototypes of partial autonomous assistance systems (BAS models). Because of the probabilistic nature of the BAS it is possible to compute the conditional probability P(Action_t | M+), a measure of the anomaly of the driver behavior under the hypothesis that the observed actions are generated by interpreting the correct maneuver M+. This makes it possible to define thresholds for PADAS interventions.
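As an illustration of such a threshold, the sketch below scores observed actions under a Gaussian stand-in for P(Action_t | M+) and flags an intervention when the likelihood falls below a cut-off; the model parameters and the threshold are hypothetical.

```python
import numpy as np
from scipy.stats import norm

mu_mplus, sigma_mplus = -2.0, 0.8   # assumed brake-action profile under M+
threshold = 0.01                    # assumed intervention threshold

observed = np.array([-1.9, -2.2, 0.5])  # last action: swerving, not braking
likelihood = norm.pdf(observed, mu_mplus, sigma_mplus)  # P(Action_t | M+)
intervene = likelihood < threshold  # -> [False, False, True]: trigger support
```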
References
1. Möbus, C., Eilers, M.: First Steps Towards Driver Modeling According to the Bayesian Programming Approach. In: Urbas, L., Goschke, T., Velichkovsky, B. (eds.) KogWis 2008, Tagungsband der 9. Fachtagung der Gesellschaft für Kognitionswissenschaft, p. 59. Verlag Hille, Dresden (2008)
2. Möbus, C., Eilers, M.: Further Steps Towards Driver Modeling According to the Bayesian Programming Approach. In: Conference Proceedings, HCI 2009, Digital Human Modeling. LNCS (LNAI). Springer, San Diego (2009) (accepted)
3. Lebeltel, O., Bessière, P., Diard, J., Mazer, E.: Bayesian Robot Programming. Autonomous Robots 16, 49–79 (2004)
4. Bessiere, P., Laugier, C., Siegwart, R. (eds.): Probabilistic Reasoning and Decision Making in Sensory-Motor Systems. Springer, Berlin (2008)
5. Xiang, Y.: Probabilistic Reasoning in Multiagent Systems - A Graphical Models Approach. Cambridge University Press, Cambridge (2002)
6. Hutchins, E.: Cognition in the Wild. MIT Press, Cambridge (1995)
7. Löper, C., Kelsch, J., Flemisch, F.O.: Kooperative, manöverbasierte Automation und Arbitrierung als Bausteine für hochautomatisiertes Fahren. In: Gesamtzentrum für Verkehr Braunschweig (Hrsg.) Automatisierungs-, Assistenzsysteme und eingebettete Systeme für Transportmittel, pp. 215–237. GZVB, Braunschweig (2008)
8. Gluck, K.A., Pew, R.W.: Modeling Human Behavior with Integrated Cognitive Architectures. Lawrence Erlbaum Associates, Mahwah (2005)
9. Anderson, J.R.: How Can the Human Mind Occur in the Physical Universe? Oxford University Press, Oxford (2007)
10. Baumann, M., Colonius, H., Hungar, H., Köster, F., Langner, M., Lüdtke, A., Möbus, C., Peinke, J., Puch, S., Schießl, C., Steenken, R., Weber, L.: Integrated Modeling for Safe Transportation – Driver Modeling and Driver Experiments. Fortschrittsbericht des VDI in der Reihe 22 (Mensch-Maschine-Systeme) (in press)
11. Bischof, N.: Struktur und Bedeutung: Eine Einführung in die Systemtheorie. Huber, Bern (1995)
12. Jagacinski, R.J., Flach, J.M.: Control Theory for Humans: Quantitative Approaches to Modeling Performance. Lawrence Erlbaum Associates, Mahwah (2003)
13. Wickens, T.D.: Models for Behavior: Stochastic Processes in Psychology. Freeman, San Francisco (1982)
14. Gopnik, A., Tenenbaum, J.B.: Bayesian networks, Bayesian learning and cognitive development. Developmental Science 10(3), 281–287 (2007)
15. Rizzo, M., McGehee, D.V., Dawson, J.D., Anderson, S.N.: Simulated Car Crashes at Intersections in Drivers With Alzheimer Disease. Alzheimer Disease and Associated Disorders 15(1), 10–20 (2001)
16. Anderson, J.R., Fincham, J.M., Qin, Y., Stocco, A.: A Central Circuit of the Mind. Trends in Cognitive Science 12(4), 136–143 (2008)
17. Qin, Y., Bothell, D., Anderson, J.R.: ACT-R meets fMRI. In: Zhong, N., Liu, J., Yao, Y., Wu, J., Lu, S., Li, K. (eds.) Web Intelligence Meets Brain Informatics. LNCS (LNAI), vol. 4845, pp. 205–222. Springer, Heidelberg (2007)
18. Le Hy, R., Arrigoni, A., Bessière, P., Lebeltel, O.: Teaching Bayesian Behaviours to Video Game Characters. Robotics and Autonomous Systems 47, 177–185 (2004)
19. Oliver, N., Pentland, A.P.: Graphical Models for Driver Behavior Recognition in a SmartCar. In: IEEE Intl. Conf. Intelligent Vehicles, pp. 7–12 (2000)
20. Miyazaki, T., Kodama, T., Furuhashi, T., Ohno, H.: Modeling of Human Behaviors in Real Driving Situations. In: 2001 IEEE Intelligent Transportation Systems Conference Proceedings, pp. 643–645 (2001)
21. Kumagai, T., Sakaguchi, Y., Okuwa, M., Akamatsu, M.: Prediction of Driving Behavior through Probabilistic Inference. In: Proceedings of the 8th International Conference on Engineering Applications of Neural Networks (EANN 2003), pp. 117–123 (2003)
22. Bengio, Y., Frasconi, P.: Input/Output Hidden Markov Models for Sequence Processing. IEEE Transactions on Neural Networks 7, 1231–1249 (1996)
23. Oliver, N.M.: Towards Perceptual Intelligence: Statistical Modeling of Human Individual and Interactive Behaviors. Ph.D. Thesis, MIT (2000)
24. Anderson, J.R.: Learning and Memory. John Wiley, Chichester (2002)
25. Horrey, W.J., Wickens, C.D., Consalus, K.P.: Modeling Drivers' Visual Attention Allocation While Interacting With In-Vehicle Technologies. J. Exp. Psych. 12, 67–78 (2006)
26. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Heidelberg (2006)
27. Lee, D.N.: How movement is guided (2006), http://www.perception-in-action.ed.ac.uk/publications.htm
28. Lee, D.N.: A theory of visual control of braking based on information about time-to-collision. Perception 5, 437–459 (1976)
29. Hamker, F.H.: RBF learning in a non-stationary environment: the stability-plasticity dilemma. In: Howlett, R.J., Jain, L.C. (eds.) Radial Basis Function Networks 1: Recent Developments in Theory and Applications. Studies in Fuzziness and Soft Computing, ch. 9, vol. 66, pp. 219–251. Physica Verlag, Heidelberg (2001)
A Contribution to Integrated Driver Modeling: A Coherent Framework for Modeling Both Non-routine and Routine Elements of the Driving Task
Andreas Mihalyi1, Barbara Deml1, and Thomas Augustin2
1 Universität der Bundeswehr München, Human Factors Institute, Werner-Heisenberg-Weg 39, 85577 Neubiberg, Germany {Andreas.Mihalyi,Barbara.Deml}@UniBW.de
2 Ludwig-Maximilians-Universität München, Institut für Statistik, Ludwigstraße 33, 80539 München, Germany [email protected]
Abstract. This paper is concerned with computational driver modeling, whereby a particular focus is placed on mapping both non-routine and routine elements of the driving task in a theoretically coherent framework. The approach is based on Salvucci's [1] driver model; thus, the cognitive architecture ACT-R [2] is used for modeling non-routine matters, while for routine activities, such as the longitudinal and the lateral control of the vehicle, a fuzzy logic approach is suggested. In order to demonstrate the applicability of this procedure, an empirical evaluation study is carried out and the steering behavior of a computational driver model is compared to that of human drivers.
Keywords: Fuzzy logic, cognitive architecture, ACT-R, driver modeling.
1 Introduction
During the last years much research has been put into providing computational models of how humans drive a car. Usually these simulations are based on psychological principles and most of them are able to emulate certain aspects of driving behavior in real time (for a review see [3]). This research is not only useful for gaining a better understanding of how humans execute a rather common everyday task, but it is also helpful for improving current interfaces [1]: within the context of usability studies, a computational driver model may be used as a rapid prototyping tool which replaces human subjects in order to facilitate the evaluation process. This information is also helpful for predicting the intentions of drivers in order to design advanced driver assistance systems (e.g. lane-change detection). Although driving a car is a quite common task, it turns out to be rather complex when breaking it down into its subcomponents in order to model the underlying psychological processes. For instance, the following issues are to be addressed [4]: How do drivers allocate their limited attentional resources? How do they acquire situation awareness? To consider these topics, both a strong psychological theory and a programming environment are needed. For higher-level, executive processes such a framework is provided by a cognitive architecture. However, much of what we do
when driving is a lower-level, automated skill, and hence may not be modeled within a cognitive architecture. Making this distinction between executive and automated processes is common practice: executive processing "is characterized as a slow, generally serial, effortful, capacity-limited, subject-regulated processing model that must be used to deal with novel or inconsistent information" [5, p. 2]. In contrast to this, the automatic mode is described as "a fast, parallel, fairly effortless process that (…) is not under direct subject control, and is responsible for the performance of well-developed skilled behaviors" [5, p. 1]. In the context of car driving there are some aspects, like navigation or route planning, which afford executive reasoning. Just the same, there are also some automated skills, such as gear shifting or steering. Although this separation between executive and automated activities is sensible, it must not be forgotten that both occur simultaneously in one task. Groeger [4, p. 65] assumes "that what we are observing (…) is the product of a single continuum of control, rather than separate control systems, separately involved in routine and nonroutine activities". On this account a modeling approach is needed that is not only able to map these two distinct processes, but is also able to integrate them into one coherent framework. The remainder of the article is organized as follows: first, an approach to integrated driver modeling is presented which is based on a cognitive architecture (Sec. 2). While this approach seems to be suited for modeling executive processes, it is suggested to improve the modeling of automated tasks by relying on fuzzy logic (Sec. 3). In order to demonstrate the applicability of this procedure, a fuzzy controller is realized which models the steering behavior (Sec. 4). This controller is implemented in a driver model and compared to human data (Sec. 5); its outcome and issues for further research are discussed at the end (Sec. 6).
2 Integrated Driver Modeling
Due to the complexity of the driving task, Salvucci [1] suggested an integrated approach in which the cognitive sub-tasks (e.g. plan to overtake, remember a vehicle in the blind spot) are modeled within a cognitive architecture (Sec. 2.1), whereas the automated sub-tasks (e.g. steer around a curve, follow a lead vehicle) are modeled by further theoretical assumptions (Sec. 2.2).
2.1 Modeling Non-routine, Executive Activities
A cognitive architecture is both a theory of human information processing and a computational framework for simulating human behavior. Here the architecture Adaptive Control of Thought – Rational (ACT-R) [2] is used; its main components are modules, buffers, and a pattern matcher (Fig. 1). The architecture consists of perceptual-motor modules and memory modules. The first group of modules is concerned with the interface to the real world; there are visual, auditive, speech, and motor modules. In addition, there are two kinds of memory modules; the declarative memory consists of factual knowledge (e.g. a red traffic light
indicates to stop) and the procedural memory represents knowledge about how we do things (e.g. if you plan to overtake, then you have to accelerate). The modules, except for the procedural memory, are accessed through buffers, and for each module a dedicated buffer serves as the interface with that module. Although the processes within different modules can run in parallel, there is a serial bottleneck that corresponds to human limitations: the content of any buffer is limited to a single declarative unit of knowledge. Thus, only a single memory can be retrieved or a single perceptual object can be encoded at a time. Additionally, there is a pattern matcher, which searches the procedural memory for a production rule that fits the current contents of the buffers. Again there is a serial bottleneck, which is reasonable from a theoretical point of view: only one such production can be selected at a given moment. That production, when executed, can modify the buffers and thereby change the state of the system. Thus, in ACT-R cognitive processing is characterized as a successive firing of productions. Although only one production can be fired at a certain point in time, there may be several rules that match the state of the buffers (e.g. if there is a slower lead vehicle, then you have to brake / to overtake). For this reason, there is also a conflict resolution mechanism: the utility associated with each production is estimated by a sub-symbolic equation and only the production with the highest utility is selected for execution. To conclude, ACT-R is a hybrid architecture which consists both of a symbolic production system and a sub-symbolic part. It includes various well-tested assumptions on human information processing and it has been used successfully in numerous domains.
Fig. 1. The cognitive architecture Adaptive Control of Thought – Rational (ACT-R 6.0) [2]: perceptual-motor modules (visual, auditive, speech, motor) and memory modules (declarative, procedural) interface with the environment through buffers, pattern matching, and production compilation
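The conflict resolution mechanism described above can be sketched as follows; the production names, utilities, and noise scale are hypothetical, and the logistic noise mirrors ACT-R's noisy utility comparison only in spirit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Productions whose conditions match the current buffer contents,
# together with their estimated utilities U_i.
matching = {"brake": 4.0, "overtake": 3.5}

# Add noise to each utility and fire only the production with the
# highest noisy utility -- the serial bottleneck of conflict resolution.
noisy = {name: u + rng.logistic(scale=0.5) for name, u in matching.items()}
selected = max(noisy, key=noisy.get)
```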
As Salvucci's [1] driver model relies on this framework, too, it inherits the architecture's view on cognition. In addition, it makes further assumptions about the domain of driving, which become obvious when taking a look at how the model's control flow is defined or which production rules are used. Thus, for instance, the model states which information is needed for controlling a vehicle, as it indicates where to allocate attention and which objects to encode. Just the same, it specifies how a highway traffic environment is monitored in order to gain situation awareness, or how decisions, such as initiating a lane change, are made. As this article focuses mainly on the overall consistency of the modeling approach, these domain-specific assumptions are not addressed in detail here.
2.2 Modeling Routine, Automated Activities
Salvucci's [1] model does not only consider executive tasks; it also accounts for automated aspects of driving. In a highway environment most routine activities occur for the lateral (i.e. steering) and longitudinal control (i.e. accelerating, braking). For modeling these activities Salvucci and Gray [6] suggest a closed-loop control approach which is derived from a standard proportional-integral (PI) controller: while the lateral control law relies on the perceived visual direction of two salient points on the road, the longitudinal control law uses the time headway to a lead vehicle. As the subsequent motor execution of control, the vehicle's steering angle is adjusted (lateral control), and a depression of the accelerator or the brake pedal (longitudinal control) occurs. As the automated activities are part of a larger driving task, the control laws have to be integrated into the cognitive architecture. Due to ACT-R's serial cognitive processor a discrete formulation of both control laws is needed. This is convenient from a computational point of view, but it also has theoretical implications: first, it matches the assumption that humans operate on a discrete "clock" with a cycle time of 50 ms [2]. Second, the discrete form works with an irregular updating frequency; it is thus suited to cope with distraction or inattention while driving. Although this model has yielded good results in validation studies [1], some issues still have to be discussed:
Vagueness of perception. The modeling strategy is based on the assumption that drivers rely on exact values, such as the visual angle to salient road points or the time headway to a lead vehicle. In contrast to this, it is much more likely that people rely on broader categories in which several instances are summarized. For example, when cruising on a highway it tends to be the overall belief of being too close to a lead vehicle that lets us brake, rather than the observation that our distance has dropped below "20 m".
Coherent modeling framework. Groeger [4, p. 74] supposes that "conscious control (…) along with well practiced routines, is part of a single control architecture". However, in Salvucci's [1] model the automated elements seem to be "outsourced" from the cognitive architecture rather than being part of a coherent framework. This becomes obvious when taking a look at how automaticity develops. According to Fitts [7] there are three stages in the acquisition of skills: the initial, cognitive, stage is seen as tightly linked to verbal descriptions. For instance, driving instructors usually provide some rules when explaining to a learner when to change gears. A common characteristic of the second, associative, stage is that verbal mediation is reduced. The final stage is automatic, and verbalization is neither needed nor possible. Salvucci's approach maps the initial stage with the cognitive architecture and the last stage with the control laws, but it is not obvious that these stages are part of one task.
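A discrete update in the spirit of the two-point control law might look as follows; the gains and the exact form are illustrative assumptions rather than the published parameterization.

```python
def steer_update(d_theta_far, d_theta_near, theta_near, dt,
                 k_far=16.0, k_near=4.0, k_i=3.0):
    """One cycle's change of the steering angle from the changes of the
    far- and near-point visual angles plus an integral-like near-point term."""
    return k_far * d_theta_far + k_near * d_theta_near + k_i * theta_near * dt

# Example: one 50 ms update cycle with small perceived angle changes.
delta_phi = steer_update(0.002, -0.001, 0.01, 0.05)
```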
3 Fuzzy Logic for Modeling Automated Activities
In order to overcome some of the problems mentioned above, an alternative approach based on fuzzy logic is suggested. After a brief introduction to fuzzy control
(Sec. 3.1) it is argued why this is suited to span a coherent modeling framework together with a cognitive architecture (Sec. 3.2).
3.1 Fuzzy Logic and Fuzzy Control
According to [8] the parallel processing of automated activities affords that different rules are evaluated at the same time and that their outputs are merged. Whenever more than one rule is applied at a certain moment, it is possible that the conditional parts of the rules are only partly met. The problem of combining rules which apply to a certain context only to some degree can be modeled by fuzzy logic. Fuzzy logic was introduced by Zadeh [9] and allows a mathematical representation of vagueness. Within classic set theory an element is either part of a given set or not part of it. In contrast, in fuzzy logic the membership of an element in a certain set is considered a matter of degree. A fuzzy set A is specified by a membership function mA defined on the universe of discourse X, commonly containing values in the interval from 0 to 1 (normalized). For control applications, for instance, X may be defined as the error value of the control problem, which has the linguistic meaning "error". If X is divided into various fuzzy sets, then a linguistic term may be assigned to each single set that describes the meaning of the specific values of the set in natural language. Thus, for instance, "large" may refer to a fuzzy set with large values of X. Usually logic operations on fuzzy sets are carried out by relying on so-called T-norms and T-conorms [10]. For the intersection (i.e. logical and) the minimum operator is commonly used, while for the union (i.e. logical or) the maximum is typically applied. Due to the loss of information, the product operator is often preferred to the minimum [10]. By relying on such operators the if-condition of a rule can be expressed in a fuzzy way. In order to map a complete if-then rule a further operator is needed. For this purpose either the minimum or the product is suggested, whereby the membership function will be truncated when the first is applied, whereas it will be scaled when the latter is used. Therefore, fuzzy logic allows one to express and to evaluate logical if-then statements with both the antecedents and the consequence being fuzzy. The linguistic terms associated with each fuzzy set provide the opportunity to translate expert knowledge, stated in natural language, into a mathematical representation. Following this idea, Mamdani [11] suggested using fuzzy logic for control applications. Since then fuzzy controllers have been commonly used in a wide field of technical applications [12]. The core element of such a fuzzy controller is the knowledge base, which consists of a rule base and a data base. The rule base contains a set of if-then rules as described above. The data base holds the membership functions of the fuzzy sets belonging to each linguistic variable that is used for the control problem. For a given input all rules contained in the rule base are evaluated individually. To be useful for control tasks, all individual fuzzy outputs have to be merged and a crisp value for the output has to be derived; these steps are labeled aggregation and defuzzification, respectively (Sec. 4.1; [14]).
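A minimal sketch of these ingredients (triangular membership functions, the product T-norm for the antecedent, and scaling of the output set by the firing strength) is given below; the set parameters are hypothetical.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

error, d_error = 0.1, -0.05
# Antecedent "error is small_pos AND d_error is zero" via the product T-norm.
firing = tri(error, 0.0, 0.2, 0.4) * tri(d_error, -0.2, 0.0, 0.2)

# Scaled (product implication) rather than truncated (minimum) output set.
y = np.linspace(-1.0, 1.0, 201)
scaled_output = firing * tri(y, 0.0, 0.25, 0.5)
```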
3.2 Fuzzy Logic and Cognitive Modeling
Against the background of this work the fuzzy approach is sensible for two reasons:
Vagueness of perception. In contrast to Salvucci's [6] approach, which relies on crisp input values, fuzzy logic is more in line with the human tendency of representing perceptual input in broader and vague categories in order to cope with the complexity of the environment. Thus, instead of braking because the time headway to a lead vehicle has fallen below a certain value, it is more likely that this maneuver is initiated by the reasoning of being "too close".
Coherent modeling framework. From a formal point of view fuzzy control is also more akin to a cognitive architecture than common control theory. The idea of splitting the controller into a rule base and a data base corresponds to the way knowledge is implemented in ACT-R, namely as procedural and declarative memory. Thus, the conjoint use of both approaches addresses Groeger's [4, p. 74] and Fitts' [7] demand for processing both tasks in a single architecture. There is also a major difference that is essential from a theoretical point of view: whereas rules are processed in a parallel manner by a fuzzy controller, they are processed serially by ACT-R (i.e. firing of one production at one point in time). The fuzzy approach is suited to map the automatic processing of routine activities, whereas ACT-R models are able to describe the executive processing of non-routine activities.
4 Implementation of a Fuzzy Controller in a Driver Model
To demonstrate that the fuzzy approach is also of practical value, both the structure and the configuration of such a controller (Sec. 4.1) are presented (for more details see [13]). As the controller consists of numerous parameters, human data were recorded (Sec. 4.2) and optimal parameter values were derived empirically (Sec. 4.3).
4.1 Structure and Configuration of the Controller
According to [6], humans rely on the visual information of two points when steering a vehicle, a near point and a far point. Corresponding to these two sources of information, the controller consists of two PD-like rule bases. The inputs are the angles between the car heading and the near point or the far point, respectively, and their derivatives. The output is the change of the steering angle. Each input dimension was partitioned into seven fuzzy sets arranged symmetrically around zero, using triangular membership functions and trapezoidal ones for the borders [10]. The support of the functions varied in size, being small around zero and larger in the extremes. In consequence, for small deviations from the set point more rules apply and thus a more differentiated response can be provided to these more frequent events. The input membership functions of two adjacent fuzzy sets were chosen to cross at a level of 0.5, as this has generally turned out to be optimal in terms of rise time and over/undershoot behavior of the controller [15]. For the outputs the same strategy was applied, but here only triangular membership functions were used. Due to the fixed cross-point level and the symmetry around zero, it is
possible to limit the description of the seven output sets of each rule base to four parameters. These parameters are obtained by fitting the controller to empirical data obtained in an experiment via an optimization procedure (Sec. 4.3); the overall structure of the controller is summarized in Fig. 2.
Fig. 2. Structure of the fuzzy controller with two independent PD-like rule bases, one for the near point (inputs ϕN, ΔϕN) and one for the far point (inputs ϕF, ΔϕF), and the center-of-sums method for defuzzification,

y* = [ω Σ_{i=1..m1} M_Ni + (1−ω) Σ_{j=1..m2} M_Fj] / [ω Σ_{i=1..m1} A_Ni + (1−ω) Σ_{j=1..m2} A_Fj],

where M denotes the moments and A indicates the areas of the scaled output membership functions (N for near-, F for far-point)
Aggregating the individual rule outputs into a total output can be a rather complex and time-consuming step. Therefore, defuzzification procedures are proposed which skip the step of aggregation and rely directly on the individual outputs; within this work the center-of-sums method [14] [16] is used. Thereby, the crisp output is derived by dividing the sum of the (weighted) moments by the sum of the (weighted) areas of the scaled output membership functions (the area of a fuzzy set on X is defined as A = ∫ m(x) dx and the moment as M = ∫ x m(x) dx). As can be seen in Fig. 2, all rule outputs referring to one source of information (i.e. near or far point) were weighted equally. Due to the lack of prior knowledge, it is assumed that information from the near and the far point contribute in equal parts; therefore ω was set to 0.5.
4.2 Experimental Setup and Training Sample
In order to achieve a human-like behavior of the controller, empirical data were collected. Twelve human drivers were instructed to cover a round course in a driving simulation, based on the Microsoft XNA racing game and displayed in a VisionDome V4 (Elumens Corp.). The speed was set constant, thus the subjects only had to control the simulation via a steering wheel. Their driving behavior was logged and the variables of interest were computed. These were the current far and near point for every log entry (as defined by [6]), the angles between the vectors from the car to these points and the car heading, as well as the time-discrete derivatives of these angles. From each person's data set, a random sample of 1000 points was extracted; together these comprised the training sample.
4.3 Optimization of the Controller via Sequential Quadratic Programming
On the basis of this training set, the optimal values for eight parameters (Sec. 4.1) are to be derived. These parameters are restricted to be larger than zero. Because of the nonlinear behavior of a fuzzy controller and the constrained optimization problem, sequential quadratic programming with active sets and default parameters was chosen
as optimization algorithm [17]. As error function, the sum of the squared deviations between the controller output and the values of the training sample was taken. Because an optimization with specific starting values might fail to find the global minimum and instead end in a local one, the procedure was repeated 100 times with randomly drawn starting values. Finally, the parameter set with the minimal deviation from the training sample was applied.
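The multi-start procedure can be sketched with SciPy's SLSQP implementation of SQP; the error function below is a placeholder for the actual squared-deviation criterion over the eight parameters.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def controller_error(params):
    # Placeholder: in the study this is the sum of squared deviations
    # between controller output and the 12,000-point training sample.
    return float(np.sum((params - 1.0) ** 2))

bounds = [(1e-6, None)] * 8        # all eight parameters restricted to > 0
runs = [minimize(controller_error, rng.uniform(0.1, 5.0, size=8),
                 method="SLSQP", bounds=bounds)
        for _ in range(100)]       # 100 random restarts against local minima
best = min(runs, key=lambda r: r.fun)
```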
5 Evaluation Study
In order to test the optimized controller an evaluation study was carried out. As was done with the training sample, twelve participants were instructed to drive in the middle of a three-lane circuit in a smooth way. The same track was covered twelve times by the driver model. To challenge the robustness of the controller, a different track (Fig. 3) was used this time. Two segments were analyzed in more detail, a curved and a rather straight one. For both segments two steering parameters were derived: to assess the accuracy of keeping the instructed course, the mean of the absolute deviation from the middle lane was computed. For quantifying the degree of flickering, the absolute heading errors (i.e. deviation between car and road heading) were assessed. The outcome of the study is presented (Sec. 5.1) and discussed (Sec. 5.2) below.
Fig. 3. Top view on the test track with a curved and a straight road segment
When comparing the steering behavior of the human participants to that of the computational driver model, one immediately notices two things. First, the driver model manages to remain within the middle lane as instructed, but in doing so it reveals a rather "sporting" driving style with a tendency to cut the corners. Thus, the mean deviation from the middle is mC = 0.71 m on the curved segment, while it accounts for mS = 0.55 m on the straight segment. In contrast to this, the human drivers deviate far less and their mean values account for mC = -0.04 m and mS = 0.13 m, respectively. Second, it seems that the driver model has to steer more often on straight segments compared to the human participants in order to remain stable; it appears to be more "nervous". For a more systematic analysis, an ANOVA for repeated measurements was carried out. As between-subjects factor, the data of the human participants were compared to those of the driver model, and for both groups the two road segments were contrasted;
Fig. 4. Mean absolute deviation in meters (left) and mean absolute heading error in radian (right) for the human participants (continuous line) and the computational driver model (dashed line) as well as for two distinct road segments
as dependent variables both the mean absolute deviation (Fig. 4 left) and the mean absolute heading error (Fig. 4 right) were considered. Since the data are not normally distributed, they were Box-Cox transformed before being analyzed [18]. Significance values are reported based on the transformed data, whereas means and mean differences are derived from the original data set in order to facilitate interpretability. For the mean absolute deviation the between-subjects factor was significant with p = .007 (F(1, 22) = 8.70), whereby it was 0.2 m less for the human drivers than for the driver model. The within-subjects factor was also significant, whereby the mean deviation for the straight segment was reduced by 0.23 m (p < .001, F(1, 22) = 61.84). The interaction was not significant with p = .34 (F(1, 22) = 0.94). The mean heading errors of the computational driver model were also significantly larger, with an estimated difference of 0.002 rad, compared to the heading errors of the human drivers (p = .004, F(1, 22) = 10.43). The straight segment of the track also led to a reduced heading error, with a mean difference of 0.01 rad when compared to the curve (p < .001, F(1, 22) = 57.67). The interaction was also significant with p = .007 (F(1, 22) = 8.91).
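The transformation step can be sketched with SciPy; the data below are made-up positive deviations, and in the study the estimated transform would be applied before the repeated-measures ANOVA.

```python
import numpy as np
from scipy.stats import boxcox

rng = np.random.default_rng(0)
abs_deviation = np.abs(rng.normal(0.5, 0.2, size=24)) + 1e-6  # must be > 0

transformed, lam = boxcox(abs_deviation)  # lam is the estimated Box-Cox lambda
```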
6 Discussion
To summarize, the driver model was able to cope with the simulated driving task. Thus, enhancing a cognitive architecture such as ACT-R with a fuzzy approach is not only sensible from a theoretical point of view but also feasible. Three issues need to be addressed in future research. For one, due to the lack of prior knowledge, the two sources of information (i.e. near and far point) were weighted equally, with ω = 0.5. By increasing ω a more centered position in curved segments may be achieved. This parameter may serve to shift the driving style from a "sporting" to a rather "comfortable" manner. However, despite the better fit between the human drivers and the computational model, this should not be done without relying on more empirical data or a sound theoretical basis. Second, Wilkie and colleagues [19] assume that different people also choose different strategies for acquiring far-point information. In the case that not all drivers refer to the tangential point when steering around a curve, a more centered driving behavior would occur, especially in
Finally, it has to be discussed whether the lateral road position should be varied when running the model repeatedly, so as to produce a variance over trials similar to that shown by the study's participants. Acknowledgments. The authors would like to thank Prof. Dario Salvucci for making available the source code of his driver model.
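To make the role of the weighting parameter ω concrete, the following is a hedged sketch of a weighted two-point steering law in the spirit of [6]. The gains and variable names are illustrative assumptions, not the authors' fuzzy controller.

def steering_rate(theta_near, theta_far, d_theta_near, d_theta_far,
                  omega=0.5, k_rate=16.0, k_stab=3.0):
    # theta_near / theta_far: visual angles (rad) of the near and far point;
    # d_theta_*: their rates of change (rad/s); omega weights the far point.
    d_theta = (1.0 - omega) * d_theta_near + omega * d_theta_far
    # The rate term tracks the weighted angle change; the stabilizing term
    # pulls the near-point angle back toward the lane center.
    return k_rate * d_theta + k_stab * theta_near

With omega above 0.5 the far point dominates, which should yield the more centered line through curves discussed above.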
References 1. Salvucci, D.D.: Modeling driver behavior in a cognitive architecture. Human Factors 48, 362–380 (2006) 2. Anderson, J.R., et al.: An integrated theory of the mind. Psychological Review 111, 1036– 1060 (2004) 3. Cacciabue, P.C.: Modelling Driver Behaviour in Automotive Environments. Springer, Heidelberg (2007) 4. Groeger, J.A.: Understanding Driving. Taylor & Francis, Philadelphia (2001) 5. Schneider, W., et al.: Automatic and controlled processing and attention. In: Parasuraman, R., Davies, D.R. (eds.) Varieties of attention, pp. 1–27. Academic Press, Orlando (1984) 6. Salvucci, D.D., et al.: A two-point visual control model of steering. Perception 33, 1233– 1248 (2004) 7. Fitts, P.M.: Factors in complex skill training. In: Glaser, R. (ed.) Training research and education, pp. 177–197. Pittsburgh Univ. Press, Pittsburgh (1962) 8. Neisser, U.: The multiplicity of thought. British Journal of Psychology 54, 1–14 (1963) 9. Zadeh, L.A.: Fuzzy sets. Information and Control 8, 338–353 (1965) 10. Bandemer, H., et al.: Einführung in Fuzzy-Methoden. Theorie und Anwendungen unscharfer Mengen. Akademie-Verlag, Berlin (1990) 11. Mamdani, E.H.: Application of fuzzy algorithms for the control of a simple dynamic plant. In: Proc. IEEE, pp. 1585–1588. IEEE Press, New York (1974) 12. Seising, R.: The Fuzzification of Systems: The Genesis of Fuzzy Set Theory and its Initial Applications - Developments up to the 1970s. Springer, Heidelberg (2007) 13. Mihalyi, A.: Implementierung einer Fuzzy-Regelung in eine kognitive Architektur. Master Thesis. Ludwig-Maximilians-Universität, München (2008) 14. Driankov, D., et al.: An introduction to fuzzy control. Springer, Heidelberg (1996) 15. Boverie, S., et al.: Fuzzy logic control compared with other automatic control approaches. In: Proc. IEEE, pp. 1212–1216 (1991) 16. Leekwijck, W., et al.: Defuzzification: Criteria and classification. Fuzzy Sets and Systems 179, 159–178 (1999) 17. MatLab Optimization Toolbox 4, http://www.mathworks.com 18. Box, G.E.P., Cox, D.R.: An analysis of transformations. Journal of the Royal Statistical Society, B 26, 211–234 (1964) 19. Wilkie, R., et al.: Controlling Steering and Judging Heading. JEP: HPP 29, 363–378 (2003)
The New BMW iDrive – Applied Processes and Methods to Assure High Usability Bernhard Niedermaier, Stephan Durach, Lutz Eckstein, and Andreas Keinath BMW Group, Germany {Bernhard.Niedermaier,Stephan.Durach,Lutz.Eckstein, Andreas.Keinath}@bmw.de
Abstract. With iDrive, the BMW Group introduced in 2001 a revolutionary HMI concept, which was the first able to cope with the constantly increasing number of functions in the automobile. It was designed to optimally support drivers in their various tasks while driving. The basic iDrive concept can be described as separating driving functions from comfort functions as well as separating displays from controls. This basic concept, together with a highly mounted display, ensures that controls can be reached without needing to look at them and that the central display is easy and quick to access. The trendsetting iDrive idea has been widely adopted in the automotive industry. The following article outlines the iterative design and evaluation process that led to the new generation of iDrive introduced in 2008 with the new BMW 7 Series. The basic challenge was to come up with an evolution of the iDrive concept, improving it without losing the revolutionary approach to automotive HMI design. Keywords: BMW, iDrive, HMI, automotive, usability.
1 Driver Orientation One precondition for the design of such a new HMI is to know customer needs worldwide. This is especially true if the HMI is designed to incorporate the latest and upcoming technology without being technology driven. Hence, before starting the actual design process, customer requirement clinics in the core markets were set up and publicly available reports from numerous sources were reviewed to understand current user needs and to extrapolate future trends. In cooperation with different universities, a number of projects with a broad scope were started to define new methods for evaluation and conceptual work. As a result of this research it was found that most HMI innovations come from the consumer electronics industry; however, they cannot be transferred to the automotive sector one-to-one. Designing an HMI for use while driving has to pay much more attention to installation, information presentation, interaction logic and system behaviour in order to ensure compatibility with the driving task, usability and attractiveness. Therefore, the main focus has to be on designing a driver-orientated HMI that is the synapse between driver and vehicle: Driver Orientation means an intuitively understandable HMI, which is efficient in usage and widely accepted. It enables the driver to drive safely while using the vehicle's functionality.
1.1 Competent Interaction through Perfect Ergonomics An important prerequisite of a driver oriented HMI is an ergonomically perfect layout of the geometry of the driver's working place: 1.2 Optimal Reachability of Controls All controls are positioned in a way that they can be reached from a comfortable seating position. Therefore the CAD-based man-model RAMSIS has been used to define suitable locations. Fig. 1 shows the area the driver can reach without detaching the right shoulder from the seat. It is obvious that the driver orientated centre stack significantly improves the reachability of the outer controls for the driver.
Fig. 1. Driver orientation through optimal positioning of controls
1.3 Information Presentation While controls are positioned primarily in the lower part of the dashboard, displays are located in the upper part of the dashboard at an accommodation-friendly distance relative to the eyes of the driver. Displays are thus very close to the driving scene, and the driver can pick up relevant information with very short glances (Fig. 2). The optional head-up display (HUD) ultimately presents all urgent and driving-relevant information in the driver's primary field of view with no need of refocusing the eyes. In addition to ergonomic installation and well-structured information presentation, the design of the interaction itself plays a decisive role in enabling the driver to competently and confidently interact with the vehicle's functionality. 1.4 Intuitive and Efficient Interaction All functions which can be operated need to be prioritized on the basis of typical use-cases according to their relevance while driving. Important criteria are the frequency
of use and the relevance for driving and comfort. Functions with a high priority can be directly accessed by a hardkey; medium- or low-priority functions can be operated menu-driven within the central information display. Controls and displays for high-priority functions in the driving area are directly assigned to the driver. Comfort features are positioned in the middle of the vehicle and can be operated by driver and passenger (Fig. 2). The HMI concept should provide an intuitive and easy-to-learn operation for the driver at first contact. Therefore it is essential to minimize the number of interaction paradigms which the driver needs to understand across all vehicle functions. Regarding long-term use, the interaction should be efficient and easy to memorize. Since the HMI concept is designed for safe operation while driving, secondary-task interaction must be interruptible at any time with no negative consequences for the driver. This criterion and other important criteria which were applied during the design of interaction can be found in the European Statement of Principles (ESoP) on HMI [1].
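The prioritization rule described above can be illustrated with a small sketch. The weights, threshold and example functions are our assumptions for illustration, not BMW data.

def assign_access(functions, hardkey_threshold=0.7,
                  w_frequency=0.5, w_relevance=0.5):
    # functions: mapping name -> (frequency of use, driving relevance), both in [0, 1]
    layout = {}
    for name, (freq, rel) in functions.items():
        priority = w_frequency * freq + w_relevance * rel
        layout[name] = "hardkey" if priority >= hardkey_threshold else "menu"
    return layout

print(assign_access({"volume": (0.9, 0.8), "destination entry": (0.3, 0.6)}))
# -> {'volume': 'hardkey', 'destination entry': 'menu'}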
Fig. 2. Driver oriented positioning of display and controls
1.5 System Behaviour In order to achieve high acceptance of the HMI concept, the driver needs to be given immediate feedback on every control input. Functionality which may attract the driver's attention, like TV, DVD or games, should be automatically disabled as soon as the vehicle starts to move. Again, a description of the most important design principles is given in the ESoP [1]. 1.6 Joy of Use Besides these scientifically based criteria, the interaction concept needs to be attractive. Joy of use is a necessary prerequisite for sheer driving pleasure. Therefore an appealing graphical design as well as high-value and precise controls are indispensable.
2 The HMI Concept 2.1 Driving Area Pressing the start/stop button is all that is required to start the vehicle. The key insert is omitted thanks to the convenience starting function. The instrument cluster is designed using black-panel technology, and therefore reveals only the four round instruments. The cruise control system is controlled on the left-hand side of the multifunctional steering wheel (Fig. 3). The visibility of the controls in combination with their clear labeling enables intuitive operation. In line with operation, the corresponding feedback and displays are shown in the left half of the instrument cluster. The set speed is displayed along the speedometer scale with direct reference to the vehicle's current speed. The status displays for the Active Cruise Control system with stop & go function and the lane departure warning system are combined to form an easily understandable schematic portrayal of the current traffic situation. Via the controls on the right-hand steering wheel spoke, the driver can access basic entertainment and telephone functions. To achieve this, a list-based view of the current audio source or the redial list appears in the right area of the instrument cluster. The control for operating the lists is a rotary pushbutton thumbwheel on the right-hand steering wheel spoke. The customer can additionally choose from a range of state-of-the-art driver assistance systems. According to their effect, the systems have been arranged in four groups in the cockpit. The controls for those driver assistance systems which support the driver in perceiving and interpreting the traffic situation are located beneath the central light switch: e.g. the lane change warning system, lane departure warning system, collision warning system, Night Vision with person recognition and the Head-Up Display. Functions which support the driver during parking and maneuvering can be activated in the immediate vicinity of the gear selector lever and the parking brake. The navigation system additionally provides the driver with innovative functions which inform him about the traffic situation ahead of the vehicle. The speed limit display shows the currently applicable speed limit. The Dynamic Drive Control system enables the vehicle's handling characteristics to be adjusted.
Fig. 3. Simple access to all information relevant to driving via the multifunctional steering wheel, the instrument cluster and the Head-Up Display
2.2 Comfort Area The functions in the comfort area are characterized by the fact that they facilitate direct operation – without any direct reference to the driving task. The controls are positioned in such a way that they make a clear spatial reference to the function and can be comfortably reached from the driving position. Examples of these include the seat adjustment facility on the seat, the window lifter actuators in the door or the sliding/tilting sunroof actuator in the roof lining. Further comfort functions such as the audio and air conditioning systems, central information display operation or the USB connection for MP3 players can be comfortably operated by both the driver and the front passenger (Fig. 4). They are therefore located in the center stack or in the area of the center console.
Fig. 4. Audio and air conditioning system controls with direct access
2.3 Screen Operation Screen operation is also of central importance for the comfort area. The high-resolution display, which offers a resolution of 1280 x 480 pixels, enables brilliant and easily legible depiction of the screen contents. Three premises are vital for achieving real driver orientation: 1. Maximum compatibility between operation and visual depiction: The rotary control is depicted and integrated into the design of the screen's interface. The semantics of the visual representation enables the driver to recognize what interaction is possible at all times. 2. Extremely simple orientation in the system: From the very beginning, the system behaves comprehensibly and enables rapid orientation with short operation routes and transparent menu structures. 3. Attractiveness thanks to an efficient, customer-oriented design: An attractive screen design and high-quality graphics are important prerequisites for a high level of acceptance. However, an attractive system is only created by combining visual quality with individual options for variably configuring the screen's subdivision and contents. The screen operation system developed under these premises is characterized by the following features: • Control element: The design of the controller follows the principle of the previous iDrive controller and extends it with direct selection buttons for the radio, CD/multimedia,
telephone and navigation function areas (Fig. 5). Analogous to an Internet browser, the BACK button enables the driver to walk back along the history of interaction through the menus as often as he wishes. Like a PC's right-hand mouse button, the OPTION button offers the driver the possibility to quickly access less frequently required functions. As before, the controller can be rotated and pressed and, as a further development of the previous pushing movement, tilted.
Fig. 5. Controller with direct access buttons
• Interaction logic and screen layout: The controller's degrees of freedom are indicated in the design of the menus (Fig. 6; Fig. 7). The central element in the menu design is the image of the controller, which visualizes the valid interaction possibilities. Different menu levels are presented and operated with the aid of the panel principle (Fig. 6): the branched-off submenu overlaps the previous panel in the form of a separate panel. Thanks to the panel's optical offset, the driver can recognize that he is accessing a lower functional area. By tilting the controller to the left, the driver can run through the entire menu hierarchy up to the main menu. A variable menu layout was developed to meet diverse customer requirements regarding the desired display contents. To achieve this, the customer can select a split-screen view. The screen area available for operation is dynamically adapted. This creates a second display area for additional information (Fig. 7), which the driver can configure according to his individual requirements.
Fig. 6. Menu level visualization via overlapping panels
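The history-based navigation sketched above (descend into overlapping panels, BACK pops one step, tilting left returns to the main menu) maps naturally onto a stack. The following is an illustrative sketch of that principle; names are our own, not BMW code.

class MenuController:
    def __init__(self, main_menu="main"):
        self.history = [main_menu]  # stack of open panels

    def select(self, submenu):
        # Open a submenu as an overlapping panel (press / tilt right)
        self.history.append(submenu)

    def back(self):
        # BACK button: walk one step back along the interaction history
        if len(self.history) > 1:
            self.history.pop()

    def to_main_menu(self):
        # Tilting left repeatedly runs up the hierarchy to the main menu
        del self.history[1:]

    def current_panel(self):
        return self.history[-1]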
Fig. 7. Optimal use of the available screen area with the split-screen active or deactivated
3 Customer Integration in the Whole HMI Design Process Competent interaction cannot simply be achieved at the drawing board. It is of particular importance to be aware of the real objectives that have to be achieved. To ensure a goal-oriented development, customers have been involved from the early phases of the development process onwards. In the following, this approach will be explained.

Fig. 8. Iterative HMI design and evaluation process
As can be seen from Fig. 8, HMI design starts with a set of prerequisites, among which the first is the definition of the functionality. After a clear definition of all functions, the next step is to define and prioritize the use-cases in which the functions are expected to be used. These use-cases have been verified and extended to future needs in customer focus groups. Another prerequisite before starting the actual design process is to structure all internal and external requirements; one important example of such requirements is the ESoP [1]. Before starting the actual design and evaluation process, customer requirement clinics in the core markets were set up and publicly available reports from numerous sources were reviewed to understand current user needs and to extrapolate future trends. Further, in cooperation with different universities and research institutes, a number of projects were started developing new methods for evaluation and carrying out conceptual work. With this necessary input the first iteration of the HMI design can start. The particular challenge is to align these project-specific requirements and constraints with the general prerequisites for a driver-oriented HMI described in the first part of this paper. The design and evaluation process for the new iDrive has been an iterative process (Fig. 8). Generally, all concept iterations have been evaluated according to usability
and acceptance criteria as well as their suitability for use while driving. Customers have therefore played a central role in defining the necessary prerequisites and in the concept evaluation.

Fig. 9. Reducing variants of the HMI concept by evaluation and validation using objective and subjective criteria
As Fig. 9 shows, the actual design and evaluation process started initially with four teams in competition. One of the teams was based in the United States (US) to identify and address specific user needs of the US market. Accompanied by ongoing usability tests, each team developed an individual interpretation of the new iDrive. Relying on the basic prerequisites, a wide variety of initial iDrive concepts emerged. Each team built an operational mock-up (Fig. 10) suitable to be connected to the BMW driving simulator. These four iDrive concepts have been evaluated with the involvement of a set of representative customers, measuring subjective and objective usability as well as driver behaviour and performance parameters. It is important to note that the customer evaluations were not only static usability tests or acceptance measurements but also evaluated system use while driving, together with driving behaviour and eye-tracking data.
Fig. 10. Evaluation in the operational mock-ups with driving simulation
Based on the results of this competitive customer evaluation of four concept alternatives, the best and most promising ideas and features of each concept have been reassembled into two new concept alternatives. Both of them have additionally been detailed within the strong constraints of the complete vehicle development process. The two concept alternatives were again built up in two operational mock-ups and evaluated using driving simulation with customers; the same methods, scenarios and use-cases were used as described above. As a result, the two concept alternatives have been synthesized into one concept based on the results of the extensive customer evaluation, including a number of market-specific solutions, e.g. a Japanese speller. The resulting final concept has been specified in all details and implemented in a prototype vehicle and a final mock-up for ongoing evaluation and usability testing. Additionally, a mobile driving simulator was designed, capable of all features of the static BMW driving simulation but using three plasma screens displaying the frontal driving scenery instead of a projection. Finally, the mock-up and the mobile driving simulator, together with the prototype vehicle, were used for a worldwide evaluation study in the most important markets. This study was set up to combine traditional methods of market research, investigating customer acceptance and attractiveness, with formal usability testing including driving behaviour and performance measurement. After incorporating the results of this market study into the final iDrive concept, field testing was done using the prototype vehicle, now fully equipped with data loggers and cameras for measuring glance data, interruptibility, user behaviour and driving performance. These field tests were used to validate the driving simulator data and finally prove the suitability of the concept for use while driving. New evaluation methods developed with universities and institutes have also been incorporated, namely the occlusion technique [2] to quantify the interruptibility of the concept alternatives. Other methods like the Peripheral Detection Task [4] or the Lane Change Task [3] have been developed and used to measure object and event detection in early phases of the ongoing design and evaluation process. The whole set of
Fig. 11. Test methods used in the validation process
evaluation methods and settings used in the evaluation process, ranging from simple usability methods to field tests and to methods incorporating a simulated or abstract driving task, is shown in Fig. 11.
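As a concrete illustration of the occlusion technique mentioned above, the following sketch computes the two metrics commonly derived under ISO 16673 [2], the Total Shutter Open Time (TSOT) and the resumability ratio R. The interval durations and task time are invented example values, not measurements from this study.

def occlusion_metrics(open_intervals_s, unoccluded_task_time_s):
    # open_intervals_s: durations (s) of the shutter-open viewing periods
    # needed to complete the task under occlusion
    tsot = sum(open_intervals_s)        # Total Shutter Open Time
    r = tsot / unoccluded_task_time_s   # low R suggests the task is well interruptible
    return tsot, r

tsot, r = occlusion_metrics([1.5, 1.5, 1.2], unoccluded_task_time_s=5.0)
print(f"TSOT = {tsot:.1f} s, R = {r:.2f}")  # TSOT = 4.2 s, R = 0.84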
4 Conclusion Developing a new iDrive generation means an evolution of a revolution. Hence, the customer has been at the centre of an iterative design and evaluation process. Not only customer satisfaction was taken into account but also objective data on driver behaviour and driving performance, reflecting the compatibility between iDrive interaction and the driving task. A wide variety of methods has been applied to ensure that the new iDrive generation is intuitively understood, efficient and accepted, and enables drivers from all relevant markets to drive safely and benefit from the vehicle's functionality. The new generation iDrive is designed to enhance and contribute to sheer driving pleasure.
References 1. European Commission: Commission Recommendation on safe and efficient in-vehicle information and communication systems: Update of the European Statement of Principles on Human Machine Interaction. Official Journal of the European Communities, February 6 (2007) 2. ISO 16673: Road vehicles - Ergonomic aspects of transport information and control systems - Occlusion method to assess visual distraction due to the use of in-vehicle systems 3. ISO TC22/SC13/WG8 draft: Road vehicles - Ergonomic aspects of transport information and control systems (2008) 4. Jahn, G., Oehme, A., Krems, J.F., Gelau, C.: Peripheral detection as a workload measure in driving: Effects of traffic complexity and route guidance system use in a driving study. Transportation Research Part F 8, 255–275 (2005)
Method to Evaluate Driver's Workload in Real Road Context Annie Pauzié National Research Institute on Transport and Safety Laboratory Ergonomics & Cognitive Sciences in Transport 25 avenue François Mitterrand Case 24 - 69675 -Bron Cedex / France [email protected]
Abstract. Innovative technology implemented in the vehicle can improve road safety, as long as its acceptability and its adequacy are checked, taking into account the diverse driver population's needs and functional abilities through a Human Centred Design process. Relevant methodology has to be developed for this purpose. Evaluation of the driver's mental workload is an important parameter, complementary to objective ones such as control of the vehicle and the driver's visual strategies. This paper reports on three real-road experiments run for the assessment of the usability of mobile phone and guidance/navigation systems. The evaluation was based upon a method of subjective evaluation of the driver's mental workload: the Driving Activity Load Index (DALI). Use of the DALI allowed identifying which aspects of the system had to be improved, for better acceptability and usability by the drivers.
1 Introduction In the context of Information and Communication Technology (ICT), a number of studies have been conducted to investigate the road safety consequences linked to the implementation of these innovative functions [6, 31]. The objective is to weigh the potential interference induced by these in-vehicle systems against the potential benefits brought by the new functions available to support the driving task. To conduct this approach, it is necessary to have efficient methodology that can be applied according to the type of function, the type of system and the context [3]. A quite exhaustive overview of available methodologies, tools and techniques has been conducted in the framework of the Network of Excellence HUMANIST [11]. Classically, the parameters to take into consideration for safety evaluation are related to the vehicle: for example, trajectory deviations consequent to system use [38], the drivers' visual strategies and the visual demand due to an on-board screen [23], more generally the driver's behavior [4, 33], and the overall driver's workload according to the situation [16]. The assessment of workload is coupled with the task difficulty as experienced by the individual [12], in particular because several reactions to the task demands are possible. The individual can adapt his behaviour to an increased demand, leading to a higher cost and no perceptible effect on the performance or, on the
contrary, he can decide to change his strategy and accept a lower performance. Moderate increases in task difficulty may thus produce few observable changes in error rate, as the driver attempts to keep performance constant by allocating more resources to the task [37]. Furthermore, inter-individual strategies are variable; some individuals develop more effective strategies which require less effort to reach a given level of performance than others. For all these reasons, objective performance measures are not sufficient by themselves to evaluate the overall constraint of a given situation. Mental workload is a psychological construct, difficult to define and difficult to assess. O'Donnell & Eggemeier [20] defined workload as the portion of the operator's limited capacity that is actually required to perform a particular task. According to this definition, mental workload depends upon the task demands in relation to the amount of resources the operator is willing or able to allocate, and is therefore a relative concept [17]. In order to define workload, the concept of "effort" is of primary importance: processing effort in resource allocation, and effort for the mobilization of additional resources as a compensatory process, in relation to task demand [1, 32]. Although there is no universally accepted definition of mental workload, a consensus suggests that it can be conceptualized as the interaction between the structure of systems and tasks on the one hand, and the capabilities, motivation and state of the human operator on the other [13, 19, 34]. Mental workload is a variable that is difficult to assess in comparison with other variables. Several methods have been developed to measure an individual's mental workload [9]: measurement of physiological parameters [7], dual-task methods [29], and methods in which the driver formalizes his own judgment about the workload he experienced, such as the Subjective Workload Assessment Technique (SWAT) [27] and the NASA Task Load Index (NASA-TLX) [14]. Although objective measures of mental workload have considerable theoretical interest, subjective measures have been recommended for practical use because they have many advantages and few disadvantages compared with objective measures [8, 21, 35]. This paper discusses subjective methods for the evaluation of the driver's mental workload.
2 Subjective Evaluation of Driver's Mental Workload 2.1 Methods for Subjective Evaluation of Mental Workload The subjective method allows evaluation rather than measurement of mental workload by establishing relative comparisons between situations. It must then be considered as a global and even a "crude" criterion, able to detect important phenomena only. Furthermore, subjective evaluation has to be conducted in addition to other objective measures such as performance [28]. The SWAT is a sophisticated workload assessment test composed of a two-step process: in a scale development phase, the data necessary to develop a workload scale are obtained from individuals; during an event scoring phase, people rate the workload associated with a particular task [26]. The primary assumption of SWAT is that workload is a function of three dimensions: time load, mental effort load and psychological stress, each dimension having three possible levels. All possible combinations of the three levels of each dimension yield a 27-cell three-dimensional matrix to represent workload.
The NASA-TLX method assumes that workload is influenced by mental demand, physical demand, temporal demand, performance, frustration level and effort. After assessing the magnitude of each of the six factors on a scale, the individual performs pairwise comparisons between these six factors in order to determine the higher source of workload for each pair. A composite score quantifying the level of workload is set up by using both the factor ratings and the relative weights computed from the comparison phase. The NASA-TLX has been tested and used by the army, being considered more sensitive than other methods and well accepted by the operator [15]. The DALI, for Driving Activity Load Index, is a revised version of the NASA-TLX, adapted to the driving task [25]. As previously mentioned, mental workload is multidimensional and, among other things, depends upon the type of task. Indeed, the NASA-TLX was originally set up for pilots. The basic principle is the same as for the TLX, with a scale rating procedure for six pre-defined factors, followed by a weighting procedure in order to combine the six individual scales into a global score. The main difference lies in the choice of the main factors composing the workload score. Considering the TLX, one of the factors to be rated is the physical component, usually defined in the following terms: "How much physical activity was required? (pushing, pulling, turning, controlling, activating, ...)". This question would not be very relevant for the driving activity, where the control of the vehicle is quite automatic for an experienced driver and where maneuvers are not supposed to be physically demanding in modern cars. Another example is the mental component, defined in the TLX as follows: "How much mental and perceptual activity was required? (thinking, deciding, calculating, remembering, looking, searching, ...)". This statement covers both perceptive and cognitive aspects of the workload, and in the context of the driving task it would be interesting to be able to distinguish these modalities. Finally, the evaluation of the performance factor can be made using objective data. The subjective rating of a good performance by the driver can show discrepancies with the measured one, but this difference might be due to many factors other than the mental workload itself (low or high self-esteem, motivation to fit the standard performance, ...). The procedure to set up the DALI was to ask various experts involved in driving task studies to define which were, in their opinion, the main factors inducing mental workload for people driving a vehicle equipped with an on-board system (car phone, driving aid system, radio, ...). This investigation led to the following definitions of the six workload dimensions of the DALI: effort of attention, visual demand, auditory demand, temporal demand, interference and situational stress. The main results of previous and recent studies using the DALI tool are summarized in the following paragraphs, in order to give an overview of the advantages and the limits of this task load index according to various contexts and purposes of investigation. 2.2 Evaluation of Driver's Workload Using Hand-Free Mobile Phone The possibility to use a mobile phone while driving raised the issue of road safety. Indeed, contrary to other Information and Communication Technology developed
to support the driving task, the activity of phoning is disconnected from the task itself. There is thus no benefit of this function in terms of enhancing the driving task for the driver. Use of the mobile phone while driving would induce a high probability of interference in terms of attentional demand for the driver [30, 22]. Nevertheless, the literature reports mixed findings, linked to the modalities of the experimental conditions. In order to test the load of conversation, Wetherell [36] reviewed secondary verbal task techniques for assessing mental load while driving. According to this author, drivers showed no significant changes in their driving performance, which made him assume that the task priority was maintained as intended. Brown, Tickner & Simmonds [5] used a verbal reasoning task based on grammatical transformation in order to assess the effect of phoning while driving. Results indicated increased errors in judgment of gaps, decreased skill in steering through narrow gaps and decreased speed. Drory [10] measured driver behaviour when using a mobile phone in a driving simulator. No serious performance decrement was found concerning the driving activity, except when the subjects actually dialed the number. Mikkonen & Backman [18] studied the influence of phone conversation on driving performance in a familiar urban environment. In this case, drivers paid more attention to their task, increasing their alertness and their anticipation behavior. Tokunaga et al. [30] showed the negative impact of the complexity of the phone conversation on reaction time and on NASA-TLX values, both for young and old drivers.

Fig. 1. Factors and Global Value of the DALI for hand-free mobile phone use
Advantages: Allows better understanding of how the implementation of a system in a vehicle is experienced by the driver (in this example, the most significant factor was "interference" due to phoning while driving). Limits: Subjective evaluation gives information only on the driver's awareness of the workload he or she experienced in a defined context.
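The global values shown in Fig. 1 result from the rating-and-weighting procedure described in Sect. 2.1. The following is a hedged sketch of such a TLX-style computation; the 0-5 rating scale and the tallying rule are assumptions based on the NASA-TLX procedure [14], not the authors' exact implementation.

from itertools import combinations

FACTORS = ["attention", "visual", "auditory",
           "temporal", "interference", "stress"]

def dali_global(ratings, pairwise_choices):
    # ratings: factor -> rating (e.g. on a 0-5 scale)
    # pairwise_choices: for each factor pair, the factor judged the higher
    # source of load during the comparison phase
    n_pairs = len(list(combinations(FACTORS, 2)))  # 15 pairs for 6 factors
    assert len(pairwise_choices) == n_pairs
    weights = {f: pairwise_choices.count(f) / n_pairs for f in FACTORS}
    return sum(weights[f] * ratings[f] for f in FACTORS)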
In order to evaluate the efficiency of the DALI as a tool for the assessment of mobile phone use, an experiment was carried out in a real road context (see Pauzié & Pachiaudi [25] for the details of the experimental protocol and results). As the DALI is defined according to perceptual and cognitive factors, the objective was to investigate which factors received the highest values on this workload index. Results indicated that the global value of the mental load increased significantly when phoning and driving in comparison with the reference situation corresponding to no system use. The load index was significantly high for the "auditory" and "interference" factors, in addition to "stress". The effort of "attention", although higher than during simple driving, did not increase significantly. So, in terms of subjective evaluation of the workload, drivers identified the disturbance induced by phoning, through the perceptive channel of audition, on managing the driving task, thus inducing a stressful context. Through this example, it is possible to illustrate how a subjective tool can help in understanding factors of mental workload and the cost of the task through the driver's awareness. 2.3 Evaluation of Driver's Workload Using Navigation/Guidance Function Navigation and guidance functions have been developed to support drivers at the strategic level of the driving task, in orientation processes, impacting also the operational level by allowing the driver to anticipate the coming maneuver. Theoretically, relying on auditory and visual instructions in order to take decisions about way finding should decrease mental workload and reduce driving errors. Nevertheless, this objective can only be reached if the system is correctly designed, with correct timing for displayed messages and clear, legible and visible visual information. Evaluation of the driver's mental workload, in addition to driving errors, supports the process of investigating the usability and acceptability of the system design developed for this function [2]. Two experiments, conducted nine years apart [24], are especially revealing about the importance of distinguishing between the "benefit of the function" (e.g. "instructions to guide the driver") and the "design of the system for this function" (e.g. "modalities of displayed instructions, timing of auditory messages, legibility and understandability of messages"). The first experiment tested the first generation of GPS system, the second one the new generation of system, both implemented by the same car manufacturer. Driver's workload and old generation of guidance system. According to the DALI "global" score, use of the system corresponded to a significantly higher workload for the driver in comparison with the "reference" situation, whatever the option, guidance or electronic map. Considering the details of the DALI factors, "auditory" and "temporal" demands were the critical factors, rather than "visual" load, with high values for "stress" and "attention". Based upon these data, it was possible to identify that this specific system, whatever the option (guidance or navigation), had poor timing when delivering auditory instructions to the driver. This result also showed that the navigation option induced a high rate of interference, in comparison with the two other contexts of guidance and reference.
Fig. 2. Factors and Global Value of the DALI for Guidance and Navigation. Old generation system.
Advantages: Allows identifying the weaknesses of the design characteristics that induced workload for the driver (in this example, poor timing of the auditory messages). Limits: Objective variables such as driving errors need to be analyzed to complete the investigation (in this example, the driving error values could modulate the conclusions regarding the reference situation versus the guidance situation; using the system may be costly for the driver but may induce fewer driving errors). Driver's workload and new generation of guidance system. Diversified contexts of orientation processes were set up, varying according to their theoretical level of workload for the driver, in order to test the validity of the DALI method. Overall, four situations were identified, from HIGH to LOW demand: a "complex system" requiring cognitive and perceptive attentional demand, a "paper map" with no system, a correctly designed "guidance system", and a "human co-pilot" giving instructions to the driver. The four situations were processed in a real road context and in counterbalanced order [24]. According to the DALI global score, there is a significant difference between the four experimental sessions (Wilcoxon, Z = 3.007, p = .003; Z = 2.224, p = .026; Z = 2.539, p = .011; Z = 3.923, p < .001). More precisely, contrary to the previous experiment, "use of guidance instructions" generally induced a lower workload than "use of a paper map", identified as the "reference" in the previous paragraph. Looking at the detail of the DALI factors, it appears that the support of the system for the driver shows in terms of "stress", "interference between driving and finding the route", "temporal", "visual" and "attentional" demand, with significant differences. Of course, "auditory" demand was not rated by the driver in the context of paper map use.
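The pairwise session comparisons above are Wilcoxon signed-rank tests; a minimal sketch using SciPy follows. The data layout, file name and column names are assumptions for illustration.

import pandas as pd
from scipy.stats import wilcoxon

# Hypothetical file: one row per driver, one column of DALI global scores
# per experimental session
scores = pd.read_csv("dali_scores.csv")
sessions = ["complex_system", "paper_map", "guidance_system", "co_pilot"]
for a, b in zip(sessions, sessions[1:]):
    stat, p = wilcoxon(scores[a], scores[b])
    print(f"{a} vs {b}: W = {stat:.1f}, p = {p:.3f}")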
Fig. 3. Factors and Global Value of the DALI for Guidance and Navigation. New generation system.
Advantages: Allows identifying correctly designed functions of an in-vehicle system, able to support the driving task, in comparison with a situation with no system (in this example, driving with a guidance system well designed in terms of visual and auditory messages (timing, loudness, content) induced less workload for the factors "attention", "visual", "temporal", "interference" and "stress" in comparison with the situation of paper map use). Limits: Several contexts need to be set up to be able to use this tool (a reference situation versus the tested situation, or several tested systems), as it yields relative and not absolute results. These results thus demonstrated that a guidance system correctly designed in terms of visual and auditory messages (timing, loudness, content) is an added value for the driver, making the orientation task lighter in terms of cognitive and perceptive processes. Furthermore, the DALI results showed a higher level of workload while using the system in comparison with relying on the human co-pilot. The hypothesis can be made that this system requires a training phase longer than the duration of this experiment in order for the driver to be fully comfortable with the system. Additional testing with a longer training phase could indicate whether the system can be equivalent to a human co-pilot or not. In any case, the DALI results indicated that this system is superior to a paper map.
3 Conclusion Driver’s mental workload is a variable complementary to behavioral and performance variables, bringing additional information and allowing broader understanding about the complex interactions between driver and system. The DALI as a subjective evaluation allowed gathering data usable by the designer to improve his prototype. It allowed also identifying impact of a given system implementation by comparing results with a reference situation with no system. One of the main advantages of this tool is the possibility to identify origins of the driver’s workload, allowing then to correct the situation at this identified level (e.g. interference and visual load indicate that an in-vehicle system has a demanding visual display). The possible improvement would be to add factors linked to specific aspect of the driving task useful to evaluate impact of ADAS functions (e.g. level of stress to keep distance with the vehicle ahead, in the case of a system having an impact on this specificity of the driving task). It is planned to conduct further investigations to improve this method by varying the type of situations.
References 1. Aasman, J., Mulder, G., Mulder, L.J.M.: Operator effort and the measurements of heartvariability. Human Factors 29, 161–170 (1987) 2. Ashby, M.C., Fairclough, S.H., Parkes, A.M.: A comparison of route navigation and route guidance systems in an urban environment. In: Proceedings of ISATA Conference (1991) 3. Bekiaris, E., Stevens, A.: Common risk assessment methodology for advanced driver assistance systems. Transport Reviews 25(3), 283–292 (2005) 4. Brookhuis, K., de Waard, D., Janssen, W.: Behavioural impacts of Advanced Driver Assistance Systems–an overview. EJTIR 1(3), 245–253 (2001) 5. Brown, I.D., Tickner, A.H., Simmonds, D.C.V.: Interference between concurrent tasks of driving and telephone. Journal of Applied Psychology 53, 419–424 (1969) 6. Carsten, O.M.J., Nilsson, L.: Safety Assessment of Driver Assistance Systems. European Journal of Transport and Infrastructure Research 1(3), 225–243 (2001) 7. Casali, J., Wierwille, W.A.: On the measurement of pilot perceptual workload: a comparison of assessment techniques addressing sensitivity and intrusion issues. Ergonomics 27(10), 1033–1050 (1984) 8. Colle, H.A.: Context Effects in Subjective Mental Workload Ratings. Human Factors 40 (1998) 9. De Waard, D.: The measurement of drivers’ mental workload, PhD thesis, Traffic Research Centre VSC, University of Groningen, The Netherlands, 125 p. (1996) 10. Drory, A.: Effects of rest versus secondary task on simulated truck driving task performance. Human Factors 27(2), 201–207 (1985) 11. Gelau, C., Stevens, A., Cotter, S.: Impact of IVIS on driver workload and distraction: Review of assessment methods and recent findings, Deliverable D.2/E.2, HUMANIST NoE (2004) 12. Gopher, D., Donchin, E.: Workload - an examination of the concept. In: Boff, K.R., Kaufman, L., Thomas, J.P. (eds.) Handbook of perception and human performance. Cognitive processes and performance, vol. II. Wiley, New York (1986)
13. Gopher, D., Donchin, E.: Workload – An examination of the concept. In: Boff, K., Kaufman, L., Thomas, J. (eds.) Handbook of Perception and Performance. Cognitive Process and performance, vol. II, pp. 41/1–41/49. Wiley, New York (1986) 14. Hart, L.A., Staveland, L.: Development of the NASA Task Load Index (TLX): Results of empirical and theoretical research. In: Hancock, P.A., Meshkati, N. (eds.) Human Mental Workload, pp. 139–183. North-Holland, Amsterdam (1988) 15. Hill, S., Lavecchia, H., Byers, J., Bittner, A., Zaklad, A., Christ, R.: Comparison of four subjective workload rating scales. Human Factors 34(4), 429–439 (1992) 16. Lansdown, T.C., Brook-Carter, N., Kersloot, T.: Distraction from multiple in-vehicle secondary tasks: vehicle performance and mental workload implications. Ergonomics 47(1/15), 91–104 (2004) 17. Meijman, T.F., O’Hanlon, J.F.: Workload. An introduction to psychological theories and measurements methods. In: Drenth, P.J.D., Thierry, H., Willems, P.J., de Wolff, C.J. (eds.) Handbook of Work and Organisational Psychology, pp. 257–288. Wiley, New York (1984) 18. Mikkonen, V., Backman, M.: Use of the car telephone while driving. Technical Report n° A39. Department of Psychology, University of Helsinki (1988) 19. Moray, N.: Mental workload since 1979. In: Oborne, D.J. (ed.) International Reviews of Ergonomics, vol. 2, pp. 123–150. Taylor & Francis, London (1988) 20. O’Donnell, R.D., Eggemeir, F.T.: Workload assessment methodology. In: Boff, K.R., Kaufman, L., Thomas, J.P. (eds.) Handbook of perception and Human Performance. cognitive processes and performance, vol. II, 42/1–42/49. Wiley, New York (1986) 21. O’Donnell, R.D., Eggemeier, F.T.: In: Boff, K.R., Kaufman, L., Thomas, J.P. (eds.) Handbook of Perception and Human Performance, vol. II. John Wiley and Sons, New York (1986) 22. Patten, C., Kircher, A., Östlund, J., Nilsson, L.: Using mobile telephones: cognitive workload and attention resource allocation. Accident Analysis & Prevention 36(3), 341–350 (2004) 23. Pauzié, A.: In-vehicle communication systems: the safety aspect. Injury Prevention Journal 8, 0–3 (2002) 24. Pauzié, A., Manzano, J.: Subjective evaluation of the driver’s workload in real road experiment, AIDE Deliverable 2.2.6 (2006) 25. Pauzié, A., Pachiaudi, G.: Subjective evaluation of the mental workload in the driving context. In: Rothengatter, T., Carbonell Vaya, E. (eds.) Traffic & Transport Psychology: Theory and Application, pp. 173–182. Pergamon, Oxford (1997) 26. Reid, G., Nygren, T.: The subjective workload assessment technique: a scaling procedure for measuring mental workload. In: Hancock, P.A., Meshaki, N. (eds.) Human Mental Workload. Elsevier, North-Holland (1988) 27. Reid, G., Shingledecker, C., Nygren, T., Eggemeier, T.: Development of multidimensional subjective measures of workload. In: International Conference on Cybernetics & Society, Atlanta, pp. 403–406 (1981) 28. Sheridan, T.: Human factors of driver-vehicle interaction in the IVHS environment, Center of Transportation Studies, MIT, n° DTNH22-89-Z-07595, NHTSA, U.S. Department of Transportation (1990) 29. Shingledecker, C.: Performance evaluation of the embedded secondary task technique. Aerospace Medical Association Annual Scientific Meeting, 151–152 (1982) 30. Tokunaga, R.A., Hagiwara, T., Kagaya, S., Onodera, Y.: Cellular telephone conversation while driving: Effects on driver reaction time and subjective mental workload, Human Performance: Driver Behavior, Road Design, and Intelligent Transportation Systems. 
Transportation Research Record 1724, 1–6 (2000)
31. Vaa, T., Gelau, C., Penttinen, M., Spyroupolou, I.: ITS and effects on road traffic accidents - State of the art. In: 13th World Congress on ITS, London, October 9 (2006) 32. Vicente, K.J., Thornton, D.C., Moray, N.: Spectral analysis of sinus arrhythmia: a measure of mental effort. Human Factors 29, 171–182 (1987) 33. Vitense, H.S., Jacko, J.A., Emery, V.K.: Multimodal feedback: an assessment of performance and mental workload. Ergonomics 46(1-3/15), 68–87 (2003) 34. Wickens, C.D., Kramer, A.: Engineering psychology. Annual Review of Psychology 36, 307–348 (1985) 35. Wierwille, W.W., Eggemeier, F.T.: Recommendations for mental workload measurement in a test and evaluation environment. Human Factors 35, 263–281 (1993) 36. Wetherell, A.: The efficacy of some auditory-vocal subsidiary tasks as measures of the mental load on male and female drivers. Ergonomics 24(3), 227–248 (1981) 37. Zeitlin, L.R.: Estimates of Driver Mental Workload: A Long-Term Field Trial of Two Subsidiary Tasks. Human Factors 37 (1995) 38. Zwahlen, H.T., Balasubramanian, K.N.: A theoretical and experimental investigation of automobile path deviation when driver steers with no visual input. Transportation Research Record 520, 25–37 (1974)
Intelligent Agents for Training On-Board Fire Fighting Karel van den Bosch, Maaike Harbers, Annerieke Heuvelink, and Willem van Doesburg TNO Defence, Security, and Safety PO Box 23, 3769 ZG Soesterberg, the Netherlands {karel.vandenbosch,maaike.harbers,annerieke.heuvelink, willem.vandoesburg}@tno.nl
Abstract. Simulation-based training in complex decision making often requires ample personnel for playing various roles (e.g. team mates, adversaries). Using intelligent agents may diminish the need for staff. However, to achieve goal-directed training, events in the simulation as well as the behavior of key players must be carefully controlled. We propose to do that by using a director agent (DA). A DA can be seen as a supervisor, capable of instructing agents and steering the simulation. We explain and illustrate the concept in the context of training in on-board fire fighting. Keywords: Virtual Training, Intelligent Agents, Simulation, Director Agent, Scenario Based Training.
1 Introduction Modern society has ample systems where one decision maker controls the safety of many. For example, the safety of aircraft passengers depends on the decisions of the pilot; a military commander's decision may entail danger or protection for soldiers and civilians; the fate of fire-fighters, bystanders and victims largely depends on the decisions of the fire officer. It is evident that for such safety-critical systems we need competent and experienced decision makers. From the literature we know that acquiring expertise for complex tasks is a matter of intensive, deliberate and reflective practice over time [1]. Scenario-based simulator training is considered appropriate for learning decision making in complex environments [2]. A simulation enables trainees to experience the causal relations between actions, events and outcomes in the simulated environment. It thus gives access to experiential learning, e.g. by free-play practice. However, goal-directed, systematic training is more effective than learning-by-doing only [3]. In order to make learning purposive and goal-directed, events in the simulation as well as the behavior of key players need to be carefully managed [4, 5]. Players in the scenario should respond realistically to any situation emerging from the trainee's actions, and the responses should keep the scenario on track of the learning goals. Common practice to realize this in simulation training is to use Subject Matter Experts (SMEs) (usually staff members) to play the role of key players [6]. SMEs have the expertise to take the context into account when evaluating (on-line) the
appropriateness of trainee behavior. They can also assess whether the scenario develops in the intended direction, and make adjustments if necessary. Thus, SMEs make it possible to deliver training that represents reality in terms of dynamics and complexity, whilst maintaining a high level of control. However, the need for SMEs elevates the costs of training, and staff is generally scarcely available. As a result, there are often (too) few opportunities to receive this type of training. Organizations acknowledge that developing expertise demands frequent, goal-directed, and intensive training. They are therefore looking for more flexible forms of simulation training that require fewer organizational and logistic efforts. A solution is to use virtual agents to play the required roles autonomously. If we can develop agents that produce intelligent and realistic behavior of the individual or entity they represent in training scenarios, we would be able to make training more cost-efficient. However, in order to make such agent-based training also goal-directed, we need an extra function. As SMEs do, the agent should consider which response will produce the best learning situation for the trainee, and then act accordingly. For instance, an agent may deliberately act inaccurately because this enables the trainee to achieve the learning goal "detecting and correcting errors made by team mates". What we therefore need is management of agent behavior to ensure that the scenario develops in service of the learning goals. In this paper we describe the development of a desktop-simulation training that is equipped with virtual players that can act independently and intelligently, but whose responses can also be adjusted to create or utilize emerging learning opportunities. The domain is on-board fire fighting, and the task to be trained is that of the commanding officer, the Officer of the Watch (OW). The Royal Netherlands Navy (RNLN) currently provides training in on-board fire fighting using a high-fidelity simulation. Due to the rare availability of other trainees to play the roles of team members, courses are organized infrequently and contain few simulator sessions. On request of the RNLN we are developing an agent-based simulator that is more flexible and requires fewer personnel. Figure 1 shows an impression of the trainer. The trainee controls the avatar of the OW. We developed agents that play the team roles in an intelligent and autonomous fashion. As we argued, autonomous agents are not sufficient to achieve goal-directed training. A form of control is needed to keep the scenario on track of the learning goals. One possibility is to expand cognitive models with didactical knowledge, thus enabling agents to take didactical considerations into account when deciding on an action [7]. However, we consider it important for agent development to separate the domain-related knowledge required to generate task behavior from the didactical knowledge required to exert control over the scenario. We therefore propose a "director agent" (DA). The proposed DA can be seen as a supervisor, capable of instructing agents to perform certain behavior (thereby overruling what agents would otherwise do) and capable of steering the simulation (thereby overruling the chain of events specified in the simulation model).
Fig. 1. Impression of the agent-based simulator for on-board fire-fighting training. (Courtesy of VSTEP, www.vstep.nl, the company that developed the simulation environment.)
The DA needs a rule set that defines the relations between learning goals, scenario states, and interventions (pertaining to both the simulation and the agents). With this rule set, it imposes constraints on the autonomy of the simulation and the agents for the benefit of maintaining control over the training.
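To make this concrete, the rule set can be thought of as a mapping from pairs of (learning goal, scenario state) to interventions. The following Python sketch is purely illustrative; the paper does not specify an implementation, and all names (Rule, Intervention, the example rule) are hypothetical:

```python
# Minimal, hypothetical sketch of a DA rule set: it maps an active learning
# goal plus a predicate over the current scenario state to an intervention.
# None of these names come from the paper; they only illustrate the idea.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Intervention:
    target: str          # "simulation" or an agent name
    directive: str       # e.g. release/inhibit an event, adopt/drop a goal


@dataclass
class Rule:
    learning_goal: str
    applies: Callable[[Dict], bool]   # predicate over the scenario state
    intervention: Intervention


RULES: List[Rule] = [
    Rule(
        learning_goal="check whether initiating measures are taken",
        applies=lambda state: not state.get("initiating_measures_done", False),
        intervention=Intervention(
            target="team_agents",
            directive="drop goal: take initiating measures",  # let trainee notice
        ),
    ),
]


def select_intervention(goal: str, state: Dict) -> Optional[Intervention]:
    """Return the first applicable intervention for the active learning goal."""
    for rule in RULES:
        if rule.learning_goal == goal and rule.applies(state):
            return rule.intervention
    return None
```

Keeping these rules outside the team agents matches the separation argued for above: the same agent models can be reused across scenarios, while the scenario-specific didactical knowledge lives in the DA's rule set.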
2 The Virtual Training System

In this section we first briefly describe the organization of fire fighting aboard a ship from the perspective of the officer in charge. We then outline the structure of our agent-based simulation training.

2.1 On-Board Fire Fighting

If a fire breaks out aboard a navy frigate, the Officer of the Watch (OW) is in charge of handling the incident. When the alarm sounds, the OW hurries to the Technical Centre (TC) of the ship. From there, he contacts his team, develops a plan to combat the incident, gives orders, monitors events, and adjusts plans if necessary. The OW communicates with four other officers: the Chief of the Watch, the Technical Centre Operator, the Leader Confinement Team, and the Leader Attack Team. The first two are also situated in the TC; the last two are at or near the incident scene. Several phases can typically be distinguished when combating an incident. Upon the alarm signal, the OW immediately orders initiating actions (e.g. stopping ventilation, checking water pressure, checking for wounded or missing persons) and broadcasts the incident across the ship. He then develops a confinement plan (e.g., cooling compartments adjacent to the fire; switching off power in areas at risk) and an attack plan (attack route, setting smoke borders, passage bans, escape route). The plans are then issued as orders. When the fire is extinguished, a plan for the safe removal of smoke and gases is executed. Finally, restoring and cleaning activities are initiated. The task of the OW is a typical example of decision making in a complex environment. There are, of course, procedures for handling a fire incident. However, the OW also has to anticipate possible complications, respond to unforeseen events, adjust plans when events require him to do so, and so on.

2.2 The Agent-Based Simulation Training

The system under development is a stand-alone, low-cost desktop simulation trainer (see Figure 1), to be used by a single trainee playing the role of the OW. All four other players involved are played by intelligent agents.

The Training: In a broad sense, the goal of the training is to learn and practice the assessments, procedures and decisions fundamental to fire command. Instructors from the Navy school translated the abstract training goals into learning objectives, defined in terms of observable behavior. For instance: "trainee selects an alternative attack route if circumstances require him to do so (e.g. due to a blocked passage)". The instructors then formulated scenarios. Scenarios are built up from scenes, each representing a phase in the attack of an on-board fire (see Sect. 2.1). Each scene contains one or more desired states: states that enable trainees to achieve a learning goal (e.g. a blocked passage on the logical attack route). The scenario precisely describes which events may bring about those states, e.g. a particular event (aisles are filled with laundry bags), agent behavior (an agent "forgets" to close a door through which smoke enters the attack route), or trainee behavior (the trainee waits too long to order the fire attack, and the fire spreads to the route); a sketch of such a scene specification appears at the end of this section. Of course, these are not independent but interactive elements of training. For example, the simulation can only cause an event to happen if the trainee has not taken precautionary measures earlier. As we cannot know in advance what the trainee will or will not do, we need a form of control to select which events must be released or prevented to bring about the desired states for learning. In the next sections we explain how we handle this problem.

The Simulation: The avatar of the trainee is situated in the TC of the ship throughout the training (as the OW is in reality). All equipment that is normally used is simulated and available to the trainee (damage board, information panels, communication equipment, etc.).

The Agents: We use the BDI framework [8] to develop intelligent and autonomous team agents. In reality, team members communicate by speech. Our simulation has no speech recognition facilities, however. If an agent is the sender, it uses pre-recorded speech expressions. The trainee uses context-sensitive menus to send communication to the agents (see Figure 1).
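As an illustration of the scene structure just described (not the authors' actual format, which the paper does not give), a scene with its desired states and candidate triggers might be encoded as follows; all field names are hypothetical:

```python
# Hypothetical encoding of one scene from a fire-fighting scenario.
# A desired state lists the alternative triggers (simulation event, agent
# behavior, or trainee behavior) that may bring it about.
scene_attack_phase = {
    "phase": "attack",
    "learning_goals": ["select alternative attack route when required"],
    "desired_states": [
        {
            "state": "blocked passage on the logical attack route",
            "triggers": [
                {"kind": "event", "spec": "aisles filled with laundry bags"},
                {"kind": "agent", "spec": "team agent leaves smoke door open"},
                {"kind": "trainee", "spec": "attack ordered too late; fire spreads"},
            ],
        }
    ],
}
```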
3 Elements of Virtual Training

As mentioned in the introduction, one difficulty of simulation-based training is to balance the players' freedom of action (of both agents and trainee) on the one hand against control over the scenario on the other. In this section we describe how we organize the elements of virtual training to achieve scenarios that trainees experience as realistic, but that are also sufficiently controlled to ensure proper learning. Agent-based simulation training generally contains the following elements: a scenario writer; a trainee (here: the OW); autonomous agents (here: the team members); and the simulation. All of these influence the course of the scenario. The scenario writer selects one or more learning goals and specifies, in advance, which main events will bring about a situation that enables trainees to achieve the learning goal(s) (e.g. there is a fire in compartment X of the ship). The trainee has to deal with the situation as he thinks right (he may, for instance, issue a fire attack plan and give commands to his team agents). The agents respond autonomously to the trainee. Note that, as in real life, this involves more than blindly following the trainee's commands. For instance, an agent may be of the opinion that the trainee's plan involves unacceptable risks, and propose an alternative plan. Finally, the simulation processes events and actions realistically (e.g. it lets a fire go out if it is deprived of oxygen).

All elements comprising the virtual training system have a certain degree of freedom, but as they interact, they also influence each other. For instance, the scenario writer determines which events occur in the simulation. The agents and trainee can only execute those actions that are supported by the simulation environment. The interaction between the various autonomous elements makes it difficult to predict the course and outcome of a scenario. Of course, the scenario writer tries to bring certain learning situations about by specifying the appropriate events. But whether the desired situation will in fact occur is uncertain, because the scenario writer cannot exert influence during the session. Therefore, in addition to the specification formulated by the scenario writer, we need a means to control the scenario on-line. We advocate the use of a director agent (DA) to control the course of the scenario. A DA can be considered an agent 'behind the scenes'. The concept originates from studies into interactive narratives, where story directors or drama managers are used [9]. In contrast to an intelligent tutor, a DA does not explicitly provide feedback or intervene in an exercise [10]. Figure 2 shows our setup of the elements of virtual training. Black arrows represent active influence relations (e.g. the writer influences off-line the way the DA should act, while agents and trainee influence the situation in the environment on-line); dashed arrows represent passive influence relations (e.g. the trainee is influenced, but not controlled, by the simulation). In the remainder of this section we discuss this model.

The scenario writer defines a scenario using a specification language that is interpretable by the DA. The specification language should allow the representation of a sequence of events in the scenario, including the time of occurrence, nature and location, and factors that complicate solving the problem (e.g. defective equipment).
Fig. 2. The elements of virtual training and their relations
Moreover, it should make it possible to specify which learning goals are the focus of this scenario or scene (e.g. communication or situation assessment). Indirectly, the writer is constrained in the kinds of scenarios he can write by the simulation and the agents. For instance, in our virtual training system only the TC can be visualized, and therefore the scenario should not involve situations in which the Officer of the Watch has to walk out of the TC. After writing such a specification, the writer completely delegates his control over the scenario to the DA.

The DA directs the simulation and agents according to the scenario specification. The DA provides the simulation with events that must occur (e.g. a fire has to start). Most of the time it does not pursue control over the agents; the agents autonomously generate behavior and execute actions in the environment. However, if the scenario requires specific behavior from an agent in order to bring about a desired state, the agent receives instructions from the DA to do so. We discuss the relation between the DA and the team agents in more detail in the next section.

The simulation processes the events assigned by the DA. Events are specified at a high level of abstraction; the simulation autonomously elaborates them into lower-level consequences. For example, the simulation translates a specification of the event "fire" at a particular location into effects such as smoke, limited vision, etc. (a sketch of this elaboration is given at the end of this section). Under certain circumstances the DA may assign events to the simulation that were not specified by the scenario writer (e.g. to bring the scenario back on course after it has been led astray by erroneous or unexpected behavior of the trainee). We will not discuss this further here. Finally, actions of the trainee and agents are processed by the simulation. For instance, if the trainee wants to open a door, a message is sent to the simulation and the door is opened. Communication between agents and trainee is also mediated by the simulation: agents and trainee send communication messages to the simulation, which passes them on to the indicated receiver. The agents and trainee are constrained by the set of actions possible in the simulation. For example, in our system one can contact other persons and make compartments voltage-free, but one cannot navigate the ship. The simulation is developed in such a way that it allows for those actions that make the trainee experience autonomy and control with respect to the training task.
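The elaboration of a high-level event into lower-level consequences can be pictured as a simple expansion step. The sketch below only illustrates the idea; the paper does not describe the simulation's internals, and the effect catalogue is invented:

```python
# Hypothetical expansion of a high-level scripted event into the lower-level
# effects a simulation engine would actually apply. The effect catalogue is
# invented for illustration.
def elaborate_event(event: dict) -> list:
    if event["type"] == "fire":
        loc = event["location"]
        return [
            {"type": "flames", "location": loc},
            {"type": "smoke", "location": loc},
            {"type": "limited_vision", "location": loc},
            {"type": "rising_temperature", "location": loc},
        ]
    return [event]  # events without an expansion rule pass through unchanged


effects = elaborate_event({"type": "fire", "location": "compartment X"})
```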
4 Control of Agent Behavior

In this section we discuss how the behavior of the agents representing the trainee's team members is controlled by the DA. We first discuss our approach to developing intelligent, autonomous agents. Subsequently, we explain how the DA can exert control over these agents without completely taking over their behavior.

4.1 Autonomous BDI Agents

The intelligent agents in our virtual training system are modeled as experts, implying that they are able to autonomously perform expert behavior in all possible situations. They are developed according to the Belief Desire Intention (BDI) paradigm, which stems from folk psychology, i.e. the way people think that they reason [11]. Humans usually describe their reasoning and explain their actions in terms of beliefs, desires and intentions. The BDI paradigm is based on these three mental concepts.

As a rule, a BDI agent has beliefs, goals (desires), and intentions (goals to which it commits itself). Usually, BDI agents also have a plan library containing a set of plans. A plan is a recipe for achieving a goal given particular preconditions. The plan library may contain multiple plans for the achievement of one goal. An intention is the commitment of the agent to execute the sequence of steps making up a plan. A step can be an executable action, or a sub-goal for which a new plan should be selected from the plan library. A typical BDI execution cycle contains the following steps: (i) observe the world and update the agent's internal beliefs and goals accordingly; (ii) select applicable plans based on the current goals and beliefs, and add them to the intention stack; (iii) select an intention; and (iv) perform the intention if it is an atomic action, or select a new plan if it is a sub-goal. It has been demonstrated that BDI agents can provide virtual players with believable behavior in computer games [12] and in virtual training [13]. To generate such behavior, the agents require domain knowledge, which can be acquired from experts. Because experts tend to explain their actions in terms of beliefs, goals and intentions, expert knowledge can be translated to a BDI model relatively easily [14]. Furthermore, decision making in safety-critical situations is often highly procedural in nature: plans for achieving goals under given conditions are thus available. Some goals may be achieved in more than one way, which can be incorporated in the BDI model by defining multiple plans for one goal.

4.2 Director Agent

In the previous section we briefly described our approach to developing autonomous agents. The resulting BDI agents are experts in their task domain, but know nothing about training. In other words, they know how to handle an incident even if others (e.g. a trainee) make errors, but not how to support a trainee in his learning process. The responsibility for this second aspect is completely delegated to the DA. To accomplish the desired support, the DA needs an expert model of the role that the trainee is playing, which in our case is the OW. Moreover, it requires didactical knowledge about the relation between learning goals and scenario interventions. The first can be implemented as an expert BDI model, and the second as a set of rules relating learning goals to possible directions to the simulation and the agents. The DA knows which learning goals are active, and assesses on-line the current situation emerging in the scenario. The rule set specifies which interventions create situations suited to training the specified learning goals. For example, a possible learning goal is "check whether initiating measures are taken", and a corresponding intervention is "prevent other agents from taking initiating measures". Thus, if the scenario allows for it, the DA selects an intervention from the rule set to bring about the desired learning situation. An intervention either releases or inhibits an event (e.g. an alarm) in the simulation, or instructs an agent to adopt or drop a goal specified in the rule set (e.g. checking whether it is safe to enter a room). We adhere to the position that to train a specific skill, situations requiring that skill should be created. This means that the right complications need to be introduced. We distinguish two strategies to accomplish such situations. First, the DA can order the team agents not to correct or support the trainee when he or she is making an error. Second, the DA can order team agents to make a mistake on purpose.
To make team agents sensitive to DA instructions for the first strategy, we model the team agents in such a way that they have a notion of when the trainee performs suboptimal behavior. We illustrate this with an example. The model of team agent A may contain a rule that the trainee should inform team agent B about a particular event. If agent A obtains the belief that the trainee failed to do this, it will automatically adopt the goal to bring that inform action about (e.g. by reminding the trainee to inform agent B, or by informing agent B itself). However, the scenario writer may have specified the learning goal 'communicate situation update to team members'. The DA translates this requirement into a desired scenario state that makes the trainee experience the consequences of negligence. In our example, the DA could realize the intention of the scenario writer by instructing team agent A to withhold its goal to bring the trainee's missed inform action about. Thus, the reasoning rule that makes a team agent correct an omission of the trainee is still applied under the right conditions in the scenario, but only if the DA does not issue a withhold instruction.

The second strategy, ordering team agents to make a mistake, is also a powerful didactic instrument. To enable this, we authorize the DA to change the team agents' goal, belief, and plan bases. The requested mistakes can generally be specified in advance by the scenario writer. If at a particular, prespecified point in the scenario a certain mistake is necessary, the DA assigns suboptimal or incorrect beliefs, goals or plans to the team agent(s). Goals and plans that an agent receives from the DA always receive priority over all other goals and plans in the agent's current goal base or intention stack, respectively. By giving an agent a false belief, the error of unjustly assuming that a condition is true can be simulated. By giving an agent a false goal, the error of giving priority to a less important goal can be simulated. And, finally, by giving an agent a false plan, the execution of a wrong procedure can be simulated.
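Taken together, the BDI cycle of Sect. 4.1 and the two strategies amount to a small interface through which the DA manipulates an agent's mental state. The following sketch is a minimal illustration under our own assumptions; the class and method names are invented, and real BDI platforms differ:

```python
# Minimal, hypothetical BDI team agent with the two DA hooks described above:
# withholding a corrective goal (strategy 1) and injecting a false belief,
# goal, or plan (strategy 2). All names are invented for illustration.
class TeamAgent:
    def __init__(self, plan_library):
        self.beliefs = set()
        self.goals = []            # goal base
        self.intentions = []       # intention stack; DA-injected plans go on top
        self.withheld = set()      # goals suppressed by the DA
        self.plan_library = plan_library

    # --- DA interface ------------------------------------------------------
    def withhold_goal(self, goal):                          # strategy 1
        self.withheld.add(goal)

    def inject(self, belief=None, goal=None, plan=None):    # strategy 2
        if belief:
            self.beliefs.add(belief)
        if goal:
            self.goals.insert(0, goal)       # priority over existing goals
        if plan:
            self.intentions.insert(0, plan)  # priority over existing intentions

    # --- one BDI execution cycle --------------------------------------------
    def step(self, percepts, world):
        self.beliefs |= percepts                             # (i) observe
        for goal in self.goals:                              # (ii) plan
            if goal in self.withheld:
                continue                                     # DA override
            plan = self.plan_library.select(goal, self.beliefs)
            if plan:
                self.intentions.insert(0, plan)
        if self.intentions:                                  # (iii) select
            step = self.intentions[0].next_step()
            if step.is_action():                             # (iv) act
                world.execute(step)
            else:
                self.goals.insert(0, step.subgoal())
```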
5 Discussion

Becoming an experienced commander of a safety-critical system is a long-lasting undertaking. Good training requires frequent and deliberate practice in situation assessment and decision making. This is taxing not only for the student, but also for the organization responsible for delivering training: it requires ample staff to realize the environments that students need in order to acquire domain-specific knowledge and to practice assessment and decision-making skills. Recent developments in simulator and agent technology open opportunities to improve this situation. Modern computers are capable of generating highly realistic, dynamic and interactive simulations. Advances in agent technology can be used to generate the behavior of human entities in the simulation. In this paper we report current work on the design of such an advanced training system. We have argued that in order to make such training goal-directed and systematic, a DA can be used that exerts control over the simulation and over the playing agents. The DA can do so by using a rule set defining the relations between learning goals, scenario states, and interventions (pertaining to both the simulation and the agents). In our concept, the DA imposes constraints upon the autonomy of the simulation and the agents for the benefit of maintaining control over the scenario. The question is then: will the proposed DA indeed achieve the desired control? From earlier work we learned that the main difficulty is endowing an agent with the capabilities to detect that a scenario is going off course and to diagnose the nature of the digression. We have been able to develop an agent that successfully diagnosed student errors by combining outcome and process measures of student task performance [15]. This diagnosis was subsequently used to present feedback to the student. The similarities with the present work are obvious: rather than using a diagnosis for selecting feedback, our DA selects an intervention aimed at bringing about the desired states for learning. We are therefore confident that the approach will work.

Another question is: will the interventions yield the scenarios that we hope for? Most likely they will not always succeed. We see the same when team members are played by human instructors. Instructors often make 'smart moves' to create challenging learning situations, but they too do not always succeed. Likewise, our DA may fail some of the time. One possibility is that the DA's rule set contains no intervention that can bring the actual scenario state into a desired scenario state. It is also possible that an applied intervention fails to produce the desired state (e.g. because it elicits a trainee action that blocks the potential effect of the intervention). We want to emphasize here that our goal in using a DA is not to achieve full and total control over a scenario. This would very likely harm the trainee's sense of autonomy and the experienced realism of the scenarios. The DA must thus be considered a tool to advance from free play to a more deliberate, goal-directed form of training.

A third question is: is the concept of a DA appropriate for achieving control? An alternative way of exercising control is to build didactical considerations into the playing agents. In this way, didactical considerations are decentralized. But this can harm control too. If each agent has its own set of behavioral instructions, then several agents may act at once in trying to achieve a desired scenario state. This may lead the scenario further astray. Another disadvantage of decentralization is the issue of reusability. For simulation training it is best if the models underlying the agents can be used for many scenarios. Didactical considerations, however, tend to be scenario-specific: what is desirable for achieving a particular learning goal is not necessarily desirable for another. We therefore consider it important to separate the domain-related knowledge required to generate task behavior from the didactical knowledge required to exert control over the scenario.

Concluding, recent developments in simulation, cognitive modeling, and agent technology promise better opportunities for autonomous training in decision making. The approach presented here should be able to add learning value to that promise.

Acknowledgments. This research has been supported by the Netherlands Department of Defence (032.13359) and by the GATE project, funded by the Netherlands Organization for Scientific Research (NWO) and the Netherlands ICT Research and Innovation Authority (ICT Regie).
References

1. Ericsson, K.A., Krampe, R.T., Tesch-Römer, C.: The Role of Deliberate Practice in the Acquisition of Expert Performance. Psychological Review 100(3), 363–406 (1993)
2. Oser, R.L.: A Structured Approach for Scenario-Based Training. In: 43rd Annual Meeting of the Human Factors and Ergonomics Society, Houston, TX, pp. 1138–1142 (1999)
3. Blackmon, M.H., Polson, P.G.: Combining Two Technologies to Improve Aviation Training Design. In: Proc. of Human Computer Interaction (HCI), pp. 24–29. AAAI, Menlo Park (2002)
4. Cannon-Bowers, J.A., Burns, J.J., Salas, E., Pruitt, J.S.: Advanced Technology in Scenario-Based Training. In: Cannon-Bowers, J.A., Salas, E. (eds.) Making Decisions under Stress: Implications for Individual and Team Training, pp. 365–374. APA, Washington (1998)
5. Fowlkes, J., Dwyer, D.J., Oser, R.L., Salas, E.: Event-Based Approach to Training (EBAT). The International Journal of Aviation Psychology 8(3), 209–222 (1998)
6. van den Bosch, K., Riemersma, J.B.J.: Reflections on Scenario-Based Training in Tactical Command. In: Schiflett, S.G., Elliott, L.R., Salas, E., Coovert, M.D. (eds.) Scaled Worlds: Development, Validation and Applications, pp. 1–21. Ashgate, Aldershot (2004)
7. van Doesburg, W.A., van den Bosch, K.: Cognitive Model Supported Tactical Training Simulation. In: Conference on Behavior Representation in Modeling and Simulation (BRIMS), Universal City, CA, pp. 313–320 (2005)
8. Rao, A., Georgeff, M.: Modeling Rational Agents within a BDI Architecture. In: Allen, J., Fikes, R., Sandewall, E. (eds.) Proceedings of the 2nd International Conference on Principles of Knowledge Representation and Reasoning, pp. 473–484. Morgan Kaufmann Publishers Inc., San Mateo (1991)
9. Riedl, M.O., Stern, A.: Believable Agents and Intelligent Scenario Direction for Social and Cultural Leadership Training. In: Proc. of the 15th Conference on Behavior Representation in Modeling and Simulation, Darmstadt, Germany (2006)
10. Riedl, M.O., Lane, H.C., Hill, R., Swartout, W.: Automated Story Direction and Intelligent Tutoring: Towards a Unifying Architecture. In: Narrative Learning Environments Workshop at the 12th International Conference on Artificial Intelligence in Education (AIED 2005), Amsterdam, The Netherlands (2005)
11. Bratman, M.: Intention, Plans, and Practical Reason. Harvard University Press, Cambridge (1987)
12. Norling, E.: Capturing the Quake Player: Using a BDI Agent to Model Human Behaviour. In: Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems, Melbourne, Australia, pp. 1080–1081 (2003)
13. van den Bosch, K., van Doesburg, W.A.: Training Tactical Decision Making Using Cognitive Models. In: Proceedings of the Seventh International NDM Conference, Amsterdam, The Netherlands (2005)
14. Norling, E.: Folk Psychology for Human Modelling: Extending the BDI Paradigm. In: Third International Joint Conference on Autonomous Agents and Multiagent Systems, New York, pp. 202–209 (2004)
15. Heuvelink, A., Mioch, T.: FeGA: A Cognitive Feedback Generating Agent. In: Proceedings of the 7th IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT 2008). IEEE Computer Society Press, Los Alamitos (2008)
Eprescribing Initiatives and Knowledge Acquisition in Ambulatory Care

Ashley J. Benedict1, Jesse C. Crosson2,3, Akshatha Pandith1, Robert Hannemann4,5, Lynn A. Nuti6, and Vincent G. Duffy1,7

1 Industrial Engineering, Purdue University, 315 N. Grant Street, West Lafayette, IN 47907
2 Department of Family Medicine, University of Medicine and Dentistry of New Jersey, New Jersey Medical School, 185 South Orange Avenue, MSB B648, Newark, NJ 07101
3 Research Division, Department of Family Medicine, University of Medicine and Dentistry of New Jersey, Robert Wood Johnson Medical School, One World's Fair Drive, Somerset, NJ 08873
4 Chemical Engineering, Purdue University, 480 Stadium Mall Drive, West Lafayette, IN 47907
5 Biomedical Engineering, Purdue University, 206 S. Martin Jischke Drive, West Lafayette, IN 47907
6 Nursing, Purdue University, 502 N. University Street, West Lafayette, IN 47907
7 Agricultural and Biological Engineering, Purdue University, 225 South University Street, West Lafayette, IN 47907
{ajbenedi,apandith,hanneman,lnuti,duffy}@purdue.edu, [email protected]
Abstract. Electronic prescribing (eprescribing) is the generation of prescriptions through an automated data-entry process, using special software and a network linked to pharmacies. National and state initiatives are intended to educate and encourage healthcare providers to use eprescribing, but they have not yet been very effective. This study included interviews with 102 healthcare providers from 52 locations (California, Indiana, New Hampshire, and Ohio) to determine the providers' knowledge of eprescribing initiatives as well as how they acquired knowledge about these systems. Providers in New Hampshire had the most knowledge of eprescribing systems and their state initiative. Among nonusers, who accounted for 71% of the interviews, only two facilities were familiar with the national initiatives. Eprescribing information was acquired through journals, conferences, pharmacies, other providers and, in some cases, while receiving care as a patient.

Keywords: eprescribing, initiatives, knowledge acquisition.
1 Introduction

In 2004, President George W. Bush mandated that within ten years all medical records in the United States should be electronic [1]. In an effort to continue the development and implementation of health information technology, President Barack Obama signed the American Recovery and Reinvestment Act on February 17, 2009, which designates $59 billion of funding for health information technology [2]. Since not all healthcare providers and facilities have the resources to implement a full electronic health or medical record system, subsets of healthcare electronic systems are being implemented in phases. The Medicare Modernization Act (MMA) of 2003 recognized a need for providers to prescribe medication electronically through electronic prescribing, also known as eprescribing [3]. Eprescribing is the generation of prescriptions through an automated data-entry process, using special software and a network linked to pharmacies. The MMA requires Medicare Part D plans to support electronic prescribing, with a planned implementation date of April 2009 [3]. To qualify for the Medicare ePrescribing Incentive [4], an eprescribing system should be able to perform the following tasks:

• Generate a complete active medication list;
• Allow eligible professionals to select medications, print prescriptions, transmit prescriptions electronically, and conduct all alerts (e.g. inappropriate dose or route of administration of the drug, drug–drug interactions, allergy concerns, or warnings/cautions);
• Provide information on lower-cost, therapeutically appropriate alternatives;
• Provide information on formulary or tiered formulary medications, patient eligibility, and authorization requirements; and
• Meet the Part D specifications for messaging that will be implemented April 1, 2009.

The eprescribing system must send the prescription electronically to the pharmacy unless the pharmacy is only able to accept fax, in which case an electronic fax can be sent. The eprescribing system can be either a stand-alone system or integrated into an electronic medical or health record. Medicare will award prescribers who use qualified eprescribing systems a two percent bonus on top of their fee for eprescribing beginning this year and again in 2010. The bonus will drop to one percent in 2011 and 2012 and then to half a percent in 2013. Prescribers have to use eprescribing on at least 50 percent of their Medicare Part B claims during the reporting year [4]. Starting in 2012, prescribers who do not use a qualified eprescribing system will have their fee schedule amounts reduced.

The objectives of this study were (1) to determine healthcare providers' awareness of national, state, or local initiatives and (2) to gain insight into how providers obtain information about eprescribing systems.

1.1 Initiatives

The National ePrescribing Patient Safety Initiative (NEPSI) was launched in January 2007 to encourage healthcare providers to adopt eprescribing [5]. NEPSI offers free eprescribing software, Allscripts ePrescribe from Allscripts™. This software is a stand-alone application that qualifies for the Medicare ePrescribing Incentive program. Many states have encouraged practicing healthcare providers to implement eprescribing, and some states have had successful pilot projects for launching eprescribing systems, including Massachusetts [6], Florida [7], and southeast Michigan [8]. For the southeast Michigan pilot project, financial incentives were provided to healthcare providers who used the eprescribing system at least 20 times per month [8].
California. The California HealthCare Foundation began focusing on educating healthcare providers about eprescribing in January 2006 [9]. The foundation has supported regional pilot projects to understand the challenges and opportunities that arise when eprescribing is implemented. Interviews with 20 healthcare industry leaders identified four key objectives to support a statewide program: (1) increase payer participation, (2) increase pharmacy participation, (3) increase provider adoption, and (4) raise awareness and demand among purchasers and consumers [10].

Indiana. In October 2008, the Indiana Eprescribing Initiative was launched to promote the rapid adoption of eprescribing throughout Indiana [11]. The Employers' Forum Electronic Prescribing Committee focused on three subcategories: (1) hospital and physician issues plus incentives, (2) technology, and (3) pharmacy. Four pilot communities have been identified to test and develop "best practice" processes for implementing and utilizing eprescribing systems. Presently, no financial incentives, hardware, or software are being provided to healthcare providers interested in implementing eprescribing in their practice. Note that the Indiana initiative started after the data for this study were collected.

New Hampshire. In late 2007, the Local Government Center (LGC) launched an initiative to increase the number of eprescribers across New Hampshire. The initial goal was to have all providers prescribing medications electronically by October 2008 [12]. A letter was sent to providers with information related to iScribe software for PDA and Web. For facilities that introduced this software into their practices, the LGC offered a Palm TX handheld, in-office installation services for all hardware and software, a WiFi router for sending prescriptions to the pharmacy, a new WiFi-enabled printer for easy in-office prescription printing, and one-on-one phone training. The main requirements prior to implementation were access to high-speed internet and an electronic patient list. This initiative does not offer any financial incentives to providers who adopt the system and continue to utilize it after implementation.

1.2 Knowledge Acquisition

Information can be disseminated through many media, such as person-to-person, teacher-to-student, internet, radio, television, newspapers, journals, etc. Healthcare providers can begin to acquire information related to eprescribing in their medical training as well as in continuing medical education [13]. The national initiative has a website with information for providers, such as the benefits of the system and an opportunity to register for the free eprescribing system offered. Before a decision can be made to implement a new system such as eprescribing, the expected user needs to become aware of the innovation and acquire some knowledge of how it functions [14]. This is followed by persuasion, in which the expected user forms a favorable or unfavorable attitude toward the innovation [14]. Healthcare providers who successfully implemented and used eprescribing systems were familiar with the systems through residency training and professional conferences [15]. Facilities that installed systems and then discontinued use exhibited little advance knowledge of program functions and of the impact that implementation would have on workflows [15]. Communication outside the healthcare facility is not the only important channel for gaining knowledge about eprescribing; discussion and communication within the practice also improve implementation efforts [16].
2 Methods

To determine awareness of the initiatives and the providers' knowledge of eprescribing systems, a qualitative study of healthcare providers was conducted, focusing on the following categories:

− Users versus nonusers
− Primary care (including pediatrics and internal medicine) versus specialty
− Prescribers (physicians, physician assistants, and nurse practitioners) versus non-prescribers (nurses, medical assistants, office managers, and other supporting staff)
− Location (California, Indiana, New Hampshire, Ohio)

Healthcare facilities' phone numbers were retrieved from an insurance company's online provider directory. Approximately 3,900 facilities were called and asked whether a prescriber and a non-prescriber would be willing to participate in a 30-minute face-to-face interview with a researcher. Interviews were conducted between May and July 2007 by two research assistants trained in qualitative interviewing.

2.1 Data Collection

Fifty-two practices participated in the study. Three practices were excluded from analysis: two because no prescribers were interviewed at those sites, and one because it was the only practice from Ohio. Of the remaining 49 locations, 15 were in California, 16 in Indiana, and 18 in New Hampshire. Seventy-one percent of the facilities were not utilizing eprescribing systems, and 55% were primary care practices. See Table 1 for a breakdown of the data collected by facility. The study was approved in advance by the Purdue University institutional review board.

Table 1. Practices included in the analysis

State   Status      Primary Care   Specialty   Total
CA      Nonusers    7              4           11
CA      Users       1              3           4
IN      Nonusers    5              7           12
IN      Users       1              3           4
NH      Nonusers    8              4           12
NH      Users       5              1           6
Total               27             22          49
Nonusers were asked to share what they had previously heard from other users, in order to understand where and how they had acquired this knowledge about eprescribing. To determine the impact of the initiatives, users and nonusers were asked to discuss their knowledge of the initiatives. Of the ten questions asked of the providers, the two questions asked of nonusers and the one question asked of users are given below:

− Nonusers
  o You have probably heard of other healthcare providers' experiences with this system. Tell me about them.
  o Tell me about any state or national initiatives concerning eprescribing that you know about.
− Users
  o Did the state or national initiatives influence your decision to implement the eprescribing system?

Each interview was digitally recorded and transcribed for analysis. Each interviewer completed field notes summarizing the interview from his or her point of view.

2.2 Data Analysis

The data consisted of rich-text files containing interview transcripts and field notes. These data were imported into ATLAS.ti for coding and analysis. The research team consisted of a physician, a nurse practitioner, a human factors researcher, a political scientist, and two research assistants. A coding template was developed that included 10 codes for nonusers and 11 codes for users. To begin the coding process, 10% of the data was coded by the research team to develop consensus. After these files were coded, the research team members were given an identical set of documents to code for evaluation of agreement; the physician and the human factors researcher from Purdue University coded the data as a pair. A Fleiss' kappa of 0.78 was calculated, indicating good agreement. The remaining data were distributed for individual coding. Coding reports were extracted from ATLAS.ti for the 'initiative' and 'prior knowledge of eprescribing' codes.
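For readers unfamiliar with the agreement statistic used here, the sketch below computes Fleiss' kappa from a matrix of category counts per coded item. It is a generic textbook formulation, not the authors' analysis script:

```python
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' kappa for an (items x categories) matrix of rating counts.

    counts[i, j] = number of coders who assigned item i to category j;
    every item must be rated by the same number of coders.
    """
    n = counts.sum(axis=1)[0]                       # coders per item
    # proportion of all assignments falling in each category
    p_cat = counts.sum(axis=0) / counts.sum()
    # per-item observed agreement
    p_item = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))
    p_obs = p_item.mean()                           # mean observed agreement
    p_exp = np.square(p_cat).sum()                  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# Toy example: 5 coded segments, 2 coders, 3 categories.
ratings = np.array([[2, 0, 0], [0, 2, 0], [1, 1, 0], [0, 0, 2], [2, 0, 0]])
print(round(fleiss_kappa(ratings), 2))   # 0.68
```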
3 Results and Discussion

Of the 15 facilities using an eprescribing system, nine different systems were in use, and only one location used a stand-alone eprescribing system; the other 14 sites had systems integrated with their electronic medical record programs. Participants provided insight into their awareness of state and national initiatives as well as their prior knowledge of eprescribing. Direct quotes from digitally recorded transcripts appear in quotation marks; additional information comes from observational field notes.

3.1 Provider Awareness of Initiatives

Responses varied by type of implementation: users, nonusers planning to implement, and nonusers not planning to implement. The state in which a facility was located had a large impact on awareness of the initiatives. New Hampshire participants had the most knowledge about their state initiative. Facilities in California and Indiana had limited knowledge about their respective state initiatives. National initiatives were not clearly understood by healthcare providers and were in some cases confused with the national electronic medical record initiative.

3.1.1 California

Knowledge of either state or national initiatives across users, nonusers not planning to implement, and nonusers planning to implement in California was poor. Only one interviewee had even vague knowledge of the Medicare initiative to encourage eprescribing use. A current user said, "I know that the government is kind of encouraging the physicians and if you submit your data and everything online and use the eprescription there is a financial incentive." Some interviewees thought the question referred to the national goals announced by President Bush to have an electronic medical record for all patients by 2014. None of the nonusers planning to implement an eprescribing system reported any knowledge of state or national initiatives. Of the nonusers not planning to implement, one interviewee reported vague knowledge of some way that "Medicare ... mandated that there be an e-record ... I think it's two years from now."

3.1.2 Indiana

Five of the 12 nonuser facilities in Indiana were aware of national initiatives for electronic medical records, but not of eprescribing systems. None of the users were familiar with any initiatives, except for one prescriber who had "heard of them, except for the one that…not just electronic prescribing, but electronic medical records would be required by federal payers." Note that there was no state initiative at the time of the interviews; an initiative has since been launched (http://sites.google.com/site/erxindiana).

3.1.3 New Hampshire

Across users, nonusers not planning to implement, and nonusers planning to implement, knowledge of the initiatives was mixed. Users were not knowledgeable about the initiatives; only one facility implemented because of them, as its prescriber "serve(d) on the committee that made the recommendation to (New Hampshire's governor)." Four of the five nonuser facilities planning to implement mentioned the state initiative, and two of these facilities were familiar with the CMS incentive program. Nonusers not planning to implement had minimal awareness of the initiatives. Of the six facilities visited, three knew about the LGC offer, and only one was vaguely familiar with the national initiative. One prescriber stated, "I've heard there's one in New Hampshire, and I've also heard there are several national suggestions, but I don't know if there are any actual things being done about that. I know there is a plan in the state to do it."

3.2 Eprescribing Information Acquisition

Participants gathered prior knowledge through different media. In California and Indiana, most participants who were nonusers planning to implement reported very limited prior knowledge of particular eprescribing systems or features. What little information these nonusers had came from conference presentations or personal experience rather than from talking with their peers. Those who did report prior knowledge reported having heard that eprescribing would improve efficiency in the office. As one put it, "the ... system, from the information ... that I've ... come across, has enabled other physicians ... to expedite their prescribing ... (and) it helps patients get their medication (in) a timely manner."

For nonusers not planning to implement, prior knowledge of eprescribing was also limited. Interviewees typically had not heard about specific eprescribing programs or could not remember those they had heard of. Those who did report prior knowledge reported having heard of eprescribing through continuing medical education programs, through their own experiences as patients, and, rarely, from their own training experiences. A typical response was "I've heard about eprescriptions ... from here and there ... I'm not sure what (or) how that will work and how it's going to make things easier ... I'm not sure." One facility in Indiana had "actually learned (about) it through the pharmacies and … friends … in a couple of physician offices that have actually been talking about it."

The majority of New Hampshire nonusers were aware of eprescribing, or at least of electronic medical records. One prescriber stated that he had "not talked to anybody who's had a standalone e-prescribing system. The one's (he's) talked to have Electronic Medical Record." One nonuser mentioned "get(ting) a mailing about it" and another received a "small (journal) page that was forwarded to (her) from (her) boss regarding e-prescribing."

3.3 Critical Mass and Diffusion of Innovation

Some interviewees mentioned that pharmacies were a source for acquiring knowledge about eprescribing, and some mentioned that one reason they were not currently using an eprescribing system was that some independent pharmacies were not able to interact with these systems. As the majority of pharmacies and healthcare practices implement eprescribing systems, a critical mass will be reached and the benefits of having the system will outweigh the costs [17]. Eprescribing is currently early in the adoption phase, and healthcare providers are in the beginning stages of the innovation-decision process: they must gain knowledge, be persuaded, decide to adopt or reject the innovation, implement it, and then confirm their decision [14]. Once the late majority and laggards adopt, the system will have widespread adoption.

Acknowledgements. This research was supported through funding provided by the Regenstrief Center for Healthcare Engineering.
References

1. Bush, G.W.: Executive Order 13335: Incentives for the Use of Health Information Technology and Establishing the Position of the National Health Information Technology Coordinator (2004)
2. American Recovery and Reinvestment Act (ARRA) of 2009, H.R. 1, 111th Cong. (2009)
3. Bell, D.S., Friedman, M.A.: E-prescribing and the Medicare Modernization Act of 2003. Health Aff. 24, 1159–1169 (2005)
4. Centers for Medicare and Medicaid Services E-prescribing Incentive Program, http://www.cms.hhs.gov/EPrescribing
5. National ePrescribing Patient Safety Initiative, http://www.nationalerx.com
6. Halamka, J., Aranow, M., Ascenzo, C., Bates, D.W., Berry, K., Debor, G., Fefferman, J., Glaser, J., Heinold, J., Stanley, J., Stone, D.L., Sullivan, T.E., Tripathi, M., Wilkinson, B.: E-prescribing Collaboration in Massachusetts: Early Experiences from Regional Prescribing Projects. J. Am. Med. Inform. Assoc. 13, 239–244 (2006)
7. Florida ePrescribe Clearinghouse, http://www.fhin.net/eprescribe/Index.shtml
8. Southeast Michigan ePrescribing Initiative, http://mhcc.maryland.gov/electronichealth/ehealth_presentations/schueth_semi_incentives.pdf
9. Sarasohn-Kahn, J., Holt, M.: The Prescription Infrastructure: Are We Ready for ePrescribing? California HealthCare Foundation iHealth Report (2006)
10. Leslie, T.: Getting Connected: The Outlook for Electronic Prescribing in California. California HealthCare Foundation (2008)
11. Employers' Forum of Indiana eRX Initiative, http://sites.google.com/site/erxindiana
12. Boulter, P., Miller, P.: Stepping Up to the Future: NH Citizens Health Initiative, http://www.unh.edu/chi/media/pdfs/ePrescribing%20in%20NH.pdf
13. Ford, E.W., Menachemi, N., Phillips, M.T.: Predicting the Adoption of Electronic Health Records by Physicians: When Will Health Care Be Paperless? J. Am. Med. Inform. Assoc. 13, 106–112 (2006)
14. Rogers, E.: Diffusion of Innovations. Free Press, New York (2003)
15. Crosson, J.C., Ohman-Strickland, P.A., Hahn, K.A., DiCicco-Bloom, B., Shaw, E., Orzano, A.J., Crabtree, B.F.: Electronic Medical Records and Diabetes Quality of Care: Results from a Sample of Family Medicine Practices. Annals of Family Medicine 5, 209–215 (2007)
16. Crosson, J.C., Isaacson, N., Lancaster, D., McDonald, E.A., Schueth, A.J., DiCicco-Bloom, B., Newman, J.L., Wang, C.J., Bell, D.S.: Variation in Electronic Prescribing Implementation among Twelve Ambulatory Practices. J. Gen. Intern. Med. 23, 364–371 (2008)
17. Dix, A.J., Finlay, J.E., Abowd, G.D., Beale, R.: Human-Computer Interaction. Prentice Hall, London (1998)
Using 3D Head and Respirator Shapes to Analyze Respirator Fit

Kathryn M. Butler

National Institute of Standards and Technology, Building and Fire Research Laboratory, 100 Bureau Drive, Gaithersburg, Maryland 20899-8665, USA
[email protected]
Abstract. A computational approach to analyzing respirator fit is demonstrated using geometries generated by laser scanning, mechanical drawings, and CAD files. Three fit-related problems that can be solved using computational tools are demonstrated: 1) the study of an outward leak of breathing gases into a near-flammable environment; 2) the study of the flow field inside a half-facepiece respirator; and 3) the characterization of the relationship of respirator design and head shape to fit and comfort.

Keywords: Respirator fit, digital human modeling, 3D laser scanning, finite element method.
1 Introduction

Respiratory protection requires a good fit of the respirator to the individual's face. A poor fit may result in the leakage and inhalation of contaminants, and may also cause discomfort to the user, especially during lengthy operations on the job. People who need respiratory protection have a wide range of facial shapes and sizes. In order to fit nearly every member of the workplace population, respirator manufacturers typically design a family of sizes for each respirator type. A recent anthropometric survey of US respirator users by the National Institute for Occupational Safety and Health (NIOSH) has identified a fit-test panel encompassing 95 % of the US workforce that can be used to achieve more effective respiratory protection [1]. However, respirators are not infallible: an individual may pass the required fit test but still experience leaks due to improper donning, overbreathing during high work loads, jarring, facial hair, or sweat. Some users may find it difficult to get a respirator that fits well because of facial asymmetry or unusual facial dimensions.

The advent of powerful computational tools provides new methodology for studying issues of respirator fit. Laser scanners make it possible to create digital representations of the complex geometries of human heads and respirator facepieces. These three-dimensional images can then be manipulated and subjected to computational analysis of how well a specific respirator operates for a specific user.

The finite element method (FEM) is the most widely used numerical technique for solving partial differential equations over complex physical domains. A complex geometry is subdivided into a mesh consisting of elements with nodes at the corners. The elements may be triangular or quadrilateral over a two-dimensional object and tetrahedral or brick throughout a three-dimensional object. Elements may vary in size and shape and may be concentrated in regions where gradients are high or where a finer solution is desired. Commercial FEM packages provide tools for solving problems involving fluid flow, structural analysis, heat transfer, chemistry, and mechanical contact, among many other phenomena. This paper discusses the methods used to generate the geometry of heads and respirators and then presents three examples of the use of computational tools to investigate problems of respirator fit.
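As a minimal illustration of the mesh concept just described (a generic sketch, not the commercial FEM packages used in this work), a two-dimensional triangular mesh can be stored as a node-coordinate array plus an element-connectivity array:

```python
import numpy as np

# A unit square split into two triangles: four nodes, two elements.
# Each element row lists the indices of its three corner nodes.
nodes = np.array([[0.0, 0.0],   # node 0
                  [1.0, 0.0],   # node 1
                  [1.0, 1.0],   # node 2
                  [0.0, 1.0]])  # node 3
elements = np.array([[0, 1, 2],
                     [0, 2, 3]])

# Element area via the shoelace formula -- the kind of per-element quantity
# an FEM assembly loop computes before integrating the governing equations.
def tri_area(el):
    (x1, y1), (x2, y2), (x3, y3) = nodes[el]
    return 0.5 * abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))

print([tri_area(el) for el in elements])   # [0.5, 0.5]
```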
2 Geometrical Representations

To study the fit of a human head to a respirator using FEM, both the geometry of the head and the geometry of the respirator are needed. A 3D laser scanner generates a set of points in three dimensions that defines the location of the surface. Image reconstruction software can then convert the digital point cloud into a set of surface entities that can be used to set up the finite element model. Alternatively, the geometry of the respirator may be obtained through CAD files or mechanical drawings when these are made available. Figure 1 shows heads and respirators as developed individually and as combined for analysis.
Fig. 1. Respirators and heads generated from mechanical drawings (the full facepiece respirator) and 3D laser scanning. To combine the two, each respirator has been distorted to fit the geometry of the head.
All of these methods are time-consuming and subject to inaccuracy. A laser scan requires the repair of holes in the point cloud where the scanner does not get good coverage, and the cleanup of rough or imprecise regions. Mechanical drawings must be converted by hand into a three-dimensional computational object, and CAD files are not easily converted into a finite element model. Finally, after the creation of the individual geometries for the head and the respirator, the two must be carefully aligned and combined into a single model. In real life, the rubber or silicone respirator seal and the skin of the face both distort during mounting, so the respirator cannot simply be translated in space during assembly. The generation of a complex geometry for a finite element model has a well-deserved reputation for consuming considerable amounts of time, often more than the solution of the model itself. Ways to speed up this process continue to be sought.
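As an illustration of the scan-to-surface step described above, the following sketch assumes the open-source Open3D library; it is not the software used in this study, and the file names are placeholders:

```python
# Hypothetical scan-to-surface pipeline using Open3D (not the tools used in
# this work). Poisson reconstruction needs oriented normals, so those are
# estimated first.
import open3d as o3d

pcd = o3d.io.read_point_cloud("head_scan.ply")          # laser-scan point cloud
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=5.0, max_nn=30))
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)                                       # watertight surface
o3d.io.write_triangle_mesh("head_surface.stl", mesh)    # for FEM preprocessing
```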
3 Outward Leak

The first of the three respirator fit-related problems is a study of an outward leak of breathing gases from a closed-circuit self-contained breathing apparatus (CC-SCBA). This respirator system recirculates the gases breathed by the wearer, chemically scrubbing carbon dioxide and adding oxygen from a storage tank. The advantage is the ability to work for up to four hours without exchanging the tank, compared to a limit of a half hour to an hour for the standard compressed-air tank carried by firefighters. The concern is the accumulation of oxygen within the facepiece, which was measured in laboratory testing at levels as high as 90 % [2], and the potential danger posed by an outward leak into an environment containing flammable gases.

A finite element model was used to determine the difference in size between the flammable region resulting from a leak of air into a near-flammable environment and that resulting from a leak of oxygen [3, 4], as a qualitative method of comparing hazards. The selected environment was 10 % propane in air by volume. Since the upper flammability limit (UFL) of propane in air at room temperature is 9.5 % by volume, this mixture is slightly too fuel-rich to ignite. Introduction of pure air or oxygen into this mixture from a leak creates a flammable mixture in some region surrounding the leak; a small worked example of this dilution effect is given below. Temperature was not considered in this problem. The computational area was defined as a box outside of a head form and mask, as shown in Figures 1 and 2. The leak was defined as a strip 1 mm wide by 43.5 mm long along the interface between head and mask near the temples.
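To see why dilution creates a flammable region, consider simple volume-fraction mixing. This is our own back-of-the-envelope illustration, not the paper's FEM calculation; the flammability limits of propane in air are taken as roughly 2.1 % to 9.5 % by volume:

```python
# Back-of-the-envelope mixing: a parcel that is a fraction f leaked gas and
# (1 - f) ambient 10% propane/air has propane fraction 0.10 * (1 - f).
# It becomes ignitable once diluted below the UFL, until it falls under the LFL.
LFL, UFL = 0.021, 0.095          # approximate propane limits, volume fraction
AMBIENT_PROPANE = 0.10

def propane_fraction(f_leak: float) -> float:
    return AMBIENT_PROPANE * (1.0 - f_leak)

for f in (0.0, 0.05, 0.5, 0.8):
    c = propane_fraction(f)
    print(f"leak fraction {f:.2f}: propane {c:.3f}, "
          f"flammable={LFL <= c <= UFL}")
# Only ~5% dilution already brings the parcel under the 9.5% UFL. An oxygen
# leak additionally enriches the oxidizer, which widens the flammable range
# and is consistent with the larger flammable region computed for oxygen.
```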
Fig. 2. Leak of pure oxygen into a 10 % propane environment over two breathing cycles. The plane shown in the sequence is from the center of the leak region (upper left figure).
Fig. 3. Leak of air into a 10 % propane environment over two breathing cycles
Figures 2 and 3 show the results for leaks of oxygen and air, respectively. These analyses were carried out for two breath cycles at the typical breathing rate of a person at rest, with flow out of the respirator leak only during the exhalation period. The gray shaded areas indicate the concentration of oxygen. The flammable regions are marked by lines that outline a balloon-like volume cut along a cross-sectional plane. For an oxygen leak at this breathing rate, the flammable region is attached to the respirator during the entire breathing cycle, while for an air leak it detaches during inhalation and moves away from the head. The flammable region resulting from a pure oxygen leak is significantly larger than that from air. Heavier and more rapid breathing, such as that caused by exertion, increases the flammable region further in both cases. The conditions considered here are worst-case conditions (pure oxygen within the facepiece, a near-flammable environment, and still air), so the results represent an upper bound for the difference in hazard between a leak from a closed-circuit SCBA and a leak from an open-circuit SCBA.
4 Interior Flow Field The second problem is a study of the flow field inside a half-facepiece respirator. Laboratory investigations of respiratory protection rely on sampling of pressure and aerosol levels using probes mounted within the respirator facepiece [5, 6]. Because of incomplete mixing of the gases and aerosols, measurements vary depending on the location of the probe and the location of any leak [7, 8]. Known as sampling bias, this is also an issue with quantitative fit testing of users using the Portacount, which compares the aerosol particulate levels measured by a probe inside the facepiece to those outside.
Using FEM, or computational fluid dynamics (CFD), to solve for the velocity and pressure fields within the respirator during breathing allows the flow within the entire volume to be determined. This provides insight into the breathing environment of the respirator user, shows where sampling bias may be a problem, and may suggest a better location within the facepiece to place a probe during testing. For this problem, the head and the half-facepiece air-purifying respirator (APR) shown in Figure 1 were combined, and the computational space was defined as the space between those surfaces, as shown in Figure 4. Mouth breathing was assumed, so part of the preparation of the head required opening the mouth by rotating the surfaces of the lower lip and chin about an axis representing the jaw hinge. The boundary conditions depended on the inhalation or exhalation part of the breathing cycle. During inhalation, the side valves open, and air is drawn in through filters mounted on the respirator (the filters were not considered in the model). During exhalation, the side valves close and the center valve opens to expel the exhaled gases into the surroundings. The geometry of the valve openings was not considered in this model, so the entire area of a valve was free when it was open. The flow was driven by the inlet conditions at the mouth: a uniform positive velocity into the computational space, perpendicular to the surface of the open mouth, during exhalation, and a uniform negative velocity during inhalation. The magnitude of the velocity was calculated as the flow rate of a resting person divided by the area of the open mouth, and followed a simple sine function from exhalation to inhalation to exhalation. During exhalation, shown in Figure 4, the center valve of the respirator is open and the side valves are closed. The gases exhaled from the mouth impact the wall of the respirator directly in front of the mouth. This causes a region of high pressure surrounded by large pressure gradients. A probe located in this volume may be susceptible to sampling bias due to the spatial variability of pressure and velocity. The exhaled gases move away from the wall in all directions, sweeping around the contours of the respirator.
Fig. 4. Velocity vectors, surface pressure and streamlines during exhalation
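As an illustration of this boundary condition, the sketch below evaluates the sinusoidal mouth velocity described above; the resting flow rate, mouth-opening area, and breathing period are assumed values chosen purely for illustration, not the settings used in the study.

```python
import numpy as np

def mouth_velocity(t, period=4.0, flow_rate=1.0e-4, mouth_area=2.0e-4):
    """Uniform inlet velocity (m/s) normal to the open mouth.

    Positive during exhalation, negative during inhalation, following a
    simple sine over one breathing cycle, as in the interior flow model.
    flow_rate  -- peak volumetric flow of a resting person (m^3/s), assumed
    mouth_area -- area of the open mouth (m^2), assumed
    period     -- duration of one full breathing cycle (s), assumed
    """
    peak = flow_rate / mouth_area  # peak velocity magnitude (m/s)
    return peak * np.sin(2.0 * np.pi * t / period)

# Sample one breathing cycle for use as a time-dependent boundary condition
times = np.linspace(0.0, 4.0, 9)
print([round(mouth_velocity(t), 4) for t in times])
```

During the positive (exhalation) half of the cycle the center valve would be held open and the side valves closed; the sign switch drives the inhalation phase.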
The behavior of the flow during inhalation is quite different, as shown in Figure 5. In this case, the inhaled gases follow a direct path from the open side valves to the mouth. The pressure is uniform around the inner walls of the respirator and jumps to a different value at the side valves. This result would make it difficult to monitor the inhaled gases and particulates, since little mixing appears to take place during this part of the breathing cycle and a probe is not easily placed in the path of the flow.
Fig. 5. Velocity vectors, surface pressure and streamlines during inhalation
By running this analysis for a variety of defined leaks, the effects of leaks on the velocity and pressure fields can be studied. It may be possible to determine sensor locations and a monitoring scheme to determine whether respiratory protection has been compromised. By identifying chemical components of inhaled and exhaled gases, this model could also be used to identify dead spaces within the respirator that accumulate carbon dioxide and may be inhaled in subsequent breaths.
5 Computational Fit
The third problem uses real material properties of the respirator and of skin over bone to characterize the relationship of respirator design and head shape to fit and comfort. The solution of this problem has not yet been achieved, so this paper reports on the approach and the preparation of the initial configuration. The idea is to computationally push the respirator onto the face, distorting its shape as necessary, until the head is "wearing" the respirator. As the respirator is pushed into place, the FEM software determines the areas of contact and contours of stress. Contours of low stress indicate areas where leaks would be most likely, and contours of high stress indicate areas where the respirator would be pulled the tightest to get a good seal, and thus areas of greatest tactile discomfort. This approach has been tried successfully in the past, in a Phase I SBIR project for the Army Research Laboratory [9]. That project solved a contact problem for a single respirator seal and a nosecup pushed onto a generic face. After reviewing the literature on the material properties of rubber and skin, the investigators selected a hyperelastic model for both
materials. The contact problem included friction between the respirator and skin. The end result of the analysis was a map of contact pressures of the respirator seal against the face that was compared to pressure levels that cause tissue damage and pain. One difficulty with this problem is the thinness of the seal. An accurate representation of material properties requires a thickness of at least three elements, and limitations on the aspect ratio of finite elements will make this a large problem, especially if multiple seals are attempted. The setup of this model begins with a respirator geometry obtained from a CAD file, shown to the left in Figure 6. The rigid portions of that geometry surrounding the faceshield and valve hardware can be removed and replaced in the model with rigid boundary conditions. The straps are also removed, leaving the multiple seal geometry in the center of Figure 6. Finally, as an initial problem to test the software and determine the feasibility of modeling multiple seals, all seals are removed except the seal closest to the face, resulting in the single seal geometry at the right. To illustrate the amount of distortion that is required to mount the respirator on the face, Figure 7 shows the combination of the rigid multiple seal geometry with the rigid head. The respirator cuts through cheeks and temple while leaving a wide gap between the top of the respirator and the forehead.
Fig. 6. Simplification of CAD file of respirator to a single seal for initial testing
Fig. 7. Rigid respirator combined with rigid head. The arrows point out gaps and overlapping areas.
Fig. 8. Initial configuration of respirator seal and face
Figure 8 shows an initial configuration for the single respirator seal. The seal is in close proximity to the cheek, the first point on the head that it will encounter. Since the head and respirator are roughly symmetrical, the problem can be cut in half. The skin is created by copying the geometry of the face at a smaller size to generate an inner layer. The thickness of the skin on the face varies with location and with the individual. On average, the skin of the forehead is 5 mm thick, the skin of the chin is 9 mm thick, and the skin of the cheek is 21 mm thick [10]. To approximate skin over bone, therefore, the copy of the outer layer of skin is reduced by a different factor in the x-, y-, and z-dimensions to approximate these thicknesses on the forehead, chin, and cheek. A grid will be created between the inner and outer facial geometries, and the inner surface will be held rigid (the "bone") during the analysis. To carry out the analysis, an elastic contact problem will be solved. A load applied at the locations of the base of the straps will pull the respirator seal toward the face until contact is achieved, and the load will be increased until the seal is in contact with the head around its full circumference. For a perfect fit, the contact pressures would be equal over the entire seal. The actual distribution of contact pressures will show where the areas of potential discomfort and leaks would be, and the magnitude of the pressure differences may provide a means to quantify the goodness of fit for this analysis. The end result of this model could be used as the geometry for the first and second problems discussed in this paper, replacing the arbitrary shaping of the respirator to fit onto the head in each case. The intent of this project includes analysis of multiple heads and respirators and of the fit for different facial expressions (e.g., during talking and coughing).
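A rough sketch of this inner-surface construction is shown below: a copy of the facial point cloud is shrunk toward its centroid with separate per-axis factors. The factors are hypothetical placeholders that would in practice be tuned until the offsets at the forehead, chin, and cheek match the thicknesses quoted above.

```python
import numpy as np

def make_inner_surface(face_vertices, scale=(0.97, 0.95, 0.92)):
    """Create a rigid inner ("bone") surface from the outer skin geometry.

    face_vertices -- (N, 3) array of outer skin vertex coordinates (mm)
    scale         -- per-axis shrink factors (hypothetical; in practice
                     tuned so the skin layer is ~5 mm at the forehead,
                     ~9 mm at the chin, and ~21 mm at the cheek, per [10])
    """
    centroid = face_vertices.mean(axis=0)
    return centroid + (face_vertices - centroid) * np.asarray(scale)

# Example with a toy set of three vertices
outer = np.array([[100.0, 0.0, 50.0], [80.0, 40.0, 60.0], [90.0, -30.0, 55.0]])
inner = make_inner_surface(outer)
print(np.linalg.norm(outer - inner, axis=1))  # resulting offsets (mm)
```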
6 Discussion
These three examples show some of the investigations into respirator fit and efficacy that can be conducted using computational tools. To fully trust the results, the models need to be validated. It is difficult to visualize the flow field inside a respirator experimentally, but pressure is easily measured and can be used for comparison with the interior flow field model. For the computational fit model, the placement of the respirator on the head can be compared with a 3D scan of the person wearing the respirator.
Computational techniques that significantly improve the speed of creating 3D geometries and converting them to finite element format are needed. As new, more efficient methods for representing human shapes are developed, thought should also be given to how the geometry will be connected to interfacing geometries, such as the head to a respirator or helmet. Finally, consideration should be given not only to representations of fixed outer surfaces, but to the body in depth and in motion as well.
Acknowledgments. The author would like to thank Ronald Shaffer, Ziqing Zhuang, William Newcomb, John Kovac, and Nicholas Kyriazi of NIOSH/NPPTL, Daniel Crowl of Michigan Technological University, Robert Sell of Draeger Safety, and Rodney Bryant and Nelson Bryner of NIST for helpful technical discussions and assistance. Dennis Viscusi of NIOSH/NPPTL contributed the 3D laser scans.
Disclaimer. This work was carried out by the National Institute of Standards and Technology (NIST), the National Institute for Occupational Safety and Health (NIOSH), and the Department of Homeland Security (DHS), agencies of the U.S. government, and by statute is not subject to copyright in the United States. Certain commercial equipment, instruments, materials or companies are identified in this paper in order to adequately specify the experimental procedure. This in no way implies endorsement or recommendation by NIST, NIOSH, or DHS.
References
1. Zhuang, Z., Bradtmiller, B.: Head-and-face anthropometric survey of U.S. respirator users. J. Occup. Envir. Hyg. 2, 567–576 (2005)
2. Kyriazi, N.: Performance Comparison of Rescue Breathing Apparatus. Report of Investigations 9650, National Institute for Occupational Safety and Health (1999)
3. Butler, K.M.: A Computational Model of Dissipation of Oxygen from an Outward Leak of a Closed-Circuit Breathing Device. NIST Technical Note 1484, National Institute of Standards and Technology (2007)
4. Butler, K.M.: A Computational Model of an Outward Leak from a Closed-Circuit Breathing Device. J. Intl. Soc. Resp. Prot. 25, 53–65 (2008)
5. Bentley, R.A., Bostock, G.J., Longson, D.J., Roff, M.W.: Determination of the Quantitative Fit Factors of Various Types of Respiratory Protective Equipment. J. ISRP 2, 313–337 (1984)
6. Campbell, D.L., Noonan, G.P., Merinar, T.R., Stobbe, J.A.: Estimated Workplace Protection Factors for Positive-Pressure Self-Contained Breathing Apparatus. AIHA J. 55, 322–329 (1994)
7. Myers, W.R., Hornung, R.W.: Evaluation of New In-Facepiece Sampling Procedures for Full and Half Facepieces. Ann. Occup. Hyg. 37, 151–166 (1993)
8. Crutchfield, C.D., Park, D.L.: Effect of Leak Location on Measured Respirator Fit. AIHA J. 58, 413–417 (1997)
9. Piccione, D., Moyer Jr., E.T.: Modeling the Interface Between a Respirator and the Human Face. SBIR I Final Report under contract no. DAAL01-96-C-0077, Army Research Laboratory (1997)
10. De Greef, S., Claes, P., Vandermeulen, D., Mollemans, W., Suetens, P., Willems, G.: Large Scale in-vivo Caucasian Facial Soft Tissue Thickness Database for Craniofacial Reconstruction. Forensic Sci. Intl. 159S, S126–S146 (2006)
Hyperkalemia vs. Ischemia Effects in Fast or Unstable Pacing: A Cardiac Simulation Study
Ioanna Chouvarda and Nicos Maglaveras
Aristotle University, The Medical School, Lab of Medical Informatics, Box 323, 54124, Thessaloniki, Greece
{ioanna,nicmag}@med.auth.gr
Abstract. The relation between elevated potassium concentration and decreased action potential duration is well established. While hyperkalemia is present in ischemia, the latter also involves other ionic changes, such as acidosis. This work focuses on hyperkalemic and ischemic changes in relation to various activation patterns in an inhomogeneous tissue, aiming to investigate spatial patterns and to draw quantitative conclusions about their effects. A series of simulations were performed with different combinations of short stimulus periods and increased extracellular potassium concentrations. The effect of these perturbations on the cellular and overall tissue activation and wave propagation characteristics was investigated. Keywords: Ischemia, Potassium elevation, Luo-Rudy model, pacing, heterogeneous tissue.
1 Introduction
Hyperkalemia, i.e., elevation of extracellular potassium, decreases the action potential duration. It also depolarizes the membrane potentials of cells, which affects voltage-gated channels, pushing them towards inactivation and increased refractoriness. This can impair cardiac conduction, which may result in ventricular fibrillation or asystole. Reduced cell excitability due to elevated potassium has been reported in diabetic hearts, but also as a result of physical stress such as exercise. Furthermore, hyperkalemia may be induced by medication in the treatment of chronic heart failure. Myocardial ischemia, i.e., lack of oxygenated blood to the myocardium, leads to extracellular potassium accumulation. However, myocardial ischemia involves other mechanisms besides hyperkalemia, including changes in sodium and calcium conductivities, and the two conditions are therefore not identical. Both hyperkalemia and ischemia alter the excitability and refractoriness characteristics of the tissue and are therefore considered factors that increase vulnerability to reentrant arrhythmias, a major cause of sudden cardiac death. The rate at which recurrent stimuli reach an arrhythmogenic tissue also affects activation characteristics. The most prominent effect of decreasing the pacing period is shortening of the APD, which is an important factor in arrhythmogenesis. Heterogeneity of APD and the resulting dispersion of refractoriness across a tissue wall could
provide the substrate for reentrant arrhythmias. Therefore, the interplay between these ionic conditions and pacing frequency needs further study. Taking into account the importance of these ionic and external conditions, the purpose of this work is to investigate the way that different conditions of spatial heterogeneity (functional and structural) interact and shape the tissue activation and propagation characteristics. Simulated data have been used for this work. A rectangular sheet of cells was simulated with a grid, and regions of scar tissue were defined in order to introduce heterogeneity. Considering a propagation wave that rotates around the infarcted tissue, we took a closer look at the model kinetics in the area of rotation. Experiments were repeated with different combinations of pacing and ionic parameters, i.e., varying hyperkalemia levels and ischemia parameters set to two stages of severity. For each grid point, a series of features reflecting different characteristics of the cellular activation were calculated for further analysis. The behavior of the action potential and ionic current features in the structurally inhomogeneous simulated tissue areas is examined under different levels of hyperkalemia/ischemia and pacing rates.
2 Methods
2.1 The Rationale and the Models of Ischemia and Hyperkalemia
The Luo–Rudy I model was used for the ionic kinetics [1], including the following currents: the fast sodium current (m-h-j gates), the slow inward current (d-f gates, Ca), the time-dependent potassium current (X, Xi gates), the time-independent potassium current (K1 gate), the plateau potassium current, and the background current. Hyperkalemia was expressed as a variation of the extracellular potassium concentration [K]o between the normal value of 5.4 mmol/l and an almost double value of 9.4 mmol/l, above which activation did not always follow stimulation. Ischemia was modeled as a condition with increased extracellular potassium plus acidosis [2], the latter expressed as a decrease of the sodium and calcium conductivities as given in equation (1). The third component of acute ischemia, hypoxia, was not taken into account.
$$ g_{Na,\,acid} = 0.75 \cdot g_{Na}, \qquad g_{Ca,\,acid} = 0.75 \cdot g_{Ca} \tag{1} $$
along with [K]o = 8.1 mmol/l for moderate ischemia and [K]o = 9.4 mmol/l for severe ischemia. The extracellular stimulus was a square pulse applied periodically; in addition, an intermittent burst pulse protocol was tested. A structurally heterogeneous tissue was considered, by means of regions of infarcted tissue without connectivity to the neighboring tissue. The main idea was to investigate the dispersion of functional heterogeneity in the neighborhood of the structurally inhomogeneous tissue, in relation to the ionic and stimulation parameters. Such a combination of structural and functional inhomogeneity might indicate an area of increased vulnerability under certain conditions.
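As a minimal sketch, these conditions can be encoded as parameter sets for the cell model. Only the 0.75 scalings of equation (1) and the [K]o levels come from the text above; the baseline conductance values are placeholders consistent with common Luo–Rudy I settings.

```python
# Baseline conductances g_Na, g_Ca are model-dependent placeholders; [K]o in mmol/l.
BASELINE = {"g_Na": 23.0, "g_Ca": 0.09, "K_o": 5.4}

def make_condition(K_o, acidosis=False, baseline=BASELINE):
    """Return the ionic parameters for one experimental condition.

    Hyperkalemia: only [K]o is raised.
    Ischemia: raised [K]o plus acidosis, i.e. a 25 % reduction of the
    sodium and calcium conductivities per equation (1); hypoxia ignored.
    """
    params = dict(baseline, K_o=K_o)
    if acidosis:
        params["g_Na"] *= 0.75
        params["g_Ca"] *= 0.75
    return params

conditions = {
    "hyperkalemia_moderate": make_condition(8.1),
    "hyperkalemia_severe":   make_condition(9.4),
    "ischemia_moderate":     make_condition(8.1, acidosis=True),
    "ischemia_severe":       make_condition(9.4, acidosis=True),
}
```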
Fig. 1. The simulated 2D cardiac tissue
2.2 Data and Experimental Setup
In the simulation setup, the 2D monodomain model of cardiac muscle was used. A rectangular sheet of cells was simulated with a grid of 40x200 elements, corresponding to 40x50 cells, each cell consisting of 4 longitudinal elements (3 cytoplasmic and 1 junctional). Elements were coupled by resistances, and the Luo–Rudy I model was used for the ionic kinetics. Regions of scar tissue were defined in order to introduce heterogeneity (see Figure 1), causing the wavefront to follow a zigzag pathway. Propagation was simulated using the Gauss–Seidel method. The grid intervals were 10 μm transversely and 25 μm longitudinally. A variable time step size approach was applied [3], in which a maximum time step ΔTmax and a minimum ΔTmin were predefined and each step size was calculated as Δt = ΔTmax/k, with k = k0 + int(|dV/dt|), so that the step varies with the potential's rate of change. Within an area of interest around the rotation pivot point, activation currents and potentials were recorded and stored for further processing. Experiments were repeated with different combinations of pacing and ionic parameters. Hyperkalemia levels varied from normal (5.4 mmol/l) to almost 200% of the normal level; for the hyperkalemia/ischemia comparison, levels of 8.1 and 9.4 mmol/l were considered moderate and severe, respectively. Pacing periods ranged from 105% to 125% of an action potential (399 - 475 ms), regarded as fast and slow pacing, respectively.
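For concreteness, the step-size rule can be expressed in a few lines; the ΔTmax, ΔTmin, and k0 values below are illustrative assumptions, not the settings used in the study.

```python
def adaptive_dt(dVdt, dt_max=0.05, k0=1, dt_min=0.001):
    """Variable time step per [3]: dt = dt_max / k with k = k0 + int(|dV/dt|).

    dVdt   -- current rate of change of membrane potential (mV/ms)
    dt_max -- predefined maximum step (ms, assumed value)
    dt_min -- predefined minimum step (ms, assumed value)
    """
    k = k0 + int(abs(dVdt))
    return max(dt_max / k, dt_min)

# Fast upstroke (large |dV/dt|) forces small steps; the plateau allows large ones
print(adaptive_dt(200.0))  # upstroke: dt_max/201 would be ~0.00025, clipped to dt_min
print(adaptive_dt(0.5))    # plateau: dt_max / 1 = 0.05
```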
2.3 Features and Analysis Methods
Within the grid area of interest chosen for closer analysis, i.e., near the infarcted area and the wavefront turning point, a series of features were calculated for each grid point after a train of thirteen regular stimuli had been delivered, so as to reach steady-state conditions. The macroscopic features of the cells, which reflect different characteristics of the cellular activation and of tissue depolarization and repolarization, are the following:
a) Action Potential Duration (90% APD), which is crucial for alterations in refractory period, blockades and arrhythmias;
b) Action Potential Upstroke, the maximum action potential value, which along with
c) Action Potential Upstroke Velocity, i.e., the maximum action potential slope, is related to the spatial properties of conduction, local impedance and excitability; and
d) the minimum (negative) value of the ionic current (MinIonCur), a basic morphological feature strongly affecting the membrane current and the excitability, known to be affected by heterogeneity and, specifically, to decrease in the infarcted regions.
The last two features are related. However, in discontinuous conduction due to functional or structural obstacles, the association of the action potential upstroke with the underlying excitation process is not as straightforward as it is during normal depolarization, where the maximal flow of inward Na+ current can be associated with the late phase of the action potential upstroke. The following feature analysis steps were performed:
• Mean values of each feature were calculated for each run, corresponding to one ischemia and pacing parameter set. Mean feature values show overall changes with each set of parameters.
• Polar coordinates were calculated for the grid points, taking the tip of the barrier, around which the wavefront rotation takes place, as the polar center. Thus, consecutive areas of propagation/rotation were defined, and mean feature values were calculated in each area, allowing for more detailed spatial comparisons.
Fig. 2. Mean feature values per pacing rate for all hyperkalemia levels. Top (left: APD; right: maxAP); bottom (left: maxAPslope; right: minIonCur).
• Spatial correlation of each feature was calculated, applying formula (2), and the way spatial correlation was affected by altered parameters was investigated. Spatial correlation can reveal the degree to which synchronized propagation takes place. It is expected that a strongly disordered lattice would have a spatial correlation decaying with spatial delay l as a power law [4], while in complex patterns including ordered and disordered regions, slower decays are expected.
$$ c_n(l) = \frac{\tfrac{1}{N}\sum_{i=1}^{N} \hat{x}_n(i)\,\hat{x}_n(i+l)}{\tfrac{1}{N}\sum_{i=1}^{N} \hat{x}_n^{2}(i)}, \qquad \hat{x}_n(i) = x(i) - \langle x \rangle_n \tag{2} $$
where l is the spatial displacement and n a given time.
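A direct numpy transcription of equation (2) might look as follows; it assumes the feature values along one fiber are available as a one-dimensional array, and the averaging in the numerator is taken over the N − l valid pairs.

```python
import numpy as np

def spatial_correlation(x, lag):
    """Normalized spatial correlation c_n(l) of equation (2).

    x   -- 1-D array of a feature (e.g. AP upstroke) sampled along a fiber
           at a given time n
    lag -- spatial displacement l, in grid nodes
    """
    xh = x - x.mean()                    # x_hat: remove the spatial mean
    num = np.mean(xh[:-lag] * xh[lag:])  # mean of products at displacement l
    den = np.mean(xh ** 2)               # mean of squares (normalization)
    return num / den

# Example: correlation between fibers 1 to 4 nodes apart, as in Figure 7
rng = np.random.default_rng(0)
feature = np.sin(np.linspace(0, 3 * np.pi, 200)) + 0.1 * rng.standard_normal(200)
print([round(spatial_correlation(feature, l), 3) for l in range(1, 5)])
```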
3 Results
3.1 Hyperkalemia and Pacing
Overall, in the simulations of different hyperkalemia and pacing levels, cells are characterized by reduced APD, lower AP upstroke and upstroke velocity, and a less negative ionic current peak. APD decreases with pacing frequency, but the relative effect of pacing is more evident in normal than in hyperkalemic tissue. The AP upstroke, upstroke velocity, and ionic current peak also decrease with pacing frequency; in these features, unlike APD, the relative effect is more obvious in hyperkalemic than in normal cells. Some more detailed effects are depicted in Figure 2, indicating that the dependence of cell activation on pacing is very different for normal and for hyperkalemic cells [5].
3.2 Hyperkalemia vs. Ischemia and Pacing Patterns
Both in the slow and in the fast pacing protocol, activation and potential duration features have lower mean values in ischemia than in hyperkalemia, as depicted in Table 1.

Table 1. Mean feature values at [K]o = 8.1 mmol/l under slow (1.25) and fast (1.05) pacing. Hyp: hyperkalemic, Isc: ischemic.

                   APD                AP upstroke        AP upstroke velocity   Min Ionic Current
               Hyp      Isc         Hyp      Isc         Hyp       Isc          Hyp       Isc
Average Slow   279.29   228.80      15.59    9.67        3.06      2.36         424.09    349.72
Average Fast   262.25   216.57      14.16    9.16        2.91      2.31         408.34    343.93
% Rel Dif      6.1011   5.3452      9.1725   5.2740      4.9019    2.1186       3.7138    1.6556
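The last row of Table 1 appears to be the percentage decrease from the slow-pacing to the fast-pacing mean, i.e. 100*(slow - fast)/slow; this assumed formula can be checked directly against the tabulated values:

```python
def rel_diff_percent(slow, fast):
    """Relative difference (%) between slow- and fast-pacing means,
    as in the last row of Table 1 (assumed definition)."""
    return 100.0 * (slow - fast) / slow

# APD, hyperkalemic column
print(round(rel_diff_percent(279.29, 262.25), 4))  # 6.1012 (table: 6.1011; rounding of means)
# APD, ischemic column
print(round(rel_diff_percent(228.80, 216.57), 4))  # 5.3452, matching the table
```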
Fig. 3. APD series over successive beats in different ionic and pacing setups ([K]o = 8.1 and 9.4 mmol/l, slow and fast pacing). Left: hyperkalemia; right: ischemia.
Considering the relative differences between the two pacing protocols under the same ionic conditions, shown in the last row of Table 1, all the features are more strongly affected (decreased on average) by fast pacing under hyperkalemic than under ischemic conditions. In fast steady pacing, after a short transition period, the APD converges to its steady-state value. The initial transitional effect is more apparent with fast pacing and severe hyperkalemic/ischemic conditions; in particular, in severe ischemia with fast pacing (Figure 3, right), steady-state conditions take longer to be reached. In the hyperkalemia setup (Figure 3, left), the APD for severe hyperkalemia and slow pacing eventually converges to the APD for moderate hyperkalemia and fast pacing. Under ischemic conditions the matching is similar but not identical (Figure 3, right). These observations show the complex interplay between ionic and pacing conditions. The second effect examined concerns the unstable fast stimulus, or burst (premature) pacing: a series of nine periodic stimuli every 480 msec (slow) is applied, followed by a single stimulus at a shorter interval, specifically 399 msec. This is considered unstable pacing, as a new burst and transition is introduced after stable conditions have been reached (see Figure 4).
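The burst protocol reduces to a simple stimulus schedule; the sketch below generates the onset times (in msec) from the values given in the text.

```python
def burst_protocol(n_regular=9, slow_period=480, premature=399):
    """Stimulus onset times (msec) for the burst (premature) pacing protocol:
    nine regular stimuli every 480 msec followed by one at 399 msec."""
    times = [i * slow_period for i in range(n_regular)]
    times.append(times[-1] + premature)  # the single premature stimulus
    return times

print(burst_protocol())  # [0, 480, 960, ..., 3840, 4239]
```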
Fig. 4. Burst APD in the 10th AP, compared with steady stimulus fast and slow pacing APDs
As seen in Figure 4, APD is decreased by fast pacing, and it is also decreased by hyperkalemia and ischemia. In fast pacing, the severity of the ionic conditions suppresses the APD shortening induced by fast pacing (also supported by the findings in Table 1); thus, in severe ischemia, fast and slow pacing result in almost the same average APD.
In a burst stimulus after slow pacing, the relative APD differences compared to slow pacing conditions are similar. However, for severe ischemia and burst pacing, a substantial APD shortening takes place, unlike in steady fast or slow pacing, potentially making this condition more vulnerable to block.
Fig. 5. Minimum Ionic Current for fast pacing and severe hyperkalemia (left), severe ischemia (right)
3.3 Hyperkalemia vs. Ischemia: Spatial Inhomogeneity
Towards a deeper investigation, the spatial properties of the hyperkalemic and ischemic setups were analysed. In Figure 5, the minimum ionic current values are shown for each grid point of the area of interest, under conditions of fast pacing and severe hyperkalemia (left) or severe ischemia (right). In both cases there is a large discontinuity (an abrupt increase of the current) in the area where the rotation starts, more evident in the ischemic scheme, while variations over the whole region are more evident in hyperkalemia. In order to assess the degree to which the ionic and pacing conditions under study introduce spatial heterogeneity, in terms of variation of the features under examination, and thus potentially affect arrhythmogenesis, we consider four regions with different rotation properties: a) the first region, near the stimulus, where the propagation wavefront is planar; b) the region near the tip of the barrier, where rotation starts; c) the region near the barrier tip where the wavefront turns; and d) the last region, moving away from the rotation tip towards planar propagation again. Differences among feature values in the different regions are analysed in the following and depicted in Figure 6. Regarding APD, AP upstroke and upstroke velocity, the ischemic case shows larger relative spatial differences among the mean values in the rotation regions, compared to the initial planar wavefront region, than the hyperkalemic case, although the actual differences in APD are small. For the MinIonCur feature, however, the spatial differences are higher in hyperkalemia (see Figure 6, left). In all cases, feature values are not restored in the last region (restored planar wave). Furthermore, standard deviation values are higher in hyperkalemia than in ischemia for all features (see Figure 6, right). Notably, in ischemia the APD variation decreases in the rotation area, while in hyperkalemia the minimum ionic current variation increases.
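One possible way to define such consecutive rotation areas is to bin grid points by polar angle around the barrier tip. In the sketch below the angular boundaries are hypothetical, since the text does not specify them.

```python
import numpy as np

def assign_rotation_region(points, tip, boundaries_deg=(45, 90, 135)):
    """Assign grid points to consecutive propagation/rotation areas.

    points         -- (N, 2) array of grid coordinates
    tip            -- (2,) coordinates of the barrier tip (polar center)
    boundaries_deg -- angular boundaries separating the four regions
                      (hypothetical values, for illustration only)
    Returns an integer region index 0..3 per point: initial planar,
    start rotation, end rotation, after rotation.
    """
    d = points - np.asarray(tip)
    angles = np.degrees(np.arctan2(d[:, 1], d[:, 0])) % 360.0
    return np.digitize(angles, boundaries_deg)

# Example: three points around a tip at the origin
pts = np.array([[1.0, 0.1], [0.0, 1.0], [-1.0, 0.5]])
print(assign_rotation_region(pts, (0.0, 0.0)))  # [0 2 3]
```

Mean and standard deviation of each feature would then be computed per region index, as reported in Figure 6.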
Fig. 6. For each feature: (left) mean values per area (before, start, end, and after rotation) normalized to the mean value of the initial planar area; (right) standard deviation per area. [K]o = 8.1 mmol/l.
Fig. 7. Feature spatial correlation between fibers 1 to 4 nodes apart
Besides the spatial differences among regions, homogeneity on a local scale was investigated by means of spatial correlation analysis, using equation (2). The spatial correlation among neighboring fibers is depicted in Figure 7. For AP upstroke and upstroke velocity, the spatial correlation is lower in hyperkalemia than in ischemia, and the gradient relates to the ionic conditions. For the minimum ionic current, the spatial correlation is low overall, although higher in hyperkalemia than in ischemia. In all cases, the values fall off with distance more smoothly than an exponential law, indicating overall complexity and areas of differing spatial correlation. The two methods for presenting spatial variations of feature values give different results, since they refer to different geometries, polar regions and parallel lines respectively: comparing the mean values and standard deviations of the different rotation areas takes the wave rotation into account, while the spatial correlation of features along parallel fibers reflects planar propagation.
4 Discussion
Action potential duration dispersion is governed by a wide variety of factors, including short-term rate dependence, long-term rate dependence (memory), ionic current heterogeneity, and electrotonic loading [6]. The mechanisms of rate-dependent phenomena in a heterogeneous and ischemic tissue are important for determining the heart's response to rapid and irregular pacing rates, be it an arrhythmia or an attempt at cardioversion. In this work we show that changes in the spatial activation properties largely depend on the combination of ischemia and fast pacing, which may affect arrhythmogenesis. These factors interact in a nonlinear manner. The analysis shows that activation features are affected primarily by the level of ischemia and secondarily by the pacing frequency, with the effect of the latter depending on the ischemia level. With increasing ischemia the effect of steady fast pacing is attenuated, although this does not hold for a premature stimulus. Both in ischemia and in hyperkalemia there are variations in all features in the rotation areas, but the spatial variation and the excitability/repolarisation patterns they produce are different.
These factors of increased heterogeneity were explored in order to investigate vulnerable states and vulnerable regions arising from external, structural or functional conditions, as well as their combined role in excitability and propagation characteristics. In this scope, investigating the effect of the spatial distribution of hyperkalemic and acidic or ischemic cells is considered a necessary step.
References
1. Luo, C.H., Rudy, Y.: A dynamic model of the cardiac ventricular action potential, I: simulations of ionic currents and concentration changes. Circ. Res. 74, 1071–1096 (1994)
2. Shaw, R.M., Rudy, Y.: Electrophysiologic effects of acute myocardial ischemia: a theoretical study of altered cell excitability and action potential duration. Cardiovascular Research 35(2), 256–272 (1997)
3. Qu, Z., Garfinkel, A.: An Advanced Algorithm for Solving Partial Differential Equation in Cardiac Conduction. IEEE TBME 46(9), 1166–1168 (1999)
4. Vasconcelos, D.B., Lopes, S.R., Viana, R.L., Kurths, J.: Spatial recurrence plots. Physical Review E 73, 056207-1–056207-10 (2006)
5. Chouvarda, I., Maglaveras, N.: The Role of Extracellular Potassium Concentration and Stimulus Period on the Functional Inhomogeneity of Cardiac Tissue: A Simulation Study. Computers in Cardiology 35, 585–588 (2008)
6. Hund, T.J., Rudy, Y.: Determinants of Excitability in Cardiac Myocytes: Mechanistic Investigation of Memory Effect. Biophysical Journal 79(6), 3095–3104 (2000)
Learning from Risk Assessment in Radiotherapy
Enda F. Fallon 1, Liam Chadwick 1, and Wil van der Putten 2
1 Centre for Occupational Health & Safety Engineering and Ergonomics, Industrial Engineering, College of Engineering and Informatics, National University of Ireland Galway, University Road, Galway, Ireland
[email protected], [email protected]
2 Department of Medical Physics and Bioengineering, University Hospital Galway, Newcastle Road, Galway, Ireland
[email protected]
Abstract. The lessons learned from completing a risk assessment of a radiotherapy information system in a public hospital are presented. A systems engineering perspective on the risk assessment was adopted, applying standard engineering tools modified for healthcare environments, e.g., HFMEA™. It was found that systems engineering was entirely absent at the development stage of the radiotherapy system; however, aspects of quality systems, i.e., process improvement, were present at the operating stage. Teamwork played a significant role in the successful operation of the system. In contrast to most engineering systems, however, team composition was highly heterogeneous, as roles were clearly defined by professional qualification. There were strong boundaries between the radiotherapy team and other teams in the hospital, reflected in the team's lack of concern regarding the availability of patient information beyond their own department. Keywords: Radiotherapy, Risk Assessment, Health Information Technology, Systems Engineering.
1 Introduction
Radiotherapy plays an increasingly significant role in the treatment of cancer patients in Ireland today. Of the approximately 20,000 patients diagnosed with cancer in Ireland in 2000, 19% received radiotherapy as their primary treatment [1]. This low uptake is the rationale for the increased investment in radiotherapy in Ireland. In the public healthcare sector, the National Cancer Control Programme envisages that radiotherapy services will be provided in a network of four government-owned, specialised centres for cancer care, each serving up to 1,000,000 patients. All cancer treatment will be concentrated in these centres, which will have a critical mass in terms of patients presenting and expertise available to achieve 'World Class' patient outcomes. Currently, Ireland lags considerably behind the best performers among the OECD countries in this area, e.g., Finland and Canada. In contrast, up to ten years ago, radiotherapy treatment was only provided in two public locations, Dublin and Cork. Subsequently, one additional centre was opened in
2005, and these public facilities were augmented by additional private units. Other cancer treatments continued to be provided in virtually all public hospitals. As pointed out above, the National Cancer Control Programme will concentrate all cancer treatment in a limited number of centres, within which additional radiotherapy capacity is planned [1]. The development of radiotherapy centres in the public health system occurred in a piecemeal fashion. There was no systematic effort to replicate procedures and practices across the existing centres. Consequently, these earlier 'systems' were not integrated or coupled from a technology perspective, and they did not utilise electronic patient record systems. Treatment machines (linear accelerators) were upgraded by replacing existing machines with the latest models, and limited resources were expended in trying to develop the overall treatment system. In contrast, the most recent centre incorporates an Electronic Patient Record (EPR) system, complex computer modelling, simulation, 3-D treatment planning and an extensive record-and-verify capability. Despite the sophistication of the technology, no attempt was made to apply a systems engineering approach to the overall development of the radiotherapy centres. Similarly, there was little systematic consideration of risk or of human and organisational issues. Radiotherapy treatment poses significant risks to patients if errors occur. Healthy organs could be irradiated, or excessive doses of radiation could be given to targeted tissue and organs. Radiotherapy also has the potential, due to faulty machine calibration for instance, to adversely affect thousands of patients: calibration errors related to Cobalt-60 therapy units affected 426 patients in the US between 1974 and 1976, 250 patients in the UK in 1988, and 120 patients, including 6 confirmed fatalities as a result of radiation overdose, in Costa Rica in 1996 [2]. In either event, there is a significant risk of an adverse outcome, possibly for large numbers of patients. In systems engineering, a life-cycle approach is adopted towards the management of risk. Hazards are identified as early as possible in the process and their associated risks are determined. The standard approach is to utilise a combination of reliable and proven technology, redundancy, standard operating procedures and operator job aids to reduce the risk to an acceptable level. Risks are normally managed throughout the system life-cycle, and the emphasis is on assuring the main stakeholders, including the general public, that the system is safe. It is almost impossible to effectively back-fit safety to systems; however, in order to ensure that risk is reduced in subsequent upgrades or refurbishments, it is critical to identify deficiencies in current systems and to learn from operating experience with them. In this paper, the lessons learned from carrying out a risk assessment of a radiotherapy system are presented. The system in question was commissioned in 2004–2005 and utilised the latest radiotherapy technology available at the time. It incorporated an EPR system and sophisticated computer-based modelling and simulation tools for treatment planning and specification. The system was not designed and developed using a systems engineering approach, and risk was not proactively considered as part of the process. The system contained 'islands of automation' which were connected through a Local Area Network (LAN).
Some parts of the system could communicate with one another through the EPR file, while others could only communicate by means of manual transfer of data. A series of standardised workarounds were used to
overcome these deficiencies. Successful performance of the system is highly dependent on the timely availability of patient data. This paper first explores the notion of radiotherapy as an engineering system. Following this, the role of risk assessment in healthcare is briefly outlined. The risk assessment process used in the case study is then presented, and finally the lessons learned as a result of the risk assessment are discussed.
2 Radiotherapy as a System
Although at first glance errors in radiotherapy can be attributed to human failure, it has become abundantly clear that a large number of such adverse events can be attributed to "system failures". This is demonstrated well in the report published by the Royal College of Radiologists in 2008, "Towards Safer Radiotherapy", which outlines many of the contributing factors related to incidents, adverse events and accidents in radiotherapy treatments. These include:
• Hierarchical department structure
• Changes in the treatment process
• Over-reliance on automated procedures
• Poor design and documentation of procedures
• Poor communication and lack of teamwork
• Lack of training and competence issues for complex treatment modalities.
However, the report does not establish specific methods for mitigating these issues. One of the peculiar features of radiotherapy is that it can be considered very similar to an engineering process system; in this it is quite unique in healthcare. The following attributes of engineering processes are also identifiable in radiotherapy:
1. Underpinned by physics
2. Predictable flows
3. Advanced technology
4. Sociotechnical systems
5. Quality and safety systems and standards.
Radiotherapy is unique in medicine in that it is the one medical discipline which would be impossible to practice without the support of physics. The strong influence of this discipline on this aspect of health care has led to systems of work which rely heavily on those found in engineering. This is especially true as medical physics, as a branch of applied physics, can be considered closely related to engineering [3]. Radiotherapy, relying as it does on medical physics, can thus be considered similar to the engineering disciplines. The radiotherapy process has very predictable patient flows, based on a reasonably well defined and predictable demand. It is characterized by highly educated and well qualified staff, who operate and supervise the treatment process. In this they are supported by advanced technology systems comprising complex machines, interacting software and control systems. The modern radiotherapy treatment process includes a variety of sub-systems which cover all aspects of the treatment process, each with system-critical functionality: EPR, image transfer and storage, treatment simulation and planning, treatment
administration, and treatment dosage verification. This use of multiple computer systems has been driven by the need to reduce many of the familiar patient safety issues inherent in the operation of complex sociotechnical systems, such as the potential for human errors in data checking and manual data entry. However, this adoption makes the understanding of the systems' interactions and dependencies more opaque to the "sharp-end" users [4]. It also results in a more tightly coupled treatment process, in which the risk of error propagation is increased and the opportunity for error recovery depends on the built-in process safety checks and the knowledge of the treatment staff. It places additional accountability for successful patient treatment onto already highly pressured staff. Examples of reports in which the computer systems were directly implicated in system error are Patton et al. (2003) and Barthelemy-Brichant (1999) [5, 6]. The majority of staff employed in radiotherapy are highly qualified professionals with unique skill-sets. From a sociotechnical systems perspective, for radiotherapy systems to operate effectively, the social aspects of work demand consideration equal to the technological aspects. The practical application of sociotechnical systems theory advocates teamwork based on self-directed, semi-autonomous teams. The organisation of teams in radiotherapy incorporates elements of such teams but also has aspects which are unique to healthcare. For example, the overall radiotherapy team is made up of sub-teams that are largely heterogeneous, as they are based around professional groupings. While there are formal group meetings among the larger team unit, there are strict boundaries around sub-team tasks and roles. This is accentuated by regulatory requirements: for example, treatment plans can only be signed off by radiation oncologists, and only medical physicists can calibrate treatment equipment. There is obviously a clear hierarchy between the radiation oncologists and other professional groups in terms of ultimate clinical responsibility and authority. Nevertheless, radiation oncologists may "have little investment in organisational safety beyond their own individual actions" [7]. Wears et al. (2006) have characterised healthcare groups as social webs of sometimes competing, sometimes cooperating groups, describing the working relationships between nurses, physicians, pharmacists, technicians and administrators as multifaceted and tense [8]. It is clear that the social aspects of work in radiotherapy are complex and must be given due consideration to ensure an effective, efficient and ultimately safe system. External audits were introduced in the United Kingdom after the radiotherapy incident in Exeter (UK) in the 1980s, and formal Quality Assurance (QA), including ISO accreditation, was introduced after the North Staffordshire incident in the mid 1990s [9, 10]. In other countries similar steps were taken after incidents there (cf. Zaragoza, Spain; Epinal, France) [11, 12]. The Therac-25 software incident in North America in the 1980s resulted in the widespread recommendation of extensive testing and formal analysis of new software used in radiotherapy equipment [2]. Radiotherapy departments in Ireland as a whole are not independently audited. Within the radiotherapy physics community in Ireland, a process of audit related to radiation dose has been developed.
This follows similar dose audits in the UK, which were initiated there after the Exeter incident. The audit takes the form of a round-robin sequence, in which department A audits department B, which in turn audits department C, and so on. A similar process is also in place prior to the clinical use of a new treatment machine. The result has been that in Ireland, so far, no serious dose errors have occurred.
Although radiotherapy is one of the safer treatment modalities for cancer, errors do occur, and when they do they can cause serious injury and/or death to the patient. Each year 10 million people worldwide are diagnosed with cancer; of these, 40-50% will receive radiotherapy treatment [13]. The number of errors in radiotherapy is difficult to quantify, with a value of 5% determined in one study [14]. Interestingly, this is comparable to reported error rates in radiology [15]. It is therefore not unreasonable to assume that errors in radiation treatment affect thousands of patients worldwide each year. This is exacerbated by the introduction of highly complicated treatments such as Intensity-Modulated Radiation Therapy (IMRT) and Image-Guided Radiation Therapy (IGRT). The regulated nature of healthcare imposes additional requirements on the radiotherapy department workload. Several independent regulatory bodies require data and information to be reported in relation to patient treatments, incidents and adverse events; e.g., the Irish Medicines Board requires reports on incidents and adverse events involving a medical device. The reporting mechanisms used by each of these bodies are unique, placing additional demands on radiotherapy staff. Moreover, the radiotherapy information system does not necessarily lend itself to easy extraction of the necessary information, requiring manual data mining and report completion. This data and information does not, however, form part of a formal program of continuous improvement in the treatment process.
3 Risk Assessment in Healthcare Systems
The Institute of Medicine's (IOM) "To Err is Human" report in 1999 reported as many as 98,000 fatalities per year resulting from medical errors and lapses in the U.S. [16]. This report generated significant interest in the issues of patient safety and developing reliable healthcare systems. The requirement to reduce medical errors, and the inherent difficulties in the operation of healthcare systems, has forced healthcare staff to turn to other industries and areas of systems development for established and proven analysis techniques. This has resulted in a growing interest in the transfer and application of engineering techniques to a variety of areas in healthcare, including medication administration, adverse event reporting systems, the design and use of new systems, and new device development and product procurement. For example, medical analysts have recently begun to apply techniques drawn from Human Factors (HF) engineering, reliability engineering and risk analysis in the medical domain. They have successfully applied a number of HF engineering techniques to the examination of issues relating to surgical techniques, medication administration, dental procedures and intravenous drug infusion equipment [17-20]. There is, however, little information available to healthcare professionals regarding the correct application of these techniques aside from the information available in the engineering domain. Government organizations, accreditation bodies and health services have not yet adopted many of the available techniques from conventional engineering disciplines into the medical domain. However, Root Cause Analysis (RCA) and Failure Modes and Effects Analysis (FMEA) are two analysis techniques which do feature prominently in healthcare guidelines [21]. While RCA is typically applied as part of a
reactive post-incident or adverse event analysis, FMEA is a proactive, team-based technique which is very comprehensive and exhaustive in nature. FMEA-type analysis was recommended for use in the analysis of healthcare incidents and adverse events by the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) in Standard LD.5.2, as part of a proactive risk analysis of hospital systems [22]. The same standard also requires the proactive analysis of "at least one high risk process" each year by hospitals, as part of accreditation requirements [23]. The U.S. Veterans Administration developed the Healthcare Failure Mode and Effect Analysis (HFMEA™) technique in 1999, combining concepts from a number of existing techniques and industries [24] to meet the requirements for proactive risk assessment:
• FMEA from the engineering industry
• HACCP from the food safety industry
• RCA adapted from the engineering industry
The technique was developed specifically for use in the healthcare sector as a "systematic approach to identify product and process problems before they occur". HFMEA™ is a team-based analysis technique completed under the guidance of a facilitator. The technique considers all possible failures of a process and the consequences of each occurrence. Failures are considered in terms of the categories "eliminate, control or accept", and corrective actions are determined by the analysis team. The technique is now widely used throughout the U.S. as part of healthcare organizations' fulfillment of their accreditation requirements. The HFMEA™ technique is extensive and comprehensive in its analysis as a result of its structured and multi-disciplinary, team-based nature. The authors are currently extending HFMEA™ by augmenting it with a modified Human Reliability Assessment (HRA) technique.
4 Example of Risk Assessment in Radiotherapy
The authors have completed a risk assessment in a radiotherapy department in a public hospital. The purpose of the assessment was to determine:
• The extent to which there is a risk to patient safety in the management of the patient medical record, using both softcopy and hardcopy mediums.
• The extent to which the current information systems support legislative, accreditation and audit standards and requirements, as well as best practice.
Significant effort was spent in collecting data in order to gain an understanding of the radiotherapy treatment process and to model the patient and information flow within it. This data was gathered from a combination of existing process flow charts, observations and interviews with key stakeholders (consultant oncologists, radiotherapists, medical physicists, nurses and administrators). The data was used to develop an IDEF0 model of the patient and information flow within the radiotherapy treatment process. The authors chose IDEF0 as the preferred model representation method because it has been used extensively to model flows in engineering systems, which, it has been argued, are similar to the radiation treatment process; they also had significant previous experience of using the method. Following the development of the IDEF0 model, a hazard analysis was completed for those tasks that had interactions with or implications for either the softcopy or
hardcopy version of the patient file. A detailed risk assessment of the identified processes was then completed using the Irish Health Service Executive (HSE) Risk Assessment Tool [25]. This tool follows a standard format in which a combination of the frequency or likelihood of occurrence of each hazardous event and the severity associated with its outcome are considered. It considers each identified hazard process in terms of eight different risk categories:
1. Injury
2. Patient experience
3. Compliance with standards
4. Objective/Projects
5. Business Continuity
6. Adverse Publicity/Reputation
7. Financial Loss
8. Environment
Risk Facilitator Clinical Nurse Manager Radiotherapy Lead Physicist Radiotherapy Services Manager Radiotherapy Administration Manager
The analysed processes were arranged in order of priority based on their Total Risk Score and a subset was selected for further analysis using a cut-off score. HFMEA™ was completed for the selected subset in order to gain a greater understanding of the consequences of their effects and to determine how to defend against potential adverse events associated with them. The participative nature of HFMEA™ is well suited to the team orientated operating environment of radiotherapy. The risk assessment has been successful in identifying critical tasks in the patient treatment process which have an impact on key information in the patient file.
5 Discussion and Conclusion In completing the risk assessment, a number of key lessons were learned. While it is clear that there are many similarities between radiotherapy systems and engineering systems, in particular with respect to technology and process flow, the human and organisational aspects of each differ greatly. There are multiple regulatory requirements which need to be met for the system to operate, however the emphasis is not necessarily on process improvement but on data collection and record keeping.
It was evident that a systems approach was not adopted in the development of the radiotherapy system analysed. A number of the machines in the system were not integrated from an information flow point of view. For example, the patient administration system does not link directly to the CT scanner, and patient demographic data must be entered manually to create a file for the patient scan, even though this data already exists twice: once in the hospital-wide patient registration system and again in the radiotherapy patient file. Further support for this conclusion is provided by the approach to staffing levels employed in the system, which in the main mirror the levels required by fully paper-based radiotherapy departments. In studying the organisation and human roles, there was no evidence that job design or other human factors methods were used in determining the relationship between users and technology in the more advanced radiotherapy system.

The application of systems safety methods is not part of current work practices. Proactive healthcare risk assessments had not been implemented in the hospital prior to the current work, with the exception of the standard occupational health and safety risk assessments required by law. Staff did not have any particular knowledge of risk assessment techniques; however, they had some familiarity with quality standards and were able to relate this to the systems safety approach. In general, healthcare has only recently begun to adopt engineering approaches to risk and risk management. Significant efforts have been made to standardise the RCA and HFMEA™ approaches within healthcare, but there have been few initiatives with respect to adopting human reliability analysis techniques to determine the role of human error in incidents and adverse events. Given the proliferation of advanced technology within the sector, there is a clear requirement for risk assessment methods which cater for aspects of human–computer interaction (HCI) within healthcare processes.

With respect to the similarities between radiotherapy and engineering systems, it is clear that teamwork plays a significant role in both. However, discipline boundaries are more rigid in radiotherapy systems, reflecting the nature of the work environment in the heavily regulated healthcare domain. In general, the radiotherapy team consists of clinical oncologists, radiotherapists, medical physicists, nurses and administrative staff. Each professional has a clearly defined role, and there is effectively no possibility for job rotation. It was interesting to note that a significant number of non-managerial staff had never been presented with an overview of the system similar to the one developed in this study using IDEF0. The departmental team boundary is very strong, as evidenced by the low risk weightings given by staff to risk factors having an impact outside their immediate work area, e.g. Oncology.

Team meetings are a key activity in radiotherapy operating systems. In the case analysed, planning meetings take place every Monday morning in which the patients scheduled for the week ahead are discussed. Particular issues that might need attention in the course of the week's work are highlighted, including adverse incident reports and any relevant alerts from the Irish Medicines Board (IMB) or an equivalent body.
There is an established practice of guest speakers giving short presentations at regular intervals on topics of both general and specific interest to staff. As the healthcare professions are evidence based, there is a culture of openness to continuous improvement, which is manifest in significant staff participation in these presentations.
Despite this culture of openness, the process of continuous improvement has not been formalised, either from outside or within the group. Existing excessive workloads clearly affect levels of participation in these sessions and, in that context, any formal requirement for continuous improvement might be viewed as an imposition and be resisted.

Regulatory requirements place significant demands on the radiotherapy department. Data and other information are required in different formats by the various relevant regulatory bodies, e.g. the Irish Medicines Board and the State Claims Agency. Unfortunately, it is difficult to extract this data from the radiotherapy information system, because the requirement to do so was not clearly specified at the development stage. This places additional, unnecessary workload on staff throughout the department. In addition, the reports do not necessarily provide information in a format that contributes to effective operations; their primary objective is to meet regulatory requirements rather than to facilitate or support a process of continuous improvement.

Adopting a systems engineering approach to the development of the radiotherapy system could help to overcome the development deficiencies identified during the risk assessment as presented above. It could also help to enhance operations through improvements in information flow, better integration of technology and systematic identification of risks to patient safety.

Acknowledgements. The authors would like to acknowledge the cooperation of the radiotherapy staff involved in the risk assessment case study. HFMEA™ is a trademark of CCD Health Systems.
References
1. Hollywood, D.P.: The Development of Radiation Oncology Services in Ireland. Department of Health and Children, Dublin (2003)
2. WHO: Radiotherapy Risk Profile. World Health Organization, Geneva (2008)
3. Ihde, D.: The Structure of Technology Knowledge. International Journal of Technology and Design Education 7, 73–79 (1997)
4. Graber, M.: The Safety of Computer-Based Medication Systems. Arch. Intern. Med. 164, 339–340 (2004)
5. Patton, G.A., Gaffney, D.K., Moeller, J.H.: Facilitation of radiotherapeutic error by computerized record and verify systems. International Journal of Radiation Oncology*Biology*Physics 56, 50–57 (2003)
6. Barthelemy-Brichant, N., Sabatier, J., Dewé, W., Albert, A., Deneufbourg, J.-M.: Evaluation of frequency and type of errors detected by a computerized record and verify system during radiation treatment. Radiotherapy and Oncology 53, 149–154 (1999)
7. Spear, M.E.: Ergonomics and human factors in health care settings. Annals of Emergency Medicine 40, 213–216 (2002)
8. Wears, R.L., Cook, R.I., Perry, S.J.: Automation, interaction, complexity, and failure: A case study. Reliability Engineering & System Safety 91, 1494–1501 (2006)
9. Aspley, S.J.: Implementation of ISO 9002 in cancer care. International Journal of Health Care Quality Assurance 9, 28–30 (1996)
10. The Royal College of Radiologists, Society and College of Radiographers, Institute of Physics and Engineering in Medicine, National Patient Safety Agency, British Institute of Radiology: Towards Safer Radiotherapy. The Royal College of Radiologists, London (2008)
11. Nenot, J.C.: Radiation accidents: lessons learnt for future radiological protection. International Journal of Radiation Biology 73, 435–442 (1998)
12. Ash, D.: Lessons from Epinal. Clinical Oncology 19, 614–615 (2007)
13. International Atomic Energy Agency: International Action for the Protection of Radiological Patients. IAEA, Vienna (2002)
14. Yeung, T.K., Bortolotto, K., Cosby, S., Hoar, M., Lederer, E.: Quality assurance in radiotherapy: evaluation of errors and incidents recorded over a 10 year period. Radiotherapy and Oncology 74, 283–291 (2005)
15. Goddard, P., Leslie, A., Jones, A., Wakeley, C., Kabala, J.: Error in radiology. Br. J. Radiol. 74, 949–951 (2001)
16. Kohn, L.T., Corrigan, J.M., Donaldson, M.S. (eds.): To Err Is Human: Building a Safer Health System. Institute of Medicine. National Academy Press, Washington (1999)
17. Tang, B., Hanna, G.B., Cuschieri, A.: Analysis of errors enacted by surgical trainees during skills training courses. Surgery 138, 14–20 (2005)
18. Tissot, E., Cornette, C., Demoly, P., Jacquet, M., Barale, F., Capellier, G.: Medication errors at the administration stage in an intensive care unit. Intensive Care Medicine 25, 353–359 (1999)
19. Chadwick, L., Fallon, E.F.: Applying Human Error Identification in Dental Care. In: Pacholski, L.M., Trzcielinski, S. (eds.) Proceedings of the 11th Conference on Human Aspects of Advanced Manufacturing: Agility and Hybrid Automation / 4th International Conference ERGON-AXIA. IEA Press, Poznan University of Technology (2007)
20. Wetterneck, T.B., Skibinski, K.A., Roberts, T.L., Kleppin, S.M., Schroeder, M.E., Enloe, M., Rough, S.S., Hundt, A.S., Carayon, P.: Using failure mode and effects analysis to plan implementation of smart i.v. pump technology. American Journal of Health-System Pharmacy 63, 1528–1538 (2006)
21. Latino, R.J.: Optimizing FMEA and RCA efforts in health care. ASHRM 24, 21–27 (2004)
22. Stockwell, D.C., Slonim, A.D.: Quality and Safety in the Intensive Care Unit. J. Intensive Care Med. 21, 199–210 (2006)
23. Senders, J.W.: FMEA and RCA: the mantras of modern risk management. Qual. Saf. Health Care 13, 249–250 (2004)
24. DeRosier, J., Stalhandske, E., Bagian, J.P., Nudell, T.: Using Health Care Failure Mode and Effect Analysis™: The VA National Center for Patient Safety's Prospective Risk Analysis System. Journal of Quality Improvement 28, 248–267 (2002)
25. Hughes, S.: Your Service, Your Say: The Policy and Procedures for the Management of Consumer Feedback to include Comments, Compliments and Complaints in the Health Service Executive (HSE). Health Service Executive, Dublin (2008)
Simulation-Based Discomfort Prediction of the Lower Limb Handicapped with Prosthesis in the Climbing Tasks

Yan Fu, Shiqi Li, Mingqiang Yin, and Yueqing Bian
School of Mechanical Science & Engineering, Huazhong University of Science & Technology, Wuhan, Hubei Province, 430074, China
[email protected], [email protected], [email protected], [email protected]
Abstract. This paper generalizes a discomfort model for climbing tasks performed by the handicapped with a prosthetic lower limb. The model is integrated, through ICT technology, into a simulated task scenario to indicate to what extent a climbing task causes discomfort and to analyze the direct cause of discomfort at the micro-motion level. Furthermore, it can predict potential harm and accidents in climbing tasks by calculating the accumulated biomechanical results of joint posture displacement and torque. The research also focuses on the analysis of each movement around the prosthesis socket, providing an analysis tool for the design of prostheses from the point of view of comfort.

Keywords: discomfort modeling, prosthesis socket, motion analysis, climbing tasks.
1 Introduction

Investigations show that, due to the deficiency of barrier-free construction in China, many lower limb handicapped people with prostheses are challenged with climbing tasks in daily life and work. There are many claims of discomfort [1], and evidence shows that discomfort is followed by lower limb fatigue, injury around the prosthesis socket and fall-over accidents. On the one hand, barrier-free construction can largely solve the problem, but it requires large physical and financial investment and long-term construction. On the other hand, ergonomic design of prostheses, and related training that considers climbing tasks, is equally important for solving the problem and alleviating the pain of the handicapped. The good design of a comfortable prosthesis, which alleviates the pain of the handicapped at the socket and earns a good market, should be based on comfort analysis in different task scenarios. Several studies have focused on prosthesis design from the aspect of safety in different tasks [2, 3], but few were conducted in the ladder-climbing scenario. Digital data and tools developed in OpenGL for CAD/CAM [4, 5] have been introduced into the design and manufacturing of the prosthesis socket to follow these principles: accurate measurement of the stump geometry; close fitting of the prosthesis to the stump; good response to forces and mechanical stress; safety; and a tight connection of each area of the socket to the stump anatomy without affecting blood circulation. No matter how digitalized the process, the starting point of customer-fit design largely depends on the discomfort/comfort analysis of the patients.
Discomfort is a subjective value, but it can be evaluated through factors connected with the concept of suffering, which can be measured by physical quantities such as posture, force/torque and the interface pressure between the joint and the prosthesis socket. Zacher [6] built a strength-based discomfort model for posture and movement with the aim of predicting discomfort for different anthropometries and tasks; the results can be implemented in an existing digital human model such as RAMSIS. A posture- and force-dependent articulation discomfort model for static conditions was presented by Schäfer [7]. Little scientific research has been conducted to ergonomically evaluate the handling and climbing of ladders in working situations. To date, attention has mainly been given to accidents and the relatively high number of falls associated with ladder climbing [8, 9]. Furthermore, several studies have examined the inclination angle of ladders in relation to climbing technique, safety and biomechanics [10, 11]. Bloswick and Chaffin [12] studied the role of rung separation and its interaction with user anthropometry, and demonstrated that rung separation was not an important factor contributing to slips/falls and low back hazard. In this research, the comfort index is defined in terms of joint angles, and the steps from movement analysis to the calculation of inner joint moments are essential parts of our approach to describing discomfort.
2 Experiment

Ten lower limb handicapped workers participated in the study. The participants' heights were within the range from 170 cm to 177 cm; their age and weight were 30±2 years and 60±5 kg. The participants all had more than 3 years of experience working with ladders, and all had worn a prosthesis on the lower part of the same (left) leg for more than 2 years, during which they were functionally fit according to clinical assessment. Each subject was asked to climb up and down a ladder with 30 cm rung separation, 2 m high; that is, the subjects climbed 10 steps for each ascent and descent cycle. The ladder was placed at an angle of 77°, an average value found during experiments outdoors. All participants gave informed consent prior to the study, and the study had obtained ethical committee approval. Each subject was tested over two half days. During the first half day, anthropometry and the maximum articulation DOF during climbing were recorded, and the subjects were made familiar with the climbing system and trained in the lateral gait climbing method (in which the arm and leg on the same side of the body move at approximately the same time). On the next morning, the subjects were to perform 20 ladder ascents/descents every 5 minutes, with 5 minutes of rest between tasks to make sure that the subjects recovered from fatigue; this period lasted 50 minutes. Three cameras were used to record the ascending and descending tasks, and VICON, a motion capture system, was applied to capture the climbing gait. 21 markers were placed on the thigh, knee and ankle joints for tracking (see Fig. 1). During the climbing tasks, the participants had to indicate the level of perceived discomfort when ascending and descending the ladder on a 5-point scale in
which 1 represents ‘very comfortable’ and 5 ‘very uncomfortable’ (3 is neutral). Every 3 cycles the subjects were asked to indicate the discomfort value for each step in the climbing. For the biomechanical analysis of ascending and descending the ladder, one rung of the ladder was instrumented with a 3D force transducer; the force signal was amplified, sampled at 100 Hz and stored on a computer.

Fig. 1. Distribution of Markers on the Human Body
3 Discomfort Model

In this research a discomfort model was developed to allow the prediction of discomfort during ladder climbing and of the climbing posture. In this model the discomfort index was correlated with the inner joint moments of the hip, knee and ankle, and with the muscle pressure around the stump. The model consisted of three levels. The first level was a biodynamical model, in which the moments and inner joint pressure were functions of subject anthropometry, body link location, acceleration data and the foot forces measured during the climbing activity. The second level was a matching map (which we call the biomechanics–discomfort spectrum) between the discomfort index and the biomechanical parameters, where the discomfort index is a function of the moments, the pressure around the stump and the back compressive force; this level was built up through statistical analysis of the experiment and subject-based validation. The third level was the visualization of the discomfort zone on the human body. A schematic frame of the biomechanical model input/output is shown in Fig. 2. Several intermediate steps were conducted in Matlab: the matrix data acquired from the motion capture system were transformed from Cartesian coordinates to Denavit–Hartenberg (DH) coordinates, yielding the body link locations and their change over time. Regarding the inner moments, the disabled part of the body was simplified as rigid. The force–velocity relationship of the resulting torques was taken into account by implementing a Hill-type function, as follows:
Fig. 2. Simulation-based Discomfort Model (inputs to the biomechanical model: marker/joint locations, rung/foot forces, subject anthropometry and subject discomfort ratings; outputs: angular velocity, joint moments and inner compressive force on the socket surface; correlational analysis of outputs and ratings yields the discomfort–biomechanics spectrum used for simulation-based discomfort prediction)
M = (M₀ + a) · b / (ω + b) − a    (1)
Using this equation, the maximum concentric torque M for a DOF of a joint is calculated at the actual angular velocity ω of the movement. This is done for all joints and their respective DOFs with the constants a = 72 Nm and b = 13 rad/s. The maximum isometric torque M₀ of each subject was measured under static conditions. The reactive force at the socket was calculated as a function of the resultant forces at the distal end of the link, i.e. the foot force measured by the transducer. The function was defined by Equation 2:
Fᵢ = W × Pᵢ    (2)
Fᵢ yields the compressive force at the socket. W is the above-knee body weight. Pᵢ is a five-dimensional parameter: two of its components are forces obtained from the transducer on the ladder rung, and three are moments in three directions around the knee. The maximum value in the Fᵢ matrix was regarded as the maximum compressive force on the surface between the stump and the prosthesis.
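The sketch below illustrates Equations 1 and 2 in Python; the subject values and the five-component Pᵢ samples are invented for illustration, and the real analysis operated on full motion capture and transducer time series:

```python
import numpy as np

A = 72.0   # Nm, constant a from Eq. 1
B = 13.0   # rad/s, constant b from Eq. 1

def hill_max_torque(m0: float, omega: float) -> float:
    """Maximum concentric torque at angular velocity omega (Eq. 1),
    given the statically measured maximum isometric torque m0."""
    return (m0 + A) * B / (omega + B) - A

def socket_forces(weight: float, p: np.ndarray) -> np.ndarray:
    """Reactive force components at the socket (Eq. 2): F_i = W * P_i.
    p holds, per time sample, two rung-force components and three knee moments."""
    return weight * p

# Hypothetical data: m0 measured statically, omega from motion capture.
print(hill_max_torque(m0=120.0, omega=2.0))

w = 60.0  # above-knee body weight of the subject (invented value)
p = np.array([[0.4, 0.1, 0.2, 0.3, 0.1],
              [0.6, 0.2, 0.5, 0.4, 0.2]])  # two time samples, five components
f = socket_forces(w, p)
print(f.max())  # treated as the peak compressive force at the stump-socket interface
```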
Fig. 3. Digital Man Modeling
In the dimension of time, parameters such as the joint moments and compressive force could be mapped to the different levels of the discomfort index obtained from the above experiment. As the preliminary results below show, a correlation model between discomfort and biomechanical parameters could be set up by repeating the above experiment on more subjects. Thus a discomfort/comfort index map (spectrum) was built up to predict the discomfort index for different postures in ladder ascent and descent. To visualize the discomfort index map, a simultaneous simulation of climbing was implemented on the ENVISION platform (see Fig. 3). A digital human was modeled based on the kinematical structure and driven by simultaneous motion data from the motion capture system. The gait simulation allowed analysis of the biomechanical behavior of the socket–limb interface during the subject's climbing. To simulate the ladder-climbing task of a lower limb prosthesis wearer in ENVISION, two simulation templates were first set specifically for the handicapped. One was the anthropometric data of the handicapped: considering that the subjects were of similar anthropometric status (heights between 170 cm and 177 cm), this template was derived by statistical calculation. The other template, describing the general characteristics of the climbing tasks, was defined based on the motion parameters of Subject C, whose motion characteristics were closest to the average among the 10 subjects. The motion of the digital human was driven by simultaneous motion capture. Biomechanical parameters such as the hip moment, knee moment, ankle moment and stump pressure were presented over the task period by information augmentation. Thus the discomfort index, especially changes in the comfort value, was synchronously presented in the simulated scenario, which provided a guideline for prosthesis and task designers. When the discomfort
status changed, there were corresponding color changes on the relevant parts of the human body. At this level, the 5-scale discomfort index was simplified into two zones: discomfort (4-5) and comfort (1-3). Color was used to indicate the upper and lower limits of the comfort zone, and information augmentation was applied to indicate the corresponding biomechanical parameters at those limits.
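A minimal sketch of this two-zone rule (Python; the colour coding is an assumption based on the description above, not the actual ENVISION implementation):

```python
def discomfort_zone(index: int) -> str:
    """Collapse the 5-point discomfort index into the two zones used
    for colouring the digital human: comfort (1-3) vs. discomfort (4-5)."""
    if not 1 <= index <= 5:
        raise ValueError("discomfort index must be on the 1-5 scale")
    return "discomfort" if index >= 4 else "comfort"

# Assumed colour coding for the affected body region.
ZONE_COLOUR = {"comfort": "green", "discomfort": "red"}

for i in range(1, 6):
    print(i, discomfort_zone(i), ZONE_COLOUR[discomfort_zone(i)])
```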
4 Preliminary Results

First and intermediate results of the simulation of the handicapped people's climbing task are presented in terms of angular velocity, torque and discomfort values. The angular velocities of the hip, ankle and knee all showed a typical bell-shaped form, resulting from acceleration as climbing height increased (see Fig. 4). There were two sharp changes on the curve of knee angular velocity, because the 10 subjects bumped into the ladder at the very beginning of ascent 15 times altogether during the experiment. This is quite in accordance with the survey [10] finding that climbing danger and fall-over accidents usually occur at the beginning and end of ladder climbing. As Fig. 4 shows, the fluctuation of the curve was due to knee extension and abduction; the detection was not accurate enough for the axial rotations in the thigh or the small angle changes of the ankle. The discomfort index of the stump–prosthesis interface is described on the time axis, in which the discomfort value is related to each step in the climbing (see Fig. 5). Joint moment changes over time are shown in Fig. 6. The maximum torque calculated from Hill's equation (Eq. 1) is dependent on the velocity and the maximum isometric torque at the respective posture. The relative moment is the ratio between the maximum torque and the actual torque in the respective direction of motion in the knee joint. A curve is used to indicate the difference between actual torque and maximum torque, as an indication of comfort as well. Fig. 7 depicts the corresponding relationship of the prosthesis moment to the discomfort zone for each step during one cycle. In the figure, the comfort zone of the prosthesis socket related to each motion posture in the climbing is defined by the upper and lower limit lines.
Fig. 4. Angular Velocity Curve in One Cycle
Fig. 5. Prosthesis Socket Discomfort Distribution in One Cycle
Fig. 6. Joint Moment Distribution in One Cycle
Fig. 7. Knee Moment Range within Comfort Zone in One Cycle
5 Discussion and Conclusion

This paper generalizes a discomfort model for climbing tasks of the handicapped with a lower limb prosthesis. The model is integrated, through ICT technology, into a simulated task scenario to indicate to what extent the climbing task causes discomfort at the stump–prosthesis interface. The experiment validated the relationship between discomfort and the parameters of joint moments and stump compressive force. Using the 5-scale discomfort index, the biomechanical parameters were classified into discomfort zones, a preliminary quantitative approach to predicting discomfort during climbing tasks for the handicapped. Since the prosthesis implements passive motion, the pressure derived from the knee moment change was also used as the parameter to predict discomfort at the interface between the prosthesis and the stump. Further measurement of the inner compressive pressure around the socket is required to validate the equation-based approach in the analysis. More experiments of different time durations should be conducted, and further comparisons between real task situations and the experimental tasks would be useful. Nevertheless, the proposed model provides quantitative guidelines for better ergonomic design of the prosthesis socket.

Acknowledgments. This research was supported by the Hubei Disabled Person's Federation, China.
References
1. Zhang, M., Mak, A.F.T., Roberts, V.C.: Finite element modeling of a residual lower-limb in a prosthetic socket—a survey of the development in the first decade. J. Med. Eng. Phys. 20(5), 360–375 (1998)
2. Mak, A.F.T., Zhang, M., Boone, D.A.: State-of-the-art research in lower limb prosthetic biomechanics—socket interface: A review. Journal of Rehabilitation Research and Development 38(2), 161–174 (2001)
3. Zhang, M., Fan, Y.: Biomechanical Studies on Lower-Limb Prosthetic Socket Design. In: Recent Advances in Biomechanics, Beijing, pp. 134–138 (2001)
4. Reynolds, D.P., Lord, M.: Interface Load Analysis for Computer-Aided Design of BK Prosthetic Sockets. Med. Biol. Eng. Computing 30(4), 419–426 (1992)
5. Zachariah, S.G., Sanders, J.E.: Interface mechanics in lower-limb external prostheses: a review of finite element models. IEEE Trans. Rehabil. Eng. 4(4), 288–302 (1996)
6. Zacher, I., Bubb, H.: Strength Based Discomfort Model of Posture and Movement. In: Proceedings of the SAE DHMS 2004, Rochester, 04DHM-66 (2004)
7. Schäfer, P., Zacher, I.: On the way to autonomously moving manikins, empowered by discomfort feelings. In: Proceedings of the XVI World Congress on Ergonomics of the IEA, Maastricht (2006)
8. Hammer, W., Schmalz, U.: Human behavior when climbing ladders with varying inclinations. Safety Sci. 15, 21–38 (1992)
9. Juptner, H.: Safety on ladders: an ergonomic design approach. Appl. Ergon. 7, 221–223 (1976)
10. Dewar, M.E.: Body movements in climbing a ladder. Ergonomics 20, 67–86 (1977)
11. Lee, Y.H., Cheng, C.K., Tsuang, Y.H.: Biomechanical analysis in ladder climbing: the effect of slant angle and climbing speed. In: Proceedings of the National Science Council, ROC, vol. 18, pp. 170–178 (1994)
12. Bloswick, D.S., Chaffin, D.B.: An ergonomic analysis of the ladder climbing activity. Int. J. Ind. Ergon. 6, 17–27 (1990)
Application of Human Modelling in Health Care Industry

Lars Hanson¹, Dan Högberg², Daniel Lundström², and Maria Wårell³
¹ Department of Design Sciences, Lund University, SE-223 62, Lund, Sweden
² School of Technology and Society, University of Skövde, SE-541 28, Skövde, Sweden
³ ArjoHuntleigh R&D Center, SE-223 70, Lund, Sweden
Abstract. Digital human modelling (DHM) is commonly utilised for vehicle and workplace design in the automotive industry. The tools are more rarely applied in the health care industry, despite similar objectives of cost-efficiency and user-centred design processes. This paper illustrates how a DHM tool was modified and utilised to evaluate a bathing system design from the caretakers' and caregivers' ergonomics point of view. The anthropometry, joint ranges of motion, description and appearance of the manikins were customised to meet the requirements of a health care setting. Furthermore, a preferred bathing posture was defined. A suggested DHM working process scenario illustrates that DHM tools can be customised, applied and useful in health care product design. In addition to technical customisations of the DHM tool, the development of a working process and work organisation around the tool is proposed for effective and efficient use of digital human modelling.

Keywords: Digital Human Modelling, Elderly, Ergonomics, Health Care, Human Factors.
1 Introduction

Digital human modelling (DHM) tools, such as Jack, V5 Human and Ramsis, have been introduced in industry to facilitate a proactive and efficient consideration of ergonomics in the design process. Most of the tools are used in the fields of automotive engineering, aerospace engineering or industrial engineering and are applied in the design, modification, visualisation and analysis of human workplace layouts and/or human–product interactions. Within the automotive industry, digital human modelling is commonly used in the vehicle design process for occupant packaging tasks, e.g. when analysing the positions and adjustment ranges needed for primary controls such as the seat, pedals and steering wheel in order to accommodate the driver population of interest [1, 2], as well as in the virtual manufacturing process, e.g. for analysing manual access and biomechanical loads on assembly workers [1, 3]. There are reports stating that the automotive industry may receive cost and time benefits from using digital human modelling tools [1]. With similar objectives, the health care industry also strives towards cost-effective and user-centred design processes, and has identified that its analysis questions are often similar to those in the automotive industry. For example, evaluating the position and adjustment range
needed for an adjustable bathtub footrest for a caretaker (resident/patient) sitting in the bath can be compared to the evaluation of primary controls in an automotive setting. Similarly, analysing how the bathtub design affects the caregiver's accessibility and biomechanical load when performing care-related tasks is as relevant as analysing how the vehicle design affects a vehicle assembler's tasks. Accordingly, the health care company ArjoHuntleigh, representing the health care industry in this case, has shown interest in utilising digital human modelling tools in its design processes, with the intent to employ a DHM working process in a similar manner as used in the automotive industry. To meet this objective, it is regarded that major differences between the health care sector and the automotive sector, such as the product type and the age and physical characteristics of the users, can be handled by customisation of the digital human modelling tools. This paper illustrates a scenario where a digital human modelling tool is applied in a bathing system design setting for the evaluation of the design from the caretakers' and caregivers' ergonomics point of view. The method section describes how the digital human modelling tool was customised to meet the requirements of a company that develops health care products, and the results section illustrates a suggested DHM working process in the context of health care product design.
2 Method

The DHM tool Delmia V5 Human was used as a basis for conducting the customisations suggested to better fit the health care setting. In order to represent typical caretakers in the DHM tool, computer manikins that represent characteristics of the elderly were developed: the manikins' anthropometrics and joint ranges of motion (ROM) were modified, a narrative description was assigned to the manikins, and a more age-appropriate appearance was applied. Finally, the preferred bathtub posture was defined and implemented in the tool. The digital human modelling working and documentation process recommended by Hanson et al. [4] was utilised.

2.1 Anthropometry

The concept of a manikin family was utilised for both caregivers and caretakers, i.e. a limited set of computer manikins with different anthropometric proportions that together represent the diversity in a population [5]. The manikin collection suggested by Speyer [6] was applied, where stature, proportion (the ratio of sitting height to leg length) and corpulence are used as key body dimensions. The other body dimensions are defined by the anthropometric functionality in the DHM tool itself, using existing correlation data for these three key dimensions. For caretakers, data for elderly British females in the age group 75+ was used [7]; for caregivers, data for Swedish adults was used [8]. This approach renders anthropometric data for 24 manikins (Table 1). Since data for waist circumference was not available for the selected user group, weight was used to signify corpulence; this assumption is believed to be adequate due to the relatively high correlation between the two dimensions [9].
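As an illustration of how key-dimension percentiles translate into manikin measurements, the sketch below uses the inverse normal CDF; the population means and standard deviations are invented placeholders, and the real data came from the cited anthropometric sources:

```python
from statistics import NormalDist

# Invented population parameters (mean, SD); real values would come from
# anthropometric surveys such as Older Adultdata or the Swedish adult data.
POPULATION = {
    "stature_mm":        NormalDist(mu=1555, sigma=62),
    "weight_kg":         NormalDist(mu=66, sigma=12),
    "sitting_height_mm": NormalDist(mu=795, sigma=35),
}

def manikin_from_percentiles(percentiles: dict) -> dict:
    """Turn key-dimension percentiles into measurements via the inverse
    normal CDF; the remaining body dimensions would be derived in the DHM
    tool from correlations with these key dimensions."""
    return {dim: round(POPULATION[dim].inv_cdf(p / 100.0))
            for dim, p in percentiles.items()}

# Example: a manikin specified like the FCT rows of Table 1.
print(manikin_from_percentiles({"stature_mm": 6.3,
                                "weight_kg": 15.1,
                                "sitting_height_mm": 2.5}))
```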
Table 1. Key anthropometric data representing: Female (F), Male (M), Caretakers (CT) and Caregivers (CG)

Caretakers, Females
id #   Stature %-ile   Stature (mm)   Weight %-ile   Weight (kg)   Sitting height %-ile   Sitting height (mm)
FCT1   6.3             1447           15.1           49            2.5                    732
FCT2   6.6             1449           48.2           61            10.9                   756
FCT3   7.8             1454           99.4           92            12.8                   759
FCT4   4.6             1437           94.2           80            29.8                   779
FCT5   50.8            1546           43.4           60            53.0                   800
FCT6   95.1            1651           38.0           58            91.2                   842

Caretakers, Males
MCT1   5.7             1573           60.2           77            9.7                    825
MCT2   50.7            1685           47.6           73            52.5                   875
MCT3   96.4            1810           8.5            58            76.0                   898
MCT4   94.0            1793           35.7           70            89.7                   918
MCT5   92.5            1785           95.1           85            88.0                   915
MCT6   94.3            1795           81.6           93            98.8                   954

Caregivers, Females
FCG1   6.3             1570           15.1           54            2.5                    823
FCG2   6.6             1572           48.2           65            10.9                   849
FCG3   7.8             1578           99.4           93            12.8                   852
FCG4   4.6             1559           94.2           82            29.8                   873
FCG5   50.8            1675           43.4           63            53.0                   895
FCG6   95.1            1787           38.0           62            91.2                   939

Caregivers, Males
MCG1   5.7             1681           60.2           81            9.7                    897
MCG2   50.7            1793           47.6           77            52.5                   946
MCG3   96.4            1918           8.5            60            76.0                   969
MCG4   94.0            1901           35.7           73            89.7                   990
MCG5   92.5            1893           95.1           100           88.0                   986
MCG6   94.3            1903           81.6           90            98.8                   1025
2.2 Range of Motion

In order to adjust the caretaker manikins' joint ranges of motion (ROM) to more appropriately represent elderly people and the different mobility levels defined in the Mobility Gallery developed at ArjoHuntleigh (described in Section 2.3), ROM data for the elderly, obtained from Older Adultdata [7], was entered into the DHM tool. Table 2 shows the angular data for shoulder extension and flexion as an example. As some residents, particularly Doris and Emma, have very little active mobility of their own, an additional "Supported" angular range was added, representing movements that may be achieved with the aid of a carer or an assisting product. The standard ROM settings defined in the DHM tool were utilised for the caregiver manikins, as this user group was considered to have usual mobility.

Table 2. Approach for defining range of motion data for caretakers (angles in degrees)

                    Shoulder extension            Shoulder flexion
                    ROM          Supported        ROM          Supported
Resident  Gender    %-ile  Angle  %-ile  Angle    %-ile  Angle  %-ile  Angle
Albert    Male      50     38     60     41       50     160    60     163
Albert    Female    50     49     60     52       50     169    60     171
Barbara   Male      20     29     30     32       20     151    30     154
Barbara   Female    20     38     30     42       20     161    30     164
Carl      Male      10     24     25     31       10     146    25     153
Carl      Female    10     32     25     40       10     157    25     163
Doris     Male      5      20     15     27       5      142    15     149
Doris     Female    5      28     15     36       5      154    15     160
Emma      Male      1      12     10     24       1      134    10     146
Emma      Female    1      19     10     32       1      148    10     157
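The sketch below illustrates how such ROM limits can be applied when posturing a manikin (Python; the data structure and clamping rule are assumptions, with two example entries taken from the Doris rows of Table 2):

```python
# Sketch of applying Table 2-style limits to a manikin joint (degrees).
# The lookup structure and clamping rule are assumptions; the DHM tool
# stores ROM per joint and DOF internally.
ROM_LIMITS = {
    # (resident, gender, motion): (own ROM angle, supported angle)
    ("Doris", "Female", "shoulder_flexion"): (154, 160),
    ("Doris", "Female", "shoulder_extension"): (28, 36),
}

def clamp_joint_angle(resident, gender, motion, requested, supported=False):
    """Limit a requested joint angle to the resident's own ROM, or to the
    larger 'Supported' range when a carer or assisting product helps."""
    own, assisted = ROM_LIMITS[(resident, gender, motion)]
    limit = assisted if supported else own
    return min(requested, limit)

print(clamp_joint_angle("Doris", "Female", "shoulder_flexion", 170))        # -> 154
print(clamp_joint_angle("Doris", "Female", "shoulder_flexion", 170, True))  # -> 160
```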
2.3 DHM Personas

As a complementary way to represent users' mobility capabilities, narrative descriptions of characteristic "user types", in terms of functional mobility levels and to a degree also personality, were applied to the computer manikins. This resembles the general design methods of user characters and personas [10, 11, 12], and more specifically the conceptual ideas of Högberg and Case [13] and Conradi and Alexander [14]. The health care company involved in this research has developed the Mobility Gallery, a communication tool structured according to five different levels of functional mobility [15]. In the gallery, residents are classified according to their degree of functional mobility, from the most mobile and independent to the most dependent and entirely bedridden resident. The five mobility levels are described and labelled using alphabetical names: Albert, Barbara, Carl, Doris and Emma. Each resident is described with personal characteristics, background details and an illustration, and can represent a female or a male regardless of the gender-specific name. This approach has proven successful in supporting communication about product users' abilities and requirements, both within the company and with people outside it. These descriptions are applied to the manikins as exemplified in Figure 1, conveying certain capacities and giving the manikins personality traits. The standard appearance of the manikin suits a caregiver but differs quite largely from the look of a typical caretaker in this context. Hence, the caretaker manikins were given a more age-appropriate appearance, as an attempt to support the DHM tool user's understanding of, and empathy with, the end users he or she is designing for.
Fig. 1. Albert and Barbara as DHM personas
2.4 Posture Prediction

A number of photos, taken in side view and showing caretakers bathing, were used to roughly identify chosen bathing postures. A stick figure was drawn on each picture to represent the human skeleton, and the joint angles were measured. Figure 2 illustrates different bathing postures.

Fig. 2. Bathing postures
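The measurement itself is simple trigonometry; the sketch below (Python, with invented pixel coordinates for the hip, knee and ankle landmarks) computes a joint angle from three digitised points:

```python
import math

def joint_angle(proximal, joint, distal):
    """Angle (degrees) at `joint` between the two body segments defined by
    three 2D landmark points digitised from a side-view photo."""
    v1 = (proximal[0] - joint[0], proximal[1] - joint[1])
    v2 = (distal[0] - joint[0], distal[1] - joint[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    return math.degrees(math.acos(dot / (n1 * n2)))

# Invented pixel coordinates for hip, knee and ankle landmarks.
hip, knee, ankle = (120, 300), (200, 260), (210, 150)
print(round(joint_angle(hip, knee, ankle)))  # knee angle of the bathing posture
```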
3 Results – A DHM Health Care Scenario

In order to illustrate the suggested approach of adapting and using DHM in a health care setting, this section demonstrates a fictitious but realistic design scenario. A project is initialised, aimed at designing a new footrest to be integrated in a new bathtub design, complementing the health care company's product portfolio. Attending the initial meeting are design engineers from the company and a DHM tool user. The project is structured and follows the human simulation process proposed and used in the automotive industry [4]. The discussion starts with the design engineers describing the background: the new bathtub, the anthropometric trends, their uncertainties about the appropriate footrest adjustment range, and the high biomechanical loads on the caregiver adjusting the footrest. The DHM tool user takes notes and fills in the blanks in the prepared human simulation process protocol. The protocol is linked to a database that everyone involved in the project has access to. After the discussion, the project group formulates the following objective for the analysis: "To describe an appropriate bathtub footrest position and adjustment range for typical caretakers, as well as a confirmation that the footrest is easily manually adjusted by typical caregivers." They all agree upon this formulation. The DHM tool user adds his name and contact data as responsible for the analysis, and one of the design engineers fills in her data as the requester of the analysis. When all have a common picture of the aim, the desired output is discussed. The design engineers are interested in illustrating differences and similarities between the proposed adjustment range and the current one, both for their own bathtub and for competitors' designs. The DHM tool user asks for national and company design standards to compare with. Plenty of standards and regulations exist within the health care industry; however, the designers are not sure whether any is applicable to footrest design, so they ask the DHM user to utilise the ergonomics evaluation methods integrated in
the DHM tool when analysing the caregivers' situation when adjusting the footrest. This request is added to the protocol to confirm the desire, and they agree that the analysis results shall be complemented with tables and pictures for illustrative purposes. Finally, the group discusses the desired completion date; the agreed date, two weeks ahead, is entered into the protocol. Having this information entered in the protocol concludes the first of the three stages in the human simulation process. Before the DHM user leaves, he gathers the information required to perform the analysis properly, such as information about the targeted market segments and the physical environment as CAD geometry.

The DHM tool user is now left alone and starts the analysis by searching the human simulation database on the intranet, hoping to find earlier similar studies to save time and gain from previous simulations. He uses keywords such as bathtub and footrest position. Unfortunately, he gets few hits, and all relate to vehicle design; this is not surprising, since DHM is a relatively new tool within the health care industry. He continues the work by reading the market analysis report, in which he finds that the new bathtub is planned to be sold as a high-quality brand product aimed at Swedish people living in elderly service apartments. With this in mind, he starts to search for updated anthropometric data for mid-aged and elderly Swedish people. He finds data for Swedish mid-aged adults, but for the elderly he needs to use British data due to the lack of appropriate data for a Swedish elderly population. He uses the manikin collection suggested by Speyer [6] to represent anthropometric diversity. Based on this data, he generates a set of 24 manikins in total, representing caretakers and caregivers (Table 1). In addition to these manikins, he creates manikins that are anthropometric replicas of the people involved in the development project (e.g. the design engineers); this is to support discussions about the analysis results between the people involved in ordering and performing the analysis. To more accurately simulate the elderly, the DHM user adjusts the joint ranges of motion according to the specification (Table 2). Three levels of mobility are relevant for this bathtub (Albert, Barbara and Carl), resulting in two extra sets of caretaker manikins; users at the other two mobility levels (Doris and Emma) are recommended other products due to their limited mobility. The appearances of the caretakers are adjusted to the company standard according to mobility level (Figure 1). The default appearance and joint ranges of motion in the DHM tool are used for the caregivers. By this approach, the DHM user has created a set of 48 manikins, in addition to the manikin replicas of the team members: 12 mid-aged caregiver manikins and 36 elderly caretaker manikins, in total representing the targeted users and stakeholders in a multidimensional design problem.

The next step is to load the CAD geometry, and the DHM user imports the bathtub and competitors' adjustable footrests from a USB memory stick. He tries to keep the description of the physical environment minimal, since exchange between computer systems is still quite time-consuming and conversion errors sometimes occur, especially when using CAD and DHM systems from different suppliers. A simple description of the physical environment also encourages the viewer of the illustrations to focus on the ergonomics issues, in terms of posture and accommodation.
It also indicates that the analysis was performed at an early design stage with relatively rough assumptions, and that design corrections are both possible and welcome if they improve the design.
The next step in the human simulation process is to define the tasks performed by the caretaker and the caregiver. The DHM user performs a basic task analysis and defines the tasks in bullet form as follows.

Caretaker
• Head above water surface
• Back attached against bathtub backrest plane
• Buttocks attached against bathtub bottom plane

Caregiver
• Feet attached to floor
• Right hand supporting caretaker's feet
• Left hand attached to footrest adjustment control
• Eye contact with caretaker

With the manikins, the physical geometry environment and the tasks defined, the DHM user starts to manipulate the manikins. He uses the preferred bathing posture as a starting point for the caretakers: each manikin is moved to the bathtub, and manikin joints are manipulated only when needed to fulfil the task description. This is done for all 36 manikins. He then uses a relaxed standing posture as a starting point for the caregivers; the joints in the back, arms and neck are adjusted minimally to fulfil the task criteria. After repeating this for the 12 caregiver manikins, a more detailed ergonomics analysis can begin.
Fig. 3. RULA analysis
The DHM user generates plots of the position of the sole of the foot for each manikin representing caretakers. As a benchmark, he inserts the adjustment range of the competitor product, and adds plots representing likely positions of the project group members. Joint comfort values for each manikin are automatically calculated by functionality in the DHM tool. For caretakers, the DHM user generates reach envelopes and calculates RULA scores [16] (exemplified in Figure 3). All illustrations, tables and comments are stored in the database.
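As an illustration of how the foot-sole plots translate into a footrest requirement, the sketch below (Python; the manikin positions are invented values) derives the adjustment range that would accommodate all manikins:

```python
# Sketch of deriving a footrest adjustment range from the foot-sole
# positions predicted for each manikin (coordinates in mm, invented values).
foot_positions = {
    "FCT1": (310, 120), "FCT5": (395, 150), "FCT6": (430, 165),
    "MCT2": (450, 170), "MCT3": (505, 190),  # (length, height) in the bathtub
}

lengths = [p[0] for p in foot_positions.values()]
heights = [p[1] for p in foot_positions.values()]

print("length adjustment:", min(lengths), "to", max(lengths), "mm")
print("height adjustment:", min(heights), "to", max(heights), "mm")
# These ranges would then be compared with the competitors' adjustment ranges.
```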
With this information accessible, the DHM user arranges a meeting with the design engineers at the health care company to deliver the results. At the company, they gather around a bathtub and open the project database, in which they study the positions of the foot soles of the potential users. One of the design engineers enters the bathtub, and they visually compare his preferred foot position with the positions suggested by the DHM tool. It matches quite well, and the design engineer concludes that they will base the mock-up on the dimensions proposed by the DHM tool: a footrest that is less adjustable in the length direction and more adjustable in the height direction compared to the competitors' solution. One of the other design engineers imitates the caregiver, lifting her colleague's legs and adjusting an imaginary footrest with an adjustment solution similar to the competitors'. She admits that the posture feels a bit awkward and can understand the high physical loads on the caregiver. The design engineers come to the insight that they have to rethink the adjustment principle used in order to minimise the workload on caregivers. The project group enters all comments made at the meeting in the database. The design engineers thank the DHM user for a good job and write their names and the date in the approval section of the human simulation protocol, which closes the digital human simulation and visualisation mission.
4 Discussion

The paper illustrates that digital human modelling tools can be customised, applied and useful in a health care design setting. In the theoretical scenario above, minor customisations of the DHM tool were made, and the approach showed potential. However, more research, development and customisation of the tool are needed to fulfil the health care industry's requirements and desires. Areas for improvement include the following.

Physical manikin properties. In the scenario, a pragmatic approach was used to consider the diversity of anthropometrics and joint ranges of motion, and to keep the number of representative manikins reasonably low. Robinette and Hudson [17] describe the complexity of considering anthropometric diversity, particularly for multidimensional design problems, and Bittner [18] proposed the A-CADRE manikin family as an alternative to the manikin collection method used in the scenario. The approach suggested by Speyer [6] was selected in this case due to its smaller number of manikins compared to A-CADRE. No methods were found for describing joint range of motion diversity in a similar manner; similar statistical methods as those used for identifying anthropometric boundary manikins could possibly be utilised for joint range of motion variation, but in this scenario a simpler method was used. A challenge is to develop a method where both anthropometric and range of motion diversity are considered. Such a method may be of interest since both anthropometrics and joint ranges include great variation and affect human behaviour.

Posture prediction. In the scenario, the preferred bathing posture was taken from studies of a limited number of pictures of subjects sitting in a bathtub without water. The vehicle industry has performed several studies of preferred postures, e.g. Krist [19] and Hanson [20]; a similar set-up could be used to find preferred postures in a bathtub filled with water. Furthermore, in the scenario, manual fine-tuning from the
preferred posture was made to fulfil the task criteria. In an industrial setting, manual manipulation of joints is not a sustainable alternative: the fine-tuning work is time-consuming, and the posture results may vary both within and between DHM tool users, as identified in a study by Lämkull et al. [21]. A process that automatically finds the preferred posture fulfilling the task criteria for each family member is desired; such a process would speed up the simulation work and make DHM use more robust by leading to the same results every time.

Appearance. The appearance of the manikins influences subjects when they evaluate ergonomics from pictures [22], and digital human modelling is today frequently used as a visualisation tool [23]; manikin appearance is therefore of major importance. In the scenario, the manikins' clothing, face and hair were colour customised. An improvement would be the ability to customise the shape and skin of the face, replicating the way it typically changes with age. Furthermore, personas could be developed for caregivers, e.g. a strong, mid-aged persona that loves to talk with and look after the elderly, and a slim, weak, young persona that would rather take a smoke break than look after the elderly; in the scenario, no customisation was made for caregivers, and the standard appearance, ranges of motion etc. were used. However, to minimise subjective effects when evaluating human–product and human–workplace interaction, a combination of visualisation and objective ergonomics assessment methods is recommended.

In addition to technical customisations of the DHM tool, the development of a company-specific working process and work organisation around the tool is proposed for effective and efficient use of digital human modelling. These issues were not considered in the scenario, where a generic work process was used. The DHM use process is preferably seen as a natural part of the general design process used at the company; a design process that is upgraded with new ergonomics evaluation gates to emphasise to the organisation that DHM is now a standard tool for virtual ergonomics evaluation in early design process phases.
Acknowledgements. This paper is a result of the research project "Visualisering av brukarkaraktäristik vid produktutveckling inom fordons- och hälsoindustrin" (Visualisation of user characteristics in product development in the vehicle and health industries), which is carried out within the Virtual Ergonomics Centre (www.vec.se) and is financially supported by the Knowledge Foundation (KK-stiftelsen) in Sweden under grant no. 2007/0137 and by the participating organisations. This support is gratefully acknowledged.
References
1. Chaffin, D.B.: Digital Human Modeling for Vehicle and Workplace Design. Society of Automotive Engineers, Warrendale (2001)
2. Hanson, L., Högberg, D., Nåbo, A.: DHM in Automotive Product Applications. In: Duffy, V.G. (ed.) Handbook of Digital Human Modeling: Research for Applied Ergonomics and Human Factors Engineering. Taylor & Francis, CRC Press (2008)
3. Lämkull, D., Örtengren, R., Malmsköld, L.: DHM in Automotive Manufacturing Applications. In: Duffy, V.G. (ed.) Handbook of Digital Human Modeling: Research for Applied Ergonomics and Human Factors Engineering. Taylor & Francis, CRC Press (2008)
4. Hanson, L., Blomé, M., Dukic, T., Högberg, D.: Guide and documentation system to support digital human modeling applications. International Journal of Industrial Ergonomics 36, 17–24 (2006)
5. Högberg, D., Case, K.: Predefined manikins to support consideration of anthropometric diversity by product designers. In: Duffy, V.G. (ed.) HCII 2007 and DHM 2007. LNCS, vol. 4561, pp. 110–119. Springer, Heidelberg (2007)
6. Speyer, H.J.: Ramsis - Application Guide: Test Sample & Task Definition. Human Solutions GmbH, Kaiserslautern (2005)
7. Smith, S., Norris, B., Peebles, L.: Older Adultdata - The Handbook of Measurements and Capabilities of the Older Adult. Department of Trade and Industry, London, UK (2000)
8. Hanson, L., Sperling, L., Gard, G., Ipsen, S., Olivares Vergara, C.: Swedish anthropometrics for product and workplace design. Applied Ergonomics 40(4), 797–806 (2009)
9. Kroemer, K.H.E., Kroemer, H.B., Kroemer-Elbert, K.E.: Ergonomics: How to Design for Ease and Efficiency. Prentice Hall, Upper Saddle River (2001)
10. Nielsen, L.: From user to character - an investigation into user-descriptions in scenarios. In: DIS 2002, Designing Interactive Systems, London, pp. 99–104 (2002)
11. Pruitt, J., Grudin, J.: Personas: practice and theory. In: Conference on Designing for User Experiences, San Francisco, pp. 1–15. ACM Press, New York (2003)
12. Goodman, J., Langdon, P., Clarkson, P.J.: Formats for user data in inclusive design. In: Stephanidis, C. (ed.) HCI 2007. LNCS, vol. 4554, pp. 117–126. Springer, Heidelberg (2007)
13. Högberg, D., Case, K.: Manikin characters: user characters in human computer modelling. In: Bust, P.D. (ed.) Contemporary Ergonomics, pp. 499–503. Taylor & Francis, London (2006)
14. Conradi, J., Alexander, T.: Modeling Personality Traits for Digital Humans. SAE Technical Paper 2007-01-2507. Society of Automotive Engineers, Warrendale (2007)
15. Arjo: ARJO Guidebook - for Architects and Planners. Eslöv, Sweden (2005)
16. McAtamney, L., Corlett, E.N.: RULA: a survey method for the investigation of work-related upper limb disorders. Applied Ergonomics 24(2), 91–99 (1993)
17. Robinette, K.M., Hudson, J.A.: Anthropometry. In: Salvendy, G. (ed.) Handbook of Human Factors and Ergonomics, 3rd edn., pp. 322–339. John Wiley & Sons, Hoboken (2006)
18. Bittner, A.C.: A-CADRE: Advanced family of manikins for workstation design. In: XIVth Congress of IEA and 44th Meeting of HFES, San Diego, pp. 774–777 (2000)
19. Krist, R.: Modellierung des Sitzkomforts: Eine experimentelle Studie (Modelling Seat Comfort: An Experimental Study). Doctoral thesis, Katholische Universität Eichstätt, Germany (1994) (in German)
20. Hanson, L., Sperling, L., Akselsson, R.: Preferred car driving posture using 3-D information. International Journal of Vehicle Design 42(1-2), 154–169 (2006)
21. Lämkull, D., Hanson, L., Örtengren, R.: Uniformity in manikin posturing: A comparison between posture prediction and manual joint manipulation. International Journal of Human Factors Modelling and Simulation 1(2), 225–243 (2008)
22. Lämkull, D., Hanson, L., Örtengren, R.: The influence of virtual human model appearance on visual ergonomics posture evaluation. Applied Ergonomics 38(6), 713–722 (2007)
23. Sundin, A., Örtengren, R.: Digital human modeling for CAE applications. In: Salvendy, G. (ed.) Handbook of Human Factors and Ergonomics, 3rd edn., pp. 1053–1078. John Wiley & Sons, Hoboken (2006)
A Simulation Approach to Understand the Viability of RFID Technology in Reducing Medication Dispensing Errors

Esther Jun, Jonathan Lee, and Xiaobo Shi
Purdue University, School of Industrial Engineering, 315 N Grant St, West Lafayette, IN 47907-2023
{eejun,jclee,shi0}@purdue.edu
Abstract. RFID technology has the potential to reduce medication dispensing errors in hospitals. To determine possible uses for tracking medication within a hospital, we interviewed a pharmacist with knowledge of such processes. Due to cost considerations, the most viable place to use RFID technology is to track medication upon leaving the pharmacy, which can help reduce lost or misplaced medication and ensure that the right medication is given to the right patient. A simulation model that compares the benefits with and without RFID is also discussed. Keywords: Medication dispensing errors, healthcare IT, RFID, simulation.
1 Introduction

Medication errors in hospitals are commonplace, and dispensing errors made in the pharmacy can contribute significantly to these errors [1]. As a result, health information technology (IT) has the potential to reduce medical errors, increase patient safety, and improve the overall quality of healthcare [2, 3]. Earlier studies [1, 4] have investigated the use of bar code technology in medication administration. While Poon et al. [1] demonstrated a decrease in dispensing errors, they also noted that every unit dose (i.e., the smallest dose that can be administered) should be labeled with a bar code and all unit doses should be scanned to take full advantage of bar code technology. However, this is a cumbersome process that integrates poorly into clinician workflow. Other studies on Bar Code Medication Administration (BCMA) have also noted this problem, as well as unreliable scanning of bar codes [5]. Radio-Frequency Identification (RFID) has the potential to overcome these issues, since real-time and near-simultaneous information on multiple tagged items can be obtained. RFID has already shown some success in inventory tracking and management within supply chains [6-8], as well as in tracking patients [9] and medical equipment within hospitals [10]. However, few studies have investigated the use of RFID technology for tracking medication within a hospital, specifically from when medication leaves the pharmacy to when it is administered to the patient. Given the potential promise for RFID to integrate more efficiently into a hospital's workflow, the goal of this
532
E. Jun, J. Lee, and X. Shi
paper is to use simulation to understand the viability for a hospital to adopt RFID technology.
2 Introduction to RFID Technology

RFID is an automatic identification technology that relies on radio waves to identify objects. The technology consists of tags and readers: tags contain object information that is transmitted to readers, and readers then transform the information into a format understandable by computers [7]. Tags can be read from several meters away and do not have to be in the line-of-sight of a reader for their signal to be read. This is a particular advantage over other automatic identification technologies, such as bar codes, where a bar code scanner must be in the line-of-sight of the bar code label. It is also important to mention that RFID technology is normally used in tandem with an information system.

2.1 RFID Tags

RFID tags can be active, passive, or semi-passive. Active tags use their own power source to broadcast their signal. Passive tags are not self-powered and rely on the electromagnetic waves of a reader to induce a current in the tag's antenna. Semi-passive tags use a combination of their own power source and a reader's waves to broadcast their signal. One major issue with RFID tags is cost – even the most inexpensive tags (which tend to be passive tags) cost approximately 50 cents each in very large quantities [7].

2.2 RFID Readers

RFID readers are the communication medium between RFID tags and an information system. An RFID tag transmits a signal, which is then transformed into a digital signal representing the Electronic Product Code (EPC) for the tag. The read range of a tag depends on the reader's power and the frequency used to communicate [7]. RFID readers come in a variety of forms. RFID portals are primarily used at door frames or at one end of a conveyor belt; tagged items pass through the portal and are read as they do so. Meanwhile, handheld readers are mobile and can be used by a person to read tagged objects [11].
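To make the tag-to-reader data flow concrete, the sketch below models deduplication of the repeated reads a portal emits while a tagged item is in range. The EPC string, reader name and 2-second window are illustrative assumptions rather than details of any specific product.

```python
from dataclasses import dataclass

@dataclass
class TagRead:
    epc: str          # Electronic Product Code reported by the tag
    reader_id: str    # portal or handheld reader that saw the tag
    timestamp: float  # seconds since some epoch

def deduplicate(reads, window=2.0):
    """Collapse the repeated reads a portal emits while a tagged item is
    in range, keeping the first read per (epc, reader) within the given
    time window (the window length is an assumption)."""
    last_seen = {}
    unique = []
    for r in sorted(reads, key=lambda r: r.timestamp):
        key = (r.epc, r.reader_id)
        if key not in last_seen or r.timestamp - last_seen[key] > window:
            unique.append(r)
        last_seen[key] = r.timestamp
    return unique

# A portal typically reports the same tag many times per second:
reads = [TagRead("urn:epc:id:sgtin:0614141.107346.2017", "pharmacy-portal", t)
         for t in (0.0, 0.1, 0.2, 5.0)]
print(len(deduplicate(reads)))  # 2 distinct passes through the portal
```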
3 RFID and the Medication Dispensing Process

3.1 Overview of the Medication Dispensing Process

To understand how RFID technology may be used for medication tracking, we first identified a typical pharmacy dispensing process and the associated problems that RFID may help remedy. To obtain this information, we interviewed a retail pharmacist with knowledge of hospital pharmacies. From our findings, we developed a flowchart of the pharmacy dispensing process, together with the errors that can occur at each stage (Fig. 1).
Fig. 1. A pharmacy dispensing process within a hospital (denoted as boxes). Circles denote possible errors that correspond to a stage in the dispensing process.
There are essentially two major steps involved in the dispensing process within a pharmacy: filling and verifying. The first step, filling, is typically performed by a technician who retrieves the medication from the pharmacy inventory. The second step, verifying, is performed by the pharmacist, who verifies the accuracy of the medication filled by the technician before dispensing the order to a patient care unit (PCU). At a PCU, the order is either stored in a patient-specific drawer or immediately delivered to the patient. Once a patient needs his/her medication, the nurse retrieves the medication and administers it to the patient. At each stage of the dispensing process, there is a possibility for error. A detailed description of dispensing errors is given in Table 1. RFID technology has the potential to reduce the majority of the dispensing errors listed in Table 1 (see last column). For example, if all unit doses are tagged, it is possible to reduce dispensing errors when a technician gets and fills an order (e.g., wrong drug, wrong dose, wrong formulation, and expired drug). Given the relatively high cost of RFID tags, it may not be feasible to tag all unit doses in a pharmacy. However, the FDA hopes that pharmaceutical companies will RFID-tag all their products sometime in the future [12]. A less expensive approach would be to tag the medication within a single order before it leaves the pharmacy. While this approach may not address dispensing errors when a technician gets and fills an order, it can address drug-handling errors once medication leaves the pharmacy. According to the pharmacist we interviewed, using RFID for this purpose would be particularly helpful – she expressed frustration about the relative frequency of lost or misplaced orders after prescriptions leave the pharmacy and before they are administered to the patient. With certain medications costing up to a couple thousand dollars per unit dose, she also expressed great concern about multiple orders being filled for a patient when only one dose is actually administered.
Table 1. Description of dispensing errors

Process | Error | Description | RFID
Pharmacy receives order electronically | Lost prescription | Orders are received electronically. Lost prescriptions may occur because of network connectivity issues. |
Pharmacist processes order | Drug interaction | Pharmacist checks for any drug interactions that a patient may have with other currently prescribed medication. |
 | Drug allergies | Pharmacist checks for any allergies that a patient may have to medication. |
Technician gets and fills order | Wrong drug | The wrong medication is dispensed. | Y
 | Wrong dose | Wrong dose (strength) of the correct medication is dispensed (e.g., 50 mg of metoprolol was ordered but 100 mg of metoprolol is dispensed). | Y
 | Wrong formulation | Wrong formulation of the correct medication and dose is dispensed (e.g., 50 mg of long-acting metoprolol was ordered but 50 mg of short-acting metoprolol was dispensed). | Y
 | Expired drug | Expired medication is dispensed. | Y
Pharmacist checks order | Bad check | Visual inspection by the pharmacist to verify the order before dispensing. |
Order dispensed to patient care units | Lost order | An order dispensed by the pharmacy is lost and its whereabouts are unknown. | Y
 | Wrong location | An order dispensed by the pharmacy is sent to the wrong location. | Y
Nurse gets and administers order | Wrong order | The wrong order is retrieved and administered to the wrong patient. | Y
3.2 Integrating RFID into the Medication Dispensing Process

Upon further investigation of the literature, we discovered an article about a German hospital that hopes to install a pilot RFID system to track medication from the time the hospital pharmacy dispenses an order to when the patient receives it. The hospital plans to use RFID in conjunction with its in-house automatic transport system, which consists of a network of conveyor belts that link various medical units. However, at the time of publication, the hospital had not yet selected equipment (e.g., tags and readers) for its pilot study [13]. The proposed medication dispensing process with RFID is shown in Fig. 2. After processing and verifying the order and prior to dispensing it, a pharmacist tags all unit doses in the order, as well as the container holding all doses. The pharmacist then places the order on the automatic transport system, and portal readers read the tags as the medication leaves the pharmacy. Information containing the exact pill count, the patient, and other pertinent details is transferred to an information system. As the medication arrives at a PCU, another portal reader reads the tags prior to a nurse unloading the order. The nurse may then place the medication in a patient-specific medication drawer, or immediately administer the medication to the patient.
Fig. 2. Medication dispensing process with RFID; used in conjunction with an automatic transport system
Prior to administering the medication, the nurse uses a handheld scanner to double-check that the correct drug and dose are administered to the correct patient. To reduce the possibility of drug-handling errors, hospitals considering RFID must also consider how it may be integrated into their existing transport systems, or whether a system similar to the German hospital's may be needed. To simplify our modeling assumptions, we assume that a hospital already has some sort of automatic transport system (either conveyor belts or a tube system similar to those at banks).
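The portal reads described above only become useful once an information system interprets them. The sketch below shows one plausible shape for that logic, tracking an order from pharmacy to administration and flagging wrong-location and possibly-lost orders; the event names, states and 30-minute alarm threshold are our assumptions, not details from the cited pilot.

```python
# A minimal sketch of tracking logic the information system could apply
# to portal reads; all names and thresholds here are assumptions.
EXPECTED_TRANSIT_MIN = 30  # assumed delivery-time alarm threshold

class OrderTracker:
    def __init__(self, order_id, patient, destination_pcu):
        self.order_id = order_id
        self.patient = patient
        self.destination_pcu = destination_pcu
        self.state = "IN_PHARMACY"
        self.left_pharmacy_at = None

    def on_pharmacy_portal(self, t):
        # Portal at the pharmacy exit saw the tagged order.
        self.state = "IN_TRANSIT"
        self.left_pharmacy_at = t

    def on_pcu_portal(self, pcu, t):
        # Portal at a PCU saw the order arrive.
        if pcu != self.destination_pcu:
            self.state = "WRONG_LOCATION"  # forward or return to pharmacy
        else:
            self.state = "AT_PCU"

    def on_bedside_scan(self, patient):
        # Handheld scan verifies right order, right patient.
        if patient != self.patient:
            raise ValueError("wrong patient - do not administer")
        self.state = "ADMINISTERED"

    def check_overdue(self, now):
        # Periodic sweep: flag orders that never reached their PCU.
        if (self.state == "IN_TRANSIT"
                and now - self.left_pharmacy_at > EXPECTED_TRANSIT_MIN):
            self.state = "POSSIBLY_LOST"  # search last known location
```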
4 A Simulation Model for Medication Dispensing

To demonstrate the effectiveness of an RFID tracking system, we simulated a hospital pharmacy serving 5 PCUs without RFID technology (Fig. 3) and with RFID technology (Fig. 4). In both simulations, the interarrival times of prescriptions arriving at the hospital pharmacy are modeled as an exponential distribution with an average of 2 minutes, based on the assumption that a typical pharmacy fills 5000 unit orders per week. Orders fall into three classes: low-value drugs are prescribed 10% of the time at a cost of $0.02 per unit dose; medium-value drugs are prescribed 60% of the time at a cost of $20 per unit dose; and high-value drugs are prescribed 30% of the time at a cost of $2000 per unit dose. The class distributions and costs are rough estimates provided by the pharmacist we interviewed. Once prescriptions arrive at the pharmacy, they are filled by one of 8 technicians and verified by one of 4 pharmacists. The time it takes to fill a prescription is modeled as an exponential distribution with an average of 10 minutes; verification is modeled as an exponential distribution with an average of 5 minutes. Once a prescription is filled and verified, it is sent to one of the 5 PCUs with equal probability of 0.20. Delivery time is modeled as an exponential distribution with an average of 15 minutes. Finally, a nurse administers the order to a patient, and the service time is modeled as an exponential distribution with an average of 20 minutes. It is important to note that the aforementioned probability distributions are educated guesses, and careful observation of hospital flow is required for more accurate models of arrival and service times. In the simulation without RFID, there is a possibility of lost prescriptions en route from the pharmacy to the PCU. If a prescription is lost, the original prescription is reordered at the pharmacy. The reorder is given priority over other prescriptions
being filled for the first time, but it does not preempt any prescription that a technician or pharmacist is currently processing. To study the effects of lost prescriptions, the rate of loss is varied from 1% to 10% in increments of 1%. In addition to the total cost, the simulation tracks the number of lost prescriptions, the total value of lost prescriptions, and the average time from the moment the prescription is placed until it is delivered to the patient. All PCUs have 5 nurses available to administer orders to patients. The results from 10 replications (each replication is 1 year long) are summarized in Table 2; as the loss rate increases, both the value of lost medication and the average delivery time increase.

In the simulation with RFID, additional scanning steps are added and the possibility of lost orders is removed; i.e., RFID technology is assumed to eliminate drug-handling errors from the moment medication leaves the pharmacy until it is administered to the patient. Scanning time is modeled as an exponential distribution with an average of 0.5 min. The number of handheld scanners at each PCU is varied from 1 to 5 to see how it impacts delivery time. Since the nurse is required to scan both the order and the patient to ensure that a prescription is delivered to the correct patient, the number of scanners can affect the delivery time, as a nurse may have to wait for a free scanner. The results from 10 replications (each replication is 1 year long) are summarized in Table 3.

The RFID tracking and verification system can prevent the loss of prescription drugs as they travel from the hospital pharmacy to the patient. The verification process that RFID enables also ensures that a patient receives the correct order as prescribed by a physician. The magnitude of the savings will depend on the expense and frequency of the drugs being dispensed. An important parameter in implementing a tracking and verification system with RFID is the number of scanners to allocate to each PCU. Under-allocating handheld scanners (e.g., 1 or 2 scanners) delays the delivery of orders to patients. As a result, nurses have an incentive to circumvent the delivery policy and deliver to a patient without verification. It is therefore important to allocate enough handheld scanners. For this particular example, 3 scanners appear sufficient, since there is little additional savings in delivery time with more than 3 scanners. However, the best number of scanners depends on the frequency of prescriptions arriving at a PCU. A possible alternative is to place a scanner in every patient's room, but as most PCUs have more patient rooms than nurses, this would be cost-prohibitive.

Fig. 3. Simulation model without RFID (orders arrive at the pharmacy; a technician gets and fills the order; a pharmacist checks it; the order is sent to a PCU, where a nurse administers it; orders may be lost en route)

Table 2. Simulation results without RFID

Prescription Loss Rate | Total Cost ($MM) | Prescriptions Lost | Value Lost ($MM) | Average Time from Prescription to Patient (min)
0.01 | 162.55 | 2649 | 1.62 | 52.4
0.02 | 164.02 | 5377 | 3.30 | 52.8
0.03 | 165.72 | 8147 | 5.00 | 53.3
0.04 | 167.41 | 10993 | 6.72 | 53.7
0.05 | 168.94 | 13830 | 8.44 | 54.2
0.06 | 170.89 | 16804 | 10.30 | 54.7
0.07 | 172.61 | 19839 | 12.11 | 55.2
0.08 | 175.02 | 22904 | 14.03 | 55.7
0.09 | 176.88 | 25986 | 15.91 | 56.3
0.10 | 178.67 | 29198 | 17.84 | 56.8
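As a sanity check on Table 2, the loss-related columns follow directly from the stated parameters (one order every 2 minutes on average, the three cost classes, and reordering of lost prescriptions). The following minimal Monte Carlo sketch, which ignores queueing at technicians, pharmacists and nurses (and therefore says nothing about delivery times), reproduces the cost and loss columns to within sampling error; it is our own illustration, not the authors' simulation code.

```python
import random

UNIT_COSTS = [0.02, 20.0, 2000.0]      # low/medium/high-value drugs
CLASS_PROBS = [0.10, 0.60, 0.30]
ORDERS_PER_YEAR = 365 * 24 * 60 // 2   # one order every 2 min on average

def simulate_year(loss_rate, rng=random):
    total_cost = value_lost = 0.0
    n_lost = 0
    for _ in range(ORDERS_PER_YEAR):
        cost = rng.choices(UNIT_COSTS, weights=CLASS_PROBS)[0]
        # Each dispense can be lost en route; lost orders are refilled.
        while rng.random() < loss_rate:
            total_cost += cost
            value_lost += cost
            n_lost += 1
        total_cost += cost  # the dispense that finally arrives
    return total_cost, n_lost, value_lost

cost, lost, value = simulate_year(0.01)
print(f"${cost/1e6:.2f}MM total, {lost} lost, ${value/1e6:.2f}MM lost value")
# roughly matches the first row of Table 2 (~$162.5MM, ~2650, ~$1.6MM)
```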
Fig. 4. Simulation model with RFID (orders arrive at the pharmacy; a technician gets and fills the order; a pharmacist checks it; an RFID portal reads the order as it leaves the pharmacy; the order is sent to a PCU, where a second RFID portal reads it; a nurse administers the order)

Table 3. Simulation results with RFID. Average cost for all runs is $160.5 M.

Number of Handheld Scanners at each PCU | Average Time from Prescription to Patient (min)
1 | 131629.2
2 | 1773.8
3 | 61.7
4 | 54.6
5 | 53.2
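The blow-up at 1 and 2 scanners in Table 3 is consistent with elementary queueing theory. If, as the results suggest, a nurse holds a scanner from the bedside scan through administration (roughly 0.5 + 20 = 20.5 min on average) and each PCU receives about one order every 10 minutes, the offered load is about 2 Erlangs, so fewer than three scanners cannot keep up. The Erlang-C sketch below is a hedged back-of-envelope check under these assumptions, not part of the original study.

```python
from math import factorial

def erlang_c_wait(lam, service_mean, c):
    """Mean queueing delay (same time units as inputs) in an M/M/c queue;
    returns inf when the queue is unstable (offered load >= c servers)."""
    a = lam * service_mean  # offered load in Erlangs
    if a >= c:
        return float("inf")
    erlang_b = (a ** c / factorial(c)) / sum(a ** k / factorial(k)
                                             for k in range(c + 1))
    p_wait = erlang_b / (1 - (a / c) * (1 - erlang_b))  # Erlang C from Erlang B
    return p_wait * service_mean / (c - a)

lam = 1 / 10      # one order every ~10 min per PCU (1 per 2 min over 5 PCUs)
service = 20.5    # assumed scan (0.5 min) plus administration (20 min)
for c in range(1, 6):
    print(c, erlang_c_wait(lam, service, c))
# c = 1 and c = 2 are unstable (infinite delay), matching the huge simulated
# times in Table 3; c = 3 gives a modest wait, and c >= 4 adds little.
```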
5 Discussion

Implementing the proposed RFID tracking system has significant cost. A hospital will have to purchase tags, portal readers, and scanners, as well as a supporting information system. Furthermore, there will be changes in work processes, requiring training of the administration and staff, and these changes will also have an associated cost, including interruptions to the workflow. However, reducing the number of drug-handling errors and increasing patient safety will certainly offset some of the tangible costs of installing the system. Furthermore, there may also be other technology initiatives (e.g., patient or medical equipment tracking) that make the RFID tracking and verification system more cost-effective by leveraging the infrastructure of existing technology projects. If medication tracking is implemented in conjunction with existing RFID initiatives at a hospital, the cost can be shared. However, the design of such a combined system is specific to the physical layout of the hospital and the delivery system for prescriptions. Lastly, medication delivery is not always a closed system, as assumed in the simulation, and there will be cases of loss or misdirection; e.g., an order may be delivered to an incorrect PCU. In such a case, the information system can detect the incorrect delivery and initiate a process to get the order to the correct PCU. For example, the PCU can either forward the prescription to the correct PCU or return it to the hospital pharmacy, where the forwarding process occurs. If a prescription is lost and cannot be located, its last known location can be searched. Furthermore, the tracking system can monitor for patterns in where prescriptions are lost in the delivery process. These patterns can suggest strategies for loss prevention or modifications to the delivery process.
6 Conclusions

RFID technology has the potential to reduce medication dispensing errors and increase patient safety. To determine the viability of adopting RFID technology, we first mapped a typical medication dispensing process and identified a key area where RFID can improve the process, specifically drug handling from when an order leaves the pharmacy to the point when medication is administered to the patient. We then used simulation to understand the relative costs involved if RFID technology is not used, in terms of lost-order rate and cost. We also used simulation to understand the number of handheld scanners necessary to reduce delivery time if RFID technology is implemented. While the results are dependent on the distribution parameters, we have identified a general simulation framework that can be used to determine the appropriate number of handheld scanners.
References

1. Poon, E.G., Cina, J.L., Churchill, W., Patel, N., Featherstone, E., Rothschild, J.M., Keohane, C.A., Whittemore, A.D., Bates, D.W., Gandhi, T.K.: Medication Dispensing Errors and Potential Adverse Drug Events before and after Implementing Bar Code Technology in the Pharmacy. Annals of Internal Medicine 145, 426–434 (2006)
2. Ash, J.S., Berg, M., Coiera, E.: Some unintended consequences of information technology in health care: the nature of patient care information system-related errors. Journal of the American Medical Informatics Association 11, 104–112 (2004)
3. Bates, D.W., Cohen, M., Leape, L.L., Overhage, J.M., Shabot, M.M., Sheridan, T.: Reducing the Frequency of Errors in Medicine Using Information Technology. Journal of the American Medical Informatics Association 8, 299 (2001)
4. Ragan, R., Bond, J., Major, K., Kingsford, T.I.M., Eidem, L., Garrelts, J.C.: Improved control of medication use with an integrated bar-code-packaging and distribution system. American Journal of Health-System Pharmacy 62, 1075–1079 (2005)
5. Patterson, E.S., Cook, R.I., Render, M.L.: Improving Patient Safety by Identifying Side Effects from Introducing Bar Coding in Medication Administration. Journal of the American Medical Informatics Association 9, 540–553 (2002)
6. Michael, K., McCathie, L.: The pros and cons of RFID in supply chain management. In: International Conference on Mobile Business, Sydney, Australia, pp. 623–629 (2005)
7. Angeles, R.: RFID Technologies: Supply-Chain Applications and Implementation Issues. Information Systems Management 22, 51–65 (2005)
8. Sellitto, C., Burgess, S., Hawking, P.: Information quality attributes associated with RFID-derived benefits in the retail supply chain. International Journal of Retail & Distribution Management 35, 69–87 (2007)
9. Wang, S.W., Chen, W.H., Ong, C.S., Liu, L., Chuang, Y.W.: RFID Application in Hospitals: A Case Study on a Demonstration RFID Project in a Taiwan Hospital. In: Proceedings of the 39th Hawaii International Conference on System Sciences, Koloa, HI, vol. 8, pp. 184a–184a (2006)
10. Fuhrer, P., Guinard, D.: Building a Smart Hospital using RFID technologies. In: 1st European Conference on eHealth, Fribourg, Switzerland, pp. 1–14 (2006)
11. RFIDSupplyChain.com: RFID Solutions for Supply Chain Management, vol. 2008 (2008)
12. Thompson, C.A.: Radio frequency tags for identifying legitimate drug products discussed by tech industry. American Journal of Health-System Pharmacy 61, 1430–1431 (2004)
13. Wessel, R.: German Hospital Expects RFID to Eradicate Drug Errors, vol. 2008 (2006)
Towards a Visual Representation of the Effects of Reduced Muscle Strength in Older Adults: New Insights and Applications for Design and Healthcare

David Loudon and Alastair S. Macdonald

School of Design, The Glasgow School of Art, 167 Renfrew Street, Glasgow G3 6RQ, Scotland, UK
{d.loudon,a.macdonald}@gsa.ac.uk
Abstract. This paper details the evaluation of human modelling software, which provides visual access to dynamic biomechanical data on older adult mobility to a new audience of professionals and lay people without training in biomechanics. An overview of the process of creating the visualisation software is provided, including a discussion of the benefits over existing approaches. The qualitative evaluation method, which included a series of interviews and focus groups held with older adults, and healthcare and design professionals, is discussed together with key findings. The findings are illustrated with examples of new dialogues about specific mobility issues impacting on healthcare and design planning which were facilitated by the data visualisations. Keywords: Virtual human software, data visualization, older adult mobility.
1 Introduction

Biomechanics is the scientific study of the human in motion, examining the causes and consequences of different movements. It fuses an understanding of the mechanics of motion with the structural and functional anatomy of the musculoskeletal and neurological systems of the body, enabling a scientific analysis of the causes of movement problems to be undertaken and solutions for these problems proposed and tested for efficacy. However, the scientific data produced is complex, and the biomechanics community have to date been unable to effectively communicate the results of biomechanical analysis to non-biomechanists, i.e. clinicians, practitioners and lay people. For many older adults certain routine tasks associated with activities of daily living (ADL) are difficult or painful to perform. Whereas biomechanists may have good insight into the causes of mobility problems (e.g. by studying the stresses on older people's joints and muscles and how these change during tasks), the biomechanical data and analysis on which this knowledge depends is difficult to comprehend by the range of other disciplines involved in the care and rehabilitation of older people, such as physiotherapists, occupational therapists (OTs), and orthopaedists, and also by designers of the built environment. Consequently, this may limit optimum rehabilitation, healthcare planning or design for older people.
2 Background

2.1 Data Capture and Difficulties in Communication

New dynamic biomechanical data were captured from 84 older adults in the 60+, 70+ and 80+ year-old categories undertaking ADL, using a motion capture camera system, maximum strength measurements, and reaction force data measured using floor-mounted measurement platforms [1]. The focus of the study was on collecting lower-limb data, as falls were identified as a major risk in everyday mobility tasks. A biomechanist would typically present the results of analysis by plotting 2D graphs of, e.g., changes in the joint moments over time (Fig. 1). Comparison between different joints or different rotation directions in a limb would be achieved by overlaying graphs or viewing them side by side. Analysis and interpretation of this form of data requires skill and training in biomechanics, making it unsuitable for communication to other professional disciplines or lay persons.
Fig. 1. 2D plots of selected biomechanical data for the sit-stand activity for a single joint in only one rotation direction. (a) Changes in the left knee joint moment^i in flexion/extension^ii during the sit-stand activity. (b) Variation in maximum strength for the left knee joint compared with the angle of the joint. Two graphs are needed as different muscle groups are in use in flexion and in extension, producing different maximum values.
In addition, this format of analysis makes it difficult to accurately assess how demanding the tasks were for the older adult participants. This is particularly significant for older adults, as peak values of the moments are more likely to be close to their maximum capability, creating instability at the joint. The maximum strength that can be produced at a joint is not constant but varies with factors such as joint angle and angular velocity. Truly assessing the percentage of maximum capacity that the older person is using therefore requires cross-referencing several graphs (Fig. 1). This manner of presenting data is static and is not viewed in its dynamic or relative context, which makes the sequencing of the movement in, e.g., both hips and knees particularly difficult, or impossible, to fully comprehend, even for biomechanists. It is here that a dynamic and contextual mode of presentation, and the knowledge and expertise of the other disciplines, would be invaluable.

2.2 Visualisation Method

A software tool was developed which generates a 3D animated human 'stick figure' on which the biomechanical demands of the ADLs are represented visually at the joints, expressed as a percentage of maximum capability using a continuous colour gradient from green at 0%, through amber at 50%, to red at 100%. The approach taken to visualising the data was to minimise the viewer's exposure to the complexity of the original data and the underlying calculations. Although generating the visualisation requires knowledge of the biomechanical moments, directions of forces and joint angles, the end result is expressed as a single percentage value. The meaning of 'functional demand', i.e. how hard the muscles are working relative to their maximum capability, was therefore not conceptually difficult to understand. The hypothesis was that this mode of visualisation could enable complex data to be simplified without losing the validity of the original dataset.
Fig. 2. Colour coding of the biomechanical demands of activities
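As an illustration of the mapping just described, the sketch below converts a joint moment into a functional-demand percentage and then into a colour on the green-amber-red gradient. The strength profile, interpolation by joint angle and RGB anchor values are plausible assumptions for one implementation, not the project's actual code.

```python
import numpy as np

# Hypothetical strength profile: maximum extension moment (Nm) the
# participant can produce at sampled knee angles (degrees).
ANGLES = np.array([0, 30, 60, 90])
MAX_MOMENT = np.array([60.0, 95.0, 110.0, 80.0])

def functional_demand(moment, angle):
    """How hard the muscles work relative to maximum capability (%)."""
    capacity = np.interp(angle, ANGLES, MAX_MOMENT)
    return min(100.0, 100.0 * abs(moment) / capacity)

def demand_colour(pct):
    """Continuous gradient: green at 0%, amber at 50%, red at 100%."""
    green, amber, red = (0, 255, 0), (255, 191, 0), (255, 0, 0)
    lo, hi, t = (green, amber, pct / 50) if pct <= 50 else (amber, red, (pct - 50) / 50)
    return tuple(round(a + t * (b - a)) for a, b in zip(lo, hi))

print(demand_colour(functional_demand(moment=85.0, angle=45.0)))
# ~83% of capacity at this instant -> a colour well into the red band
```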
The advantages of this form of visualisation are immediately apparent in the animation stills (Fig. 3) where one can see the relationship between the motion and the dynamically changing ‘stress’ on the joints. With this mode of presentation of data one can easily compare different joints simultaneously (in contrast to Fig. 1), in context, and make immediate visual comparisons e.g. between different individuals doing the same activity or between an individual doing the same activity in different ways. This is not achievable using current modes of presentation of biomechanics data.
Fig. 3. Stills from animations: comparison of a 67-year-old male with a history of back problems and fractures rising from a chair in two different ways: (a) using arm rests; (b) without using arm rests
In Fig. 3 it can be seen that the arm rests provide the individual with a significant mechanical advantage, sharing the loading between the arms and the legs. When the participant was asked to perform the activity without using the arm rests, the demands on the lower-limb joints are close to their maximum capacity (indicated by 'red' in the second still from the left, Fig. 3b). It can also be seen that even with the use of the arm rests, close to the end of rising from the chair and at the beginning of the sitting movement, the demands are high at the hip joints (indicated by 'red' in the third frame from the left, Fig. 3a). This indicates that any further deterioration in strength at the hip joints may cause problems for this individual, and may create a risk of falls when rising from chairs. An additional advantage of this visualisation technique is the ability to view animations from any angle or distance (Fig. 4), providing the opportunity to examine specific details of the data, in contrast to the fixed viewpoint of existing video camera techniques.
Fig. 4. Changing the point of view of the visualisation
3 Evaluation Methodology

The tool was evaluated through a qualitative methodology using interviews and focus groups. Interviews were held individually with older adults (N=18), and healthcare and design professionals (N=15). The older adult participants were recruited to match as closely as possible the cohort of individuals from whom the original data for the
visualisations were obtained (i.e. both genders, distributed across the 60+, 70+ and 80+ year old age groups, with corresponding reported health-related conditions). The selected range of professions comprised clinical medicine, physiotherapy, occupational therapy, bioengineering, disability consultancy, engineering design, and interior design. The key themes explored in the interviews were: i) issues and concerns regarding the effect of ageing on mobility; and ii) any communication problems encountered between professionals and older people when discussing mobility. These informed and provided the context for the focus groups.
Fig. 5. Videos shown in focus groups: (a) comparison of an individual performing sit-stand activity with (left) and without (right) use of armrests; (b) comparison of an individual going up (left) and down (right) stairs; (c) comparison of three different individuals lifting an object from a high to a low shelf: (left) 74 year old female with no apparent problems – note knees and hips are shown green; (middle) 81 year old male, osteoarthritis of knees – note red and orange indicated at the knees; and (right) 67 year old male, history of back problems and history of fractures – note red and orange at hips.
Three focus groups (FG) were held to evaluate responses to the dynamic visualisations: FG1 comprising older adults; FG2 with a range of healthcare and design professionals; and FG3 with a mixture of older adults and professionals. The participants for the FGs were drawn from the group of individuals interviewed in the previous stage, and consisted of older adults (N=8) and a mix of healthcare and design professionals (N=8). In FG1 and FG2, the participants were shown a sequence of animations, without explanation, prior to a facilitated semi-structured discussion to explore in detail: i)
their initial responses, understanding and interpretation of the visualisations; ii) the insights they provided; iii) potential applications; and iv) how the prototype might be improved. In FG3, the mixed group discussed the outcomes of FG1 and FG2 in more depth, and compared the professionals' responses with those of the older adults. Each FG was videoed for later analysis, and participants were asked to complete questionnaires to capture additional responses. It was clearly not desirable to present too large a range of animations within the short time limits of the FGs (over 900 animations were available: 84 individuals each with 11 different activities). The researchers therefore decided to limit the animations to a selection of those most effective at showing the variation of joint colour in relation to lower-limb movements. The animations were selected to i) show an individual doing the same activity in two different ways (Fig. 5a, Fig. 5b) and ii) compare different individuals doing the same activity (Fig. 5c). To ensure consistency between the animations shown at FG1 and FG2, videos of the visualisations were taken using screen capture software.
4 Enabling New Dialogues

From analysis of the discussions in the three focus groups, it is clear that new kinds of dialogues about biomechanical issues have emerged, facilitated by the tool. The most obvious and significant outcome is that these dialogues involved non-biomechanists, including those who would normally be regarded solely as the subjects in this area of research, i.e. older adults. The method of visualising and presenting the data clearly enabled people without training in biomechanics, both professionals and lay older people, to access and interpret the information. Each of the participants could interpret different details in - or offer different perspectives on - the visualisations, based on their background, experience or field of knowledge. Further, the common visual medium enabled the sharing of these different insights, without recourse to specialist terminology or knowledge. In this section, three examples of dialogues are provided to illustrate the nature of the discussions and the issues and insights they developed. In the extracts, the following coding is used: (P=physiotherapist; PS=physiotherapy student; B=bioengineer; Dr=doctor; Des=designer; DE=design engineer; DC=disability consultant; and OP=older participant).

4.1 Dialogues 1a and 1b: Detailed Analysis Generating New Insights

The visualisations of three different individuals lifting an object from a high to a low shelf (Fig. 5c) provoked discussion about the causes of 'stress' on joints and how the problems might be alleviated. The following two dialogues from FG2 provide examples of detailed analysis between professionals from disparate disciplines, e.g. designers and physiotherapists - a dialogue about real data which would not have been possible with conventional presentation formats of biomechanical data.

1a (analysis) DE: One's moving a lot quicker than the other one. Me as a boy, me as I am now. Des: The person on the right looks less agile. DE: They're bending in a very different motion. P: The left one's going down much further. The actual
height drop is considerably more on the left, and the leg pattern's symmetrical on the left and asymmetrical on the right. B: The left figure appears to be better balanced in comparison to the right. P: I think the person on the right has to use a lot more trunk rotation in order to achieve what they're trying to achieve.

(insight) DE: He's having to turn in a different way. I guess the assumption is that he's got an imbalance somewhere in his joints, so he's got to do things differently if he's reaching with his left hand from the way he'd do them if he were reaching with his right hand. Which means one size doesn't fit all…one solution's not going to work for everybody. Or maybe not for the same person in two different positions.

1b (analysis) DC: High stress levels appear to be at the hips and the right knee. DE: Interestingly it's in the right knee as he stands up, as he straightens up again. B: It seems to fluctuate as well. P: In the right figure, there's almost no movement at the ankle at all. They're totally unable to use their ankles. They're having to compensate everywhere else.

(insights) Dr: You get the impression they're saving their ankles and knees, but causing more pain at their hips. P: Trunk rotation as well to be able to achieve a reach. Dr: So there might be pre-existing problems in the knees and the ankles that are now causing new problems in the hip.

4.2 Dialogues 2a and 2b: Same Situation, Different Perspectives

Returning to the discussion of the differences in the sit-stand activity with and without the use of arm rests (section 2.2, Fig. 3), the following two extracts illustrate the dialogues that emerged a) amongst older people in FG1 and b) between design and healthcare professionals in FG2. Although the same visualisations were shown, there were clear differences in emphasis and knowledge, yet all were equally valuable.

2a OP1: I've got two hip replacements and a knee replacement. So I was quite familiar with their movements. Right away I say, 'that's like me'. OP2: That's my knee… I see myself getting up and down from the chair. OP3: That is one of the exercises we do at the cardiac rehab, sit to stand, and it's just square stools you sit on. You can press on your knees like that [indicates the movement], and rock forward and up and down.

2b Des: I think it's just really evident immediately why you would…give someone arms on chairs, without having to have some person try it, and see for themselves that this doesn't work. P: It's a very clear indication of what's a normal movement pattern, and what's an abnormal movement pattern. And the compensations that you make when you have a problem, say a knee or a hip, and how you have to compensate, both in the speed of the movement and quality of the movement. And how you have to compensate elsewhere in the body to still achieve the same goal. DC: I think the figure on the left hints at the cruciality of the height of the arms of the chair. DE: Thinking of the person on the right trying to stand up from a bus seat...with no arm rests by the way... is challenging, but I can't actually do anything about that because of the legal requirements.
4.3 Dialogues 3a and 3b: Older People Empowered to Share Experiences

Those usually regarded as the 'clients', in this case older people, made a significant contribution to the understanding of the issues, provoked by the data being presented more clearly and visually, dynamically and contextually. The following examples demonstrate the quality of discussion generated from the older adults regarding the visualisations of an older adult going up and down stairs (Fig. 5b). This provoked a discussion about how important handrails on stairs are to the everyday experience of older people:

3a OP1: In one, it's his knee and his hip turned red, but in the other figure, there was only one, the knee turned red. Could it be … that it only affects them going upstairs instead of downstairs? OP2: Going up stairs it's essential to have a balustrade. You could not walk up stairs...well you can but with difficulty. OP3: …you don't always get a staircase with two banisters. Coming up from the train station today, the steps were wide, so you only had the one banister. OP4: I can also get down airplane steps backwards, which is much easier, provided there's a banister. In fact it's the only way I can do it, I can't do it any other way.

In FG3, the older adults were able to raise this issue with the professionals present, using the visualisations to back up their experience. The comments from the older participants had equal prominence in the discussion, and they were able to engage with the professionals in examining the issues.

3b OP1: I think the important thing to get across is, just like this, how valuable the handrail is…and it gives people confidence to go up stairs. Without a handrail you say 'oh god, am I going to get up here'. The handrail's very important. OP2: We spoke about going upstairs holding on to both [hand-rails]...but that's not possible if you're using a walking stick, you know, you've got to hold the walking stick in one hand, and just use one hand to go up.
5 Conclusions

Analysis of the discussions in the focus groups revealed new kinds of dialogues between older people and professionals about their experiences, based on a real understanding of where the mobility problems are occurring. New dialogues also emerged between professionals from a range of different disciplines, crucial for different aspects of the care, wellbeing or design of the built environment for older people. Neither of these would have been possible using current conventions of presenting biomechanical data. The physiotherapist commented: "Many professionals find it difficult to talk in lay terms, and simple software like this would allow them to do that, and actually be understood." It was felt that the tool "clearly articulates for the health professional and the older person what is going on in the joints" and that it could provide a means to explain and encourage normal movement patterns "to limit, mitigate or overcome pain". Both clinical and design professionals indicated that the objectivity of the visualisation method would complement or improve on the current and prevalent use of
subjective judgment, intuitive skill, and trial and error, thereby allowing more accurate diagnosis in a clinical setting on the one hand, and presenting a sound rationale for design approaches on the other. All the perspectives represented in the focus groups, both lay and professional, are valuable, indeed essential to fully understand the mobility issues in older people with chronic enduring conditions brought about by age and/or illness. Up to this point in time there has not been the means for the key knowledge and insights of biomechanists to be shared in an understandable and meaningful way outside of their own profession. This mode of visualising and presenting biomechanical data has the potential to provide a significant new tool for: i) biomechanists, to easily communicate the concepts and principles of motion, and the physics and forces involved; ii) the wide range of other professionals whose decisions have profound consequences for healthcare planning, and design of e.g. furniture and the built environment; and last but not least, iii) older people themselves, to enable them to enter a discussion about their everyday experiences and to be able to show exactly when and where something becomes painful, difficult, or affects their confidence in carrying out a particular ADL. This research has verified the hypothesis that using this form of human modeling software to represent the complex data in this particular visual format communicates essential biomechanical information across traditional disciplinary boundaries, and provides a common basis for the discussion of what the data might mean to both the older adult and the healthcare or design professional. Freeing the discussion from the conventional presentation of biomechanical data, its numerical form and scientific language, provides the opportunity to open a new dialogue between stakeholders in older adult mobility, generate new insights, and offer improved healthcare and design planning. Beyond this, the method has potential to be applied to a range of mobility conditions and rehabilitation challenges where the same generic issues pertain, e.g. stroke, knee joint replacement, ankle foot orthoses, and falls prevention. Firstly, by providing a common platform for the full spectrum of knowledge specialisms to engage in the articulation and communication of biomechanical issues – knowledge which is currently disconnected and underexploited because of the lack of a common, accessible and understandable medium. Secondly, by empowering clients, whose personal experiences and insights remain largely unarticulated and isolated from specialists' discussions: if their own understanding of the impact of biomechanical issues on their recovery and rehabilitation could be enhanced, this could potentially improve their motivation and adherence to therapeutic interventions.

Acknowledgements. The software tool was developed as part of EPSRC EQUAL-funded project GR/R26856/01, conducted jointly by the Bioengineering Unit at the University of Strathclyde in Glasgow, the Department of Psychology at the University of Strathclyde, the School of Health Sciences at Queen Margaret University College in Edinburgh, and The Glasgow School of Art. This research, 'Innovation in envisioning dynamic biomechanical data to inform healthcare and design guidelines and strategy', has been funded by the cross-council New Dynamics of Ageing programme, grant No. RES-352-25-0005.
Glossary

i. Biomechanical moments are the rotational forces at a joint generated by the action of external linear forces.
ii. Flexion/extension refers to the straightening/bending of a limb around the joint.
Reference

1. Macdonald, A.S., Loudon, D., Rowe, P.J., Samuel, D., Hood, V., Nicol, A.C., Grealy, M.A., Conway, B.: Towards a design tool for visualizing the functional demand placed on older adults by everyday living tasks. Universal Access in the Information Society 6(2), 137–144 (2007)
A Novel Approach to CT Scans' Interpretation via Incorporation into a VR Human Model

Sophia Sakellariou1, Vassilis Charissis2, Ben M. Ward3, David Chanock4, and Paul Anderson2

1 Aberdeen Royal Infirmary, Acute Medicine, Aberdeen, UK
2 University of Glasgow / Glasgow School of Art, Digital Design Studio, 10 Dumbreck Road, G41 5BW, Glasgow, UK
3 University of Edinburgh, UK
4 Ayr Hospital, Department of Radiology, Ayr, UK
[email protected]
Abstract. This paper presents a novel approach to the interpretation of Computed Tomography (CT) scans. The proposed system entails an automated transfer of selected CT scans onto a default Virtual Reality human model. Contemporary training requirements often prove time-consuming for clinical facilities, which have to split their operational time unevenly between radiological examinations and radiologists' training. Adhering to these training requirements, we employed a number of VR and Human-Computer Interaction techniques to enable trainees to familiarise themselves with the interpretation of such data and its actual spatial correlation inside the human body. Overall, the paper presents the challenges involved in the development of this method and examines the potential as well as the drawbacks of deploying such a system for a large teaching audience. Finally, the paper discusses the results of an initial user trial, which involved twelve trainee doctors, and offers a tentative plan of future work, which aspires to customise the software for different learning levels.

Keywords: HCI, CT scans, VR Human Model, Medical Training.
1 Introduction

Contemporary technological advancements have enabled the health-related sciences to enter a new era, commonly described as the era of computer-aided medicine. In particular, the use of Computed Tomography (CT) plays a significant role in the understanding of human anatomy and pathology, as visual and volumetric analysis of the scanned data can offer a clear view of a patient's condition. In the early stages of medical training, however, analysis of CT imaging can be convoluted, as correct interpretation of the radiological images relies mainly upon a three-dimensional (3D) internal understanding of anatomy that each user acquires through training. A number of randomized controlled studies have shown positive learning outcomes in basic undergraduate anatomy teaching with the use of 3D visualizations and novel user interfaces in conjunction with existing radiological and diagnostic imaging [1]. Early examples of such a combinatory approach can be found in the user interface
designed for the Visible Human Explorer, which allows navigation between coronal section overviews and axial image previews with the use of sliders that animate the cross-sections through the body [2]. Subsequent studies that additionally involved Virtual Reality (VR) and direct-manipulation interfaces have demonstrated a positive impact in shortening the learning curve in medical training [3, 4, 5, 6]. A fusion of CT and VR has also been used in clinical applications, where interventional radiologists utilised this combinatory approach to perform percutaneous controlled radiofrequency trigeminal rhizotomy (RFTR) assisted by a VR imaging technique for idiopathic Trigeminal Neuralgia [7]. Similarly, Virtual Reality and Augmented Reality have been employed to enhance CT understanding and support in real-time surgical operations and in surgical rehearsal and simulation [5, 8, 9]. Yet the potential role of enhanced visualization in radiological training is still a largely uncharted area. This paper presents the development process of a hybrid method which merges Virtual Reality with contemporary CT scan planes. The system under investigation has been evaluated through a comparative study between contemporary teaching methods and the proposed VR-CT approach. The paper describes the users' reactions and performance results in detail, offers a discussion of their preferences and feedback, and concludes with future aspirations and plans for the continuation of this research to a level that would have a direct, positive impact on medical training.
2 CT Transfer Process into VR Environment

Adhering to the above observations regarding contemporary CT interpretation teaching, we developed a methodology that enables the direct transfer of CT scans into a VR environment. To this end, the data transfer and processing steps were planned in advance to provide the project with a course-of-action map, described in turn below. The original CT images first had to undergo a format conversion to minimise their size without compromising their clarity; small files are required because each image is transferred as a texture map onto a transparent 3D plane. In the second step, the projected images formed a block of 3D planes corresponding to the actual CT slices. This equally segmented block was then incorporated into the VR human model at the depicted area (i.e. a chest CT intersecting the chest area of the model). The accuracy of intersection is acceptable, considering the projection issues in a VR environment and the data manipulation with haptic devices. Although still under development, the software precisely follows the borders of the major organs (e.g. the lungs) by tracking the borders of the 3D lung model with the employment of a border pattern-analysis algorithm. It has to be noted that, due to the large number of CTs that typically depict a specific section of the human body, we opted for a selection of scans (i.e. one in every ten images) which still tracks the shape of the organ structures without delaying the loading process. Furthermore, during the development period we received feedback from mini-trials which led us to the conclusion that the interface should offer a "hide" option, making the majority of CT scans fully transparent and retaining only a small number that are more challenging to interpret or that illustrate a specific or rare pathology.
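A minimal sketch of the slice-selection and texture-preparation step is given below, assuming the scan series is already available as a numpy volume. The one-in-ten stride comes from the text; the intensity windowing, texture size and spacing arithmetic are otherwise illustrative assumptions.

```python
import numpy as np
from PIL import Image

def prepare_slice_textures(volume, slice_spacing_mm, stride=10, size=(256, 256)):
    """Select every `stride`-th axial slice, window it to 8-bit and
    downsample it, returning (texture, z_position_mm) pairs that can be
    mapped onto transparent 3D planes stacked through the body model."""
    lo, hi = np.percentile(volume, (1, 99))  # crude intensity window
    planes = []
    for i in range(0, volume.shape[0], stride):
        sl = np.clip((volume[i] - lo) / (hi - lo), 0, 1)
        tex = Image.fromarray((sl * 255).astype(np.uint8)).resize(size)
        planes.append((tex, i * slice_spacing_mm))  # plane depth in the model
    return planes

# e.g., a 250-slice chest series at 1.25 mm spacing -> 25 textured planes
volume = np.random.rand(250, 64, 64)  # small stand-in for real CT data
planes = prepare_slice_textures(volume, slice_spacing_mm=1.25)
print(len(planes), planes[0][1], planes[-1][1])
```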
Fig. 1. Workflow for the incorporation of CT scans into the VR human model
3 Interface Development for VR-CT

With the workflow complete and the implementation issues resolved, we proceeded to develop a Human-Computer Interface that empowers the user to navigate in real time through the VR human model and investigate the spatial relations of the depicted CT data. In particular, we have developed a prototype HCI system in which individual CT scans can be selectively incorporated so that they correlate precisely with a "default" 3D human body. This enables users to explore in the virtual environment the positioning of particular human body elements and to improve their interpretation abilities by investigating the sectioning CT images. The interface development aimed to depict meaningful information that could enhance the learning process of the trainee doctors in a synthetic environment. Our focus was on the interface functionalities that enable trainees to mentally perceive the three-dimensional structure of the human body and navigate through it, discovering and perfecting information acquisition and interpretation. These interface components present fresh opportunities for the portrayal of scanned data, featuring an infinite selection of viewing positions.
Fig. 2. (a) Screenshot of user during manipulation of the incorporated CTs into the VR human model; (b) Interface bars: human layers (top), default actions (bottom)
Such functionality offers users the ability to mentally triangulate the position of the anatomical structures and correlate them instantly to the CT depictions. Our attempt to directly apply publicly accepted interface components to the medical training environment was a challenging process, as the icons had to be designed in accordance with the requirements of each section. To this end, the icons showed miniature representations of the layers, as illustrated in the upper toolbar of Fig. 2(b). In this particular case study the virtual interface was enriched with a CT icon which revealed four distinctive CT images in the thoracic area.
4 Experiment Rationale

A comparative study was deemed essential in order to identify and evaluate the potential benefits and pitfalls of VR-CT teaching versus traditional methods of teaching CT interpretation. The VR environment utilized for this experiment was equipped with a haptic glove for "hands-on" interaction with the 3D human model, as illustrated in Fig. 3. A sample of twelve foundation-year doctors (FYs) was randomly selected and divided into two groups, which were taught with the VR method and the traditional routine
respectively. Their performance was measured through a series of pre- and post-teaching assessments, as presented below. A usability questionnaire was employed to capture their thoughts and feedback with regard to these two diverse teaching techniques.

1. Pre-assessment Likert questionnaire (demographics etc.)
2. Pre-assessment quiz (10 mins)
3. 5 minutes with the CT activator sheet
4. Intervention (15-min tutorial focusing on the structures on the CT)
5. Post-assessment (CT spot test)
6. Usability questionnaire (no time limit)

In this paper we focus our analysis on the pre- and post-assessment results derived from the Likert-scale questionnaires, which are indicative of the proposed system's positive aspects as well as of potential issues arising.
Fig. 3. Investigation of CT data through VR during the user-trials
5 Results

The pre-assessment Likert questionnaire aimed to establish that there were no significant differences between the two subgroups of users in relation to their familiarity with system technology, anatomy knowledge, training experience, exposure to prosection and dissection teaching methods, and previous focused CT interpretation practice. All twelve users had comparable experience of anatomy training, had graduated from UK medical schools, and had on average between 8 and 12 months of exposure to hands-on surgical training at "house officer" level. All users graded themselves moderately computer literate; none had previous experience of VR environments, and all had only minimal CT interpretation teaching. Furthermore, the pre-assessment Likert study explored the users' views on current anatomy teaching and their learning behaviours in interpreting 3D data for clinical use. Their responses were very similar, with no significant differences in any of the attitude-determining questions (p>0.05). Notably, the trainees uniformly expressed the view that current anatomy teaching at undergraduate level and during clinical years is fragmented, limited and lacking in depth, whilst the teaching methods were described as time-consuming and non-engaging. A full 80% of users found it difficult to construct a mental 3D map of the human anatomy from studying 2D models, and thus graded the application of their anatomy knowledge in a clinical interpretation scenario as inadequate for their clinical needs. Similarly, 75% of users strongly agreed that further enhancement of their anatomy training would aid their clinical practice, with the remaining 25% agreeing, but not as strongly. The pre-test mean scores for all questions are presented diagrammatically in Fig. 4.

Fig. 4. Pre-test mean scores in all 12 pre-assessment Likert questions

The pre-assessment quiz consisted of open questions relating to the anatomy involved in the study, namely the thorax as depicted on CT scans. It established the baseline knowledge of the users prior to any intervention and was compared with the post-intervention assessment quiz. The aims of the exercise were two-fold: to identify the users' prior pure anatomy knowledge and familiarity with CT scans, and to elucidate their ability to comprehend how anatomical structures relate to each other in a 3D environment. The latter was assessed by questions referring to the spatial relations of superimposed structures in 2D images (i.e. posterior/anterior). It is of interest that although the distribution of answers regarding pure anatomy knowledge followed the normal distribution curve, with most users scoring adequately on those questions, the distribution was negatively skewed on questions relating to 3D relationships. Both groups improved their scores after the teaching intervention, as expected: the traditional-method group by a mean of 18%, and the VR-method group by a mean of 22%. Scores were not significantly different between the two groups either pre-intervention (p=0.36) or post-intervention (p=0.50). Although the two training methods produced statistically comparable results overall on single-factor ANOVA, on the particular questions relating to the spatial relationships of structures the VR group had a distinct advantage, with 50% more correctly answered questions (p=0.007).
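For readers wishing to replicate this kind of analysis, the comparisons reported above correspond to standard two-sample tests and single-factor ANOVA. The score lists in the sketch below are hypothetical placeholders (the study's raw scores are not reproduced in this paper), so it illustrates only the procedure, not the published p-values.

```python
from scipy import stats

# Hypothetical per-student improvement scores (percentage points);
# the real data are not reproduced here.
traditional = [15, 20, 17, 19, 18, 19]
vr_method   = [21, 24, 20, 23, 22, 22]

# Single-factor ANOVA across the two groups (with two groups this is
# equivalent to a two-sample t-test, which the usability analysis used):
f_stat, p_anova = stats.f_oneway(traditional, vr_method)
t_stat, p_ttest = stats.ttest_ind(traditional, vr_method)
print(f"ANOVA p={p_anova:.4f}, t-test p={p_ttest:.4f}")
```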
On concluding the experiment, all users completed the usability questionnaire, in which they were asked to grade their views on the educational approach. In total, 25 statements were graded on a 6-point Likert scale. The views of the two groups were markedly different on almost all points, with p-values below 0.001 on two-sample t-testing. Positively phrased statements regarding the educational approach scored very highly in the VR group whilst scoring low in the traditional-method group, and vice versa for negatively phrased statements. The only points that did not exhibit such a highly significant difference related to the ease of use of the system and familiarity with the interfaces. Even on these questions the response of the VR group was positive, with the majority of users finding the VR system more engaging, interesting, easy to use, and more efficient in elucidating the spatial inter-relationships of structures. Users preferred the VR system over traditional teaching methods and were more inclined to recommend it to their peers (p < 0.001 compared with the traditional-method group responses).
Fig. 5. Post-assess Likert scores in 6 selected questions relevant to the 3D anatomical awareness and the CT interpretation
The feedback provided by the users on six particular questions, which aimed to identify the potential impact of the VR method on the explanation and interpretation of CT images against the traditional process, is particularly interesting. As Figure 5 illustrates, the VR method was clearly the favourite. The results of these specific questions are discussed further in the following section.
6 Discussion

The ongoing analysis of the evaluation data suggests that the incorporation of selected CT scans in a VR environment offers great flexibility in reviewing the scans against the 3D volumes of the anatomical structures. However, the existing interface did not offer a clear view of the sections that the scans slice; such issues could be resolved in future developments of the system.
Interestingly, the presentation of the same 3D human model and the dissecting CT images was deemed considerably more helpful in the 3D environment of the development software presented on a PC monitor. Despite the positive conclusions drawn from the experimental analysis of the data, there are a few confounders introducing bias that need to be considered, given the small numbers involved in the study. As most users involved in the trial were volunteers, a more positive response to the VR educational approach was to be expected, since it is the most innovative option and a novelty for trainees. The pre-assess Likert questionnaire had already established their dislike of current educational methods, and as such a more technologically advanced approach might have been positively received regardless of its merits. In the analysis of the pre- and post-quizzes, however, the VR approach proved significantly better at elucidating 3D structures, a finding unrelated to the trainees’ attitude towards the approach.

6.1 VR Group

The VR group in particular spent more time investigating the structural data of the model and how these could be depicted in each section presented in a CT scan. This elaborate investigation offered a better understanding of the spatial correlation of the organs and the muscular and skeletal structures illustrated in each CT.

• The favourite interaction tool was the layer remover (as in every other case study that we have evaluated). In this case study it enabled the doctors to remove even the scans from their original positions, investigate them, and reposition them back into the 3D human body.

• The transparency tool offered an additional way to interact with the data and to “partially clear” whole sections which were obstructing the doctors from viewing the scans inside the human body.

• Finally, the slicing tool presented an alternative to both of the aforementioned tools, allowing the model to be virtually sliced at any angle or in alignment with the CT scans.

6.2 Contemporary Group

The response of the contemporary-learning-method group appeared to be the antipode of the VR group’s, as the following selection of common reactions suggests.

• The black-and-white “dry” representation of the scans was again quite confusing for the FYs who went through the traditional teaching process.

• The 2D depictions and illustrations offered a better solution, as they presented specific parts of the anatomy. However, due to the complexity of this particular structure, a number of different in-depth layers could not be revealed simultaneously, resulting in a tortuous process of providing a plethora of different drawings presenting different angles of different sections.

Notably, the investigation of the full set of CT scans (250 sections were involved in this part of the body) could be significantly easier and more meaningful through the CT viewing program installed in hospital facilities. However, it could still be challenging to interpret specific pathologies appearing in a CT if a trainee has not had a considerable amount of familiarization time with the CT viewers. As such,
by stripping the CT scans of their original functionalities we were able to assess each user’s ability to form a 3D mental map of the CT depictions, with and without the support of a VR training session.
7 Conclusions

The initial analysis of the data suggests that the VR-CT representation could potentially clarify the spatial correlation between different structures that might be difficult to interpret directly in a CT scan. However, it was obvious that the incorporation of the CT scans into a synthetic environment has to be enhanced with a number of additional interactivity tools. Additionally, the ultra-sensitive tracking devices of the VR space should be damped in order to allow the users to investigate the CTs and the 3D model without the artefacts produced by the “trembling” of the user’s head and hand movements. Our tentative plan of future work aims to improve the performance of the system and potentially to create an automated transfer of CT scans onto a VR body, enabling medical trainees and practitioners to explore different pathologies coming from a variety of patients. Such functionality might also be extended to enhance the surgical rehearsal process. In conclusion, we aspire to incorporate additional scan data from different sources, demonstrate in the virtual environment the different types of information that could be derived, and trial them in all medical schools across Scotland, incorporating a much greater number of trainees in our study and thus reducing any potential bias.
Acknowledgements

The authors would like to express their gratitude to the staff of the Radiology Department of Ayr Hospital in Scotland for the provision of CT images and their enthusiastic collaboration.
The Performance of BCMA-Aided Healthcare Service: Implementation Factors and Results

Renran Tian1, Vincent G. Duffy1,2,4, Carol Birk3,5, Steve R. Abel3, and Kyle Hultgren5

1 School of Industrial Engineering, Purdue University, 315 N. Grant Street, West Lafayette, IN 47907
2 School of Agricultural & Biological Engineering, Purdue University, 225 South University Street, West Lafayette, IN 47907
3 School of Pharmacy and Pharmaceutical Sciences, Purdue University, 575 Stadium Mall Drive, West Lafayette, IN 47907
4 Regenstrief Center for Healthcare Engineering, Purdue University, 203 Martin Jischke Drive, West Lafayette, IN 47907
5 Pharmacy Technical Assistance Program, Purdue University, 6640 Intech Boulevard, Indianapolis, IN 46278
{Rtian,Duffy}@purdue.edu
Abstract. The Bar Code Medication Administration (BCMA) system has been adopted by many healthcare providers. Besides its benefits in reducing medication errors, cost, and time, various side effects and new medication errors have been reported; however, the nature of these new problems has not been studied systematically. Although there are many studies focusing on IT implementation, very few studies addressing technology acceptance have been conducted in the healthcare context. Due to the complex and dynamic features of the healthcare system, it is necessary to study how new technology acceptance models can be applied in this field. In this study, a model of BCMA implementation will be constructed to enable the prediction and control of medication error reduction and side-effect generation. To achieve this, the relationships between different implementation measures will be studied, and predictive variables will then be selected to construct the model for the different measures. Keywords: BCMA, Healthcare, New Technology Acceptance, Medication Error.
1 Introduction

Ever since the release of the IOM (Institute of Medicine) report [1] at the beginning of the new century, improving patient safety by reducing medical errors has attracted public attention, and reducing the risk of adverse events in health care has become a national priority. Adverse drug events (ADE), including preventable medication errors, are a leading cause of patient harm. According to the IOM report [1] and QUIC (Quality Interagency Coordination Task Force) reports [2], IT and various computerized healthcare systems can help to solve problems existing in the current healthcare environment, increase work efficiency, and improve patient safety. Since bar code
technology has proved beneficial in many industries for improving work efficiency and reducing errors, BCMA systems, which rely on bar code technology to provide medication information, have been constructed and applied and can dramatically reduce medication errors [3]. Although a great deal of evidence and research has been put forward about the benefits of IT implementation in healthcare, including the BCMA system [3], [4], [5], [6], [7], [8] and [9], problems and new medical errors have also been revealed and discussed [10], [11], [12], [13], [15]. Chaudhry [15] clearly described future directions for health care informatics research after reviewing many of the important publications: “more information is needed regarding the organizational change, workflow redesign, human factors, and project management issues involved with realizing benefits from health information technology.” More studies of the implementation of new IT systems in the healthcare field are needed, and we need to investigate how the implementation is affected by various factors, what benefits can be expected, and what needs more attention to achieve the best results. The BCMA system, which has been proven to greatly reduce medication errors but has been studied relatively rarely and insufficiently, is the system examined in this research. We will try to (1) develop measures of the success of BCMA implementation that reflect both changes in the work situation (efficiency, quality, and side effects) and user satisfaction; and (2) select predictive variables and model the relationship between the implementation success measures and these variables.
2 Background

2.1 Barcode Technology and BCMA System

Bar codes are useful because they are self-contained messages with data encoded in the widths and spaces of a printed pattern. In 2002, the Food and Drug Administration issued the Bar Code Label Requirement for Human Drug Products and Biological Products (21 CFR Parts 201, 606, and 610). Under this regulation, all drugs were required to be printed with a National Drug Code by 2006. The regulation greatly increased the utilization of bar coding in the medication process. In the healthcare field, the basic idea is that bar coding can rapidly ensure that the drug at hand is actually the one needed, and that the administration can easily be recorded along with the provider, receiver, and time. The BCMA system, developed in the Veterans Health Administration (VA), is one successful and leading application of bar code technology in the medication process. Johnson [3] described the components of BCMA as including the Virtual Due List, PRN Effectiveness List, Medication Administration History, Patient Medication Log, Missing Dose Requests, Medication Variance Log, and Medication Administration Error.

2.2 New Technology Acceptance

According to Karsh and Holden [16] and Kukafka [17], the two most important models for explaining IT usage are the Technology Acceptance Model (TAM) [18] and Diffusion Theory.
[Figure 1 shows the TAM flow: system features (external stimulus) → user’s perceived usefulness and perceived ease of use (cognitive response) → user’s attitude toward the system (affective response) → use rate of the system (behavioral response).]
Fig. 1. Original Technology Acceptance Model by Davis [18]
Figure 1 shows the original TAM proposed by Davis [18]. The TAM emphasizes that system design features and individual/person factors together affect the user’s attitude toward using the system via perceived usefulness and perceived ease of use, and that this attitude then influences actual system use. In another study [19], researchers proposed that adding a Perceived Threat construct at the cognitive-response level creates Resistance to Change at the affective-response level, and showed that combining resistance and intention predicts actual IT system usage better. Diffusion Theory explains from another perspective how individual differences and characteristics of the system can affect IT implementation. According to Kukafka [17], individual users can be categorized into five groups: (1) innovators, (2) early adopters, (3) early majority, (4) late majority, and (5) laggards. The characteristics of the system that affect the user’s adoption process include: “(1) Compatibility which measures how the new system is perceived as being consistent with existing values, past experiences, and needs of potential adopters; (2) Complexity is how difficult the new system is perceived to understand and use; (3) Trialability is the degree to which the system may be experimented with on a limited basis; and (4) Observability is the degree to which the results of the system are visible to others”.

2.3 Characteristics for Healthcare Service

The most important difference between healthcare work and other industries is its special work object, the patient. Marsolek and Friesdorf [20] developed the patient-staff-machine system, shown in Figure 2. According to them, the treatment of a large number of different patients with many different diseases, as well as individual health conditions, results in a much higher task complexity than in classical work systems, since different patients with different diseases need different treatments as well as different work processes.
Fig. 2. The classical man-machine-system vs. the patient-staff-machine-system for health care
Lorenzi and colleagues [21] discussed the reasons for the complexity of organizations in health care: (1) health services are provided by a wide range of institutions, ranging from major specialty hospitals to a complex of community hospitals, small clinics, and individual professionals; (2) public, not-for-profit, and volunteer organizations are often dominant in the health services arena; (3) professionals dominate both the definition and the execution of the task; (4) the definition of the task and its objectives are in many cases very difficult to establish in advance; and (5) the health system is undergoing fundamental structural change in most places in the world, with many countries following quite different principles. Ash and associates [11] discussed two main categories of errors related to IT systems in health care work. The first category concerns the process of entering and retrieving information, since (1) healthcare work mostly takes place in a highly interruptive use context, and (2) overemphasizing structured and complete information entry or retrieval causes cognitive workload. The second category concerns the communication and coordination process, since healthcare work does not follow a linear, clear-cut, and predictable workflow, and requires high flexibility and a sufficient ability to deal with urgency and with transfer processes.

2.4 Common Measures for IT Implementation in Healthcare

Common measures used in new-technology implementation studies include the intention to use and end-user satisfaction. Wakefield et al. [22] proposed a survey-based measure of clinical information systems expectations and experiences. They developed a survey instrument entitled I-SEE to assess clinical end users’ expectations of full electronic health record and CPOE systems, and used the survey to assess clinical end users’ actual experiences with the CPOE and EHR systems after implementation. Comparison between expectation and experience can reflect user satisfaction and work system improvement. Besides these subjective measures, there are common objective measures used to judge the improvement brought by IT in healthcare, including various medication and medical errors and error rates, usage rate of care, time to provide care, and cost. Leonard and Sittig [23] proposed constructing a system of measures for IT implementation in healthcare comprising IT Cost measures, IT Use measures, and Health Performance measures.
3 Conceptual Model

3.1 Factors for Healthcare Service Performance

The model proposed by Shortell and colleagues [24] focuses on how different factors affect ICU (intensive care unit) performance. Its performance measures fall into three categories: error rate, time spent, and user satisfaction. The factors clearly reflect different aspects of healthcare work, including environmental factors, working content, personnel, and interaction. An extension of these factors can be used to distinguish different work situations.

3.2 Procedure for IT Implementation in Healthcare

Lorenzi and associates [21] proposed a four-stage iterative process of change during IT implementation, which is affected by personal and organizational issues. The procedure starts in an initial steady state in which an impetus for change exists; the organization first conceptualizes the outputs and then implements the changes; finally, an altered state comes into being. The new altered state gradually forms the new steady state that starts the next loop of changes.

3.3 Conceptual Model for This Study

The conceptual model for this study is shown in Figure 3. The figure is divided into three parts by two horizontal bold black lines. The whole model follows the middle part, which represents the four stages of the iterative IT implementation process based on the study of Lorenzi et al. [21]. The top part covers new technology acceptance in BCMA implementation. According to the TAM theory of Davis [18], the Innovation Diffusion Theory described by Kukafka [17], and Karsh and Holden [16], the combination of BCMA system features (including ease of use, usefulness, and adoption characteristics) and the individual and organizational factors related to the adopters (users) will determine the perceived ease of use, perceived usefulness, and adoption rate. According to Shortell [24], technology availability, task diversity, staff features, and cultural features are important factors to consider here. Since no evidence shows that these models can be applied in the health care system, the relationship between them and the measures of user intention and user satisfaction constitutes the first hypothesis. By examining this hypothesis, one can learn how the features of the health care system affect user acceptance of the new IT. The bottom part comprises the findings in the literature about BCMA’s impact on health care work, corresponding to each stage respectively. Medication errors, usage of care, and care provision time are three common measures used in the literature, and studies have shown the benefits. Studies have also shown the side effects related to BCMA, the redesign of work processes, and the reform of the organization during the implementation process. Although some studies have reported various results on the above topics, no relationship has been shown between them and other factors and measures. In the figure, there are also three vertical rectangles defining three stages. Steady status refers to the work status before BCMA implementation. Expectation includes
the Adoption Rate, Perceived Ease of Use, and Perceived Usefulness, which are related to the BCMA software and its potential users; the Perceived Reduction of Medication Errors, Perceived Reduction of Care Usage, and Perceived Reduction of Care Provision Time refer to the organizational conceptualization of the coming changes. Expectation will be survey-based. Like Expectation, Experience includes all the measures used to judge the implementation process, such as user intention, user satisfaction, reduction of medication errors, side effects of BCMA, and so on. The final goal of this study is to understand the relationship between Expectation and Experience, and to try to predict some of the measures before the implementation starts.
Fig. 3. Conceptual model for analyzing BCMA implementation
4 Methodology

One urban hospital planning to adopt a commercial BCMA system was selected for data collection. The whole implementation process, including design, preparation, training, implementation of software and hardware, adjustments, and formal utilization, will be followed. The research group includes researchers from healthcare and human factors backgrounds, and cooperates with the implementation group in the hospital and with the company developing the system. Currently, the hospital operates a largely manual service with paper-based records, and the implementation of electronic healthcare data records is still ongoing. Observation will first be carried out in the hospital to obtain a basic picture of the working situation there, based on which an interview guide with indexed questions
covering several aspects will be iteratively developed, focusing on the task/work content, efficiency, cooperation, concerns, expected improvements, etc. Both audio and written records will be obtained from interviewees, including nurses, pharmacists, and IT group members in the hospital. Analysis of the interview data will give some insight into the hypotheses in the conceptual model and will help determine the personal and organizational factors for the hospital. Interview answers will supply the content of the narrative for understanding the data. The questionnaire will have four sections: (1) usability test of the software, (2) work life representation, (3) quality improvement, and (4) problems, errors, deficiencies, and side effects. These items are designed to capture current work performance as well as users’ expectations of the implementation. Once the implementation starts, observation will be carried out again to track the changes made in various aspects. The questionnaire will then be modified and distributed again to capture work performance and user experience in the BCMA-aided situation. A comparison between the experience data and the expectation data, together with a comparison of work performance with and without the BCMA system, will give us a more detailed view of the performance of BCMA-aided healthcare systems and of how different factors affect the implementation and performance of a BCMA system.
Acknowledgments

The authors thank ASHP (American Society of Health-System Pharmacists) and the Regenstrief Center for Healthcare Engineering at Purdue University for their support of this study.
References

1. Kohn, L.T., Corrigan, J.M.: To Err is Human: Building a Safer Health System. Committee on Quality of Health Care in America, Institute of Medicine. National Academy Press, Washington (2000), http://books.nap.edu/openbook.php?isbn=0309068371 (accessed 04/05/2008)
2. Quality Interagency Coordination Task Force (QuIC): Doing What Counts for Patient Safety: Federal Actions to Reduce Medical Errors and Their Impact (2000), http://www.quic.gov/report/toc.htm (accessed 04/05/2008)
3. Johnson, C.L., Carlson, R.A., Tucker, C.L., Willette, C.: Using BCMA Software to Improve Patient Safety in Veterans Administration Medical Centers. Journal of Healthcare Information Management 16(1), 46–51 (2002)
4. Bates, D.W.: Using Information Technology to Reduce Rates of Medication Errors in Hospitals. British Medical Journal 320(7237), 788–791 (2000)
5. Bates, D.W., Cohen, M., Leape, L.L., Overhage, J.M., Shabot, M.M., Sheridan, T.: Reducing the Frequency of Errors in Medicine Using Information Technology. Journal of the American Medical Informatics Association 8(4), 299–308 (2001)
6. Mekhjian, H.S., Kumar, R.R., Kuehn, L., Bentley, T.D., Teater, P., Thomas, A., Payne, B., Ahmad, A.: Immediate Benefits Realized Following Implementation of Physician Order Entry at an Academic Medical Center. Journal of the American Medical Informatics Association 9(5), 529–539 (2002)
7. Kaushal, R., Barker, K.N., Bates, D.W.: How Can Information Technology Improve Patient Safety and Reduce Medication Errors in Children’s Health Care? Archives of Pediatrics & Adolescent Medicine 155(9), 1002–1007 (2001)
8. Kaushal, R., Shojania, K.G., Bates, D.W.: Effects of Computerized Physician Order Entry and Clinical Decision Support Systems on Medication Safety – a Systematic Review. Archives of Internal Medicine 163(12), 1409–1416 (2003)
9. Paoletti, R.D., Suess, T.M., Lesko, M.G., Feroli, A.A., Kennel, J.A., Mahler, J.M., Sauders, T.: Using Bar-Code Technology and Medication Observation Methodology for Safer Medication Administration. American Journal of Health-System Pharmacy 64(1) (2007)
10. Patterson, E.S., Cook, R., Render, M.L.: Improving Patient Safety by Identifying Side Effects from Introducing Bar Coding in Medication Administration. Journal of the American Medical Informatics Association 9(5), 540–553 (2002)
11. Ash, J.S., Berg, M., Coiera, E.: Some Unintended Consequences of Information Technology in Health Care: the Nature of Patient Care Information System-related Errors. Journal of the American Medical Informatics Association 11(2), 104–112 (2004)
12. McDonald, C.J.: Computerization Can Create Safety Hazards: A Bar-Coding Near Miss. Annals of Internal Medicine 144, 510–516 (2006)
13. Mills, P.D., Neily, J., Mims, E., Burkhardt, M.E., Bagian, J.: Improving the Bar-Coded Medication Administration System at the Department of Veterans Affairs. American Journal of Health-System Pharmacy 63, 1442–1447 (2006)
14. Halbesleben, J.R.B., Wakefield, D.S., Wakefield, B.J.: Work-arounds in Health Care Settings: Literature Review and Research Agenda. Health Care Management Review 33(1), 2–12 (2008)
15. Chaudhry, B., Wang, J., Wu, S., Maglione, M., Mojica, W., Roth, E., Morton, S., Shekelle, P.G.: Systematic Review: Impact of Health Information Technology on Quality, Efficiency, and Costs of Medical Care. Annals of Internal Medicine 144(10), 742–752 (2006)
16. Karsh, B., Holden, R.J.: New Technology Implementation in Health Care. In: Carayon, P. (ed.) Handbook of Human Factors and Ergonomics in Health Care and Patient Safety, pp. 393–410 (2007)
17. Kukafka, R., Johnson, S.B., Linfante, A., Allegrante, J.P.: Grounding a New Information Technology Implementation Framework in Behavioral Science: a Systematic Analysis of the Literature on IT Use. Journal of Biomedical Informatics 36, 218–227 (2003)
18. Davis, F.D.: User Acceptance of Information Technology: System Characteristics, User Perceptions and Behavioral Impacts. International Journal of Man-Machine Studies 38, 475–487 (1993)
19. Bhattacherjee, A., Hikmet, N.: Physicians’ Resistance toward Healthcare Information Technologies: A Dual-Factor Model. In: Proceedings of the 40th Hawaii International Conference on System Sciences (2007)
20. Marsolek, I., Friesdorf, W.: Work Systems and Process Analysis in Health Care. In: Carayon, P. (ed.) Handbook of Human Factors and Ergonomics in Health Care and Patient Safety, pp. 649–662 (2007)
21. Lorenzi, N.M., Riley, R.T., Blyth, A.J.C., Southon, G., Dixon, B.J.: Antecedents of the People and Organizational Aspects of Medical Informatics: Review of the Literature. Journal of the American Medical Informatics Association 4(2), 79–93 (1997)
22. Wakefield, D.S., Halbesleben, J.R.B., Ward, M.M.: Development of a Measure of Clinical Information Systems Expectations and Experiences. Medical Care 45(9), 884–890 (2007) 23. Leonard, K.J., Sittig, D.F.: Improving Information Technology Adoption and Implementation Through the Identification of Appropriate Benefits: Creating IMPROVE-IT. Journal of Medical Internet Research 9(2) (2007) 24. Shortell, S.M., Zimmerman, J.E., Rousseau, D.M., Gillies, R.R., Wagner, D.P., Draper, E.A., Knaus, W.A., Duffy, J.: The Performance of Intensive Care Units: Does Good Management Make a Difference? Medical Care 32(5), 508–525 (1994)
On Improving Provider Decision Making with Enhanced Computerized Clinical Reminders

Sze-jung Wu1, Mark Lehto1, Yuehwern Yih1, Jason J. Saleem2, and Bradley Doebbeling2

1 School of Industrial Engineering, Purdue University, West Lafayette, IN 47906, USA
[email protected], [email protected], [email protected]
2 VA HSR&D Center on Implementing Evidence-based Practice, Roudebush Veterans Affairs Medical Center, Indianapolis, IN 46202, USA
[email protected], [email protected]
Abstract. A computerized clinical reminder (CCR) system is a type of decision support tool that reminds healthcare providers of recommended actions. In our prior study, we found a linear correlation between resolution time and adherence rate. This correlation implies potentially biased clinical decision making. This study aimed to redesign the Veterans Affairs (VA) CCR system in order to improve providers’ situation awareness and decision quality. The CCR redesign incorporated a knowledge-based risk factor repository and a prioritization mechanism. Both CCR designs were prototyped and tested by 16 physicians in a controlled lab in the Indianapolis VA Medical Center. The results showed that 80% of the subjects changed their prioritization decisions after being introduced to the modified design. Moreover, with the modified design, the correlation between resolution time and adherence rate was no longer found. The redesign improved the subjects’ situation awareness and assisted them in making more informed decisions. Keywords: Computerized clinical reminders, decision support system, situation awareness.
1 Introduction

A computerized clinical reminder (CCR) system is, in general, a type of decision support system designed to augment clinicians’ limited mental capacity. It is triggered by a rule-based knowledge database of clinical practice guidelines and by electronic health records, including medical conditions and diagnoses. CCRs are displayed as a list on the primary screen of the Computerized Patient Record System (CPRS), the Veterans Affairs (VA) electronic health record system, as shown in Figure 1. A point-and-click interface allows providers to act on the reminders and create progress note text concurrently. CCRs are potentially beneficial in improving adherence to guideline-based practice and the quality of care. However, providers’ adherence to CCRs has been found to vary between clinics, individual providers, and individual reminders within the Veterans Health Administration (VHA) [1]. Moreover, our prior study identified a linear
correlation between physicians’ projected resolution time and CCR adherence rate, which suggests that a CCR perceived as easier to resolve tends to be resolved more often [2]. By relying on resolution time, a provider could overlook a critical but time-consuming clinical reminder and thereby sacrifice quality of care and patient safety.
Fig. 1. VA’s computerized patient record system (CPRS)
One possible explanation for the correlation between CCR resolution time and adherence rate is time pressure. Providers generally face extremely time-critical conditions in clinical practice. Time restrictions often force primary care providers to choose among multiple CCRs during a particular visit. Time pressure also undermines a provider’s situation awareness, defined as the ability to perceive and comprehend the status of the surroundings and to project it into the near future [3-6]. The situation awareness model has been linked to clinical settings to explain different levels of situation awareness and patient safety [7, 8]. According to [7], providers’ diagnostic errors arise from “failure in perception, failed heuristics, and biases”, etc. Improving providers’ situation awareness can not only improve decision quality, but also reduce the number of preventable medical errors. The CCR system is an automation of the guideline execution process designed to alleviate providers’ workload and reduce reliance on memory. However, a typical CCR system serves as a “to-do” list, without providing conspicuous reasons why a CCR is triggered. Such a “black-box” design may reduce the provider’s situation awareness. Providers have to spend extra effort retrieving data to recognize the importance of an individual CCR. A reasoning shortcut is formed when a provider bases his or her judgment on incomplete CCR information. Therefore, it is critical to design CCR mechanisms that maintain providers’ situation awareness.
The objective of this study was to improve the design of the VA’s CCR system in order to assist providers in making CCR prioritization decisions. We prototyped the original and redesigned systems and tested the performance of both in a study involving primary care physicians in a controlled lab in the Indianapolis VA Medical Center (VAMC). We hypothesized that the proposed features of the new design would result in better situation awareness and decision making.
2 Methods

2.1 CPRS Prototype

A web-based prototype was developed as a mock-up of the current VA CPRS system. The application was programmed in Hyper Text Markup Language (HTML), JavaScript, Active Server Pages (ASP), and SQL, over a Microsoft Access 2003 database. The web application is database-driven, enabling users to log in, select patients, and review the same clinical information as in VA’s CPRS, including the cover sheet, problems, medications, orders, medical notes, etc. This web-based CPRS serves as a low-fidelity prototype with which providers can simulate using CPRS in the clinics. Two different designs of the web-based CPRS were developed. Design A, representing the original design, has an interface built upon the current VA CPRS system. In Design B, numerous design modifications were implemented (Figure 2). The following sections elaborate each of the new features of the modified design (Design B).
Fig. 2. Design B: the CPRS prototype with CCR redesign
Risk Factor Repository. First, we developed a risk factor repository to assist clinicians in locating useful information. A clinical risk factor is a characteristic associated with an increased chance of a patient developing a certain disease. Clinical risk factor assessment enables early detection of a disease and is thus key to improving quality of care. The proposed risk factor repository extracts relevant clinical information from the patient database to populate a systematic review of the patient’s risk factors, past encounter summary, and pending exams. It enables clinicians to retrieve the desired information without manually browsing through the patient records. We searched the United States Preventive Services Task Force (USPSTF) recommendations and the National Guidelines Clearinghouse to build the knowledge base of the risk factor repository, and also consulted experts in the field to identify the information useful for CCR decision making. The interface for the risk factor repository is implemented as an expandable tree control in JavaScript, providing easy data navigation. It is programmed as a pop-up window linked to the targeted CCR on the cover sheet of the CPRS (Figure 3).
Fig. 3. The Risk factor repository for colorectal cancer screening
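As a rough sketch of how a repository entry like the one in Figure 3 might be populated, consider the following; the patient-record fields, the rule list, and the function name are illustrative assumptions standing in for the USPSTF-derived knowledge base, not the actual VA data model or the study’s code.

# Illustrative sketch of populating a risk factor repository entry for the
# colorectal cancer screening reminder. The patient-record fields and rule
# list are hypothetical stand-ins for the knowledge base described above.

patient = {  # hypothetical patient record
    "age": 55,
    "family_history": ["colorectal cancer"],
    "personal_history": [],
    "past_exams": {"fecal occult blood test": "2001-03-14"},
    "pending_exams": ["colonoscopy"],
}

CRC_RISK_RULES = [  # (label, predicate) pairs drawn from the knowledge base
    ("age 50 or older", lambda p: p["age"] >= 50),
    ("family history of colorectal cancer",
     lambda p: "colorectal cancer" in p["family_history"]),
    ("personal history of polyps",
     lambda p: "polyps" in p["personal_history"]),
]

def build_repository_entry(p):
    """Assemble the items shown in the expandable tree of Figure 3."""
    return {
        "risk factors": [label for label, test in CRC_RISK_RULES if test(p)],
        "past encounter summary": p["past_exams"],
        "pending exams": p["pending_exams"],
    }

print(build_repository_entry(patient))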
Prioritization Mechanism. The second feature proposed in this study, a prioritization mechanism, enables users to prioritize the clinical reminders according to several reminder attributes: (1) reminder name, (2) due date, (3) resolution time, and (4) risk factors. The clinical reminders in the Indianapolis VAMC are currently arranged first by role (N- for nurses, and P- for physicians) and then in alphabetical order. Prioritizing by CCR name in the modified design therefore provides no new information, but instead serves as a comparison baseline for the other attributes, which are detailed below.
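A minimal sketch of such a prioritization mechanism is given below; the reminder records and attribute values are hypothetical, and the due-date, resolution-time, and risk-factor attributes are the ones detailed in the paragraphs that follow.

# Sketch of the prioritization mechanism: the reminder list is re-sorted on
# whichever attribute the user selects. Records and values are hypothetical.
from datetime import date

reminders = [
    {"name": "Colorectal cancer screening", "due": date(2006, 1, 10),
     "resolution_min": 15, "risk": "high"},
    {"name": "Hypertension screening", "due": date(2006, 8, 2),
     "resolution_min": 3, "risk": "average"},
    {"name": "LIPID profile", "due": date(2006, 6, 21),
     "resolution_min": 5, "risk": "average"},
]

RISK_RANK = {"high": 0, "average": 1}        # high-risk reminders sort first

SORT_KEYS = {
    "name": lambda r: r["name"],             # alphabetical baseline
    "due date": lambda r: r["due"],          # most past due first
    "resolution time": lambda r: r["resolution_min"],
    "risk factors": lambda r: RISK_RANK[r["risk"]],
}

def prioritize(reminders, attribute):
    return sorted(reminders, key=SORT_KEYS[attribute])

for r in prioritize(reminders, "risk factors"):
    print(r["name"])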
The second reminder attribute, CCR due date, enables providers to prioritize CCRs by how long they are past due. In the VA’s CCR system, the due dates of all past-due reminders are displayed simply as “due now”. Providers have no reference to the actual due date except by browsing through past exam results or medical notes. This function enables clinicians to recognize and act on the most overdue reminders.

Resolution time is the amount of time it takes to address and resolve a CCR. In this study, we estimated resolution times from surveys of panel experts and from the number of tasks involved in resolving an individual CCR. The resolution time is displayed graphically to indicate the relative length of time needed to accomplish an individual CCR. Since our prior study found a linear correlation between estimated resolution time and adherence rate, we were interested in investigating how providers would use this information in their prioritization decisions under the new design.

Finally, we proposed prioritizing clinical reminders by clinical risk factors. In this study, each CCR was assigned a risk score of “average” or “high”, triggered by the same knowledge base that powered the risk factor repository. Only the grade “A”-level evidence recommended by the USPSTF was used to populate the risk score. For example, an adult aged 50 or older has an average risk for colorectal cancer and is recommended either an annual fecal occult blood test, flexible sigmoidoscopy every five years, or colonoscopy every ten years. A patient satisfying more risk factors, including a personal or family history of colorectal cancer or a personal history of polyps, has an increased risk for the cancer and thus should have a more stringent screening schedule and consultation process.

2.2 Participants

Sixteen (16) VA physicians were recruited opportunistically to participate in a comparison study of the original and modified prototypes. The sample constituted approximately 50 percent of the staff physicians in the Indianapolis VAMC outpatient clinics. This large sampling proportion of the target population greatly reduced the likelihood of participation bias. Among the participants, two were novice CPRS users, and the rest were experienced users with an average of 5.8 years of experience. One participant was younger than 30, ten were aged 30–39, three were 40–49, and two were 50–59. Nine of the participants were male and seven were female.

2.3 Physical Setup

The experiment was conducted in the human-computer interaction (HCI) laboratory located in the Indianapolis VAMC Center of Excellence on Implementing Evidence-Based Practice. The HCI lab was a controlled, closed setting simulating a physician using a workstation in an exam room. Two monitors were installed for the experiment: participants browsed the CPRS prototype on the primary screen and walked through the patient encounter simulation and survey on the secondary screen. This reduced the confusion caused by switching screens back and forth during the experiment. A web camera and Morae Recorder, a usability-testing package, were installed on the participant’s workstation. Morae recorded video, audio, the screen, and computer events, including keyboard entries and mouse clicks. The researcher (S. Wu) observed
the participant’s facial expression and computer screen remotely through Morae Observer at an observation station near the participant.

2.4 Procedure

A scripted instruction was administered throughout the experiment to minimize bias and potential variation between experiments. Each participant was first introduced to the web-CPRS prototype with the original CCR design (Design A). The subject browsed through each element of the prototype following the scripted instruction, and was then encouraged to explore the system freely to become familiar with the prototype and the simulated patient. Physicians on average spent around 5 minutes in the exploration session and then proceeded to a self-guided patient encounter simulation on the secondary screen. Our base-case scenario was a 55-year-old male smoker with a 4-year history of type II diabetes. The patient had other active problems, including hypertension, tobacco abuse, and neuropathy in diabetes. Each subject browsed through an interactive simulator programmed in JavaScript, as shown in Figure 5, simulating the procedure of a typical patient encounter in the exam room. During the simulation, the subject walked through the patient’s physical exam and obtained interactive feedback on patient symptoms and checkup results. Through the interactive simulator, subjects learned of the patient’s health scenario and the following assessment plan in the encounter notes: (1) the patient’s diabetes was under very good control; (2) neuropathy in diabetes was well controlled; (3) hypertension was already controlled with medications; (4) the patient now smoked very little, one pack per week or so. Towards the end of the simulated patient encounter, the simulation system informed the subject that five clinical reminders remained unresolved, but that the next patient was waiting to be seen in five minutes. The subject was asked to prioritize the five remaining clinical reminders, starting with the one(s) that would be resolved first. This prioritization decision was made under the assumption that not all five clinical reminders could be resolved, because he or she was pressed for time. In the second half of the interview, the modified CCR design (Design B) and its new features were introduced to the subject. The subject walked through each feature in the new design, including the risk factor repository and prioritization by due date, resolution time, and risk factors. The subject was then asked, in the same way as in the first part of the interview, to prioritize the clinical reminders. Finally, the interview concluded with the subject’s open-ended comments and suggestions.
3 Results

We measured which CCRs the participants chose to resolve under time pressure and the priority they assigned to each CCR. The results are summarized in Table 1, with the original design on the left and the modified design on the right. The prioritization data of the first participant were discarded because the medical records of the simulated patient were modified following the first experiment. The results showed that 12 (80%) of the 15 subjects changed their prioritization decisions; in total, 33 (44%) of 75 prioritization decisions were changed.
The means and standard deviations of the priorities in Table 1 give an overall comparison between the clinical reminders. With the original design, participants in general ranked hypertension as the highest priority (mean = 1.69, st. dev. = 0.85), followed by hemoglobin A1c (mean = 1.93, st. dev. = 1.10) and the LIPID profile (mean = 2.67, st. dev. = 1.35). Colorectal cancer screening and the diabetic foot exam were the reminders least likely to be resolved under time limitation. In contrast, with the new design, participants recognized the importance of colorectal cancer screening and regarded it as the most important reminder to resolve (mean = 1.87, st. dev. = 1.51). The average priority of the diabetic foot reminder also improved, from 3.47 (st. dev. = 1.46) to 2.73 (st. dev. = 1.53). The LIPID profile was the reminder least likely to be resolved with the modified design (mean = 3.27, st. dev. = 1.33).

Table 1. The priority order elicited from each participant with the original design (left) and the modified design (right). Note: CRC = colorectal cancer screening; D. Foot = diabetic foot exam; HTN = hypertension screening; HgbA1c = hemoglobin A1c test; LIPID = lipid profile.

               With Original Design                 With Modified Design
Priority    CRC   D.Foot  HTN   HgbA1c  LIPID    CRC   D.Foot  HTN   HgbA1c  LIPID
Mean        3.53  3.47    1.69  1.92    2.67     1.87  2.73    2.62  2.53    3.27
St. Dev.    1.51  1.46    0.85  1.10    1.35     1.51  1.53    0.96  1.19    1.33
In this study, the resolution time for each CCR was provided in the modified design as one of the new features, in order to test how subjects incorporated this information into their decision making. Figures 4(a) and 4(b) show the relationship between resolution time and CCR priority for the original and modified designs, respectively (note: a lower numerical value on the Y axis stands for a higher priority). A linear correlation was found for the original design (R2 = 0.7202). Figure 4(a) shows that, in the original CCR system, the reminders resolved at higher priority tended to have shorter resolution times, even though the resolution time information was not provided to the subjects. This result is consistent with our prior study [2], in which a linear correlation between resolution time and adherence rate was identified, and further affirms that a CCR perceived as easier to resolve is more likely to be resolved.
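The fit behind Figure 4(a) is an ordinary least-squares regression; the sketch below reproduces the computation on hypothetical (resolution time, mean priority) pairs, since the underlying estimates are not tabulated in this paper.

# Sketch of the linear fit behind Figure 4(a). The (estimated resolution
# time, average priority) pairs are hypothetical placeholders; the paper
# reports only the resulting coefficient of determination, R^2 = 0.7202.
from scipy import stats

resolution_time = [0.5, 1.0, 1.5, 3.0, 4.0]   # hypothetical, arbitrary units
mean_priority   = [1.7, 1.9, 2.7, 3.5, 3.5]   # hypothetical (1 = resolved first)

fit = stats.linregress(resolution_time, mean_priority)
print(f"slope = {fit.slope:.2f}, R^2 = {fit.rvalue ** 2:.3f}")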
Fig. 4(a) and 4(b). Average CCR priority vs. estimated resolution time in the original design (Fig. 4(a), left) and the modified design (Fig. 4(b), right). (Note: a lower numerical value on the Y axis stands for a higher priority.)
Intriguingly, no such linear correlation between CCR priority and resolution time was found for the modified design, where CCR resolution time, along with the other design features, was provided to the subjects (see Figure 4(b)). The new risk factor repository and prioritization mechanism assisted the users in collecting and comprehending the relevant clinical data, improving the subjects’ situation awareness. With the modified design, resolution time was no longer the major decision criterion when providers were under time pressure. The modified CCR design thus had a substantial impact on clinical decision making: it changed the way providers incorporated CCR information into their clinical decisions.
4 Discussion

The risk factor repository extracted patient-specific information from existing electronic medical records according to a pre-defined knowledge base. The information scattered in the original CCR system provided little value to providers until it was mapped to support decision making in the new design. The modified CCR system provided greater information quality through the prioritization mechanism and the risk factor repository. As a result, physicians were able to make informed prioritization decisions without missing important evidence, as happened with the original design. The providers’ changes of decision in turn validate the value of the CCR information in the modified design. From Table 1, one can conclude that the new design substantially changed physicians’ prioritization and resolution decisions. This impact is especially evident for colorectal cancer screening. This study laid out a methodology for improving providers’ situation awareness by aligning the information flow with clinicians’ mental models in decision making. The results indicate that the modified CCR features not only expedited decision making, but also significantly changed the way clinicians prioritized CCRs.

Acknowledgements. This research was supported in part by the VA HSR&D Center of Excellence on Implementing Evidence-Based Practice (CIEBP), US Department of Veterans Affairs, HSR&D Center grant #HFP 04-148. The views expressed in this article are those of the authors and do not necessarily represent the view of the Department of Veterans Affairs.
References 1. Agrawal, A., Mayo-Smith, M.F.: Adherence to Computerized Clinical Reminders in a Large Healthcare Delivery Network. Medinfo. 11(Pt 1), 111–114 (2004) 2. Wu, S., Lehto, M.R., Yih, Y., Saleem, J., Doebbeling, B.N.: Relationship of Resolution Time and Computerized Clinical Reminder Adherence. In: Proceedings of AMIA Annual Symposium, pp. 334–338 (2007) 3. Endsley, M.R.: Situation Awareness Global Assessment Technique (SAGAT). In: Proceedings of the National Aerospace and Electronics Conference (NAECON), pp. 789–795. IEEE, New York (1988) 4. Endsley, M.R.: Toward a Theory of Situation Awareness in Dynamic Systems. Human Factors 37(1), 32–64 (1995)
5. Endsley, M.R., Kaber, D.B.: Level of Automation Effects on Performance, Situation Awareness and Workload in a Dynamic Control Task. Ergonomics 42(3), 462–492 (1999) 6. Endsley, M.R.: Theoretical Underpinnings of Situation Awareness: A Critical Review. In: Endsley, M.R., Garland, D.J. (eds.) Situation Awareness Analysis and Measurement. LEA, Mahwah (2000) 7. Singh, H., Petersen, L.A., Thomas, E.J.: Understanding diagnostic errors in medicine: a lesson from aviation. Qual. Saf. Health Care. 15, 159–164 (2006) 8. Shaw, J., Calder, K.: Aviation is Not the Only Industry: Healthcare Could Look Wider for Lessons on Patient Safety. Qual. Saf. Health Care. 17(5), 314 (2008)
Facial Shape Variation of U.S. Respirator Users

Ziqing Zhuang1, Dennis Slice2, Stacey Benson3, Douglas Landsittel1,4, and Dennis Viscusi1

1 National Institute for Occupational Safety and Health, National Personal Protective Technology Laboratory, Pittsburgh, PA 15236 USA
2 Florida State University, Dept. of Scientific Computing, Dirac Science Library, Tallahassee, FL 32306-4120 USA
3 EG&G Technical Services Inc., Pittsburgh, PA 15236 USA
4 Duquesne University, Department of Mathematics and Computer Science, Pittsburgh, PA 15282 USA
Abstract. The National Institute for Occupational Safety and Health (NIOSH) conducted a head-and-face anthropometric survey of diverse, civilian respirator users. Of the 3,997 subjects measured using traditional anthropometric techniques, 953 also had surface scans and 26 three-dimensional (3-D) landmark locations collected. The objective of this study was to analyze the size and shape variation of the survey participants using 3-D Generalized Procrustes Analysis (GPA) and Principal Component Analysis (PCA), in order to quantify those facial features that may be relevant to respirator fit. The first four principal components (PC) account for 49% of the total sample variation. The first PC indicates that overall size is an important component of facial variability. The second PC distinguishes long, narrow faces from short, wide ones. Longer, narrower orbits versus shorter, wider orbits are described by PC3, and PC4 represents variation in the degree of ortho/prognathism, with positively scoring individuals having longer, wider, and more projecting lower jaws than negatively scoring individuals. Further study will investigate the correlation between respirator fit and these PCs. Keywords: anthropometry, geometric morphometrics, respirators.
1 Introduction

Given an array of respirator styles and sizes, it is important to determine their fit and efficacy with respect to their intended user population and to quantify those facial features relevant to these parameters. Standard practice for assessing respirator fit has for many years been based on fit-test panels derived from studies of the facial morphology of U.S. Air Force personnel in the 1970s [1, 2]. It is widely recognized that data based on a population of young, fit military personnel from the 1970s are unlikely to reflect the age, sex, ethnic, and fitness diversity of the contemporary workforce that the test procedures are required to target [3]. To address this deficiency, the National Institute for Occupational Safety and Health (NIOSH) conducted a facial
morphological survey of contemporary workers who require the use of a respirator in the course of their work [4, 5]. Besides being based on a group unlikely to be completely representative of the contemporary respirator-user population, previous studies focused on associations among linear facial dimensions in developing test panels to capture facial variation. In the field of anthropometrics, from which the facial measurements were borrowed, there has been considerable recent innovation in the quantification and statistical analysis of shape based on the Cartesian coordinates of the landmarks that usually serve as the basis for traditional measurement definitions [6, 7]. These new methods, collectively referred to as Geometric Morphometrics (GM), have proven more powerful and efficient than traditional approaches in many cases, and it is worthwhile to determine the extent to which they can advance the goal of respirator fit assessment. Such studies, in turn, could feed back into respirator design to achieve more efficient and comfortable product styles and sizing. In anticipation of this, the NIOSH study included the collection of both facial surface scans and three-dimensional landmark locations for a large subset (~25%) of the surveyed individuals [4]. The dependence of respirator fit assessment standards on a base population morphologically distinct from the target population, and the reliance of those standards on a limited and somewhat arbitrary suite of traditional, (curvi-)linear anthropometric measurements, were among the problems identified by an independent review committee that examined the current state of respirator-fit assessment [8]. The purpose of this study was to address some of these concerns by further investigating the nature of facial shape variation in the latest data assembled.
2 Materials and Methods
2.1 Data
Data for this study were collected by the NIOSH National Personal Protective Technology Laboratory (NPPTL) in an anthropometric survey [4]. The main body of data consisted of 953 data files in the format of a Unix-based 3D package called INTEGRATE [9]. Each file contained three-dimensional coordinate locations of anatomical landmarks (Figure 1) for one individual. In addition, demographic information including sex, age group, and racial group, as well as traditional anthropometric measures, was collected. All data were visually inspected using morphometrics software to identify mislabeled or obviously erroneous coordinate values. These were marked as missing data. The proper handling of missing data is a complicated endeavor [10]. One possible course of action would be to eliminate all individuals with any missing landmarks. That would call for the removal of over 25% of the sample, which seems extreme. Several other cut points would be defensible, e.g., removing individuals with more than three missing landmarks, more than five, etc. It was decided, instead, to retain all 953 individuals. Most individuals (72%) had no missing landmark coordinates, and, of the 28% with missing data, less than 1% of the sample was missing six or more landmarks. If the occurrence of missing data is not random with respect to the morphology of the individuals, then removing individuals will reduce the variability that this study is seeking to quantify. Missing data were estimated by simply substituting mean coordinate values.
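The authors do not publish code for this substitution; a minimal sketch of mean-value imputation, assuming the landmark data are held in a NumPy array with NaNs marking the flagged values, might look like this:

```python
import numpy as np

def impute_missing_landmarks(landmarks: np.ndarray) -> np.ndarray:
    """landmarks: (n_subjects, n_landmarks, 3), with np.nan marking the
    mislabeled/erroneous values flagged during visual inspection.
    Array names and shapes are assumptions for illustration."""
    filled = landmarks.copy()
    # Per-landmark, per-axis mean over the subjects that do have the point
    means = np.nanmean(landmarks, axis=0)          # (n_landmarks, 3)
    missing = np.isnan(filled)
    # Broadcast the mean configuration into the missing slots
    filled[missing] = np.broadcast_to(means, filled.shape)[missing]
    return filled
```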
Fig. 1. Location and identification of the 26 landmarks used for the PCA: tragion (1, 19), zygion (2, 17), gonion (3, 18), frontotemporale (4, 15), zygofrontale (5, 16), infraorbitale (6, 14), glabella (7), sellion (8), pronasale (9), subnasale (10), menton (11), chelion (12, 13), pupil (20, 21), nasal root point (22, 23), alare (24, 25), chin (26)
2.2 Generalized Procrustes Analysis
Landmark coordinates are not directly comparable as quantitative measures of shape because they are (usually) recorded with respect to an arbitrary set of orthogonal reference axes. In the simplest case, irrelevant variation is introduced into the coordinate values by the position and orientation of the specimen relative to the digitizing apparatus or scanning device. To address these issues, geometric morphometric methods include a data-processing step that standardizes the configurations of landmarks associated with individuals into a common coordinate system and, further, usually standardizes these configurations to a common size. The scale factor used in the latter standardization can be saved as a size measure for further investigation of the relationship between shape and size in the sample. The required standardization is usually done through Generalized Procrustes Analysis (GPA) [6, 11, 12]. In GPA, landmark configurations are mean-centered so that the average coordinate location over all landmarks is the origin. They are then scaled so that the square root of the sum of squared distances of each landmark in a configuration to their joint average location (the origin after mean-centering) is 1.0. This measure is called centroid size and has the desirable property that it is the only size measure that is independent of shape variation in the presence of small, isometric random variation in landmark location around a mean configuration [13]. Next,
an arbitrary configuration of landmarks from the mean-centered and size-standardized data set (usually the first specimen) is used as a reference configuration. All specimens in the data are rotated so that the sum of squared distances between individual configuration landmarks and corresponding landmarks on the reference is minimized. Once so rotated, a mean configuration is estimated as the arithmetic average of landmark coordinates in the superimposed data set. The average configuration is then scaled to unit centroid size and the sample refit to the new estimated mean. This process is guaranteed to converge monotonically on a mean estimate for the sample [11] and is not substantively affected by the initial choice of reference. After little or no change is seen in the rotation and mean-estimation steps, the process is deemed complete, and the superimposed coordinates for each individual can be used as commensurate variables that describe individual shape and can be subjected to multivariate analyses, such as the principal components analysis used here. This approach, in its standard form, is not ideal for the purpose of this study, which is directed at assessing variability that influences the fit and function of respirators. Here, size variation is no less important to the ultimate goal than shape variation, and even sequestering it in a separate variable for joint or separate analysis is, at least initially, unnecessary. For this reason, scale was restored to the results of a standard GPA by multiplying the resulting shape variables by the inverse of the scale factor applied to them in the course of the superimposition of individual configurations onto the grand mean. These are the “form” (shape+size) data used in subsequent statistical analyses of population variation.
2.3 Population Variation
Population variation for the data set, after GPA, was analyzed by principal component analysis (PCA) to identify patterns of covariation in the data. Major directions of variation were compared and visualized using GM methods and software.
2.4 Software
All standard statistical analyses, such as PCA, were carried out in the open-source R environment [14]. The matrix capabilities of R were also used for some custom data manipulation and testing. Where possible, new Java-based, cross-platform programs (m_vis and the new morpheus et al.) currently under development by D. E. Slice were used for visualization, data manipulation, and analysis. A number of new routines were added to these programs to facilitate the current study. When morphometrics-specific visualization or analytical routines were not available in the most recent versions of this software, an older Microsoft Windows version of Morpheus et al., written in C++, was used [15]. Links to these and other morphometrics programs can be found through http://www.morphometrics.org.
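As a sketch of the superimposition and size-restoration procedure described in Section 2.2 (an illustration under assumed array shapes, not the authors' R/Morpheus implementation):

```python
import numpy as np

def gpa_with_size_restored(configs: np.ndarray, n_iter: int = 10):
    """Generalized Procrustes Analysis followed by restoration of scale,
    yielding 'form' (shape + size) variables.

    configs: (n_specimens, n_landmarks, 3) raw landmark coordinates."""
    # Mean-center each configuration so its centroid is at the origin
    X = configs - configs.mean(axis=1, keepdims=True)
    # Centroid size: root sum of squared landmark distances to the centroid
    csize = np.sqrt((X ** 2).sum(axis=(1, 2)))
    X = X / csize[:, None, None]                 # scale to unit centroid size
    ref = X[0].copy()                            # arbitrary initial reference
    for _ in range(n_iter):                      # iterate until stable
        for i in range(len(X)):
            # Orthogonal Procrustes rotation of specimen i onto the reference.
            # (Reflections are not excluded here for brevity; a production
            # implementation would flip a sign when det(u @ vt) < 0.)
            u, _, vt = np.linalg.svd(X[i].T @ ref)
            X[i] = X[i] @ (u @ vt)
        mean = X.mean(axis=0)
        ref = mean / np.sqrt((mean ** 2).sum())  # rescale mean to unit size
    # Undo the size standardization to obtain the "form" data
    return X * csize[:, None, None], csize
```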
3 Results
Figure 2 shows the data set extracted from the GPA with size restoration described above. Each cluster of symbols represents the scatter of individual landmark locations for the 953 individuals in this data set.
Fig. 2. Plot of the 953 individuals in the data set after GPA and size restoration. The coordinates of the 26 landmarks represented for each specimen are form variables that describe the shape and size of each specimen.
The coordinates of the 26 landmarks per individual represented in Figure 2 are a slightly redundant set of 78 (26 points × 3 coordinates per point) form variables that characterize the size and shape of individual faces within a coordinate system common to all. PCA of the 953 superimposed configurations in the space of the 78 form variables showed a substantial proportion of the total sample variability in the first three PCs (26%, 10%, and 8%, respectively). The variance on PCs beyond the third (all 5% or less of the total) trails off gradually, suggesting no strong patterns of intercorrelation among the variables. The first 27 PCs are required, as a group, to account for 90% of total sample variation. The projections of the form data for the 953 individual configurations onto PCs 1 through 4 are shown in Figure 3. Each point represents a linear combination of the 78 coordinates for a single subject. As commonly seen with PCA results, most of the scatter in the data is along the first PC and somewhat less along the second. Variation on higher PCs is reduced, but nonetheless substantial. Since PCA is based on mean-centered data, and the PCs themselves are linear combinations of the original coordinate variables, one can construct the configuration at a specific point in PC space by simply multiplying the coefficients for the linear combination of coordinates represented by each PC of interest by the coordinate of the point of interest on that PC. Landmark configurations representing patterns of variation along PC1, magnified by a factor of 100, are shown in Figure 4. The coefficients are scaled so the sum of their squares equals 1.0. The pattern of variation specified by PC1 and shown graphically in Figure 4 is a general movement of landmarks away from their joint center of gravity in the positive direction along the PC. Shape change in the negative direction is, of course, the complement of this, with landmarks all moving more-or-less toward the configuration's center at approximately the same rate (distance per unit change along the axis). It is important to note that the polarity of these axes is arbitrary, and positive and negative directions can be exchanged without affecting the variance of the projections, which is the only criterion by which they are constructed.
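A minimal sketch of this PCA and of the construction of a configuration along a PC (reimplemented here in Python for illustration; the study itself used R):

```python
import numpy as np

def form_pca(forms: np.ndarray, pc: int = 0, units: float = 100.0,
             n_landmarks: int = 26):
    """PCA of form variables and the configuration `units` out along a PC.

    forms: (n_subjects, n_landmarks * 3), e.g. the GPA output reshaped
    with forms.reshape(953, -1)."""
    mean = forms.mean(axis=0)
    centered = forms - mean
    # Rows of vt are unit-norm PC coefficient vectors (sum of squares = 1.0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt.T                   # subject projections (Fig. 3)
    var_explained = s ** 2 / (s ** 2).sum()    # 0.26, 0.10, 0.08, ... here
    # Configuration at `units` along the chosen PC, as in the Fig. 4 plots
    config = (mean + units * vt[pc]).reshape(n_landmarks, 3)
    return scores, var_explained, config
```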
Fig. 3. Projections of 953 landmark configurations onto PCs 1 and 2 (left) and PCs 3 and 4 (right)
Fig. 4. Visualization of PC1. Left figure shows frontal view of transformation determined by PC1. The right figure is the same, but for the right lateral view. Green circles represent the average location of landmarks in the entire, superimposed data set. The black lines are links to aid visualization. The red line segments represent the coefficients for each coordinate of each landmark specified by the first PC magnified by a factor of 100 to emphasize the pattern of variation. That is, the red lines indicate the path (direction and relative magnitude) of the landmarks as they change location as one moves along the specified PC in the positive direction. The ends of the line segments in the images indicate the positions of the landmarks at a point 100 units out in the positive direction on PC1.
Such a pattern clearly represents an overall increase or diminution of the configuration, such as results from isometric size change. Indeed, the correlation of the scores of individuals on this axis with their centroid size is 0.99 (Pearson's product-moment correlation; Kendall's tau = 0.92). Such a result indicates that overall size is an important component of facial variability in the studied population and is likely an important component of respirator fit assessment, but would not be captured by a standard GM analysis that focuses on pure shape change. The relatively low proportion of variation (0.26) suggests, however, that size is not the only important consideration. Figure 5 shows a visualization of facial change in the positive direction of PC2. As before, what represents positive versus negative change along this axis is arbitrary, and the negative change in this representation would simply be the reflection of the displacements shown in red along their own axes.
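The reported associations can be checked directly from the PC scores and the centroid sizes saved during superimposition; a sketch with assumed inputs:

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr

def size_association(pc_scores: np.ndarray, centroid_sizes: np.ndarray):
    """Association of PC scores with centroid size (for PC1 here: r of
    about 0.99, Kendall's tau of about 0.92). Inputs are assumed to come
    from the GPA/PCA sketches above."""
    r, _ = pearsonr(pc_scores, centroid_sizes)
    tau, _ = kendalltau(pc_scores, centroid_sizes)
    return r, tau
```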
Fig. 5. Visualization of PC2. Left is frontal view. Right is lateral view. These are positive-direction displacements. Negative direction is the reflection of all of the red vectors along their own axes.
There is a general tendency for landmarks to be displaced medially. The landmarks associated with the upper part of the face, especially those of the eyes and the bridge of the nose, tend to be displaced upwards. Those associated with the lower face – the corners of the mouth, the angle of the jaw, and the chin – tend to be displaced downwards. This has a relatively simple interpretation: individuals with more positive scores on this axis have relatively narrower and longer faces, while individuals with more negative scores have shorter, wider faces. Given the high correlation of the first PC with size, it is not surprising that there is a low association between size and this axis (Pearson's product-moment correlation = 0.09, Kendall's tau = 0.05). What this represents is an independence between overall facial shape (long/narrow vs. short/wide) and facial size. In traditional biological terms, this indicates a lack of “allometry.” Furthermore, this result means that simple concepts of small, medium, and large with respect to respirators may not capture much of this component of variation. Figure 6 shows the pattern of variation specified by PC3, accounting for about 8% of the total variation. The pattern here is more complicated and difficult to summarize than those of the lower PCs. Important features in the positive direction appear to be a relative lateral displacement of the centers of the pupils and a larger lateral displacement of the landmarks associated with the frontal bone, sides of the head, and angles of the jaw (gonion). In contrast, the landmarks defining the tip and sides of the nose and corners of the mouth are displaced upwards. In seeming contrast, right and left infraorbitale appear medially displaced. In lateral view, gonion, frontotemporale, and zygofrontale are displaced posteriorly while zygion shifts anteriorly. This pattern defies simple description, though the nose and mouth do appear to shift superiorly relative to the rest of the face, while the face itself appears to widen. Projections at more negative values of this axis, of course, are represented by the complement of these changes. The pattern specified by PC4 (Figure 7), though accounting for only 5% of the total variation, is somewhat clearer. The pupils, nasal root points, the corners of the mouth, and the chin landmarks are shifted inferiorly while gonion is shifted superiorly. Tragion, zygion, frontotemporale, and zygofrontale are shifted medially, and the alare are displaced laterally in frontal view. In lateral view, gonion and the landmarks
Fig. 6. Visualization of PC3. Left is frontal view. Right is lateral view. These are positive displacements. Negative direction is the reflection of the red vectors along their own axes.
Fig. 7. Visualization of PC4. Left is frontal view. Right is lateral view. These are positive-direction displacements. Negative direction is the reflection of the red vectors along their own axes.
of the nasal bridge and orbital rim are shifted posteriorly, while the mouth, chin, tragion, and zygion are shifted anteriorly. Configurations projected to more negative scores along this axis manifest the complement of these changes. In general, the impression is that this component might represent variation in the degree of ortho/prognathism, with positively scoring individuals having longer, wider, and more projecting lower jaws than negatively scoring individuals.
4 Discussion
The comprehensive assessment of morphological variation in users is a vital factor in understanding how differences in facial form can affect the fit and efficacy of commercial respirators. Such knowledge can facilitate the optimal design of these products and can inform the development of standards and protocols by which such devices are evaluated and certified. Recent advances in the quantitative analysis of anatomical variation, called geometric morphometric methods, have the potential to provide more powerful and complete descriptions of morphological diversity in a target population than the traditional anthropometric measurements upon which current respirator standards are based. Furthermore, it is important that emerging standards be
reflective of an ever-changing workforce that is not likely represented by the military-based standards currently used. Principal components analysis of variation in the form (size+shape) variables of the data revealed that approximately 26% of total sample variance could be expressed as a single linear combination of the original variables – PC1. Inspection of the results revealed that the first PC reflected largely isometric size variation. That is, variation in the overall size of faces in the population was the single greatest source of variability within the studied group. While expressing the greatest amount of variation, PC1 does not express most of the variation in the sample, and higher PCs may be important in respirator fit research. Visualization of PC2 (expressing about 10% of sample variation) revealed a contrast between longer, narrower, shallower heads/faces and shorter, wider, deeper heads/faces that is statistically independent of overall head size. More complex, but still interpretable and potentially relevant, variation was identified on PC3 (~8% of sample variation) and PC4 (~5%). Nonetheless, the first two PCs together represent only 36% of total sample variability, and the first three only 44%. This suggests that the bivariate approach used in constructing fit panels may be ignoring a substantial and important aspect of total sample variability, as previously reported [5].
5 Conclusions
In all, these analyses show that the geometric morphometric approach provides a detailed and interpretable assessment of morphological variation in the sample that should be very useful in assessing the function of commercial respirators and devising new test and certification standards. A significant amount of this variation is contained in the first few PCs, but a substantial portion remains that could be important to respirator fit. Principal component analysis is not designed to optimize for, or take into account, the results of respirator fit testing. The relationship between fit-test results and the components reported here will be the subject of subsequent analyses.
6 Disclaimer
The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the National Institute for Occupational Safety and Health.
References
1. Hack, A.L., Hyatt, E.C., Held, B.J., Moore, T.D., Richards, C.P., McConville, J.T.: Selection of respirator test panels representative of U.S. adult facial sizes. Los Alamos Scientific Laboratory, NM (1974)
2. Hack, A.L., McConville, J.T.: Respirator protection factors: Part I – development of an anthropometric test panel. Am. Ind. Hyg. Assoc. J. 39(12), 970–975 (1978)
3. Zhuang, Z., Guan, J., Hsiao, H., Bradtmiller, B.: Evaluating the representativeness of the LANL respirator fit test panels for the current U.S. civilian workers. J. Int. Soc. Resp. Prot. 21, 83–93 (2004)
4. Zhuang, Z., Bradtmiller, B.: Head and face anthropometric survey of U.S. respirator users. J. Occup. Environ. Hyg. 2, 567–576 (2005)
5. Zhuang, Z., Bradtmiller, B., Shaffer, R.E.: New respirator fit test panels representing the current U.S. civilian work force. J. Occup. Environ. Hyg. 4, 647–659 (2007)
6. Slice, D.E.: Modern Morphometrics. In: Slice, D.E. (ed.) Modern Morphometrics in Physical Anthropology. Kluwer Academic Publishers, New York (2005)
7. Slice, D.E.: Geometric Morphometrics. Annu. Rev. Anthropol. 36, 261–281 (2007)
8. Bailar III, J.C., Meyer, E.A., Pool, R. (eds.): Assessment of the NIOSH Head-and-Face Anthropometric Survey of U.S. Respirator Users. Institute of Medicine of the National Academies. National Academies Press, Washington (2007)
9. Burnsides, D., Files, P.M., Whitestone, J.J.: INTEGRATE 1.25: A Prototype for Evaluating Three-Dimensional Visualization, Analysis, and Manipulation Functionality. Technical Report AL/CF-TR-1996-0095, Crew Systems Directorate, Human Engineering Division, Wright-Patterson AFB, Ohio (1996)
10. Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data. John Wiley & Sons, New York (1987)
11. Gower, J.C.: Generalized Procrustes analysis. Psychometrika 40, 33–51 (1975)
12. Rohlf, F.J., Slice, D.E.: Extensions of the Procrustes method for the optimal superimposition of landmarks. Syst. Zool. 39, 40–59 (1990)
13. Bookstein, F.L.: Morphometric Tools for Landmark Data: Geometry and Biology. Cambridge University Press, Cambridge (1991)
14. R Development Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2007), http://www.R-project.org
15. Slice, D.E.: Morpheus et al.: software for morphometric research. Revision 01-31-00. Department of Ecology and Evolution, State University of New York at Stony Brook, New York (1998)
Method for Movement and Gesture Assessment (MMGA) in Ergonomics
Giuseppe Andreoni, Marco Mazzola, Oriana Ciani, Marta Zambetti, Maximiliano Romero, Fiammetta Costa, and Ezio Preatoni
Politecnico di Milano, INDACO Dept., via Durando 38/A, 20158 Milan, Italy
{giuseppe.andreoni,marco.mazzola,oriana.ciani,marta.zambetti,maximiliano.romero,fiammetta.costa,ezio.preatoni}@polimi.it
Abstract. We present a technique for the ergonomic assessment of motor tasks and postures. It is based on movement analysis, and it integrates perceived discomfort scores for joint motions with the time involvement of the different body districts. It was tested on eight subjects performing reaching movements. The experimental protocol was designed to have an a priori expected comfort ranking, namely, higher values in the presence of more uncomfortable tasks. The validation of the Method for Movement and Gesture Assessment (MMGA) in the ergonomic evaluation of a reaching task gave promising results and showed the effectiveness of the index. Possible applications of the method include integration into CAD tools and human motion simulation to provide early comparative ergonomic evaluations in the industrial prototyping process and in workplace redesign. Keywords: proactive ergonomics, ergonomic index, movement and posture analysis, occupational biomechanics, assessment technique, joint discomfort.
1 Introduction
Research on the comfort/discomfort assessment of work spaces and work tasks is abundant in the ergonomics literature [1]. Over the past 30 years, a significant number of methods aimed at improving ergonomic assessment have been published. A short list of these tools would include: QEC, manTRA, RULA, REBA, HAL-TLV, OWAS, LUBA, OCRA, the Strain Index, the SNOOK tables, and the NIOSH lifting equation [2]. They can be roughly classified into two main categories: qualitative and quantitative methods. Among the latter, the OWAS [3], PATH [4], and RULA [5] indexes are probably the most cited and applied, together with the revised NIOSH equation for manual lifting [2]. The Rapid Upper Limb Assessment (RULA) was developed for the ergonomic evaluation of workplaces. It consists of reporting disorders related to the upper limbs. The RULA assessment is based on observation of the postures adopted while undertaking the tasks. Depending on the aim of the analysis, this index considers either the posture that is maintained for the longest time or the one that appears the worst (in a biomechanical and ergonomic sense) among all
those adopted. After recording and scoring the single posture(s), the final score is obtained by adding the single contributions; it can then be compared to the Action Level List, which provides a guide for further action to improve the ergonomics of the analyzed work situation. The problem of a quantitative assessment of postural stress and load was recently addressed by Kee and Karwowski [6]. They proposed the “postural loading on the upper body assessment” (LUBA), which refers to a dataset of perceived discomfort scores (ratio values) for a set of joint motions. They defined a composite index that accounts for the hand, arm, neck, and back joints and the corresponding maximum holding times in static postures. The postural classification scheme they developed was based on the angular deviation of each joint from the neutral position. Articular angles were assigned to different classes and were given a discomfort score through a statistical approach. The score of each class was normalized to the perceived discomfort value of elbow flexion, which exhibited the lowest level among all joint motions and was therefore set as a reference point. Four distinct action categories were considered for fixing evaluation criteria concerning the stresses of working postures and for providing practitioners with proper corrective interventions. The proposed method may be used for evaluating and redesigning static working postures in industry. Nevertheless, it does not provide information about the discomfort level of the whole body, and it amounts to a quantification of a qualitative analysis of human posture. The aim of this work was to develop a new method of classifying comfort/discomfort that concerns whole-body movements. This method started from the LUBA approach and defined an innovative index that combines joint kinematics with a joint discomfort function “weighted” by the masses of the body areas participating in the movement.
2 Materials and Methods
We propose a new method for quantifying the ergonomics of working tasks based on the kinematics of the executed movement. The method is based on the measurement of joint motion and, consequently, on the availability of proper technologies, such as optoelectronic systems for motion analysis, a set of electro-goniometers, or stereo video-recording with dedicated software for further data processing. The starting point for computing the ergonomic index is the body kinematics, expressed as joint angles either through a biomechanical model (as in the case of motion analysis systems or video-recording techniques) or through direct measurement (as with the electro-goniometers already mentioned). The Method for Movement and Gesture Assessment (MMGA) index is composed of three factors: a) the joint kinematics, b) an articular coefficient of discomfort for each joint, and c) a coefficient estimating the “weight” of the ergonomic contribution of each joint to the movement. For the lower limbs, we applied the corresponding upper-limb scale, weighted by the mass of the lower-limb portion involved.
a) Joint kinematics. Joint kinematics (α(t)) was measured through the Vicon Motion Analysis System mod. 460 (Vicon Motion Systems Ltd, Oxford, UK), equipped with six M-series TV cameras sampling at 120 Hz. The cameras were placed all around the subject at a height of 2.20 m. The standard Vicon Plug-in-Gait marker set was used, implementing a total-body biomechanical model with 33 reflective markers. Anthropometric measures and dedicated algorithms were used to estimate and filter the 3D coordinates of internal joint centres and the joint angles. The following variables were considered for this study: wrist, elbow, knee, and ankle flex-extension; shoulder and hip flex-extension, intra-extra rotation, and abd-adduction; trunk flex-extension, rotation, and lateral bending. Each movement was time-normalised to 100 points, independently of its actual duration, to allow intra- and inter-subject comparisons.
b) Discomfort score. The coefficient of discomfort (ϕ) for each j-th joint at time t, ϕj(α(t)), was computed through a spline fit of the discomfort ranks derived from the LUBA method (Fig. 1) along the joint range of motion estimated from an anthropometric dataset [7].
Fig. 1. Example of the definition of the coefficient of discomfort for the wrist flex-extension angle. To each joint angle α(t) corresponds a discomfort score ϕ(α(t)).
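A minimal sketch of such a spline fit, with hypothetical rank values standing in for the actual LUBA-derived data [6]:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical discomfort ranks at sampled wrist flex-extension angles
# (degrees); placeholder values, not the published LUBA dataset.
angles = np.array([-60.0, -30.0, 0.0, 30.0, 60.0, 90.0])
ranks = np.array([4.0, 2.0, 1.0, 2.0, 4.5, 7.0])

phi_wrist = CubicSpline(angles, ranks)   # continuous phi(alpha)

# Discomfort score over a 100-point, time-normalized angle trajectory
alpha_t = np.linspace(-20.0, 70.0, 100)
discomfort_t = phi_wrist(alpha_t)
```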
c) Body normalization. According to the data from Zatsiorsky and Seluyanov [8], the comfort index of each joint was assigned a percentage ergonomic contribution (∂j), proportional to the mass of the single j-th distal body district (for the j-th joint) participating in the movement. In summary:

MMGA(t) = ∑j ϕj(αj(t)) × ∂j (1)

and, after time-integration over the whole task:

MMGA = ∑t MMGA(t), with t = 1, …, 100. (2)
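Eqs. (1) and (2) can be sketched as follows, combining the spline-based discomfort functions with the segment-mass weights (container names are assumptions; this is an illustration, not the authors' implementation):

```python
import numpy as np

def mmga_index(joint_angles, phi_funcs, mass_weights):
    """MMGA per Eqs. (1)-(2): mass-weighted joint discomfort summed over
    joints (Eq. 1) and over the 100 time-normalized samples (Eq. 2).

    joint_angles : dict, joint name -> (100,) array of angles alpha_j(t)
    phi_funcs    : dict, joint name -> discomfort spline (see sketch above)
    mass_weights : dict, joint name -> fractional contribution d_j derived
                   from the segment masses of Zatsiorsky and Seluyanov [8]
    """
    mmga_t = np.zeros(100)
    for joint, alpha in joint_angles.items():
        mmga_t += phi_funcs[joint](alpha) * mass_weights[joint]   # Eq. (1)
    return float(mmga_t.sum())                                    # Eq. (2)
```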
Subjects. Eight subjects participated in the preliminary study for the evaluation of the MMGA index. Their main characteristics are reported in Table 1. All the subjects were volunteers, chosen among healthy undergraduate students or researchers. Preference was given to a balance between males and females and to a spread of heights, so as to test the sensitivity and effectiveness of the method under as many different conditions as possible.

Table 1. The characteristics of the subjects participating in the research

Subject ID   Sex   Height (cm)   Weight (kg)   Dexterity (R/L)
S1           M     190           90            R
S2           F     170           56            R
S3           F     155           50            R
S4           M     167           62            R
S5           F     166           52            R
S6           M     178           68            R
S7           M     181           80            L
S8           F     171           60            L
Fig. 2. An example of the experimental setup. The initial (left) and final (right) postures of a subject reaching the lowest point on the left column of the grid. Note the adopted motor strategy (trunk flexion and torsion with the legs kept extended).
Experimental setup. The subjects were asked to reach 21 points on a firm surface structured as a 3D grid of 7 rows (row inter-distance: 30 cm) by 3 columns (column inter-distance: 30 cm). They were asked to align their lateral malleoli with a reference line and to keep their feet in that position throughout the reaching sequence. The line was set at two distances from the grid: the first was customized to each subject's leading forearm length; the second was 40 cm farther. This mock-up allowed us to reproduce most of the reaching tasks experienced in real life during interaction with products and the environment. The extent of the movement at different heights and depths was coherent with an expected rank of difficulty reflecting a better (or poorer) ergonomic condition. Each subject repeated the
sequence of reaching movements three times. No indication of the motor strategy to be followed was given. All the trials were processed to verify intra-subject repeatability. The best trial in terms of data quality (all markers always visible and correctly reconstructed in 3D space) was selected for the computation of the kinematic parameters (i.e., joint angles) and of the MMGA index representative of that subject and that movement.
3 Results
The task evaluation provided a discomfort classification of the man–product interaction, expressed in terms of reaching comfort as expected by the experimental design. Comfort scores appeared coherent with the difficulty level designed a priori for the different conditions (Fig. 3).
Fig. 3. The ergonomic assessment of the reaching tasks (21 points, one subject) according to the LUBA index (left), the MMGA index excluding the lower limbs (center), and the complete MMGA index (right). Iso-level lines of comfort are displayed. Note that the colors for LUBA and MMGA have different scales, due to the different magnitudes of the indexes.
For a better understanding of the differences between the two methods, we implemented a dedicated Matlab© (The MathWorks, Inc.) routine to compare the MMGA index with the LUBA index (Fig. 3). We chose to adopt a visual representation of iso-comfort lines in the reaching plane. We compared the LUBA index to the MMGA index both in its complete formulation and excluding the lower limbs (as in the LUBA method). At first glance, good agreement is shown, even if there is a significant offset (more than 1000 points) between the magnitudes of the indexes from the two methods; LUBA appears significantly higher. This may be a consequence of the very coarse steps that LUBA uses in assigning ergonomic scores to articular ranges. The implementation of the lower-limb contribution makes the MMGA method more complete.
Fig. 4. Example of the capability of the MMGA method to detect changes in motor strategies that reflect different ergonomic conditions. MMGA values (center) and the final postures when reaching two nearby points with a change in the lower-limb movement strategy (left and right).
Fig. 5. Example of the results of the MMGA index for a right-handed subject (left) and a left-handed subject (right). The symmetrical behaviors reflect each subject's dexterity.
Moreover, it allows a finer discrimination of ergonomic motor strategies. For instance, Fig. 4 presents the MMGA scores together with a representation of the subject's movement (final postures, shown through the adopted biomechanical model). The fine resolution of the MMGA method may be appreciated: it was able to differentiate between critical ergonomic conditions even though the actual environmental and task conditions were very similar. In the case shown, the passage from the 6th to the 7th row determined a change in the kinematic strategy adopted by the subject. Namely, he/she turned from the more “correct” knee-flexion
strategy used in reaching the lowest point to the more ergonomically critical one, characterised by extended legs and increased trunk flexion. Furthermore, it was possible to identify the dexterity of the subject by simply observing the graphical results (Fig. 5).
4 Conclusion
The goal of a well-designed assessment tool is: (i) to consider the information gained through research concerning the causes and the impact of strain on the human system; and (ii) to organize surveys to assess and predict if and when this strain is reaching hazardous levels and may thus induce work-related musculoskeletal disorders. The MMGA index aims to provide a quantitative value for the ergonomic ranking of motor tasks. It combines information about joint kinematics, articular comfort ranges, and body-part involvement during the subject's interaction with the environment and products. The MMGA index also provides a complete assessment of the lower limbs, which the LUBA analysis does not include. When only the same body districts are considered for both indexes, the MMGA score shows good correspondence with the LUBA index. The method does not yet provide an absolute evaluation of the comfort/discomfort score for a general environment; however, it works in comparative analyses between similar but competing conditions (comparisons among two or more situations or products, one of which is assumed as the reference). The MMGA index has proved able to differentiate the comfort level of easy tasks, providing a coherent ergonomic ranking of movements; e.g., in the ergonomic assessment of tasks related to usability (such as opening and closing the upper doors of white goods like refrigerators), it presented the worst MMGA index values for tasks supposed to be less comfortable, such as interaction with the highest and lowest parts of the fridge. The data from the MMGA index currently derive from a quantitative computation of joint motion captured on real subjects, but the index might be integrated into human motion simulation software to implement proactive ergonomic analysis in the virtual prototyping process.
Acknowledgments. This work was supported by a grant from Fondazione Politecnico di Milano and Indesit Company S.p.A.
References
1. David, G.C.: Ergonomic methods for assessing exposure to risk factors for work-related musculoskeletal disorders. Occupational Medicine 55, 190–199 (2005)
2. Waters, T., Putz-Anderson, V., Garg, A., Fine, L.: Revised NIOSH equation for the design and evaluation of manual lifting tasks. Ergonomics 36, 749–766 (1993)
3. Karhu, O., Kansi, P., Kuorinka, I.: Correcting working postures in industry: a practical method for analysis. Applied Ergonomics 8(4), 199–201 (1977)
4. Buchholz, B., Paquet, V., Punnett, L., Lee, D., Moir, S.M.: PATH: A work sampling-based approach to ergonomic job analysis for construction and other non-repetitive work. Applied Ergonomics 27(3), 177–187 (1996)
5. McAtamney, L., Corlett, E.N.: RULA: a survey method for the investigation of work-related upper limb disorders. Applied Ergonomics 24(2), 91–99 (1993)
6. Kee, D., Karwowski, W.: LUBA: an assessment technique for postural loading on the upper body based on joint motion discomfort and maximum holding time. Applied Ergonomics 32(4), 357–366 (2001)
7. Tilley, A.: The Measure of Man and Woman. Bema, Milano, Italy (1993)
8. Zatsiorsky, V., Seluyanov, V.: The mass and inertia characteristics of the main segments of the human body. In: Matsui, H., Kobayashi, K. (eds.) Biomechanics VIII-B, International Series on Biomechanics, vol. 4B, pp. 1152–1159. Human Kinetics Publishers, Champaign (1983)
Complexity of Sizing for Space Suit Applications
Elizabeth Benson1 and Sudhakar Rajulu2
1 MEI Technologies, 2525 Bay Area Blvd. #300, Houston, TX 77058
2 NASA Johnson Space Center, 2101 NASA Parkway, Houston, TX
{Elizabeth.Benson,Sudhakar.Rajulu-1}@NASA.gov
Abstract. The ‘fit’ of a garment is often considered to be a subjective measure of garment quality. However, some experts attest that a complaint of poor garment fit is a symptom of inadequate or excessive ease, the space between the garment and the wearer. Fit has traditionally been hard to quantify, and space suits are an extreme example, where fit is difficult to measure but crucial for safety and operability. A proper space suit fit is particularly challenging because of NASA’s desire to fit an incredibly diverse population (males and females from the 1st to 99th percentile) while developing a minimum number of space suit sizes. Because so few sizes are available, the available space suits must be optimized so that each fits a large segment of the population without compromising the fit of any one wearer.
1 Introduction
Successfully predicting wearer dimensions and providing the appropriate amount of slack and adjustability is crucial in developing space suits, where a poor fit can decrease mobility and lead to wearer discomfort or even injury. Suit designers need to know the sizes of the people they need to fit, the amount of adjustability the suits need, and how well a suit must fit to be usable. Additionally, it is important to make sure a suit fits before it is evaluated, or used to evaluate other systems. Therefore, the Anthropometry and Biomechanics Facility at NASA's Johnson Space Center is working in conjunction with the Pressure Garment Group at NASA to combine traditional and more advanced methods of quantifying fit, to aid the designers of the next generation of space suits. This paper describes the issues that are faced in attempting to fit suits to a diverse population, and some of the methods that can be used to surmount these difficulties and provide the best possible compromise between fit and accommodation.
2 A Background on Suit Fit
Past NASA suit systems have used a variety of techniques, with varying success, to fit their target populations. These have ranged from the custom sizing used in the Apollo program to the off-the-shelf approach of the current Advanced Crew Escape Suit, which is an adjustable suit available in a set of standard sizes.
2.1 Custom Tailoring: The Apollo A7LB
In the Apollo program, astronauts wore custom-tailored suits and had the opportunity to undergo multiple fit tests to ensure that their suits fit. To quote David Scott of Apollo 15, “So they felt that if you had a proper fit, then you had better mobility.” [1] The importance of a tailored fit is also mentioned in the Mission Report for Apollo 16: “The suits are custom fitted and, by necessity, must be tight to achieve good mobility.” [2] The combination of custom tailoring and some adjustability in the suit allowed the Apollo astronauts to correct minor sizing issues in flight, such as a case where the legs of a pressure garment were too short, leading to discomfort [3]. However, some fit problems remained, even with custom-fit suits. For example, astronaut Edwin Aldrin had large biceps which allegedly interfered with the arm bearings on his suit and prevented his fingers from seating correctly in his gloves when he bent his arms [4]. Gloves were also an issue, as discussed by the crew of Apollo 15 in the technical debriefing for that flight [5]. Crewmember David Scott brought up a common problem with space suit fit: the compromise between arm length and glove mobility. If a suit is sized to fit the crewmember when their arms are outstretched, the fingers are forced back out of the gloves when they pull their arms close to the chest. If, on the other hand, the suit is sized for the fingers to be snug when working close to the chest, the fingertips press against the glove when the arms are in other postures. As a result, Scott had his suit arm length adjusted to keep his fingers in the gloves, and accepted the sore and painful fingertips that resulted from this fit.
2.2 Modularity: The Shuttle Extravehicular Mobility Unit
The architects of the space shuttle program abandoned the custom-fit suits of Apollo in the interests of improving manufacturing efficiency, reducing cost, and allowing easier maintenance and resizing. The designers of the space shuttle Extravehicular Mobility Unit (EMU) were tasked with developing a set of modular suits to fit a population that could include everyone from a 5th percentile female to a 95th percentile male [6]. However, as the suits were developed, the number of sizes of the Hard Upper Torso (HUT), a major suit component, was cut back. Because only the M, L, and XL HUT sizes were developed, many smaller women cannot wear the current shuttle EMU. The compromise between arm length and glove mobility also continues in the shuttle program, where fingertip pressure has been indicated as a possible source of fingertip pain and fingernail delamination [7].
2.3 Standard Sizes: The Advanced Crew Escape Suit
The Advanced Crew Escape Suit (ACES) is similar to pressure suits developed for the US Air Force and has a similar sizing scheme based on the height and weight of the wearer. However, the pressure suit was sized for a seated pilot in an aircraft, not for the walking, running, or climbing that a shuttle crewmember may perform during training. As a result, many crewmembers allegedly choose to wear a size that is larger than the one recommended for them by the sizing scheme [8].
2.4 Consequences of Poor Suit Fit
The cost of a poor suit fit, as suggested in the previous sections, can include wearer discomfort and a reduction in mobility. A suboptimal suit fit can also increase the effort that a wearer must exert to move in the suit, since their joints are not lined up with the suit joints. As described by Menendez and colleagues in their 1993 paper, the instantaneous centers of rotation of the human and space suit joints should be co-located to minimize the energy needed to move [9]. For instance, a wearer is likely to move less efficiently if the suit's knee joint is several inches above or below his own knee. A poorly fitting suit also has the potential to actually impinge on the wearer during motion, leading to reduced mobility and risk of injury. A 2003 report on shoulder injuries in the space shuttle EMU suggested that the scye openings on a poorly sized hard upper torso could restrict shoulder motion and lead to injury [10].
3 Improving Fit: What Is Fit?
When attempting to achieve a good space suit fit, it seems reasonable to examine solutions that have been developed to fit clothing and gear in the past. The problem of fit has often been solved by trial and error, and through time-consuming and expensive tailoring processes. More recently, attempts have been made to find more efficient ways of fitting people. However, if fit is to be optimized, it must first be defined and understood.
3.1 Fit: An Objective Measure?
Fit is often considered a subjective measurement of garment quality, and is expected to vary from person to person, and sometimes even for the same person on different days. However, experts would argue that a garment's “fit” is merely an indication of the garment's tailoring, or the complex relationship of the garment dimensions to the dimensions of the wearer [11]. If a person claims that their clothing does not fit, they are indicating dissatisfaction with the garment's tailoring in one or more areas. This difference between garment and wearer dimensions is called “ease.” A garment that has too much or too little ease in any location can lead to poor fit and dissatisfaction of the wearer. To complicate matters, a garment's dimensions do not only have to accommodate a stationary wearer; there must be enough additional material to provide for extreme motions such as kneeling and reaching overhead [12]. The problem of sufficient material, or ‘run length’, becomes even more complex for a one-piece garment such as a coverall, where the addition of material in one area can affect the fit of the entire garment [13]. Making an item oversized can solve some problems, but excess material causes its own issues. For instance, in their 2007 paper, Ng et al. describe the problem of fit in the shoulder area [11]: if the scye area (the arm opening) of a garment is too wide or too far from the underarm, it can restrict the wearer as much as a sleeve opening that is too tight.
4 Improving Fit: Sizing Systems
Admittedly, there is a subjective component to fit and comfort. However, gross approximations of correct size should be made before fit is fine-tuned on an individual
basis. This approximation is achieved through the development of a standardized sizing system that attributes a size to an individual.
4.1 Traditional Sizing Systems and Limitations
Sizing systems are often developed using the anthropometry of a sample of intended wearers. For example, a US Air Force flight suit sizing system [15] was based on a 1967 survey of 2,420 men. Individuals in the sample are then generally categorized based on a pair of so-called key dimensions, which are easy to measure and which are assumed to be highly correlated with other dimensions. For example, a sizing system may be based on a combination of height and weight. Once the individuals in the sample have been split into categories based on intervals of the key dimensions, each category is examined individually. Anthropometry from individuals in a given size category is used to develop regression equations, which predict minor dimensions like chest breadth. The use of key dimensions allows easy size selection, since a table of two dimensions can be consulted when assigning a piece of equipment to an individual. However, this method assumes that an individual's shape and size can be predicted accurately using two basic dimensions such as height and weight – an assumption that is not necessarily accurate. This limitation becomes obvious as the number of sizes increases. Increasing the number of sizes would initially seem to improve fit, but can actually cause more fit problems because people's minor dimensions are allowed less variability within a size. The issues associated with a simplistic sizing scheme are a symptom of the large amount of variation in people's shapes and sizes, their so-called ‘somatotypes.’ For instance, for a given height and weight you might find people with long torsos and short legs, or short torsos and long legs. A tall muscular person could potentially weigh the same as a tall obese person, but have a very different shape.
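As an illustration of this key-dimension approach (with invented data; the actual Air Force survey values [15] are not reproduced here):

```python
import numpy as np

# Invented sample: 500 people with height (cm), weight (kg), and a minor
# dimension (chest breadth, cm) to be predicted within each size category.
rng = np.random.default_rng(0)
height = rng.normal(175.0, 7.0, 500)
weight = rng.normal(78.0, 10.0, 500)
chest = 0.10 * height + 0.15 * weight + rng.normal(0.0, 1.0, 500)

# Key-dimension categories from height intervals (interval edges arbitrary)
size = np.digitize(height, [165.0, 175.0, 185.0])

for s in range(4):
    m = size == s
    # Least-squares regression predicting the minor dimension per category
    X = np.column_stack([height[m], weight[m], np.ones(m.sum())])
    coef, *_ = np.linalg.lstsq(X, chest[m], rcond=None)
    print(f"size {s}: chest ~ {coef[0]:.3f}*ht + {coef[1]:.3f}*wt + {coef[2]:.2f}")
```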
4.2 Sizing for Men and Women
The problem of fitting a wide range of sizes and shapes of people is further complicated if both men and women must be fit with the same sizing system. Issues have arisen in the past when military organizations have attempted to accommodate women using systems designed for men. Some groups initially assumed that women could fit in the same sizes as small men – or, at worst, that some of the men's sizes would have to be scaled down proportionately to fit women [16]. However, problems arise due to the very different proportions of men and women. For the same height and weight, women can have significantly wider hips and narrower shoulders than men. If, for example, a one-piece coverall designed for a man is meant to fit at the shoulders and the hips, then one of these fit areas is likely to be compromised for a woman: she has to choose a size that fits over her hips, likely leading to the coverall's shoulders being too wide for her frame. Several approaches have been taken towards fitting both men and women with the same gear. These methods are summarized in Figure 1.
Fig. 1. Sizing Schemes for Men and Women
For systems such as a naval uniform, where a tailored fit is considered important, a female-only sizing scheme can be created (section 1 of Figure 1), as described in a 1991 paper from Armstrong Laboratory [17]. For types of equipment where fit may not be as essential, a few extra sizes can be designed for poorly fit women, with the assumption that most women can wear men's sizes with an acceptable decrement in fit (section 2 of the figure). Additionally, in at least one case, a theoretical integrated sizing system was developed (shown in section 3 of the figure). An example of an integrated sizing system, for the US Army Battle Dress Uniform (BDU), is described in a 1981 report from the Natick Research and Development Lab [18]. This integrated system optimizes the smallest BDU sizes for women, optimizes the largest sizes for men, and forms a compromise with the intermediate sizes that both men and women would wear. For instance, an extra-extra-small extra-short pair of pants might be rarely worn by men – in which case, the pants could be designed for a woman's generally larger hips and smaller waist. A more intermediate size might still accommodate a woman's hips, but provide for larger waists. The compromise in this size might lead to loose hips on the men and loose waists on the women, but within an acceptable range.
4.3 Better Sizing Systems: Multivariate Methods
To solve the oversimplification problem associated with sizing systems that rely on paired key dimensions like height and weight, some groups have attempted to use multivariate methods to develop sizing schemes. Using techniques such as principal component analysis and cluster analysis, sizing system designers can account for the wide variation in human shape by grouping together people of similar somatotype. For example, Zehner et al. developed a sizing system that used principal component analysis to group a selection of anthropometry from a sample population [19]. This analysis led to a component contrasting limb to torso size and a component representing overall body size. These two components could then be used to represent a
wide variety of body shapes and sizes. Designers can pick a body type, for example a small individual with a small torso and small limbs, and then look at the anthropometry of a person in the database whose values for the two components reflect this shape. This method allows designers to base their design on the anthropometry of an actual person in a given category and to have greater confidence that their design will accommodate people of varying sizes and shapes.
4.4 Sizing Systems for Space Suits
As touched on previously, a sizing system for the next generation of NASA suits will have several additional layers of complexity beyond sizing systems developed for clothing. For one, a space suit that is sized for unpressurized use must also fit when pressurized. Space suits must also provide enough adjustability to allow for the elongation of the human spine in microgravity, an elongation that may be as much as 3% of a wearer's stature [20]. In other words, if a suit is sized precisely on the ground while unpressurized, the fit may change in space and when the suit is pressurized. Moreover, cost and mass restrictions will govern the number of suits created and flown by the space program. This limited number of suits will be required to fit an incredibly diverse population that can comprise anyone from a theoretical 1st percentile female to a 99th percentile male. This population also has the potential to vary significantly in shape, and could include a tall man with short legs as easily as a short woman with long legs. The complexity of space suit fit means that multivariate methods will likely have to be used to develop a sizing system with sizes optimized for a future astronaut population, with enough adjustability to account for variations in the size and shape of its potential wearers, as well as adjustments for pressurized vs. unpressurized fit, accommodation for spinal elongation, and the personal preference of the wearer.
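As a sketch of the multivariate approach of Zehner et al. [19] described in Section 4.3, with the input dimensions and the boundary-case selection rule as illustrative assumptions:

```python
import numpy as np

def sizing_components(anthro: np.ndarray):
    """Two-component summary of an anthropometric matrix, in the spirit
    of Zehner et al. [19]. anthro: (n_people, n_dimensions), e.g. stature,
    sitting height, arm length, leg length, chest circumference, ...
    """
    # Standardize so each dimension contributes comparably to the PCA
    z = (anthro - anthro.mean(axis=0)) / anthro.std(axis=0)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:2].T    # e.g. overall-size and limb-vs-torso components
    # Boundary cases can then be chosen as real people whose scores lie
    # near the extremes of this 2-D space; one crude pick for illustration:
    corner_person = int(np.argmax(np.abs(scores).sum(axis=1)))
    return scores, corner_person
```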
5 From Anthropometry to Fit
The dimensions of the wearer provide only a starting point for the design. If clothing were made to fit a human's anthropometry, with no additional slack, it would be skin tight. Therefore, using tailoring techniques that have been developed over centuries, designers add ease allowances and seam allowances to develop a final garment [16]. However, this initial ease allowance may not provide enough run length to allow for extreme motions. To evaluate the amount of ease in a garment, fit checks are performed.
5.1 Subjective Fit Checks
A fit check can be as simple as putting on the garment and performing some basic motions, while an expert assessor evaluates the fit using a checklist or questionnaire. An example of this type of fit check is documented in a 1995 paper on flight suit fit by Crist et al. [21].
5.2 Objective Fit Checks: Range of Motion Testing
Several studies have evaluated the range of motion of a subject while varying the amount of ease. This method allows a quantitative measure of how the subject's
restriction varies with change in the shape of the garment. For example, Huck, Maganga and Kim controlled the shape and size of a custom-made garment, except for one location where they increased or decreased the amount of ease [13]. In other studies, a subject is first provided with a garment in their recommended size based on a sizing chart, as in a study by Adams et al. published in 1995 [22]. An evaluation protocol is performed in the recommended garment size, and then in garments that are longer or shorter, larger or smaller according to the sizing chart. The protocol typically involves a series of predetermined motions or exercises meant to take up any slack in the garment. One such posture, suggested by Crow and Dewar in their 1986 paper on stress in clothing seams, involves squatting and lifting the arms over the head [23]. In a one piece garment such as a coverall, the squatting motion takes up a lot of the ease, leaving very little slack when a subject then reaches their arms over their head (see Figure 2, from an unpublished pilot study).
Fig. 2. Example: posture while crouching and reaching overhead in oversized (left), appropriately sized (center), and undersized (right) flight suits
The subject in the figure is wearing three different sizes of flight suit: a coverall that is too large, a coverall that is appropriately sized, and a coverall that is too small. The oversized and appropriately sized coveralls were approximately equivalent, but an obvious restriction in motion was observed for the smallest flight suit. In another study, Ng et al. developed a model of the interaction between several garment dimensions and the range of motion of the arm [11]. By optimizing the location of the underarm point, they could derive an approximate solution for a sleeve with the minimum amount of fabric needed to provide a given range of arm motion.
5.3 Applying Fit Testing to Suits
Because the fit of a suit is crucial for comfort, operability, and safety, and because a poor suit fit is likely to cause a decrement in performance, testing should be completed to evaluate a suit's fit. Objective fit testing during the design of a space suit architecture could aid designers in assessing how well they are fitting their target population, and could potentially indicate where slack must be taken out or added, or
where additional adjustability is needed. Although the flight suit study suggested that range of motion can be improved by providing additional slack, there are cases where too much clearance causes issues, as in the case of the current space shuttle EMU, where the issue is not so much fabric slack, but the size of the hard upper torso. Without performing fit testing, it is difficult to assess the unknown impact of suit fit while evaluating the space suit or while evaluating a system that will interact with the suit. If a subject is wearing a suit that has a marginal fit, they may be exerting more effort to perform a given task than a subject wearing a suit in their proper size. The suit’s suboptimal fit could lead to an undeservedly poor evaluation of the suit, or of the system being tested.
6 Conclusion and Future Work As the next generation of space suits is developed for NASA's Constellation program, steps should be taken to ensure that the suits will adequately fit their target population. This task can include the development of a sizing system that optimizes the number of space suit sizes and their required adjustability, without compromising accommodation for any sector of the population. As suit prototypes are developed, objective fit checks can evaluate how well the new suit fits a sample of its population and help to indicate problem areas for fit. Compromises will have to be made to accommodate both male and female wearers of widely varying size and shape without unduly reducing mobility or decreasing efficiency for any one wearer. An acceptable suit fit will also allow more realistic assessment of not only the suit, but also of systems that interact with the suit during man-in-the-loop tests.
Acknowledgements The writers of this paper would like to acknowledge Amy Ross and Terry Hill of the Pressure Garment group at NASA, as well as Scott Cupples and Brian Johnson of the EVA Project Office, for their funding and support.
References 1. Jones, E.M.: Apollo Lunar Surface Journal, http://history.nasa.gov/alsj/a15/a15.spur.html 2. Jones, E.M.: Apollo Lunar Surface Journal: Apollo 16 Mission Report Online: http://history.nasa.gov/alsj/a16/A16_MissionReport.pdf 3. Jones, E.M.: Apollo Lunar Surface Journal: Apollo 12 Mission Report Online: http://www.hq.nasa.gov/alsj/a12/A12_MissionReport.pdf 4. Jones, E.M.: Apollo Lunar Surface Journal: Apollo 11 Post-Flight Report on Suits / PLSSs / etc (Preliminary Version), http://www.hq.nasa.gov/alsj/a11/A11CSD.html 5. Jones, E.M.: Apollo Lunar Surface Journal: Apollo 15 Technical Debrief, http://history.nasa.gov/alsj/a15/a15-techdebrief.pdf
6. Currie, N.J., Graziosi, D.: Space Suit Design Enhancements to Improve Size Accommodation and Mobility. In: Proceedings of the Human Factors and Ergonomics Society 47th Annual Meeting 2003 (2003) 7. Strauss, S.: DO, MPH. Extravehicular Mobility Unit Training Suit Symptom Study Report. NASA/TP–2004–212075 8. Transcript, Jean Alexander Oral History, Houston, TX by Kevin Rusnak, Johnson Space Center Oral History Project, June 23 (1998) 9. Menendez, V., Labourdette, X., and Baez, J.M.: Performance of EVA Suit Mobility Joints Influence of Driving Parameters. In: 23rd International Conference on Environmental Systems, SAE Paper 932098 (July 1993) 10. Williams, D.R., Johnson, B.J.: EMU Shoulder Injury Tiger Team Report. NASA/TM— 2003–212058 11. Ng, R., et al.: Single Parameter Model of Minimal Surface Construction for Dynamic Garment Pattern Design. Journal of Information and Computing Science 2(2), 145–152 (2007) 12. Adams, P., Slocum, A., Herrin, G.: An Approach for Predicting Garment Effects on Range-of-Motion Based on Measurement of Garment Ease. In: Proceedings of the Second International Symposium on Consumer Environmental Issues: Safety, Health, Chemicals and Textiles in the Near Environment, pp. 216–229 (1992) 13. Huck, J., Maganga, O., Kim, Y.: Protective Overalls: Evaluation of Garment Design and Fit. International Journal of Clothing Science and Technology 9(1), 45–61 (1997) 14. Ng, R., et al.: Single Parameter Model of Minimal Surface Construction for Dynamic Garment Pattern Design. Journal of Information and Computing Science 2(2), 145–152 (2007) 15. Alexander, M., McConville, J., Tebbetts, I.: Revised Height/Weight Sizing Programs for Men’s Protective Flight Garments. Technical Report (AMRL-TR-79-28) (AD A070 732). Aerospace Medical Research Laboratory, Wright-Patterson Air Force Base, OH (1979) 16. Robinette, K.: Flight Suit Sizes for Women. (AD A321 200) Armstrong Laboratory, Brooks Air Force Base, TX (1996) 17. Robinette, K.M., et al.: Development of Sizing Systems for Navy Women’s Uniforms (U). AL-TR-1991-0117 18. McConville, J., Robinette, K., White, R.: An Investigation of Integrated Sizing for US Army Men and Women. Anthropology Natick/TR-81/033 (AD A109 406) 19. Zehner, G.F., Meindl, R.S., Hudson, J.A.: A Multivariate Anthropometric Method for Crew Station Design: Abridged, AL-TR-1992-0164. Armstrong Laboratory, WrightPatterson Air Force Base (1992) (AD A274 588) 20. Stoycos, L.E., Klute, G.K.: Anthropometric Data from Launch and Entry Suited Test Subjects for the Design of a Recumbent Seating System. NASA Technical Memorandum 104769 (1993) 21. Crist, J., Gross, M., Robinette, K., Altenau, M.: Fit Evaluation of Two Aircrew Coveralls. AL/CF-TR-1995-0053, Armstrong Laboratory, Air Force Material Command, Wright Patterson Air Force Base, OH (1995) 22. Adams, P., Keyserling, W.M.: The Effect of Size and Fabric Weight of Protective Coveralls on Range of Gross Body Motions. American Industrial Hygiene Association Journal 56(4), 333–340 (1995) 23. Crow, R.M., Dewar, M.M.: Stresses in Clothing as Related to Seam Strength. Textile Research Journal 56(8), 467–473 (1986)
Impact of Force Feedback on Computer Aided Ergonomic Analyses H. Onan Demirel1 and V.G. Duffy1,2,3 1 School of Industrial Engineering, 2 School of Agricultural and Biological Engineering, 3 Regenstrief Center for Healthcare Engineering, Purdue University, West Lafayette, IN, 47906, USA [email protected], [email protected]
Abstract. The objective of this study is to test the correlation between a Physical Task and a Digital Task through an integrated sensory feedback mechanism in Virtual Build Methodology. The research question is whether the pressure feedback mechanism in Virtual Build Methodology provides high fidelity for push-pull tasks. Many research studies have been conducted on DHM, MOCAP, VR and Haptic interfaces individually, but integrating them with a tactile feedback mechanism is still challenging. While increasingly used, the Virtual Build Methodology has not been studied regarding its human integration through a multi-sensory feedback system. It may seem intuitive, yet it is often disregarded, that a tactile feedback mechanism is essential for product design and development practices. This study aims to fill this gap by introducing a pressure-based sensory feedback system to provide higher fidelity in virtual product design practices. Keywords: Computer Aided Engineering (CAE), Ergonomics, Virtual Build Methodology (VBM), Digital Human Modeling (DHM), Motion Capture (MoCap), Haptics, Force Feedback, Product Design, Healthcare Engineering.
1 Introduction Traditional product design and production planning techniques are insufficient to manage dynamic product development and customization (in contrast to mass production) needs [1]. Technological progress, especially in the past two decades, has sped up the design and manufacturing process and reduced the costs of product development through digital design/production techniques [2]. The use of computer aided ergonomics tools minimizes the need for excessive physical prototyping, limits the number of design iterations, reduces design/manufacturing costs, and decreases the lead time to market [3][4]. Poor ergonomic practice may result not only in physical injuries but also in significant financial and reputational loss for companies. Although ergonomic problems in the manufacturing and production domain may result in significant financial losses, problems in healthcare could be more severe. Most manufacturers do not regard Human Factors Engineering (HFE) principles during medical product design [5]. Many times,
there is a lack of fundamental interest paid to HFE principles when compared to mechanical engineering or software programming and the functional aspects of product development. However, if manufacturers had employed better design practice through HFE and followed a human-centered design approach, failures due to poor design practice would have been reduced [5]. Because of rising US healthcare costs and the challenges of medical product design, a code cart was selected to demonstrate the practicality of force-feedback-integrated VBM in the design and analysis of concept products.
2 Sensory Feedback Integrated Virtual Build Methodology Many research studies have been conducted on assistive CAE tools (i.e. DHM, MOCAP, VR, and Haptic interfaces) individually, but an integration of these under a global design method is still not well established. While increasingly used, the Virtual Build Methodology has not been studied with multi-sensory feedback system integration. Although vision and audition are among the most important human sensations and are engaged in most human-machine interface applications, systems that introduce tactile feedback (touch and proprioception) are not a focus of many CAE systems. A tactile feedback mechanism is essential for product design and development [6][7][8][9].
3 Components of the Integrated System 3.1 Motion Capture (MoCap) System The MoCap system used in this study, STT Motion Captor, is composed of six infrared cameras and linked computers to capture and store data. The system is set up at 60 frames per second and assigned to a 19-marker human configuration template. 3.2 Code Carts The concept code cart was developed through a yearlong collaboration between students and faculty from the following Purdue University affiliations – Regenstrief Center for Healthcare Engineering, College of Engineering, School of Nursing, and College of Management [8][10]. Efforts to develop a better code cart focused on eliminating problems observed with current carts and adding features to increase the efficiency of response. A lightweight, versatile, maneuverable, ergonomic, and organized cart with increased safety was designed and introduced to healthcare practitioners. The concept code cart model provides a combination of features different from the current code carts (i.e. bi-directional transparent drawers for ease of access and viewing, an adjustable handle, 360° rotating platform for a defibrillator, oxygen tank housing, biohazard
Fig. 1a. – Current Cart model [7]
Fig. 1b. – Concept Cart model [8][10]
bin, IV pole, rounded corners, biocide exterior surface, retractable power cord, and AC outlets). (Fig. 1a and Fig. 1b show the current and concept cart models.) 3.3 Sensory Feedback Mechanism The feedback system introduced in this study is composed of a pressure pad device integrated with the Virtual Build system. The literature shows that humans lose some perception in a Virtual Environment compared to a real environment, where assistive feedback would be helpful to improve subjects' performance in the VE. In addition, a sensory-feedback-integrated VE, including collision detection and hybrid immersive VR, would provide an additional sense of reality to the human that could lead to improved task performance [10][11][12]. The pressure pad system is activated by a 5 V power supply (~0.5 A). The capacitance system located inside the pressure pad outputs analog signals when pressed/depressed. The output signals are collected in and distributed from the supplier's (Pressure Profile Systems – PPS) circuit board. Each voltage value, an analog signal between 0.25 V and 4.5 V, corresponds to a standardized pressure value from 0 psi to 10 psi provided by PPS [13]. National Instruments' (NI) LabView 8.2 software is used to analyze real-time data coming from the pressure pads. A special code is written to interpret the analog data. The code initiates a pre-recorded video after a specific pressure threshold (the value required to initiate movement of a physical cart) is reached. The pre-recorded video mimics what the subject would see if he/she were pushing a real cart. 3.4 Design and Analyses Software Packages Digital code carts are designed in Dassault Systèmes' parametric CAE package CATIA V5 R16. UGS Tecnomatix JACK will be used to stream the data from MoCap to perform ergonomic assessments.
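As a concrete illustration of the trigger logic just described, the sketch below mirrors the reported voltage-to-pressure mapping and threshold behavior in Python. The authors' actual implementation is LabVIEW code; the linear mapping, the clamping, and all names here (volts_to_psi, monitor, start_video) are illustrative assumptions, not their code.

```python
# Illustrative reconstruction of the pressure-pad trigger logic. Only the
# mapping reported above (0.25-4.5 V -> 0-10 psi) and the threshold-triggered
# video playback are taken from the paper; everything else is hypothetical.

V_MIN, V_MAX = 0.25, 4.5      # analog output range of the pad (volts)
P_MIN, P_MAX = 0.0, 10.0      # standardized pressure range (psi)

def volts_to_psi(v: float) -> float:
    """Linearly map a pad voltage to pressure, clamped to the valid range."""
    v = min(max(v, V_MIN), V_MAX)
    return P_MIN + (v - V_MIN) * (P_MAX - P_MIN) / (V_MAX - V_MIN)

def monitor(samples, threshold_psi, start_video):
    """Call start_video() once the push force crosses the threshold.

    `samples` is any iterable of real-time voltage readings;
    `threshold_psi` is the force needed to move the physical cart.
    """
    for v in samples:
        if volts_to_psi(v) >= threshold_psi:
            start_video()   # plays the pre-recorded forward-push video
            break
```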
4 Proposed Methodology Each subject is required to perform a push task on two different code cart designs (market-available cart design vs. prototype cart design) in both the actual MOCKUP (Physical Task Experiment with actual code carts) and a corresponding identical VE (Virtual Task Experiment with virtual cart video). Simultaneously, the subject's movements are captured through the motion capture system (see Fig. 3). One major difference between the Virtual Task Experiment and the Physical Task Experiment is that a pressure pad device is used as a sensory feedback mechanism. This device assists users in initiating a pre-recorded video on an LCD display, which mimics a forward push movement, when a threshold pressure force value is reached. Captured motions are input into UGS Jack to drive the digital human model, mimicking the movement of the subject working in a push-pull posture. Then, embedded static ergonomic analysis tools in UGS Jack are used to analyze the subjects' initial and final postures in both experiments under varying load conditions (Fig. 2 shows the VE task).
Fig. 2. – Virtual Push Task and Pressure Pads
5 Data Collection and Analysis At least 24 subjects will complete the tasks and evaluate the code carts under varying loading conditions (0–40 lb). A code cart from the Purdue University School of Nursing will be used as the "current code cart" and a developed prototype will be used as the "concept code cart." See Fig. 1a and 1b for an example of the carts to be tested. Differences in subjects' performance (differences in posture) between the MOCKUP and the VE are the main focus of the experimental design. These differences can be measured through certain posture variables (shoulder abduction torque and elbow torque). Then, the correlation between these variables in the MOCKUP and the VE will provide information about the degree of fidelity.
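A minimal sketch of the planned fidelity analysis follows, assuming a simple Pearson correlation over paired per-subject torque values; the paper does not specify the statistical procedure, and the numbers below are fabricated purely to demonstrate usage.

```python
# Sketch of the fidelity analysis: correlate a posture variable (e.g.
# shoulder abduction torque) between the physical MOCKUP task and the
# virtual task across subjects. The statistic (Pearson r) is an assumption.
import numpy as np

def fidelity_correlation(mockup, virtual):
    """Pearson correlation between paired per-subject measurements."""
    return float(np.corrcoef(mockup, virtual)[0, 1])

mockup_torque = np.array([12.1, 14.3, 9.8, 11.5])    # N*m, physical task
virtual_torque = np.array([11.7, 13.9, 10.2, 11.1])  # N*m, virtual task
print(f"fidelity r = {fidelity_correlation(mockup_torque, virtual_torque):.2f}")
```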
Fig. 3. – Experimental process and procedure
6 Discussions It was observed through the literature review and from the ongoing study that the Virtual Build Methodology provides a suitable environment for incorporating computerized technologies with a tactile feedback system, where users can interact with digital products/environments during the design and evaluation phases. The success of this study may provide a testbed for concept products. This could include a reduction in the time and financial costs of the design cycle and minimize design errors by systematically applying HFE design principles. Future work could include evaluating current or different concept products. Advanced immersive technologies and haptic devices such as CAVE and head-mounted displays, and real-time/dynamic DHM tools such as the Lumbar Motion Monitor (LMM), could also be considered.
Acknowledgements The authors would like to thank the project team members Michael Criswell, Karalyn Tellio, Bob Reading, Jenny Bitzan and Tim Sparks as well as the professors George T. –C. Chiu, Steve Witz and Lee Schwarz for their valuable contributions in the development of the proposed code cart design.
References 1. Sundin, A., Örtengren, R.: Digital Human Modeling for CAE Applications. In: Salvendy, G. (ed.) Handbook of Human Factors and Ergonomics, pp. 1053–1078. John Wiley & Sons Inc., Chichester (2006) 2. Li, Z.: Anthropometric Topography. In: Karwowski, W. (ed.) International Encyclopedia of Ergonomics and Human Factors, pp. 266–270. Taylor & Francis, Boca Raton (2006) 3. Yang, J., Abdel-Malek, K., Farrell, K.: The IOWA Interactive Digital-Human Virtual Environment. In: The 3rd Symposium on Virtual Manufacturing and Application, Anaheim, CA 4. Duffy, V.G. (ed.): Handbook of Digital Human Modeling: Research for Applied Ergonomics and Human Factors Engineering. CRC Press, Boca Raton (2008) 5. Clarkson, J.P., Ward, J.C.: Human Factors Engineering and The Design of Medical Devices. In: Carayon, P. (ed.) Handbook of Human Factors and Ergonomics in Health Care and Patient Safety, pp. 367–383. Lawrence Erlbaum Associates, Philadelphia (2006) 6. Tan, H.Z.: Why I Work on Haptic Interfaces. Tan, H.Z. Personal Home Page (2008), http://cobweb.ecn.purdue.edu/~hongtan/ (retrieved October 25, 2008) 7. UC Davis Health System. Code Cart [Image], http://www.ucdmc.ucdavis.edu/cne/resources/clinical_skills_refresher/crash_cart/ (retrieved January 25, 2009) 8. Demirel, H.O.: Thesis Proposal: Sensory Feedback Mechanism for Virtual Build Methodology. Purdue University, Industrial Engineering, West Lafayette, IN (2009) (unpublished manuscript) 9. Demirel, H.O., Criswell, M., Tellio, K., Reading, B., Bitzan, J., Sparks, T.: Crash Cart Innovation - When Time Matters Most (2007) (unpublished manuscript) 10. Chiang, J., Potvin, R.J., Stephens, A.: The Use of Physical Props in Automotive Assembly Motion Capture Studies. In: SAE Digital Human Modeling for Design and Engineering, Pittsburgh (2008) 11. Wu, T.: Reliability and Validity of Virtual Build Methodology for Ergonomic Analyses. Mississippi State University, Industrial Engineering (2005) 12. Tian, R.: Thesis: Validity and Reliability of Dynamic Virtual Interactive Design Methodology. Mississippi State University, Industrial Engineering, Starkville, MS (2007) 13. PPS, P.P.: Capacitive Sensing, Pressure Profile Systems, PPS (2008), http://www.pressureprofile.com/technology-capacitive.php (retrieved October 25, 2008)
A Methodology for Modeling the Influence of Construction Machinery Operators on Productivity and Fuel Consumption Reno Filla1,2 1
Volvo Construction Equipment AB, Research & Development, SE – 631 85 Eskilstuna, Sweden 2 Linköping University, Department of Management and Engineering, SE – 581 83 Linköping, Sweden
Abstract. This paper is concerned with modeling the actions of a human operator of construction machinery and integrating this operator model into a large, complex simulation model of the complete machine and its environment. Because human operators to a large degree affect how the machine is run, adaptive operator models are a necessity when the simulation goal is quantification and optimization of productivity and energy efficiency. Interview studies and test series have been performed to determine how professionals operate wheel loaders. Two models using different approaches were realized and integrated into a multi-domain model for dynamic simulation. The results are satisfactory and the methodology is easily usable for other, similar situations. Keywords: dynamic simulation, operator model, driver model.
1 Introduction In this ongoing research on simulation in the conceptual design of complex working machines, a wheel loader was chosen as the object of study, although others can be found not only in the field of construction machinery, but also in other sectors such as agriculture, forestry, and mining. Common factors are that these machines consist of at least two working systems that are used simultaneously and that the human operator is essential to the performance of the total system. In the case of a wheel loader, drive train and hydraulics are both equally powerful and compete for the limited engine torque. Figure 1 visualizes how the primary power from the diesel engine is split up between hydraulics and drive train (outer loop) in order to create lift/tilt movements of the bucket and traction of the wheels, but is connected again when filling the bucket in e.g. a gravel pile. In this situation, the traction force from the drive train, acting between wheels and ground, creates a reaction force between gravel pile and bucket edge, which in turn counteracts lift and tilt forces from hydraulics, and vice versa [1]. The inner loop in Fig. 1 shows how the human operator interacts with the wheel loader. In order to fill the bucket, the operator needs to control three motions simultaneously: a forward motion that also exerts a force (traction), an upward motion (lift)
Fig. 1. Simplified power transfer and control scheme of a wheel loader during bucket loading
and a rotating motion of the bucket to fit in as much material as possible (tilt). This is similar to how a simple manual shovel is used. However, in contrast to a manual shovel, the operator of a wheel loader can only observe, not directly control, these three motions. Instead, he or she has to use different subsystems of the machine in order to accomplish the task. The gas pedal controls engine speed, while the lift and tilt levers control valves in the hydraulics system that ultimately control movement of the linkage's lift and tilt cylinder, respectively. The difficulty lies in that no operator control directly affects only one single motion. The gas pedal controls engine speed, which affects both the machine's longitudinal motion and, via the hydraulic pumps, the speeds of the lift and tilt cylinders. The linkage between the hydraulic cylinders and the bucket acts as a non-linear planar transmission and, due to its design, a lift movement will also change the bucket's tilt angle and a tilt movement affects the bucket edge's height above the ground. In summary, there are many interdependencies and it thus takes a certain amount of training to be able to use the machine efficiently. In modern wheel loaders the operator does not control major components and subsystems directly, but via electronic control units (ECUs). This makes it possible to give the operator support, e.g. by controlling the cylinder speeds in such a manner that the non-linearity of the linkage is compensated for and thus the speed of bucket lift and tilt is proportional to the angle of the tilt and lift lever. Certain aspects of machine operation, for instance a typical brake-and-reverse driving sequence, can also be developed to be semi-automatic or fully automated. If a simulation is required to capture the full scope of the interaction between the machine, its environment, and its operator, all three must be modeled at an appropriate level of detail in order to give valid results as regards such complex total system properties as productivity and fuel consumption / energy efficiency. This is an aspect that is traditionally neglected, because the modeling needs to be extended beyond the technical system.
2 Literature Study 2.1 Working Cycle The knowledge regarding wheel loader operation and working cycles presented in this paper has been derived from the author's own wheel loader experience and to a large extent from discussions with colleagues, test engineers, product specialists, and professional operators at Volvo. Most of these discussions were not formally structured, but rather conducted in an ad-hoc manner. However, some unstructured research interviews were carried out and one interview, conducted with a professional test operator, was recorded in the form of a semi-structured research interview. Furthermore, many measurements were performed and the results and implications discussed. Most of these reports are internal, but some MSc theses [2, 4, 5] and academic papers [6] are available in the public domain. In [7], Gellerstedt published wheel loader operators' thoughts and reasoning and also documented some typical working cycles with photos and test data. Furthermore, non-academic publications like operating manuals and instruction material available from machine manufacturers contain useful information. 2.2 Operator Models Surprisingly little could be found in a literature review restricted to working machines and designing operator models for the purpose of simulation. Zhang et al. validate control strategies by conducting human-operated experiments in their Earthmoving Vehicle Powertrain Simulator. The corresponding paper [3] gives some insight into their reasoning regarding human-machine interaction. They acknowledge the difference between task-oriented jobs (such as wheel loader operation in a short loading cycle) and reference-oriented jobs (e.g. driving a car). Some more work can be found in the field of autonomous excavation. Hemami specifically examines bucket filling in [8]; Wu [9], starting off by analyzing wheel loader operation in general, later focuses on the bucket-filling phase. Both handle the problem as one that can be solved by following predefined trajectories. An abundance of literature in the aerospace and automotive sectors deals with pilot models and parameter identification for the purpose of predicting pilot behavior over the next seconds [10-14]. Among the techniques employed are path-following controllers, Kalman filters, fuzzy logic, and neural networks. Such models are used for advanced control problems like pilot or driver assistance systems and energy management of hybrids. But the inverse problem is also considered: assessing handling qualities in certain predefined maneuvers.
3 Development Methodology 3.1 Required Model Features Since the application of the operator model is in simulation in conceptual design, i.e. before any physical prototype is available, it has been deemed important that the model not be hard-coded in any way. Using fixed time, speed or position references
and predefined trajectories, there is a significant risk that these references are only valid for the machine that was used during development of the operator model, but result in large deviations for a new machine with as yet unknown properties. Therefore, any such references must be weak ones and either constant for all machines of any size and architecture, or possible to formulate parametrically, i.e. as a function of bucket length, loading capacity, wheel distance or similar. Also, in order to mimic the human operator as closely as possible, the operator model must be strictly separated from both machine and environment model, and its inputs and outputs must be limited to those of a human operator. 3.2 Orientating Interviews These interviews are unstructured and aim to establish basic knowledge of the type of cycle to model, its features, and its characteristics. In our work, interviewing owners and operators of wheel loaders at different sites has led to the conclusion that each working place is unique in its parameters, but the short loading cycle (Fig. 2) is highly representative of the majority of applications.
Fig. 2. Short loading cycle
Typical for this cycle is bucket loading of material on an adjacent load receiver within a time frame of 25-35 seconds. The aforementioned problems with interactions between subsystems are highly present. Several phases can be identified (Fig. 2), which Table 1 describes briefly. A more detailed exploration of the short loading cycle can be found in [1]. Essentially all operators described their operation commands as triggered by events, both from the machine and its environment, sometimes guided by weak speed or position references. It has therefore been found meaningful to develop rule-based operator models and all subsequent steps are based on that decision.
Table 1. Phases of the short loading cycle

#   Phase                       Description
1   Bucket filling              Bucket is filled by simultaneously controlling the machine speed and lift and tilt functions.
2   Leaving bank                Operator drives backwards towards the reversing point and steers the machine to achieve the characteristic V-pattern.
3   Retardation                 Is started some time before phase 4 and can be either prolonged or shortened by controlling the gas pedal and the service brakes.
4   Reversing                   Begins when the remaining distance to the load receiver will be sufficient for the lift hydraulics to achieve the bucket height necessary for emptying during the time it takes to get there.
5   Towards load receiver       The operator steers towards the load receiver, thus completing the V-pattern. The machine arrives perpendicular to the load receiver.
6   Bucket emptying             The machine is driven forward slowly, the loading unit being raised and the bucket tilted forward at the same time.
7   Leaving load receiver       Operator drives backwards towards the reversing point, while the bucket is lowered to a position suitable for driving.
8   Retardation and reversing   Not necessarily executed at the same location as in phases 3 and 4, because lowering an empty bucket is faster than raising a full one.
9   Towards bank                The machine is driven forward to the location where the next bucket filling is to be performed, the bucket being lowered and aligned with the ground at the same time.
10  Retardation at bank         Often combined with the next bucket filling by using the machine's momentum to drive the bucket into the gravel pile.
3.3 In-Depth Interviews In this next step, semi-structured research interviews are conducted with professional operators who are able to verbally express how they use the machine in the working task at hand. It is important to go through each cycle phase in detail, noting what event triggered a reaction, which controls are applied and how long, alternate scenarios, what defines success, what defines failure etc. In our work we were able to interview one product specialist who was also a professional test operator and, more importantly, experienced machine instructor and thus used to teaching people how to operate wheel loaders in an efficient manner. Additionally, many brief unstructured interviews were performed with colleagues of a similar background. Table 2 lists one result: a guide to how to fill the bucket. As another example, Table 3 shows how phases 2 to 4 are performed, i.e. leaving bank, retardation, and reversing.
Table 2. Bucket filling

#   Description
1   Accelerate to a certain velocity
2   Shift to lowest gear (kick-down) when bucket edge meets gravel pile
3   Apply gas pedal to increase traction and push bucket into the pile
4   In case of slippage between tires and ground, lift bucket a little more to increase load on front axle (increases traction)
5   If bucket gets stuck in pile, tilt bucket backwards a little
6   If bucket gets stuck in pile, reduce traction a little
7   Follow the slope of the pile in a carving manner
8   Lift continuously, apply tilt function without releasing lift lever
9   Leave pile with bucket below tipping height (straight lifting arms)
10  Tilt bucket fully back

Table 3. Leaving bank, retardation, and reversing

#   Description
1   Put transmission into reverse gear
2   Start lifting
3   Apply gas pedal to accelerate, but keep machine speed below shifting point for gear 3
4   Steer machine into right curve (if load receiver is standing to the left)
5   Adjust steering so that machine can arrive perpendicular to the load receiver
6   Choose reversing point so that distance to load receiver is enough for bucket to reach sufficient height for emptying, if continuously lifted
7   Retard economically by releasing gas pedal
8   Steer back machine into straight position
9   Apply brakes until machine almost stands still
10  Put transmission into forward gear
3.4 Recording and Analyzing Test Series Together with video recordings of the machine in operation and possibly also from inside the cab, measurement data can serve as an additional source of information. In our work we equipped test machines with a data acquisition system and recorded control inputs and major machine variables, such as engine speed, traveling distance, bucket height and angle, etc. We also found video recordings of the machine in operation to be a valuable complement. 3.5 Deriving General Rules and Constraints The results from interview studies and possibly additional measurements are now to be transformed into general, non-machine-specific rules and constraints. This is the last step before coding the operator model, which makes it necessary to construct the rules and constraints in a way that is possible to implement. In our work one result was a quantification of values for "a little" and "certain", the vague vocabulary used in Table 2. For instance, the initial velocity in rule #1 was set to 3 km/h and the trigger for kick-down in rule #2 was set to a bucket penetration
620
R. Filla
depth of 200 mm. The height in rule #9 was set to 1/3 of maximum lifting height. Also, the working place was set up parametrically; more details can be found in [16]. 3.6 Implementation The derived general rules and constraints are now to be implemented as execution paths of a finite state machine. In our work two operator models have been realized. In both cases the model of the technical system and the working place were developed and simulated in ADAMS, a three-dimensional multi-body system code. Both operator models are separate entities and the information exchange with machine and environment is limited to operator inputs and outputs similar to a human being, i.e. the operator has neither insight nor influence at a deeper level. The first operator model focused on bucket filling and was realized as state equations in ADAMS with an algorithm reminiscent of fuzzy logic. This is explained in more detail in [15]. The second operator model, with a focus on the remaining phases of the short loading cycle, was realized in Stateflow (Fig. 3) and co-simulated with ADAMS [16]. Table 4 shows how cycle phase 2 has been realized with two parallel paths.
Fig. 3. Top level view of the operator model in Stateflow [16]

Table 4. Applied steps in phase 2 of the operator model

#   Description
A1  Apply full lift and full gas pedal.
A2  Steer machine so that reversing will take place with machine pointing at workplace origin (angle at ca 45°). Calculate required steering angle.
A3  Apply full steering, reduce and stop when required steering angle is reached.
A4  Begin steering back when machine is about to point at workplace origin.
A5  Keep driving backwards until path B terminates this phase.
B1  Wait for machine to pass load receiver (first geometric possibility to reverse).
B2  Wait for extrapolated remaining lifting time to be lower than remaining extrapolated driving time to load receiver.
B3  Calculate required steering angle for perpendicular arrival at load receiver. Terminate this phase if within achievable limits.
B4  Go to phase 2a (extension, continued straight driving until steering possible).
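The sketch below illustrates, in Python rather than Stateflow, how Table 4's two parallel paths can be expressed as an event-driven state machine polling only operator-visible signals. Every attribute and method name on the machine interface (lift_lever, pointing_at_origin, remaining_lift_time, etc.) is a hypothetical stand-in; the published model is the Stateflow chart of Fig. 3, and this reduction omits several of Table 4's sub-steps.

```python
# Illustrative Python reduction of Table 4's phase-2 logic: path A issues
# control actions, path B watches for the weak, machine-independent
# termination conditions. All `machine` members are assumed names.

class Phase2Operator:
    def __init__(self, machine):
        self.m = machine
        self.state_a = "A1"

    def step(self):
        # Path A: control actions triggered by observable events.
        if self.state_a == "A1":
            self.m.lift_lever = 1.0            # A1: full lift...
            self.m.gas_pedal = 1.0             # ...and full gas pedal
            self.state_a = "A2"
        elif self.state_a == "A2":
            # A2/A3: steer so reversing happens pointing at the workplace
            # origin (ca 45 deg), stopping once the target angle is reached.
            self.m.steering = self.m.required_steer_to_origin()
            if self.m.pointing_at_origin():
                self.state_a = "A4"
        elif self.state_a == "A4":
            self.m.steering = 0.0              # A4: steer back, keep reversing

        # Path B: terminate once the machine has passed the load receiver (B1)
        # and the extrapolated lifting time fits within the driving time (B2).
        if (self.m.passed_load_receiver()
                and self.m.remaining_lift_time() < self.m.remaining_drive_time()):
            return "phase_3"                   # B3/B4 refinements omitted
        return "phase_2"
```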
3.7 Validation With the operator model implemented, it can now be validated by running simulations with varying parameter setups and comparing to results from test series. In our work, both operator models have been tested by varying the machine's technical parameters. For instance, changing the machine's torque converter to a weaker characteristic (as we also did in our measurement series) leads to a slightly different operating style with generally higher engine speeds, since the operator adapts and quickly finds the necessary amount of gas pedal angle to control traction force. Exactly this phenomenon has also been shown to occur in our simulations, without any explicit coding of it (see Fig. 4, results from first operator model).
Fig. 4. Engine load duty for machines with different torque converters (first model) [15]
Fig. 5. Adaptation to lifting speed (second model) [16]
Another example is that of changed speed in the bucket’s lift hydraulics, emulating insufficient pump capacity. A human operator adapts to this by reversing farther with the wheel loader until finally driving forward to the load receiver to dump the bucket’s load. This could also be demonstrated in our simulations without any modification of the operator model. Figure 5 shows traces from experiments with the second operator model (black curve with circular markings: lower lifting speed).
4 Conclusion and Outlook With both presented operator models, a "human element" has been introduced to dynamic multi-domain simulation of complete construction machinery. Both operator models have been derived from interview studies in a fairly straightforward manner. They prescribe the machine's working cycle in a more generic way, independently of the machine's technical parameters. Due to this, whole components or sub-systems can be changed in their characteristics without compromising the validity of the simulation. This gives more relevant answers with respect to total machine performance, productivity, and fuel consumption. The results are satisfactory and the methodology is easily usable for other, similar situations. In the future, we will try to develop operator models with which a machine's operability can be predicted using simulation. As noted earlier, in automotive and aerospace simulations this is usually achieved by analyzing the control effort required to perform predefined maneuvers. A weighted, piece-wise analysis of prominent cycle phases (e.g. bucket filling, reversing and bucket emptying) might be a way forward. In order to validate any such simulations, we will also work with quantification of work load by simultaneously performing in-depth measurements on a wheel loader in operation and physiological measurements on the operator controlling the machine.
References 1. Filla, R.: Operator and Machine Models for Dynamic Simulation of Construction Machinery. Licentiate thesis, Linköping University, Linköping (2005), http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-4092 2. Vännström, J., Lindholm, S.: Educational Interface for Reducing Fuel Consumption. MSc thesis, Luleå University of Technology, Luleå (2007), http://epubl.luth.se/1402-1617/2007/114/ 3. Zhang, R., Carter, D.E., Alleyne, A.G.: Multivariable Control of an Earthmoving Vehicle Powertrain Experimentally Validated in an Emulated Working Cycle. In: Conference paper, ASME 2003 International Mechanical Engineering Congress and Exposition (2003) 4. Stener, P., Snabb, R.: Körbarhetskvantifiering av Hjullastare (Operability Quantification of Wheel Loaders). MSc thesis, Linköping University, Linköping (2008), http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-11148 5. Boman, M.: On predicting fuel consumption and productivity of wheel loaders. MSc thesis, Luleå University of Technology, Luleå (2006), http://epubl.ltu.se/1402-1617/2006/009/ 6. Filla, R.: Anläggningsmaskiner: Hydrauliksystem i multidomäna miljöer (Construction Machinery: Hydraulic Systems in Multi-Domain Environments). In: Conference paper, Hydraulikdagar (2003) 7. Gellerstedt, S.: Manövrering av hjullastare (Operation of Wheel Loaders). Technical report, JTI – Institutet för jordbruks- och miljöteknik, Uppsala (2002), http://www.jti.slu.se/publikat/rapporter/l&i/r-310sg.pdf 8. Hemami, A.: Motion trajectory study in the scooping operation of an LHD-loader. IEEE Transactions on Industry Applications 30, 1333–1338 (1994), http://dx.doi.org/10.1109/28.315248
9. Wu, L.: A Study on Automatic Control of Wheel Loaders in Rock/Soil Loading. Dissertation, University of Arizona, Tucson (2003), http://wwwlib.umi.com/dissertations/fullcit/3090033 10. Macadam, C.C.: Understanding and Modeling the Human Driver. Vehicle System Dynamics 40, 101–134 (2003), http://dx.doi.org/10.1076/vesd.40.1.101.15875 11. Summala, H.: Automatization, automation, and modeling of driver’s behavior. Recherche – Transports – Sécurité 66, 35–46 (2000), http://dx.doi.org/10.1016/S0761-8980 12. Plöchl, M., Edelmann, J.: Driver models in automobile dynamics application. Vehicle System Dynamics 45, 699–741 (2007), http://dx.doi.org/10.1080/00423110701432482 13. Bengtsson, J.: Adaptive Cruise Control and Driver Modeling. Licentiate thesis, Lunds Tekniska högskola, Lund (2001) 14. Anderson, M.R., Clark, C., Dungan, G.: Flight test maneuver design using a skill- and rule-based pilot model. In: Conference paper, IEEE International Conference on Intelligent Systems for the 21st Century (1995), http://dx.doi.org/10.1109/ICSMC.1995.538188 15. Filla, R., Ericsson, A., Palmberg, J.-O.: Dynamic Simulation of Construction Machinery: Towards an Operator Model. In: Conference paper, IFPE 2005 Technical Conference (2005), http://www.arxiv.org/abs/cs.CE/0503087 16. Filla, R.: An Event-driven Operator Model for Dynamic Simulation of Construction Machinery. In: Conference paper, 9th Scandinavian International Conference on Fluid Power (2005), http://www.arxiv.org/abs/cs.CE/0506033
Human Head 3D Dimensions Measurement for the Design of Helmets Fenfei Guo, Lijing Wang, and Dayong Dong School of Aeronautic Science and Engineering Beijing University of Aeronautics and Astronautics Beijing, 100191, China [email protected]
Abstract. As helmet systems become increasingly complex, head 3D dimensions are needed at higher precision, and traditional anthropometry cannot meet the required design accuracy. In this paper, a non-contact method of head 3D dimension measurement is presented to improve the design accuracy of helmets. The boundary 3D coordinate data of head slices were extracted from DICOM images based on MRI technology. The mathematical model of a head slice was described through 2D and 3D coordinate systems. We then adopted the Fourier transform to fit the slice boundary and obtained a parametric model with a series of Fourier coefficients. Standard headforms were constructed based on the characteristic slices, and nine standard headforms were distinguished by Head Breadth-length Index and Head Height-length Index in order to preserve analogous facial characteristics. The head 3D data measured by this approach have been applied to the design of helmets. Keywords: Head, 3D, Standard headform, Slice, Boundary.
1 Introduction The human head is the most complex part of the human body, with approximately 32 anthropometric landmarks [1]. Human head dimensions are the basis of the design of head protective equipment, such as helmets, gas masks, ear cups, visual devices, etc. With the advent of more complex helmet systems that include night vision goggles and helmet-mounted displays, as well as advanced sound attenuation components, the imprecision and inadequacy of the old style of anthropometry becomes very apparent. The traditional anthropometric methods require Frankfurt Plane orientation, which is made more difficult to accomplish by the fact that the anthropometric information available to most designers is misleading and can lead to poor helmet sizing [2]. Laser scanning technology for three-dimensional (3D) surface digitizing can capture hundreds of thousands of surface data points in a matter of seconds as the digitizer circles around the subject's head [3]. However, a large quantity of 3D data points affects processing speed and storage requirements. Data reduction algorithms are used to improve application efficiency, which reduces the accuracy of the head data measures [4]. An analysis technique was discussed which used the Fourier
transform to model two-dimensional (2D) horizontal cross-sections of the human head. In this study, stereo photometric points were transformed into values called curvatures, which were subsequently transformed into a series of Fourier coefficients. One difficulty noted by the authors was in the selection of the Fourier coefficients which are most useful in discriminating shapes. Another difficulty noted was the fact that very different entities, such as the ear and the nose, may contribute to the same Fourier coefficient on one person and not on another [5]. To support helmet system development, digital system integration technology is employed to accomplish 3D virtual assembly and unified sizing of helmets. Helmet system design is moving from modularization to integration, which makes mutual compatibility a necessity. Therefore, an advanced method of head 3D dimension measurement is needed to establish a head 3D database based on scientific theory and reliable data. This paper discusses an approach that provides non-contact measurement of head 3D surface anthropometric data for the design of helmets. The boundary coordinate data of head slices are acquired from 2D medical images. The mathematical model of a head slice is constructed through coordinate systems and the Fourier transform. Standard headforms are constructed based on the characteristic slices.
2 Materials and Methods 2.1 Acquiring Boundary Data of Head Slice The approach employed a slice-scanning method in which the head was scanned by Magnetic Resonance Imaging (MRI) using a Siemens Magnetom Trio 3T MRI scanner. The average head height of Chinese male adults is 223 mm (5th percentile 206 mm, 95th percentile 241 mm) [6]. To ensure data storage and processing efficiency while preserving accuracy, the scanning spacing was set to 5 mm; each head therefore yielded approximately 40 to 52 slices vertically from vertex to gnathion. The Frankfurt Plane of the subject's head was kept upright and laterally symmetric during the MRI scan, which took about 5 minutes. The head slice images, conforming to the Digital Imaging and Communications in Medicine (DICOM) standard, were exported from the MRI scanner. The scan parameters are given in Table 1.

Table 1. The scanner parameters set before the scan (sequence variant = slice thickness + spacing)

Scanning sequence:  Gradient-echo sequence
Slice thickness:    4.0 mm
Spacing:            1.0 mm
Sequence variant:   5.0 mm
Resolution:         512×448
Generally, a 3D image is stored as 2D slices in various formats, such as DICOM, TIFF, JPEG and BMP. So that all slices of each head could be processed together, the separate 2D image slices of each head were compiled into a single 3D stack (Fig. 1.A). The boundary of each head slice was composed of 2D data points (Fig. 1.B), so the processing of the image slices was extremely important, as it directly affects the
location of the data points and the accuracy of the slice boundary. Finally, the 3D coordinates of the head were extracted from the DICOM images and output as a data file. The head information was therefore stored as a 3D coordinate data file, detached from the original DICOM images. The data file occupied only about one percent of the storage space of the original images. When samples are large, the storage advantage is considerable.
Fig. 1. A. The MRI slices of one subject’s head were compiled together in sequence. B. The color curve is the boundary of one slice.
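A sketch of the stacking and boundary-extraction step described above is shown below, assuming one DICOM file per slice and a simple intensity threshold for separating head from background. The paper does not state its segmentation method, so the threshold value and the function names are assumptions.

```python
# Sketch: compile DICOM slices into a 3D stack and extract a slice boundary.
# Assumes pydicom and scipy are available; the threshold is a placeholder.
import numpy as np
import pydicom
from scipy.ndimage import binary_erosion

def load_stack(paths):
    """Read slices and stack them in order of slice location."""
    slices = sorted((pydicom.dcmread(p) for p in paths),
                    key=lambda s: float(s.SliceLocation))
    return np.stack([s.pixel_array for s in slices])

def boundary_points(slice_2d, threshold=100):
    """Return (row, col) coordinates of the outer skin boundary."""
    mask = slice_2d > threshold            # crude head/background split
    edge = mask & ~binary_erosion(mask)    # pixels on the contour
    return np.argwhere(edge)
```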
2.2 Mathematical Model of Head Slice Boundary The 3D coordinates extracted from the images can be used to construct a 3D head model, but such a model does not directly meet the needs of specific applications, for two reasons: (1) the data points of different head samples differ greatly and cannot be used directly for statistical analysis; and (2) a large quantity of 3D data points affects processing speed and storage efficiency. The head slice boundary therefore needs to be described by a mathematical function whose parameters replace the raw data points [7]. We assume that the vertex is the coordinate origin O, the O-XZ plane is the midsagittal plane, and the Z axis points upwards, perpendicular to the head slices. A Cartesian coordinate system was thus established (see Fig. 2). The slices of one headform share the same plane coordinate system o-xy, which can be converted into a polar coordinate system. The pole was moved from the origin O towards the face to better fit the complex facial figure, and the pole offset Xo of each headform was chosen appropriately from measurement of the second slice, with Xo roughly between 20 mm and 30 mm.
Fig. 2. The coordinate system of head slice, including a 3D coordinate system and a 2D coordinate system
Therefore, the contour of every slice was noted by polar radius ρ(θ), and the pole coordinates of every slice i was noted as (Xo, 0, Zi) in the Cartesian coordinate system. The coordinates j of boundary point were calculated using ⎧⎪ X j = ρ j cos θ j + X o ⎨ ⎪⎩Y j = ρ j sin θ j
(1)
The contour of the slice boundary at the pronasale plane, computed with the above formula, is depicted in Fig. 3.A. A number of small sawteeth, resulting from the image resolution and the uneven distribution of data points, are clearly visible in the curve.
Fig. 3. A. The contour curve of the boundary at the pronasale plane, plotted in Matlab. B. The same boundary after fitting with the Fourier transform.
The fitting curve must accurately reconstruct the closed boundary of the head slice and be smooth at the junction. A Fourier series of period 2π was used to fit the slice curve, as in the following formulas:
ρ i (θ ) =
ai ,0 2
12
+ ∑ (ai ,n cos nθ + bi ,n sin nθ )
(2)
n =1
π
1
ai , n =
π∫
bi ,n =
1
−π
π∫
π
−π
f i (θ ) cos nθ dθ
(3)
f i (θ ) sin nθ dθ
(4)
where i is the slice number, n = 1–12, and a_{i,n} and b_{i,n} are series coefficients calculated by integrating the original boundary curve f_i(θ). Formula (2) describes a 3D stratified head model: the boundary data of each slice are completely described by only 25 parameters, so all 3D dimensions of a headform can be calculated from the pole offset Xo and the slice parameters a_{i,n}, b_{i,n} and Z_i. The fitting curve for the same slice is depicted in Fig. 3.B. 2.3 Constructing Standard Headforms A helmet designed using anthropometric dimensions should be applicable to groups of persons, which requires obtaining a standard headform by statistical methods. The standard headform was calculated by averaging a number of headform samples. Given the complexity of head characteristics, the different characteristics must be distinguished in a definite way. With the mathematical model of the slice boundary above, the various characteristics of a head are stored in the parameters of each slice. Although the slice figures of the various samples differed, the shapes were continuous and smooth. The facial midline of the headforms was used to mark facial characteristics in the vertical direction (Z axis direction, see Fig. 2) on the mid-sagittal plane. Ten characteristic slices were defined on the headform and distinguished by a Matlab program: the glabella, nasion, pronasale, subnasale, labrale superius, stomion, labrale inferius, supramental, pogonion, and gnathion slices [8] (see Fig. 4). These ten characteristic planes represent the facial characteristics and provide a basis for constructing standard headforms. To obtain standard headforms in the 3D coordinate system, the function descriptor of slice i, the Z_i value of slice i and the pole offset Xo were averaged using the following formulas:
∑a
i ,0
2M
12
+ ∑( n =1
∑a M
Zi =
Xo =
i ,n
cos nθ +
∑Z
i
M
∑ Xo M
∑b
i ,n
M
sin nθ )
(5)
(6)
(7)
where M is the sample number, and Z̄_i and X̄_o are respectively the Z coordinate value and the pole offset of slice i of the standard headform. The Fourier coefficients are averaged directly, so the computation is largely reduced by formula (5). For symmetrical standard headforms, the original boundary curve f_i(θ) can be described by even functions and the sine coefficients of formula (4) are zero. For unsymmetrical standard headforms, the values of the sine coefficients are very small and can be omitted, as for symmetrical headforms. Formula (2) was therefore simplified to the following formula:

ρ̄_i(θ) = ā_{i,0}/2 + Σ_{n=1}^{12} ā_{i,n} · cos nθ        (8)
As the symmetry of a single headform sample is imperfect, a symmetrization process was applied to the standard headforms. Therefore, the standard headforms were described by only 13 parameters: Xo, a_{i,n}, and Z_i.

Fig. 4. Ten characteristic points of the facial midline, corresponding to ten horizontal cross-sections (v―vertex, g―glabella, n―nasion, prn―pronasale, sn―subnasale, ls―labrale superius, sto―stomion, li―labrale inferius, sp―supramental, pog―pogonion, gn―gnathion)
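To make formulas (1)–(5) concrete, the sketch below fits one slice boundary with the 12-term series and averages the coefficients over M heads. Numerical trapezoidal integration stands in for the analytic integrals of (3)–(4), and the sampling and sorting details are assumptions rather than the authors' Matlab implementation.

```python
# Sketch of the slice model: boundary points are put in polar form about the
# shifted pole (formula 1), fitted with the 12-term Fourier series
# (formulas 2-4), and averaged over samples (formula 5).
import numpy as np

N_TERMS = 12

def fourier_fit(x, y, x_o):
    """Return coefficient arrays a (n = 0..12) and b (n = 1..12)."""
    theta = np.arctan2(y, x - x_o)              # angle about pole (Xo, 0)
    rho = np.hypot(x - x_o, y)                  # polar radius
    order = np.argsort(theta)                   # integrate over -pi..pi
    theta, rho = theta[order], rho[order]
    a = np.array([np.trapz(rho * np.cos(n * theta), theta) / np.pi
                  for n in range(N_TERMS + 1)])
    b = np.array([np.trapz(rho * np.sin(n * theta), theta) / np.pi
                  for n in range(1, N_TERMS + 1)])
    return a, b

def reconstruct(a, b, theta):
    """Evaluate formula (2) at the given angles."""
    return a[0] / 2 + sum(a[n] * np.cos(n * theta) + b[n - 1] * np.sin(n * theta)
                          for n in range(1, N_TERMS + 1))

def average_coeffs(samples):
    """Formula (5): element-wise mean of (a, b) pairs over M sample heads."""
    a_mean = np.mean([a for a, _ in samples], axis=0)
    b_mean = np.mean([b for _, b in samples], axis=0)
    return a_mean, b_mean
```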
3 Results We measured a large number of male adult subjects' heads and established a database of head 3D dimensions. Head Length L, Head Breadth B and Head Height H were calculated from the polar coordinates, and we defined the Head Breadth-length Index (BI) and the Head Height-length Index (HI) as
BI = (B / L) · 100        (9)

HI = (H / L) · 100        (10)
The head samples were divided into three groups on each of the two indexes: Middle, Round and Superround based on BI; Fit, Tall and Supertall based on HI. Nine standard headforms were obtained from the 2D distribution of BI and HI [9] (see Table 2). This grouping preserved analogous facial characteristics within each standard headform and prevented mixing different facial characteristics in one headform. Table 2. Nine standard headforms divided by BI and HI
BI \ HI    ≤119.99                     ≤129.99                      ≥130.00
≤79.99     Middle-fit headform         Middle-tall headform         Middle-supertall headform
≤89.99     Round-fit headform          Round-tall headform          Round-supertall headform
≥90.00     Superround-fit headform     Superround-tall headform     Superround-supertall headform
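A small classification helper, sketched from formulas (9)–(10) and Table 2, follows. Treating the bands as contiguous (e.g. BI up to 79.99 is Middle, up to 89.99 Round, and above that Superround) is an assumption about how the boundary gaps in the table are meant to be read.

```python
# Sketch: map head length/breadth/height (mm) to one of the nine headforms.
# Band edges follow Table 2; the contiguous-band reading is an assumption.

def classify_headform(L, B, H):
    BI = B / L * 100   # Head Breadth-length Index, formula (9)
    HI = H / L * 100   # Head Height-length Index, formula (10)
    breadth = "Middle" if BI <= 79.99 else "Round" if BI <= 89.99 else "Superround"
    height = "fit" if HI <= 119.99 else "tall" if HI <= 129.99 else "supertall"
    return f"{breadth}-{height} headform"

print(classify_headform(L=190, B=160, H=230))   # -> "Round-tall headform"
```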
4 Conclusion In this paper we described a method that measures head 3D surface dimensions for the design and sizing of helmets. The boundary coordinate data of head slices were acquired from DICOM images using MRI technology. The mathematical model of the head slice was described through 2D and 3D coordinate systems, and the boundary was then fitted by the Fourier transform. The standard headforms were constructed by averaging on the basis of the characteristic slices. Finally, nine standard headforms were distinguished based on BI and HI in order to preserve analogous facial characteristics. The database of head 3D dimensions will provide basic data for the design of head equipment, and helmet sizing based on the headforms can be applied to the design of helmets. The People's Republic of China national standard for the 3D dimensions of male adult headforms was completed on the basis of this study of head 3D dimension measurement. Future research planned includes extending the technique to the algorithm for the Fourier coefficients, and developing methods for fitting the slice boundary.
References 1. Martin, R., Saller, K.: Lehrbuch der Anthropologie. Gustav Fischer Verlag, Stuttgart (1957) 2. Robinette, K.M., Whitestone, J.J.: The Need for Improved Anthropometric Methods for the Development of Helmet Systems. J. Aviation, Space, and Environmental Medicine 65(5), 95–99 (1994)
3. Robinette, K.M., Whitestone, J.J.: Methods for Characterizing the Human Head for the Design of Helmets. AL-TR-1992-0061, Armstrong Laboratory, US Air Force Systems Command, Wright-Patterson AFB, Ohio (1992) 4. Nurre, J.H., Whitestone, J.J., Burnsides, D.B., Hoeferlin, D.M.: Issues for Data Reduction of Dense Three-Dimensional Data. AFRL-HE-WP-SR-2001-0008, US Air Force Research Laboratory, Wright-Patterson AFB, Ohio (2001) 5. Ratnaparkhi, M.V., Ratnaparkhi, M.M., Robinette, K.M.: Size and Shape Analysis Techniques for Design. J. Applied Ergonomics 23(3), 181–185 (1992) 6. GB/T 2428-1998, Head-face Dimensions of Adults (in Chinese) 7. Zuhua, G., Xiao, C., Hong, Z.: Standardization of Three Dimensional Head Model for Head and Face Equipment Design (in Chinese). J. Journal of Computer-aided Design & Computer Graphics 17(07), 1549–1555 (2005) 8. GB/T 5703-1999, Basic Human Body Measurements for Technological Design (in Chinese) 9. GJB 5477-2006, 3D Dimensions of Male Soldier Headforms (in Chinese)
Realistic Elbow Flesh Deformation Based on Anthropometrical Data for Ergonomics Modeling Setia Hermawati and Russell Marshall Dept. Design & Technology Loughborough University, Leicestershire LE11 3TU, UK {S.Hermawati,R.Marshall}@lboro.ac.uk
Abstract. The human model for ergonomic simulation has improved in terms of its reliability and appearance, and yet there seems to be less attention paid to creating realistic and accurate flesh deformation around the joint. This study, part of ongoing research, proposes a combination of manual and automatic (3D body scanner) measurements to create a database for flesh deformation prediction, i.e. flesh deformation area and cross-section changes, around the elbow joint. The database consists of two race groups, i.e. Caucasian and Asian (23 subjects, 11 males and 12 females), which were carefully chosen to represent a variety of heights and body types. The prediction results for both flesh deformation area and cross-section changes are discussed, as well as their relevance for the next stage of the study. Keywords: Flesh deformation modeling, 3D body scanner, ergonomics.
1 Introduction
A common use of human models in ergonomics applications is to evaluate how well a product or workplace accommodates people of diverse sizes and shapes. Locket et al. [1] pointed out that ergonomic human modeling tools are also used for visualization. The visualization provides information about body posture, reachability, field of view and clearances, which serves as a basis for ergonomics evaluation and decision making [2]. A study by Lämkull et al. [3] showed that visualization fidelity affects how effectively users engage with ergonomics applications, and they also noted a demand for more human-like virtual humans. This demand has been partly met by the recent development of Vis Jack, which allows the incorporation of 3D body scanner data to generate a more accurate and realistic virtual human. Despite this development, little attention has been paid to developing realistic yet accurate flesh deformation around the joints, which would be useful for addressing motion restrictions due to clothes, gloves, etc.
1.1 Related Studies
A review of flesh deformation methods revealed a number of different approaches.
The most realistic, but also the most complicated and computationally demanding, is anatomical deformation. This approach relies on an exact and accurate recreation of the bones and body tissue to simulate the flesh deformation [4]. Owing to its high computational demand, it is best suited to simulations requiring high degrees of accuracy, e.g., crash simulations. Another approach is physical deformation, which consists of a mass-spring system derived from the mechanical laws of particles [5]. Although this approach is less computationally demanding than the anatomical one, it is still considered less suitable for real-time applications and is used mainly for off-line simulation and animation [12]. The geometric approach, by contrast, is the cheapest in terms of computation and is hence widely used in various applications, e.g., 3D animation packages. One commonly used geometric variant is "skinning", in which the skin deformation is achieved by applying different weights to the skin vertices before they undergo rigid transformations driven by the movement of the skeleton. The major drawbacks of this approach are the lack of muscle bulging/swelling and the collapsing of joints at extreme positions, which require user intervention to correct. Several studies have been directed at overcoming these problems, but they unfortunately increased the complexity and lessened the appeal of the skinning method itself [6], [7], [8]. A more recent development is the example-based approach [9], [10]. This method involves training the skinning model by setting the weights such that they provide the closest possible geometry to a training set of example poses. Its drawback is that the more example poses are used, the more complex the motion representation becomes. Shen and Thalmann [11] and Hyun et al. [12] proposed a different, sweep-based approach, characterized by the use of cross sections to reconstruct and express the deformation. Hyun et al. [12] adopted this approach by fitting ellipses tightly to the cross sections and sweeping them along the link between the joints. These ellipses were then used to govern the deformation by changing their orientation according to the joint angle changes. The advantage of this approach is its ability to preserve volume when no self-intersection occurs. However, due to the collision detection algorithm, the interactive speed is slow and user intervention is sometimes needed when blending the body segments together. This review of the advantages and disadvantages of the existing approaches shows that they focus largely on one of two areas: (i) the representation of a visually convincing human model, or (ii) the representation of an anatomically correct human model. Both either require user intervention to adjust the model to achieve a satisfying result or exhibit a high computational cost, making them unsuitable for real-time use. Supported by a TC2-NX12 3D body scanner, this research has developed a new method of flesh deformation around the joint, with the elbow as a starting point, that is automatic (minimum user intervention), realistic (visually) and accurate (dimensionally). This paper addresses how the flesh deformation method can be applied to people with different anthropometric characteristics with minimum user intervention.
This feature is essential because the flesh deformation method is specifically aimed at ergonomics applications, which demand flexibility in simulating humans with diverse anthropometric characteristics. Further details of the flesh deformation
method can be found in Hermawati and Marshall [13]. A brief outline of the method follows.
1.2 Flesh Deformation Method
In order to deform the flesh around the elbow in any posture, the proposed method requires the establishment of five cross sections and a profile of the arm at so-called "key postures". The key postures are the arm in full extension, 135° flexion, 90° flexion and maximum flexion.
Fig. 1. Matching 3D arm scan data with photographs of the arm in the sagittal and coronal planes, and the locations of the five cross sections for a fully flexed arm
The locations of the five cross sections were determined by the flesh deformation area, i.e., the area bounded by where the lower and upper arm meet while the arm assumes maximum flexion (see Fig. 1). These cross sections were centered on the skeleton and lay on planes perpendicular to it, except for the elbow's plane, which changes continuously with the level of arm flexion. The arm profile in the sagittal plane was also obtained for each key posture. The far right image of Fig. 1 shows an example of the arm profile for a fully flexed arm.
Fig. 2. The proposed flesh deformation around the elbow joint
This information was then combined to produce the elbow flesh deformation at any flexion angle by means of linear interpolation. The process is shown in Fig. 2.
1.3 Application of the Flesh Deformation Method for a Wider Anthropometric Range
To enable the current method to simulate flesh deformation across a wide anthropometric range, the flesh deformation area, the key postures' cross sections and the arm profiles have to be predicted. Five inputs are required: a 3D arm scan in a fully extended supine posture, height, weight, race and gender. The whole process is illustrated in Fig. 3. A relationship between the flesh deformation area and body type/size is employed to predict the flesh deformation area of a 3D arm scan. A template, created from a predetermined 3D scan of a fully extended arm, is matched to the new 3D arm scan. Using the information from the predicted flesh deformation area, five cross sections are sampled through template matching.
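The interpolation step described above can be sketched as follows. This is our own illustrative Python (the data layout and names are hypothetical; the paper's actual implementation is not shown): intermediate cross sections are blended linearly between the two key postures that bracket the requested flexion angle.

```python
import numpy as np

def interpolate_cross_section(key_sections, angle):
    """Linearly blend cross sections between the two bracketing key postures.

    key_sections: dict mapping key flexion angle (deg; e.g. 180 = full
        extension, down to the subject's maximum flexion) to a (16, 2)
        array of sampled cross-section points.
    angle: requested elbow angle in degrees.
    """
    angles = sorted(key_sections, reverse=True)      # extension -> flexion
    angle = float(np.clip(angle, angles[-1], angles[0]))
    for hi, lo in zip(angles, angles[1:]):
        if lo <= angle <= hi:
            w = (hi - angle) / (hi - lo)             # 0 at hi, 1 at lo
            return (1 - w) * key_sections[hi] + w * key_sections[lo]
    return key_sections[angles[0]]
```

For example, a call with key angles 180°, 135°, 90° and 40° and a requested angle of 100° would blend the 135° and 90° sections.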
Fig. 3. Application of the flesh deformation method for wider anthropometrical range
To guide the deformation, a subject from the database who closely resembles the sampled cross sections has to be identified. The database stores cross-section information for all key postures from a range of subjects (n = 23), i.e., 260 cross sections altogether. Since the sampled cross sections assume full extension, only the cross sections of the fully extended arm from the database are required to determine the nearest subject. The remaining data for this subject, i.e., the cross sections for 135°, 90° and maximum flexion, are then used as a basis to predict the 135°, 90° and maximum flexion cross sections for the 3D arm scan.
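A minimal sketch of the nearest-subject lookup, assuming the database maps subject IDs to per-posture cross-section arrays (the names and data layout are our own, not the authors'):

```python
import numpy as np

def nearest_subject(database, sampled_extended):
    """Pick the database subject whose fully extended cross sections are
    closest (summed Euclidean distance over sampled points) to the query.

    database: dict subject_id -> dict posture -> list of (n, 2) arrays.
    sampled_extended: list of (n, 2) arrays, one per cross section.
    """
    best_id, best_cost = None, np.inf
    for sid, postures in database.items():
        ref = postures["full_extension"]
        cost = sum(np.linalg.norm(a - b, axis=1).sum()
                   for a, b in zip(sampled_extended, ref))
        if cost < best_cost:
            best_id, best_cost = sid, cost
    return best_id
```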
2 Methodology
Twenty-three subjects took part in this study and were grouped into four categories: Asian females, Asian males, Caucasian males and Caucasian females. Table 1 shows the mean and standard deviation of height, weight and BMI for each group. While each subject assumed a maximum flexion posture, markers were attached to the area where the lower and upper arm met, and the distances of these markers to the elbow crease were recorded as they changed for each key posture. A physical measurement was then carried out to capture the changes of the cross sections in the four key postures by circling
a flexible wire around the arm. The physical measurement was employed as an alternative means of capturing the flesh deformation because the 3D body scanner was unable to capture the detail of the flesh deformation for arm flexion ≥ 100°. Five cross sections were obtained during the physical measurement, located: 1) at the marker for the upper arm, 2) at the marker for the lower arm, 3) at the elbow/joint, 4) at the mid-section between marker and joint for the upper arm, and 5) at the mid-section between marker and joint for the lower arm. Once the physical measurements were completed, a series of arm photographs at various flexion angles was taken in both the sagittal and the coronal planes. Data collection finished with a scan of the subject's arm using the 3D body scanner.
Table 1. The mean and standard deviation of height, weight and BMI for each group
              Asian females (n=6)   Asian males (n=5)   Caucasian females (n=6)   Caucasian males (n=6)
Height (cm)   155±7.27              169.75±7.08         164.83±9.13               180±5.99
Weight (kg)   57.53±14.88           62.37±8.42          69.57±21.5                82.35±10.61
BMI           23.86±6.02            22.1±3.23           25.31±6.81                25.48±4.74
The first step of the data processing was to determine the orientation of the upper arm joint by matching the 3D arm scan to the coronal and sagittal photographs. This joint was automatically generated by the 3D body scanner. With the help of the markers, the lower arm joint was created. This process was followed by identifying parameters for the flesh deformation area of each key posture. For each key posture, four parameters were required: the two farthest points of the flesh deformation area on the upper arm (upper arm farthest, UAF) and lower arm (lower arm farthest, LAF), as well as two points between UAF/LAF and the elbow joint (upper arm middle, UAM; lower arm middle, LAM). In the initial anatomical posture, UAM and LAM were located approximately midway between UAF/LAF and the elbow joint, and as UAF and LAF changed position with respect to the elbow joint, so did UAM and LAM. Fig. 1 shows the result of matching the 3D arm scan and the locations of the five cross sections for maximum flexion. Digitized and traced cross sections were oriented, positioned and adjusted manually, assisted by the 3D arm scans and the photographs. Sixteen points were sampled for every cross section and their distances to the cross-section center were calculated (δ1…δ16). The same distances were also determined for the template cross sections. For each cross section, the distance differences with respect to its corresponding template were computed, e.g., ΔUAF,135° = UAF135°(δ1…16) − UAFtemplate(δ1…16). The results were stored in the database as shown in Table 2, where N is the total number of subjects. To compress the database, Principal Component Analysis was applied. All data processing, apart from digitization and tracing, which were done with Corel Trace, was completed in Pro-Engineer Wildfire 4.0. The flesh deformation area prediction and the cross-section prediction algorithms were implemented in Matlab.
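The sampling of the 16 distances (δ1…δ16), the differencing against the template, and the PCA compression might look as follows. This is an illustrative Python sketch with our own names and an equal-angle resampling assumption; the study itself used Matlab.

```python
import numpy as np

def radial_distances(section, n=16):
    """Resample a closed cross section at n equal angular steps and return
    the distances of those points to the section centre (delta_1..n)."""
    c = section.mean(axis=0)
    theta = np.arctan2(section[:, 1] - c[1], section[:, 0] - c[0])
    r = np.hypot(section[:, 0] - c[0], section[:, 1] - c[1])
    order = np.argsort(theta)
    targets = np.linspace(-np.pi, np.pi, n, endpoint=False)
    return np.interp(targets, theta[order], r[order], period=2 * np.pi)

def delta_row(sections, templates):
    """One database row: the differences against the template cross
    sections, concatenated over the key postures, e.g.
    [dUAF_ext, dUAF_135, dUAF_90, dUAF_max]."""
    return np.concatenate([radial_distances(s) - radial_distances(t)
                           for s, t in zip(sections, templates)])

def pca_scores(rows, n_components=5):
    """Compress the stacked rows (subjects x features) with PCA via SVD."""
    X = rows - rows.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T          # principal component scores
```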
Table 2. Data arrangement in the database

Row 1:     Δsubject1 UAF,full extension, Δsubject1 UAF,135°, Δsubject1 UAF,90°, Δsubject1 UAF,max
...
Row N:     ΔsubjectN UAF,full extension, ΔsubjectN UAF,135°, ΔsubjectN UAF,90°, ΔsubjectN UAF,max
...
Row N×5:   ΔsubjectN LAF,full extension, ΔsubjectN LAF,135°, ΔsubjectN LAF,90°, ΔsubjectN LAF,max
3 Results
3.1 Flesh Deformation Area Prediction
Data analysis was performed separately for each group to account for the effects of race and gender. To simplify the prediction process, once the predicted UAF location was established, it was used to generate the LAF location. This decision was based on the finding that the relationship between the UAF and LAF locations could be represented linearly, as shown in Fig. 4. For each key posture, the UAM and LAM locations were expressed as fractions of UAF and LAF, respectively, and were influenced by the arm angle. As an example, Fig. 5 shows the relationship between UAF and UAM for the Asian female group.
Fig. 4. Linear relationship between UAF and LAF (Upper Arm Farthest vs. Lower Arm Farthest, in mm) for a fully extended arm (n = 23)
Fig. 5. Relationship between the UAF/UAM ratio and the angle between the upper and lower arm; fitted curve: y = −6E−06x² − 0.0018x + 0.9478, R² = 0.9373
Two parameters, upper arm length and BMI, were used to predict the location of UAF by means of multiple regression. These parameters were chosen to represent the effects of body size and body type on the flesh deformation area. The error, defined as the maximum of the absolute deviation of the data from the model, is shown in Table 3. Once the UAF/UAM ratios for both full extension and flexion were found, the locations of UAF/LAF were computed. The entire process for predicting the flesh deformation area is shown in Fig. 6.

Table 3. The absolute value of the UAF location error for a fully extended and flexed arm

UAF error (mm)       Asian females   Asian males   Caucasian females   Caucasian males
Maximum extension    12.7957         10.421        22.8327             15.2706
Maximum flexion      3.9011          1.7325        9.0095              13.3917
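A minimal sketch of the multiple regression used to predict the UAF location from upper arm length and BMI (fitted separately per race/gender group; the names are ours and the exact model form is an assumption):

```python
import numpy as np

def fit_uaf_model(upper_arm_len, bmi, uaf):
    """Least-squares fit of UAF = b0 + b1*upper_arm_length + b2*BMI for
    one race/gender group (inputs are per-subject 1D arrays)."""
    A = np.column_stack([np.ones_like(bmi), upper_arm_len, bmi])
    coef, *_ = np.linalg.lstsq(A, uaf, rcond=None)
    return coef

def predict_uaf(coef, upper_arm_len, bmi):
    """Predicted UAF location (mm) for a new subject."""
    return coef[0] + coef[1] * upper_arm_len + coef[2] * bmi
```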
Fig. 6. Entire process of the flesh deformation area prediction
The flesh deformation area prediction was tested on three new data sets: one Asian female, one Asian male and one Caucasian male. Table 4 shows the errors for each test case.

Table 4. Euclidean distance error of the flesh deformation area prediction for the three test cases
                 BMI     Height (cm)   Max. extension error (mm)    Max. flexion error (mm)
                                       Upper arm    Lower arm       Upper arm    Lower arm
Asian female     20.45   165           0            2.57            2.68         2.58
Asian male       22.03   170           23.89        20.8            10.4         7.83
Caucasian male   23.33   177           0.53         0               4.16         7.99
3.2 Cross-Section Prediction
A template was matched to the 3D scan of a fully extended arm to obtain the cross sections at the five locations. To overcome the frequent occurrence of noisy and missing data around the apex of the 3D arm scan, which otherwise affected the sampling outcome for UAM and UAF, sagittal and coronal profiles of the 3D arm scan were used to correct the UAM and UAF cross sections automatically. To simplify the cross-section prediction process, all cross sections were transformed into a two-dimensional coordinate system. For the cross-section prediction, gender differences were disregarded, since an earlier observation had demonstrated that gender was irrelevant to the shape of the cross sections. Three different methods were applied to seek the best-matched subject from the database.
Because the sampled cross sections come from a fully extended 3D arm scan, only the cross sections of the fully extended arm from the database were required to determine the best-matched subject. For the upper part of the elbow, the relationship between UAF and UAM was employed to find the best-matched subject from the database. The cross-section changes of the best-matched subject were applied to the new 3D arm scan, followed by a scaling adjustment to accommodate differences in cross-section size. The relationship between UAM and UAF was defined as:

Y_ratio = (maximum UAF y value − minimum UAF y value) / (maximum UAM y value − minimum UAM y value)
X_ratio = (maximum UAF x value − minimum UAF x value) / (maximum UAM x value − minimum UAM x value)    (1)
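A short sketch implementing Eq. (1) and the resulting nearest-ratio match (our own Python, with an assumed database layout mapping subject IDs to (UAF, UAM) point arrays):

```python
import numpy as np

def extent_ratios(uaf, uam):
    """Eq. (1): ratios of the y- and x-extents of the UAF and UAM
    cross sections (each an (n, 2) array of 2D points)."""
    y_ratio = np.ptp(uaf[:, 1]) / np.ptp(uam[:, 1])   # ptp = max - min
    x_ratio = np.ptp(uaf[:, 0]) / np.ptp(uam[:, 0])
    return y_ratio, x_ratio

def best_match_upper(database, uaf, uam):
    """Database subject whose extended-arm UAF/UAM ratios are nearest."""
    target = np.array(extent_ratios(uaf, uam))
    return min(database,
               key=lambda sid: np.linalg.norm(
                   np.array(extent_ratios(*database[sid])) - target))
```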
For the lower part of the elbow and the elbow itself, Principal Component Analysis (PCA) was utilized to determine the best-matched subject from the database. PCA is useful for summarizing the features that distinguish the cross sections of one subject from another. PCA was applied to the E and LAM cross sections after subtracting the corresponding template cross sections. Matching was performed by comparing the PCA values for LAM: the subject with the PCA value closest to the target value was selected. Once the closest-matched subject for LAM was found in the database, the LAF cross sections were extracted from the same subject. A scaling adjustment to accommodate differences in cross-section size was also applied for the prediction of the lower part of the elbow. The elbow cross-section prediction was similar to that for the lower part of the elbow; however, instead of using one subject to guide the prediction, a linear interpolation between the two closest subjects was employed. This step was necessary because the elbow joint shape is much more complex than that of the lower part of the elbow. The cross-section predictions were validated by comparing them with either the cross sections of a 3D arm scan at an available key posture or the photographs. Comparison with the 3D arm scan was mostly available for 135° flexion and, in some cases, for 90° flexion (the upper part of the elbow only). For each cross section, the Euclidean distances between the predicted cross section and the 3D arm scan cross section were computed at 32 points to obtain the maximum absolute deviation and the average deviation.
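The interpolation between the two closest subjects for the elbow cross section might be sketched as below. The inverse-distance weighting is our assumption; the paper states only that a linear interpolation between the two closest subjects was used.

```python
import numpy as np

def blend_two_nearest(scores, target, elbow_sections):
    """Blend the elbow cross sections of the two subjects whose PCA scores
    lie closest to the target score, using inverse-distance weights.

    scores: (n_subjects, k) array; target: (k,) query score;
    elbow_sections: list of (n_points, 2) arrays, one per subject.
    Assumes the two nearest scores are not both identical to the target.
    """
    d = np.linalg.norm(scores - target, axis=1)
    i, j = np.argsort(d)[:2]                 # the two closest subjects
    w = d[j] / (d[i] + d[j])                 # closer subject -> larger w
    return w * elbow_sections[i] + (1.0 - w) * elbow_sections[j]
```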
Fig. 7. The outcome of the flesh deformation and cross-section prediction at three key postures for the three test cases
The maximum absolute deviation was ±5.46 mm and the average deviation was ±1.96 mm. Images of the outcome of the flesh deformation and the cross-section predictions are shown in Fig. 7.
4 Discussion
The validation of the flesh deformation area prediction shows a mixed outcome regarding its accuracy. This may be caused partly by the small number of samples in this study. With a sample of approximately six people in each group, there is a possibility that the true relationship between BMI, upper arm size and flesh deformation area is not correctly represented. In addition, the flesh deformation area is strongly affected by arm muscularity, which neither BMI nor upper arm length properly captures. Hence, additional parameters are likely required, e.g., body type or skinfold measurements. Another factor that might cause the mixed outcome is that, in this study, the flesh deformation area was defined while the participants assumed a fully flexed elbow. This means that the range of motion of elbow flexion also influences the flesh deformation area; yet, for simplicity, a uniform range of elbow flexion was applied during data analysis and prediction. Moreover, no correlation between the elbow range of motion and BMI was incorporated, despite a study [14] showing that BMI is negatively correlated with the elbow flexion range. The small amount of data in the database might also explain the relatively low accuracy of the cross-section predictions. Since the features of the cross sections represented in the database were limited, this could lead to an incorrect choice of the best-matched subject for the cross-section prediction. In addition, the combination of physical measurements and 3D arm scan data might be another factor contributing to the low accuracy of the cross-section predictions, although much care was taken to ensure the accuracy of the physical measurements by introducing the photographs for cross-section refinement. Nonetheless, even with such a small database to refer to, the results of this method demonstrate its potential for further development. The method could also be used for other simple joints such as the knee; however, complexity may arise in applying it to a complex joint such as the shoulder. In this study, even though only 23 participants were involved, the amount of work undertaken to gather and analyze their data was substantial. This might be seen as a drawback for a wider study, but it may no longer be a problem once 3D body scanning capabilities improve, e.g., marker detection, better handling of data occlusion, and better accuracy of joint allocation. The next stage of the study is to analyze the arm profiles from the collected data and then establish the prediction method for the arm profile of any 3D arm scan input. This step will be the last stage of the research, after which both the flesh deformation method and the flesh deformation prediction will be validated overall.
5 Conclusions
A flesh deformation model that can represent the deformation of the flesh at the elbow joint has been developed.
The flesh deformation area and cross-section predictions showed positive results regarding accuracy, despite being based on a limited range of anthropometric data. Furthermore, the predictions were acquired without user intervention, and their accuracy could be improved further by gathering a wider range of anthropometric data. This flesh deformation prediction fulfils the identified ergonomics application requirements: accuracy, no user intervention, and accommodation of various anthropometric parameters.
References
1. Locket, J.F., Assmann, E., Green, R., Reed, M.P., Rascke, R., Verriest, J.P.: Digital Human Modeling Research and Development User Needs Panel. In: Proc. SAE Digital Human Modelling for Design and Engineering Symposium. SAE, Iowa City (2005)
2. Wegner, D., Chiang, J., Kemmer, B., Lämkull, D., Roll, R.: Digital Human Modeling Requirements and Standardization. In: Proc. of Digital Human Modeling for Design and Engineering Conference and Exhibition. SAE, Seattle (2007)
3. Lämkull, D., Hanson, L., Örtengren, R.: The Influence of Virtual Human Model Appearance on Visual Ergonomics Posture Evaluation. Applied Ergonomics 38, 713–722 (2007)
4. Dong, F., Clapworthy, G.J., Krokos, M.A., Yao, J.: An Anatomy-Based Approach to Human Muscle Modeling and Deformation. IEEE Transactions on Visualization and Computer Graphics 8, 154–170 (2002)
5. Vassilev, T., Spanlang, B.: A Mass-Spring Model for Real Time Deformable Solid. East-West-Vision (2002)
6. Mohr, A., Tokheim, L., Gleicher, M.: Direct Manipulation of Interactive Character Skins. In: Proc. of the Symposium on Interactive 3D Graphics, pp. 27–30. ACM, New York (2003)
7. Kavan, L., Žára, J.: Spherical Blend Skinning: A Real-Time Deformation of Articulated Models. In: Proc. of the 2005 Symposium on Interactive 3D Graphics and Games, pp. 9–16. ACM, New York (2005)
8. Mohr, A., Gleicher, M.: Building Efficient, Accurate Character Skins From Examples. In: Proc. of International Conference on Computer Graphics and Interactive Techniques, pp. 562–568. ACM, New York (2003)
9. Allen, B., Curless, B., Popovic, Z.: Articulated Body Deformation From Range Scan Data. In: Proc. of the 29th Annual Conference on Computer Graphics and Interactive Techniques, pp. 612–619. ACM, New York (2002)
10. Wang, X.C., Phillips, C.: Multiweight Enveloping: Least Square Approximation Techniques for Skin Animation. In: Proc. of the 2002 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 129–138. ACM, New York (2002)
11. Shen, J., Thalmann, D.: Realistic Human Body Modeling and Deformation. Master Thesis, École Polytechnique Fédérale de Lausanne (1996)
12. Hyun, D.E., Yoon, S.H., Chang, J.W., Seong, J.K., Kim, M.S., Jüttler, B.: Sweep-based Human Deformation. The Visual Computer 21, 542–550 (2005)
13. Hermawati, S., Marshall, R.: Realistic Flesh Deformation for Digital Humans in Ergonomics Modeling. In: Proc. Digital Human Modeling for Design and Engineering Symposium. SAE, Pittsburgh (2008)
14. Golden, D., Wojcicki, J., Jhee, J.T., Gilpin, S.L., Sawyer, J.R., Heyman, M.B.: Body Mass Index and Elbow Range of Motion in a Healthy Pediatric Population. Journal of Pediatric Gastroenterology and Nutrition 46, 196–201 (2008)
Database-Driven Grasp Synthesis and Ergonomic Assessment for Handheld Product Design

Keisuke Kawaguchi, Yui Endo, and Satoshi Kanai

Graduate School of Information Science and Technology, Hokkaido University, Kita-14, Nishi-9, Kita-ku, Sapporo 060-0814, Japan
Abstract. Recently, simulation-based ergonomic assessments for handheld products, such as mobile phones, have seen growing interest and have been increasingly studied. These studies combine 3D product models with "digital hands", parametric 3D models of human hands. One of the keys to ergonomic assessment using the digital hand is "grasp synthesis", in which plausible grasp postures for the product model have to be generated. In this paper, we propose a new database-driven grasp synthesis method that considers the geometric constraints of grasping handheld products. The proposed method can generate more plausible grasp postures for handheld products, with easier interaction, than previous methods.
Keywords: digital hand, joint range of motion, grasp synthesis.
1 Background
Simulation-based ergonomic assessments have been applied to various product designs, such as automobiles and aircraft. On the other hand, simulation-based ergonomic assessments for handheld products, such as mobile information appliances, handy tools and containers, have not been studied much. Recently, in order to develop ergonomics-conscious handheld products, "digital hands" have been proposed and applied to the virtual assessment of ergonomics [1-6]. The digital hand is a deformable and precise 3D model of a human hand with rich dimensional variations. The key to assessing the ergonomics of a grasp, such as its stability, easiness and fitness, is a reliable "grasp synthesis" method that can generate plausible grasp postures for products. So far, a grasp synthesis method in which the user specifies corresponding contact points between a product model and a digital hand has been proposed [2]. However, inputting contact points that enable the generation of plausible grasp postures proved difficult for users. Alternatively, data-driven grasp synthesis [6] has been proposed; however, the generated postures included many in which the user would be unable to manipulate the product. Another data-driven synthesis [7], based on a neural network, has also been proposed, but its generated postures likewise included many improper ones. Both of these data-driven algorithms considered only the geometry of the product model, and therefore the generated postures wound up including many that were unsuitable for manipulating the products.
The purpose of this study is to solve these grasp synthesis problems by proposing a new data-driven grasp synthesis method, one which can generate only those grasp postures that are suitable for manipulating the products.
2 Related Work
Applications of 3D models of the human hand have been studied previously. Some research has proposed applications for the digital hand: for example, in robotics, a grasp planning system for robotic hand design has been proposed [8], and in computer graphics, a system for generating the finger motions involved in playing the guitar was proposed [9]. However, neither of these aimed at the ergonomic assessment of products, and their hand models lacked the accuracy required for such an assessment. Grasp posture generation methods for assessing product models have also been studied. These methods are roughly classified into two types: generative methods [1, 2] and variant methods [6]. In the generative method described in [2], the grasp posture is generated using a full-/semi-automatic grasping algorithm. Based on two pairs of contact points input by the user, this method can generate a grasp posture in which the digital hand surface fits the product model surface. However, selecting the two pairs of contact points needed to generate a plausible grasp posture requires trained users. Conversely, in the variant method of [6], the actual grasp postures of many subjects holding sample objects were measured using a data glove in order to build a grasp posture database. If a sample shape similar to a given product shape can be found in the database, a nearly appropriate grasp posture for the product can be obtained after a modification process. This method does not require the difficult task of inputting contact point pairs as in [2]; however, the generated postures included many in which the user is unable to manipulate the given product. In this paper, in order to solve the problems described above, we propose a new data-driven grasp synthesis method. By using the grasp postures in the database, the system does not require difficult inputs from the user. In addition, by imposing grasping constraints on the joint range of motion of the human upper limb and on the visibility of the fixation area, the system can generate grasp postures in which the user can manipulate the product. Furthermore, modification processes for the wrist and finger postures enable the system to refine the synthesized grasp posture into a more plausible one.
3 Digital Hand
In our grasp synthesis method, we use the digital hand proposed in [1]. It consists of a link structure model and a surface skin model, shown in Fig. 1(a). The link structure model simulates the rotational motion of the human hand bones at each hand joint. The model was constructed based on measurements obtained by MRI and motion capture. It has 17 links, each of which has a joint at both ends with 1, 3 or 6 DOF. The surface skin model is composed of a 3D triangular mesh. The skin model can be deformed using our surface skin deformation algorithm, which is driven by the finger joint rotation angles [2].
Fig. 1. Digital hand: (a) link structure and surface skin models; (b) hand size variations model
The digital hand has nine size variations derived from measurements of several hundred Japanese subjects. As shown in Fig. 1(b), the hand sizes are classified into nine variations based on hand thickness and length [10].
4 Algorithm of Plausible Grasp Synthesis
Fig. 2 shows the process of our proposed plausible grasp synthesis. Details of the process are described in the following subsections.
Fig. 2. Grasp Synthesis Process
Fig. 3. Constructing the grasp posture database: (a) feature points; (b) real grasp posture under measurement; (c) reproduced grasp posture of the digital hand; (d) representative points; (e) contact points; (f) alignment triangle
4.1 Constructing the Grasp Posture Database
To store grasp posture data for a real product in the database, the locations of the 21 feature points shown in Fig. 3(a) on the surface of a real human hand grasping an existing product were measured in advance by a contact-type 3D CMM (MicroScribe); an example measurement is shown in Fig. 3(b). The grasp posture was then reproduced on the digital hand, as shown in Fig. 3(c), by fitting the feature points of the digital hand to the measured locations. Separately, representative points, shown in Fig. 3(d), had been placed in advance on the surface of the digital hand. A small set of contact points, shown in Fig. 3(e), which may come into contact with the real product surface, was then manually selected from the representative points of the hand in the grasp posture. Triangles were subsequently generated for all combinations of three contact points, and the triangle with the largest area was selected as the alignment triangle, shown in Fig. 3(f), for the grasp posture. The combination of this grasp posture and its alignment triangle was stored in the grasp posture database; a sketch of the alignment-triangle selection follows below.
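The selection of the alignment triangle can be sketched directly from the description above (illustrative Python; the function name is ours):

```python
import numpy as np
from itertools import combinations

def alignment_triangle(contact_points):
    """Largest-area triangle over all combinations of three contact points.

    contact_points: (n, 3) array of selected contact point coordinates.
    Returns a (3, 3) array holding the alignment triangle's vertices.
    """
    def area(p, q, r):
        return 0.5 * np.linalg.norm(np.cross(q - p, r - p))
    best = max(combinations(contact_points, 3), key=lambda t: area(*t))
    return np.array(best)
```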
4.2 Generating Grasp Posture Candidates
When a new query product model whose shape differs from the stored products is entered into the system (Fig. 4(a)), the system generates grasp posture candidates for the query using the following process. First, as shown in Fig. 4(b), a set of representative points is generated on the surface of the query product model. Then a set of alignment triangles for the model, shown in Fig. 4(c), is generated by randomly selecting three points from the representative points. Finally, the system tries to match the alignment triangles of the query product model to those of the grasp postures stored in the database. As shown in Fig. 5(a), this matching is based on "triangle matching", which proceeds as follows:
1. The system translates the barycenter of the alignment triangle of the digital hand onto that of the product, as shown in Fig. 5(b).
2. The system rotates the alignment triangle of the digital hand about the axis given by the vector product of the two triangle normals, so that the normal of the hand triangle coincides with that of the product triangle, as shown in Fig. 5(c).
3. The system rotates the alignment triangle of the digital hand about the matched triangle normal so as to minimize the sum of the distances between corresponding vertices of the two alignment triangles, as shown in Fig. 5(d).
Using this triangle matching, the system can generate grasp posture candidates; a sketch of the matching steps is given below. However, these candidates usually include many improper postures in which the user cannot manipulate the product, so the system needs to eliminate them.
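Steps 1-3 might be realized as in the following sketch (our own Python, with a coarse 1-degree search in step 3; the system's actual minimization procedure is not specified in the text):

```python
import numpy as np

def rotation_about(axis, angle):
    """Rodrigues rotation matrix about a (unit) axis."""
    x, y, z = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def tri_normal(tri):
    n = np.cross(tri[1] - tri[0], tri[2] - tri[0])
    return n / np.linalg.norm(n)

def match_triangles(hand_tri, prod_tri):
    """Rigid transform (R, t) aligning the hand alignment triangle to the
    product one, following steps 1-3 above; hand_tri, prod_tri: (3, 3)."""
    c_h, c_p = hand_tri.mean(axis=0), prod_tri.mean(axis=0)
    n_h, n_p = tri_normal(hand_tri), tri_normal(prod_tri)

    # Step 2: rotate the hand normal onto the product normal about the
    # axis given by their vector (cross) product.
    axis = np.cross(n_h, n_p)
    if np.linalg.norm(axis) < 1e-12:          # normals (anti)parallel
        if n_h @ n_p > 0:
            R1 = np.eye(3)
        else:
            perp = np.cross(n_h, [1.0, 0.0, 0.0])
            if np.linalg.norm(perp) < 1e-6:
                perp = np.cross(n_h, [0.0, 1.0, 0.0])
            R1 = rotation_about(perp, np.pi)  # 180-degree turn
    else:
        R1 = rotation_about(axis, np.arccos(np.clip(n_h @ n_p, -1.0, 1.0)))

    # Step 3: rotate about the matched normal so the summed distances
    # between corresponding vertices are minimized (coarse 1-deg search).
    hand0, prod0 = (hand_tri - c_h) @ R1.T, prod_tri - c_p
    def cost(a):
        d = hand0 @ rotation_about(n_p, a).T - prod0
        return np.linalg.norm(d, axis=1).sum()
    best = min(np.radians(np.arange(360.0)), key=cost)

    R = rotation_about(n_p, best) @ R1
    t = c_p - R @ c_h        # step 1: the barycenters are made to coincide
    return R, t
```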
Fig. 4. Generation of the alignment triangles in a query product model: (a) 3D mesh model to be grasped; (b) representative points; (c) some of the alignment triangles
Fig. 5. Triangle matching: (a) alignment triangles; (b) aligning barycenters; (c) matching normals; (d) minimizing the distances between corresponding vertices
4.3 Narrowing Down the Grasp Postures Based on Grasping Constraints
In an improper grasping posture, the user cannot hold or manipulate the product in a natural attitude; for example, the user cannot view the fixation area from the correct angle and distance while holding the product. To eliminate such postures from the candidates, constraints on the joint ranges of motion (J-ROM) are introduced into an upper body link model. As shown in Fig. 6, the model consists of an upper limb link model and a head link model derived from a commercial digital human model (Poser). The digital hand is connected to the upper limb link model through the wrist joint, which has 3 DOF. Improper grasp postures are eliminated as follows. First, as shown in Fig. 6, the user specifies the parameters φ and Leye to place the product model in the correct position and orientation relative to the upper body link model; the positions of the four corners of the fixation area in the model coordinates are also specified. φ denotes the neck rotation angle, and Leye denotes the distance between the eyes and the fixation area surface. The system then automatically rotates the neck joint so that the angle between the line of sight and the horizontal plane equals φ, and so that the fixation area is perpendicular to the line of sight at the distance Leye. After this placement, the grasp posture candidates are narrowed down by applying the J-ROM constraints. Twelve rotation angles for the four upper limb joints are computed by solving the inverse kinematics with the CCD (Cyclic-Coordinate Descent) method [11], so that the wrist position and orientation of the upper body link model match the placed wrist of the grasp posture candidate. If the computed angles are beyond the J-ROM, the posture is eliminated from the list of candidates; the J-ROM values are summarized in Table 1. The candidates are further narrowed down by applying the constraint on the visibility of the fixation area: two view frustums whose bottom surfaces coincide with the fixation area surface are constructed, and if any portion of the digital hand collides with these frustums, the grasp posture candidate is eliminated.
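A simplified sketch of a CCD solve with J-ROM clamping is shown below. The forward-kinematics callback, the treatment of each DOF as a 1-DOF rotation, and the rejection criterion are our simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np

def ccd_solve(angles, limits, fk, target, iters=50, tol=1e-3):
    """CCD inverse kinematics with joint range-of-motion (J-ROM) clamping.

    angles: list of joint angles (deg); limits: list of (lo, hi) in deg;
    fk(angles) -> (joint_positions, joint_axes, wrist_position) in world
    coordinates.  Returns solved angles, or None when the wrist cannot
    reach the target within the J-ROM, in which case the grasp posture
    candidate would be eliminated.
    """
    for _ in range(iters):
        for j in reversed(range(len(angles))):       # wrist to shoulder
            pts, axes, wrist = fk(angles)
            a = axes[j] / np.linalg.norm(axes[j])
            # Project to-wrist and to-target vectors onto the plane
            # perpendicular to this joint's rotation axis.
            v1 = wrist - pts[j]; v1 = v1 - (v1 @ a) * a
            v2 = target - pts[j]; v2 = v2 - (v2 @ a) * a
            if min(np.linalg.norm(v1), np.linalg.norm(v2)) < 1e-9:
                continue
            v1, v2 = v1 / np.linalg.norm(v1), v2 / np.linalg.norm(v2)
            delta = np.degrees(np.arctan2(np.cross(v1, v2) @ a, v1 @ v2))
            lo, hi = limits[j]
            angles[j] = float(np.clip(angles[j] + delta, lo, hi))  # J-ROM
        if np.linalg.norm(fk(angles)[2] - target) < tol:
            return angles
    return None      # beyond the J-ROM: eliminate this candidate posture
```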
Fig. 6. Upper body link model
Table 1. Joint range of motion [deg]
4.4 Eliminating Collisions between the Digital Hand and the Product Model
By applying the grasping constraints, the grasp posture candidates are narrowed down to those in which the user can manipulate the product. However, collisions between the digital hand and the product model may still exist among the selected candidates, because the new product model differs in shape from the one used when constructing the grasp posture database. To deal with this, the system eliminates the collisions between the digital hand and the product model as follows. First, the system classifies each collision vertex on the digital hand surface into one of three states, as shown in Fig. 7. Each vertex is classified by comparing its penetration depth d_v below the product model surface with a threshold value τ, which reflects the local skin contact stiffness. When d_v ≤ τ, the vertex state is "contact"; when d_v > τ, it is "colliding"; and when the vertex does not collide with the product model, it is "outside". A sketch of this classification is given below. After the classification, the system tries to eliminate the "colliding" vertices by changing the wrist position, wrist orientation and finger posture. If any "colliding" vertex exists on the palm, the system derives the average normal of those vertices and translates the wrist position in the opposite direction; it then rotates the wrist joint to maximize the contact area on the palm. These wrist modifications are repeated until no "colliding" vertices remain on the palm. Finally, the system eliminates the collisions of the finger vertices by searching, for each finger joint, the rotation angle within its range of motion that maximizes the contact area on the finger.
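The vertex classification and the palm retreat direction can be sketched as follows (illustrative Python; the use of signed penetration depths and the names are our assumptions):

```python
import numpy as np

def classify_vertices(depths, tau):
    """Classify skin vertices by penetration depth d_v against the local
    threshold tau (skin contact stiffness): 'outside' (no collision,
    encoded here as depth < 0), 'contact' (d_v <= tau) or 'colliding'
    (d_v > tau)."""
    depths = np.asarray(depths)
    return np.where(depths < 0, "outside",
                    np.where(depths <= tau, "contact", "colliding"))

def palm_retreat_direction(normals, states):
    """Average normal of the 'colliding' palm vertices; the wrist is
    translated opposite to this direction during collision elimination."""
    n = normals[states == "colliding"]
    if len(n) == 0:
        return None                      # nothing left to eliminate
    avg = n.mean(axis=0)
    return avg / np.linalg.norm(avg)
```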
Fig. 7. Classification of vertices on the digital hand surface
4.5 Refining the Grasp Posture Based on the Contact Vertex Distribution
In spite of the collision elimination process in Section 4.4, the wrist position and orientation of the derived grasp posture are not necessarily plausible for product manipulation. Therefore, the system refines the derived grasp posture into a more plausible one based on an elliptic cylinder approximation of the contact vertex distribution. First, the system fits an elliptic cylinder to the set of distributed contact vertices on the product model, as shown in Fig. 8(a). In this fitting, the system performs a nonlinear optimization minimizing the distances between the contact vertices and the elliptic cylinder surface, shown in Fig. 8(b), so that the medial axis of the contact vertices is found as the axis of the cylinder. The optimization uses eight control variables: the two lengths of the major and minor axes of the ellipse, three angles representing the orientation of the cylinder, and three coordinates representing the position of the cylinder barycenter. An initial cylinder axis is derived from an analysis of the principal directions of the contact vertices; a sketch of the fitting is given below. After fitting the elliptic cylinder, the system rotates the digital hand about the cylinder axis at 5-degree intervals in the range of ±30 degrees, as shown in Fig. 8(c). At each angle, the system rotates the wrist joint to maximize the contact area on the palm. Finally, by searching each finger joint rotation angle within its range of motion, the system refines the finger posture so that the contact area on the finger is maximized.
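A sketch of the eight-variable elliptic cylinder fit, using SciPy's least_squares with an algebraic residual in place of the exact point-to-surface distance (an approximation we introduce for brevity; the initial guess would place the cylinder axis along the first principal direction of the contact vertices, as mentioned above):

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def fit_elliptic_cylinder(points, init):
    """Fit an elliptic cylinder to the contact vertices (an (n, 3) array).

    p = [a, b, rx, ry, rz, cx, cy, cz]: ellipse semi-axes, orientation
    (rotation vector) and barycenter -- the eight control variables.
    The algebraic residual (x/a)^2 + (y/b)^2 - 1, evaluated in the
    cylinder frame, stands in for the exact point-to-surface distance.
    """
    def residuals(p):
        a, b = p[0], p[1]
        R = Rotation.from_rotvec(p[2:5]).as_matrix()   # local -> world
        local = (points - p[5:8]) @ R                  # world -> local
        return (local[:, 0] / a) ** 2 + (local[:, 1] / b) ** 2 - 1.0

    return least_squares(residuals, init).x
```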
Fig. 8. Contact area approximation: (a) distribution of contact vertices; (b) fitted elliptic cylinder; (c) rotation about the fitted cylinder axis
5 Results of Plausible Grasp Synthesis
Fig. 9 shows the result of a query conducted on a bicycle handlebar. In advance, a grasp posture for a different bicycle handlebar had been stored in the database. By triangle matching using the grasp posture in the database, 4087 grasp posture candidates (Fig. 9(a)) were generated for the query product model. Applying the grasping constraints, the candidates were then narrowed down to 75 postures regarded as good for grasping (Fig. 9(b)); the two parameters φ and Leye were set in agreement with usual bicycle riding positions. Some collisions between the digital hand and the product model still existed in these postures (Fig. 9(c)). To eliminate the collisions, the wrist position, wrist orientation and finger joint rotation angles were modified (Fig. 9(d)). The collisions were thereby eliminated, but the wrist posture was very different from that of the experimental result (Fig. 9(f)). Finally, the grasp posture was refined based on the elliptic cylinder approximation of the contact vertices (Fig. 9(e)). By comparing the
Fig. 9. An example of plausible grasp synthesis for a bicycle handlebar: (a) some of the postures before narrowing down; (b) some of the postures after narrowing down; (c) a selected posture with collisions; (d) the posture after collision elimination; (e) the posture after the proposed refinement; (f) an experimental posture
line along the back of the hand before and after the refinement with the one from the experimental result, it was shown that the line after the refinement is closer to the experimental one. From this result, it was verified qualitatively that the grasp posture was refined into a more plausible one by the proposed refinement process.
6 Conclusion and Future Work
A new data-driven grasp synthesis method was proposed, based on searching for a similar grasp posture example in a database. The method generates grasp posture candidates by triangle matching; narrows down the candidates by applying grasping constraints consisting of the upper limb joint range of motion and the visibility of the fixation area; eliminates collisions between the digital hand and the product model by modifying the wrist and finger postures; and refines the grasp posture based on an elliptic cylinder approximation of the contact vertices. We verified the effectiveness of this method through an experiment with a bicycle handlebar. In future work, we will implement a new grasp synthesis algorithm that considers the difficulty of product manipulation tasks based on the finger joint rotation angles.
References
1. Endo, Y., Kanai, S., et al.: Virtual Ergonomic Assessment on Handheld Products Based on Virtual Grasping by Digital Hand. SAE Trans. J. of Passenger Cars: Electronic and Electrical Systems 116(7), 877–887 (2008)
2. Endo, Y., Kanai, S., et al.: An Application of a Digital Hand to Ergonomic Assessment of Handheld Information Appliances. In: Proceedings 2006 Digital Human Modeling for Design and Engineering Conference, 2006-01-2325 (2006)
3. Yang, J., Peña Pitarch, E., Kim, J., Abdel-Malek, K.: Posture Prediction and Force/Torque Analysis for Human Hands. In: Proceedings 2006 Digital Human Modeling for Design and Engineering Conference, 2006-01-2326 (2006)
4. Choi, J., Armstrong, T.J.: Examination of a Collision Detection Algorithm for Predicting Grip Posture of Small to Large Cylindrical Handles. In: Proceedings 2006 Digital Human Modeling for Design and Engineering Conference, 2006-01-2328 (2006)
5. Miyata, N., Kouchi, M., Mochimaru, M.: Posture Estimation for Screening Design Alternatives by DhaibaHand - Cell Phone Operation. In: Proceedings 2006 Digital Human Modeling for Design and Engineering Conference, 2006-01-2327 (2006)
6. Li, Y., Fu, J.L., Pollard, N.S.: Data-Driven Grasp Synthesis Using Shape Matching and Task-Based Pruning. IEEE Transactions on Visualization and Computer Graphics 13(4), 732–747 (2007)
7. Kyota, F., et al.: Detection and Evaluation of Grasping Positions. In: Proceedings of the 2005 ACM SIGGRAPH Sketches, vol. 80 (2005)
8. Miller, A.T., et al.: GraspIt!: A Versatile Simulator for Robotic Grasping. IEEE Robotics and Automation Magazine, 110–122 (2004)
9. ElKoura, G., Singh, K.: Handrix: Animating the Human Hand. In: Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 110–119 (2003)
10. Kouchi, M., et al.: An Analysis of Hand Measurements for Obtaining Representative Japanese Hand Models. In: Proceedings of the 8th Annual Digital Human Modeling for Design and Engineering Symposium, 2005-01-2734 (2006)
11. Welman, C.: Inverse Kinematics and Geometric Constraints for Articulated Figure Manipulation. Master's Thesis, Simon Fraser University (1993), ftp://fas.sfu.ca/pub/cs/TH/1993/ChrisWelmanMSc.ps.gz
Within and Between-Subject Reliability Using Classic Jack for Ergonomic Assessments

Brian McInnes, Allison Stephens, and Jim Potvin

Ford Motor Company, McMaster University
{bmcinne2,astephe6}@ford.com, [email protected]
1 Introduction
As the use of computer-aided ergonomic tools becomes more prominent in performing ergonomic evaluations early in the design phase, the drive to improve the validity, reliability and accuracy of the technology will increase. Posturing a digital human (DH) in a virtual environment is a challenging task. There is a very large number of possible positions into which the DH can be placed for any given task, and the chosen position may differ depending on the experience of the user; this may lead to different conclusions regarding the acceptability of an operation. Digital human model manipulation requires an ergonomist to use computer software to manipulate a digital human model within a virtual environment, and it requires assumptions about how workers would position themselves in such an environment. To use this method effectively, it is necessary to posture the digital human accurately [3], and some have shown that large errors can result when these postures are compared with those adopted by real workers [5, 6]. It has also been suggested in the literature that static posturing may underestimate risk [1]. Lämkull et al. [4] studied four automotive assembly simulation cases with computer software (Ramsis) that included posture prediction algorithms to which the subjects could apply manual adjustments; each case assessment was repeated six times. They noted the challenges facing the assessor, such as determining the most appropriate "frozen" moment to analyze, whether two hands can be used, and whether a support arm can be used for one-handed tasks. Their results indicate that significant variability existed within and between subjects for a number of joint angles. While some studies do exist, much more data are needed to evaluate the reliability and validity of using digital human model manipulation for proactive ergonomic analyses of musculoskeletal injury risk. The purpose of this study was to evaluate the between- and within-subject reliability when using the Classic Jack Human Simulation Program (UGS, Plano, Texas, USA) for the ergonomic analysis of workstations. It is anticipated that this study will lead to guidelines determining when the use of static postures should or should not be employed.
2 Methods
2.1 Tasks
There were 12 Tasks in the study: 6 from a Ford car plant and 6 from a Ford truck plant. Two workstations were studied from each plant, and three Tasks were taken
from each workstation (n = 2 × 2 × 3 = 12). Apart from being selected by vehicle type, the workstations were also selected by work zone (i.e., the horizontal and vertical reach at which the operator would have to apply the force in order to complete the job). The installation efforts for each Task were derived from existing surrogate data or from handheld force gauge testing on a limited sample.
2.2 Study Design
An ergonomic evaluation was performed on each of the 12 Tasks. Six professional ergonomists statically postured a mannequin with the Classic Jack (Siemens AG) software (hereafter the "CJ" method). These subjects had to predict how an operator might perform the Task without ever having seen the actual workstation (simulating the proactive analysis process associated with future programs). For each subject tested, three repeat trials were completed for each Method/Task combination. This component was part of a larger study that also investigated motion-capture-based methods for proactive ergonomic assessments.
2.3 Posture Collection
Six professional ergonomists (4 males and 2 females) performed the CJ method portion of the study (Table 1). Each had experience with these assessments at Ford Motor Co. and had varying degrees of CJ experience. Each subject was required to use the CJ software to perform an ergonomic analysis on all 12 Tasks, with 3 repeat trials each, for a total of 36 trials per subject. At least one week elapsed between repeat assessments to reduce transference and recall from the previous analysis. The workstations were presented to the subjects in a random order within each session of 12 Task assessments. Each subject was presented with the virtual work environment on a laptop computer, and the study was conducted at a location of the subject's choosing. To complete a workstation, the subject was given a set of work instructions that described the job the virtual human was to perform in the virtual work environment, the installation effort in Newtons, and other general instructions that pertained to every workstation. Each subject was required to manipulate the virtual human, in all three planes of the virtual environment, into the posture they predicted an actual operator would eventually use to perform the Task. A 50th-percentile female virtual human (Jill) was used for all posturing and subsequent analyses. The magnitudes of the total forces (which varied between 4 and 89 N) and their directions were kept constant across all subjects and trials for each Task. The subjects were instructed to keep Jill free of collisions with the vehicle, the frequency was set at 1/min for all jobs, and Jill had to maintain visibility of the part for all jobs. Unless instructed otherwise, subjects could choose to have Jill perform the task with one or two hands.
Table 1. Subject statistics

Subject    Gender   Age     Years Experience
S1         M        33      0.25
S2         M        35      2
S3         F        45      0.83
S4         M        27      2
S5         F        31      2
S6         M        32      0.25
Average             33.83   1.22
St. Dev.            6.08    0.88
2.4 Data Analysis
All three trials were processed, including the manipulation of the mannequin to achieve the various postures; once the subject had arrived at the final posture, the kinematic and kinetic data could be output. With the MoCap data, however, preprocessing was needed to match the marker coordinates to the Jill mannequin. To do this, the MoCap data were processed in EVaRT using a number of steps: 1) connect all of the markers in EVaRT in the correct sequence; 2) select a frame at the beginning or end position, where the subject was standing in a "T" pose, and apply the skeleton to the figure; 3) transfer these data into Jack 5.0 with the Task Analysis Toolkit and Motion Capture add-ons (the mannequin used in Jack was based on the segment lengths of each individual subject and was the same sex as the subject, but always had 50th-percentile female mass); 4) position the mannequin as close as possible to the markers from EVaRT, add the force arrows in the proper direction, and select the correct hand(s) to apply the force; 5) run the Ford Ergonomics SSP Solver (described below). The SSP Solver was always based on 75th-percentile female strength, regardless of the size or sex of the subject.
2.5 Ford Ergonomics SSP Solver
The version of the Jack Static Strength Prediction (JSSP) tool used by Ford Motor Co. is called the Ford/Ergonomics Static Strength Prediction Solver (FSSPS) and is discussed in detail in Chiang et al. [2]. An assessment can be performed in two modes. The first mode determines the implications of set hand loads that are manually entered into the interface. This mode outputs torques, strengths and percent capable (%Cap) for each joint, and the %Cap values were used to determine whether the Task was considered "acceptable" for that trial. In this mode, the user selects the hand(s) with which the force is applied, whether there are any supporting hands or external support, and the frequency at which the force is exerted. In the second mode, the user selects the Solve SSP button once the correct posture of the digital human is attained; this shows the maximum force that would still make the Task "acceptable". When support hands were used, it was not possible to output values for joints affected by this support (as its force was not measured). For example, if the right hand was used for an insertion and the left hand was used to support the body, then only right arm values would be output from the FSSPS, and the left arm, trunk and leg values would be blacked out in the Solver.
2.6 Dependent Variables
Kinematic Variables. Key joint angle variables were recorded from the Jack outputs, including all angles of the elbows, shoulders and trunk. Given that some Tasks were performed with one hand, and that this hand differed between subjects, weighted joint averages were also calculated. Essentially, these combined the right and left sides, with values weighted by how much force was applied with each side. For example, when only the right hand was used, the weighted elbow angle was the same as the right elbow angle; when the right hand applied 20 N and the left hand 30 N, the weighted elbow angle was based on a 40% weighting of the right elbow angle and a 60% weighting of the left elbow angle. This calculation served to decrease the between-subject variability caused by the wide variety of postures observed for the unloaded arms, especially when subjects differed in which hand they used for one-handed Tasks. Finally, various marker-to-marker distances were calculated. For any loaded hand, the moment arm was estimated from the hand to both the shoulder and L5/S1. For anterior/posterior pushes and pulls (Tasks 5 and 8), this was calculated as the height difference between the hand and the shoulder or L5/S1 (along the vertical Y axis). For vertical loads (Tasks 1, 3, 4, 6, 7, 10, 11), it was calculated as the horizontal distance from the hand to either joint (in the XZ plane). For lateral pushes (Tasks 2, 9, 12), it was calculated as the 3D reach distance from the hand to either joint (XYZ resultant).
Kinetic Variables. The percent capable (%Cap) was determined with the CJ strength module and was recorded for the trunk flexion, lateral bend and axial twist axes and for the right and left elbow, shoulder Ab/Ad and Fw/Bk axes. The lowest of the three axes was used as the trunk %Cap, the lower of the two sides as the elbow %Cap, and the lowest of the six axes across both sides as the shoulder %Cap. The lowest of all 13 joint axes represented the "weakest link" for each trial and was recorded as the "Total %Cap". If the Total %Cap was above Ford's tolerance limit value of 75%, the Task was considered "acceptable"; otherwise it was considered "unacceptable". The Solved force was recorded to represent the maximum force that would be acceptable to 75% of females, for any loaded hand, for each trial. The Total Solved force was the summation of the Solved forces from the loaded hand(s), depending on how many hands were used. For the left and right sides, the resultant shoulder torque was calculated from the three axes and was taken as the higher of the two sides. In addition, right- and left-hand use (yes/no), the total number of hands used, and the L5/S1 compression force were recorded for each trial.
2.7 Statistical Analysis
The independent variables were Task (n = 12) and Method (n = 2). Means and within-subject standard deviations (SDs) were calculated across the three trials for each subject/Task/Method combination. These within-subject SDs were then averaged across subjects for each Task/Method combination to determine the average within-subject variability for each combination.
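The force-weighted joint average can be sketched directly from the example in the text (our own Python; the names are illustrative):

```python
def weighted_joint_angle(right_angle, left_angle, right_force, left_force):
    """Force-weighted joint average from the example above: with 20 N on
    the right and 30 N on the left, the weights are 40%/60%; one-handed
    tasks reduce to the loaded side's angle."""
    total = right_force + left_force
    if total == 0:
        return None                       # no loaded hand
    return (right_force * right_angle + left_force * left_angle) / total

# weighted_joint_angle(90.0, 120.0, 20.0, 30.0) -> 108.0
```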
3 Results

There was a wide variety of demands in the Tasks studied, with the average Total %Capable values ranging from 17.1% to 96.4% across the 12 workstations.

3.1 Between-Subject Reliability

Total Percent Capable standard deviations ranged from 0.2% to 14.9%, with an average of 5.6%. Overall, the variability between subjects was relatively low for the Joint Kinetics variables. The standard deviations for the joint angles averaged in the range from 5.3 deg (Trunk Axial Twist) to 11.5 deg (Trunk Flexion). In general, the variability between subjects for the weighted elbow and shoulder angles was approximately 11 deg (Table 2).

3.2 Within-Subject Reliability

The within-subject variability values were generally quite comparable to the between-subject values presented above (Table 2).

Table 2. Average within- and between-subject variabilities. The far right column indicates the average value for each variable. Each Mean value is pooled across the 12 workstations. Minimum and maximum values are also provided (n = 12 for all variables except Trunk %Cap and L5/S1 Compression, where n = 6 due to subject leaning).
Variable                          Within-Subject SD           Between-Subject SD
                                  Mean    Min     Max         Mean    Min     Max        Mean
Joint Angles (deg)
  Elbow (weighted)                10.0    1.5     19.3        11.0    2.6     21.1       34.9
  Shoulder (weighted)             10.5    6.3     16.3        10.7    5.8     18.5       134.7
  Trunk Flx/Ext                   8.1     0.0     20.6        11.5    0.0     31.0       20.9
  Trunk Lateral                   4.7     0.0     15.3        5.8     0.1     17.3       -2.6
  Trunk Axial                     5.4     0.1     14.5        5.3     0.1     19.3       -2.4
Joint Kinetics
  Elbow %Cap                      6.8     0.6     18.8        5.8     0.4     13.8       83.5
  Shld %Cap                       4.4     0.6     10.4        4.6     0.2     17.0       70.6
  Trunk %Cap                      3.5     0.8     13.0        3.8     0.4     13.1       81.1
  Total %Cap                      5.5     0.6     11.7        5.6     0.2     14.9       64.7
  Shoulder Result. Torque (Nm)    1.7     0.4     4.2         2.3     0.5     7.1        17.7
  Solved Force (N)                9.3     1.4     17.4        10.5    0.9     23.6       53.3
  L5/S1 Comp (N)                  76.8    53.7    116.8       100.6   35.7    258.8      736.1
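Under the analysis described in Sect. 2.7, the within- and between-subject SDs of Table 2 reduce to a simple group-by computation. Below is a sketch assuming a hypothetical long-format data layout; the column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical layout: one row per subject/Task/Method/trial
df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2],
    "task":    [1, 1, 1, 1, 1, 1],
    "method":  ["CJ"] * 6,
    "trial":   [1, 2, 3, 1, 2, 3],
    "value":   [82.1, 84.9, 80.3, 71.5, 69.8, 74.0],
})

# Mean and SD across the 3 trials of each subject/Task/Method combination
per_subject = df.groupby(["task", "method", "subject"])["value"].agg(["mean", "std"])

# Within-subject variability: per-subject SDs averaged across subjects
within_sd = per_subject["std"].groupby(level=["task", "method"]).mean()

# Between-subject variability: SD of the subject means
between_sd = per_subject["mean"].groupby(level=["task", "method"]).std()

print(within_sd)
print(between_sd)
```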
4 Discussion

The results indicate that there can be large inconsistencies between and within subjects when using CJ to ergonomically assess workstations. Two main factors were found to influence the results: the number of hands used to perform the task and the shoulder moment. However, while some large variabilities did exist, the average values, pooled across workstations, were generally low for both the joint angle and kinetics data, indicating good reliability both within and between subjects.

4.1 Joint Angle Variables

In 9 of 12 Tasks, the real subjects were observed to maintain an upright trunk, ranging from 10 degrees of extension to 5 degrees of flexion. When pooled across the 12
Tasks, the CJ method was generally observed to produce an average of 21 deg more trunk flexion than was observed with the real workers. There were six Tasks where CJ trunk flexion was much higher than what was observed with the actual workers in the plant. Three of these (#10, 11 and 12) were one-armed overhead reaches into the vehicle and the other three (#2, 3 and 9) were one-armed reaches at lower heights. One of these (#2) was associated with less lateral bend than the real case, as the subjects seemed to align the mannequin mainly through trunk flexion. For three of these Tasks (#9, 10 and 12), the CJ data demonstrated trunk lateral bending angles that were about 10 degrees higher than the real workers, in addition to the higher flexion. Two of these (#10 and 12) also had loaded elbows that were much more extended and loaded shoulders that were more deviated. Overall, the one-handed overhead tasks with a support arm (#10, 11, 12) appeared to be the most difficult for the CJ method. Compared to real values, this method tended to result in more trunk flexion, shoulder abduction and arm extension. Trunk and shoulder deviation also appeared to be too high for a number of other Tasks. It is likely that experienced workers learn methods to reduce these deviations about the trunk and shoulder, and that this learning is not accurately reflected in the proactive Methods tested. In general, the between- and within-subject variabilities were between 4.7 and 11.5 degrees, on average. This demonstrated moderate reliability when pooled across subjects and tasks, although there were cases where large discrepancies existed (e.g., a between-subject standard deviation of 31.0 deg for Station #3 for the trunk flexion angle).

4.2 Joint Kinetics

There was generally very good agreement between the CJ method and the real worker data; however, the CJ values tended to be slightly higher overall. The CJ subjects often positioned the mannequin closer under the loading location, and this resulted in decreased moment arms and shoulder moments, so that the %Cap value was higher than real (average of 87% vs. 76%). While the CJ posture may be considered more optimal, based on the outputs from the Classic Jack software, there may have been a very good reason for the real workers to stand back somewhat farther. It is possible that this positioning allowed for a less extended neck or less deviated wrist posture, which may have affected posture and/or caused an important trade-off with increased shoulder load. Neck and wrist postures are very important risk factors that are not currently considered in Jack's biomechanical model. As such, they were also not considered in the current study. This is a limitation of both the tested version of Jack 5.0 and the study being discussed. Compared to the %Cap values, there was much more disagreement between the CJ method and real data for the solved forces, which indicate the estimated maximum forces that would be acceptable to 75% of females for the postures studied. There were some very interesting patterns in these findings. First, the differences generally increased as the Task demands decreased. For example, the five Tasks with the largest range and standard deviation in solved forces between Methods (3, 1, 6, 4 and 8) were among the six Tasks having the highest %Cap values. On further analysis, it was found that these five Tasks had the lowest risks because they could all be done with short horizontal reaches from the shoulder to the hand and none had individual hand forces that
exceeded 45 N. There was no pattern to the direction of the force effort, as three were up, one was down and one was forward. The two Tasks with the highest %Cap values (3 and 6) were mainly this way because of low hand forces (4 and 11 N) and not necessarily because they were performed in optimal hand locations (Task 3 had a moderate reach and Task 6 was overhead). As such, the solved forces for these two Tasks were not as high as might be expected based on the very high %Cap values. It is hypothesized that most of the easier Tasks had higher solved force variability between CJ and real because their low demands placed few constraints on the postures adopted. In four of the five cases (Tasks 1, 4, 6 and 8), the real solved values were substantially lower than those from CJ. This was likely because the Tasks could be performed in non-optimal postures with low risk, such that there was no need to make an effort to adopt a posture that would allow for higher forces. Conversely, the CJ subjects were instructed to find a posture that they predicted would be used for the task in the future, with an emphasis on increasing the overall percent capable value to the highest magnitude possible. Thus, they had an incentive to minimize moment arms and reaches to allow for higher forces. The one exception was Task 3, where the real workers had substantially higher solved forces (73 N) than with CJ. This was mainly due to one real subject who adopted a posture that allowed the vertical forces to pass much more closely to the joint, greatly reducing the moments caused by the loads and increasing the forces that would be acceptable. As noted previously, these data serve to highlight a potential limitation in the current process. For Task 4, with its negligible hand force requirements, it is possible that CJ subjects simply selected any posture that would reach the foam location, without trying to optimize the posture for maximum force. In general, the average between- and within-subject variability values were also modest for the joint kinetics measures in this study. For Total %Cap, this was about 5.5%. However, the values could go as high as 14.9%, which could have a substantial effect on decisions made regarding the acceptability of the task.
5 Recommendations

Based on the findings of this study, the following recommendations were made regarding the use of CJ mannequin posturing for proactive ergonomic assessments. 1. It does not appear that Classic Jack (CJ) is the optimal Method for operations like Task #2 (Dash Panel Grommet), where either hand can be used, the possibility exists for a support hand, the reach is at or above approximately 81 cm, and the force is lateral to the reach. 2. It is recommended that the mannequin be limited to allow no neck extension when using CJ. This will ensure that the mannequin is not placed unrealistically close under the load for overhead work. There is evidence to suggest that the CJ method may have resulted in higher percent capable and solved force values based on the joints monitored, but that this may not have been the case if the risk to the neck had been accounted for.
3. It is suggested that users of CJ attempt to keep trunk and shoulder postures as close to neutral as is possible and feasible. This appears to be the strategy employed by the Real subjects in this study. 4. Postures should be adopted that allow for vision of the part (unless it is obvious that vision will not be necessary). 5. Encourage those using CJ to use two hands whenever that is feasible for the Task. This will generally increase both the %Cap and maximum solved forces. 6. For tasks that are not performed overhead, users should be encouraged to move the mannequin as close as possible to the location of exertion. 7. Every effort should be made to understand the actual forces required for part installation. The user should posture appropriately for the task demands.
References

1. Cappelli, T.M., Duffy, V.G.: Motion capture for job risk classifications incorporating dynamic aspects of work. Society of Automotive Engineers, 2006-01-2317 (2006)
2. Chiang, J., Stephens, A., Potvin, J.R.: Retooling Jack's Static Strength Prediction tool. In: SAE Digital Human Modelling Conference, Lyon, France (2006)
3. Chaffin, D.B., Erig, M.: Three-dimensional biomechanical static strength prediction model sensitivity to postural and anthropometric inaccuracies. IIE Transactions 23(3), 215–227 (1991)
4. Lamkull, D., Hanson, L., Ortengren, R.: Consistency in figure posturing results within and between simulation engineers. Society of Automotive Engineers, 2006-01-2352 (2006)
5. Pewinski, W., Esquivel, A., Ruud, J., Stefani, M., Barbir, A.: Jack e-factory correlation to real world seating reach and effort applications. Society of Automotive Engineers, 2005-01-2707 (2005)
6. Reed, M.P., Parkinson, M.B., Klinkenberger, A.L.: Assessing the validity of kinematically generated reach envelopes for simulations of vehicle operators. Society of Automotive Engineers, 2003-01-2216 (2003)
Human Head Modeling and Personal Head Protective Equipment: A Literature Review

Jingzhou (James) Yang1, Jichang Dai1, and Ziqing Zhuang2

1 Department of Mechanical Engineering, Texas Tech University, Lubbock, TX 79409, USA
2 National Personal Protective Technology Laboratory, NIOSH, Pittsburgh, PA 15236, USA
[email protected]
Abstract. The human head is the most important yet most fragile part of the human body. To design head gear and to study the sophisticated capabilities of the human head, head models have been under development for decades. There are two types of human head models: digital headforms and finite element models (biomechanical head models). The complexity of the head structure made these attempts very difficult until the advent of high-speed computers and modern medical imaging devices such as computed tomography (CT) and magnetic resonance imaging (MRI). Head modeling also has wide potential use in the design process for personal head and face protective equipment (PHFPE). Workers daily encounter hazards of processes or environments, chemical hazards, radiological hazards, and mechanical irritants, all capable of causing injury or illness through absorption, inhalation, or physical contact. PHFPE includes helmets, masks, eye protection and hearing protection. This study reviews different kinds of head models and PHFPE, such as respirators, helmets and goggles, focusing mainly on their historical development. Keywords: Headform; biomechanical model; respirators; helmets; goggles.
1 Introduction

Head injuries and facial damage are among the main causes of death and physical disability. The incidence (number of new cases) of head injury is 300 per 100,000 per year (0.3% of the population), with a mortality of 25 per 100,000 in North America and 9 per 100,000 in Britain. In most countries of the world, workers, athletes and people exposed to hazardous environments are required to wear personal head protective equipment. Respirators, helmets and goggles are the three most commonly used types. How to design and test them efficiently is a great concern for both manufacturers and customers. However, companies currently rely mainly on experimental methods to verify a design. This approach is costly and lacks flexibility. Moreover, the test procedures differ from one standard to another. Digital simulation is therefore a good alternative to, or at least a necessary complement to, experimental laboratory testing. With a digital head model, manufacturers are able to test their products on a virtual human body, to check comfort levels, and to simulate different environments that may not be possible in experimental tests.
Intending to develop a low-cost alternative to the experimental test method, many head models have been proposed over the past 30 years. There are two types of head models: digital headforms and finite element models. NIOSH uses Sheffield headforms to conduct respirator certification testing; the National Operating Committee on Standards for Athletic Equipment (NOCSAE) uses headforms created from anthropometric measurements of army aviators published in 1971 (NOCSAE, 2007); and the American National Standards Institute (ANSI) and the American Society for Testing and Materials (ASTM) follow ISO/DIS 6220:1983 for headforms (ANSI, 1997; ASTM, 2002). The occupational and educational eye and face protective devices standard (ASTM, 2003) uses the Alderson 50th percentile male headform, which is based on Health, Education and Welfare data collected in the 1960s (First Technology Innovative Solutions, Plymouth, MI). Many other headforms used for certification testing in the United States are also based on anthropometric data collected over 30 years ago. Zhuang and Viscusi [34] developed the first surface-based 3-D headform for respirator fit testing using recently collected data. Most of the other models are finite element models. FE models contain the anatomical structures of the human head. Early models, due to technical limitations, used only simplified and regular geometries of the head. Recently, more complicated models have been developed that include more external and internal details of the head. Hardy and Marcal [9] developed the first finite element model in 1971. Their skull-only model was then improved by Shugar [23], who decided to treat the brain as a kind of fluid. A more realistic model was developed by Nahum et al. [17], in which the brain was modeled by means of 189 eight-node brick elements and linear-elastic behavior was adopted for the tissue mechanical properties. A well-known model called WSUBIM (Wayne State University Brain Injury Model) was developed by Ruan et al. [21], in which the number of nodes was increased to 6,080 and the number of elements to 7,351. These numbers were again raised to 17,656 nodes and 22,995 elements by Zhou et al. [33]. The FE head model has been improving continuously; to name a few examples: [14, 16, 30, 31, 35]. Two sets of experimental data are commonly used to validate FE models [17, 25]. With the rapid development of computer technology, researchers can now use more detailed properties in their FE models and achieve good consistency with the experimental data [27, 28, 29, 30, 35]. In this paper, we first introduce the basic concepts of head and face anthropometry. Then, different head models are reviewed in Section 3. In Section 4, we review PHFPE in detail, including respirators, helmets, and goggles. Section 5 summarizes digital head models and PHFPE and gives conclusions.
2 Head and Face Anthropometry

Anthropometry, in physical anthropology, refers to the field that deals with the physical dimensions, proportions, and composition of the human body, as well as the study of related variables that affect them. Nowadays, anthropometry has become more and more critical for optimizing products in industrial design, clothing design and ergonomics, where statistical data about the distribution of body dimensions in the population are needed. There are fifty-three key parameters of the human head and face [7]. In order to acquire the anthropometric measurements described above, a series of landmarks on
the subject's face are chosen [1]. For traditional measurements, spreading and sliding calipers and tapes are used. With 3D scans, however, this may be a problem, because some bony landmarks are not readily apparent without palpation.
3 Head Modeling

In the past several decades, many test head models have been used in different experiments for various purposes. Digital head models play an important role in simulation environments related to head injuries and head protection equipment assessment. This section summarizes digital headforms and FE models.

3.1 Digital Headform

The types of headform vary from standard to standard. Since headforms need to represent the anthropometry of a specific region, different countries have developed their own headforms. A digital headform is the numerical model of a physical headform. There are many physical headforms, but few of them have been digitized. In this section, we only introduce the digital headforms developed in the U.S. In 1994, Reddi et al. [20] reported three types of headforms (small, medium and large) used for military ejection seats as well as for fit assessment of helmets and other head-supported equipment. The geometries of these headforms were obtained from the U.S. Army Anthropometric Survey. Creation of the three digital headform designs was accomplished using the AutoCAD computer-aided design package. A total of 48 linear head dimensions were used to locate the positions of 26 facial landmarks. A headform wireframe was created through the facial landmarks from a system of spline entities. The AutoSurf surface modeling system was then used to generate surfaces between closed sections of the headform wireframe. To assess fit testing for respirators, Zhuang and Viscusi [34] developed the surface-based headforms shown in Fig. 1. A Cyberware rapid 3-D digitizer, with its associated computer and data processing software, was used to scan 1,013 subjects (713 male and 300 female). A Class I laser was projected, in a thin line, onto the subject and followed the contour of the face and head during a 360 degree scan. Additional processing and measurement of the images was accomplished using Polyworks. The criteria for choosing an individual 3-D head scan were based on calculations of principal components one and two (PC1 and PC2).

3.2 FE Model

There is a long history of FE modeling and analysis of the human head for understanding the biomechanics and mechanisms of head injury. Voo et al. [26] made a comprehensive review of FE models. The basic trends in the development of FE models are: 1. From simplified models to detailed models. Early idealized models had simplified and regular geometry, such as a spherical or ellipsoidal shell for the skull [9], and included only the skull. The skull was then idealized with doubly curved, arbitrary triangular shell elements. Shugar [23] published a 2D FE headform model in which the skull is represented as a closed rigid medium. The membranes are not
included in the model, and the brain was assumed to be an elastic material rather than a visco-elastic one. He also assumed that the brain is firmly connected to the skull; the scope and limitations imposed by this assumption of linearity are discussed in the model. Khalil and Hubbard [11] used a closed oval shell to simulate the human skull. The scalp was modeled as an encasing elastic layer, and the intracranial contents were represented by an inviscid fluid. These simple models were then developed further based on anthropometric and anatomical data, and more structures were included, as in Horsey and Liu's model [10], the most comprehensive FE model of the human head and neck in the 1980s. This model included one half of the head and neck in the sagittal plane. It took into account the gross neuroanatomy as well as the inertial and material properties of the head and neck. It studied the response of a head-neck model subjected to occipital loading, but the importance of the membranes of the brain and the neck was not discussed. The Wayne State University Brain Injury Model was developed by Ruan et al. [21]. The brain, skull, and CSF were modeled as eight-node hexahedron elements simulating the actual anatomy of the skull and brain, while the scalp, dura mater and falx cerebri were represented as four-node thin shell elements. The Kumaresan et al. model [14] was constructed very realistically thanks to developments in computing. The preprocessor of the finite-element package NISA (Numerically Integrated element for System Analysis) was used, and free-vibration and transient analyses were carried out. The model contained almost all the actual geometry of the different parts of the head, including the skull, CSF, falx cerebri, brain, tentorium cerebelli and the neck. With the development of computed tomography (CT) and magnetic resonance imaging (MRI), researchers have been able to acquire accurate and real-time images for modeling [2, 29].
Fig. 1. Small, medium, large, long and short digital headforms
2. The total numbers of nodes and elements gradually increased. In Nahum's model [17], the brain was modeled by only 189 eight-node brick elements. Ruan et al. [21] then increased the elements to 7,351 and the nodes to 6,080. King's model [12] has 7,205 nodes and 9,146 elements. The Zhou et al. model [33] contains 17,656 nodes and 22,995 elements. Zhang [32] developed two models with 17,656 nodes and 22,995 elements, and 226,000 nodes and 245,000 elements, respectively. Within FE models, there are 2-D and 3-D models. 2-D models are useful for parametric studies of controlled planar motions and simplify the inclusion of geometrical details. However, for large deformations and for impact and inertial load analyses, material exchange between regions is likely to occur because of the brain's low shear resistance and large bulk modulus, and such problems can only be described by 3-D models. Bandak et al. [2] developed a 3-D model using CT scan
images. However, such a procedure requires CT scans of the geometry, so it has limitations if an average model is needed. Also, this technique, which automatically generates an FE model of complex geometries, has difficulty creating a well-conditioned element mesh. Kumaresan et al. [14] developed another approach that has more flexibility. They used the upper limits of the landmark coordinates of the external geometry of a head and then divided the head into 33 layers in the horizontal plane. The coordinates of the curves are connected to the exterior landmarks, and the values of all the points were input into an FE package to generate the models [3].
4 Personal Head and Face Protective Equipment

Personal protective equipment, or PPE, includes a variety of devices and garments such as goggles, coveralls, gloves, vests, earplugs, and respirators. The OSHA standards [8] that deal with personal protective equipment consist of different requirements. This section describes three typical head and face protective devices in the literature: respirators, goggles, and helmets.

4.1 Respirators

Respirators protect workers against oxygen-deficient environments and harmful dusts, fogs, smokes, mists, gases, vapors, and sprays; these hazards may cause cancer, lung impairment, other diseases, or even death. There are two main categories of respirator: the air-purifying respirator and the air-supplied respirator, which supplies an alternate source of fresh air. The former can be further divided into three kinds: (1) negative-pressure respirators, using mechanical filters and chemical media; (2) positive-pressure units such as powered air-purifying respirators (PAPRs); (3) escape-only respirators such as Air-Purifying Escape Respirators (APERs) for use by the general public in Chemical, Biological, Radiological, and Nuclear (CBRN) terrorism incidents. The first full- and half-facepiece respirator test panel was developed by the Respirator Research and Development Section of Los Alamos National Laboratory (LANL) at the request of the National Institute for Occupational Safety and Health (NIOSH). This panel is known as the LANL respirator fit test panel. Three different fit tests are summarized as follows: (a) TIL test: The NIOSH Total Inward Leakage (TIL) test for half-mask air-purifying particulate respirators is conducted as follows. The respirator is tested on 35 human subjects, having the facial sizes designated by the respirator manufacturer for the specific facepiece, drawn from a NIOSH panel whose facial sizes and shapes approximate the distribution of sizes and shapes of the working population of the United States. The actual TIL value is recorded, in accordance with the PortaCount(TM) instructions, for each test subject while performing the following sequence of exercises for 30 seconds each: normal breathing, deep breathing, turning the head side to side and up and down, reciting a passage, reaching for the floor and ceiling, grimacing, and normal breathing. This test measures the concentration of a challenge aerosol outside the respirator and that within it.
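As a rough illustration of how such a measurement reduces to numbers, the sketch below computes a total-inward-leakage percentage as the ratio of in-mask to ambient aerosol concentration, averaged across the exercise sequence. This is our own simplified reading of the test, not the NIOSH procedure or instrument firmware; the variable names and sample concentrations are hypothetical.

```python
# Hedged sketch: TIL as the in-mask/ambient concentration ratio,
# averaged over the exercise sequence (illustrative only).
EXERCISES = [
    "normal breathing", "deep breathing", "head side to side",
    "head up and down", "recite a passage", "reach floor/ceiling",
    "grimace", "normal breathing",
]

def til_percent(c_inside, c_ambient):
    """Leakage for one exercise, as a percentage of ambient concentration."""
    return 100.0 * c_inside / c_ambient

# Hypothetical particle counts per cm^3 for each 30 s exercise
inside = [12, 15, 30, 28, 22, 35, 60, 14]
ambient = [2000] * len(EXERCISES)

per_exercise = [til_percent(ci, ca) for ci, ca in zip(inside, ambient)]
overall = sum(per_exercise) / len(per_exercise)
for name, t in zip(EXERCISES, per_exercise):
    print(f"{name:>20s}: {t:5.2f}% inward leakage")
print(f"{'overall':>20s}: {overall:5.2f}%")
```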
(b) Irritant smoke fit testing: This qualitative respirator fit test is conducted by directing the smoke stream from ventilation smoke tubes (intended for studying building ventilation systems) at the respirator face seal. The involuntary nature of the reaction is the reason many prefer this test over other qualitative fit tests [4]. (c) Saccharin qualitative fit testing: This test is conducted with an inexpensive, commercially available kit that challenges the respirator wearer with a sweet-tasting saccharin aerosol. After first being screened to ensure that he/she can taste saccharin at the required concentration, the respirator wearer is asked to report whether saccharin is tasted during fit testing. If so, the respirator is considered to have an inadequate fit and fails the fit test [4].

4.2 Helmet

The history of the helmet can be traced back to ancient Greece and China, when warriors wore leather hats to protect themselves from sword and arrow wounds. Helmeted motorcycle riders have a 25% lower fatality rate compared to un-helmeted riders [24]. Although helmets have been used for more than a thousand years, systematic study of their function and mechanism appeared only recently, in the 1940s, when the English researcher Cairns [5] reported a study of motorcyclist fatalities that established the value of crash helmets. In the United States, helmet development was pursued mostly by the military, and later by the Sports Car Club of America (SCCA). In 1961, the American Standards Association (ASA) established a committee for protective headgear. The first ASA helmet standard, Z90.1-1966, Protective Headgear for Vehicular Users, was published in 1966. Its first revision was published by ANSI in 1971, and a supplement, ANSI Z90.1a-1973, was released in 1973 to correct a technical error. There have been many different systems for performing helmet impact tests since the 1940s. The Snell Memorial Foundation, established in 1957 and the most authoritative helmet standards setter in the US, notes that standards differ in many ways from country to country and between applications. Because of the different standards, there are many different helmet test systems, and the test procedures are not the same for different types of helmets. What they have in common is that they all use a physical headform to conduct the impact test. There are quite a number of impact test headforms; the two most commonly used in the United States are those from the DOT motorcycle helmet standard FMVSS 218 and from ISO DIS 6220:1983. These experimental tests are costly and have limited flexibility, so much research now focuses on finite element modeling of the helmeted headform. However, simulation of a helmeted headform is not simple, because of the complexity of the human head. For instance, an impact simulation of a solid metallic or wooden headform is easy, but once the skull, membrane and brain properties are taken into consideration, it becomes a very complex problem. Fortunately, with the rapid development of computing capability and the finite element method, researchers are now able to develop more realistic digital models to simulate helmet impact tests. To name a few: Shuaib et al. [22] studied motorcycle helmet crashes from a biomechanics and computational point of view. Kostopoulos et al. [13] also selected the motorcycle helmet as their subject, but from a different view: a parametric analysis was performed to study the effect of composite shell stiffness and the
damage development during impact. Mills and Gilchrist [15] chose bicycle helmets undergoing oblique impacts with a road surface. Pinnoji and Mahajan [19] used finite element models of the head and helmet to study contact forces during frontal impact of the head with a rigid surface. Not many digital helmeted headforms have been studied; this aspect is still progressing gradually.

4.3 Safety Goggles and Spectacles

Safety goggles or spectacles are a form of protective eyewear that usually encloses the eye, protecting it from being struck by particulates, water or chemicals. According to the Bureau of Labor Statistics (BLS), there are an estimated 1,000 eye injuries in the US every day, costing more than $300 million per year in medical expenses. The BLS found that 60% of workers with eye injuries were not wearing safety goggles, and even among those wearing goggles, many were not wearing them properly. In all, only 6% of workers facing potential eye injuries wear goggles [6]. Proper goggles should be chosen according to the working environment.

4.3.1 Testing of Goggles or Glasses

The new ANSI Z87.1-2003 standard, which replaces the old ANSI Z87.1-1989, sets new requirements that goggles should meet. The old 1989 standard was concerned only with the ability of the frame of the safety goggle to withstand high impact testing, but the new standard extends this to both frame and lens. Lenses now have two levels of performance, Basic Impact and High Impact. If a lens passes High Impact testing, a thinner lens is accepted; if it does not, a warning label must be attached to indicate this. The frame must undergo testing as well: in addition to the typical high-mass and high-velocity impact tests, a 2.0 mm High Impact lens must be retained by the frame.

4.3.2 New Developments in Goggles

Military personnel always get the most advanced PPE. Many new military goggles have been developed, though mostly as integrated systems, such as night vision goggles, holographic goggles and pilot goggles. However, there have also been developments at the civilian level, in the sun, wind and dust (SWD) goggle field. Here are five meaningful advancements [18]: 1. Anti-fog lens coatings: Anyone who has worn a goggle during strenuous activity will testify that fog resistance is one of the most important qualities in this product category. Minimizing condensation requires the goggle lens to have an effective chemical anti-fog coating. Several types of coatings are available in the commercial market--some more effective than others. An uncoated lens is far more susceptible to fogging, which can seriously impair the wearer's vision. A simple test can tell you if a lens is anti-fog treated: just exhale on the lens--if it fogs up, the lens probably does not have an anti-fog coating. 2. Large, filtered ventilation ports: Sports-goggle experts have long recognized that another key design element in the battle against fog is adequate ventilation. High airflow can dissipate humidity that otherwise would condense as fog on the lens. The most fog-resistant modern goggles have large ventilation zones, and larger air
volumes inside the frames maximize airflow and minimize condensation. It is important that vents are fully filtered to keep eye-irritating particles outside the frame and away from the eyes. 3. Comfortable anatomical fit and rapid strap-adjustment systems: A key feature in goggle performance is a sealed, comfortable fit. Goggles that are uncomfortable due to gaps, pressure points, or improperly adjusted straps will not protect the eyes against blowing dust and smoke. In this case, Sailors or Marines may not use the goggles--even in hazardous places. High-quality goggles use a combination of anatomical modeling and malleable face padding to provide a sealed fit that is comfortable to wear for long periods. High-memory elastic straps with convenient length adjusters ensure a proper fit. 4. Wide field of view with ample fit over eyeglasses: Modern goggles provide unobstructed peripheral vision and a wider field of view. The older SWD goggle has a relatively narrow field of view and a small interior volume that affords minimal room for eyeglass frames. Goggle frames are now available that fit comfortably over eyeglasses or accommodate the use of a prescription-lens insert.
5 Discussion

Using real human subjects is, needless to say, time- and money-consuming, and the individual differences from subject to subject will inevitably mean that the final products do not suit part of the user population, which may be life-threatening in some cases. Real human testing is also impractical in impact experiments, such as helmet tests. Using a digital human head model to test PPE will greatly help designers. A number of well-established digital head models do exist; however, these models only consider the mechanical structure of the human head, mainly the skull and brain properties, and do not incorporate face anthropometry and facial tissues. They are good for simulating rigid impacts to the head and possible injury, but lack accuracy for the design of face-worn head protective equipment such as respirators or goggles. Using the respirator TIL test as an example, the shortcoming of this test is its use of real human subjects. Though the facial sizes of these subjects are chosen to approximate the distribution of the whole working population in the U.S., it is still not accurate enough. Individual differences, and the non-uniformity of movement even within one subject's test procedure, aggravate this inaccuracy. Another defect is that one cannot tell which air path causes the seal leakage: is it the interface pressure between the respirator and the user, or the ineffectiveness and breakthrough of the cartridge? Further research will consider using the Principal Component Analysis (PCA) method and data from the NIOSH survey to build a digital human headform that represents the principal facial components of the U.S. worker population. The new headform will have different categories of exterior size. We can then use this headform, combined with the finite element method, to conduct virtual tests of PPE. For example, a finite element analysis would greatly help goggle manufacturers re-test their products to make sure they are compliant with the new standard, or make changes as necessary.
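To illustrate the kind of PCA-based selection mentioned above, the sketch below reduces a set of head/face measurements to two principal components and picks the subject nearest each chosen target point in PC space. This is a generic sketch of the idea, not the NIOSH or Zhuang-Viscusi procedure; the data, dimensions and target grid are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical matrix: 1,013 subjects x 10 head/face dimensions (mm)
X = rng.normal(loc=[190, 150, 120, 110, 70, 55, 40, 130, 95, 60],
               scale=8.0, size=(1013, 10))

# PCA via SVD on the centered data
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T          # PC1 and PC2 score per subject

# Pick the subject closest to each target point in (PC1, PC2) space,
# e.g. centers of size categories (targets are arbitrary here)
targets = np.array([[-2.0, 0.0], [0.0, 0.0], [2.0, 0.0],
                    [0.0, -2.0], [0.0, 2.0]]) * S[:2] / np.sqrt(len(X))
for t in targets:
    idx = np.argmin(np.linalg.norm(scores - t, axis=1))
    print(f"target {t.round(1)} -> subject {idx}")
```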
References

1. Bailar III, J.C., Meyer, E.A., Pool, R.: Assessment of the NIOSH head-and-face anthropometric survey of U.S. respirator users. The National Academies Press, Washington (2007)
2. Bandak, F.A., Vander, V.M.J., Stuhmiller, L.M., Mlakar, P.F., Chilton, W.E., Stuhmiller, J.H.: An Imaging-Based Computational and Experimental Study of Skull Fracture: Finite Element Model Development. J. Neurotrauma 12(4), 679–688 (1995)
3. Belingardi, G., Chiandussi, G., Gaviglio, I.: Development and Validation of a New Finite Element Model of Human Head. In: 19th International Technical Conference on the Enhanced Safety of Vehicles, Paper Number 05-0441 (2005)
4. Bollinger, N.: NIOSH Respirator Selection Logic. NIOSH Publications Dissemination, Cincinnati, OH (2004)
5. Cairns, H.: Head Injuries in Motor-Cyclists: The Importance of the Crash Helmet. British Medical Journal, 465–471, October 4 (1941)
6. Chambers, A.: Safety Goggles at a Glance. Occupational Health & Safety 71(10) (2002); ABI/INFORM Global 58–66
7. Digital Human Research Center, AIST, http://www.dh.aist.go.jp
8. Grey House Safety & Security Directory. Grey House Publishing, New York (2005)
9. Hardy, C.H., Marcal, P.V.: Elastic analysis of a skull. ASME Transactions, 838–842 (1971)
10. Horsey, R.R., Liu, Y.K.: A homeomorphic finite element model of the human head and neck. In: Finite Elements in Biomechanics, pp. 379–401. Wiley, New York (1981)
11. Khalil, T.B., Hubbard, R.P.: Parametric study of head response by finite element modeling. Journal of Biomechanics, 119–132 (1977)
12. King, A.I., Ruan, J.S., Zhou, C., Hardy, W.N., Khalil, T.B.: Recent advances in biomechanics of brain injury research: a review. Journal of Neurotrauma 12, 651–658 (1995)
13. Kostopoulos, V., Markopoulos, Y.P., Giannopoulos, G., Vlachos, D.E.: Finite element analysis of impact damage response of composite motorcycle safety helmets. Composites: Part B 33, 99–107 (2002)
14. Kumaresan, S., Radhakrishnan, S.: Importance of partitioning membranes of the brain and the influence of the neck in head injury modeling. Medical & Biological Engineering & Computing 34, 27–32 (1996)
15. Mills, N.J., Gilchrist, A.: Finite-element analysis of bicycle helmet oblique impacts. International Journal of Impact Engineering 35, 1087–1101 (2008)
16. Motoyoshi, M., Shimazaki, T., Sugai, T., Namura, S.: Biomechanical influences of head posture on occlusion: an experimental study using finite element analysis. European Journal of Orthodontics 24, 319–326 (2002)
17. Nahum, A.M., Smith, R., Ward, C.C.: Intracranial pressure dynamics during head impact. In: Proceedings of the 21st Stapp Car Crash Conference, pp. 339–366 (1977)
18. Peter: The benefits of modern goggle technology. Ground Warrior, 22-DEC (2005)
19. Pinnoji, P.K., Mahajan, P.: Finite element modeling of helmeted head impact under frontal loading. Sadhana 32, Part 4, 445–458 (2007)
20. Reddi, M.M., DeCleene, D.F., Oslon, M.B., Bowman, B.M., Hartmann, B.T.: Development of Anthropometric Analogous Headforms, Phase 1. Conrad Technologies, Inc., Paoli (1994)
21. Ruan, J.S., Khalil, T.B., King, A.I.: Finite element modeling of direct head impact. SAE 933114 (1993)
22. Shuaib, F.M., Hamouda, A.M.S., Umar, R.S.R., Hamdan, M.M., Hashmi, M.S.J.: Motorcycle helmet Part I. Biomechanics and computational issues. Journal of Materials Processing Technology 123, 406–421 (2002)
23. Shugar, T.A.: Transient structural response of the linear skull brain system. In: Nineteenth Stapp Car Crash Conference Proceedings, SAE, pp. 581–625 (1975)
24. Subramanian, R.: Traffic safety facts. NHTSA's National Center for Statistics and Analysis, Washington, DC (2007)
25. Trosseille, X., Tarriere, C., Lavaste, F.: Development of a FEM of the human head according to a specific test protocol. In: Proceedings of the 30th Stapp Car Crash Conference, pp. 235–253 (1992)
26. Voo, L., Kumaresan, S., Pintar, F.A., Yoganandan, N., Sances, A.S.: Finite element models of the human head. Med. Biol. Eng. Comput. 34, 375–381 (1996)
27. Willinger, R., Kang, H., Diaw, B.: Three-Dimensional Human Head Finite-Element Model Validation Against Two Experimental Impacts. Annals of Biomedical Engineering 27, 403–410 (1999)
28. Willinger, R., Kopp, C.M., Cesari, D.: New concept of countercoup lesions: Modal analysis of a finite element head model. In: Proceedings of the International Research Council on Biokinetics of Impacts, pp. 283–297. IRCOBI, Verona (1992)
29. Willinger, R., Taleb, L., Pradoura, P.: From the finite element model to the physical model. In: IRCOBI Conf., Brunnen, pp. 245–260 (1995)
30. Willinger, R., Trosseille, X., Lavaste, F., Tarriere, C., Domont, A., Kang, H.S.: Validation study of a 3D finite element head model against experimental data. SAE 962431 (1996)
31. Yue, X.F., Wang, L., Sun, S.F., Tong, L.G.: Viscoelastic finite-element analysis of the human skull-dura mater system as intracranial pressure changes. African Journal of Biotechnology 7(6), 689–695 (2008)
32. Zhang, L., Yang, K.H., King, A.I.: Comparison of Brain Responses between Frontal and Lateral Impacts by Finite Element Modeling. J. Neurotrauma 18(1), 21–30 (2001)
33. Zhou, C., Khalil, T.B., King, A.I.: A new model comparing impact responses of the homogeneous and inhomogeneous human brain. SAE 952714 (1995)
34. Zhuang, Z., Viscusi, D.: A new approach to developing digital 3-D headforms. In: SAE Digital Human Modeling for Engineering and Design, Pittsburgh, PA, June 14-17 (2008)
35. Zong, Z., Lee, H.P., Liu, C.: A three-dimensional human head finite element model and power flow in a human head subject to impact loading. Journal of Biomechanics 39, 284–292 (2004)
HADRIAN: Fitting Trials by Digital Human Modelling

Keith Case1,4, Russell Marshall2, Dan Hogberg4, Steve Summerskill2, Diane Gyi3, and Ruth Sims2

1 Mechanical and Manufacturing Engineering
2 Department of Design and Technology
3 Department of Human Sciences, Loughborough University, UK
4 The School of Technology and Society, University of Skövde, Sweden
{k.case,r.marshall,s.j.summerskill2,d.e.gyi,r.sims}@lboro.ac.uk
[email protected]
Abstract. Anthropometric data are often described in terms of percentiles, and too often digital human models are synthesised from such data using a single percentile value for all body dimensions. The poor correlation between body dimensions means that products may be evaluated against models of humans that do not exist. Alternative digital approaches try to minimise this difficulty using pre-defined families of manikins to represent human diversity, whereas in the real world carefully selected real people take part in 'fitting trials'. HADRIAN is a digital human modelling system which uses discrete data sets for individuals rather than statistical populations. A task description language is used to execute the evaluative capabilities of the underlying SAMMIE human modelling system as though a 'real' fitting trial was being conducted. The approach is described with a focus on the elderly and disabled and their potential exclusion from public transport systems. Keywords: Digital Human Modelling, User Trials, SAMMIE, HADRIAN.
1 Introduction

The collection and application of anthropometric data within digital human modelling systems raises many questions. Often the data will have been collected for direct use in a particular design application and may not meet the more generic needs of human modelling systems. There is a consequent need for some transformation, for example to convert the external body dimensions normally collected in anthropometric surveys into the internal joint-to-joint dimensions that form the basis of most models. However, perhaps the most significant problem arises from the use of a 'percentile' approach that is in conflict with the multivariate nature of anthropometric data. Fifth and ninety-fifth percentile models are commonly used in the belief that this will 'accommodate' an appropriate proportion of the user population. This, however, assumes that good correlation exists between body measures, whereas it has long been understood that correlation between some body measures can be extremely weak. Hertzberg [1], in a large survey of over 4000 Air Force personnel, found no examples of
Fig. 1. Female A-CADRE family [2] constructed in RAMSIS
Fig. 2. Some male members of the RAMSIS typology constructed in Jack [6]
men who fell within the central (average) 30 percent range on all of a series of ten measurements. That is to say, the man who is average in all dimensions, and thus an 'average' man, simply does not exist, because the correlation between different dimensions is not sufficiently high. In the human modelling world, handling this problem is frequently left as an issue for the user of the modelling system to deal with, raising the question of whether all users are sufficiently aware of the difficulties to handle them satisfactorily. Alternative approaches have constructed 'families' which try to encompass the multivariance within a limited number of models, such as the 17 manikins of A-CADRE [2] or the 45 manikins of the RAMSIS Typology [3]. Figure 1
Fig. 3. Skin compositions of RAMSIS Typology (black) and A-CADRE (grey) male manikin families (Hogberg, [5])
shows the A-CADRE female family and Figure 2 shows six members of the RAMSIS typology constructed according to a method defined by Speyer [4]. Hogberg [5] gives a graphic comparison of the A-CADRE family and the full RAMSIS family (Figure 3). In real 'fitting trials', a panel of people selected to be representative of the eventual users interact with the product or a prototype. To carry out the equivalent activity in digital human modelling, it is necessary to have anthropometric (and other) data available as individual sets (rather than as population statistics such as percentiles), and there needs to be some way of describing the interactions with the product (a task description). Both of these important aspects are provided by the HADRIAN system.
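Hertzberg's observation can be reproduced with a small Monte Carlo experiment: even with moderately correlated body dimensions, the fraction of people inside the central 30% band on every one of ten measures collapses toward zero. The sketch below is our own illustration with an invented correlation value, not Hertzberg's data.

```python
import numpy as np

rng = np.random.default_rng(42)
n_people, n_dims = 100_000, 10

# Invented equicorrelation structure: every pair of dimensions r = 0.5
corr = np.full((n_dims, n_dims), 0.5)
np.fill_diagonal(corr, 1.0)
body = rng.multivariate_normal(np.zeros(n_dims), corr, size=n_people)

# Central 30% band on each dimension (35th to 65th percentile)
lo, hi = np.quantile(body, [0.35, 0.65], axis=0)
in_band = (body > lo) & (body < hi)

print(f"'average' on one dimension:   {in_band[:, 0].mean():.1%}")
print(f"'average' on all ten at once: {in_band.all(axis=1).mean():.1%}")
```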
2 Data Collection

Important aspects of diversity arise from the users of products being older than the general population or having some disability, and these have been reflected in our data collection. This emphasis on older and disabled people comes from earlier work within the EQUAL (Extending Quality Life) programme [7], which was a 'design for all' activity that recognised the needs and opportunities of an aging population, and from similar considerations, but focussed on transport, in current work concerned with Sustainable Urban Environments (SUE) [8]. Details of the data collection can be found in [9], and some indication of the variety of data available is shown in figure 4 (from [10]). The data collected include anthropometry, joint constraints, reach and mobility, presented to the designer/ergonomist as sets relating to individuals, together with additional information such as video clips which illustrate particular problems that an individual might have due to a disability. This data and the form of
Fig. 4. Examples of various windows for a specific individual in the HADRIAN database
its presentation has considerable value in its own right, but becomes more potent when associated with a task-driven human model, as described next. The diversity of the members of the database is illustrated in figure 5.
3 Task Description Language

HADRIAN contains the database of individuals described above, plus a task description method for driving the underlying and long-established SAMMIE (System for Aiding Man-Machine Evaluation) system [11]. The task and its evaluation criteria are defined using a simple task description language (figure 6), and the subsequent analysis uses this to create and drive a human model to evaluate each individual's capability in performing the task. The figure shows a small part of the task of obtaining money from an Automatic Teller Machine (ATM), where the first two elements are 'look at screen' and 'reach to slot'. The complete task is evaluated for each individual in the database, and a degree of intelligence is applied to the analysis – for example, the reach to the
Fig. 5. Members of the HADRIAN database
Fig. 6. Constructing a Task Analysis in HADRIAN
card slot will be performed by the individual's preferred hand, as handedness is an item in the database. On completion of the task analysis, the percentage accommodated is presented. This is the percentage of the individuals in our database who are predicted to complete the whole task successfully. Should any individual be unable to
Fig. 7. Best attempt to reach card slot
complete the task, they will be identified and the situation causing the difficulty will be displayed (e.g. figure 7), together with a suggestion for improvement.
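The following sketch illustrates the overall logic of such a virtual fitting trial: iterate over a database of individuals, attempt each task element in sequence, and report the percentage accommodated plus the failing element for anyone excluded. This is a schematic of the approach described above, not HADRIAN code; the data structures and capability checks are invented.

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    preferred_hand: str   # "left" or "right"
    reach_mm: float       # maximum comfortable reach
    eye_height_mm: float

# Invented task elements: (description, predicate on a Person)
TASK = [
    ("look at screen", lambda p: 1200 <= p.eye_height_mm <= 1900),
    ("reach to card slot", lambda p: p.reach_mm >= 650),
    ("reach to cash slot", lambda p: p.reach_mm >= 700),
]

people = [
    Person("A", "right", 720, 1550),
    Person("B", "left", 640, 1480),
    Person("C", "right", 705, 1230),
]

accommodated = 0
for p in people:
    failed = next((desc for desc, ok in TASK if not ok(p)), None)
    if failed is None:
        accommodated += 1
    else:
        print(f"{p.name} excluded at element: {failed}")
print(f"percentage accommodated: {100 * accommodated / len(people):.0f}%")
```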
4 Accessibility and User Needs in Transport

Current research using HADRIAN is considering the accessibility aspects of public transport systems. The work is focussed on the creation of a journey planner, as the 'journey' expresses the need for individuals to complete extended tasks, with failure in any one aspect making the entire task impossible. For example, a journey from home to a hospital, followed by a visit to the pharmacy and a return home, could involve walking, buses and trains, with interchanges between the modes. Two test-bed sites in Camden (central London) and Hertfordshire (rural towns) are being used to identify a number of relevant journeys from which we can collect data. The journeys will be based on observation and real-world experience from people and will include all of the accessible design elements that the individuals will have to deal with on those journeys. Potential barriers faced by the people who make these journeys are being identified (figure 8). These barriers may take many forms, including physical, cognitive and emotional. The physical barriers (e.g. kerbs, lifts, escalators and street furniture) are the most easily assessed using human modelling techniques, but our data collection activity has included aspects of the cognitive (e.g. understanding of signage and timetables) and emotional (e.g. security concerns) characteristics of individuals.
Fig. 8. Potential barriers faced during a typical journey
Many of these barriers may arise in the course of making a journey, and if any one prevents the user from achieving a relatively small part of the overall task, it may well prevent the journey from being possible.
5 Conclusions

The multivariate nature of human data gives rise to considerable difficulty in the proliferation of digital human modelling techniques beyond specialist activity into the general world of product design. The design of evaluations and the interpretation of results require considerable knowledge and a professional ergonomist. This paper has described an approach that is intended to alleviate this problem to a certain extent by replicating the fitting trials of the real world with their equivalent in the virtual world of digital human modelling. The use of a task-based approach is also considered to be essential, and HADRIAN's task description capabilities allow the modelling system to be used as an automated evaluation tool. It also allows for the consideration of issues beyond the physical aspects of anthropometry, so that some consideration can be given to the cognitive and emotional issues faced by the individuals in the database.
References 1. Hertzberg, H.T.E.: Dynamic Anthropometry of Working Positions. Human Factors 2(3) (August 1960) 2. Bittner, A.C.: A-CADRE: Advanced family of manikins for workstation design. In: XIVth congress of IEA and 44th meeting of HFES, San Diego, pp. 774–777 (2000) 3. Bubb, H., Engstler, F., Fritzsche, F., Mergl, C., Sabbah, O., Schaefer, P., Zacher, I.: The development of RAMSIS in past and future as an example for the cooperation between industry and university. Int. J. of Human Factors Modelling and Simulation 1(1), 140–157 (2006) 4. Speyer, H.: On the definition and generation of optimal test samples for design problems, Kaiserslautern Human Solutions GmbH (1996) 5. Hogberg, D.: Ergonomics Integration and User Diversity in Product Design, PhD Thesis, Loughborough University (2005) 6. Badler, N.I., Phillips, C.B., Webber, B.L.: Simulating Humans: Computer Graphics, Animation and Control. Oxford University Press, Oxford (1993) 7. Extending QUAlity Life, http://www.extra.rdg.ac.uk/equal/ 8. Sustainable Urban Environments (SUE), http://www.epsrc.ac.uk/ResearchFunding/Programmes/PES/SUE/ default.htm 9. Gyi, D.E., Sims, R.E., Porter, J.M., Marshall, R., Case, K.: Representing Older and Disabled People in Virtual Users Trials: Data Collection Methods. Applied Ergonomics 35, 443–451 (2004) 10. Porter, J.M., Marshall, R., Sims, R.E., Gyi, D.E., Case, K.: Hadrian gets streetwise. In: Proceedings of the IEA 2006, International Ergonomics Association Triennial Congress, Maastricht, The Netherlands (July 2006) 11. Case, K., Porter, J.M., Bonney, M.C.: SAMMIE: A Man and Workplace Modelling System. In: Karwowski, W., Genaidy, A., Asfour, S.S. (eds.) Computer-Aided Ergonomics, pp. 31–56. Taylor & Francis Ltd., London (1990)
The Pluses and Minuses of Obtaining Measurements from Digital Scans

Ravindra S. Goonetilleke1, Channa P. Witana1, Jianhui Zhao2, and Shuping Xiong3

1 Department of Industrial Engineering and Logistics Management, Hong Kong University of Science & Technology, Clear Water Bay, Hong Kong
2 Computer School, Wuhan University, Wuhan, Hubei, PR China
3 Department of Industrial Engineering and Management, Shanghai Jiao Tong University, Shanghai, PR China
[email protected]
Abstract. Digital scanners are commonplace and are used in many different applications to obtain three-dimensional shapes and linear and circumferential measurements. Even though scanners can be highly accurate, measurements obtained from scanners can vary depending on how an object is scanned, aligned and processed. In this study, we examined three different alignment methods for foot scans and their effects on ten different measurements. Variations among methods are relatively small for foot length compared with arch length. The foot girths can be quite sensitive to the registration process, depending on the complexity of the algorithms used. As expected, linear and girth measurements based on anatomical landmarks will always be independent of any registration process and are thus good ways to obtain repeatable measurements. Keywords: Scanning, foot, measurement, registration, alignment, Brannock, width, girth.
1 Introduction

The length, and sometimes the width, of feet are used to select footwear. There are many studies on foot anthropometry that describe different techniques to measure critical dimensions of feet, including Freedman et al. [3], Hawes and Sovak [4] and Kouchi [5]. With the availability of cheap computing power and powerful scanning technologies, many researchers are using automatic methods to obtain foot measurements. The required measures can be obtained from the relevant points in the scanned data using feature recognition techniques or by placing markers and identifying anatomical points [6, 8]. In recent years, there has been exponential growth in laser scanner technologies for varying applications that claim accuracy within 1 mm. The accuracy of the scanner itself can vary depending on the object that is scanned and the method used for scanning. Even though scanner accuracy can be within 1 mm, one fundamental issue that has not been studied in relation to foot anthropometrics is the measurement axes. Witana
et al. [8] found differences between manual measurements, simulated measurements obtained using user-generated computer algorithms, and commercially available measurement software. They concluded that some of the differences may be attributed to the differing registration processes in the different methods. Researchers and footwear fitters have long used differing orientations and axes to measure feet. For example, the Brannock device (www.brannock.com), which has been a tool used for measuring feet in US shoe stores for many years, has a foot length axis that is 38.1 mm (1.5 inches) from the medial side of the first metatarsal head. Most researchers use another foot axis: the line joining the pternion and the tip of the second toe [5]. This axis can be very problematic in the presence of bunions, as the second toe will be deflected from its neutral position, resulting in possibly longer lengths and wider widths. Freedman et al. [3] and Yavatkar [10] proposed a different axis to measure foot flare; this axis is determined by joining the mid-points of the lines located 10 mm and 50 mm from the pternion. The objective of this study is to investigate the effects of using different registration axes when determining foot measurements from digital scans, and thereby to identify the pros and cons of using scans for foot measurements.
2 Methodology
2.1 Participants
To account for variations in anatomical structures, twenty-five males and twenty-five females were recruited for this experiment. None of them had any visible foot abnormalities or foot illnesses. Their ages ranged from 19 to 24 years, with an average age of 21.5 years. Foot length ranged from 210 mm to 283 mm, with an average of 245 mm.
2.2 Experimental Procedure
Each participant's left foot was laser scanned using a Yeti foot scanner (www.vorum.com). Prior to the scan, seven anatomical landmarks were identified and marked on the left foot: five landmarks on the tops of the metatarsal-phalangeal joints (MPJs), and one each at the sides of the first and fifth MPJs. The participant's left foot was then scanned with half body-weight on each foot. The point cloud of data obtained from the scanner, including the seven landmarks, was stored and processed as described by Witana et al. [8]. The software program developed by Witana et al. [8] has the capability to generate measures for differing foot registrations; it would be tedious to determine the measures for differing alignments with manual measurements. In this paper, we evaluate three different registrations for ten linear and circumferential foot measurements. The dependent variables were ten left-foot measurements taken on the 50 participants' digitally scanned foot shapes. The independent variable was the registration method, at three levels: a heel centerline alignment, a simulated Brannock device alignment, and a pternion
to second metatarsal head alignment. The latter method, as opposed to the pternion–second toe tip axis, has the ability to minimize the effects of bunions.
2.3 Foot Alignment Methods
The Heel Centerline (HCL) Method. In this method (pictured in Fig. 1), a part of the rear-foot is used to compute the heel centerline, which is thereafter used as the axis for foot measurement. The algorithm for the HCL method is as follows (a code sketch is given after the steps):
Fig. 1. Heel centerline foot alignment method
1. Find the point, P, with the maximum Z value (foot bottom) in the scanned point cloud (Figure 1).
2. Obtain a data set consisting of points no more than 3 mm from point P along the Z-axis.
3. Find the bottommost point as the mean of the five points with maximum Z-values.
4. Consider all the scanned points of the foot that are less than 25 mm in height (along the negative Z-axis) from this bottommost point.
5. Project the points determined in Step 4 onto the XY plane.
6. Find the two edge points (i.e., points with minimum and maximum Y values) in every scanned section.
7. The center point is then the mean of the two edge points in every section.
8. Fit a least squares line (1st degree polynomial) to all center points that are within 13% of the foot length from the heel (the first center point).
9. Rotate the scanned foot points about the first center point, parallel to the XY plane, to make the fitted line parallel to the X-axis.
10. Repeat Steps 4 to 9 until the rotational angle is less than 0.001 deg.
11. Save the rotated results as an aligned point cloud and landmarks.
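To make the iterative procedure of Steps 4–10 concrete, the following Python sketch (our own illustration, not the authors' implementation; the function name, the choice of pivot, and the simplified section binning are assumptions) rotates a 2D-projected point cloud until the fitted heel centerline is parallel to the X-axis:

```python
import numpy as np

def heel_centerline_align(points, foot_length, tol_deg=0.001, max_iter=50):
    """Iteratively rotate a foot point cloud (N x 2, already projected onto
    the XY plane with the heel at minimum X) so that the heel centerline
    lies along the X-axis.  A minimal sketch of the HCL idea."""
    pts = points.copy()
    # Pivot: the heel-most point, used here as a proxy for the paper's
    # "first center point" (a simplification).
    pivot = pts[np.argmin(pts[:, 0])].copy()
    for _ in range(max_iter):
        # Bin the rear 13% of the foot into thin X sections and take the
        # midpoint of the Y extremes in each section (Steps 6-8).
        xs = pts[:, 0]
        bins = np.linspace(xs.min(), xs.min() + 0.13 * foot_length, 14)
        centers = []
        for lo, hi in zip(bins[:-1], bins[1:]):
            sec = pts[(xs >= lo) & (xs < hi)]
            if len(sec):
                centers.append([sec[:, 0].mean(),
                                0.5 * (sec[:, 1].min() + sec[:, 1].max())])
        centers = np.array(centers)
        # Least-squares line through the heel center points.
        slope, _ = np.polyfit(centers[:, 0], centers[:, 1], 1)
        angle = np.arctan(slope)
        if abs(np.degrees(angle)) < tol_deg:
            break
        # Rotate the whole cloud about the pivot to cancel the slope (Step 9).
        c, s = np.cos(-angle), np.sin(-angle)
        rot = np.array([[c, -s], [s, c]])
        pts = (pts - pivot) @ rot.T + pivot
    return pts
```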
The Brannock Alignment (BRN) Method. This alignment method (pictured in Fig. 2) simulates the foot positioning in the Brannock device (www.brannock.com). The algorithm used in the BRN alignment method is as follows (a code sketch is given after the steps):
Fig. 2. Foot registration to simulate the Brannock device alignment
1. Find the point, P, with the maximum Z value (foot bottom) in the scanned point cloud.
2. Obtain a data set consisting of points no more than 3 mm from point P along the Z-axis.
3. Find the bottommost point as the mean of the five points with maximum Z-values.
4. Consider all the scanned points of the foot that are less than 25 mm in height (Z-axis) from this bottommost point.
5. Project the points determined in Step 4 onto the XY plane.
6. Find the two edge points (i.e., points with minimum and maximum Y values) in every scanned section.
7. The center point is then the mean of the two edge points in every section.
8. Fit a second-degree polynomial (x = ay² + by + c) to all the edge points that are 25 mm from the heel along the X direction.
9. Calculate the turning point coordinates, ((4ac − b²)/(4a), −b/(2a)).
10. Calculate the distance from the first landmark (M1) to the line that is parallel to the X-axis and passes through the turning point.
11. Rotate the foot about the above turning point, parallel to the XY plane, to make the distance calculated in Step 10 equal 38.1 mm, as shown in Figure 2.
12. Save the rotated results as the aligned point cloud and landmarks.
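The turning-point computation of Steps 8–11 can be sketched as follows (a minimal illustration under our own assumptions; the selection of heel edge points and the choice of solution branch for the rotation are simplified):

```python
import numpy as np

def brannock_align(points, heel_edge_pts, m1, offset=38.1):
    """Rotate a foot point cloud (N x 2, XY plane) so landmark M1 lies
    `offset` mm from the Brannock length axis (the X-parallel line
    through the heel turning point).  A minimal sketch."""
    # Step 8: fit x = a*y^2 + b*y + c to the rear heel edge points.
    y, x = heel_edge_pts[:, 1], heel_edge_pts[:, 0]
    a, b, c = np.polyfit(y, x, 2)
    # Step 9: turning point of the fitted parabola.
    tp = np.array([(4 * a * c - b * b) / (4 * a), -b / (2 * a)])
    # Steps 10-11: rotation angle that places M1 at `offset` from the
    # X-parallel line through tp (one solution branch only).
    dx, dy = np.asarray(m1, float) - tp
    r, alpha = np.hypot(dx, dy), np.arctan2(dy, dx)
    phi = np.arcsin(np.clip(offset / r, -1.0, 1.0)) - alpha
    rot = np.array([[np.cos(phi), -np.sin(phi)],
                    [np.sin(phi),  np.cos(phi)]])
    return (points - tp) @ rot.T + tp
```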
The Pternion–Second Metatarsal Alignment (2MT) Method. This method, pictured in Fig. 3, uses the pternion and a landmark on the second metatarsal. The algorithm used in this method is as follows (a code sketch is given after the steps).
Fig. 3. Foot alignment method based on the pternion and a landmark on the second metatarsal (landmark 3)
1. Find the point, A, with the minimum X value in the scanned point cloud.
2. Obtain a data set (D3) consisting of all points no more than 3 mm away from point A along the X-axis.
3. Find the five points with the lowest X-values in the data set D3. Let the mean of these five points represent the pternion, i.e., the backmost point on the foot.
4. Calculate the angle between the X-axis and the line (L1) joining the pternion and the second metatarsal head landmark.
5. Rotate the foot point cloud about the pternion such that line L1 is parallel to the X-axis, as shown in Figure 3.
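The 2MT registration reduces to a single rotation about the pternion; a minimal sketch (our own illustration, with hypothetical names):

```python
import numpy as np

def pternion_2mt_align(points, pternion, mt2_landmark):
    """Rotate a foot point cloud (N x 2, XY plane) about the pternion so
    the pternion-to-second-metatarsal line L1 is parallel to the X-axis."""
    dx, dy = np.asarray(mt2_landmark, float) - np.asarray(pternion, float)
    theta = -np.arctan2(dy, dx)          # cancel L1's angle to the X-axis
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return (points - pternion) @ rot.T + pternion
```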
2.4 Foot Measurements
Ten foot measurements were obtained using a C++ program. The description of each measurement is given in Table 1 and each measure is shown in Figure 4.

Table 1. Description of the foot metrics. The numbers correspond to those shown in Figure 4.

Lengths:
[1] Foot length: The distance along the alignment axis (X-direction) from the pternion to the tip of the longest toe.
[2] Arch length: The distance along the alignment axis from the pternion to the most medially prominent point on the first metatarsal head.
[3] Heel to 5th toe: The distance along the alignment axis from the pternion to the anterior fifth toe tip.

Widths:
[4] Foot width: Maximum horizontal breadth (Y-direction) across the foot, perpendicular to the alignment axis, in the region in front of the most laterally prominent point on the fifth metatarsal head.
[5] Mid-foot width: Maximum horizontal breadth across the foot, perpendicular to the alignment axis, at 50% of foot length from the pternion.
[6] Heel width: Breadth of the heel at a location 40 mm anterior to the pternion (modified from last measurements given by [7]).

Girths:
[7] Ball girth: Circumference of the foot, measured with a tape touching the medial margin of the head of the first metatarsal bone, the top of the first metatarsal bone and the lateral margin of the head of the fifth metatarsal bone.
[8] Instep girth: The smallest girth over the middle cuneiform prominence [2].
[9] Short heel girth: Minimum girth around the back heel point and the dorsal foot surface [1].
[10] Long heel girth: The girth from the instep point around the back heel point [1, 2].
3 Results
The means and standard deviations for the ten measurements obtained using each of the alignments on the 50 participants are shown in Table 2. The foot length, arch length and foot width measures for each participant using each method are shown in Figures 5, 7 and 9, and the differences between methods are presented in Figures 6, 8 and 10.
Fig. 4. Foot measurements
Table 2. The means (in mm) for each of the three alignment methods. The standard deviations are given in parentheses.

Foot measurement     HCL             BRN             2MT
Foot Length          245.0 (14.74)   245.3 (14.85)   245.0 (14.72)
Arch Length          178.8 (10.50)   180.1 (10.80)   179.0 (10.44)
Heel to Fifth Toe    203.9 (14.20)   203.1 (13.92)   203.9 (14.39)
Foot Width           92.6 (7.07)     92.6 (7.12)     92.5 (7.24)
Heel Width           61.4 (4.13)     61.4 (4.13)     61.4 (4.10)
Mid Foot Width       85.7 (7.74)     85.9 (7.94)     85.6 (7.71)
Ball Girth           223.3 (16.03)   223.3 (16.03)   223.3 (16.03)
Instep Girth         238.0 (16.75)   237.3 (16.50)   238.0 (16.41)
Long Heel Girth      324.5 (22.25)   327.1 (21.80)   325.9 (22.09)
Short Heel Girth     304.7 (18.62)   304.8 (18.51)   304.8 (18.48)
Fig. 5. Foot length measurements using three alignment methods (n=50)
Fig. 6. Differences in foot length measurements among three alignment methods (n=50)
Fig. 7. Arch length measurements using three alignment methods (n=50)
Fig. 8. Differences in arch length measurements among three alignment methods (n=50)
Fig. 9. Foot width measurements using three alignment methods (n=50)
Fig. 10. Differences in foot width measurements among three alignment methods (n=50)
4 Discussion
The results clearly show that the alignment method has a relatively small effect on measures such as foot length. The maximum difference among the three methods was
less than 3 mm for all 50 participants. However, there are differences of about 5 mm among the methods when measuring arch length. The differences between the Brannock alignment and the 2MT alignment methods are relatively small: the differences in foot length and arch length measurements between these two methods are 1.5 mm and 3 mm, respectively. The heel centerline method of alignment tends to show larger differences from the other two methods. This is primarily because the alignment is based on the center of the heel: calluses and other deformities can significantly affect the orientation of the foot axis, and the orientation of the foot using this method can be quite different from that of the other two methods. The arch length is generally around 73% of foot length [9], and one might expect the differences among methods to be around 73% of the differences in foot length. However, the larger difference between methods in measuring arch length is due to the projection of the first metatarsal point (arch point) on the measurement axis. Consider any two alignment methods. If the angle between the two alignments is θ, then the difference in foot length (FL) between the two methods is FL(1 − cos θ). If the straight-line distance between the pternion and the first metatarsal head is y, and the angle subtended by this line to the alignment axis is ψ, then the difference in arch length between the two methods is y{cos ψ − cos(ψ + θ)}. Foot length is a special case where ψ = 0. It can be shown that {cos ψ − cos(ψ + θ)} is larger than {1 − cos θ} for 0° < ψ < 90°, and hence the difference in arch length between two methods will tend to be larger than the difference in foot length. Variations in width tend to be proportional to the magnitude of the distance: the larger the magnitude, the larger the difference. For example, the differences between methods in measuring heel width tend to be smaller than those in measuring foot width. The alignment had no effect on ball girth, as that measure is based on anatomical landmarks; in other words, the use of anatomical landmarks makes measurements independent of the alignment of the foot. The data show that there were variations between the alignment methods in the measures of instep girth and long heel girth. These differences may be attributed to the locations used for the calculations in the software: certain anatomical features were used to calculate these girths, and their orientations can shift in the calculations. Measurements of the short heel girth, on the other hand, varied less among the different methods, as the measuring plane was defined based on the pternion and the minimum value for the girth measurement; the measurement algorithm automatically took care of any variations due to the different alignments. Overall, caution is required when setting the foot axis and obtaining measures based on the axis used.
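This geometric argument is easy to check numerically. With illustrative values (θ = 2° between two alignment axes, FL = 245 mm, y = 180 mm, ψ = 10°; the specific numbers are our own, chosen only to match the scale of Table 2), the arch length difference is roughly an order of magnitude larger than the foot length difference:

```python
import numpy as np

# Illustrative values only (not taken from the study's data).
theta = np.radians(2.0)    # angle between two alignment axes
psi   = np.radians(10.0)   # angle of the pternion-to-MT1 line to the axis
FL, y = 245.0, 180.0       # foot length and pternion-to-MT1 distance (mm)

d_foot = FL * (1 - np.cos(theta))                 # difference in foot length
d_arch = y * (np.cos(psi) - np.cos(psi + theta))  # difference in arch length
print(f"foot length difference: {d_foot:.2f} mm")  # approx. 0.15 mm
print(f"arch length difference: {d_arch:.2f} mm")  # approx. 1.20 mm
```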
5 Conclusion
Scanning technologies are able to yield good point cloud data. However, measurements from the point clouds produced by scanning can be sensitive to the alignment method. When the measurement points are close to the measurement axis, the variations among alignment methods may be small compared to measurements based on points that are further away from the measurement axis. If axis-independent measures are desired, the ideal approach is to calculate distances or girths based on anatomical landmarks.
Acknowledgements
The authors would like to thank the Research Grants Council of Hong Kong for funding this study under grant HKUST 613607.
References
1. Chen, C.C.: An investigation into shoe last design in relation to foot measurement and shoe fitting for orthopedic footwear. Ph.D. Thesis, University of London (1993)
2. Clarks, Ltd. Training Dept.: Manual of Shoemaking, 2nd edn. Training Department Clarks (1976)
3. Freedman, A., Huntington, E.C., Davis, G.C., Magee, R.B., Milstead, V.M., Kirkpatrick, C.M.: Foot dimensions of soldiers (Third Partial Report, Project No. T 13). Armored Medical Research Laboratory, Fort Knox (1946)
4. Hawes, M.R., Sovak, D.: Quantitative morphology of the human foot in a North American population. Ergonomics 37(7), 1213–1226 (1994)
5. Kouchi, M.: Inter-generation differences in foot morphology: Aging or secular change? Journal of Human Ergology 32, 23–48 (2003)
6. Luximon, A., Goonetilleke, R.S., Tsui, K.L.: Foot landmarking for footwear customization. Ergonomics 46(4), 364–383 (2003)
7. Pivečka, J., Laure, S.: Practical Handbook for Shoe Designers: The Shoe Last. International School of Modern Shoemaking (1995)
8. Witana, C.P., Xiong, S., Zhao, J., Goonetilleke, R.S.: Foot measurements from three-dimensional scans: a comparison and evaluation of different methods. International Journal of Industrial Ergonomics 36(9), 789–807 (2006)
9. Xiong, S., Goonetilleke, R.S., Zhao, J., Li, W., Witana, C.P.: Foot deformations under different load bearing conditions and their relationships to stature and body weight. Anthropological Science (2009) (in press)
10. Yavatkar, A.S.: Computer aided system approach to determine the shoe-last size and shape based on a statistical approximated model of a human foot. Unpublished master's thesis, Tufts University, Medford, MA (1993)
Auto-calibration of a Laser 3D Color Digitization System

Xiaojie Li¹, Bao-zhen Ge¹, Dan Zhao¹, Qing-guo Tian¹, and K. David Young²

¹ College of Precision Instruments and Opto-electronics Engineering and Key Ministry of Education Laboratory of Opto-electronics Information and Technical Science, Tianjin University, Tianjin 300072, China
² Embedded Systems Institute and Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
[email protected], [email protected]
Abstract. A typical 3D color digitization system is composed of 3D sensors to obtain 3D information and color sensors to obtain color information. Sensor calibration plays a key role in determining the correctness and accuracy of the 3D color digitization data. In order to carry out the calibration quickly and accurately, this paper introduces an automated calibration process which utilizes 3D dynamic precision fiducials: calibration dot pairs are extracted automatically, and the corresponding data are processed via a calibration algorithm. This automated process was experimentally verified to be fast and effective. Both the 3D information and the color information are extracted such that the 3D sensors and the color sensors are calibrated in one automated calibration process. We believe it is the first such calibration process for a 3D color digitization system.
1 Introduction
Digital cameras are widely used in scientific exploration, industrial manufacturing and other fields to achieve the objectives of 3D reconstruction and measurement. A crucial step in the construction of any 3D measurement system is the determination of the relationships between the two-dimensional image coordinates (Xf, Yf) of the digital camera's imaging plane and the three-dimensional spatial coordinates (Xw, Yw, Zw) of the measured object. This is typically done through the so-called camera sensor calibration process. The classical calibration method [1] is widely adopted due to the high accuracy it can attain; however, it is a complex calibration process which requires manual operator assistance and intervention, increasing the calibration workload and severely impacting calibration efficiency. For this reason, many researchers have worked on the automatic calibration [2,3] of imaging sensors, and some reported work has explored the use of binocular vision sensors for auto-calibration [4,5]. For multi-vision systems in which there are many imaging sensors, not only do the individual sensors need to be calibrated, but calibration of the entire system also requires the relationships among the sensors to be considered.
In laser 3D color digitization systems [6,7,8], a multi-axis synchronized laser scanning system and a separate, independent color sensor system are used to digitize 3D color objects: 3D sensors, i.e. grey-scale imaging sensors that capture the scanning laser line distortions on the object, are used to obtain the 3D information of the measured object, and color sensors, i.e. color imaging sensors, are used to obtain the color information. In these systems, it is necessary to calibrate each of the 3D sensors and color sensors, as well as to match the 3D information and color information through a calibration process, in order to reconstruct the 3D color data of the objects. The most critical step in the calibration process is the extraction of calibration dot pairs, namely, determining the relationships between the spatial coordinates and the image plane coordinates of the feature points on the target. In this paper, dynamic precision fiducials are introduced to facilitate a totally automated calibration process in which fiducial data are automatically acquired and processed to build a corresponding spatial coordinate system. In this approach, both the coplanar and non-coplanar calibration dot pairs which are essential to the calibration process are automatically acquired, greatly improving the speed and efficiency of the calibration process. For the coplanar point pairs obtained from the 3D sensors, we use a direct linear transformation calibration method to build the calibration matrix; and for the calibration dot pairs from the color sensors, we use a BP neural network method to obtain the mapping relationships between the color information and the spatial coordinates. We then complete the calibration process by calibrating all the sensors in the same world coordinate system.
2 Target Design
Traditional camera calibration methods need to place an object of known shape and size in front of the imaging sensors, known as the calibration target, or calibration template block. Generally, the calibration templates require that the world coordinates of the feature points on the template be known and that the computer image coordinates of these points be easily extracted. Different research groups have used a variety of shapes for calibration templates: e.g., the Intel visual calibration group used a black-and-white checkerboard grid as a template [9]; a cube with round holes distributed on all of its facets was adopted by the Chinese Academy of Sciences Institute of Automation [5]; the Microsoft Research Institute visual group used 8 × 8 square planar arrays [10]; and researchers at the Northwestern Polytechnical University, China proposed to use a rectangular parallelepiped [11]. The ultimate design decision for calibration templates is their ability to facilitate image processing and the extraction of feature points. Calibration target template design strongly affects calibration accuracy, which ultimately impacts the accuracy of the 3D measurement system. For example, if rectangular parallelepipeds are used as calibration templates, the vertices are used as feature points, and in this case the extraction accuracy is limited by the pixel accuracy. It is well known that the image of a circle is not sensitive to the image threshold; thus, the image coordinates of the circular center can be computed to sub-pixel accuracy by applying least squares numerical analysis to the hole edge image data of a circular target. Using this approach, data errors are reduced and the accuracy of the imaging sensor calibration parameters is improved.
Taking into account the wide measurement range of our system and the multidirectional distribution of the imaging sensors, we concluded that the calibration target adopted must be 3D and must fill the entire field of view of the imaging sensors. The calibration target designed for calibrating our 3D color digitization system is the so-called dynamic precision fiducial target. As shown in Fig. 1, it is composed of a vertical plate target, a turntable and a two-dimensional precision translation stage. A series of precision fiducials in the form of circular holes are made on the vertical plate to accurately position the Z direction coordinate (Zw). The turntable can rotate by an arbitrary angle in the XOY plane. The vertical plate target can translate precisely in the two-dimensional plane, as well as rotate about the vertical Z axis. The two-dimensional precision translation stage is composed of two one-dimensional precision translation stages equipped with gratings to enable high-precision closed-loop position control. As such, every dynamically relocated three-DOF (two linear and one rotational DOF) target position can be described in the same world coordinate system; and the fact that the precision fiducials can be positioned arbitrarily and freely in three-dimensional space indicates that these are truly 3D calibration targets.

Fig. 1. Dynamic precision fiducials
3 Automatic Extraction of Calibration Dot Pairs
The most important step in the calibration process is the extraction of calibration dot pairs. Utilizing the dynamic precision fiducials described above, calibration dot pairs for the sets of imaging sensors used in our 3D color digitization system can be extracted automatically, without manual operator intervention during the entire calibration process. Via software control, the precision fiducials are dynamically relocated to selected locations in the spatial world coordinate system. The selected locations can be randomly picked or designed in a certain pattern; the ultimate objective is to provide sufficiently rich data such that the 3D digitization system's calibration matrix parameters can be computed accurately. The fiducials and their corresponding feature points in the two-dimensional images can be matched automatically. The world coordinates of these feature points can be obtained from the position information provided by the gratings of the translation stages and from their respective positions on the vertical plate target. Auto-calibration of all the imaging sensors can be completed if a sufficient number of calibration dot pairs are used. Using our 3D color digitization system, which has four 3D sensors and four color sensors, we illustrate the auto-extraction process of the calibration dot pairs as follows:
The distribution of the system's sensors is shown in Figure 2: B1, B2, B3, B4 are the 3D sensors, and C1, C2, C3, C4 the color sensors. Each 3D sensor and its corresponding color sensor are positioned at a fixed relative angle. The BP neural network approach commonly used to calibrate color sensors requires non-coplanar calibration dot pairs, while linear calibration methods for 3D sensor calibration can use coplanar calibration dot pairs.

Fig. 2. Sensor distribution in the 3D color digitization system
The automatic calibration process flow is outlined in Figure 3 and described in detail as follows:
• Automatic extraction of the color sensors' calibration dot pairs
Before carrying out the calibration process, we adjust the turntable so that the line of sight connecting the two sensors located diagonally across the measured volume, e.g., C1 and C3, is normal to the surface of the vertical plate target.
1. Determination of the pixel coordinates (Xf, Yf): Set a position as the origin of the two-dimensional precision translation stage and initialize the stage by moving the target to the origin. The stage is then servoed, via grating position feedback, to follow a specific 2D trajectory, e.g., concentric circles or rectangles with different side lengths. The trajectory is spatially discretized into a succession of waypoints, and the target moves through them. At each waypoint, the target stops and waits for the two color sensors (C1 and C3) to acquire an image. Once all the waypoints on the trajectory have been visited, target images at all the designated positions have been obtained. Processing of the acquired two-dimensional calibration template image data begins with the conversion of the color bitmap to a grayscale bitmap; the Otsu algorithm [12] is then used to binarize the image while ignoring negligible variances, and a morphological gradient approach detects the edges of the binary image; subsequently, the image eroded with a defined structuring element is subtracted from the original image to delineate the edges accurately. The resulting image processing outputs are shown in Fig. 4(a). In the image acquisition process, due to sensor positioning and distortion factors, a circular hole on the target will generally appear elliptical in the calibration image. However, the center of the ellipse and the center of the circular hole maintain a known projective relation, and it is this feature point which needs to be extracted. The elliptical edge is used to fit an algebraic equation for an elliptical curve, and the least squares method gives the computer image coordinates (Xf, Yf) corresponding to the elliptical hole center. The equation for an ellipse is given by
Ax² + Bxy + Cy² + Dx + Ey + F = 0

Using the (x, y) coordinates of the elliptical edge data, the best-fit parameters A, B, ..., F are derived using standard regression techniques. The coordinates of the elliptic center (X₀, Y₀) are then computed as follows:

X₀ = (2CD − BE)/(B² − 4AC);  Y₀ = (2AE − BD)/(B² − 4AC)

By selecting a complete feature contour in Fig. 4(a) and extracting the center coordinates of the ellipses, the result is shown in Fig. 4(b), in which the "+" marks are the derived locations of the centers. This approach generically solves the problem of automated pixel coordinate extraction in the calibration of computer vision systems.
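A minimal sketch of this center extraction (our own illustration; the unconstrained conic fit via the SVD null space is one of several possible least-squares formulations):

```python
import numpy as np

def ellipse_center(xs, ys):
    """Least-squares fit of A*x^2 + B*x*y + C*y^2 + D*x + E*y + F = 0 to
    edge pixels, returning the sub-pixel center.  No constraint forcing
    an ellipse (rather than another conic) is applied here."""
    M = np.column_stack([xs * xs, xs * ys, ys * ys, xs, ys, np.ones_like(xs)])
    _, _, vt = np.linalg.svd(M)
    A, B, C, D, E, F = vt[-1]            # smallest singular vector
    den = B * B - 4 * A * C
    x0 = (2 * C * D - B * E) / den
    y0 = (2 * A * E - B * D) / den
    return x0, y0

# Quick check on a synthetic rotated ellipse centered at (3, -2):
t = np.linspace(0, 2 * np.pi, 200)
xs = 3 + 5 * np.cos(t) * np.cos(0.4) - 2 * np.sin(t) * np.sin(0.4)
ys = -2 + 5 * np.cos(t) * np.sin(0.4) + 2 * np.sin(t) * np.cos(0.4)
print(ellipse_center(xs, ys))            # approximately (3.0, -2.0)
```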
Fig. 3. Flow chart of the automated calibration process
2. Determination of the world coordinates (Xw, Yw, Zw): The vertical coordinate of the holes can be determined by the following relation:

Zw(K) = L + (K − 1) × S (mm)

where L is the distance from the bottom hole's geometric center to the bottom edge, S is the center-to-center distance of adjacent holes, and K is the sequence number of the hole. The coordinates (Xw, Yw) in the plane can be determined from the grating feedback location.
When simultaneously calibrating a number of imaging sensors, e.g., C1 and C3, it is necessary to use the grating feedback information from the translation stages to determine the spatial coordinates of the vertical plate target being imaged by these sensors. As shown in Fig. 5, we define the horizontal plane on which the target translation occurs to be the OXY plane; the two translation axes are respectively the X and the Y axis. For imaging with sensor C1, the target's coordinates are the same as the grating feedback coordinates (Xw0, Yw0). Assuming the same grating feedback coordinates for imaging with the C3 sensor, let d be the thickness of the vertical plate target and α the angle between the vertical plate target and the X-axis; then the corresponding physical coordinates are given by:
Xw = Xw0 − d sin α
Yw = Yw0 − d cos α

The above process is repeated and applied to the other sensors to complete their calibrations. When the target must be rotated in order to calibrate other imaging sensors, let the axis of rotation of the rotating table be the Z axis normal to the OXY plane and the clockwise rotation angle be θ; then the coordinates after rotation (Xwr, Ywr) are given by:

Xwr = Xw0 cos θ + Yw0 sin θ
Ywr = −Xw0 sin θ + Yw0 cos θ

in which Xw0, Yw0 are the grating feedback coordinates.
• Automatic extraction of the 3D calibration dot pairs
The automatic calibration process for the 3D sensors is essentially the same as for the color sensors, and it can be done following the above steps. The only difference is that for the 3D sensors, only the calibration dot pairs (Xf, Yf) and (Xw, Yw) in the optical plane are required. As such, for the images captured at each location, it suffices to pick out the holes on the target image and determine the pixel coordinates of their centers.
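These coordinate corrections are straightforward to implement; a short sketch (our own illustration, with hypothetical function names):

```python
import numpy as np

def opposite_side_coords(xw0, yw0, d, alpha):
    """World coordinates of the fiducial plane seen by the opposing
    sensor, correcting for plate thickness d and plate angle alpha."""
    return xw0 - d * np.sin(alpha), yw0 - d * np.cos(alpha)

def rotate_clockwise(xw0, yw0, theta):
    """Grating feedback coordinates after a clockwise turntable
    rotation of theta about the Z axis (standard 2D rotation)."""
    c, s = np.cos(theta), np.sin(theta)
    return xw0 * c + yw0 * s, -xw0 * s + yw0 * c
```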
Fig. 4. Image processing results: (a) detected edges; (b) extracted ellipse centers (marked "+")
Fig. 5. Coordinate system definition

4 Calibration Results
By moving the two-dimensional translation stage in predefined steps, according to the required accuracy, a group of images and a series of corresponding coordinates are obtained, yielding a sufficient number of data pairs ((Xf, Yf) and (Xw, Yw, Zw)). Fig. 6 shows the calibration dot pairs obtained by the color sensor C1: Fig. 6(a) shows the pixel coordinates (Xf, Yf), whereas Fig. 6(b) shows the world coordinates (Xw, Yw, Zw). By feeding these data pairs into a BP neural network calibration algorithm [13], with the 3D spatial coordinates (Xw, Yw, Zw) as inputs, the corresponding two-dimensional pixel coordinates (Xf, Yf) as outputs, and six hidden layer neurons, we obtained
Fig. 6. Calibration dot pairs obtained from the color sensors
Fig. 7. Calibration dot pairs obtained from the 3D sensors
the calibration parameters for the collection of color cameras. Due to space limitations, tables showing the neurons' connection weights between the input, output and hidden layers of the BP network are omitted. Fig. 7(a) shows the world coordinates (Xw, Yw) when the translation stage moves in concentric circles; Fig. 7(b) shows the corresponding pixel coordinates (Xf, Yf) derived from the images of a given hole on the target, obtained from a camera of the corresponding 3D sensor.
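The BP mapping just described can be reproduced with any standard multilayer perceptron implementation. The sketch below uses scikit-learn (our choice for illustration, not the authors' code), with six hidden neurons as in the paper; the training data shown are invented placeholders for the extracted dot pairs:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Placeholder data standing in for the extracted dot pairs:
# world_pts are (Xw, Yw, Zw) fiducial coordinates; pixel_pts are the
# matching (Xf, Yf) image coordinates of one color sensor.
rng = np.random.default_rng(0)
world_pts = rng.uniform(0.0, 500.0, size=(200, 3))
pixel_pts = world_pts[:, :2] * 1.3 + 40.0          # invented mapping

# Six hidden neurons, as in the paper; solver/activation are our choices.
net = MLPRegressor(hidden_layer_sizes=(6,), activation="logistic",
                   solver="lbfgs", max_iter=5000)
net.fit(world_pts, pixel_pts)                       # learn 3D -> 2D mapping
print(net.predict(world_pts[:3]))                   # predicted (Xf, Yf)
```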
With this set of calibration dot pairs, we use the direct linear transformation method [14] to calculate the calibration matrix for the set of four 3D sensors:

M = [m11, m12, m13; m21, m22, m23; m31, m32, m33]

The resulting calibration matrices for the four 3D sensors are not shown here due to space limitations. With this, the automatic calibration of all the 3D sensors and color sensors in the system is carried out in the same world coordinate system. With the laser 3D color digitization system calibrated, we acquire the data needed to realize 3D color digitization of a 3D object. Point cloud data from the 3D sensors are first pre-processed, the data from the multiple sensors are aligned, and a surface model is then reconstructed; color information is added and, together with other additional steps, the 3D color model visualization results of a swimsuit on a mannequin are shown in Fig. 8. The highly realistic visualization of a mannequin with swimsuit demonstrates the effectiveness and accuracy of our automatic calibration process.
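For the coplanar dot pairs, the direct linear transformation amounts to estimating a 3 × 3 planar projective mapping by linear least squares. A minimal sketch (our own illustration; the exact formulation in [14] may differ, and normalization/robustness refinements are omitted):

```python
import numpy as np

def planar_dlt(world_xy, pixel_xy):
    """Estimate the 3x3 matrix M = [m11..m33] mapping coplanar world
    points (Xw, Yw) to pixel points (Xf, Yf) in homogeneous form."""
    rows = []
    for (X, Y), (u, v) in zip(world_xy, pixel_xy):
        rows.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y, -u])
        rows.append([0, 0, 0, X, Y, 1, -v * X, -v * Y, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)        # null-space vector, reshaped to M

def project(M, X, Y):
    """Apply M to a world point, returning pixel coordinates."""
    u, v, w = M @ np.array([X, Y, 1.0])
    return u / w, v / w
```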
Fig. 8. Color visualization results of swimsuit model
5 Conclusions
This paper designed a target for the automatic calibration of a laser 3D color digitization system. The target can be positioned with precision at arbitrary locations, following specific desired trajectory patterns in the horizontal plane. A sufficient number of 3D calibration dots (points) distributed in many different directions can be used for the simultaneous automatic calibration of the 3D sensors and color sensors. Automatic calibration experiments on the laser 3D color digitization system have been carried out successfully. The process automatically extracts the required calibration dot pairs while enforcing a common world coordinate system for all the sensors in the system. Finally, we showed the visualization of 3D color digitization data of a
test 3D object which confirmed the effectiveness and accuracy of the prescribed automatic calibration process.
References
1. Tsai, R.Y.: An Efficient and Accurate Camera Calibration Technique for 3D Machine Vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, pp. 364–374 (1986)
2. Huang, R., Xi, J.-t., Ma, D.: Design and Implementation of an Automatic Camera Calibration Method. Journal of Test and Measurement Technology 18(2), 122–127 (2004) (in Chinese)
3. Zhang, X., Wang, Z., Sun, C., Ye, S.: The Drone Design and Image Processing Technique for CCD Camera Auto-Calibration. Mould & Die Engineering (6), 7–11 (2004) (in Chinese)
4. Yong, H., Congjun, W., Shuhuai, H.: Calibration Automation of the Binocular Stereo Vision Sensor. Journal of Wuhan University of Technology 27(10), 67–69 (2005) (in Chinese)
5. Cheng, J., Zhao, C., Mo, J.: Auto-calibration Technology Based on 3D Raster Measuring (2), 71–75 (2007) (in Chinese)
6. Sun, Y., Ge, B.: Laser Color 3D Digitization Technique by Light Stripe Method. Journal of Tianjin University 39(B06), 160–163 (2006) (in Chinese)
7. Ge, B., Sun, Y., Wei, Y., et al.: Laser 3D Color Scanner Digitization Method and System. People's Republic of China Patent ZL 2005 1 0013085.8, granted July 2006
8. Ge, B.: Color 3D Digital Human Modeling and its Applications to Animation and Anthropometry. In: Human Computer Interface (HCI) International Conference, Beijing, China, July 22–27, pp. 4550–4566 (2007)
9. Park, S.Y., Subbarao, M.: A multi-view 3D modeling system based on stereo vision techniques. Machine Vision and Applications 16(3), 148–156 (2005)
10. Zhang, Z.: A Flexible New Technique for Camera Calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(11), 1330–1334 (2000)
11. Gao, M.D.: Use of rectangular parallelepipeds in projective computation of camera parameters. Journal of Northwestern Polytechnical University 4(10), 427–433 (1992) (in Chinese)
12. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Systems, Man, and Cybernetics 9(1), 62–66 (1979)
13. Sun, Y., Ge, B.: Study on Camera Calibration Technique of 3D Color Digitization System. In: The 6th International Symposium on Instrumentation and Control Technology, Proceedings SPIE, vol. 6357, pp. 635–713 (2006)
14. Sun, Y., Ge, B., Mu, B., et al.: Calibration of 3D Sensors with Linear Partition Method. Journal of Optoelectronics·Laser 16(2), 135–139 (2005) (in Chinese)
Virtual Task Simulation for Inclusive Design

Russell Marshall¹, Keith Case², Steve Summerskill¹, Ruth Sims¹, Diane Gyi³, and Peter Davis¹

¹ Department of Design and Technology
² Mechanical and Manufacturing Engineering
³ Department of Human Sciences, Loughborough University, LE11 3TU, UK
{r.marshall,k.case,s.j.summerskill2,r.sims,d.e.gyi,p.m.davis}@lboro.ac.uk
Abstract. Human modelling tools provide a means to perform virtual task evaluations upon designs within the computer environment. The ability to evaluate the accommodation of a design early in the design process, before physical prototypes can be built, has many advantages. These advantages are particularly relevant in supporting people attempting to design products that are inclusive and accessible. HADRIAN is a new tool developed to provide accessible and applicable data on people with a broad range of size, age, and ability, together with a means of optimising virtual task evaluations. This paper describes the use of HADRIAN in performing a task evaluation, focusing on the underlying methodology that aims to achieve a virtual simulation that mimics a real-world user trial. Keywords: Human modelling, simulation, inclusive design, ergonomics.
1 Introduction
Human modelling tools provide a highly visual, interactive and timely means to address physical ergonomics problems of posture, fit, reach and vision during product design. Users of different sizes and shapes can be manipulated to simulate interactions with a computer model of an existing or proposed design. However, whilst such tools can be used effectively and efficiently to determine how successful a design may be in accommodating its users, they are not without their shortcomings. Research conducted by Loughborough University in the UK and funded by the Engineering and Physical Sciences Research Council (EPSRC) has been developing a means to address two significant issues associated with human modelling. These issues are the relevance, accessibility and applicability of the data used to drive the human model, and a means to simplify the often complex task of manipulating the human model into a representative posture during a product assessment. The first phase of the research was conducted under the Design for All element of the Extending Quality Life (EQUAL) programme of the EPSRC. Its aim was to address the data and ease-of-use issues of human modelling, with a particular focus on 'design for all' or inclusive design. More recently the research has progressed, expanding its scope and targeting a specific application: transport. This second phase is being conducted as part of the EPSRC's Sustainable Urban
Environment Programme (SUE) and is called Accessibility and User Needs in Transport (AUNT-SUE) [1]. Human modelling's benefits are particularly applicable to inclusive design problems, where the variety of human capability needs to be fully understood if a truly accessible solution is to be realised. However, the existing data with which human models are currently constructed have many limitations. A widely used data source in the UK is Adultdata, published by the Department of Trade and Industry [2]. The data span 266 physical body measurements for multiple nationalities. However, not all measures are available for all nationalities; for example, it is possible to obtain a stature measurement for the German population but there is no German data for arm length. In addition, most of the data was collected many years ago. Adultdata was published in 1998, but the sources of data within it range from 1969 to 1998. Investigating further highlights other issues, such as the fact that the Chinese data was actually collected from Singapore and Hong Kong. Other databases such as SizeUK [3] and CAESAR [4] are significantly more recent, and their size and sampling strategies make them much more representative of their respective populations. However, they are expensive and often beyond the reach of many designers. For designers wishing to design for all there are more fundamental concerns. Adultdata and CAESAR do not have data on people who are older than 65 years, although SizeUK does include people up to the age of 91, and Older Adultdata (one of the Adultdata series together with Childdata) has data from people over 90 for some nationalities. This is a common limitation of anthropometric databases, and thus changes to body size and shape as people age are not reflected in the data. The lack of data from people with limited mobility is also a fundamental issue. The effects of common impairments are rarely reflected in anthropometric data, and when they are, they tend to come from samples of limited size or with other limiting factors [5] [6]. To address these issues of appropriate data, a new database has been developed that captures a significant amount of data on 102 people, the majority of whom are older and have some form of disability [7]. These data are stored as a set based around the individual from whom they were captured. The data incorporate an image of the individual together with background data and an extensive set of anthropometric, joint mobility, capability and behavioural data, all of which can be explored by a designer or ergonomist wishing to examine their user population in greater detail. Storing the data by individual is an attempt to foster empathy with the end user, where the data can be seen to represent a person as opposed to a statistical table of numbers. This approach also directly supports the use of human modelling. An individual can easily be recreated when stored in this manner, thus addressing a fundamental issue with existing data, where a human model would have to be reconstructed from a highly decomposed set of measures from which the variability of real people is difficult to capture [8]. In addition to the database, a tool has also been developed to assist designers and ergonomists in their efforts to design for all, through the provision of a means to conduct an ergonomics assessment of a product in the virtual environment by making use of human modelling.
Acknowledging the skill and expertise required to accurately capture and represent realistic postures in a human modelling system, the tool has been developed to encourage the user to describe the task they wish to perform as
opposed to driving the human model. The user then focuses their efforts on describing the task, allowing the tool to perform the complex manipulation of the human model in an attempt to perform that task. The tool provides a summary of how successful the people in the database were in performing the task and allows the user to explore the difficulties individuals experienced and to try out potential solutions. In effect, the tool acts as a means of conducting virtual fitting trials, with the individuals in the database forming a readily accessible virtual user group.
2 HADRIAN
The combined database and task analysis tool is known as HADRIAN (Human Anthropometric Data Requirements Investigation and ANalysis) [9]. Whilst HADRIAN provides data and a means to perform a task analysis, it works together with an existing human modelling tool: SAMMIE [10]. This paper will now focus on the task analysis component of the HADRIAN system, describing its use, the underlying methodology that enables the virtual task simulation to take place, and its relationship with SAMMIE.
2.1 Basic Approach
To define a task, HADRIAN takes the approach that a task is essentially a dynamic process consisting of a series of smaller elements that combine to achieve a particular goal. HADRIAN then looks to determine key-frames, or static snapshots, of the task at the moments these smaller elements occur. To simplify the process, SAMMIE is utilised for its functionality to model the elements of these static snapshots, namely: a posture for the human model, a target object, and an environment. Therefore, the main requirement for the task analysis tool is to support the use of the data in the database in determining a suitable and realistic posture for each key-frame. SAMMIE also contains some tools to aid in the process of determining a posture for task-related elements such as reach, in addition to a number of standardised postures. Though these tools are not a complete solution, they provide a starting point which is manipulated by HADRIAN, based upon the data stored in the database, to achieve a more realistic posture and thus a potentially more realistic outcome for the task analysis.
2.2 Task Elements
The key-frames or static snapshots identified previously are referred to as task elements in HADRIAN. Tasks are therefore essentially combinations of task elements, each governed by an environment consisting of the CAD model to be assessed, a set of parameters associated with the task, and some understanding of the sequence of task elements. Together these components influence the posture of the human model and effectively determine if the human model can achieve a posture that would result in a successful completion of the task element, and thus the task. The synthesis of a posture in response to the demands of a task element is performed through a number of mechanisms that may be manipulated to form the posture. These mechanisms include:
• Vision: position of head, neck and eyes in order to successfully view the target
• Reach: position of hand, forearm and upper arm (or foot, calf and thigh) in order to reach the target
• Attitude: position and orientation of other body elements to aid in the vision and reach mechanisms
• Posture: the starting posture's influence as an initial component of the overall posture
• Location: position and orientation of the human as a whole
• Sequence: a manipulation of the vision, reach, attitude, and location posture elements taking into account the previous and future postural key-frames. Sequence may also take account of loading/strength factors and their influence on the overall posture.
As a further influence upon the posture adopted at each task element key-frame, a number of additional task parameters have been defined, including: the preferred hand for the task (if required); the desired or required grip type; the part of the body with which to reach; how long (time) the task element is to be performed for; the number of times the task element is to be performed; the task elements that are dependent on the current task element being completed; the importance of the task element to the overall task; and the success parameters for a specific task mechanism, e.g. view distance.
2.3 Task Description
The syntax defined in the task elements provides a set of influences upon the task. However, not all of these influences need concern the user, or person specifying the task. Indeed, a particular feature in the development of HADRIAN was how to balance the natural-language way of describing a task that a person would normally use with the much more specific and detailed description that a computer-based tool would need in order to interpret what the user was trying to achieve. Figure 1 illustrates how three different levels of describing a task may be defined. The natural language level is the most user friendly and accessible but is far too vague to drive an automated analysis. The user level is an intermediate level where the user is able to build a task through the use of accessible commands together with a target for each command. The system level is how the system interprets the specified command and the mechanisms it employs in order to respond. Though the syntax at the user and system levels is similar, it is the approach of the system in interpreting the commands, and specifically the way in which an unsuccessful task element attempt is managed, that defines the system's response. In HADRIAN the user level is employed to describe task elements and ultimately a task. The system provides a set of commands for the user to select as appropriate and a facility to select the appropriate target for each command.
Natural language: "Put the oven on."

User level:
View oven control (identify appropriate control)
Reach oven control
Operate oven control
View oven control (check the correct setting)

System level:
1 View target 1 (eye movement only)
  IF success THEN GOTO 2
  ELSE (1st loop) adjust head attitude and repeat
  ELSE (2nd loop) adjust torso attitude and repeat
  ELSE (3rd loop) FAIL AND EXIT
2 Reach target 1
  IF success THEN GOTO 3
  ELSE (1st loop) adjust shoulder attitude and repeat
  ELSE (2nd loop) adjust torso attitude and repeat
  ELSE (3rd loop) FAIL AND EXIT
3 Analyse capability for this posture
  IF ok THEN GOTO 4
  ELSE (1st loop) adjust torso attitude and GOTO 2
  ELSE (2nd loop) FAIL AND EXIT
4 View target 1 (eye movement only)
  IF success THEN PASS AND EXIT
  ELSE (1st loop) adjust head attitude and repeat
  ELSE (2nd loop) adjust torso attitude and GOTO 2
  ELSE (3rd loop) FAIL AND EXIT

Fig. 1. An example hierarchy of task description
The details of all HADRIAN task commands are beyond the scope of this paper, but a single HADRIAN command example (reach) is shown below:
Syntax
REACH target [parameters]
Where: target is a named and existing object in the environment; parameters is a comma-separated list of:
− HND = hand is a character value (n = nearest, r = right, l = left, b = both) representing the appropriate hand for the task (unspecified = default hand).
− or ANA = anatomical reference is a character value representing the appropriate part of the body for the task (h = hand, f = foot)
− or GRP = grip is a character value representing the grip type (unspecified = default, o = none, f = fingertip, t = thumbtip, p = palm)
− or DUR = duration is an integer value representing the number of subsequent task elements for which to maintain the reach posture (unspecified = whole of task, 0 = this task element only).
− or GTE = gate is a comma-separated list of task identifiers.
− or IMP = importance is an integer value (1–10) representing the task-based importance of this task element.
Example
reach credit_card (HND=r,DUR=0,GRP=t,IMP=10)
This identifies that an object named credit_card is to be reached using a right-handed thumbtip grip, that the reach posture will be maintained for this task element only, that no other task elements can be completed until the credit_card has been reached, and that this task element has the maximum overall importance. The pedantic nature of the commands utilised within HADRIAN is a deliberate attempt to provide a structured control language that may be used to drive a virtual task simulation irrespective of the particular implementation within HADRIAN. It is recognised that this level of detail is not particularly user friendly, and thus the implementation of the HADRIAN interface largely shields the user from this format. However, the command structure does provide a common interface to the task analysis capability within SAMMIE and thus a way for other implementations to be developed, or the functionality accessed, without the direct need for HADRIAN.
2.4 System Strategies
Once a task has been defined by the user, the analysis is implemented or 'run'. The implementation of task commands within the system involves a combination of: the interpretation of task commands into the existing capabilities of SAMMIE, a set of adaptation processes for dealing with common task situations and eventualities, and the management of task element interaction through the task framework. An example of the interpretation of a task command into SAMMIE is shown below.
706
R. Marshall et al.
The first situation requiring a flexible response by the system is the event of a missing piece of key information, such as which hand to use for a reach operation. In such cases the system interrogates the data available to it. The first source is the data on the human model taken from the HADRIAN database; relevant data in this example may include handedness, or anthropometry and joint mobility for the arms. The second is the location and orientation of the human with respect to the target of the task command. These data inform how the system responds and ultimately decides which hand to use for the task. The most significant area of system adaptation in the task analysis is in the event of a task element failure. In these instances the system follows a basic core process (a code sketch of the zone test in point 2 is given after the steps):
1. If the failure is absolute, such as a strength check failure, then we move on to point 7. If the failure is postural, we continue.
2. Determine the relative positioning of the target and the human model. The area around the human model is partitioned into 8 zones for reach (Figure 2a): 2 vertical zones (first digit [0-]) and 4 horizontal zones (second digit [-0]); or 5 zones for vision (Figure 2b). The appropriate zone is determined for the target.
3. For targets in zones 14, 24 or 34 the human model is turned through 180º and the task element repeated. For opposite zones, such as 13/33 for right-limb reach or 12/32 for left-limb reach, the nearest limb will be used (if possible) or the human model will be turned 90º. For low zones, such as 31 for vision, the human model will crouch or kneel.
4. Measure the distance between the target and the key human model reference (e.g. eye-point for vision, shoulder for arm reach). If out of reach by a large margin then we move to point 6. If not, continue.
5. Based upon the reference-target distance, one or more of the following will be applied:
− The head/neck is rotated such that the head is facing the target
− The head/neck is rotated and extended/flexed such that the eye-point to target distance equals the desired parameter value
− The torso is rotated and flexed such that the eye-point to target distance equals the desired parameter value
− The reference-to-target vector is calculated and the shoulder 'pointed' along that vector to achieve a successful upper-limb reach
− The torso is rotated and flexed in the direction of the target to achieve a successful upper-limb reach.
If there is still a failure we move on to point 7.
6. If the reference-to-target distance is greater than those accommodated by a posture change, one or more of the following will be applied:
− The human model is turned to face the target.
− The human model is moved closer to the target.
If there is still a failure we move on to point 7.
7. In the event of absolute failure, a failure is flagged for the results and the next task element is addressed.
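The zone test of Step 2 can be sketched as a simple classification of the target's bearing and height relative to the human model (our own reading of the zone scheme; the digit encoding of Fig. 2a is paraphrased, not reproduced, and the height threshold is an arbitrary placeholder):

```python
import math

def reach_zone(target, human_pos, facing_deg, high_threshold=1000.0):
    """Classify a target into a vertical band (high/low) and a horizontal
    quadrant (front/right/back/left) relative to the human model, in the
    spirit of the 8 reach zones of Section 2.4.2.  Units are mm."""
    dx = target[0] - human_pos[0]
    dy = target[1] - human_pos[1]
    # Bearing of the target relative to the facing direction, in degrees.
    bearing = (math.degrees(math.atan2(dy, dx)) - facing_deg) % 360.0
    if bearing < 45 or bearing >= 315:
        quadrant = "front"
    elif bearing < 135:
        quadrant = "left"
    elif bearing < 225:
        quadrant = "back"
    else:
        quadrant = "right"
    band = "high" if target[2] >= high_threshold else "low"
    return band, quadrant

print(reach_zone((1200.0, -300.0, 800.0), (0.0, 0.0, 0.0), 0.0))
# ('low', 'front')
```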
[Figure] Fig. 2a. Reach orientation zones: the area around the human model is divided into high/low (vertical) and front/left/right/back (horizontal) zones.

[Figure] Fig. 2b. View orientation zones: the area around the human model is divided into high/mid/low and front/back zones.
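As an illustration of step 2 of the fallback process, the sketch below assigns a target to one of the eight reach zones of Fig. 2a. The digit convention (first digit 1 = high, 3 = low; second digit 1 = front, 2 = left, 3 = right, 4 = back) is inferred from the zone codes quoted in steps 2 and 3 (e.g. 14/24/34 behind the model, 31 low), so it should be read as our interpretation rather than the exact HADRIAN convention:

```python
import math

def reach_zone(dx: float, dy: float, dz: float) -> str:
    """Assign a target, given its offset from the human model's reference
    point (x forward, y left, z up), to a two-digit reach zone code:
    vertical digit (1 = high, 3 = low) then horizontal digit
    (1 = front, 2 = left, 3 = right, 4 = back)."""
    vertical = "1" if dz >= 0.0 else "3"
    angle = math.degrees(math.atan2(dy, dx))   # 0 degrees = straight ahead
    if -45.0 <= angle < 45.0:
        horizontal = "1"                       # front
    elif 45.0 <= angle < 135.0:
        horizontal = "2"                       # left
    elif -135.0 <= angle < -45.0:
        horizontal = "3"                       # right
    else:
        horizontal = "4"                       # back
    return vertical + horizontal

# A low target behind the model falls in zone '34': the model is turned
# through 180 degrees (step 3) and the task element is retried.
assert reach_zone(-0.5, 0.0, -0.3) == "34"
```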
2.4.3 Task Framework

The task framework is an attempt to adjust the human model's approach to each task element based on consequent and subsequent task elements. For the system, this requires information on the 'likely' location, orientation and posture of the human model, and any specific details such as the hand to be used and any objects to be interacted with, for each task element. The framework is based on information given during the process of task definition. From the task description the framework specifies the number of task elements, which of those elements are dependent on each other, and in what way. The framework also identifies all of the objects that are targets of the task elements and creates a map of the main areas of activity. The system process to develop the framework consists of the following steps:

1. From the task definition determine the number of task elements.
2. Scan the task definition for those task elements that are linked through the gate parameter and build a framework map.
3. Scan the task definition for the duration parameter and add this to the framework map.
4. Identify all target objects and collect their locations from the model. Analyse their layout and determine the location and orientation for the human model (Figure 3):
[Figure] Fig. 3. The task framework analysis grid. A 1 m × 1 m grid (origin 0,0,0) is overlaid on the environment and each cell is weighted; the shaded areas highlight the 'working' areas for the task elements. The weightings and their layout determine two locations for the human model: A for elements 1, 2, 3, and B for elements 4, 5, 6, 7, 8.
− Overlay a 1 m by 1 m grid on the environment
− Identify the 'working' grid areas (i.e. those that contain an interaction)
− Weight the working areas according to the number of task elements per grid and their duration
− Determine human model locations and orientations for the task elements: check the weighting and adjacency of each grid. For adjacent and equally weighted grids adopt a mean location and orientation. For adjacent and non-equally weighted grids bias the location and orientation towards the greater weighting. For non-adjacent grids start a new location and orientation.
5. Collect target interaction specifics (e.g. hand to use) from the task element targets. Refine the framework map accordingly.

2.5 Human Models

The human models of the system are visual representations of the user population. In addition they are data sets that reflect the variety within the population of characteristics such as anthropometry, somatotype, capability, handedness and behaviour. Thus the human models are stored as individuals, and their embedded data are used to influence how they address the task.

The behavioural details are a critical element in how the system synthesises a task element posture when the default approach is not sufficient. As an example, a task element requires an individual to place an object in an oven. The first check is to see if that person could reach the appropriate place in the oven from a normal (default) posture. Assuming the test to be a failure, the question then asked is what postural changes are made to ensure a success? Whilst a strategy has been established in Section 2.4.2, a general approach cannot be adequate when dealing with the variety of human behaviour. Thus, each individual human model contains data on their behaviour for common task elements, such as reaching below the level of the hips (do they bend their back, do they bend their legs, do they crouch, do they sit on the floor?). This information then modifies the generic process shown in Section 2.4.2 into an appropriate response for that individual.
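As a minimal illustration of how such per-individual behavioural data might modify the generic fallback order of Section 2.4.2, the sketch below tries an individual's recorded preferences first and only then falls back to the generic strategies. The strategy names and data layout are our assumptions for illustration, not the HADRIAN database schema:

```python
from typing import Callable, Dict, List, Optional

# Generic fallback order for a below-hip reach (Section 2.4.2).
GENERIC_LOW_REACH: List[str] = ["bend_back", "bend_legs", "crouch", "kneel"]

# Hypothetical per-individual behaviour records.
BEHAVIOUR: Dict[str, Dict[str, List[str]]] = {
    "subject_017": {"low_reach": ["crouch", "sit_on_floor"]},
    "subject_042": {"low_reach": ["bend_back"]},
}

def low_reach_strategies(subject_id: str) -> List[str]:
    """Recorded preferences first, then any untried generic strategies."""
    preferred = BEHAVIOUR.get(subject_id, {}).get("low_reach", [])
    return preferred + [s for s in GENERIC_LOW_REACH if s not in preferred]

def attempt_low_reach(subject_id: str,
                      try_posture: Callable[[str], bool]) -> Optional[str]:
    """try_posture returns True if the posture achieves the reach."""
    for strategy in low_reach_strategies(subject_id):
        if try_posture(strategy):
            return strategy
    return None   # absolute failure: flag it and move to the next element
```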
3 Conclusions and Further Work

To support those endeavouring to design for all, a tool called HADRIAN has been developed that works in conjunction with an existing human modelling tool, SAMMIE. HADRIAN provides an accessible and applicable database of 100 individuals. These data can then be employed in a task analysis system that simplifies and automates the analysis performed in a typical human modelling system. In addition, the analysis is enhanced through a series of mechanisms that attempt to replicate a real task being performed by real people in a virtual task simulation.

The underlying methodology employed by the task analysis system is an ongoing area of research, and its implementation is currently undergoing validation in two studies that examine its performance against both the use of a human modelling system by an expert and the benchmark of a real user trial. It is intended that the validation findings will be used to refine and improve the HADRIAN model outlined in this paper.
Data Mining of Image Segments Data with Reduced Neurofuzzy System

Deok Hee Nam and Edward Asikele

Wilberforce University, Engineering and Computer Science,
1055 N. Bickett Road, Wilberforce, OH 45384
[email protected]
Abstract. Target detection from raw images is a primary task in image processing. At the same time, performing target detection in image processing may involve a large number of variables or factors, including unnecessary ones. This paper presents pattern recognition through image scaling based upon the characteristics of various images, using a dimension reduced from the original characteristic dimension of the images. By using fewer dimensions than the original characteristic dimensions, the processing procedures can be simplified and the restrictions of systematic problems can be overcome. To estimate the performance of the system, neurofuzzy systems combined with multivariate analyses, including factor analysis, principal component analysis, and Fuzzy C-means clustering analysis, are applied. Using the proposed algorithm, the analyses of various image data can be compared.

Keywords: data mining, image processing, image scaling, pattern recognition, system reduction, target detection.
1 Introduction

Realizing or implementing large-dimensional image data may require a long search time to detect the desired target. Recently, to handle the large amounts of data and information in engineering and biomedical applications, various techniques have been developed to extract valuable and meaningful information or knowledge from the original raw data, including soft-computing techniques such as neural networks, fuzzy logic, and genetic algorithms, and multivariate analysis techniques such as factor analysis, principal component analysis, and clustering analysis.

In this paper, to mine, and to diminish the large dimension of, the given raw image data, factor analysis, principal component analysis, and clustering analysis are used to build a model with fuzzy logic or a neurofuzzy system, which is then applied to predict the characteristics of the images with reduced dimensions. Generally the procedure can produce more precise and reasonable results with reduced dimensions when predicting the desired images. In addition, all of these techniques save search time for the desired images. Thus, the proposed approach integrates various multivariate analysis techniques with neurofuzzy or fuzzy logic systems to construct a more accurate and efficient reasoning system.
2 Literature Review

2.1 Multivariate Analysis

There are many different data mining techniques for reducing large and imprecise raw data to smaller and more precise data. The most frequently used techniques are multivariate analyses such as factor analysis, principal component analysis, and clustering analysis.

Factor analysis [1] concerns the study of the embedded relationships among the given variables in order to find or extract new variable sets, which are hidden and fewer in number than the original variables in the given data. In general, factor analysis attempts to reduce the complexity and diversity among the interrelationships that exist in a set of observed variables by exposing hidden common dimensions or factors. The newly extracted factors or variables are therefore more precise and more independent, with fewer common dimensions among variables, and factor analysis thereby provides more precise information about the embedded structure of the data.

Principal component analysis [2] and factor analysis usually produce very similar estimates. However, principal component analysis is often preferred as a method for data reduction, while factor analysis is often preferred when the goal of the analysis is to detect structure. One of the goals of principal component analysis is to reduce the dimension of the variables, transforming a multidimensional space into another space with the same or a smaller number of axes or variables, depending upon the given data. Hence, principal component analysis converts the normalized data into new data, called principal component scores, which represent the original data through new combinations of variables that describe the major pattern of variation among the data.

Finally, clustering analyses [6] are methods for grouping objects or observations of similar kind into respective categories. In other words, cluster analysis is an exploratory data analysis tool which aims at sorting different observations into similar kinds of groups in such a way that the degree of association between two observations is maximal if they belong to the same group and minimal otherwise. In addition, cluster analysis can be used to recognize structures in data without providing an explanation or interpretation.

2.2 Fuzzy Logic and Neurofuzzy System

Fuzzy logic was originally identified and set forth by Professor Lotfi A. Zadeh [4]. In general, fuzzy logic [4] is applied to system control and design analysis, since applying fuzzy logic can reduce the time needed to develop engineering applications, and, in the case of highly complicated systems, fuzzy logic may be the only way to solve the problem. As the complexity of a system increases, it becomes more difficult, and eventually impossible, to make a precise statement about its behavior, to the point where a system cannot be implemented due to its ambiguity or high complexity.

The neurofuzzy system [3] combines concepts from neural networks and fuzzy logic. To implement the neurofuzzy systems, the Adaptive-Network-Based Fuzzy Inference System (ANFIS) [5] is used with both the reduced data sets and the actual data set. ANFIS originates from the integration of the TSK fuzzy model [3], developed by Takagi, Sugeno, and Kang (TSK), using the backpropagation
learning algorithm [6] with least squares estimation from neural networks. The TSK fuzzy model was proposed to formalize a systematic approach to generating fuzzy rules from a given input–output data set.
3 Data Structure

To test the proposed technique, selected image segment data [8] provided by the Vision Group of the University of Massachusetts are used. The instances were drawn randomly from a database of seven outdoor images. The images were hand segmented to create a classification for every pixel. The selected image segment data consist of seven measurement fields, namely region centroid column, region centroid row, vedge-mean, hedge-mean, raw red mean, raw blue mean, and raw green mean, and four image classes: brickface, foliage, cement, and grass. The measurement fields are described below, and part of the image segment data is shown in Table 1.

1. Region centroid column: the column of the center pixel of the region.
2. Region centroid row: the row of the center pixel of the region.
3. Vedge-mean: measures the contrast of horizontally adjacent pixels in the region (the mean contrast is given); used as a vertical edge detector.
4. Hedge-mean: measures the contrast of vertically adjacent pixels; used for horizontal line detection.
5. Raw red mean: the average over the region of the R value.
6. Raw blue mean: the average over the region of the B value.
7. Raw green mean: the average over the region of the G value.

Table 1. Image Segment Data
Class        Region     Region     Vedge-   Hedge-   Raw red   Raw blue   Raw green
             centroid   centroid   mean     mean     mean      mean       mean
             column     row
CEMENT       191        119        1.294    0.77     48.222    35.111     -10.8889
BRICKFACE    140        125        0.063    0.31      7.667     3.556       3.4444
GRASS        204        156        0.279    0.56     25.444    28.333     -19.1111
FOLIAGE      101        121        0.843    0.89      6         3.111      -6.8889
CEMENT       219         80        0.33     0.93     48.222    34.556     -10.1111
BRICKFACE    188        133        0.267    0.08      7.778     3.889       5
GRASS         71        180        0.563    1.87     21.444    27.222     -12
FOLIAGE       21        122        0.404    0.4       1.222     0.444      -1.6667
CEMENT       136         45        1.544    1.53     63.778    47.333     -14.8889
BRICKFACE    105        139        0.107    0.52      7.222     3.556       4.3333
GRASS         60        181        1.2      2.09     17.889    23.889      -7.5556
…            …          …          …        …        …         …           …
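For readers wishing to reproduce the preprocessing, the class label plus the seven measurement fields map directly onto a small array; the sketch below (field names are ours, following Section 3) loads a few of the rows from Table 1:

```python
import numpy as np

FIELDS = ["region_centroid_col", "region_centroid_row", "vedge_mean",
          "hedge_mean", "raw_red_mean", "raw_blue_mean", "raw_green_mean"]

# A few rows transcribed from Table 1: (class, seven measurements).
ROWS = [
    ("CEMENT",    [191, 119, 1.294, 0.77, 48.222, 35.111, -10.8889]),
    ("BRICKFACE", [140, 125, 0.063, 0.31,  7.667,  3.556,   3.4444]),
    ("GRASS",     [204, 156, 0.279, 0.56, 25.444, 28.333, -19.1111]),
    ("FOLIAGE",   [101, 121, 0.843, 0.89,  6.000,  3.111,  -6.8889]),
]

labels = [cls for cls, _ in ROWS]
X = np.array([vals for _, vals in ROWS])   # observations x variables matrix
```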
4 Proposed Algorithm

Among the implemented algorithms, the preprocessing of principal component analysis [2] followed by Fuzzy C-means (FCM) clustering analysis [3][6] is the algorithm selected for presentation in this paper. The following steps summarize the proposed algorithm for deriving the reduced image segment data from the original image segment data.

Step 1. Read the original data set in matrix format.
Step 2. Normalize the original data from Step 1.
Step 3. Find the correlation matrix of the normalized data from Step 2.
Step 4. Find the eigenvalues and eigenvectors of the correlation matrix from Step 3 using the characteristic equation.
Step 5. Define a matrix whose columns are the eigenvectors from Step 4, serving as the coefficients of the principal components, using the criterion for extracting components.
Step 6. Multiply the standardized matrix from Step 2 by the coefficients of the principal components from Step 5.
Step 7. Using the data from Step 6, find the centers of the clusters.
Step 8. Initialize the partition (membership) matrix randomly such that \(U^{(0)} \in M_{fcn}\).
Step 9. Calculate the cluster centers \(v_i\) using

\[ v_i = \frac{\sum_{k=1}^{n} (u_{ik})^m x_k}{\sum_{k=1}^{n} (u_{ik})^m}. \]

Step 10. Compute the distances \(d_{ik}\) between each cluster center \(v_i\) and each data point \(x_k\).
Step 11. Update the partition matrix \(U^{(\mathrm{new})}\) using

\[ u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \dfrac{d_{ik}}{d_{jk}} \right)^{2/(m-1)}} \]

if \(d_{ik} > 0\) for \(1 \le i \le c\) and \(1 \le k \le n\). Otherwise (when some \(d_{ik} = 0\)), set \(u_{ik}^{(\mathrm{new})} = 0\) for the clusters with \(d_{ik} > 0\) and assign memberships \(u_{ik} \in [0, 1]\), with \(\sum_{i=1}^{c} u_{ik}^{(\mathrm{new})} = 1\), among the clusters with zero distance.
Step 12. Stop when \(\lVert U^{(\mathrm{new})} - U^{(\mathrm{old})} \rVert < \varepsilon\), where \(\varepsilon > 0\) is the termination tolerance. If this condition is not satisfied, go back to Step 9.
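Steps 1–12 can be condensed into a short NumPy sketch. This is our minimal reading of the algorithm (the fuzzifier m, the tolerance ε, and the random initialisation are free choices, and the authors' implementation details are not given), rather than the code used in the paper:

```python
import numpy as np

def pca_scores(X: np.ndarray, k: int) -> np.ndarray:
    """Steps 1-6: standardise, eigendecompose the correlation matrix,
    and project onto the first k principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)     # Step 2
    R = np.corrcoef(Z, rowvar=False)                     # Step 3
    eigvals, eigvecs = np.linalg.eigh(R)                 # Step 4
    W = eigvecs[:, np.argsort(eigvals)[::-1][:k]]        # Step 5
    return Z @ W                                         # Step 6

def fcm(X: np.ndarray, c: int, m: float = 2.0, eps: float = 1e-5,
        seed: int = 0):
    """Steps 7-12: Fuzzy C-means on the reduced data.  Returns cluster
    centres V (c x features) and memberships U (c x n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)                                   # Step 8
    while True:
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)     # Step 9
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # Step 10
        d = np.maximum(d, 1e-12)                         # guard zero distances
        U_new = 1.0 / ((d[:, None, :] / d[None, :, :])   # Step 11
                       ** (2.0 / (m - 1.0))).sum(axis=1)
        if np.linalg.norm(U_new - U) < eps:              # Step 12
            return V, U_new
        U = U_new

# e.g. four retained components and four image classes:
# V, U = fcm(pca_scores(X, 4), c=4)
```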
5 Analysis and Results

Before the proposed data reduction algorithms are applied to the image segment data, the data must be examined to determine whether they can be reduced, i.e. whether there is redundancy among the original variables in the form of highly correlated interrelationships. To examine this redundancy, the correlations between the variables of the
image segment data set are calculated. As shown in Table 2, the correlations for the "Brickface" image segment data are presented. The bolded numbers show relatively higher correlations, indicating the possibility that a new factor can be extracted from those measurements.

Table 2. Pearson's correlation values for the "Brickface" image segment data

                          RC col    RC row    Vedge    Hedge    Red      Blue     Green
Region centroid column     1
Region centroid row        0.333     1
Vedge-mean                -0.165    -0.266     1
Hedge-mean                -0.015    -0.194     0.351    1
Raw red mean               0.008    -0.729     0.33     0.412    1
Raw blue mean              0.004    -0.691     0.334    0.408    0.993    1
Raw green mean            -0.017     0.675    -0.248   -0.388   -0.808   -0.747    1
In addition, there are different criteria for selecting the reduced dimension of the newly extracted variables from the original data. For this example, two criteria are combined: the first is the eigenvalues-greater-than-one rule of Cliff [10], and the second is that the accumulated variance of the reduced system must exceed 0.9. Using these two criteria, 4 newly extracted variables are selected from the 7 original measurement variables.
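The two retention criteria are easy to combine programmatically. In the sketch below (our illustration; the paper does not state exactly how the two rules are combined) we keep at least every component with eigenvalue greater than one, and at least enough components to reach 90% cumulative variance:

```python
import numpy as np

def n_components(eigvals: np.ndarray) -> int:
    """Number of components to retain; eigvals sorted in descending order.
    Combines the eigenvalues-greater-than-one rule [10] with a 90%
    cumulative-variance threshold (the exact combination is our assumption)."""
    cumvar = np.cumsum(eigvals) / eigvals.sum()
    by_eigenvalue = int((eigvals > 1.0).sum())
    by_variance = int(np.argmax(cumvar >= 0.9)) + 1
    return max(by_eigenvalue, by_variance)
```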
[Figure: scree plot of eigenvalue against factor number for components C1–C7]

Fig. 1. Scree plot for the newly extracted components for "Brickface" image segment data
Table 3 shows the evaluated performance of the proposed algorithms through the neurofuzzy systems [4][5][7]. From the results in Table 3, the methods using factor analysis [1] and FCM clustering show relatively better results than the other methods, including the combinations of principal component analysis and FCM clustering.

Table 3. Analyses of performance using the proposed algorithm and conventional factor analysis and principal component analysis
        CORR      TRMS     STD      MAD      EWI
fa       0.3984   0.8901   0.6664   0.8826   3.0407
pca      0.109    1.5185   1.1062   1.5059   5.0215
fc       0.3652   0.8807   1.3249   0.8697   3.7101
pc      -0.2541   1.2676   1.2878   1.2517   4.5529
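The first four comparison metrics in Table 3, as defined in the Appendix, can be computed as in the sketch below. This is our reading of those definitions: EWI is omitted because its weighting scheme is only described informally, and the quantity whose standard deviation is reported is assumed to be the estimation error:

```python
import numpy as np

def comparison_metrics(x: np.ndarray, y: np.ndarray) -> dict:
    """x: estimated values, y: original output values (see the Appendix)."""
    n = len(x)
    err = x - y
    return {
        "CORR": float(np.corrcoef(x, y)[0, 1]),              # correlation
        "TRMS": float(np.sqrt((err ** 2).sum() / (n - 1))),  # total root mean square
        "STD":  float(np.std(err, ddof=1)),                  # std. of errors (assumed)
        "MAD":  float(np.mean(np.abs(err))),                 # mean absolute deviation
    }
```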
6 Conclusion

The pattern recognition of image segment data has been implemented through neurofuzzy systems using data with reduced dimensions in both variables and observations. For the implementation, four newly extracted embedded variables derived from the 7 original measurement variables are used. The proposed algorithm produces relatively better results than using the conventional multivariate techniques by themselves. As described in Table 3, using the combination of factor analysis and FCM clustering analysis, the prediction of the patterns for the image segment data shows relatively better results. The prediction results using conventional principal component analysis are relatively worse than those of the proposed algorithm. This may lead to the conclusion that, for a limited number of input–output training data, the proposed algorithm can offer better performance than the other techniques for image segment data.
Acknowledgments This material is based upon work supported by Clarkson Aerospace Corporation.
References

1. Gorsuch, R.L.: Factor Analysis, 2nd edn. Lawrence Erlbaum Associates Inc., Hillsdale (1983)
2. Kendall, M.: Multivariate Analysis. MacMillan Publishing Co. Inc., New York (1980)
3. Yager, R., Filev, D.P.: Essentials of Fuzzy Modeling and Control. John Wiley & Sons, New York (1994)
4. Lin, C., Lee, C.: Neural Fuzzy Systems. Prentice Hall, Englewood Cliffs (1996)
5. Jang, J.S.: ANFIS: Adaptive-Network-Based Fuzzy Inference System. IEEE Trans. Systems, Man and Cybernetics 23(3), 665–684 (1993)
6. Duda, R., Hart, P., Stork, D.: Pattern Classification, 2nd edn. John Wiley & Sons, Chichester (2001)
7. Fuzzy Logic Toolbox, for use with MATLAB. The MathWorks Inc. (2003)
8. Image segment data provided by the Vision Group, University of Massachusetts, http://www.cs.toronto.edu/~delve/data/datasets.html
9. Nam, D., Singh, H.: Material processing for ADI data using multivariate analysis with neuro fuzzy systems. In: Proceedings of the ISCA 19th International Conference on Computer Applications in Industry and Engineering, Las Vegas, Nevada, November 13–15, pp. 151–156 (2006)
10. Cliff, N.: The Eigenvalues-Greater-Than-One Rule and the Reliability of Components. Psychological Bulletin 103(2), 276–279 (1988)
Appendix: Abbreviations

CORR: Correlation

TRMS: Total Root Mean Square,

\[ \mathrm{TRMS} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - y_i)^2}{n-1}}, \]

where \(x_i\) is the estimated value and \(y_i\) is the original output value.

STD: Standard Deviation

MAD: Mean of the Absolute Deviation

EWI [9]: Equally Weighted Index, the index value obtained by summing, over each field, the statistical estimation value multiplied by its equally weighted potential value

fa: Factor Analysis
pca: Principal Component Analysis
FCM: Fuzzy C-means Clustering Analysis
fc: preprocessing FA and SUBCLUST
pc: preprocessing PCA and SUBCLUST
The Impact of Change in Software on Satisfaction: Evaluation Using Critical Incident Technique (CIT)

Akshatha Pandith (1), Mark Lehto (1), and Vincent G. Duffy (1,2,3)

(1) Industrial Engineering, Purdue University, 315 N. Grant Street, West Lafayette, IN 47907, USA
(2) Agricultural and Biological Engineering, Purdue University
(3) Regenstrief Center for Health Engineering, Purdue University
{apandith,lehto,duffy}@purdue.edu
Abstract. This paper describes an exploratory study that analyzes the impact of change in software on users by utilizing the Critical Incident Technique (CIT). A total of 102 critical incidents were collected from the survey. 77 participants reported both satisfactory and unsatisfactory experiences; 22 reported only satisfactory experiences; and 3 reported only unsatisfactory experiences. Analysis of the satisfactory and unsatisfactory experiences revealed several factors, such as the expectations of users and the mismatch between actual system behavior and that anticipated by users, which can be attributed to automation surprise. The important findings of this study are the aggregated user feedback, which can be used in the design process when there is a change in software: avoid changes themselves in the first place where possible; attend to the factors of change, viz. the amount of change and the speed of change; and, finally, provide better help support.

Keywords: Critical Incident Technique, Change in software, Impact of change, Information overload, Automation surprise.
1 Introduction

This paper aims to understand the impact of change in software. Change in any area is inevitable, and so are its effects. Whether change is viewed as positive or negative depends not only on the outcome of the change, but also on the degree of influence it exerts on the situation [1]. A change is viewed as negative when people are unable to foresee it, when people dislike its implications, and when people feel unprepared for its effects. In other words, unrealistic expectations, willingness or commitment to change, and approach to change are factors that may lead to satisfactory or unsatisfactory experiences. Change is perceived to be positive when people feel in control, are able to accurately anticipate events, and can influence the immediate environment or at least prepare for the consequences [1, 3]. Willingness to change is influenced by ability and desire; deficiencies in the ability to adapt to change resulting from inadequate skills should be addressed by training. Information overload caused by change has been another problem, for example for pilots of fighter aircraft and attack helicopters, who have to process large amounts of information and make
decisions within split seconds [2, 3]. The same concept of information overload can be applied to understand the burden experienced by users during change in software.

Automation surprise can be one of the reactions to a system that undergoes a change. A reliable indicator of how people respond to change is the degree of surprise they exhibit when they encounter the change [4]. In other words, surprises occur when people anticipate one thing and instead experience something drastically different. It is sometimes difficult for the human operator to track the activities of their automated partners [5]. The result can be situations where the operator is surprised by the behavior of the automation, asking questions like "what is it doing now?", "why did it do that?", or "what is it going to do next?" [6]. Thus, automation has created surprises for practitioners, who are confronted with unpredictable and complicated system behavior in the context of ongoing operations [7].

Information overload is another important factor that should be considered in software change settings. As more and more icons and features are added, a sense of data overload can be created for the user. The psychologist David Lewis proposed the term "Information Fatigue Syndrome" to describe the resulting symptoms. Other effects of too much information include anxiety, poor decision-making, difficulties in memorizing and remembering, and reduced attention span [8]. These effects merely add to the stress caused by the need to constantly adapt to a changing situation.

In this research, the Critical Incident Technique (CIT) is used in an exploratory study to capture the impact of change on users and the way users perceive changes. The captured incidents are classified based on satisfactory or unsatisfactory experiences [9]. The data are further classified based on the impact of change on performance, acceptance, and satisfaction. The causes of such effects or surprises have been analyzed in order to reduce negative experiences. Finally, several suggestions are proposed to make the change process smoother and easier, to enhance the overall experience, and to enable the use of new software in its intended way.
2 Method

2.1 Critical Incident Technique

The Critical Incident Technique (CIT) has been used in this study to identify the problems caused by automation surprises and other change-related events. CIT is a research method developed by John C. Flanagan to capture human behavior, and can be defined as a set of procedures for systematically identifying behaviors that contribute to the success or failure of individuals or organizations in specific situations. CIT relies on the idea that critical incidents will not only be memorable but will also have either a positive or negative effect [10]. These incidents can be acquired through interviews, observations, or self-reporting. Some of the advantages of this method are that it is performed by real users in their normal working environment, users self-report the critical incident, and there is no direct interaction needed between user and evaluator during an evaluation session. Also, data capture is cost-effective and of high quality, and therefore relatively easy to apply to the study of usability [11].
2.2 Questionnaire Design

The interview questionnaire was derived from samples of other studies that used CIT [10, 12, 13]. Several parameters were considered in developing the questionnaire in an attempt to capture the effect of change. The questionnaire consisted of multiple choice questions, as well as open-ended and subjective rating scale questions. Open-ended questions were developed in order to understand the expectations of the users in depth. The responses were first classified into satisfactory and unsatisfactory experiences, and further classified based on the impact of change on key parameters such as performance, satisfaction and acceptance.

2.3 Data Collection

A pilot study involving 7 subjects was conducted to refine the interview questions and the data-collection methodology. In the standardized semi-structured telephone or face-to-face interviews, participants were asked to describe satisfactory and unsatisfactory critical incidents and the circumstances leading to such incidents. Details about additional incidents and possible remedies to reduce such incidents were also collected when provided. A total of 433 people were contacted for interviews. Approximately 340 telephone calls were made, out of which 58 people accepted to participate in the survey, a 17% success rate for this interviewing method. Ninety-three people were contacted for face-to-face interviews, of which 56 agreed to participate, a 60% success rate for face-to-face interviews. Out of a total of 114 survey responses, 102 completed surveys were used in the analysis. The screening criterion for participants was a minimum of 2 years of experience in using the software.
3 Results and Discussion

The results obtained from the open-ended question segments of the questionnaire are presented in detail below for unsatisfactory and satisfactory experiences. A Pareto analysis was conducted in order to better understand the factors causing satisfactory and unsatisfactory experiences. A Chi-square analysis was conducted to determine the statistical differences in the types of change between satisfactory and unsatisfactory experiences. Additionally, a logit analysis provided an estimate of the log likelihood ratio of satisfactory to unsatisfactory experiences for various categories of changes. The final section presents the subjective rating responses along with descriptive statistics, t-tests, and correlation and regression analyses of the subjective responses.

3.1 Unsatisfactory and Satisfactory Experiences

The most significant unsatisfactory and satisfactory experiences that users cited during change are shown in Table 1. It can be gathered from the interviews that people complained most about wasted time, along with wasted effort and frustration, upon switching to a different version of software. These are the top three crucial factors that users felt contributed to their dissatisfaction.
Table 1. Frequency of Reasons Cited by Subjects

Unsatisfactory Experiences                       Satisfactory Experiences
Problem area                      Freq           Reason for satisfaction         Freq
Waste of time                     50             Saves time                      80
Waste of effort                   39             Saves effort                    37
Frustration                       24             Increases feel good factor      27
Annoying                           9             Reduces error                   10
Had to re-learn                    8             Easy to use                      7
No feedback from system            7             Easy to learn                    6
Discomfort to use                  6             Better quality output            6
No additional benefits             6             Easy to understand               4
Too much data                      3             More reliability                 4
Waste of money                     2             Increases comfort                3
Requires additional attention      2             Easy to navigate                 2
Table 2. Comparisons between Satisfactory and Unsatisfactory Experiences Based on Chi-Squared Test

Type of Change   Freq   Negative Experience   Positive Experience   Chi Square   p-Value
Added             61            22                    39               5.23       0.0222
Enhanced          46            31                    15               1.20       0.2734
Replaced          17            13                     4               6.65       0.0099
Compacted         36            28                     8               7.86       0.0051
Removed           11             3                     8               0.58       0.4455
The most frequently cited cause for satisfaction amongst users was that the change saved their time. Users also expressed satisfaction when the features in the new version saved effort and when the change improved satisfaction or increased the feel-good factor. The reasons for satisfaction also included reduced errors in the software, ease of understanding and learning, and ease of navigation when dealing with change.

When comparing the satisfactory and unsatisfactory experiences based on the type of change, we can observe that features that were added contributed more to a satisfactory experience, whereas modification of any kind to an existing feature contributed more to an unsatisfactory experience. A Chi-square analysis of the statistical significance of differences in the type of change between satisfactory and unsatisfactory experiences was conducted, the results of which are shown in Table 2. One can observe that types of change such as added, replaced, and compacted features showed statistical significance.

Logit analysis was used to develop a regression equation between the categorical variables (type of change) and the type of experience. The equation provides an estimate of the log likelihood ratio of satisfactory to unsatisfactory experiences for various categories of changes; in this way, logistic regression estimates the probability of a certain event occurring. The CATMOD procedure of SAS was used to calculate the maximum likelihood estimates for each model. The majority of these individual estimates are highly significant. As shown in Table 3, the likelihood ratio increases significantly when features were added, replaced or compacted during a change.
\[ \mathrm{logit}(E_s) = \log\!\left(\frac{E_s}{1 - E_s}\right) = \mathrm{intercept} + \sum_{i=1}^{n} B_i X_i = g \qquad (1) \]

\[ E_s = \frac{e^{g}}{1 + e^{g}} \qquad (2) \]

Note that the relationship between the type of experience (satisfactory or unsatisfactory) and the likelihood estimates (type of change) is expressed by equations (1) and (2).

Table 3. Maximum Likelihood Estimates Obtained During Logit Analysis

Parameter (Xi)   Estimate (Bi)   Standard Error
Intercept          -3.657           1.687
Added               1.259           0.55
Replaced            1.562           0.605
Compacted           1.599           0.57
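Equations (1) and (2), together with the Table 3 estimates, give the predicted probability of a satisfactory experience directly; the sketch below simply substitutes them (variable names are ours):

```python
import math

# Maximum likelihood estimates from Table 3.
B = {"intercept": -3.657, "added": 1.259, "replaced": 1.562, "compacted": 1.599}

def p_satisfactory(added: int = 0, replaced: int = 0, compacted: int = 0) -> float:
    """Equations (1)-(2): g = intercept + sum(Bi * Xi); Es = e^g / (1 + e^g)."""
    g = (B["intercept"] + B["added"] * added
         + B["replaced"] * replaced + B["compacted"] * compacted)
    return math.exp(g) / (1.0 + math.exp(g))

# For a change that only adds features: p_satisfactory(added=1) ~ 0.08.
```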
3.2 Subjective Ratings

Subjective rating questions were asked during the survey interviews in order to understand various aspects of change and its impact on key parameters such as performance, degree of change, and satisfaction. The following subsections describe the descriptive statistics, correlation and regression analyses of the subjective rating responses.

Descriptive Statistics. The frequency of responses, means and standard deviations for each subjective question in satisfactory and unsatisfactory experiences are tabulated in Table 4. The descriptive analysis is used in this study to produce a situation analysis of the problem, addressing the impact of change on performance, the degree of evidence of changed features, and overall satisfaction; these data provide a snapshot of the situation. Table 4 displays several differences observed in the subjective measures of performance, evidence of change and overall satisfaction, depending on whether the user had a satisfactory or unsatisfactory experience with a change in software.

Table 4. Descriptive Statistics of Subjective Questionnaire Items

                                                     Frequency of response
Item                     N     Mean   Std dev    1    2    3    4    5    6    7
Unsatisfactory Experience
Impact on performance    80    2.65   1.22      13   13   28   16    1    2    0
Degree of evidence       80    5.06   2.13      12    3    2   10    9   14   29
Overall satisfaction     80    2.93   1.45      16   16   20   15    8    4    0
Satisfactory Experience
Impact on performance    99    5.51   0.95       0    0    1   12   39   30   17
Degree of evidence       99    4.28   1.94      10   11   10   25   14    9   20
Overall satisfaction     99    5.81   0.89       0    0    0    6   31   36   26
T-test Comparison between Satisfactory and Unsatisfactory Experiences. A contrast using t-tests was conducted to further explore the relationship between users' experiences with change as reported in the subjective responses. The main purpose of this analysis was to examine the impact, or criticality, of users' critical incidents during change. As shown in Table 5, several differences were observed in the subjective measures of user satisfaction, impact on performance and evidence of change, depending on whether the user had a satisfactory or unsatisfactory experience during the change in software. As might be expected, users were significantly less satisfied after an unsatisfactory experience (μ_satisfactory = 5.81 vs. μ_unsatisfactory = 2.93; |t| = 16.94 > t(176, 0.05) = 1.99) and more satisfied after a satisfactory experience. The impact on performance when there was a change in software showed a significant negative effect for an unsatisfactory experience (μ_satisfactory = 5.51 vs. μ_unsatisfactory = 2.67; t = 1.72) and a positive effect when the user had a satisfactory experience. Another finding of this study was that the degree of evidence of change also showed statistical significance (μ_satisfactory = 4.28 vs. μ_unsatisfactory = 5.06; |t| = 2.35 > t(176, 0.05) = 1.99); this time, satisfactory experiences occurred when the changes were less evident and unsatisfactory experiences when the changes were more evident.

Table 5. T-test Comparison between Satisfactory and Unsatisfactory Experiences

Variable                DF    Satisfactory Mean   Unsatisfactory Mean   t Value   P > |t|
Impact on performance   176        5.51                  2.67             1.72    <.0001
Degree of evidence      176        4.28                  5.06            -2.35    <.01
Overall satisfaction    176        5.81                  2.93           -16.94    <.0001
Correlation analysis. Correlation analysis was used to study the statistical significance of the impact of satisfactory or unsatisfactory experiences on the dependent variables: level of satisfaction, degree of evidence of change, and performance. The inter-correlation matrix of the satisfactory experiences is tabulated in Table 6, and the results are described below. An initial result was that the correlations (Table 6) showed a statistically significant correlation (r = 0.41, p < 0.001) between the impact on the user's performance and overall satisfaction with the change. From Table 7, for the unsatisfactory experiences, it can be observed that the data showed a positive correlation (r = 0.35, p < 0.001) between the negative impact on the user's performance and the satisfaction level. As expected, the impact of change on performance and overall satisfaction with change are closely related. A regression model is developed below to further investigate the relationships between impact on performance, evidence of change and overall satisfaction, based on these correlation results.
Table 6. Intercorrelation Matrix of Satisfactory Experiences

                        Degree of Evidence   Impact on Performance   Overall Satisfaction
Degree of Evidence              X              r = -0.02               r = -0.13
                                               p = 0.77, n = 99        p = 0.17, n = 99
Impact on Performance                              X                   r = 0.41
                                                                       p < .001, n = 99
Overall Satisfaction                                                        X

r = correlation coefficient, p = significance value, n = number of observations
Table 7. Intercorrelation Matrix of Unsatisfactory Experiences

                        Degree of Evidence   Impact on Performance   Overall Satisfaction
Degree of Evidence              X              r = -0.133              r = -0.11
                                               p = 0.23, n = 80        p = 0.29, n = 80
Impact on Performance                              X                   r = 0.35
                                                                       p < .001, n = 80
Overall Satisfaction                                                        X

r = correlation coefficient, p = significance value, n = number of observations
Regression Analysis. A regression analysis was performed on the survey data to develop an equation that predicts overall satisfaction as a function of the degree of evidence of change and its impact on performance. Table 8 presents the model developed for the unsatisfactory experiences. As the degree of evidence was not statistically significant, it was removed from the model. It can be observed from Table 8 that the impact on performance has a statistically significant (p < 0.001) relationship with overall satisfaction with the change. Table 9 displays the ANOVA table for the regression of overall satisfaction on impact on performance for the satisfactory experiences; the relationship is again statistically significant (p < 0.0001). Table 10 displays the corresponding model for all experiences pooled; once more, the impact on performance has a statistically significant (p < 0.001) relationship with overall satisfaction with the change.

Table 8. Regression Table for Unsatisfactory Experiences

Dependent variable: satisfaction level; number of observations: 80; R-square = 0.14

Analysis of Variance
Source   DF   SS       MS      F       Pr > F
Model     1    23.21   23.21   12.82   0.0006
Error    78   139.46    1.81
Total    79   162.68

Parameter Estimates
Variable                DF   Estimate   Std Error   t Value   Pr > |t|
Intercept                1     1.75       0.36        4.84    <.0001
Impact on Performance    1     0.44       0.12        3.58    0.0006
Table 9. Regression Table for Satisfactory Experiences

Dependent variable: satisfaction level; number of observations: 99; R-square = 0.17

Analysis of Variance
Source   DF   SS      MS      F       Pr > F
Model     1   13.87   13.87   20.75   <.0001
Error    97   64.85    0.66
Total    98   78.72

Parameter Estimates
Variable                DF   Estimate   Std Error   t Value   Pr > |t|
Intercept                1     3.64       0.48        7.51    <.0001
Impact on Performance    1     0.39       0.08        4.56    <.0001
Table 10. Regression Table for Overall Experiences

Dependent variable: satisfaction level; number of observations: 179; R-square = 0.59

Analysis of Variance
Source   DF    SS       MS       F        Pr > F
Model      1   356.43   356.43   251.14   <.0001
Error    176   249.79     1.41
Total    177   606.22

Parameter Estimates
Variable                DF   Estimate   Std Error   t Value   Pr > |t|
Intercept                1     1.16       0.23        5.04    <.0001
Impact on Performance    1     0.79       0.05       15.85    <.0001
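Using the Table 10 estimates, the fitted overall model is simply satisfaction ≈ 1.16 + 0.79 × (impact on performance). The sketch below evaluates this prediction and shows how an equivalent ordinary least squares fit could be re-derived from raw ratings (illustrative only; the paper's tables appear to be SAS output):

```python
import numpy as np

def predicted_satisfaction(perf_impact: float) -> float:
    """Overall model of Table 10 (intercept 1.16, slope 0.79)."""
    return 1.16 + 0.79 * perf_impact

def fit_ols(x: np.ndarray, y: np.ndarray) -> tuple:
    """Re-derive intercept and slope from raw ratings by least squares:
    x = impact-on-performance ratings, y = satisfaction ratings."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return intercept, slope

# A user rating the performance impact at 6/7 is predicted to report
# predicted_satisfaction(6) = 1.16 + 0.79 * 6 = 5.9 satisfaction.
```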
4 Conclusions

The goal of this study was to apply CIT to analyze the impact of change in software, so as to gain insight into what kinds of changes users prefer and to identify and reduce user dissatisfaction. Any modification to a system can result in a number of unanticipated problems. A research stream was created by applying the traditional CIT, which is sound and robust, to analyze the critical incidents of user experiences during change in software. The conceptual model was based on a literature review of automation surprise, information overload and change analysis. The data analysis was based on the impact of change, type of software and type of change, as described in the Results section. The analysis showed that elements such as time, effort, and errors were some of the important factors affecting user satisfaction.

We hope that the results of this investigation will be of interest to both users and developers. In particular, it is hoped that designers will be able to utilize a number of practical recommendations based on these findings. For example, developers may consider avoiding changes to existing features that lack justifiable benefits. When changes are inevitable, they must be evident and must not be too drastic. Better help must be provided, as this was one of the top features people looked for when they encountered problems. Newer versions must be easier to learn, user friendly, possess a better and simpler interface, and be more interactive for users.
Table 11. Top Guidelines for Designing Software to Reduce the Negative Impact of Change

Suggestion              Guidelines
Not changing features   • Changes should not be made just for the sake of technology
                        • Every extra feature added must be evaluated based on usability, as it is something extra to be misunderstood
                        • The degree of change should be kept in mind, so as not to totally shock users
Help feature            • Improvement of the help feature: while the help feature was one of the most common places users went for refuge, a common user complaint was that it did not help users enough
                        • Help needs to be modular, to enable browsing, and to allow skipping to relevant parts
Easy to understand      • Reduce the possibility of misinterpretation of features by reducing ambiguity
More transparency       • Show what the system is doing, e.g. progress bars, task logs, error logs
                        • The system should give full information on irreversible actions
                        • Check for mode errors
Feedback                • Provide ongoing feedback as to how input is being interpreted
                        • Present users with a range of valid inputs from which to select
                        • Ask for confirmation
Overall, it is believed that this study may improve the understanding of how users perceive change and facilitate better change. Change caused both satisfaction and dissatisfaction, depending mainly on the type of change in software features. The addition of new features was a predominant factor for causing satisfaction, whereas any kind of modification to existing features created more dissatisfaction. It was projected that with this new approach to software usability, insightful user experience data would be generated and recommendations would result. Recommendations based on the suggestions cited by the users are tabulated in Table 11. Software companies must consider the full spectrum of users while designing, and should avoid making changes just for the sake of technology. Software designers should keep the usability of all target users in mind before changing versions. This is especially important, as some commonly used software packages are used by people of different educational backgrounds, age groups, and levels of experience with the software. The sample size in this study was limited due to the time constraints of telephone and face-to-face interviews. These recommendations need to be tested in the future with a more detailed questionnaire and a broader sample of users for better generalization.

Acknowledgements. The authors would like to acknowledge Dr. Gavriel Salvendy, Dr. Steven Landry and all participants who were involved in this study.
References

1. Connor, D.R.: Managing at the Speed of Change. Villard Books, New York (1993)
2. Salvendy, G.: Handbook of Human Factors and Ergonomics, 2nd edn. Wiley, Chichester (2006)
3. Endsley, M.R.: Automation and Situation Awareness. In: Parasuraman, R., Mouloua, M. (eds.) Automation and Human Performance: Theory and Applications, pp. 163–181. Erlbaum, Mahwah (1996)
4. Sheridan, T.: Humans and Automation: System Design and Research Issues. Wiley Series in Systems Engineering and Management (2002)
5. Sarter, N.B., Woods, D.D.: Team Play with a Powerful and Independent Agent: Operational Experiences and Automation Surprises on the Airbus A-320. Human Factors 39(4), 553–569 (1997)
6. Rushby, J.: Using Model Checking to Help Discover Mode Confusion and other Automation Surprises. Reliability Engineering and System Safety, pp. 167–177. Elsevier, Amsterdam (2002)
7. Sarter, N.B., Woods, D.D.: How in the World did we ever get into that Mode? Mode Error and Awareness in Supervisory Control. Human Factors 37(1), 5–19 (1995)
8. Murray, B.: Data Smog: Newest Culprit in Brain Drain. APA Monitor 29(3) (1998)
9. Fivars, G.: The Critical Incident Technique: A Bibliography. American Institute for Research, Palo Alto (1973)
10. Flanagan, J.C.: The Critical Incident Technique. Psychological Bulletin 51, 327–359 (1954)
11. Hartson, H.R., Castillo, J.C.: Critical Incident Data and their Importance in Remote Usability Evaluation. In: Proceedings of the Human Factors and Ergonomics Society 44th Annual Meeting, pp. 590–593. Human Factors and Ergonomics Society, Santa Monica (2000)
12. Nelson, F.G.: Critical Incidents in Training (mimeographed questionnaire). United States International University, Corvallis (1971)
13. Oldenburger, K., Lehto, X., Feinberg, R., Lehto, M., Salvendy, G.: Critical Purchasing Incidents in E-Business. Behaviour and Information Technology 27, 63–77 (2007)
Validation of the HADRIAN System Using an ATM Evaluation Case Study

S.J. Summerskill (1), R. Marshall (1), K. Case (2), D.E. Gyi (3), R.E. Sims (1), and P. Davis (1)

(1) Dept. of Design and Technology
(2) Mechanical and Manufacturing Engineering
(3) Dept. of Human Sciences
Loughborough University, Leicestershire, LE11 3TU, UK
{S.J.Summerskill2,R.Marshall,K.Case,D.E.Gyi,R.E.Sims,P.M.Davis}@lboro.ac.uk
Abstract. The HADRIAN human modelling system is under development as part of the EPSRC funded AUNT-SUE project. The HADRIAN system aims to foster a ‘design for all’ ethos by allowing ergonomists and designers to see the effects of different kinds of disability on the physical capabilities of elderly and disabled people. This system is based upon the long established SAMMIE system, and uses data collected from 102 people, 79 of whom are registered as disabled, or have age related mobility issues. The HADRIAN system allows three dimensional CAD data of new products to be imported, with a subsequent automated analysis using all of the 102 sample members. The following paper describes the process and results gathered from a validation study using an ATM design as a case study. The results indicated that fine tuning of the behavioural data built into HADRIAN would improve the accuracy of an automated product analysis. Keywords: Human Modelling, design for all, ergonomics, validation.
1 Introduction

Human modelling systems (HMS) such as SAMMIE [1], RAMSIS, and JACK are used in the design of vehicles, manufacturing environments and workstations. These systems use CAD (Computer Aided Design) software to represent the size and shape variability of humans in simulations of environments such as car interiors (see Figure 1). The ability to simulate how people of different sizes and nationalities are accommodated by a product removes the need for costly early physical prototypes. If used correctly within a design process that includes later physical prototypes to verify the HMS analysis results, HMS can be highly cost effective.

Currently, HMS support design activity with a focus on able-bodied people. The aging population in the UK [2] and a greater awareness of the needs of disabled people (Disability Discrimination Act [3]) have raised the prospect of using HMS to simulate the effects of disability, supporting the design of more inclusive products and services.
Fig. 1. The use of SAMMIE in the design of an automobile interior (2007)
Using human modelling systems to represent the effects of disability does raise issues in terms of the expertise of the end user. The designers and engineers that use HMS in the product design process generally have little experience of the effects of disability or the coping strategies used by disabled people, and current HMS have no methods of representing the joint range of motion limitations or coping strategies used by disabled people. This paper describes the validation of a new HMS that has been designed to combine anthropometric, joint range of motion and behavioural data for a sample of disabled and able-bodied people. The aim of the new system is to foster a greater awareness of the effects of disability amongst designers and engineers, whilst providing a tool that supports a design process resulting in greater accommodation of the needs of elderly and disabled people. The test bed for the new system (HADRIAN) is the SAMMIE system, established in 1967 [4, 5].
2 The HADRIAN System

The HADRIAN system is currently in the prototype phase of development. A key feature of the HADRIAN system is that the process of evaluating a product is automated, removing the product designer or engineer from key stages of the HMS process that require knowledge of the behaviour of disabled people. The HADRIAN system allows a product analysis to be performed on the basis of a task description provided by the software user. For example, if a ticket machine is to be evaluated, the user would import a CAD model of the ticket machine into the HADRIAN system and build a list of tasks to be performed. Example tasks for a ticket machine include the use of a control to select on-screen options, and depositing money into the coin slot. The HADRIAN system can perform these interactions for all of the sample members built into it (n=102), using data on the coping strategies, joint constraints and anthropometry of each individual to automatically perform tasks such as positioning the virtual user to allow the best reach to the various controls built into the product. The system can then identify which users were unable to complete certain task stages based upon their ability to reach and
view the interaction points of a product. This allows design changes to be identified that can increase the accommodation of the product, such as lowering a control or changing a screen angle. The system also contains demographic information such as the type of disability, age, sex and occupation.
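The automated evaluation described above amounts to a loop over individuals and task elements. The sketch below is our paraphrase of that process, not HADRIAN code: `simulate` stands in for the reach/vision simulation (including the adaptation processes), and the function reports which sample members fail which elements so that design bottlenecks can be identified:

```python
from typing import Callable, Dict, List

def evaluate_product(individuals: List[str], task_elements: List[str],
                     simulate: Callable[[str, str], bool]) -> Dict:
    """simulate(person, element) -> True if the reach/vision element
    succeeds for that person after the system's adaptation processes."""
    failures: Dict[str, List[str]] = {}
    for person in individuals:               # e.g. the 102 database members
        failed = [e for e in task_elements if not simulate(person, e)]
        if failed:
            failures[person] = failed
    accommodated = 1.0 - len(failures) / len(individuals)
    return {"accommodated_fraction": accommodated, "failures": failures}
```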
3 The HADRIAN Sample of Users

HADRIAN is based upon data collected from a sample of 102 people, the majority of whom were registered as disabled or had age-related impaired mobility. The sample members participated in the following data collection activities: anthropometric data, joint range of motion data, reach range data, completion of a questionnaire detailing the use of different modes of public transport, and the collection of baseline data on the ability of the participants to perform kitchen-based activities of daily living.

Within the sample of 102 people, 59 people have some form of impairment including: limb loss, asthma, blood conditions, cerebral palsy, epilepsy, head injuries, multiple sclerosis, arthritis, vision and hearing impairments, heart problems, paraplegia, Parkinson's disease, stroke, and dyslexia, amongst others. Of the 43 able-bodied people, 20 were aged over 60 and had undiagnosed or minor impairments associated with being older. The remaining participants provide baseline information on the capabilities of non-disabled people. All of the sample members included in HADRIAN were capable of living independently. Each subject was assessed using a modified version of the OPCS sample frame [6] to allow a comparison of the severity of disability exhibited by the HADRIAN sample to the prevalence and severity of disability in the UK.

3.1 Anthropometric Data Used in the HADRIAN Subject Simulations

The following anthropometric measures were collected from each participant: stature, arm length, upper arm length, elbow–shoulder, abdominal depth, thigh depth, knee–hip length, ankle–knee length, ankle height, foot length, sitting height, sitting shoulder height, hip–shoulder length, chest height, chest depth, head height, eye–top of head, buttock–knee length, knee height, shoulder breadth, hip breadth, hand length and grip length. These data were collected using an anthropometer, stadiometer, sitting height table and, in some cases, a TC2 3D body scanner.

3.2 Joint Constraint Data Used in the HADRIAN Subject Simulations

The following joint constraint measures were collected from each participant: shoulder extension, shoulder flexion, shoulder abduction, shoulder adduction, arm extension, arm flexion, arm abduction, arm adduction, arm medial rotation, arm lateral rotation, elbow extension, elbow flexion, elbow pronation, elbow supination, wrist extension, wrist flexion, wrist abduction, and wrist adduction. These data were collected using a goniometer.

3.3 Data Collected on Positioning and Posture

The prototype version of HADRIAN contains automation data based upon the kitchen tasks that were performed in the user trials. The participants were asked to move a
variety of objects onto a high shelf, a work surface, and into cupboards and shelves of standard kitchen units. This process was video recorded to allow the postures that were adopted to be coded (see Figure 2). Table 1 shows the positioning and postural data that were captured for both ambulant and wheelchair-using participants. These coded data were used to inform the behavioural aspects of the HADRIAN task automation. A more detailed description of the HADRIAN system can be found in Marshall et al. [4, 5].
Fig. 2. Examples of the postures adopted during kitchen based tasks

Table 1. The coding system used to classify the postures exhibited by the HADRIAN sample members during the kitchen tasks (coded posture/orientation: coding criteria for use in HADRIAN)

Orientation of the user to the kitchen cupboards: face on, side on, angled approach
Arm used for the tasks: left or right
Posture of the legs during the kitchen tasks (ambulant participants only): straight, bent 1 (knee angle 170-120°), bent 2 (knee angle 119-40°), crouch (knee angle 39-0°), left kneel, right kneel, full kneel, sitting
Back twist: left or right, >10°
Back bend: upright (0-10°), lean (11-45°), bend (46°+)
Shoulder: relaxed, extended
Head orientation: yaw (neutral, left, right, +/- >10°); pitch (neutral, forward, back, +/- >10°); tilt (neutral, left, right, +/- >10°)
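To illustrate how the angle bands in Table 1 would be applied when coding the video, a minimal sketch follows. The function names are ours and the band boundaries are taken directly from the table; kneeling and sitting codes come from direct observation rather than from a measured angle.

```python
def code_leg_posture(knee_angle_deg: float) -> str:
    """Classify an ambulant participant's leg posture from the knee angle,
    following the bands in Table 1. Kneeling and sitting are coded directly
    from the video rather than from a knee angle."""
    if not 0 <= knee_angle_deg <= 180:
        raise ValueError("knee angle must lie in the 0-180 degree range")
    if knee_angle_deg > 170:
        return "straight"
    if knee_angle_deg >= 120:
        return "bent 1"   # knee angle 170-120 degrees
    if knee_angle_deg >= 40:
        return "bent 2"   # knee angle 119-40 degrees
    return "crouch"       # knee angle 39-0 degrees

def code_back_bend(bend_angle_deg: float) -> str:
    """Code back bend as upright (0-10), lean (11-45) or bend (46+), per Table 1."""
    if bend_angle_deg <= 10:
        return "upright"
    return "lean" if bend_angle_deg <= 45 else "bend"
```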
4 The Methodology for the Validation of the HADRIAN System

The HADRIAN validation process aimed to verify and improve the data that drive the automation of the product assessment process. This was done using an ATM (Automatic Teller Machine) case study in collaboration with NCR, the ATM manufacturer. The evaluation of an ATM provided a suitable task in terms of reaching to and viewing a number of interaction points, e.g. the card slot, PIN buttons and statement printer. NCR provided the team with an ATM fascia that was then mounted on a
rig that allowed the ATM to be adjusted in height. The height adjustment range selected was based upon the international variability of ATM mounting heights under different national standards, as provided by NCR. The height of the highest interaction point (the statement printer output) was therefore adjustable through a range of 250 mm, from 1200 mm to 1450 mm, in line with the international variability in mounting height from the floor. It was anticipated that this range would prove difficult for wheelchair users in terms of reach to the highest interaction points. There were 160 tasks performed in each of the studies described below, i.e. ten participants performing eight tasks at two ATM heights.

4.1 Study 1: ATM Analysis Using an Expert in the Use of Human Modelling Systems with Experience of the Coping Strategies Used by Disabled People

Ten HADRIAN subjects participated in the validation process. The subjects selected were: an ambulant disabled female with cerebral palsy who uses a wheeled walking frame; an ambulant disabled male who uses crutches due to leg injuries sustained in a car crash; a crutch user with balance and coordination issues; a powered wheelchair user with limited strength in the right arm due to a stroke; a powered wheelchair user with mobility issues due to a broken back; two wheelchair users with good upper body mobility; and a mobility scooter user with balance and coordination issues. Two non-disabled members of the HADRIAN sample were included as a control: a UK male of 99th percentile stature and a UK male of 1st percentile stature. The sample selected was biased towards wheelchair users, as it was anticipated that these users would struggle with ATM usage due to limitations in reaching ability. The orientation adopted by wheelchair users to allow the most efficient reach to the various interaction points was also seen as an important variable to test. The first study involved an examination of the ATM design using the SAMMIE HMS by a consultant with 10 years' experience of applying HMS to disability-related design problems. The anthropometry and joint constraint data collected from the HADRIAN sample were used by the expert. The positioning and posturing of the human models were based upon the expert's experience.

4.2 Study 2: User Trials with 10 of the Original HADRIAN Sample Members Using the ATM Rig

Each user was presented with the ATM at the 1200 mm and 1450 mm mounting heights and was asked to reach to and view each of the ATM interaction points. Each participant was video recorded whilst the tasks were being performed so that a later comparison to the HADRIAN automated process could be made in terms of the postures and positions adopted. The position of each user relative to the ATM fascia was recorded, in combination with information on task failures and the postures adopted by the participants.

4.3 Study 3: An Automated HADRIAN Analysis of the ATM Design

The final stage of the validation process involved the use of HADRIAN to perform an automated analysis of the ATM design using the same variables as in Studies 1 and 2. A full description of the HADRIAN automated analysis procedure can be
found in Marshall et al. [4, 5]. A summary of the automation process for the analysis of interaction point accessibility is as follows (a pseudocode sketch of this loop is given after the list):

1. Determine the relative positioning of the reach/view target and the human model. The human model is automatically positioned to enable reach to the currently selected interaction point, if possible.
2. Measure the straight-line distance between the target and the key human model reference point (the eye-point for vision, the shoulder for upper-limb reach, the hip for lower-limb reach). If the target is out of reach by a large margin, move to step 4; if not, continue.
3. Depending on the reference-to-target distance, one or more of the following are applied:
− The head/neck is rotated such that the head is facing the target
− The head/neck is rotated and extended/flexed such that the eye-point to target distance equals the desired parameter value
− The torso is rotated and flexed such that the eye-point to target distance equals the desired parameter value
− The reference-to-target vector is calculated and the shoulder 'pointed' along that vector to achieve a successful upper-limb reach
− The torso is rotated and flexed in the direction of the target to achieve a successful upper-limb reach.
If there is still a failure, move on to step 5.
4. If the reference-to-target distance is greater than can be accommodated by a posture change, one or more of the following are applied:
− The human model is turned to face the target.
− The human model is moved closer to the target.
If there is still a failure, move on to step 5.
5. In the event of absolute failure, a failure is flagged in the results and the next task element is addressed.

The comparison between the three stages of the validation process provided an opportunity to examine the effectiveness of the HADRIAN automation algorithms and to highlight opportunities for the fine-tuning of the HADRIAN process.
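To make the flow of these five steps concrete, the sketch below restates the loop in Python-flavoured pseudocode. Every method called on `model` (`position_for`, `reference_point`, `within_posture_range`, `try_posture_adjustments`, `turn_to_face`, `move_closer`, `flag_failure`) is a hypothetical placeholder for internal HADRIAN/SAMMIE operations, not the system's actual API.

```python
import math
from dataclasses import dataclass

@dataclass
class Target:
    kind: str        # "vision", "upper-limb" or "lower-limb"
    location: tuple  # (x, y, z) of the interaction point

def assess_interaction_point(model, target: Target) -> bool:
    """Mirror steps 1-5 of the automation summary for one interaction point.
    `model` stands in for a HADRIAN/SAMMIE human model; all methods called on
    it are hypothetical placeholders."""
    # Step 1: position the human model relative to the target, if possible.
    model.position_for(target)

    # Step 2: straight-line distance from the key reference point
    # (eye-point for vision, shoulder/hip for upper/lower-limb reach).
    gap = math.dist(model.reference_point(target.kind), target.location)

    if model.within_posture_range(gap, target.kind):
        # Step 3: head/neck, torso and shoulder adjustments.
        if model.try_posture_adjustments(target):
            return True
    else:
        # Step 4: out of range by a large margin -- reorient and approach,
        # then retry the posture adjustments of step 3.
        model.turn_to_face(target)
        model.move_closer(target)
        if model.try_posture_adjustments(target):
            return True

    # Step 5: absolute failure -- flag it and move to the next task element.
    model.flag_failure(target)
    return False
```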
5 Results

The following section discusses the comparison of the results obtained from the three studies in terms of the number of task failures and the orientation of the user to the ATM.

5.1 Task Completions

The task completion data for each study were compared. In Studies 1 and 2 there was only one participant who was unable to reach an interaction point. This participant (Participant 2) is a powered wheelchair user who has suffered a stroke and therefore has weakness down the right-hand side of the body, and is unable to walk. The tasks identified as failures in Studies 1 and 2 were reaching to the statement printer and receipt output slots when the ATM was in the highest position (1450 mm
to the statement printer). The results from Study 3, the HADRIAN automated analysis, showed nine task failures across all participants and tasks (160 tasks were performed). Six of the task failures generated by the automated analysis in HADRIAN were associated with interaction points in the top half of the ATM panel being reached by wheelchair users. Five of the nine task failures were associated with one participant; this was the same participant that had task failures in Studies 1 and 2 (Participant 2). The reason for the additional task failures produced by the HADRIAN system was found to be that Participant 2 was able to shuffle forward in his seat when using a facing orientation, allowing reach to the interaction points that were shown as failures in the HADRIAN simulation.

5.2 Orientation of the Human Model

Ambulant users. All ambulant users faced the ATM and did not need to reposition their feet to allow control interactions in any of the three studies.

Wheelchair users. The orientations adopted by the wheelchair-using participants in each study were categorised as facing, oblique or lateral, i.e. facing equals a perpendicular orientation of the wheelchair user to the ATM, oblique equals a diagonal orientation, and lateral equals a side-on orientation to the ATM.
Fig. 3. The classification of wheelchair user orientation in relation to the ATM

Table 2. A comparison of the orientation of the wheelchair to the ATM for each of the three studies performed

Wheelchair subject | ATM height | Study 1: Expert user | Study 2: User trials | Study 3: HADRIAN
1 | Low  | Oblique | Oblique | Facing
1 | High | Oblique | Oblique | Facing/Lateral
2 | Low  | Lateral | Facing  | Facing/Lateral
2 | High | Lateral | Facing  | Facing/Lateral
3 | Low  | Lateral | Lateral | Facing/Lateral
3 | High | Lateral | Lateral | Facing/Lateral
4 | Low  | Oblique | Oblique | Facing
4 | High | Oblique | Oblique | Facing/Lateral
Fig. 4. The reach to the receipt slot of the ATM performed in the three studies; from left to right: the SAMMIE study, the user trials and the HADRIAN analysis
Each of these three orientation categories had a range of +/- 15 degrees from the positions shown in Figure 3. Table 2 shows a comparison of the orientation of the wheelchair to the ATM for each of the three studies performed. The analysis performed by the SAMMIE expert user matched the wheelchair orientations exhibited in the user trials, with the exception of Participant 2. The HADRIAN technique produced only facing and lateral positions for the wheelchair users. Figure 4 shows a task being attempted by a single user (Participant 2) in each of the three studies performed.
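Given those bands, mapping an observed approach angle to a category is a simple banding check. The sketch below assumes nominal angles of 0° (facing, i.e. square-on to the fascia), 45° (oblique) and 90° (lateral); the 45° value for the oblique position is our assumption, since the paper describes it only as diagonal.

```python
def classify_orientation(approach_angle_deg: float) -> str:
    """Map a wheelchair approach angle (0 = square-on to the ATM fascia) to
    the facing/oblique/lateral categories, each spanning +/- 15 degrees.
    Nominal angles of 0/45/90 degrees are assumed for illustration."""
    a = abs(approach_angle_deg) % 180.0
    a = min(a, 180.0 - a)  # fold symmetric approaches onto 0-90 degrees
    for nominal, label in ((0.0, "facing"), (45.0, "oblique"), (90.0, "lateral")):
        if abs(a - nominal) <= 15.0:
            return label
    return "unclassified"  # falls outside all three +/- 15 degree bands
```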
6 Discussion of Results and Recommendations for the Improvement of the Automated HADRIAN Protocol

The comparison of the task completion and wheelchair positioning results for the three studies has highlighted areas in which the HADRIAN automated analysis protocol can be improved. The task failures that were found by the HADRIAN system, but not in the other two studies, were the result of four issues identified in the analysis process. These were:

1. The prototype HADRIAN system works on the assumption that a facing orientation to the product interaction points will be used; if a failure occurs in the facing orientation, a lateral orientation is used. As demonstrated in the validation user trials, wheelchair users often adopt an oblique orientation to the task interaction points. An oblique orientation improves reach to the interaction points compared with the facing orientation, whilst also allowing the reaching target to be seen without excessive body and neck rotation. It is therefore recommended that an oblique orientation attempt be added to the HADRIAN automated protocol.
2. The prototype HADRIAN protocol uses a standard sitting posture for the wheelchair users built into the database. The postures adopted by the wheelchair users often differed from this assumed posture in terms of the angle of the lower leg, increasing the distance from the reach targets in a facing orientation of the wheelchair in HADRIAN. It is therefore recommended that the posture adopted by the wheelchair user be more accurately replicated by the HADRIAN system.

3. The prototype HADRIAN system does not use a CAD model of the specific wheelchair used by each participant; instead, the wheelchair user is placed at the correct sitting height for the specific wheelchair used. Data were gathered during the original HADRIAN data collection that allow all wheelchairs to be accurately modelled in terms of length, width, sitting height, handle height and the orientation of the user within the volume of the wheelchair. It is therefore recommended that the wheelchairs modelled by the SAMMIE expert user in Study 1 be used in the HADRIAN automated analysis. This will allow the exploration of situations where the wheelchair geometry blocks required postures, or orientations of the wheelchair to the reaching and viewing targets of products.

4. The prototype HADRIAN system does not use the data collected that quantify the ability of the user to twist the upper body. This was highlighted as a reason for task failures in lateral wheelchair orientations in all cases other than those found for Participant 2, discussed above. The implementation of the upper body twist data in the automated HADRIAN analysis is recommended.

In addition, it is recommended that the HADRIAN system include a collision detection routine that can detect whether the postures adopted interfere with the structures of the products being interacted with. On a small number of occasions the arm of the HADRIAN human model intersected with part of the ATM structure; this should be avoided.
7 Conclusions

The HADRIAN validation process was designed to verify and improve the automation of the HADRIAN analysis of products. The results for the ambulant disabled and non-disabled participants predicted by the HADRIAN system were found to be accurate. However, the exercise highlighted that additional data gathered from the wheelchair users need to be incorporated into the HADRIAN analysis protocol in order to increase the accuracy of the results. The next stage in the development of the HADRIAN system will be to implement the changes recommended in this paper and to perform a further validation exercise to test the system further. Initially, the revised version of HADRIAN will be tested using the ATM example. A second validation study will be performed in June 2009 and will involve the analysis of the interaction points found at the Greenwich Docklands Light Railway station in London, England. This process will analyse the use of ticket machines, lifts and rail vehicles by elderly and disabled people.
References

1. Porter, J.M., Marshall, R., Freer, M., Case, K.: SAMMIE: A computer aided ergonomics design tool. In: Delleman, N.J., Haslegrave, C.M., Chaffin, D.B. (eds.) Working Postures and Movements – Tools for Evaluation and Engineering, pp. 454–462. CRC Press, Boca Raton (2004)
2. WHO: Ageing. World Health Organisation (2008), http://www.who.int/topics/ageing/en/ (accessed 01/02/2009)
3. Disability Discrimination Act. Office of Public Sector Information (2005), http://www.opsi.gov.uk/acts/acts2005/ukpga_20050013_en_1
4. Marshall, R., Case, K., Porter, J.M., Sims, R.E., Gyi, D.E.: Using HADRIAN for eliciting virtual user feedback in 'Design for All'. Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture 218(9), 1203–1210 (2004)
5. Marshall, R., Summerskill, S.J., Gyi, D.E., Porter, J.M., Case, K., Sims, R.E., Davis, P.: A design ergonomics approach to accessibility and user needs in transport. In: Contemporary Ergonomics 2009: Proceedings of the Annual Conference of the Ergonomics Society, London, UK (April 2009)
6. Martin, J., Meltzer, H., Elliot, D.: OPCS Surveys of Disability in Great Britain: The Prevalence of Disability Among Adults. HMSO, London (1994)
A 3D Method for Fit Assessment of a Sizing System

Jiang Wu, Zhizhong Li, and Jianwei Niu

Department of Industrial Engineering, Tsinghua University, Beijing, 100084, P.R. China
[email protected]
Abstract. A three-dimensional (3D) method for evaluating a sizing system via objective fit assessment is introduced in this study. Taking the evaluation of a helmet sizing system as an example, geometrical models of the head surfaces of a target population are first generated from 3D anthropometric measurements. Then a helmet model for each size is prepared. For each individual, a helmet model of the corresponding size is virtually worn on his head with proper relative position and orientation. After that, objective fit assessment criteria are calculated. Finally, statistical analysis of these criteria provides an objective evaluation of the sizing system. This method affords a rapid, low-cost and quantitative approach to fit assessment of a sizing system when critical fit is a concern. Keywords: fit assessment, sizing system, 3D modeling.
1 Introduction

Sizing systems provide size specifications for different population groups based on body dimension data from demographic anthropometric surveys and studies. The goal of sizing is to choose a limited set of size groups that covers a large percentage of the population. Sizing systems can be applied in the garment industry, workplace design, helmet design and other operator-related products. A sizing system may be created using methods ranging from trial-and-error to sophisticated statistical techniques. Sizing approaches have been improved and the optimization of sizing systems has developed continuously [1]. Statistical methods are widely used to increase the accommodation of the population and reduce the number of sizes in the system. For example, in order to establish a pants sizing system, Hsu and Wang performed factor analysis to extract important sizing variables, and used the decision tree technique to identify and classify significant patterns in soldiers' body shapes [2].

The head is one of the most vulnerable parts of the human body; thus, helmets are broadly used as protective equipment both on the battlefield and in daily work and life, e.g., construction work, motorcycling, cycling, football, baseball and wrestling. With the increasing demand for good helmet performance and fit, more and more researchers are paying attention to helmet sizing systems. Sippo and Belyavin (1991) introduced a sizing system based on the data of an anthropometric survey of 2,000 Royal Air Force aircrew from 1970 to 1971 [3]. Three head dimensions, including length, breadth and pupil-vertex height, were used to evaluate the fit of this system.
Robinette and Whitestone (1992) used the dimensions of head length and head width to divide people into several groups [4]. Bradtmiller and Beecher (1993) introduced a new sizing method based on clustering of vectors from the center to the surface of human heads [5]. Chen et al. (2002) proposed a new sizing system using three-dimensional (3D) head anthropometric data collected by Computerized Tomography (CT) scanning [6].

Studies have indicated that poor fit is the most frequent reason for returns in the apparel industry [7]. Bad shoe fit has been shown to be a primary cause of sweaty feet, chafing, bunions and foot discomfort [8]. There are several strategies to improve the fit of products: population grouping, adjustable products, traditional customization, and mass customization [9]. An effective and widely used approach to designing well-fitting products is to analyze human body forms so as to classify them into several groups. As mentioned above, sizing is an important way to increase product fit, and fit is a necessary aspect of evaluating a sizing system. In this study, a method to evaluate a sizing system via objective fit assessment in a 3D virtual environment is introduced. This method can give an accurate quantitative assessment while dramatically reducing cost and development time.
2 Fit Assessment

In the product design process, fit testing is expensive but necessary. It has been used to ensure that a certain proportion of a population can wear the clothing or equipment of a given size. Related guidelines were established by McConvile et al. in 1979 [10]. As helmet systems become more complex, fit problems that may have been ignored before can now threaten performance and individual safety [11]. Consequently, fit assessment is an important consideration in the evaluation of a sizing system. Many approaches to fit assessment have been proposed and implemented. They can be classified into two categories: subjective assessment and objective assessment.

2.1 Subjective Fit Assessment

Subjective assessment has been more widely used than objective assessment to evaluate the fit of a sizing system. In a typical subjective fit test, selected subjects are first required to wear prototypes of all available sizes. Experts then collect assessment data via questionnaire survey. Two kinds of data are required during a fit test: anthropometry and fit quality measures. Statistical methods are used to complete the fit evaluation. Several studies using subjective fit assessment of head sizing systems have been carried out; some examples follow.

In early 1985, the U.S. Army Aeromedical Research Laboratory (1987) initiated an Integrated Helmet and Display Sighting System helmet fitting program. This program aimed to assist the Army in establishing fitting requirements and procedures for the Advanced Attack Helicopter Program. Questionnaires were used to carry out fit assessment along several dimensions, including thermal comfort, stability, noise attenuation, ear cup comfort and chinstrap comfort [12]. The U.S. Army Aeromedical Research Laboratory surveyed the accommodation of female hairstyles with flight helmets in 1999 [13].
A 24-month experiment assessed the current military helmet in the U.S. Army; a questionnaire survey was distributed to 1,123 soldiers after helmet wearing to evaluate fit [14].

2.2 Objective Fit Assessment

In a subjective fit assessment, accurate quantitative criteria for evaluation are difficult to propose due to the complexity and variation of head shapes. Furthermore, the helmets must be worn continuously by subjects until the assessment is finished; if several hours are required for each wearing, the cost, time and patience of subjects across the whole test may be a serious problem. Thus, objective fit assessment has become interesting to both researchers and practitioners. Two methods of objective fit assessment are broadly used in the garment industry: the first compares garments and the human body linearly; in the other, the amount of pressure placed by a garment on a specific body location is measured [15]. Using technologies for graphical presentation and analysis of the spatial relationship between helmet and human head, helmet fit can be evaluated more accurately [16]. This is an important reason why more and more objective fit assessments have been implemented.

The Euclidean distance is the most widely accepted criterion for objective fit evaluation of a sizing system [17] [18]. In the study of Meunier et al. (2000), fit was assessed by the distance between the inside of the helmet and the surface of the human head [16]. Two approaches were conducted and compared: one measured the distance using depth probes through holes drilled in the helmet; the other calculated the distance from surface data of the human head and helmet captured by 3D laser scanning. It was difficult to determine which method yielded the truer results. A computer-aided helmet fit test reported by Corner et al. (1997) used the distances between the interior of a helmet and the surface of a head to draw a fit color map [19]. Many other criteria have been used besides the Euclidean distance. In the study by Chang et al. (2001), stability was shown to be enhanced, and rotation moments reduced, when the helmet fitted the head better. Pressure exerted by the helmet on the head can also be measured by pressure sensors between helmet and head for objective fit assessment.
3 3D Fit Assessment Method

As mentioned above, objective methods offer an opportunity to obtain more accurate data for quantitative assessment at a lower cost in time and money than subjective methods. With the development of 3D scanning and Computer-Aided Design (CAD) technologies, objective fit assessment can be performed in a virtual environment: cost can be reduced and the assessment cycle shortened dramatically. Furthermore, the geometric spatial relationship between helmet and head becomes visible. A block-distance based sizing method proposed by Niu (in press), which considers 3D control points of head surfaces, is expected to be more effective for fit design than traditional head sizing methods [20]. Fit assessment of this sizing system is used as an example to introduce the proposed 3D fit assessment method. Taking helmet sizing as the example, the method involves 3D geometric modeling of
subject heads from a 3D anthropometric database and of helmets of all sizes, head-helmet mating, and fit calculation.

3.1 3D Human Head Modeling

Since no physical subjects participate in this virtual assessment, 3D human head models are critically important for the accuracy of the results [10]. This places a premium on the quality of the 3D anthropometric measurement. Various geometric modeling methods can then be used for modeling the head; a convenient way is to use the free-form modeling functions of CAD software, such as the Unigraphics system used in our study. In our case study, 3D head data of 510 young male Chinese soldiers were collected by a Chinese military institute in 2002 [6]. Noisy data were removed manually by visual checking in the CAD environment before modeling. The head postures of some subjects had not been properly settled for scanning, so their postures were standardized before mating with the helmet models. According to the China National Military Standard GJB 5477-2006 [21], the number of sizes was set at three. The 3D head samples were classified into three groups by a computer program implementing the block-distance based sizing [20]. For each group, a representative 3D head model was generated by calculating the mean values of all samples in the group. These representative head models were used as the helmet design reference.

3.2 3D Helmet Modeling

With the development of CAD technology, it is no longer difficult to set up a helmet model in a virtual environment. If the purpose of fit assessment is to evaluate current products, reverse engineering technology can be adopted, basically involving 3D scanning and surface modeling from a point cloud. If, instead, the purpose is to evaluate a sizing system for which the corresponding products are yet to be designed, CAD modeling functions have to be called upon to create new models of the product sizes. For helmet products, a rapid computer-aided helmet shell design method based on 3D head anthropometric data under the Unigraphics system was proposed, and a corresponding toolkit developed, by Liu et al. (2007) [22]. This toolkit was adopted in our case study to establish the three helmet models for the three sizes.

3.3 Head-Helmet Mating and Fit Assessment

The spatial relationship between a head and a helmet should be set up first. According to the correct wearing requirements and the experience of experts, every virtual head wears its corresponding helmet virtually, with proper relative position and orientation (see Fig. 1). Because the head postures have been standardized, the mating mainly involves translational movement to achieve equal gapping between helmet and head. Visual observation at the sagittal and lateral sections can help the mating. Four groups of criteria will be adopted in the fit assessment and in the comparison between the 3D clustering method and the traditional helmet sizing system: head-helmet distance/gap, mass/weight, center of mass, and inertial moments of the helmet. The mean value, extreme value, and variance of each criterion are to be calculated. Statistical testing, e.g. Analysis of Variance (ANOVA), on these criteria will be conducted for the comparison.
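As a concrete illustration of this planned comparison, the sketch below summarizes one criterion (for example, the minimum head-helmet gap per subject) under two sizing systems and compares them with a one-way ANOVA using SciPy. The function and variable names, and the data layout, are our own assumptions, not part of the study protocol.

```python
import numpy as np
from scipy import stats

def compare_criterion(values_method_a, values_method_b):
    """Compare one fit criterion (e.g. minimum head-helmet gap per subject)
    between two sizing systems. Inputs are 1D arrays, one value per subject."""
    a, b = np.asarray(values_method_a, float), np.asarray(values_method_b, float)
    for name, x in (("3D clustering", a), ("traditional", b)):
        print(f"{name}: mean={x.mean():.2f}  min={x.min():.2f}  "
              f"max={x.max():.2f}  var={x.var(ddof=1):.2f}")
    # One-way ANOVA across the two groups of subjects.
    f_stat, p_value = stats.f_oneway(a, b)
    return f_stat, p_value
```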
Fig. 1. Spatial relationship between head and helmet
Two kinds of distances will be calculated between the external head model surface and the inner helmet surface: the minimum distance, and the distances at fixed points. Colors will be applied to help visualize the acceptability of the distances. In a physical experiment, the number of fixed points on the helmet is always limited by the volume of the holes and measuring tools; in the experiment by Meunier (2000), only 13 points were drilled in a helmet [16]. By contrast, many more fixed points can be tested on virtual helmet models. Taking advantage of the strong calculation ability of a computer, selecting points evenly distributed over a helmet and calculating the gap at these positions is easy with the help of CAD software. In this study, 100 points will be tested. Using a dedicated computer program, the distances at these 100 fixed points and the minimum distance will be calculated for each head virtually mated with the helmet of its corresponding size. If a distance is between 15 and 25 mm (GJB 5691-2006) [23], the corresponding line, pointing from the helmet surface point to the head surface in the normal direction, will be colored green, indicating an acceptable fit; otherwise, the line will be colored red. By this method, the spatial relationship and fit condition between the head and the helmet can be easily seen on the computer. The number of red lines can also be used for fit assessment.
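A hedged sketch of this per-point check is given below; the gap computation itself (along the inward surface normals, done inside the CAD system) is abstracted away as an input array, and the function name is ours.

```python
import numpy as np

GAP_MIN_MM, GAP_MAX_MM = 15.0, 25.0  # acceptable band per GJB 5691-2006

def color_code_gaps(gaps_mm):
    """gaps_mm: head-helmet gaps at the fixed helmet points (e.g. 100 values),
    measured along the inward surface normal, in millimetres.
    Returns ('green'/'red' per point, number of red points, minimum gap)."""
    gaps = np.asarray(gaps_mm, dtype=float)
    ok = (gaps >= GAP_MIN_MM) & (gaps <= GAP_MAX_MM)
    colors = np.where(ok, "green", "red")
    return colors, int((~ok).sum()), float(gaps.min())
```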
The lighter a helmet is, the more comfortable a wearer will feel and the less fatigue his or her muscles will suffer. A function to calculate the mass of a solid model is commonly available in 3D CAD software. If the mass centers of the human head and the helmet are not aligned, the stability of the helmet is reduced when the vertical distance between the two mass centers is large. The mass center of a helmet is usually higher than that of the head, so the shorter this distance is, the more comfortable the wearing should be. Estimation of the mass center of a model is also commonly available in 3D CAD software.
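Under the stated assumption that z is the vertical axis, this stability criterion reduces to the vertical offset between the two mass centers reported by the CAD system, as in the trivial sketch below.

```python
def vertical_mass_center_offset(helmet_com, head_com):
    """Vertical distance between helmet and head mass centers, given as
    (x, y, z) triples with z up; smaller values suggest a more stable,
    more comfortable helmet fit."""
    return abs(helmet_com[2] - head_com[2])
```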
4 Conclusion

Objective fit assessment supports quantitative calculation and comparison in evaluating a sizing system. With the help of 3D scanning and CAD technologies, objective fit assessment can be performed in a virtual environment. Compared with physical and subjective assessment, this method affords accurate results at dramatically reduced cost. The 3D graphics visualize how an individual wears the helmet and aid comprehensive fit assessment. The method can also support fit assessment for a large population, provided that a 3D anthropometric database of that population is available.
Acknowledgements

This study is supported by the National Natural Science Foundation of China (No. 70571045).
References

1. McCulloch, C.E., Paal, B., Ashdown, S.A.: An optimization approach to apparel sizing. Journal of the Operational Research Society 49(5), 492–499 (1998)
2. Hsu, C.H., Wang, M.J.J.: Using decision tree based data mining to establish a sizing system for the manufacture of garments. International Journal of Advanced Manufacturing Technology 26(5-6), 669–674 (2005)
3. Sippo, A.C., Belyavin, A.J.: Determining aircrew helmet size design requirements using statistical analysis of anthropometric data. Aviation, Space, and Environmental Medicine 66(1), 67–74 (1991)
4. Robinette, K.M., Whitestone, J.J.: Methods for characterizing the human head for the design of helmets. AL-TR-1992-0061, Armstrong Laboratory, Wright-Patterson AFB, OH (1992)
5. Bradtmiller, B., Beecher, R.M.: An approach to creating three-dimensional head forms for helmet sizing and design. In: 31st Annual Symposium Proceedings, pp. 244–249. SAFE Association, Yoncalla (1993)
6. Chen, X., Shi, M.W., Zhou, H., Wang, X.T., Zhou, G.T.: The "standard head" for sizing military helmets based on computerized tomography and the head form sizing algorithm (in Chinese). Acta Armamentarii 23(4), 476–480 (2002)
7. Iowa Cooperative Extension Service: Consumer choices: finding your best fit (1996), http://www.extension.iastate.edu/Publications/PM1648.pdf
8. Rossi, W.A.: The futile search for the perfect shoe fit. Journal of Testing and Evaluation 16, 393–403 (1988)
9. Li, Z.Z.: Anthropometric topography. In: Karwowski, W. (ed.) International Encyclopedia of Ergonomics and Human Factors, 2nd edn. Taylor and Francis, London (2005)
10. McConvile, J.T., Tebbetts, I., Alexander, M.: Guidelines for Fit Testing and Evaluation of USAF Personal-Protective Clothing and Equipment. AAMRL-TR-79-2, AD AO 65901, Aerospace Medical Research Laboratory, Aerospace Medical Division, Air Force Systems Command, Wright-Patterson Air Force Base, OH 45433 (1979)
11. Robinette, K.M.: Fit testing as a helmet development tool. In: Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting, pp. 69–73 (1993)
12. Rash, C.E., Martin, J.S., Gower Jr., D.W., Licina, J.R., Barson, J.V.: Evaluation of the US Army Fitting Program for the Integrated Helmet Unit of the Integrated Helmet and Display Sighting System. AD-A191616, U.S. Army Aeromedical Research Laboratory, Fort Rucker, Alabama (1987)
13. McEntire, B.J., Murphy, B.A., Mozo, B.T., Crowley, J.S.: Female Hairstyle and Flight Helmet Accommodation: The AMELIA Project. U.S. Army Aeromedical Research Laboratory, Fort Rucker, Alabama (1999)
14. Ivins, B.J., Schwab, K.A., Crowley, J.S., McEntire, B.J., Trumble, C.C., Brown, C.F.H., Warden, D.L.: How satisfied are soldiers with their ballistic helmets? A comparison of soldiers' opinions about the advanced combat helmet and the personal armor system for ground troops helmet. Military Medicine 172(6), 586–591 (2007)
15. Ashdown, S.P., Loker, S.: Use of Body Scan Data to Design Sizing Systems Based on Target Markets. National Textile Center Project S01-CR01 (2002), http://www.ntcresearch.org/pdf-rpts/Bref0604/S01-CR01-04e.pdf
16. Meunier, P., Tack, D., Ricci, A., Bossi, L., Angel, H.: Helmet accommodation analysis using 3D laser scanning. Applied Ergonomics 31, 361–369 (2000)
17. Meunier, P., Yin, S.: Performance of a 2D image-based anthropometric measurement and clothing sizing system. Applied Ergonomics 31, 445–451 (2000)
18. Luximon, A., Goonetilleke, R.S., Zhang, M.: 3D foot shape generation from 2D information. Ergonomics 48(6), 625–641 (2005)
19. Corner, B., Beecher, R., Paquette, S.: Computer-aided fit testing: an approach for examining the user/equipment interface. SPIE 3023, 37–47 (1997)
20. Niu, J.W., Li, Z.Z., Salvendy, G.: Multi-resolution shape description and clustering of three-dimensional head data. Ergonomics (accepted)
21. GJB 5477-2006. 3D Head-face Dimensions of Male Soldiers (in Chinese)
22. Liu, H., Li, Z.Z., Zheng, L.: Rapid preliminary helmet shell design based on three-dimensional anthropometric head data. Journal of Engineering Design 19(1), 45–54 (2007)
23. GJB 5691-2006. The Sizes for Military Helmets (in Chinese)
Analyzing the Effects of a BCMA in Inter-Provider Communication, Coordination and Cooperation

Gulcin Yucel 1,3, Bo Hoege 2, Vincent G. Duffy 3,4,5, and Matthias Roetting 2

1 School of Industrial Engineering, Istanbul Technical University, Istanbul, Turkey
2 Department of Psychology and Ergonomics, Chair of Human-Machine Systems, Technische Universität Berlin, Franklinstr. 28-29, 10587 Berlin, Germany
3 School of Industrial Engineering
4 Regenstrief Center for Healthcare Engineering
5 School of Agricultural and Biological Engineering, Purdue University, Grissom Hall, West Lafayette, Indiana, USA
Abstract. Many hospitals have implemented various kinds of information technologies. Using information technology can improve communication and patient safety. One such technology is bar code medication administration (BCMA). To support a successful implementation, a semi-formal notation form is used to model and evaluate the effects of a BCMA system on communication-coordination-cooperation (C3) processes among nurses, physicians and pharmacists. This model could support a successful implementation of the BCMA system by identifying potential unintended and supportive effects related to C3, and by providing recommendations for a better implementation. This article describes an approach for the analysis and evaluation of a planned BCMA implementation. Keywords: healthcare, BCMA, C3, work process.
1 Introduction

According to the Institute of Medicine report To Err is Human, between 44,000 and 98,000 Americans die each year as a result of medical errors. This number is higher than the annual number of deaths from AIDS, breast cancer, or motor vehicle accidents [1]. Besides medical errors, Adverse Drug Events (ADEs) caused by known drug allergies are common, costly and often severe [2]. ADEs are common but also preventable [3]. For ADE prevention, the implementation of information technology (IT) such as BCMA can enable hazard alerts; a successful implementation has the potential to reduce the risk of allergic reactions to drugs [4]. Research efforts that measured ADEs before and after implementation have shown that dispensing errors and potential ADEs decreased after bar code technology was introduced [5]. Although the implementation of bar code technology appears to reduce medical errors [6], some research reveals many unintended consequences of BCMA [7, 8, 9, 10].
According to the United States Pharmacopeia MEDMARX database [11], which includes 176,409 medication-error records for 2006, approximately 25% of the errors occurred due to IT implemented in the healthcare sector. Most of the harmful errors were caused by mislabeled barcodes on medications (5%). Errors were also recorded due to incorrect scanning of barcodes and overrides of barcode warnings [12]. Some studies have observed the side effects of health information technology (HIT), but limited research has analyzed the reasons for these side effects. In order to investigate those reasons, problems in current intra-organizational communication, current HIT usage in healthcare communication, and its potential benefits and pitfalls need to be analyzed [13, 14, 15]. There appear to be no research results at present that analyze BCMA side effects and supportive effects on communication-coordination-cooperation mechanisms. In this study, the C3-modeling technique is used to evaluate the effects of a BCMA system on communication, coordination and cooperation among nurses, physicians and pharmacists. This analysis can also show effects on the efficiency of healthcare services.
2 Methodology

To aid understanding of the BCMA process, a description follows. On the pharmacy side, bulk dose products are repackaged into blister packs and labeled with barcodes. These medications are delivered to the healthcare units. After the nurses receive the bar-coded medications, they first scan the patient's barcode; secondly, each medication is scanned. If there is a mismatch between the patient barcode and the medication time, type, or dose, the system gives a warning. By doing this, the system supports the Five Rights of medication administration: right patient, right medication, right dose, right time, and right route. But a BCMA implementation has wider effects on work processes. It can be compared with the logistics process of sending a parcel to an addressee: just as with parcel tracking, the sender, the deliverer, and the "parcel" can be registered at each step of the total process of medication administration. The medication is registered in an inventory management system, which requires an IT backbone connected with each process step. Besides BCMA's potential to improve patient safety, the major question is: which adverse effects could occur during and after the implementation? It is of major importance that such unwished-for effects do not, in the worst case, affect patient safety negatively, since IT implementation also affects and changes work processes. It is hoped that the workload of healthcare personnel such as nurses, physicians or pharmacists would be affected positively by a BCMA implementation. In order to find side effects and supportive effects, this paper suggests an approach to analyze the work processes of healthcare personnel and to prepare the involved staff for the change in their daily duties. It can be assumed that the implementation of a computer-based tool like BCMA will affect not only the work process of medication administration but also aspects of teamwork, since BCMA connects all persons who are involved in medication administration (e.g., nurses and pharmacists).
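As a minimal illustration of the bedside scan-and-verify step just described, the sketch below checks a scanned medication against the patient's active order. The record fields and the one-hour tolerance window are illustrative assumptions, not the design of any real BCMA product.

```python
from datetime import timedelta

def verify_administration(order, scanned, now, window=timedelta(hours=1)):
    """Check a scanned medication against the patient's active order.
    `order` and `scanned` are dicts with illustrative fields; returns a
    list of five-rights violations (an empty list means the administration
    may proceed, a non-empty list triggers a warning to the nurse)."""
    problems = []
    if scanned["patient_id"] != order["patient_id"]:
        problems.append("wrong patient")
    if scanned["drug_code"] != order["drug_code"]:
        problems.append("wrong medication")
    if scanned["dose_mg"] != order["dose_mg"]:
        problems.append("wrong dose")
    if scanned["route"] != order["route"]:
        problems.append("wrong route")
    if abs(now - order["scheduled_time"]) > window:
        problems.append("wrong time")
    return problems
```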
To measure the influence of a BCMA implementation and to predict, estimate and hypothesize possible side effects, a definition of teamwork and an operationalization of the team's performance are necessary. Since physicians, nurses, and pharmacists share the final objective of taking care of the patients, the definition by Sundstrom et al. [16, in 17] is applicable: "Work group and work team is defined as interdependent collections of individuals who share responsibility for specific outcomes for their organizations". So the involved staff can be called a team. This team is connected through the process of diagnosis and treatment planning by physicians, medication preparation by pharmacists, and medication administration by nurses. The team members interact with each other along this process to take care of the patients. Interaction means, in this case, that the information about medication administration is shared by each team member in the form of records in the IT system (i.e., databases of medications, patients, and staff). Sacco [18] splits the results of a team's group interaction process into performance outcomes (e.g., performance quality) and other outcomes (e.g., member satisfaction), which allows one to operationalize the total outcome of a team; Sacco calls this outcome group effectiveness. Furthermore, along the process of medication administration, BCMA will have an effect on the team's interaction. As mentioned in the motivation, BCMA implementation should also increase a healthcare team's efficiency. The efficiency of the team is influenced by the efficiency of the communication, coordination and cooperation (C3) between the team members. Figure 1 points out that coordination results from communication, but communication also includes coordination; furthermore, communication is necessary for cooperation, and cooperation requires coordination. These strong dependencies show how important these processes are for optimal operation.
Fig. 1. C3 processes and their dependencies (after [19])
In the literature, several reasons can be found why it is necessary to concentrate on C3 factors. According to Bates [20], communication failures are the most common factors causing ADEs, so improving communication can be one of the main strategies to prevent medication errors [21, 22]. Also, according to Fortescue et al.
[22], improving communication among physicians, nurses and pharmacists could prevent 17.4% of all errors and 29.2% of potentially harmful errors. Since healthcare is a highly collaborative process that requires effective coordination and communication among clinicians [23], the work of different professional groups should be coordinated effectively in order to provide a shared understanding of the medical care plan. When an HIT system cannot provide the necessary level of coordination among clinicians, clinicians by-pass the system or add steps to it [14]. During HIT implementation, a limited focus on only one group of professionals neglects the collaborative, multi-professional nature of the medical workflow [14]. Therefore, in this study, nurses, physicians, and pharmacists are all included in the analysis. For analyzing effects on communication, coordination and cooperation, a work analysis will be conducted to obtain an understanding of the work processes. Additionally, quantitative and qualitative data will be collected. Finally, the C3-notation form [24] will be used to build a model of the existing work processes. These steps should lead to an understanding of how BCMA affects work processes and give evidence about side effects and supportive effects.

2.1 Analysis of Work

For analyzing the interaction processes during medication administration, an analysis of work will be conducted incorporating three steps which have already been used in previous studies [25, 26]. First, tasks and subtasks are registered by random observation, followed by interviews about task composition and task flow. Through the interviews, the observations can be validated, since it is possible that not all subtasks are detected by the observer. Secondly, a system of categories is developed for a systematic, exact and detailed registration of all tasks and subtasks. Finally, the staff are observed while working and the tasks are registered to the system of categories.

2.1.1 Data Collection

Qualitative data will be collected through interviews. The interviews reveal both supportive effects and unwanted side effects of BCMA on communication, coordination and cooperation among clinicians. The sub-domains of the interview questions are: the type of information communicated, communication channels, difficulties in communication, errors caused by communication problems, the level of cooperation, and expectations and experiences regarding BCMA. The interview questions are designed in two parts. Part I is designed to analyze current communication and coordination among clinicians in the paper-based system; beside current communication and coordination features, it will capture expectations about potential side and supportive effects of BCMA on C3. Part II is designed to analyze C3 in a setup where BCMA has been implemented. Part I questions will be applied at a hospital that is going to implement BCMA, and Part II questions at another hospital that has already implemented BCMA. The target audiences for the interview questions are nurses, pharmacists, physicians and administrative staff. For each target group, different questions were prepared. To reduce the number of questions for any one respondent, different question groups were prepared. Each respondent group will answer both the common questions and group-specific questions.
These group-specific questions are shown as sub-questions, lettered under the main question. A subset of the interview questions (for nurses at the hospital which is going to implement BCMA) is given in Appendix 1.

2.2 C3-Notation

The C3-notation is a semi-formal notation form based on the Unified Modeling Language (UML). This notation form allows the modeler to note, describe and visualize weakly structured work activities in order to generate a mutual understanding of the process itself. It is used to note an actual process as well as to design theoretical processes. Furthermore, if a best practice can be identified, the process can be described in C3-notation to retain this knowledge about the process.
Fig. 2. Example C3 processes in HIT implementation
Highly collaborative C3 processes (see Figure 2) can be made more understandable in a scheme where all involved interaction partners are placed in parallel columns. The interaction between these partners is noted in symbolic shapes which are connected to the interaction partners. Figure 2 shows, in an exemplary manner, how a physician makes a diagnosis and creates a medication plan. Thereafter, the information is stored in the HIT system (tool/arrow symbol followed by HIT) and used by pharmacists as an order to prepare the requested medication, while nurses can prepare the treatment. After the medication package has been delivered by a courier, the medication can be administered. Direct feedback is possible through the physician's ward round: the medication can be changed by examining the patient or through direct feedback of the nurses' experiences. The backbone of the information flow is operated by the HIT system, which is used for patient
data, medication orders and treatment procedures. The C3 processes show that there is a strong influence of HIT on the communication between nurse, physician and pharmacist; in this example, there is no direct communication with the pharmacists. Another possibility for creating a C3 model is to conduct a focus group with experts (e.g. selected hospital staff) and to use the C3-notation to formalize the discussion and its processes. This approach was successfully used to analyze the influence of industrial product-service systems on human-machine interaction and the design of interfaces [27]. Furthermore, C3 has been used in several projects to create models of cooperative work [28], in the automotive sector [29], and in the area of process engineering [30]. In this study, a focus group discussion can be conducted to discuss the result of the C3 modeling with pharmacists, nurses and physicians. Weak points can be identified, the advantages and disadvantages of the influence of BCMA on the C3 processes among the clinicians can be discussed, and first solutions can be generated.
3 Application

The methodology will be applied at two hospitals: one is already using BCMA and the other is going to implement it. As explained in the methodology section, in the first step, interviews and observations will be conducted. Additionally, performance data such as times and errors will be measured. With this information, a C3 model will be obtained for both hospitals. Using the C3 models, the communication-coordination-cooperation mechanisms among clinicians will be evaluated for both the BCMA and the paper-based system. The interpretation of a C3 model can address many issues, such as communication breakdowns and coordination continuity. A comparison of the C3 models between the BCMA system and the paper-based system will be made. It will provide insights into the sources of side effects, for example a lack of sufficiently synchronized communication between nurses and pharmacists in a subtask. Moreover, it can point out where to take action to prevent these side effects, such as adding coordination activities between physicians and nurses.
Fig. 3. Order of application steps
The first three steps (work analysis, the measurement of the subfactors of C3, and the creation of the C3 model) can be conducted in parallel (see Figure 3). A comparison of the C3 models for the two hospitals, and its interpretation, lead to a C3 model which can be used for the implementation of BCMA. After the implementation is finished successfully, the model that was used for the implementation preparation can be compared with the final C3 processes and evaluated. Eventually, in the last step, a best-practice model for C3 processes in hospitals with BCMA can be obtained.
4 Conclusion

This study contributes to modeling C3 mechanisms among clinicians both with BCMA and with the paper-based system. It identifies supportive effects and side effects of BCMA on communication, coordination and cooperation among clinicians. The results will be used to construct an implementation guideline that will be incorporated into the planning for BCMA implementation at a hospital. The study provides an opportunity for recommendations to eliminate unintended side effects of BCMA while keeping its supportive effects on C3. It can also give evidence about how BCMA affects the efficiency of the medication administration process.
Acknowledgement The authors would like to thank Prof. Steve Abel, Prof. Carol Birk, Dr. Kyle Hultgren, American Society of Health-System Pharmacists (ASHP) and Regenstrief Center for Healthcare Engineering at Purdue University for their support through the project.
References

1. Kohn, L.T., Corrigan, J.M., Donaldson, M.S. (eds.): To Err is Human: Building a Safer Health System. National Academy Press, Institute of Medicine, Washington (2000)
2. Classen, D.C., Pestotnik, S.L., Evans, R.S., et al.: Adverse drug events in hospitalized patients – excess length of stay, extra costs, and attributable mortality. JAMA 277(4), 301–306 (1997)
3. Bates, D.W., Cullen, D.J., Laird, N., et al.: Incidence of adverse drug events and potential adverse drug events – implications for prevention. JAMA 274(1), 29–34 (1995)
4. Cresswell, K.M., Sheikh, A.: Information technology-based approaches to reducing repeat drug exposure in patients with known drug allergies. The Journal of Allergy and Clinical Immunology 121(5), 1112–1117 (2008)
5. Poon, E.G., Cina, J.L., Churchill, W., et al.: Medication dispensing errors and potential adverse drug events before and after implementing bar code technology in the pharmacy. Annals of Internal Medicine 145(6), 426–438 (2006)
6. Oren, E., Shaffer, E.R., Guglielmo, B.J.: Impact of emerging technologies on medication errors and adverse drug events. Am. J. Health Syst. Pharm. 60(14), 1447–1458 (2003)
7. Bates, D.W., Leape, L.L., Shabot, M.M.: Reducing the frequency of errors in medicine using information technology. J. Am. Med. Inform. Assoc. 8(4), 299–308 (2001)
8. McDonald, C.J.: Computerization can create safety hazards: a bar-coding near miss. Annals of Internal Medicine 144, 510–516 (2006)
9. Mills, P.D., Neily, J., Mims, E., et al.: Improving the bar-coded medication administration system at the Department of Veterans Affairs. Am. J. Health Syst. Pharm. 63, 1442–1447 (2006)
10. Patterson, E.S., Cook, R.I., Render, M.L.: Improving patient safety by identifying side effects from introducing bar coding in medication administration. J. Am. Med. Inform. Assoc. 9(5), 540–553 (2002)
11. The United States Pharmacopeial Convention, Inc., https://www.medmarx.com/ (last accessed 12.15.2008)
12. Joint Commission: Safely implementing health information and converging technologies. Sentinel Event Alert 42, http://www.jointcommission.org/SentinelEvents/SentinelEventAlert/sea_42.htm (last accessed 12.15.2008)
13. Pirnejad, H., Niazkhani, Z., Berg, M., Bal, R.: Intra-organizational communication in healthcare – considerations for standardization and ICT application. Methods Inf. Med. 47(4), 336–345 (2008)
14. Niazkhani, Z., Pirnejad, H., Bont, A., Aarts, J.: Evaluating inter-professional work support by a Computerized Physician Order Entry (CPOE) system. Int. J. Med. Inform. 765, s4–s13 (2008)
15. Pirnejad, H., Niazkhani, Z., Sijs, H., Berg, M., Bal, R.: Impact of a computerized physician order entry system on nurse-physician collaboration in the medication process. Int. J. Med. Inform. 77, 735–744 (2008)
16. Sundstrom, E., DeMeuse, K.P., Futrell, D.: Work teams: applications and effectiveness. American Psychologist 45, 120–133 (1990)
17. Halfhill, T., Sundstrom, E., Lahner, J., et al.: Group personality composition and group effectiveness: an integrative review of empirical research. Small Group Research 36, 83–105 (2005)
18. Sacco, J.M.: The relationship between team composition and team effectiveness: a multilevel study. Doctoral Dissertation, Michigan State University (2003)
19. Müller, E.: Kooperation und Koordination (unpublished lecture notes). TU Chemnitz, IBF, Professur Fabrikplanung und Fabrikbetrieb, http://chemie.tu-chemnitz.de/mb/FabrPlan/Kooperation.pdf (last accessed 20.01.2005)
20. Bates, D.W., Evans, R.S., Murff, H., et al.: Detecting adverse events using information technology. J. Am. Med. Inform. Assoc. 10(2), 115–128 (2003b)
21. Bates, D.W., Gawande, A.A.: Improving safety with information technology. The New England Journal of Medicine 348, 2526–2534 (2003a)
22. Fortescue, E.B., Kaushal, R., Landrigan, C.P., et al.: Prioritizing strategies for preventing medication errors and adverse drug events in pediatric inpatients. Pediatrics 114(4), 722–729 (2003)
23. Gurses, A.P., Xiao, Y.A.: A systematic review of the literature on multidisciplinary rounds to design information technology. J. Am. Med. Inform. Assoc. 13(3), 267–276 (2006)
24. Foltz, C., Killich, S., Wolf, M.: K3 User Guide, IAW Aachen (2009), http://www.iaw.rwth-aachen.de/download/produkte/k3_userguide_2000-11-21.pdf (last accessed 02.03.2009)
752
G. Yucel et al.
25. Pioro, M., Licht, T., Grandt, M.: Modellbildung und simulation kooperativer aufklärungsprozesse zur optimierung der teameffizienz. In: Schmidt, L., Schlick, C., Grosche, J. (eds.) Ergonomie und Mensch-Maschine-Systeme, pp. 285–306. Springer, Berlin (2008) 26. Ulich, E.: Arbeitspsychologie. Schäffer-Poeschel, Stuttgart (2005) 27. Roetting, M., Hoege, B.: Analysis of specific requirements for the human-machine interface in industrial product-service-systems. In: Karwowski, W., Salvendy, G. (eds.) Proceedings of Applied Human Factors and Ergonomic International Conference (AHFE) 2008 (CD-ROM). USA Publishing (2008) 28. Kausch, B.: Modellierung kooperativer arbeit, Internet (2009), http://www.iaw.rwth-aachen.de/ index.php?article_id=102&clang=0 (last access: 02.03.2009) 29. Stahl, J., Killich, S., Luczak, H.: Coordination, communication, and cooperation in locally distributed product development. In: Proceedings of 5th International Product Development Management Conference, Como, Italy, May 25-26, pp. 947–960 (1998) 30. Foltz, C., Killich, S., Wolf, M., Schmidt, L., Luczak, H.: Task and information modeling for cooperative work. In: Smith, M.J., Salvendy, G. (eds.) Proceedings of HCI International 2001, Systems, Social and Internationalization Design Aspects of Human-Computer Interaction, vol. 2, pp. 172–176. Lawrence Erlbaum Associates, Mahwah (2001)
Appendix: Subset of Interview Questions

1. What is the biggest problem related to patient safety in your department today?

2. What type of information is exchanged between … regarding the Medication Administration Process?
   a. nurses and pharmacists  b. nurses  c. physicians and nurses  d. departments (which ones?)  e. shifts

3. How do you communicate with … for the Medication Administration Process? (telephone, e-mail, face to face, wireless telephone, pager, intercom, etc.) And which way is the most common?
   a. pharmacists  b. other nurses  c. physicians and nurses  d. other departments (which ones?)  e. other shifts

4. What kind of difficulties do you have in communicating with … for the Medication Administration Process? (e.g., language comprehension, time burden of phone calls, missing/inaccurate/unclear/delayed information) Which one is the most common?
   a. pharmacists  b. other nurses  c. physicians and nurses  d. other departments (which ones?)  e. other shifts

5. Due to problems in …, what kind of errors can occur in the Medication Administration Process?
   a. nurses–pharmacists' communication  b. nurses–physicians' communication  c. communication between nurses  d. communication between departments  e. communication between shifts

6. How would the BCMA affect … for the Medication Administration Process?
   a. nurses–pharmacists' communication  b. nurses–physicians' communication  c. communication between nurses  d. communication between departments  e. communication between shifts  f. nurses–patients' communication

7. How well do … work together in the Medication Administration Process? And how would BCMA implementation affect the cooperation between nurses and pharmacists?
   a. nurses–pharmacists  b. nurses–physicians  c. nurses  d. different departments  e. shifts
Fuzzy Logic in Exploring Data Effects: A Way to Unveil Uncertainty in EEG Feedback

Fang Zheng1, Bin Hu1,3, Li Liu1, Tingshao Zhu2, Yongchang Li1, and Yanbin Qi1

1 Ubiquitous Awareness and Intelligent Solution Laboratory, School of Information Science and Engineering, Lanzhou University, Lanzhou, China
2 Graduate University of Chinese Academy of Sciences, Beijing, China
3 Department of Computing, Birmingham City University, UK
{Fang Zheng, Bin Hu, Li Liu, Yong Chang Li, Yan Bing Qi, lzu-healthcare-group}@googlegroups.com
{Tingshao Zhu, tszhu}@gucas.ac.cn
Abstract. To unveil the effects of data sets with uncertainty, we develop a method that applies fuzzy logic to determine data weights in fuzzy inference. Preferable adjustments of the initial weight assignment are obtained by comparing the truth grade values of assumptions with a practical effectiveness evaluation. We apply this method to inferring patients' depressive mood in a user case study of developing an antidepressant multimedia therapy, and evaluate its veracity. According to users' feedback, iterative application of this method may lead to further understanding of the effects of EEG data in the user context.

Keywords: fuzzy logic, EEG data.
1 Introduction

Studies show that understanding and applying electrophysiological signals has become one of the major concentrations of healthcare research, as more and more people suffer from mental disorders. Their ability to undertake everyday responsibilities is substantially impaired, and about 850,000 people even commit suicide every year, as estimated by the WHO. Depression was the leading cause of disability and the 4th leading contributor to the global burden of disease in 2000. By the year 2020, depression is projected to reach 2nd place in the ranking of DALYs (Disability Adjusted Life Years: the sum of years of potential life lost due to premature mortality and the years of productive life lost due to disability) calculated for all ages and both sexes [1]. However, the study of such diseases is somewhat intractable because clinical diagnosis lacks gold standards or symptomatic measurements related to cognitive, motor and verbal behaviors; it is a field that remains to be explored further. In such cases, electroencephalography (EEG), a widespread, noninvasive method for monitoring brain activity, is embraced by many researchers as effective assistance in understanding mental disorder patients' conscious status; related research is presented in [2, 3, 4, 5, 6]. Unfortunately, the relationships between
EEG signals and mental disorder symptoms, or therapeutic effectiveness after receiving certain treatments, are still to be explored, which places many uncertainties on applications. Obviously, solutions to these questions require inter-disciplinary cooperation across the fields of neuroscience, computer science and psychology. However, simple and applicable methods are needed in implementations of practical healthcare systems, which in return provide valuable data for further studies. To unveil such uncertainties, we propose a method applying fuzzy logic to explore the role of specific data whose contribution to the output is uncertain. Provided with a reasonable evaluation of whether the output is satisfactory, this method enables convenient adjustment of the modeling until preferable results are obtained. We apply this method to the design of an antidepressant multimedia therapy to demonstrate its effectiveness.

1.1 Fuzzy Set and Fuzzy Logic

Created by Lotfi A. Zadeh, fuzzy logic is an extension of classical logic that allows intermediate truth values between zero and one. Fuzzy logic is based on fuzzy set theory, which uses rather elastic membership functions for the elements of sets. In other words, fuzzy set theory suggests a boundary zone with finer detail of belonging grades, rather than the abrupt threshold values of classical sets [7]. Given a collection of objects U, a fuzzy set A in U is defined as a set of ordered pairs

A ≡ {⟨x, μA(x)⟩ | x ∈ U}   (1)

μA(x) is called the membership function for the objects x in U; it maps each of these objects to a real number in the closed interval [0, 1]. Fuzzy concepts can then be expressed by assigning fuzzy membership function values to objects in the universe of discourse.

Let A and B be fuzzy sets defined on X and Y respectively; then the Cartesian product A × B is a fuzzy set in X × Y with the membership function

A × B = {⟨⟨x, y⟩, μA×B(x, y)⟩ | x ∈ X, y ∈ Y, μA×B(x, y) = min(μA(x), μB(y))}   (2)
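Definitions (1) and (2) can be made concrete with a minimal Python sketch. The ramp-shaped membership functions, the tiny universes of discourse and all names below are illustrative assumptions rather than anything taken from the paper; only the min combination rule of (2) comes from the text.

```python
# Sketch of definitions (1) and (2). Membership functions and the
# universes of discourse are assumed for illustration only.

def mu_A(x: float) -> float:
    """Assumed ramp membership for A ('Alpha amplitude is high')."""
    return min(max((x - 20.0) / 10.0, 0.0), 1.0)

def mu_B(y: float) -> float:
    """Assumed ramp membership for B ('self-rated tiredness is strong')."""
    return min(max(y, 0.0), 1.0)

X = [18.0, 22.0, 27.0]   # universe of discourse for A (microvolts)
Y = [0.2, 0.9]           # universe of discourse for B (self-rating)

# Fuzzy set A as ordered pairs <x, mu_A(x)>, following (1)
A = {x: mu_A(x) for x in X}
print(A)   # {18.0: 0.0, 22.0: 0.2, 27.0: 0.7}

# Cartesian product A x B with the min rule of (2)
A_times_B = {(x, y): min(mu_A(x), mu_B(y)) for x in X for y in Y}
for pair, grade in A_times_B.items():
    print(pair, round(grade, 2))
```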
1.2 EEG Data

In electrophysiological research on depression, polysomnography (PSG) and event-related potentials (ERPs) are among the most prominent sources of neurobiological findings. As stated in [5], all patients with severe depression manifest at least several of the following electroencephalographic sleep disturbances: poor sleep efficiency, especially in the second half of the night and early in the morning; a deficit of slow wave sleep (SWS) in the first sleep cycle; increased and advanced rapid eye movement (REM) sleep, reflected by a shortened REM latency; and an increased REM density. Meanwhile, the cognitive exploration of depression presented by Urretavizcaya et al. indicates a correlation between increased N100, N200 and P300 latencies and diminished P300 amplitude in ERPs, on the one hand, and melancholia on the other [6]. However, the above works mainly focus on the underlying mechanisms causing and maintaining the disease process of depression, rather than on monitoring patients' depressive status by making use of the EEG signal combined with
other users' context, or, further, evaluating therapeutic effects and measuring improvements. Our research primarily focuses on unveiling EEG’s effects in users’ depressive mood and measuring assumed improvements in certain applications, for instance, antidepressant multimedia therapy.
2 Fuzzy Modeling and Inference for Exploring Effects of Data with Uncertainty

In this section, we introduce the method we developed for exploring the effects of uncertain data, which applies fuzzy logic in modeling and inference. First, the principles of the method in fuzzy logic are presented, following Zadeh [8]. Second, the user case of constructing an antidepressant multimedia therapy is studied to illustrate one use of the method.

2.1 Introduction of the Method

As stated above, the Cartesian product A × B is defined in (2) as a universe of ordered pairs on the fuzzy sets A and B, where μA×B(x, y) is the membership function of A × B. With respect to this definition, the Mamdani implication introduced in [9] is applied. Let A and B be fuzzy sets defined on X and Y respectively; then the Mamdani implication is a fuzzy set in X × Y with the membership function

A ⇒ B = {⟨⟨x, y⟩, μA⇒B(x, y)⟩ | x ∈ X, y ∈ Y, μA⇒B(x, y) = min(μA(x), μB(y))}   (3)

Since (3) expresses relationships of objects in the universe of discourse through membership function values, we can define our way of expressing if-then rules in fuzzy logic inference with reference to Zadeh's definition of IF X1 THEN Y1. Following Zadeh, the most general form of conditional statement, IF X1 THEN Y1 ELSE IF X2 THEN Y2 ELSE IF X3 THEN Y3 ELSE ... ELSE IF Xn THEN Yn, is defined as

IF X1 THEN Y1 ELSE IF X2 THEN Y2 ... ELSE IF Xn THEN Yn ≡ X1 × Y1 + X2 × Y2 + X3 × Y3 + ... + Xn × Yn

where the Xk and Yk are propositional forms on the sets X and Y. Since the value of the membership function μA⇒B(x, y) determines the truth grade with which an ordered pair ⟨x, y⟩ satisfies a given relationship in X × Y, the truth grade of the conditional statement IF X1 THEN Y1 ELSE IF X2 THEN Y2 ... ELSE IF Xn THEN Yn accordingly equals

Σ_{k=1..n} μA⇒B(xi, yj) = Σ_{k=1..n} min(μA(xi), μB(yj)), where ⟨xi, yj⟩ ∈ Xk × Yk

To simplify the situation, we consider the simplest conditional statement, IF X1 THEN Y1. In [8], Zadeh takes this as a special form of IF X1 THEN Y1 ELSE Z1 with unspecified Z1, and thus suggests defining IF X1 THEN Y1 as

IF X1 THEN Y1 ≡ IF X1 THEN Y1 ELSE Z1 ≡ X1 × Y1 + ¬X1 × U
where U is the universe of discourse. In practical applications, however, because of the massive universe of discourse and the uncertainties still to be explored, it is hardly feasible to determine the part ¬X1 × U. Moreover, in many cases, especially in healthcare, data sets from different perspectives do not explicitly exclude each other so as to divide up the whole search space; they rather have complex correlations. Thus, in our method, we simply ignore the latter part and define

IF X1 THEN Y1 ≡ X1 × Y1

The truth grade value of the statement then equals Σ_{i,j=1..n} min(μA(xi), μB(yj)), where ⟨xi, yj⟩ ∈ X1 × Y1. If we take all elements of Y1 to satisfy our needs perfectly, in other words ∀y ∈ Y1, μB(y) = 1, then the truth grade value of the statement is decided uniquely by the μA(xi). If an evaluation experiment is designed specifically to verify the effectiveness of such a single if-then statement, the contribution of the data set X to the output can be revealed by comparing the truth grade value μA(xi) with the actual veracity of the statement. The way to achieve this is to assign equal μ(x) values to all variables in the propositional form X1; after a reasonable evaluation of the effectiveness of the rule, the initial μ(x) values are adjusted accordingly. Iterating this process yields preferable assignments (a minimal sketch of this compare-and-adjust loop is given after Fig. 1 below). A better way to apply the method is to permit only one uncertain data set at a time to be operated on in the process, while the other variables have at least known membership function ranges. The method developed from fuzzy logic enables multiple user cases; one scenario is building an antidepressant multimedia therapy.

2.2 User Case Study in an Antidepressant Multimedia Therapy

System Architecture. The system architecture of the antidepressant multimedia therapy consists of the following five layers:

1> Context Sensing. This layer fetches data from various sources and provides the processing or translation needed so that the information can be integrated for context modeling and representation. Since this information is the user information relevant to the process of context modeling, and determines the services the user receives, it will be referred to as user context in general. Specifically, user context in this system includes the user's name, gender, age, patient history, EEG signal and clinical diagnosis. Personal and medical user context can be defined initially, while acquisition of the EEG signal requires a fetching front end working constantly for status monitoring and therapeutic effectiveness evaluation.

2> Context Modeling and Representation. This layer is responsible for representing user context in a unified model, for the convenience of later processing and of human understanding of the information. Coherence between the knowledge and relationships expressed in the model and the practical situation, precise definition, and interoperability are among the things to be considered, so a suitable representation method shall be chosen carefully.
3> Rule-Based Context Processing. In this layer, the user context described by the underlying layers is processed to assess the user's current status and refined requirements. A service is then selected according to predefined rules, either to satisfy the user's needs or to alleviate their unpleasant disease experience. It is in this layer that the effects, or weights, of the various data are actually determined by the processing rules and take effect in deciding the corresponding services. The application of our method in this case is, virtually, to aid convenient adjustment of the weight or effect assignment of the data used in this layer, especially the EEG signal.

4> Multimedia Service. After the needed services have been decided by rule-based context processing, the multimedia service layer provides the applicable multimedia services catering to the user's demands in antidepressant treatment, for example music playing, online gaming and supervised self-regulatory training. The objective effects and fitting user groups of the various services shall be carefully evaluated before the set of services is determined, so that the benchmark of service mediation remains reasonable.

5> Feedback Processing. As users receive multimedia services, their neurobiological signal (the EEG signal in this case) is monitored and their personal evaluation is acquired in order to generate preferable feedback on the treatment. The results can aid in adjusting the data weight assignment in the context processing layer, as well as in validating rules obtained from empirical data or experiments. If altering either the combination of data weights or the rules cannot improve service performance, then the design of the context model, and even the types of user contextual information in the system, shall be reviewed and revised.

Figure 1 demonstrates the system architecture presented above.
Fig. 1. System architecture of antidepressant multimedia therapy. It consists of five layers: context sensing, context modeling and representation, rule-based context processing, multimedia service and feedback processing.
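Before turning to the modules, the compare-and-adjust loop of Sect. 2.1 might look as follows in outline. This is only our reading of the method, under assumed names: evaluate_rule() stands in for whatever effectiveness evaluation the feedback layer provides (manual rating or EEG feedback), and the step size and stopping tolerance are arbitrary assumptions, not values from the paper.

```python
# Sketch of the weight-adjustment loop from Sect. 2.1: start from equal
# membership values for the uncertain data set, compute the truth grade
# of the single if-then rule, compare it with an external effectiveness
# evaluation, and nudge the values. Names and constants are assumptions.

def truth_grade(mu_values):
    """Truth grade of IF X1 THEN Y1 with mu_B(y) = 1 for all y in Y1,
    so the grade is decided by the mu_A(x) values alone (normalized
    here so it stays comparable to a [0, 1] evaluation score)."""
    return sum(mu_values) / len(mu_values)

def adjust(mu_values, grade, observed, step=0.05):
    """Move each mu(x) toward agreement with the observed veracity."""
    return [min(1.0, max(0.0, mu + step * (observed - grade)))
            for mu in mu_values]

def calibrate(evaluate_rule, n_vars=3, iterations=20):
    mu_values = [0.5] * n_vars          # equal initial assignment
    for _ in range(iterations):
        grade = truth_grade(mu_values)
        observed = evaluate_rule()      # e.g. fraction of cases where
                                        # the rule's outcome helped
        if abs(observed - grade) < 0.01:
            break                       # preferable assignment reached
        mu_values = adjust(mu_values, grade, observed)
    return mu_values

# Usage with a stubbed evaluation that always reports 0.73:
print(calibrate(lambda: 0.73))
```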
System Composition. The system is composed of five modules: 1> EEG signal acquisition and processing front end. Electroencephalographic (EEG) data consist of changes in neuroelectrical activity measured over time (on a millisecond timescale), across two or more locations, using noninvasive sensors
(“electrodes”) that are placed on the scalp surface. A standard technique for analyzing EEG data involves averaging across segments of data (“trials”), time-locked to stimulus “events,” to create event-related brain potentials (ERPs). The resulting measures are characterized by a sequence of positive and negative deflections across time at each sensor. In our system, EEG data (frequency and amplitude) are initially fetched through a NeXus-4 at a sampling rate of 256 samples/s for brain waves, and are then stored in relational databases. If the amplitude and frequency of the brainwave exceed the threshold values, context processing rules are triggered for neurofeedback data updates and corresponding actions such as music selection and track playing.

2> Context modeling. Context modeling represents user context in a preferable model and knowledge-describing manner. It includes the patient's personal, medical and EEG data relevant to clinical mental disorder diagnosis. The problem to be considered in this module is the weight of each context processing rule and of the various concepts in the system. Initial fuzzy membership function values are assigned to each data set with uncertainty.

3> Rule-based inference. This module contains the context processing rules, related to context profiles built from the original data, and the inference rules determining the multimedia service selection strategy. Most of the rules are expressed in if-then form.

4> Multimedia service organization and selection strategy. This module includes service evaluation (the indexes deciding to what extent a certain service satisfies the patient's demand) and service organization. The problems to be covered are how to describe the essential service characteristics related to mental disorder treatment, how to effectively organize services and determine the most suitable sets, and how to control the way the services are provided, such as adding reasonable variations according to the patient's preference.

5> Feedback of the effectiveness. Feedback generated by either manual evaluation or the patient's EEG data is crucial to applications in this domain, as the effects of EEG in the combined user context and the assessment of therapeutic effectiveness remain unclear.

Working Procedure. The working procedure of the system is as follows:

Firstly, the EEG acquisition and processing front end fetches the signal from multiple channels and stores it in a relational database after essential signal processing. Signals are initially characterized by frequency and amplitude. If any numeric value ranges above its threshold, actions are triggered for context processing, that is, inference over the patient's context to decide whether service selection shall be activated (a minimal sketch of this threshold trigger appears after Fig. 2 below).

Secondly, after service selection is activated, inference for service matching begins. Basically, multimedia services are organized by the features of their fitting user groups. Taking music therapy as an example, music tracks are divided into several sets specifically for melancholics in different moods or statuses, namely patients in mania, depression, etc. Classification and labeling of the tracks can be done with reference to psychological music prescriptions. Since preferable sound tracks for treating patients in different moods are acquired from psychologists' empirical experience, their effectiveness can be considered proven, and attention shall be focused on assessing the patient's status and requirements accurately. Furthermore, validation of the inference rules is also performed in this process.
Thirdly, the effectiveness of the system can be evaluated by acquiring manual evaluation or EEG feedback, which in return aids the adjustment of the data weight assignment as well as of the inference rules, improving accuracy. Figure 2 illustrates the exact system composition and working procedure of the antidepressant music therapy, a specific case of multimedia treatment against depression.
Fig. 2. The rule-based inference module takes EEG data and user context as inputs to decide whether music shall be played for the patient and, if so, what kind of music shall be chosen. If music is played, a manual evaluation lists satisfaction options for the patient to choose from, which, combined with the user's EEG signal, in return aids in determining the weights of the various data.
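The threshold trigger of the first step of the working procedure might be sketched as follows. This is a hedged sketch, not the authors' code: the 20-microvolt Alpha threshold is borrowed from the validation test described below, and the function names and the flat mood-to-set mapping are our own assumptions.

```python
# Sketch of the threshold trigger and music-set selection. The Alpha
# threshold is taken from the validation test; the mood labels mirror
# the psychological music prescription listed below, but the structure
# of this code is an assumption, not the authors' implementation.

ALPHA_THRESHOLD_UV = 20.0

MUSIC_SETS = {
    "tired": "Set A", "nervous": "Set B", "depressed": "Set C",
    "irritable": "Set D", "insomniac": "Set E",
}

def context_processing_triggered(alpha_ch1_uv, alpha_ch2_uv):
    """Trigger rule-based context processing if either channel's
    Alpha amplitude ranges above the threshold."""
    return max(alpha_ch1_uv, alpha_ch2_uv) > ALPHA_THRESHOLD_UV

def select_music(self_assessed_mood, alpha_ch1_uv, alpha_ch2_uv):
    """Return the music set to play, or None if no service activates."""
    if not context_processing_triggered(alpha_ch1_uv, alpha_ch2_uv):
        return None
    return MUSIC_SETS.get(self_assessed_mood)

# Example: user 3's pre-treatment measures (23.209 and 23.670 microvolts)
# exceed the threshold, so the "tired" prescription (set A) is chosen.
print(select_music("tired", 23.209, 23.670))   # -> "Set A"
```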
Validation. To validate the applicability of our method, we designed an experiment to test the coherence between the truth grade value computed from predefined if-then rules and the actual improvement in the EEG signal. The user context in the validation test is simply a self-assessment of personal mood (whether one is tired, nervous, depressed, irritable or insomniac) together with the EEG signal, whereas the inference rules are defined according to psychological music prescriptions. The music sets corresponding to the moods listed above are as follows:

Tired: Set A = {The Four Seasons by Vivaldi, La Mer by Debussy, Water Music by Handel}
Nervous: Set B = {Danse Macabre by Saint-Saëns, Firebird Suite by Stravinsky}
Depressed: Set C = {Symphony No. 40 "Jupiter" by Mozart, Rhapsody in Blue by George Gershwin, Symphony No. 5 by Beethoven}
Irritable: Set D = {Royal Fireworks by Handel, William Tell Overture by Gioachino Rossini, The Blue Danube by Johann Strauss}
Insomniac: Set E = {Lullaby by Mozart, A Midsummer Night's Dream Overture by Mendelssohn, Hungarian Rhapsody by Liszt}

We tested five people, each measured on three separate occasions within a continuous time section, and selected the most stable group. Each group contains two measures: the EEG signal measured while sitting calmly without listening to any music, and the EEG measured just after listening to the chosen music. Each measure lasts five minutes. We apply the dual-channel,
bipolar placement described in the NeXus-4 manual, which, in accordance with the 10-20 electrode system, takes C3 as channel 1 with F3 as the negative electrode, and C4 as channel 2 with F4 as the negative electrode. We simply assume that the EEG feature responsible for deciding a user's mood is the amplitude of the Alpha wave on both channels, and that the level of abnormality is indicated by the excess over 20 microvolts. The measured EEG data are listed in Table 1.

Table 1. Alpha amplitude of dual-channel EEG measured in the validation test for five users. Unit: microvolts.

                     Before music /    After music /    Improvement of
                     Abnormality       Abnormality      brainwave (fraction)
User 1, Channel 1    27.204/+0.36021   13.284/-0.3358   +0.5117
User 1, Channel 2    20.148/+0.0074    20.858/+0.0429   -0.0352
User 2, Channel 1    20.737/+0.0001    11.536/-0.4232   +0.4437
User 2, Channel 2    17.772/-0.1114    23.153/+0.1577   -0.3028
User 3, Channel 1    23.209/+0.1604    22.818/+0.1409   +0.0168
User 3, Channel 2    23.670/+0.1835    21.068/+0.0534   +0.1099
User 4, Channel 1    19.537/-0.0231    14.988/-0.2506   +0.2328
User 4, Channel 2    22.397/+0.1199    22.594/+0.1297   -0.0079
User 5, Channel 1    18.657/-0.0672    14.662/-0.2669   +0.2141
User 5, Channel 2    22.108/+0.1054    17.682/-0.1159   +0.2002
First of all, each user's mood is decided by applying our fuzzy-logic-based method. Assume that the function Ei(x) states that the Alpha amplitude of user x's EEG signal in channel i is in excess of the threshold value (20 microvolts); then the mood of x can be decided as follows:

μ_{x is in the mood he/she assessed}(x) = (μE1(x) + μE2(x)) · EEG credibility + μ_{self-mood assessment assurance}(x) · (1 − EEG credibility)

Substituting user 3 for x and assigning 0.3 to the EEG credibility, we obtain, according to the test:

μ_{x is in the mood he/she assessed}(user 3) = (μE1(user 3) + μE2(user 3)) · 0.3 + μ_{self-mood assessment assurance}(user 3) · 0.7
= (|20.0 − 23.209|/20.0 + |20.0 − 23.670|/20.0) · 0.3 + 0.9 · 0.7
= (0.1604 + 0.1835) · 0.3 + 0.63 = 0.73317

This is a rather high truth grade value, indicating that user 3 is probably suffering from tiredness and needs music that can help him relax. Consequently, according to the psychological music prescription, sound tracks in set A shall be played for user 3. Figure 3 shows a visible improvement of his brainwave, which supports this conclusion. The coherence between the truth grade values computed by our method and the monitored brainwave improvements is presented in Table 2. The table shows a rather coherent relationship between the computed truth grade value and the brainwave improvement in Alpha amplitude, which demonstrates the applicability of the method. However, the definition of the function used in computing the truth grade value needs further exploration, and the coherence shall be examined on larger sample data test sets.
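As a cross-check, the computation above and the derived columns of Table 1 can be reproduced in a few lines. The formula, the 20-microvolt threshold, the EEG credibility of 0.3 and the assurance value of 0.9 come from the text; the function names are ours.

```python
# Reproducing the mood truth grade for user 3 and the derived columns
# of Table 1. The formula and constants are taken from the paper; the
# helper names are our own.

THRESHOLD_UV = 20.0

def abnormality(amplitude_uv):
    """Signed excess over the threshold (the Abnormality column)."""
    return (amplitude_uv - THRESHOLD_UV) / THRESHOLD_UV

def improvement(before_uv, after_uv):
    """Relative drop in Alpha amplitude (Table 1's last column)."""
    return (before_uv - after_uv) / before_uv

def mood_truth_grade(ch1_uv, ch2_uv, assurance, eeg_credibility=0.3):
    mu_e1 = abs(THRESHOLD_UV - ch1_uv) / THRESHOLD_UV
    mu_e2 = abs(THRESHOLD_UV - ch2_uv) / THRESHOLD_UV
    return (mu_e1 + mu_e2) * eeg_credibility + assurance * (1 - eeg_credibility)

# User 3: 23.209 and 23.670 microvolts, self-assessment assurance 0.9
print(round(mood_truth_grade(23.209, 23.670, 0.9), 4))  # -> 0.7332
# User 1, channel 1: abnormality before music (Table 1 reports +0.36021)
print(round(abnormality(27.204), 4))                    # -> 0.3602
# User 1, channel 1: 27.204 before, 13.284 after
print(round(improvement(27.204, 13.284), 4))            # -> 0.5117
```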
Fig. 3. Comparison of user 3's Alpha amplitude in both channels (a) before and (b) after listening to music in set A. Effectively, the mean Alpha amplitude is taken into account.

Table 2. Computed truth grade values and the corresponding improvement of the Alpha amplitude for the five users

          Computed truth   Improvement of Alpha amplitude
          grade value      (channel 1 / channel 2)
User 1    0.8133           +0.5117 / -0.0352
User 2    0.6667           +0.4437 / -0.3028
User 3    0.7332           +0.0168 / +0.1099
User 4    0.7291           +0.2328 / -0.0079
User 5    0.7114           +0.2141 / +0.2002
3 Conclusions and Future Work

In this paper, we introduced a method that applies fuzzy logic to explore the effects of data sets with uncertainties. It works by comparing practical neurobiological feedback, or manual evaluation, with the rules' truth grade values computed from fuzzy membership functions. To illustrate the use of this method, we studied the user case of developing an antidepressant multimedia therapy and explained possible uses of the method for exploring the effects of EEG data representing depressive patients' mood. To validate the applicability of the method, we conducted a test with five user samples comparing brainwave improvement with the computed truth grade value of the users' mood. The result indicates a rather coherent relationship between these two measures. However, before applying it in practical implementations, the coherence must be examined on larger user data test sets. We aim to explore the method further, both with respect to performance features such as applicability, accuracy and coherence, and with respect to the completeness of its fuzzy logic theory. Besides, the relationship between general user context and biological signals in the processes of modeling, context processing and fuzzy inference will be studied from various perspectives in our future work.
References
1. World Health Organization, http://www.who.int
2. Corchado, J.M., Bajo, J., de Paz, Y., Tapia, D.I.: Intelligent Environment for Monitoring Alzheimer Patients, Agent Technology for Health Care. Decision Support Systems (to be published), http://www.sciencedirect.com
3. Choudhri, A., Kagal, L., Joshi, A., Finin, T., Yesha, Y.: Patient Service: Electronic Patient Record Redaction and Delivery in Pervasive Environments. In: Fifth International Workshop on Enterprise Networking and Computing in Healthcare Industry (2003)
4. Barger, T.S., Brown, D.E., Alwan, M.: Health-Status Monitoring Through Analysis of Behavioral Patterns. IEEE Transactions on Systems, Man and Cybernetics 35(1), 22–27 (2005)
5. Hatzinger, M., Hemmeter, U.M., Brand, S., Ising, M., Holsboer-Trachsler, E.: Electroencephalographic Sleep Profiles in Treatment Course and Long-term Outcome of Major Depression: Association with DEX/CRH-test Response. Journal of Psychiatric Research 38, 453–465 (2004)
6. Urretavizcaya, M., Moreno, I., Benlloch, L., Cardoner, N., Serrallonga, J., Menchón, J.M., Vallejo, J.: Auditory Event-Related Potentials in 50 Melancholic Patients: Increased N100, N200 and P300 Latencies and Diminished P300 Amplitude. Journal of Affective Disorders 74, 293–297 (2003)
7. Jantzen, J.: Tutorial on Fuzzy Logic. Technical University of Denmark, Kongens Lyngby, Tech. Report No. 98-E 868 (logic)
8. Zadeh, L.A.: Outline of a New Approach to the Analysis of Complex Systems and Decision Processes. IEEE Transactions on Systems, Man, and Cybernetics SMC-3(1) (January 1973)
9. Mamdani, E.H.: Application of Fuzzy Logic to Approximate Reasoning Using Linguistic Synthesis. IEEE Transactions on Computers C-26(12), 1182–1191 (1977)
Author Index

Abdel-Malek, Karim 140; Abel, Steve R. 560; Aksenov, Petr 183; Albayrak, Sahin 305; Aloisio, Giovanni 13; Amantini, Aladino 345; Anderson, Paul 550; Andreoni, Giuseppe 591; Armstrong, Thomas J. 85; Artavatkun, Tron 315; Asikele, Edward 710; Augustin, Thomas 433
Bae, Sungchan 85; Barton, Joyce 333; Baumann, Martin R.K. 192; Benedict, Ashley J. 475; Benson, Elizabeth 599; Benson, Stacey 578; Berbers, Yolande 285; Best, Christopher 85; Bian, Yueqing 512; Birk, Carol 560; Bocchi, Leonardo 132; Brüggemann, Ulrike 355; Bubb, Heiner 95; Burghardt, Christoph 202; Butler, Kathryn M. 483
Cacciabue, Pietro Carlo 345; Cai, Dengchuan 365; Carrier, Serge 19; Carruth, Daniel 295; Case, Keith 323, 673, 700, 727; Chadwick, Liam 502; Chanock, David 550; Chao, Chuzhi 46; Charissis, Vassilis 550; Chen, Yiqiang 275; Cheng, Zhiqing 3; Choi, Jaewon 85; Chouvarda, Ioanna 492; Ciani, Oriana 591; Clark, Marianne 333; Clavel, Céline 211; Coninx, Karin 183, 257; Costa, Fiammetta 591; Craven, Patrick L. 333; Crosson, Jesse C. 475
Dai, Jichang 661; Davis, Peter 700, 727; Demirel, H. Onan 608; Deml, Barbara 433; De Paolis, Lucio T. 13; Doebbeling, Bradley 569; Dong, Dayong 624; Dong, Tingting 46; Du, Yingzi 315; Duffy, Vincent G. 475, 560, 608, 717, 744; Durach, Stephan 443; Dzaack, Jeronimo 375
Eckstein, Lutz 443; Eilers, Mark 413, 423; Ellegast, Rolf 221; Endo, Yui 642; Engstler, Florian 95
Fallon, Enda F. 502; Fan, Xiumin 115; Faust, Marie-Eve 19; Feng, Xuemei 72; Feuerstack, Sebastian 305; Filla, Reno 614; Fu, Yan 512; Fürstenau, Norbert 227
Garbe, Hilke 423; Ge, Bao-zhen 691; Gifford, Adam 333; Godil, Afzal 29; Goonetilleke, Ravindra S. 681; Gore, Brian F. 237; Grieshaber, D. Christian 85; Guo, Fenfei 624; Gyi, Diane 673, 700, 727
Haazebroek, Pascal 247; Hannemann, Robert 475; Hanson, Lars 521; Harbers, Maaike 463; Hashagen, Anja 105; He, Qichang 115; Hermanns, Ingo 221; Hermawati, Setia 632; Heuvelink, Annerieke 463; Hoege, Bo 744; Högberg, Dan 323, 521, 673; Hommel, Bernhard 247; Hong, Kwang-Seok 36; Hooey, Becky L. 237; Hu, Bin 754; Hu, Yong 115; Huang, Lan-Ling 365; Hultgren, Kyle 560
Inagaki, Yoshikazu 123; Inoue, Takenobu 384; Ito, Takuma 384
Jeon, Jong-Bae 36; Jin, Sang-Hyeon 36; Jun, Esther 531
Kamata, Minoru 384; Kanai, Satoshi 642; Kawaguchi, Keisuke 642; Keinath, Andreas 443; Kim, Dong-Ju 36; Kim, Joo H. 72; Kirste, Thomas 202; Krems, Josef F. 192; Kuramoto, Itaru 123
Lagu, Amit V. 394; Landry, Steven J. 394; Landsittel, Douglas 578; Lee, Jonathan 531; Lehto, Mark 569, 717; Li, Shiqi 512; Li, Xiaojie 691; Li, Yongchang 754; Li, Zhizhong 55, 64, 737; Liu, Junfa 275; Liu, Li 754; Liu, Taijie 46; Liu, Tesheng 365; Loudon, David 540; Lundström, Daniel 521; Lüdtke, Andreas 403; Luyten, Kris 183, 257
Macdonald, Alastair S. 540; Maglaveras, Nicos 492; Mahmud, Nasim 257; Marshall, Russell 632, 673, 700, 727; Marshall, Sandra P. 265; Martin, Jean-Claude 211; Mazzola, Marco 591; McInnes, Brian 653; Mihalyi, Andreas 433; Milanova, Mariofanna 132; Möbus, Claus 413, 423; Morais, Alexander 295
Nam, Deok Hee 710; Niedermaier, Bernhard 443; Niu, Jianwei 55, 64, 737; Nuti, Lynn A. 475
Osterloh, Jan-Patrick 403
Pan, Wei 275; Pandith, Akshatha 475, 717; Park, Daewoo 85; Pauzié, Annie 453; Peña-Pitarch, Esteban 140; Potvin, Jim 653; Preatoni, Ezio 591; Preuveneers, Davy 285; Pulimeno, Marco 13
Qi, Yanbin 754
Rajulu, Sudhakar 72, 599; Ran, Linghua 46; Regli, Susan Harkness 333; Robbins, Bryan 295; Robinette, Kathleen 3; Roetting, Matthias 744; Romero, Maximiliano 591
Sakellariou, Sophia 550; Saleem, Jason J. 569; Schelhowe, Heidi 105; Schiefer, Christoph 221; Schwartze, Veit 305; Scott-Nash, Shelly 237; She, Jin-hua 315; Shi, Xiaobo 531; Shibuya, Yu 123; Shino, Motoki 384; Sims, Ruth 673, 700, 727; Slice, Dennis 578; Stephens, Allison 653; Stibler, Kathleen 333; Strohschneider, Stefan 355; Sugiyama, Shigeki 150; Summerskill, Steve 673, 700, 727
Thomas, N. Luke 315; Thorvald, Peter 323; Tian, Qing-guo 691; Tian, Renran 560; Tremoulet, Patrice D. 333; Tsujino, Yoshihiro 123
Urbas, Leon 375
van den Bosch, Karel 463; Vanderhulst, Geert 183; van der Putten, Wil 502; van Doesburg, Willem 463; Vermeulen, Jo 257; Viscusi, Dennis 578
Wang, Lijing 624; Wang, Xuguang 160; Ward, Ben M. 550; Wårell, Maria 521; Weber, Lars 403; Weber, Matthias 170; Wickens, Christopher D. 237; Wilcox, Saki 333; Witana, Channa P. 681; Woolley, Charles 85; Wortelen, Bertram 403; Wu, Jiang 737; Wu, Sze-jung 569
Xiang, Yujiang 72; Xiong, Shuping 681; Xu, Song 55, 64
Yang, Jingzhou (James) 72, 140, 661; Yih, Yuehwern 569; Yin, Mingqiang 512; You, Manlai 365; Young, K. David 691; Yucel, Gulcin 744
Zabel, Christian 105; Zambetti, Marta 591; Zare, Saeed 105; Zhang, Lifeng 115; Zhang, Xin 46; Zhao, Dan 691; Zhao, Jianhui 681; Zheng, Fang 754; Zhou, Wei 85; Zhu, Tingshao 754; Zhuang, Ziqing 578, 661; Zilinski, Malte 423